2011-05-08 15:49:22

by Tejun Heo

[permalink] [raw]
Subject: [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification

Hello,

This patchset implements new ptrace requests SEIZE and INTERRUPT and
also add group stop notification mechanism for ptracer. Combined,
this implements "P4. PTRACE_SEIZE" and "P5. ^Z and fg for tracees" of
the ptrace job control improvements proposal[1].

Please note that there are some deviations from the proposal.

* As suggested by Oleg, PTRACE_SEIZE only serves as ATTACH without
signal/job control side-effects. After attached, PTRACE_INTERRUPT
should be used to trap tracee without side effect.

* Group stop notification is implemented as sticky INTERRUPT trap
which gets cleared on PTRACE_GETSIGINFO and notifies both start and
end of group stops.

All the arch changse are for adding siginfo.si_pt_flags. It's tedious
and likely to take some time to be available to userland but I think
it's better this way than adding some hacky flag to si_code or other
already used fields.

PTRACE_SEIZE/INTERRUPT and group stop notification all use INTERRUPT
trap. The trap doesn't affect signal or job control states and is the
job control mechanism for ptracer in the sense that all it does is
just controlling the execution of tracee.

SEIZE/INTERRUPT behaviors are fairly straight-forward. For
notification, making group stop state visible to userland via
PTRACE_GETSIGINFO was easy; however, notifying ptracer of the event
was somewhat more involved. I ended up choosing the followings.

* The trap condition is sticky until GETSIGINFO. This is necessary
because generation of the event may race with CONT and ptracer may
miss the trap.

* If tracee is running, simple trapping is enough. If tracee is
already group stop or INTERRUPT trapped, tracee is re-trapped to
INTERRUPT thus notifying ptracer. If tracee is in other traps,
notification won't happen until the trap is finished. This
simplifies both implementation and usage of the interface and
doesn't lose any capability as tracer can always put tracee into
INTERRUPT trap if it's already in a trap without allowing it to
return to userland.

* If group stop is pending, it has higher priority than INTERRUPT.
This doesn't really affect correctness but avoids an extra
notification trap if tracee is already going for group stop.

Each patch implementing new feature includes test program showing its
functionality. Notification would probably need a bit more polishing
but all the needed functionalities are there.

This patchset contains the following 11 patches.

0001-job-control-rename-signal-group_stop-and-flags-to-jo.patch
0002-ptrace-implement-PTRACE_SEIZE.patch
0003-ptrace-ptrace_check_attach-rename-kill-to-ignore_sta.patch
0004-ptrace-implement-PTRACE_INTERRUPT.patch
0005-ptrace-restructure-ptrace_getsiginfo.patch
0006-ptrace-make-group-stop-state-visible-via-PTRACE_GETS.patch
0007-ptrace-add-JOBCTL_TRAPPED.patch
0008-ptrace-move-fallback-JOBCTL_TRAPPING-clearing-to-get.patch
0009-job-control-reorganize-wait_task_stopped.patch
0010-ptrace-move-JOBCTL_TRAPPING-wait-to-wait-2-and-ptrac.patch
0011-ptrace-implement-group-stop-notification-for-ptracer.patch

and on top of

Oleg's signals-review b013c39924 (signal: cleanup sys_sigprocmask())
+ [2] ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping()
+ [3] ptrace: use GROUP_STOP_TRAPPING for PTRACE_DETACH too

The combined patchset is available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-ptrace-seize

HEAD should be 74b094e53f38691c98ab73499e59eb7d5771dd4c. If not,
git.korg is tasking some time to sync so please wait a while and try
again, or you can pull from master directly.

ssh://master.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-ptrace-seize

diffstat follows.

arch/ia64/include/asm/siginfo.h | 7 +
arch/ia64/kernel/signal.c | 5
arch/mips/include/asm/compat-signal.h | 7 +
arch/mips/include/asm/siginfo.h | 7 +
arch/mips/kernel/signal32.c | 5
arch/parisc/kernel/signal32.c | 5
arch/parisc/kernel/signal32.h | 7 +
arch/powerpc/kernel/ppc32.h | 7 +
arch/powerpc/kernel/signal_32.c | 5
arch/s390/kernel/compat_linux.h | 7 +
arch/s390/kernel/compat_signal.c | 5
arch/sparc/kernel/signal32.c | 12 +
arch/tile/kernel/compat_signal.c | 11 +
arch/x86/ia32/ia32_signal.c | 4
arch/x86/include/asm/ia32.h | 7 +
fs/exec.c | 2
include/asm-generic/siginfo.h | 10 +
include/linux/ptrace.h | 14 ++
include/linux/sched.h | 26 ++--
kernel/exit.c | 49 ++++++-
kernel/ptrace.c | 193 ++++++++++++++++++++++++++----
kernel/signal.c | 213 +++++++++++++++++++++++++---------
22 files changed, 506 insertions(+), 102 deletions(-)

Thank you.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1107045
[2] http://thread.gmane.org/gmane.linux.kernel/1136303
[3] http://thread.gmane.org/gmane.linux.kernel/1136915


2011-05-08 15:51:36

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 01/11] job control: rename signal->group_stop and flags to jobctl and rearrange flags

signal->group_stop currently hosts mostly group stop related flags;
however, it's gonna be used for wider purposes and the GROUP_STOP_
flag prefix becomes confusing. Rename signal->group_stop to
signal->jobctl and rename all GROUP_STOP_* flags to JOBCTL_*.

Also, reassign JOBCTL_TRAPPING to bit 22 to better accomodate future
additions.

This doesn't cause any functional change.

Signed-off-by: Tejun Heo <[email protected]>
---
fs/exec.c | 2 +-
include/linux/sched.h | 16 ++++----
kernel/ptrace.c | 12 +++---
kernel/signal.c | 91 +++++++++++++++++++++++++------------------------
4 files changed, 61 insertions(+), 60 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 8328beb..fd6a4fb 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1659,7 +1659,7 @@ static int zap_process(struct task_struct *start, int exit_code)

t = start;
do {
- task_clear_group_stop_pending(t);
+ task_clear_jobctl_stop_pending(t);
if (t != current && t->mm) {
sigaddset(&t->pending.signal, SIGKILL);
signal_wake_up(t, 1);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3f53c25..1197573 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1260,7 +1260,7 @@ struct task_struct {
int exit_state;
int exit_code, exit_signal;
int pdeath_signal; /* The signal sent when the parent dies */
- unsigned int group_stop; /* GROUP_STOP_*, siglock protected */
+ unsigned int jobctl; /* JOBCTL_*, siglock protected */
/* ??? */
unsigned int personality;
unsigned did_exec:1;
@@ -1778,15 +1778,15 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define used_math() tsk_used_math(current)

/*
- * task->group_stop flags
+ * task->jobctl flags
*/
-#define GROUP_STOP_SIGMASK 0xffff /* signr of the last group stop */
-#define GROUP_STOP_PENDING (1 << 16) /* task should stop for group stop */
-#define GROUP_STOP_CONSUME (1 << 17) /* consume group stop count */
-#define GROUP_STOP_TRAPPING (1 << 18) /* switching from STOPPED to TRACED */
-#define GROUP_STOP_DEQUEUED (1 << 19) /* stop signal dequeued */
+#define JOBCTL_STOP_SIGMASK 0xffff /* signr of the last group stop */
+#define JOBCTL_STOP_DEQUEUED (1 << 16) /* stop signal dequeued */
+#define JOBCTL_STOP_PENDING (1 << 17) /* task should stop for group stop */
+#define JOBCTL_STOP_CONSUME (1 << 18) /* consume group stop count */
+#define JOBCTL_TRAPPING (1 << 22) /* switching to TRACED */

-extern void task_clear_group_stop_pending(struct task_struct *task);
+extern void task_clear_jobctl_stop_pending(struct task_struct *task);

#ifdef CONFIG_PREEMPT_RCU

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 4b9248d..3c56f54 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -76,7 +76,7 @@ void __ptrace_unlink(struct task_struct *child)
spin_lock(&child->sighand->siglock);

/*
- * Reinstate GROUP_STOP_PENDING if group stop is in effect and
+ * Reinstate JOBCTL_STOP_PENDING if group stop is in effect and
* @child isn't dead. This will trigger TRACED -> RUNNING ->
* STOPPED transition. As this transition can affect the next
* ptracer if it attaches before the transition completes, set
@@ -85,7 +85,7 @@ void __ptrace_unlink(struct task_struct *child)
if (!(child->flags & PF_EXITING) &&
(child->signal->flags & SIGNAL_STOP_STOPPED ||
child->signal->group_stop_count))
- child->group_stop |= GROUP_STOP_PENDING | GROUP_STOP_TRAPPING;
+ child->jobctl |= JOBCTL_STOP_PENDING | JOBCTL_TRAPPING;

/*
* If transition to TASK_STOPPED is pending or in TASK_TRACED, kick
@@ -93,7 +93,7 @@ void __ptrace_unlink(struct task_struct *child)
* is in TASK_TRACED; otherwise, we might unduly disrupt
* TASK_KILLABLE sleeps.
*/
- if (child->group_stop & GROUP_STOP_PENDING || task_is_traced(child))
+ if (child->jobctl & JOBCTL_STOP_PENDING || task_is_traced(child))
signal_wake_up(child, task_is_traced(child));

spin_unlock(&child->sighand->siglock);
@@ -228,7 +228,7 @@ static int ptrace_attach(struct task_struct *task)
spin_lock(&task->sighand->siglock);

/*
- * If the task is already STOPPED, set GROUP_STOP_PENDING and
+ * If the task is already STOPPED, set JOBCTL_STOP_PENDING and
* TRAPPING, and kick it so that it transits to TRACED. TRAPPING
* will be cleared if the child completes the transition or any
* event which clears the group stop states happens. We'll wait
@@ -245,7 +245,7 @@ static int ptrace_attach(struct task_struct *task)
* in and out of STOPPED are protected by siglock.
*/
if (task_is_stopped(task)) {
- task->group_stop |= GROUP_STOP_PENDING | GROUP_STOP_TRAPPING;
+ task->jobctl |= JOBCTL_STOP_PENDING | JOBCTL_TRAPPING;
signal_wake_up(task, 1);
}

@@ -258,7 +258,7 @@ unlock_creds:
mutex_unlock(&task->signal->cred_guard_mutex);
out:
wait_event(current->signal->wait_chldexit,
- !(task->group_stop & GROUP_STOP_TRAPPING));
+ !(task->jobctl & JOBCTL_TRAPPING));
return retval;
}

diff --git a/kernel/signal.c b/kernel/signal.c
index dc3f7cf..2b0d719 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -124,7 +124,7 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)

static int recalc_sigpending_tsk(struct task_struct *t)
{
- if ((t->group_stop & GROUP_STOP_PENDING) ||
+ if ((t->jobctl & JOBCTL_STOP_PENDING) ||
PENDING(&t->pending, &t->blocked) ||
PENDING(&t->signal->shared_pending, &t->blocked)) {
set_tsk_thread_flag(t, TIF_SIGPENDING);
@@ -224,27 +224,28 @@ static inline void print_dropped_signal(int sig)
}

/**
- * task_clear_group_stop_trapping - clear group stop trapping bit
+ * task_clear_jobctl_trapping - clear jobctl trapping bit
* @task: target task
*
- * If GROUP_STOP_TRAPPING is set, a ptracer is waiting for us. Clear it
- * and wake up the ptracer. Note that we don't need any further locking.
- * @task->siglock guarantees that @task->parent points to the ptracer.
+ * If JOBCTL_TRAPPING is set, a ptracer is waiting for us to enter TRACED.
+ * Clear it and wake up the ptracer. Note that we don't need any further
+ * locking. @task->siglock guarantees that @task->parent points to the
+ * ptracer.
*
* CONTEXT:
* Must be called with @task->sighand->siglock held.
*/
-static void task_clear_group_stop_trapping(struct task_struct *task)
+static void task_clear_jobctl_trapping(struct task_struct *task)
{
- if (unlikely(task->group_stop & GROUP_STOP_TRAPPING)) {
- task->group_stop &= ~GROUP_STOP_TRAPPING;
+ if (unlikely(task->jobctl & JOBCTL_TRAPPING)) {
+ task->jobctl &= ~JOBCTL_TRAPPING;
__wake_up_sync_key(&task->parent->signal->wait_chldexit,
TASK_UNINTERRUPTIBLE, 1, task);
}
}

/**
- * task_clear_group_stop_pending - clear pending group stop
+ * task_clear_jobctl_stop_pending - clear pending group stop
* @task: target task
*
* Clear group stop states for @task.
@@ -252,19 +253,19 @@ static void task_clear_group_stop_trapping(struct task_struct *task)
* CONTEXT:
* Must be called with @task->sighand->siglock held.
*/
-void task_clear_group_stop_pending(struct task_struct *task)
+void task_clear_jobctl_stop_pending(struct task_struct *task)
{
- task->group_stop &= ~(GROUP_STOP_PENDING | GROUP_STOP_CONSUME |
- GROUP_STOP_DEQUEUED);
+ task->jobctl &= ~(JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME |
+ JOBCTL_STOP_DEQUEUED);
}

/**
* task_participate_group_stop - participate in a group stop
* @task: task participating in a group stop
*
- * @task has GROUP_STOP_PENDING set and is participating in a group stop.
+ * @task has %JOBCTL_STOP_PENDING set and is participating in a group stop.
* Group stop states are cleared and the group stop count is consumed if
- * %GROUP_STOP_CONSUME was set. If the consumption completes the group
+ * %JOBCTL_STOP_CONSUME was set. If the consumption completes the group
* stop, the appropriate %SIGNAL_* flags are set.
*
* CONTEXT:
@@ -277,11 +278,11 @@ void task_clear_group_stop_pending(struct task_struct *task)
static bool task_participate_group_stop(struct task_struct *task)
{
struct signal_struct *sig = task->signal;
- bool consume = task->group_stop & GROUP_STOP_CONSUME;
+ bool consume = task->jobctl & JOBCTL_STOP_CONSUME;

- WARN_ON_ONCE(!(task->group_stop & GROUP_STOP_PENDING));
+ WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING));

- task_clear_group_stop_pending(task);
+ task_clear_jobctl_stop_pending(task);

if (!consume)
return false;
@@ -604,7 +605,7 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
* is to alert stop-signal processing code when another
* processor has come along and cleared the flag.
*/
- current->group_stop |= GROUP_STOP_DEQUEUED;
+ current->jobctl |= JOBCTL_STOP_DEQUEUED;
}
if ((info->si_code & __SI_MASK) == __SI_TIMER && info->si_sys_private) {
/*
@@ -809,7 +810,7 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
rm_from_queue(SIG_KERNEL_STOP_MASK, &signal->shared_pending);
t = p;
do {
- task_clear_group_stop_pending(t);
+ task_clear_jobctl_stop_pending(t);
rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
wake_up_state(t, __TASK_STOPPED);
} while_each_thread(p, t);
@@ -925,7 +926,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
signal->group_stop_count = 0;
t = p;
do {
- task_clear_group_stop_pending(t);
+ task_clear_jobctl_stop_pending(t);
sigaddset(&t->pending.signal, SIGKILL);
signal_wake_up(t, 1);
} while_each_thread(p, t);
@@ -1160,7 +1161,7 @@ int zap_other_threads(struct task_struct *p)
p->signal->group_stop_count = 0;

while_each_thread(p, t) {
- task_clear_group_stop_pending(t);
+ task_clear_jobctl_stop_pending(t);
count++;

/* Don't bother with already dead threads */
@@ -1738,7 +1739,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
* clear now. We act as if SIGCONT is received after TASK_TRACED
* is entered - ignore it.
*/
- if (why == CLD_STOPPED && (current->group_stop & GROUP_STOP_PENDING))
+ if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
gstop_done = task_participate_group_stop(current);

current->last_siginfo = info;
@@ -1751,12 +1752,12 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
set_current_state(TASK_TRACED);

/*
- * We're committing to trapping. Clearing GROUP_STOP_TRAPPING and
+ * We're committing to trapping. Clearing JOBCTL_TRAPPING and
* transition to TASK_TRACED should be atomic with respect to
- * siglock. This hsould be done after the arch hook as siglock is
+ * siglock. This should be done after the arch hook as siglock is
* released and regrabbed across it.
*/
- task_clear_group_stop_trapping(current);
+ task_clear_jobctl_trapping(current);

spin_unlock_irq(&current->sighand->siglock);
read_lock(&tasklist_lock);
@@ -1792,9 +1793,9 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
*
* If @gstop_done, the ptracer went away between group stop
* completion and here. During detach, it would have set
- * GROUP_STOP_PENDING on us and we'll re-enter TASK_STOPPED
- * in do_signal_stop() on return, so notifying the real
- * parent of the group stop completion is enough.
+ * JOBCTL_STOP_PENDING on us and we'll re-enter
+ * TASK_STOPPED in do_signal_stop() on return, so notifying
+ * the real parent of the group stop completion is enough.
*/
if (gstop_done)
do_notify_parent_cldstop(current, false, why);
@@ -1856,14 +1857,14 @@ static int do_signal_stop(int signr)
{
struct signal_struct *sig = current->signal;

- if (!(current->group_stop & GROUP_STOP_PENDING)) {
- unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
+ if (!(current->jobctl & JOBCTL_STOP_PENDING)) {
+ unsigned int gstop = JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME;
struct task_struct *t;

- /* signr will be recorded in task->group_stop for retries */
- WARN_ON_ONCE(signr & ~GROUP_STOP_SIGMASK);
+ /* signr will be recorded in task->jobctl for retries */
+ WARN_ON_ONCE(signr & ~JOBCTL_STOP_SIGMASK);

- if (!likely(current->group_stop & GROUP_STOP_DEQUEUED) ||
+ if (!likely(current->jobctl & JOBCTL_STOP_DEQUEUED) ||
unlikely(signal_group_exit(sig)))
return 0;
/*
@@ -1890,19 +1891,19 @@ static int do_signal_stop(int signr)
else
WARN_ON_ONCE(!task_ptrace(current));

- current->group_stop &= ~GROUP_STOP_SIGMASK;
- current->group_stop |= signr | gstop;
+ current->jobctl &= ~JOBCTL_STOP_SIGMASK;
+ current->jobctl |= signr | gstop;
sig->group_stop_count = 1;
for (t = next_thread(current); t != current;
t = next_thread(t)) {
- t->group_stop &= ~GROUP_STOP_SIGMASK;
+ t->jobctl &= ~JOBCTL_STOP_SIGMASK;
/*
* Setting state to TASK_STOPPED for a group
* stop is always done with the siglock held,
* so this check has no races.
*/
if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
- t->group_stop |= signr | gstop;
+ t->jobctl |= signr | gstop;
sig->group_stop_count++;
signal_wake_up(t, 0);
}
@@ -1943,23 +1944,23 @@ retry:

spin_lock_irq(&current->sighand->siglock);
} else {
- ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
+ ptrace_stop(current->jobctl & JOBCTL_STOP_SIGMASK,
CLD_STOPPED, 0, NULL);
current->exit_code = 0;
}

/*
- * GROUP_STOP_PENDING could be set if another group stop has
+ * JOBCTL_STOP_PENDING could be set if another group stop has
* started since being woken up or ptrace wants us to transit
* between TASK_STOPPED and TRACED. Retry group stop.
*/
- if (current->group_stop & GROUP_STOP_PENDING) {
- WARN_ON_ONCE(!(current->group_stop & GROUP_STOP_SIGMASK));
+ if (current->jobctl & JOBCTL_STOP_PENDING) {
+ WARN_ON_ONCE(!(current->jobctl & JOBCTL_STOP_SIGMASK));
goto retry;
}

/* PTRACE_ATTACH might have raced with task killing, clear trapping */
- task_clear_group_stop_trapping(current);
+ task_clear_jobctl_trapping(current);

spin_unlock_irq(&current->sighand->siglock);

@@ -2078,8 +2079,8 @@ relock:
if (unlikely(signr != 0))
ka = return_ka;
else {
- if (unlikely(current->group_stop &
- GROUP_STOP_PENDING) && do_signal_stop(0))
+ if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
+ do_signal_stop(0))
goto relock;

signr = dequeue_signal(current, &current->blocked,
@@ -2253,7 +2254,7 @@ void exit_signals(struct task_struct *tsk)
signotset(&unblocked);
retarget_shared_pending(tsk, &unblocked);

- if (unlikely(tsk->group_stop & GROUP_STOP_PENDING) &&
+ if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
task_participate_group_stop(tsk))
group_stop = CLD_STOPPED;
out:
--
1.7.1

2011-05-08 15:49:29

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 02/11] ptrace: implement PTRACE_SEIZE

PTRACE_ATTACH implicitly issues SIGSTOP on attach which has side
effects on tracee signal and job control states. This patch
implements a new ptrace request PTRACE_SEIZE which attaches and traps
tracee without affecting its signal and job control states.

The usage is the same with PTRACE_ATTACH but it takes PTRACE_SEIZE_*
flags in @data. Currently, the only defined flag is
PTRACE_SEIZE_DEVEL which is a temporary flag to enable PTRACE_SEIZE.
PTRACE_SEIZE will change ptrace behaviors outside of attach itself.
The changes will be implemented gradually and the DEVEL flag is to
prevent programs which expect full SEIZE behavior from using it before
all the behavior modifications are complete while allowing unit
testing. The flag will be removed once SEIZE behaviors are completely
implemented.

After PTRACE_SEIZE, tracee will trap. Which trap will happen isn't
fixed. If other trap conditions exist (signal delivery or group
stop), they might be taken; otherwise, a trap with exit_code SIGTRAP |
(PTRACE_EVENT_INTERRUPT << 8) is taken. The followings are
guaranteed.

* A trap will happen in finite amount of userland time.

* The trap can be PTRACE_EVENT_INTERRUPT which doesn't have any side
effect. If a different trap is taken, no INTERRUPT trap is pending.

IOW, no matter what, one trap will happen, which might be INTERRUPT.

INTERRUPT trapping is implemented by adding a new trap site in
get_signal_to_deliver() before the actual signal dispatch which is
activated by any flag in JOBCTL_TRAP_MASK. It currently includes only
JOBCTL_TRAP_SEIZE which is cleared whenever ptrace_stop() commits to
trapping.

Seizing sets PT_SEIZED in ->ptrace of the tracee. This flag will be
used to determine whether new SEIZE behaviors should be enabled.

Test program follows.

#define PTRACE_SEIZE 0x4206
#define PTRACE_SEIZE_DEVEL 0x80000000

static const struct timespec ts100ms = { .tv_nsec = 100000000 };
static const struct timespec ts1s = { .tv_sec = 1 };
static const struct timespec ts3s = { .tv_sec = 3 };

int main(int argc, char **argv)
{
pid_t tracee;

tracee = fork();
if (tracee == 0) {
nanosleep(&ts100ms, NULL);
while (1) {
printf("tracee: alive\n");
nanosleep(&ts1s, NULL);
}
}

if (argc > 1)
kill(tracee, SIGSTOP);

nanosleep(&ts100ms, NULL);

ptrace(PTRACE_SEIZE, tracee, NULL,
(void *)(unsigned long)PTRACE_SEIZE_DEVEL);
waitid(P_PID, tracee, NULL, WSTOPPED);
ptrace(PTRACE_CONT, tracee, NULL, NULL);
nanosleep(&ts3s, NULL);
printf("tracer: exiting\n");
return 0;
}

When the above program is called w/o argument, tracee is seized from
running state and continued. When tracer exits, tracee is returned to
running state and keeps printing out.

# ./test-seize
tracee: alive
tracee: alive
tracee: alive
tracer: exiting
# tracee: alive
tracee: alive
tracee: alive

When called with an argument, tracee is seized from stopped state and
continued, and returns to stopped state when tracer exits.

# ./test-seize
tracee: alive
tracee: alive
tracee: alive
tracer: exiting
# ps -el|grep test-seize
1 T 0 4720 1 0 80 0 - 941 signal ttyS0 00:00:00 test-seize

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/ptrace.h | 7 +++++++
include/linux/sched.h | 4 ++++
kernel/ptrace.c | 45 +++++++++++++++++++++++++++++++++++++++------
kernel/signal.c | 32 +++++++++++++++++++++++++-------
4 files changed, 75 insertions(+), 13 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index a1147e5..705a47b 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -47,6 +47,11 @@
#define PTRACE_GETREGSET 0x4204
#define PTRACE_SETREGSET 0x4205

+#define PTRACE_SEIZE 0x4206
+
+/* flags in @data for PTRACE_SEIZE */
+#define PTRACE_SEIZE_DEVEL 0x80000000 /* temp flag for development */
+
/* options set using PTRACE_SETOPTIONS */
#define PTRACE_O_TRACESYSGOOD 0x00000001
#define PTRACE_O_TRACEFORK 0x00000002
@@ -65,6 +70,7 @@
#define PTRACE_EVENT_EXEC 4
#define PTRACE_EVENT_VFORK_DONE 5
#define PTRACE_EVENT_EXIT 6
+#define PTRACE_EVENT_INTERRUPT 7

#include <asm/ptrace.h>

@@ -77,6 +83,7 @@
* flags. When the a task is stopped the ptracer owns task->ptrace.
*/

+#define PT_SEIZED 0x00010000 /* SEIZE used, enable new behavior */
#define PT_PTRACED 0x00000001
#define PT_DTRACE 0x00000002 /* delayed trace (used on m68k, i386) */
#define PT_TRACESYSGOOD 0x00000004
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1197573..2f383eb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1784,8 +1784,12 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define JOBCTL_STOP_DEQUEUED (1 << 16) /* stop signal dequeued */
#define JOBCTL_STOP_PENDING (1 << 17) /* task should stop for group stop */
#define JOBCTL_STOP_CONSUME (1 << 18) /* consume group stop count */
+#define JOBCTL_TRAP_SEIZE (1 << 19) /* trap for seize */
#define JOBCTL_TRAPPING (1 << 22) /* switching to TRACED */

+#define JOBCTL_TRAP_MASK JOBCTL_TRAP_SEIZE
+#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
+
extern void task_clear_jobctl_stop_pending(struct task_struct *task);

#ifdef CONFIG_PREEMPT_RCU
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 3c56f54..d1e3740 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -75,6 +75,9 @@ void __ptrace_unlink(struct task_struct *child)

spin_lock(&child->sighand->siglock);

+ /* clear pending jobctl traps */
+ child->jobctl &= ~JOBCTL_TRAP_MASK;
+
/*
* Reinstate JOBCTL_STOP_PENDING if group stop is in effect and
* @child isn't dead. This will trigger TRACED -> RUNNING ->
@@ -184,10 +187,28 @@ bool ptrace_may_access(struct task_struct *task, unsigned int mode)
return !err;
}

-static int ptrace_attach(struct task_struct *task)
+static int ptrace_attach(struct task_struct *task, long request,
+ unsigned long flags)
{
+ bool seize = request == PTRACE_SEIZE;
int retval;

+ /*
+ * SEIZE will enable new ptrace behaviors which will be implemented
+ * gradually. SEIZE_DEVEL is used to prevent applications
+ * expecting full SEIZE behaviors trapping on kernel commits which
+ * are still in the process of implementing them.
+ *
+ * Only test programs for new ptrace behaviors being implemented
+ * should set SEIZE_DEVEL. If unset, SEIZE will fail with -EIO.
+ *
+ * Once SEIZE behaviors are completely implemented, this flag and
+ * the following test will be removed.
+ */
+ retval = -EIO;
+ if (seize && !(flags & PTRACE_SEIZE_DEVEL))
+ goto out;
+
audit_ptrace(task);

retval = -EPERM;
@@ -219,11 +240,15 @@ static int ptrace_attach(struct task_struct *task)
goto unlock_tasklist;

task->ptrace = PT_PTRACED;
+ if (seize)
+ task->ptrace |= PT_SEIZED;
if (task_ns_capable(task, CAP_SYS_PTRACE))
task->ptrace |= PT_PTRACE_CAP;

__ptrace_link(task, current);
- send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
+
+ if (!seize)
+ send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);

spin_lock(&task->sighand->siglock);

@@ -247,6 +272,14 @@ static int ptrace_attach(struct task_struct *task)
if (task_is_stopped(task)) {
task->jobctl |= JOBCTL_STOP_PENDING | JOBCTL_TRAPPING;
signal_wake_up(task, 1);
+ } else if (seize) {
+ /*
+ * Otherwise, SEIZE uses jobctl trap to put tracee into
+ * TASK_TRACED, which doesn't have the nasty side effects
+ * of sending SIGSTOP.
+ */
+ task->jobctl |= JOBCTL_TRAP_SEIZE;
+ signal_wake_up(task, 0);
}

spin_unlock(&task->sighand->siglock);
@@ -760,8 +793,8 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
goto out;
}

- if (request == PTRACE_ATTACH) {
- ret = ptrace_attach(child);
+ if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
+ ret = ptrace_attach(child, request, data);
/*
* Some architectures need to do book-keeping after
* a ptrace attach.
@@ -902,8 +935,8 @@ asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid,
goto out;
}

- if (request == PTRACE_ATTACH) {
- ret = ptrace_attach(child);
+ if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
+ ret = ptrace_attach(child, request, data);
/*
* Some architectures need to do book-keeping after
* a ptrace attach.
diff --git a/kernel/signal.c b/kernel/signal.c
index 2b0d719..9249230 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -124,7 +124,7 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)

static int recalc_sigpending_tsk(struct task_struct *t)
{
- if ((t->jobctl & JOBCTL_STOP_PENDING) ||
+ if ((t->jobctl & JOBCTL_PENDING_MASK) ||
PENDING(&t->pending, &t->blocked) ||
PENDING(&t->signal->shared_pending, &t->blocked)) {
set_tsk_thread_flag(t, TIF_SIGPENDING);
@@ -1752,12 +1752,13 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
set_current_state(TASK_TRACED);

/*
- * We're committing to trapping. Clearing JOBCTL_TRAPPING and
- * transition to TASK_TRACED should be atomic with respect to
- * siglock. This should be done after the arch hook as siglock is
- * released and regrabbed across it.
+ * We're committing to trapping. Adjust ->jobctl. Updates to
+ * these flags and transition to TASK_TRACED should be atomic with
+ * respect to siglock. This should be done after the arch hook as
+ * siglock may be released and regrabbed across it.
*/
task_clear_jobctl_trapping(current);
+ current->jobctl &= ~JOBCTL_TRAP_SEIZE;

spin_unlock_irq(&current->sighand->siglock);
read_lock(&tasklist_lock);
@@ -1829,7 +1830,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
recalc_sigpending_tsk(current);
}

-void ptrace_notify(int exit_code)
+static void ptrace_notify_locked(int exit_code)
{
siginfo_t info;

@@ -1842,8 +1843,13 @@ void ptrace_notify(int exit_code)
info.si_uid = current_uid();

/* Let the debugger run. */
- spin_lock_irq(&current->sighand->siglock);
ptrace_stop(exit_code, CLD_TRAPPED, 1, &info);
+}
+
+void ptrace_notify(int exit_code)
+{
+ spin_lock_irq(&current->sighand->siglock);
+ ptrace_notify_locked(exit_code);
spin_unlock_irq(&current->sighand->siglock);
}

@@ -2068,6 +2074,18 @@ relock:

for (;;) {
struct k_sigaction *ka;
+
+ /*
+ * Check for ptrace trap conditions. Jobctl traps are used
+ * to trap ptracee while staying transparent regarding
+ * signal and job control.
+ */
+ if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
+ ptrace_notify_locked(SIGTRAP |
+ (PTRACE_EVENT_INTERRUPT << 8));
+ continue;
+ }
+
/*
* Tracing can induce an artifical signal and choose sigaction.
* The return value in @signr determines the default action,
--
1.7.1

2011-05-08 15:49:26

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 03/11] ptrace: ptrace_check_attach(): rename @kill to @ignore_state and add comments

PTRACE_INTERRUPT is going to be added which should also skip
task_is_traced() check in ptrace_check_attach(). Rename @kill to
@ignore_state and make it bool. Add function comment while at it.

This patch doesn't introduce any behavior difference.

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/ptrace.h | 2 +-
kernel/ptrace.c | 24 +++++++++++++++++++-----
2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 705a47b..8de301a 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -112,7 +112,7 @@ extern long arch_ptrace(struct task_struct *child, long request,
extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
extern int ptrace_writedata(struct task_struct *tsk, char __user *src, unsigned long dst, int len);
extern void ptrace_disable(struct task_struct *);
-extern int ptrace_check_attach(struct task_struct *task, int kill);
+extern int ptrace_check_attach(struct task_struct *task, bool ignore_state);
extern int ptrace_request(struct task_struct *child, long request,
unsigned long addr, unsigned long data);
extern void ptrace_notify(int exit_code);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d1e3740..0f0121a 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -102,10 +102,24 @@ void __ptrace_unlink(struct task_struct *child)
spin_unlock(&child->sighand->siglock);
}

-/*
- * Check that we have indeed attached to the thing..
+/**
+ * ptrace_check_attach - check whether ptracee is ready for ptrace operation
+ * @child: ptracee to check for
+ * @ignore_state: don't check whether @child is currently %TASK_TRACED
+ *
+ * Check whether @child is being ptraced by %current and ready for further
+ * ptrace operations. If @ignore_state is %false, @child also should be in
+ * %TASK_TRACED state and on return the child is guaranteed to be traced
+ * and not executing. If @ignore_state is %true, @child can be in any
+ * state.
+ *
+ * CONTEXT:
+ * Grabs and releases tasklist_lock and @child->sighand->siglock.
+ *
+ * RETURNS:
+ * 0 on success, -ESRCH if %child is not ready.
*/
-int ptrace_check_attach(struct task_struct *child, int kill)
+int ptrace_check_attach(struct task_struct *child, bool ignore_state)
{
int ret = -ESRCH;

@@ -124,13 +138,13 @@ int ptrace_check_attach(struct task_struct *child, int kill)
*/
spin_lock_irq(&child->sighand->siglock);
WARN_ON_ONCE(task_is_stopped(child));
- if (task_is_traced(child) || kill)
+ if (task_is_traced(child) || ignore_state)
ret = 0;
spin_unlock_irq(&child->sighand->siglock);
}
read_unlock(&tasklist_lock);

- if (!ret && !kill)
+ if (!ret && !ignore_state)
ret = wait_task_inactive(child, TASK_TRACED) ? 0 : -ESRCH;

/* All systems go.. */
--
1.7.1

2011-05-08 15:49:31

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

Currently, there's no way to trap a running ptracee short of sending a
signal which has various side effects. This patch implements
PTRACE_INTERRUPT which traps ptracee without any signal or job control
related side effect.

The implementation is almost trivial. It uses the same trap site and
event as PTRACE_SEIZE. A new trap flag JOBCTL_TRAP_INTERRUPT is
added, which is set on PTRACE_INTERRUPT and cleared when tracee
commits to INTERRUPT trap. As INTERRUPT should be useable regardless
of the current state of tracee, task_is_traced() test in
ptrace_check_attach() is skipped for INTERRUPT.

PTRACE_INTERRUPT is available iff tracee is attached with
PTRACE_SEIZE.

Test program follows.

#define PTRACE_SEIZE 0x4206
#define PTRACE_INTERRUPT 0x4207

#define PTRACE_SEIZE_DEVEL 0x80000000

static const struct timespec ts100ms = { .tv_nsec = 100000000 };
static const struct timespec ts1s = { .tv_sec = 1 };
static const struct timespec ts3s = { .tv_sec = 3 };

int main(int argc, char **argv)
{
pid_t tracee;

tracee = fork();
if (tracee == 0) {
nanosleep(&ts100ms, NULL);
while (1) {
printf("tracee: alive pid=%d\n", getpid());
nanosleep(&ts1s, NULL);
}
}

if (argc > 1)
kill(tracee, SIGSTOP);

nanosleep(&ts100ms, NULL);

ptrace(PTRACE_SEIZE, tracee, NULL,
(void *)(unsigned long)PTRACE_SEIZE_DEVEL);
waitid(P_PID, tracee, NULL, WSTOPPED);
ptrace(PTRACE_CONT, tracee, NULL, NULL);
nanosleep(&ts3s, NULL);

printf("tracer: INTERRUPT and DETACH\n");
ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
waitid(P_PID, tracee, NULL, WSTOPPED);
ptrace(PTRACE_DETACH, tracee, NULL, NULL);
nanosleep(&ts3s, NULL);

printf("tracer: exiting\n");
kill(tracee, SIGKILL);
return 0;
}

When called without argument, tracee is seized from running state,
continued, interrupted and then detached back to running state.

# ./test-interrupt
tracee: alive pid=4546
tracee: alive pid=4546
tracee: alive pid=4546
tracer: INTERRUPT and DETACH
tracee: alive pid=4546
tracee: alive pid=4546
tracee: alive pid=4546
tracer: exiting

When called with argument, it's the same but tracee is detached back
to stopped state.

# ./test-interrupt 1
tracee: alive pid=4548
tracee: alive pid=4548
tracee: alive pid=4548
tracer: INTERRUPT and DETACH
tracer: exiting

Before PTRACE_INTERRUPT, once the tracee was continued, there was no
easy way to do PTRACE_DETACH without causing side effect as tracee
couldn't be trapped without side effect.

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/ptrace.h | 1 +
include/linux/sched.h | 3 ++-
kernel/ptrace.c | 23 +++++++++++++++++++++--
kernel/signal.c | 4 ++++
4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 8de301a..5b6128b 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -48,6 +48,7 @@
#define PTRACE_SETREGSET 0x4205

#define PTRACE_SEIZE 0x4206
+#define PTRACE_INTERRUPT 0x4207

/* flags in @data for PTRACE_SEIZE */
#define PTRACE_SEIZE_DEVEL 0x80000000 /* temp flag for development */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2f383eb..221ab51 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1785,9 +1785,10 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define JOBCTL_STOP_PENDING (1 << 17) /* task should stop for group stop */
#define JOBCTL_STOP_CONSUME (1 << 18) /* consume group stop count */
#define JOBCTL_TRAP_SEIZE (1 << 19) /* trap for seize */
+#define JOBCTL_TRAP_INTERRUPT (1 << 20) /* trap for interrupt */
#define JOBCTL_TRAPPING (1 << 22) /* switching to TRACED */

-#define JOBCTL_TRAP_MASK JOBCTL_TRAP_SEIZE
+#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_SEIZE | JOBCTL_TRAP_INTERRUPT)
#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)

extern void task_clear_jobctl_stop_pending(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 0f0121a..1262a36 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -693,6 +693,23 @@ int ptrace_request(struct task_struct *child, long request,
ret = ptrace_setsiginfo(child, &siginfo);
break;

+ case PTRACE_INTERRUPT:
+ if (!likely(child->ptrace & PT_SEIZED))
+ break;
+ /*
+ * Stop tracee without any side-effect on signal or job
+ * control. If @child is already trapped, the current trap
+ * is not disturbed and INTERRUPT trap will happen after
+ * the current trap is ended with PTRACE_CONT. Note that
+ * other traps may happen before the scheduled INTERRUPT.
+ */
+ spin_lock(&child->sighand->siglock);
+ child->jobctl |= JOBCTL_TRAP_INTERRUPT;
+ signal_wake_up(child, 0);
+ spin_unlock(&child->sighand->siglock);
+ ret = 0;
+ break;
+
case PTRACE_DETACH: /* detach a process that was attached. */
ret = ptrace_detach(child, data);
break;
@@ -818,7 +835,8 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
goto out_put_task_struct;
}

- ret = ptrace_check_attach(child, request == PTRACE_KILL);
+ ret = ptrace_check_attach(child, request == PTRACE_KILL ||
+ request == PTRACE_INTERRUPT);
if (ret < 0)
goto out_put_task_struct;

@@ -960,7 +978,8 @@ asmlinkage long compat_sys_ptrace(compat_long_t request, compat_long_t pid,
goto out_put_task_struct;
}

- ret = ptrace_check_attach(child, request == PTRACE_KILL);
+ ret = ptrace_check_attach(child, request == PTRACE_KILL ||
+ request == PTRACE_INTERRUPT);
if (!ret)
ret = compat_arch_ptrace(child, request, addr, data);

diff --git a/kernel/signal.c b/kernel/signal.c
index 9249230..7add912 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1711,6 +1711,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
__releases(&current->sighand->siglock)
__acquires(&current->sighand->siglock)
{
+ bool is_intr = exit_code == (SIGTRAP | (PTRACE_EVENT_INTERRUPT << 8));
bool gstop_done = false;

if (arch_ptrace_stop_needed(exit_code, info)) {
@@ -1760,6 +1761,9 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
task_clear_jobctl_trapping(current);
current->jobctl &= ~JOBCTL_TRAP_SEIZE;

+ if (is_intr)
+ current->jobctl &= ~JOBCTL_TRAP_INTERRUPT;
+
spin_unlock_irq(&current->sighand->siglock);
read_lock(&tasklist_lock);
if (may_ptrace_stop()) {
--
1.7.1

2011-05-08 15:51:20

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 05/11] ptrace: restructure ptrace_getsiginfo()

Flatten ptrace_getsiginfo() to prepare for more logic in the success
path. While at it, remove [un]likely() on child->last_siginfo check -
signal delivery and group stop traps can only be distinguished by NULL
siginfo and group stop isn't that unlikely.

This patch doesn't introduce any functional change.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/ptrace.c | 21 ++++++++++++---------
1 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 1262a36..2a8e930 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -529,16 +529,19 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
{
unsigned long flags;
- int error = -ESRCH;
+ int error;

- if (lock_task_sighand(child, &flags)) {
- error = -EINVAL;
- if (likely(child->last_siginfo != NULL)) {
- *info = *child->last_siginfo;
- error = 0;
- }
- unlock_task_sighand(child, &flags);
- }
+ if (!lock_task_sighand(child, &flags))
+ return -ESRCH;
+
+ error = -EINVAL;
+ if (!child->last_siginfo)
+ goto out_unlock;
+
+ error = 0;
+ *info = *child->last_siginfo;
+out_unlock:
+ unlock_task_sighand(child, &flags);
return error;
}

--
1.7.1

2011-05-08 15:50:38

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 06/11] ptrace: make group stop state visible via PTRACE_GETSIGINFO

Ptracer can detect tracee entering group stop by watching for the
group stop trap; however, there is no reliable way to find out when
group stop ends - SIGCONT may be processed by another thread and
signal delivery is blocked while tracee is trapped.

This patch adds siginfo.si_pt_flags and uses PTRACE_SI_STOPPED flag to
indicate whether group stop is in effect or not. While tracee is
trapped for anything other than signal delivery and group stop itself,
tracer can use PTRACE_GETSIGINFO to access this information. Note
that it's only available if tracee was seized.

This doesn't address notification and tracer has to put tracee in an
appropriate trap and poll the flag. Later patches will deal with
notification and trap transition.

This essentially is a simple addition of a flag but seems complicated
thanks to siginfo_t convolution. _sigtrap struct, which contains all
the fields used by ptrace_notify[_locked]() and the new _pt_flags, is
added to siginfo._sifields union along with the field abbreviation
macro si_pt_flags; then, __SI_TRAP is defined to implement copying of
the new field to userland.

Two architectures - ia64 and mips - define their own versions of
siginfo_t and ia64 implements its own copy_siginfo_to_user(). Also,
x86, mips, parisc, powerpc, s390, sparc and tile have compat_siginfo_t
and copy_siginfo_to_user32() for 32bit compatibility. All are updated
such that [compat_]siginfo_t have _sigtrap and all the fields are
copied out.

x86 is tested. Affected code in mips, powerpc, s390 and sparc are
compile tested. mips and tile are untested.

Test program follows.

#define PTRACE_SEIZE 0x4206
#define PTRACE_INTERRUPT 0x4207

#define PTRACE_SEIZE_DEVEL 0x80000000

static const struct timespec ts1s = { .tv_sec = 1 };

int main(int argc, char **argv)
{
pid_t tracee, tracer;
int i;

tracee = fork();
if (!tracee)
while (1)
nanosleep(&ts1s, NULL);

tracer = fork();
if (!tracer) {
int last_stopped = 0, stopped;
siginfo_t si;

ptrace(PTRACE_SEIZE, tracee, NULL,
(void *)(unsigned long)PTRACE_SEIZE_DEVEL);
repeat:
waitid(P_PID, tracee, NULL, WSTOPPED);

if (!ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si)) {
if (si.si_code) {
stopped = !!si.si_status;
if (stopped != last_stopped)
printf("tracer: stopped=%d\n", stopped);
last_stopped = stopped;
ptrace(PTRACE_CONT, tracee, NULL, NULL);
} else {
printf("tracer: SIG %d\n", si.si_signo);
ptrace(PTRACE_CONT, tracee, NULL,
(void *)(unsigned long)si.si_signo);
}
} else
ptrace(PTRACE_CONT, tracee, NULL, NULL);
ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
goto repeat;
}

for (i = 0; i < 3; i++) {
nanosleep(&ts1s, NULL);
printf("mother: SIGSTOP\n");
kill(tracee, SIGSTOP);
nanosleep(&ts1s, NULL);
printf("mother: SIGCONT\n");
kill(tracee, SIGCONT);
}
nanosleep(&ts1s, NULL);

kill(tracer, SIGKILL);
kill(tracee, SIGKILL);
return 0;
}

Tracer delivers signal, resumes group stop, induces INTERRUPT traps
and reports group stop state change in busy loop. Mother sends
SIGSTOP or CONT to tracee on each second. Note that si_pt_flags and
flag testing are replaced with si_status testing. si_status occupies
the same offset as si_pt_flags and PTRACE_SI_STOPPED is the only flag
defined, so I took a dirty short cut.

# ./test-stopped
mother: SIGSTOP
tracer: SIG 19
tracer: stopped=1
mother: SIGCONT
tracer: SIG 18
tracer: stopped=0
mother: SIGSTOP
tracer: SIG 19
tracer: stopped=1
mother: SIGCONT
tracer: stopped=0
tracer: SIG 18
mother: SIGSTOP
tracer: SIG 19
tracer: stopped=1
mother: SIGCONT
tracer: SIG 18
tracer: stopped=0

Signed-off-by: Tejun Heo <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Kyle McMartin <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: [email protected]
---
arch/ia64/include/asm/siginfo.h | 7 +++++++
arch/ia64/kernel/signal.c | 5 +++++
arch/mips/include/asm/compat-signal.h | 7 +++++++
arch/mips/include/asm/siginfo.h | 7 +++++++
arch/mips/kernel/signal32.c | 5 +++++
arch/parisc/kernel/signal32.c | 5 +++++
arch/parisc/kernel/signal32.h | 7 +++++++
arch/powerpc/kernel/ppc32.h | 7 +++++++
arch/powerpc/kernel/signal_32.c | 5 +++++
arch/s390/kernel/compat_linux.h | 7 +++++++
arch/s390/kernel/compat_signal.c | 5 +++++
arch/sparc/kernel/signal32.c | 12 ++++++++++++
arch/tile/kernel/compat_signal.c | 11 +++++++++++
arch/x86/ia32/ia32_signal.c | 4 ++++
arch/x86/include/asm/ia32.h | 7 +++++++
include/asm-generic/siginfo.h | 10 ++++++++++
include/linux/ptrace.h | 3 +++
kernel/ptrace.c | 13 +++++++++++++
kernel/signal.c | 7 ++++++-
19 files changed, 133 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/include/asm/siginfo.h b/arch/ia64/include/asm/siginfo.h
index c8fcaa2..2cff1ce 100644
--- a/arch/ia64/include/asm/siginfo.h
+++ b/arch/ia64/include/asm/siginfo.h
@@ -70,6 +70,13 @@ typedef struct siginfo {
long _band; /* POLL_IN, POLL_OUT, POLL_MSG (XPG requires a "long") */
int _fd;
} _sigpoll;
+
+ /* SIGTRAP */
+ struct {
+ pid_t _pid; /* sender's pid */
+ uid_t _uid; /* sender's uid */
+ unsigned int _pt_flags;
+ } _sigtrap;
} _sifields;
} siginfo_t;

diff --git a/arch/ia64/kernel/signal.c b/arch/ia64/kernel/signal.c
index 7bdafc8..ee18366 100644
--- a/arch/ia64/kernel/signal.c
+++ b/arch/ia64/kernel/signal.c
@@ -142,6 +142,11 @@ copy_siginfo_to_user (siginfo_t __user *to, siginfo_t *from)
err |= __put_user(from->si_addr, &to->si_addr);
err |= __put_user(from->si_imm, &to->si_imm);
break;
+ case __SI_TRAP >> 16:
+ err |= __put_user(from->si_uid, &to->si_uid);
+ err |= __put_user(from->si_pid, &to->si_pid);
+ err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+ break;
case __SI_TIMER >> 16:
err |= __put_user(from->si_tid, &to->si_tid);
err |= __put_user(from->si_overrun, &to->si_overrun);
diff --git a/arch/mips/include/asm/compat-signal.h b/arch/mips/include/asm/compat-signal.h
index 368a99e..47b2e4f 100644
--- a/arch/mips/include/asm/compat-signal.h
+++ b/arch/mips/include/asm/compat-signal.h
@@ -54,6 +54,13 @@ typedef struct compat_siginfo {
int _fd;
} _sigpoll;

+ /* SIGTRAP */
+ struct {
+ compat_pid_t _pid; /* sender's pid */
+ compat_uid_t _uid; /* sender's uid */
+ unsigned int _pt_flags;
+ } _sigtrap;
+
/* POSIX.1b timers */
struct {
timer_t _tid; /* timer id */
diff --git a/arch/mips/include/asm/siginfo.h b/arch/mips/include/asm/siginfo.h
index 1ca64b4..232920f 100644
--- a/arch/mips/include/asm/siginfo.h
+++ b/arch/mips/include/asm/siginfo.h
@@ -96,6 +96,13 @@ typedef struct siginfo {
__ARCH_SI_BAND_T _band; /* POLL_IN, POLL_OUT, POLL_MSG */
int _fd;
} _sigpoll;
+
+ /* SIGTRAP */
+ struct {
+ pid_t _pid; /* sender's pid */
+ __ARCH_SI_UID_T _uid; /* sender's uid */
+ unsigned int _pt_flags;
+ } _sigtrap;
} _sifields;
} siginfo_t;

diff --git a/arch/mips/kernel/signal32.c b/arch/mips/kernel/signal32.c
index aae9866..7e392e1 100644
--- a/arch/mips/kernel/signal32.c
+++ b/arch/mips/kernel/signal32.c
@@ -452,6 +452,11 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
err |= __put_user(from->si_band, &to->si_band);
err |= __put_user(from->si_fd, &to->si_fd);
break;
+ case __SI_TRAP >> 16:
+ err |= __put_user(from->si_pid, &to->si_pid);
+ err |= __put_user(from->si_uid, &to->si_uid);
+ err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+ break;
case __SI_RT >> 16: /* This is not generated by the kernel as of now. */
case __SI_MESGQ >> 16:
err |= __put_user(from->si_pid, &to->si_pid);
diff --git a/arch/parisc/kernel/signal32.c b/arch/parisc/kernel/signal32.c
index e141324..ead8ca4 100644
--- a/arch/parisc/kernel/signal32.c
+++ b/arch/parisc/kernel/signal32.c
@@ -482,6 +482,11 @@ copy_siginfo_to_user32 (compat_siginfo_t __user *to, siginfo_t *from)
err |= __put_user(from->si_band, &to->si_band);
err |= __put_user(from->si_fd, &to->si_fd);
break;
+ case __SI_TRAP >> 16:
+ err |= __put_user(from->si_pid, &to->si_pid);
+ err |= __put_user(from->si_uid, &to->si_uid);
+ err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+ break;
case __SI_TIMER >> 16:
err |= __put_user(from->si_tid, &to->si_tid);
err |= __put_user(from->si_overrun, &to->si_overrun);
diff --git a/arch/parisc/kernel/signal32.h b/arch/parisc/kernel/signal32.h
index c780084..8016f51 100644
--- a/arch/parisc/kernel/signal32.h
+++ b/arch/parisc/kernel/signal32.h
@@ -104,6 +104,13 @@ typedef struct compat_siginfo {
int _band; /* POLL_IN, POLL_OUT, POLL_MSG */
int _fd;
} _sigpoll;
+
+ /* SIGTRAP */
+ struct {
+ unsigned int _pid; /* sender's pid */
+ unsigned int _uid; /* sender's uid */
+ unsigned int _pt_flags;
+ } _sigtrap;
} _sifields;
} compat_siginfo_t;

diff --git a/arch/powerpc/kernel/ppc32.h b/arch/powerpc/kernel/ppc32.h
index dc16aef..4293542 100644
--- a/arch/powerpc/kernel/ppc32.h
+++ b/arch/powerpc/kernel/ppc32.h
@@ -64,6 +64,13 @@ typedef struct compat_siginfo {
int _band; /* POLL_IN, POLL_OUT, POLL_MSG */
int _fd;
} _sigpoll;
+
+ /* SIGTRAP */
+ struct {
+ compat_pid_t _pid; /* sender's pid */
+ compat_uid_t _uid; /* sender's uid */
+ unsigned int _pt_flags;
+ } _sigtrap;
} _sifields;
} compat_siginfo_t;

diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index b96a3a0..d072458 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -716,6 +716,11 @@ int copy_siginfo_to_user32(struct compat_siginfo __user *d, siginfo_t *s)
err |= __put_user(s->si_band, &d->si_band);
err |= __put_user(s->si_fd, &d->si_fd);
break;
+ case __SI_TRAP:
+ err |= __put_user(s->si_pid, &d->si_pid);
+ err |= __put_user(s->si_uid, &d->si_uid);
+ err |= __put_user(s->si_pt_flags, &d->si_pt_flags);
+ break;
case __SI_TIMER >> 16:
err |= __put_user(s->si_tid, &d->si_tid);
err |= __put_user(s->si_overrun, &d->si_overrun);
diff --git a/arch/s390/kernel/compat_linux.h b/arch/s390/kernel/compat_linux.h
index 9635d75..f8c973f 100644
--- a/arch/s390/kernel/compat_linux.h
+++ b/arch/s390/kernel/compat_linux.h
@@ -72,6 +72,13 @@ typedef struct compat_siginfo {
int _band; /* POLL_IN, POLL_OUT, POLL_MSG */
int _fd;
} _sigpoll;
+
+ /* SIGTRAP */
+ struct {
+ pid_t _pid; /* sender's pid */
+ uid_t _uid; /* sender's uid */
+ unsigned int _pt_flags;
+ } _sigtrap;
} _sifields;
} compat_siginfo_t;

diff --git a/arch/s390/kernel/compat_signal.c b/arch/s390/kernel/compat_signal.c
index eee9998..b3c9f6b 100644
--- a/arch/s390/kernel/compat_signal.c
+++ b/arch/s390/kernel/compat_signal.c
@@ -96,6 +96,11 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
err |= __put_user(from->si_band, &to->si_band);
err |= __put_user(from->si_fd, &to->si_fd);
break;
+ case __SI_TRAP >> 16:
+ err |= __put_user(from->si_pid, &to->si_pid);
+ err |= __put_user(from->si_uid, &to->si_uid);
+ err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+ break;
case __SI_TIMER >> 16:
err |= __put_user(from->si_tid, &to->si_tid);
err |= __put_user(from->si_overrun, &to->si_overrun);
diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c
index 75fad42..c545212 100644
--- a/arch/sparc/kernel/signal32.c
+++ b/arch/sparc/kernel/signal32.c
@@ -102,6 +102,13 @@ typedef struct compat_siginfo{
int _band; /* POLL_IN, POLL_OUT, POLL_MSG */
int _fd;
} _sigpoll;
+
+ /* SIGTRAP */
+ struct {
+ compat_pid_t _pid; /* sender's pid */
+ unsigned int _uid; /* sender's uid */
+ unsigned int _pt_flags;
+ } _sigtrap;
} _sifields;
}compat_siginfo_t;

@@ -165,6 +172,11 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
err |= __put_user(from->si_band, &to->si_band);
err |= __put_user(from->si_fd, &to->si_fd);
break;
+ case __SI_TRAP >> 16:
+ err |= __put_user(from->si_pid, &to->si_pid);
+ err |= __put_user(from->si_uid, &to->si_uid);
+ err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+ break;
case __SI_RT >> 16: /* This is not generated by the kernel as of now. */
case __SI_MESGQ >> 16:
err |= __put_user(from->si_pid, &to->si_pid);
diff --git a/arch/tile/kernel/compat_signal.c b/arch/tile/kernel/compat_signal.c
index dbb0dfc..0a5d694 100644
--- a/arch/tile/kernel/compat_signal.c
+++ b/arch/tile/kernel/compat_signal.c
@@ -109,6 +109,13 @@ struct compat_siginfo {
int _band; /* POLL_IN, POLL_OUT, POLL_MSG */
int _fd;
} _sigpoll;
+
+ /* SIGTRAP */
+ struct {
+ unsigned int _pid; /* sender's pid */
+ unsigned int _uid; /* sender's uid */
+ unsigned int _pt_flags;
+ } _sigtrap;
} _sifields;
};

@@ -219,6 +226,10 @@ int copy_siginfo_to_user32(struct compat_siginfo __user *to, siginfo_t *from)
case __SI_POLL >> 16:
err |= __put_user(from->si_fd, &to->si_fd);
break;
+ case __SI_TRAP >> 16:
+ err |= __put_user(from->si_uid, &to->si_uid);
+ err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+ break;
case __SI_TIMER >> 16:
err |= __put_user(from->si_overrun, &to->si_overrun);
err |= __put_user(ptr_to_compat(from->si_ptr),
diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 588a7aa..1df88cc 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -85,6 +85,10 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
case __SI_POLL >> 16:
put_user_ex(from->si_fd, &to->si_fd);
break;
+ case __SI_TRAP:
+ put_user_ex(from->si_uid, &to->si_uid);
+ put_user_ex(from->si_pt_flags, &to->si_pt_flags);
+ break;
case __SI_TIMER >> 16:
put_user_ex(from->si_overrun, &to->si_overrun);
put_user_ex(ptr_to_compat(from->si_ptr),
diff --git a/arch/x86/include/asm/ia32.h b/arch/x86/include/asm/ia32.h
index 1f7e625..7eab27a 100644
--- a/arch/x86/include/asm/ia32.h
+++ b/arch/x86/include/asm/ia32.h
@@ -126,6 +126,13 @@ typedef struct compat_siginfo {
int _band; /* POLL_IN, POLL_OUT, POLL_MSG */
int _fd;
} _sigpoll;
+
+ /* SIGTRAP */
+ struct {
+ unsigned int _pid; /* sender's pid */
+ unsigned int _uid; /* sender's uid */
+ unsigned int _pt_flags;
+ } _sigtrap;
} _sifields;
} compat_siginfo_t;

diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h
index 942d30b..e243927 100644
--- a/include/asm-generic/siginfo.h
+++ b/include/asm-generic/siginfo.h
@@ -90,6 +90,13 @@ typedef struct siginfo {
__ARCH_SI_BAND_T _band; /* POLL_IN, POLL_OUT, POLL_MSG */
int _fd;
} _sigpoll;
+
+ /* SIGTRAP */
+ struct {
+ __kernel_pid_t _pid; /* sender's pid */
+ __ARCH_SI_UID_T _uid; /* sender's uid */
+ unsigned int _pt_flags;
+ } _sigtrap;
} _sifields;
} siginfo_t;

@@ -116,6 +123,7 @@ typedef struct siginfo {
#define si_addr_lsb _sifields._sigfault._addr_lsb
#define si_band _sifields._sigpoll._band
#define si_fd _sifields._sigpoll._fd
+#define si_pt_flags _sifields._sigtrap._pt_flags

#ifdef __KERNEL__
#define __SI_MASK 0xffff0000u
@@ -126,6 +134,7 @@ typedef struct siginfo {
#define __SI_CHLD (4 << 16)
#define __SI_RT (5 << 16)
#define __SI_MESGQ (6 << 16)
+#define __SI_TRAP (7 << 16)
#define __SI_CODE(T,N) ((T) | ((N) & 0xffff))
#else
#define __SI_KILL 0
@@ -135,6 +144,7 @@ typedef struct siginfo {
#define __SI_CHLD 0
#define __SI_RT 0
#define __SI_MESGQ 0
+#define __SI_TRAP 0
#define __SI_CODE(T,N) (N)
#endif

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 5b6128b..567e189 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -73,6 +73,9 @@
#define PTRACE_EVENT_EXIT 6
#define PTRACE_EVENT_INTERRUPT 7

+/* flags in siginfo.si_pt_flags from PTRACE_GETSIGINFO */
+#define PTRACE_SI_STOPPED 0x00000001 /* tracee is job control stopped */
+
#include <asm/ptrace.h>

#ifdef __KERNEL__
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 2a8e930..b18a9b3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -528,11 +528,13 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)

static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
{
+ struct signal_struct *sig;
unsigned long flags;
int error;

if (!lock_task_sighand(child, &flags))
return -ESRCH;
+ sig = child->signal;

error = -EINVAL;
if (!child->last_siginfo)
@@ -540,6 +542,17 @@ static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)

error = 0;
*info = *child->last_siginfo;
+
+ /*
+ * If reporting ptrace trap for a seized tracee, enable reporting
+ * of info->si_pt_flags.
+ */
+ if ((child->ptrace & PT_SEIZED) &&
+ (info->si_code & (0x7f | ~0xffff)) == (__SI_TRAP | SIGTRAP)) {
+ /* report whether group stop is in effect w/ SI_STOPPED */
+ if (sig->group_stop_count || (sig->flags & SIGNAL_STOP_STOPPED))
+ info->si_pt_flags |= PTRACE_SI_STOPPED;
+ }
out_unlock:
unlock_task_sighand(child, &flags);
return error;
diff --git a/kernel/signal.c b/kernel/signal.c
index 7add912..9705f5d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1842,7 +1842,7 @@ static void ptrace_notify_locked(int exit_code)

memset(&info, 0, sizeof info);
info.si_signo = SIGTRAP;
- info.si_code = exit_code;
+ info.si_code = __SI_TRAP | exit_code;
info.si_pid = task_pid_vnr(current);
info.si_uid = current_uid();

@@ -2494,6 +2494,11 @@ int copy_siginfo_to_user(siginfo_t __user *to, siginfo_t *from)
err |= __put_user(from->si_band, &to->si_band);
err |= __put_user(from->si_fd, &to->si_fd);
break;
+ case __SI_TRAP:
+ err |= __put_user(from->si_pid, &to->si_pid);
+ err |= __put_user(from->si_uid, &to->si_uid);
+ err |= __put_user(from->si_pt_flags, &to->si_pt_flags);
+ break;
case __SI_FAULT:
err |= __put_user(from->si_addr, &to->si_addr);
#ifdef __ARCH_SI_TRAPNO
--
1.7.1

2011-05-08 15:50:35

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 07/11] ptrace: add JOBCTL_TRAPPED

For to-be-added end of group stop notification, ptracer needs to tell
whether tracee is in group stop or INTERRUPT trap. Add JOBCTL_TRAPPED
flag which is set when tracee commits to group stop or INTERRUPT trap
in ptrace_stop() and cleared when leaving ptrace_stop().

The flag isn't yet used and doesn't cause any behavior difference.

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/sched.h | 1 +
kernel/signal.c | 9 +++++++++
2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 221ab51..9d92444 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1787,6 +1787,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define JOBCTL_TRAP_SEIZE (1 << 19) /* trap for seize */
#define JOBCTL_TRAP_INTERRUPT (1 << 20) /* trap for interrupt */
#define JOBCTL_TRAPPING (1 << 22) /* switching to TRACED */
+#define JOBCTL_TRAPPED (1 << 23) /* trapped for group stop */

#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_SEIZE | JOBCTL_TRAP_INTERRUPT)
#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/kernel/signal.c b/kernel/signal.c
index 9705f5d..208f061 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1764,6 +1764,14 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
if (is_intr)
current->jobctl &= ~JOBCTL_TRAP_INTERRUPT;

+ /*
+ * Traps for both actual group stop and INTERRUPT are considered
+ * trapped for job control and have TRAPPED set. This enables
+ * end-of-group-stop retrapping of INTERRUPTed tracees.
+ */
+ if (why == CLD_STOPPED || is_intr)
+ current->jobctl |= JOBCTL_TRAPPED;
+
spin_unlock_irq(&current->sighand->siglock);
read_lock(&tasklist_lock);
if (may_ptrace_stop()) {
@@ -1824,6 +1832,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
* any signal-sending on another CPU that wants to examine it.
*/
spin_lock_irq(&current->sighand->siglock);
+ current->jobctl &= ~JOBCTL_TRAPPED;
current->last_siginfo = NULL;

/*
--
1.7.1

2011-05-08 15:50:14

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 08/11] ptrace: move fallback JOBCTL_TRAPPING clearing to get_signal_to_deliver()

JOBCTL_TRAPPING is currently used to hide TASK_STOPPED -> TASK_TRACED
transition on ptrace attach/seize. As such, it is set only while
tracee is inside do_signal_stop() and gets cleread by entering
TASK_TRACED in ptrace_stop(); however, if attach races with kill,
ptrace_stop() can be skipped. To make sure the tracer is woken up in
this case, task_clear_jobctl_trapping() is always called before
leaving do_signal_stop().

To-be-added end of group stop notification will extend the use of
JOBCTL_TRAPPING to move tracee from group stop trap to INTERRUPT trap
or repeat INTERRUPT trap. This requires TASK_TRAPPING to be
maintained inside signal delivery path.

Move the fallback clearing to the end of get_signal_to_deliver() so
that TRAPPING is maintained while tracee is inside signal delivery
path. When killed, tracee is guaranteed to leave signal delivery path
in finite amount of time and thus TRAPPING is still guaranteed to be
cleared on kill.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/signal.c | 15 +++++++++++----
1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 208f061..a7f65a6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -227,7 +227,10 @@ static inline void print_dropped_signal(int sig)
* task_clear_jobctl_trapping - clear jobctl trapping bit
* @task: target task
*
- * If JOBCTL_TRAPPING is set, a ptracer is waiting for us to enter TRACED.
+ * If %JOBCTL_TRAPPING is set, ptracer is waiting for us to enter
+ * %TASK_TRACED. It can be set only while we're inside do_signal_stop()
+ * and must be cleared before leaving signal delivery path.
+ *
* Clear it and wake up the ptracer. Note that we don't need any further
* locking. @task->siglock guarantees that @task->parent points to the
* ptracer.
@@ -1978,9 +1981,6 @@ retry:
goto retry;
}

- /* PTRACE_ATTACH might have raced with task killing, clear trapping */
- task_clear_jobctl_trapping(current);
-
spin_unlock_irq(&current->sighand->siglock);

tracehook_finish_jctl();
@@ -2226,6 +2226,13 @@ relock:
do_group_exit(info->si_signo);
/* NOTREACHED */
}
+
+ /*
+ * PTRACE_ATTACH might have raced with task killing. Make sure
+ * trapping is clear before leaving signal delivery path.
+ */
+ task_clear_jobctl_trapping(current);
+
spin_unlock_irq(&sighand->siglock);
return signr;
}
--
1.7.1

2011-05-08 15:49:38

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 09/11] job control: reorganize wait_task_stopped()

wait_task_stopped() tested task_stopped_code() without acquiring
siglock and, if stop condition existed, called wait_task_stopped() and
directly returned the result. This patch moves the initial
task_stopped_code() testing into wait_task_stopped() and make
wait_consider_task() fall through to wait_task_continue() on 0 return.

This is for the following two reasons.

* Because the initial task_stopped_code() test is done without
acquiring siglock, it may race against SIGCONT generation. The
stopped condition might have been replaced by continued state by the
time wait_task_stopped() acquired siglock. This may lead to
unexpected failure of WNOHANG waits.

This reorganization addresses this single race case but there are
other cases - TASK_RUNNING -> TASK_STOPPED transition and EXIT_*
transitions.

It seems that WNOHANG wait correctness has never been guaranteed and
everybody has been happy with it for very long time. As such,
although this reorganization improves the situation a bit, I don't
consider this to be a bug fix.

* Scheduled ptrace updates require changes to the initial test which
would fit better inside wait_task_stopped().

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/exit.c | 30 +++++++++++++++++++++++-------
1 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 5cbc83e..3383793 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1377,11 +1377,23 @@ static int *task_stopped_code(struct task_struct *p, bool ptrace)
return NULL;
}

-/*
- * Handle sys_wait4 work for one task in state TASK_STOPPED. We hold
- * read_lock(&tasklist_lock) on entry. If we return zero, we still hold
- * the lock and this task is uninteresting. If we return nonzero, we have
- * released the lock and the system call should return.
+/**
+ * wait_task_stopped - Wait for %TASK_STOPPED or %TASK_TRACED
+ * @wo: wait options
+ * @ptrace: is the wait for ptrace
+ * @p: task to wait for
+ *
+ * Handle sys_wait4() work for %p in state %TASK_STOPPED or %TASK_TRACED.
+ *
+ * CONTEXT:
+ * read_lock(&tasklist_lock), which is released if return value is
+ * non-zero. Also, grabs and releases @p->sighand->siglock.
+ *
+ * RETURNS:
+ * 0 if wait condition didn't exist and search for other wait conditions
+ * should continue. Non-zero return, -errno on failure and @p's pid on
+ * success, implies that tasklist_lock is released and wait condition
+ * search should terminate.
*/
static int wait_task_stopped(struct wait_opts *wo,
int ptrace, struct task_struct *p)
@@ -1397,6 +1409,9 @@ static int wait_task_stopped(struct wait_opts *wo,
if (!ptrace && !(wo->wo_flags & WUNTRACED))
return 0;

+ if (!task_stopped_code(p, ptrace))
+ return 0;
+
exit_code = 0;
spin_lock_irq(&p->sighand->siglock);

@@ -1607,8 +1622,9 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
* Wait for stopped. Depending on @ptrace, different stopped state
* is used and the two don't interact with each other.
*/
- if (task_stopped_code(p, ptrace))
- return wait_task_stopped(wo, ptrace, p);
+ ret = wait_task_stopped(wo, ptrace, p);
+ if (ret)
+ return ret;

/*
* Wait for continued. There's only one continued state and the
--
1.7.1

2011-05-08 15:49:40

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 10/11] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()

Currently, JOBCTL_TRAPPING is used by PTRACE_ATTACH and SEIZE to hide
TASK_STOPPED -> TRACED transition from ptracer. If tracee is in group
stop, TRAPPING is set, tracee is kicked and tracer waits for the
transition to complete before completing attach. This prevents tracer
from seeing tracee during transition.

The transition is visible only through wait(2) and following ptrace(2)
requests. Without TRAPPING, WNOHANG which should succeed right after
attach (when tracer knows tracee was stopped) might fail and likewise
for the following ptrace requests.

TRAPPING will also be used to implement end of group stop retrapping,
which can be initiated by tasks other than tracer. To allow this,
this patch moves TRAPPING wait from attach completion path to
operations which are actually affected by the transition - wait(2) and
following ptrace(2) requests.

As reliably checking and modifying TASK_STOPPED/TRACED transition
together with JOBCTL_TRAPPING require siglock and both ptrace and wait
paths are holding tasklist_lock and siglock where TRAPPING check is
needed, ptrace_wait_trapping() assumes both locks to be held on entry
and releases them if it actually had to wait for TRAPPING.

Both wait and ptrace paths are updated to retry the operation after
TRAPPING wait. Note that wait_task_stopped() now always grabs siglock
for ptrace waits. This can be avoided with "task_stopped_code() ->
rmb() -> TRAPPING -> rmb() -> task_stopped_code()" sequence but given
that ptrace isn't particularly sensitive to performance or
scalability, choosing simpler implementation seems better.

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/ptrace.h | 1 +
include/linux/sched.h | 2 +-
kernel/exit.c | 21 +++++++++++++++++++--
kernel/ptrace.c | 48 +++++++++++++++++++++++++++++++++++++++++++++---
kernel/signal.c | 5 +++--
5 files changed, 69 insertions(+), 8 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 567e189..4b42a15 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -116,6 +116,7 @@ extern long arch_ptrace(struct task_struct *child, long request,
extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
extern int ptrace_writedata(struct task_struct *tsk, char __user *src, unsigned long dst, int len);
extern void ptrace_disable(struct task_struct *);
+extern bool ptrace_wait_trapping(struct task_struct *child);
extern int ptrace_check_attach(struct task_struct *task, bool ignore_state);
extern int ptrace_request(struct task_struct *child, long request,
unsigned long addr, unsigned long data);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9d92444..972f1db 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1786,7 +1786,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define JOBCTL_STOP_CONSUME (1 << 18) /* consume group stop count */
#define JOBCTL_TRAP_SEIZE (1 << 19) /* trap for seize */
#define JOBCTL_TRAP_INTERRUPT (1 << 20) /* trap for interrupt */
-#define JOBCTL_TRAPPING (1 << 22) /* switching to TRACED */
+#define JOBCTL_TRAPPING (1 << 22) /* switching to TRACED/STOPPED */
#define JOBCTL_TRAPPED (1 << 23) /* trapped for group stop */

#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_SEIZE | JOBCTL_TRAP_INTERRUPT)
diff --git a/kernel/exit.c b/kernel/exit.c
index 3383793..a43af3a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1409,15 +1409,29 @@ static int wait_task_stopped(struct wait_opts *wo,
if (!ptrace && !(wo->wo_flags & WUNTRACED))
return 0;

- if (!task_stopped_code(p, ptrace))
+ /*
+ * For ptrace waits, we can't reliably check whether wait condition
+ * exists without grabbing siglock due to JOBCTL_TRAPPING
+ * transitions. A task might be temporarily in TASK_RUNNING while
+ * trapping which should be transparent to the ptracer.
+ *
+ * Note that we can avoid unconditionally grabbing siglock with by
+ * wrapping TRAPPING test with two rmb's; however, let's stick with
+ * simpler implementation for now.
+ */
+ if (!ptrace && !(p->signal->flags & SIGNAL_STOP_STOPPED))
return 0;

exit_code = 0;
spin_lock_irq(&p->sighand->siglock);

p_code = task_stopped_code(p, ptrace);
- if (unlikely(!p_code))
+ if (unlikely(!p_code)) {
+ /* if trapping, wait for it and restart the whole process */
+ if (ptrace && ptrace_wait_trapping(p))
+ return -EAGAIN;
goto unlock_sig;
+ }

exit_code = *p_code;
if (!exit_code)
@@ -1740,6 +1754,9 @@ notask:
}
}
end:
+ /* -EAGAIN if we hit trapping ptracee and slept - retry */
+ if (unlikely(retval == -EAGAIN))
+ goto repeat;
__set_current_state(TASK_RUNNING);
remove_wait_queue(&current->signal->wait_chldexit, &wo->child_wait);
return retval;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index b18a9b3..7411eb2 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -103,6 +103,48 @@ void __ptrace_unlink(struct task_struct *child)
}

/**
+ * ptrace_wait_trapping - wait ptracee to finish %TASK_TRACED/STOPPED transition
+ * @child: child to wait for
+ *
+ * There are cases where ptracer needs to ask the ptracee to [re]enter
+ * %TASK_TRACED or %TASK_STOPPED which involves the tracee going through
+ * %TASK_RUNNING briefly, which could affect operation of ptrace(2) and
+ * wait(2).
+ *
+ * %JOBCTL_TRAPPING is used to hide such transitions from the ptracer.
+ * It's set when such transition is initiated by the ptracer and cleared on
+ * completion. Operations which may be affected should call this function
+ * to make sure no transition is in progress before proceeding.
+ *
+ * This function checks whether @child is trapping and if so waits for the
+ * transition to complete.
+ *
+ * CONTEXT:
+ * read_lock(&tasklist_lock) and spin_lock_irq(&child->sighand->siglock).
+ * On %true return, both locks are released and the function might have
+ * slept.
+ *
+ * RETURNS:
+ * %false if @child wasn't trapping and nothing happened. %true if waited
+ * for trapping transition and released both locks.
+ */
+bool ptrace_wait_trapping(struct task_struct *child)
+ __releases(&child->sighand->siglock)
+ __releases(&tasklist_lock)
+{
+ if (likely(!(child->jobctl & JOBCTL_TRAPPING)))
+ return false;
+
+ spin_unlock_irq(&child->sighand->siglock);
+ get_task_struct(child);
+ read_unlock(&tasklist_lock);
+ wait_event(current->signal->wait_chldexit,
+ !(child->jobctl & JOBCTL_TRAPPING));
+ put_task_struct(child);
+ return true;
+}
+
+/**
* ptrace_check_attach - check whether ptracee is ready for ptrace operation
* @child: ptracee to check for
* @ignore_state: don't check whether @child is currently %TASK_TRACED
@@ -122,7 +164,7 @@ void __ptrace_unlink(struct task_struct *child)
int ptrace_check_attach(struct task_struct *child, bool ignore_state)
{
int ret = -ESRCH;
-
+retry:
/*
* We take the read lock around doing both checks to close a
* possible race where someone else was tracing our child and
@@ -140,6 +182,8 @@ int ptrace_check_attach(struct task_struct *child, bool ignore_state)
WARN_ON_ONCE(task_is_stopped(child));
if (task_is_traced(child) || ignore_state)
ret = 0;
+ else if (ptrace_wait_trapping(child))
+ goto retry;
spin_unlock_irq(&child->sighand->siglock);
}
read_unlock(&tasklist_lock);
@@ -304,8 +348,6 @@ unlock_tasklist:
unlock_creds:
mutex_unlock(&task->signal->cred_guard_mutex);
out:
- wait_event(current->signal->wait_chldexit,
- !(task->jobctl & JOBCTL_TRAPPING));
return retval;
}

diff --git a/kernel/signal.c b/kernel/signal.c
index a7f65a6..dce2961 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -228,8 +228,9 @@ static inline void print_dropped_signal(int sig)
* @task: target task
*
* If %JOBCTL_TRAPPING is set, ptracer is waiting for us to enter
- * %TASK_TRACED. It can be set only while we're inside do_signal_stop()
- * and must be cleared before leaving signal delivery path.
+ * %TASK_TRACED or %TASK_STOPPED. It can be set only while we're inside
+ * do_signal_stop() and must be cleared before leaving signal delivery
+ * path.
*
* Clear it and wake up the ptracer. Note that we don't need any further
* locking. @task->siglock guarantees that @task->parent points to the
--
1.7.1

2011-05-08 15:49:58

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 11/11] ptrace: implement group stop notification for ptracer

Currently there's no way for ptracer to find out whether group stop
that tracee was in finished other than polling with PTRACE_GETSIGINFO.
Also, tracer can't detect new group stop started by an untraced thread
if tracee is already trapped. This patch implements group stop
notification for ptracer using INTERRUPT traps.

When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
is set, which triggers INTERRUPT trap but is sticky until the next
PTRACE_GETSIGINFO. As GETSIGINFO exports the current group stop
state, this guarantees that tracer checks the current group stop state
at least once after group stop state change. Stickiness is necessary
because notification trap may race with PTRACE_CONT for other traps
and get lost.

-EINVAL return from GETSIGINFO also clears the sticky trap. This is
because -EINVAL clearly indicates that tracee is in group stop. To
avoid unnecessarily taking INTERRUPT trap on the way to group stop, if
JOBCTL_STOP_PENDING is set, INTERRUPT trap is not taken.

Note that simply scheduling such trap isn't enough. If tracee is
running (PTRACE_CONT'd from group stop trap), the usual trapping -
setting NOTIFY followed by the usual signal_wake_up() - is enough;
however, if tracee is trapped, the scheduled trap won't happen until
the trap is continued.

This is solved by re-trapping if tracee is in group stop or INTERRUPT
trap. Along with JOBCTL_TRAP_NOTIFY, JOBCTL_TRAPPING is set and
tracee is woken up from TASK_TRACED. Tracee then (re-)enters
INTERRUPT trap generating notification for tracer. TRAPPING hides the
TRACED -> RUNNING -> TRACED transition from tracer.

Re-trapping is used only for group stop and INTERRUPT traps. If
tracer wants to get notified about group stop, it either leaves tracee
in the initial group stop trap or puts it into INTERRUPT trap. When
INTERRUPT trap is scheduled while tracee is already in a trap, it's
guaranteed that tracee will enter INTERRUPT trap without returning to
userland, so tracer doesn't lose any control over tracee execution for
group stop notification.

This exclusion is intentional as enabling re-trapping on all traps may
make using ptrace(2) more confusing than necessary and confining
re-trapping to group stop and INTERRUPT doesn't lose any
functionality. It also simplifies implementation.

An example program follows.

#define PTRACE_SEIZE 0x4206
#define PTRACE_INTERRUPT 0x4207

#define PTRACE_SEIZE_DEVEL 0x80000000

static const struct timespec ts1s = { .tv_sec = 1 };

int main(int argc, char **argv)
{
pid_t tracee, tracer;
int i;

tracee = fork();
if (!tracee)
while (1)
pause();

tracer = fork();
if (!tracer) {
int last_stopped = 0, stopped;
siginfo_t si;

ptrace(PTRACE_SEIZE, tracee, NULL,
(void *)(unsigned long)PTRACE_SEIZE_DEVEL);
repeat:
waitid(P_PID, tracee, NULL, WSTOPPED);

if (!ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si)) {
if (!si.si_code) {
printf("tracer: SIG %d\n", si.si_signo);
ptrace(PTRACE_CONT, tracee, NULL,
(void *)(unsigned long)si.si_signo);
goto repeat;
}
stopped = !!si.si_status;
} else
stopped = 1;

if (stopped != last_stopped)
printf("tracer: stopped=%d\n", stopped);
last_stopped = stopped;

if (!stopped)
ptrace(PTRACE_CONT, tracee, NULL, NULL);
goto repeat;
}

for (i = 0; i < 3; i++) {
nanosleep(&ts1s, NULL);
printf("mother: SIGSTOP\n");
kill(tracee, SIGSTOP);
nanosleep(&ts1s, NULL);
printf("mother: SIGCONT\n");
kill(tracee, SIGCONT);
}
nanosleep(&ts1s, NULL);

kill(tracer, SIGKILL);
kill(tracee, SIGKILL);
return 0;
}

In the above program, tracer gets notification of group stop state
changes and can track stopped state without polling PTRACE_GETSIGINFO.

# ./test-gstop-notify
mother: SIGSTOP
tracer: SIG 19
tracer: stopped=1
mother: SIGCONT
tracer: stopped=0
tracer: SIG 18
mother: SIGSTOP
tracer: SIG 19
tracer: stopped=1
mother: SIGCONT
tracer: stopped=0
tracer: SIG 18
mother: SIGSTOP
tracer: SIG 19
tracer: stopped=1
mother: SIGCONT
tracer: stopped=0
tracer: SIG 18

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/sched.h | 4 ++-
kernel/ptrace.c | 13 +++++++++-
kernel/signal.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 972f1db..e3d4e3d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1786,10 +1786,12 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define JOBCTL_STOP_CONSUME (1 << 18) /* consume group stop count */
#define JOBCTL_TRAP_SEIZE (1 << 19) /* trap for seize */
#define JOBCTL_TRAP_INTERRUPT (1 << 20) /* trap for interrupt */
+#define JOBCTL_TRAP_NOTIFY (1 << 21) /* sticky trap for notifications */
#define JOBCTL_TRAPPING (1 << 22) /* switching to TRACED/STOPPED */
#define JOBCTL_TRAPPED (1 << 23) /* trapped for group stop */

-#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_SEIZE | JOBCTL_TRAP_INTERRUPT)
+#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_SEIZE | JOBCTL_TRAP_INTERRUPT | \
+ JOBCTL_TRAP_NOTIFY)
#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)

extern void task_clear_jobctl_stop_pending(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 7411eb2..4e9473b 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -570,6 +570,7 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)

static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
{
+ bool clear_notify = false;
struct signal_struct *sig;
unsigned long flags;
int error;
@@ -579,8 +580,14 @@ static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
sig = child->signal;

error = -EINVAL;
- if (!child->last_siginfo)
+ if (!child->last_siginfo) {
+ /*
+ * Clear notification trap on NULL siginfo too. It clearly
+ * indicates group stop trap.
+ */
+ clear_notify = true;
goto out_unlock;
+ }

error = 0;
*info = *child->last_siginfo;
@@ -594,8 +601,12 @@ static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
/* report whether group stop is in effect w/ SI_STOPPED */
if (sig->group_stop_count || (sig->flags & SIGNAL_STOP_STOPPED))
info->si_pt_flags |= PTRACE_SI_STOPPED;
+ /* tracer got siginfo, clear the sticky trap */
+ clear_notify = true;
}
out_unlock:
+ if (clear_notify)
+ child->jobctl &= ~JOBCTL_TRAP_NOTIFY;
unlock_task_sighand(child, &flags);
return error;
}
diff --git a/kernel/signal.c b/kernel/signal.c
index dce2961..271788d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -778,6 +778,54 @@ static int check_kill_permission(int sig, struct siginfo *info,
return security_task_kill(t, info, sig, 0);
}

+/**
+ * ptrace_trap_notify - schedule trap to notify ptracer
+ * @t: tracee wanting to notify tracer
+ *
+ * This function schedules sticky ptrace trap which is cleared on
+ * PTRACE_GETSIGINFO to notify ptracer of an event. @t must have been
+ * seized by ptracer.
+ *
+ * If @t is running, INTERRUPT trap will be taken. If trapped for group
+ * stop or INTERRUPT, it will re-trap into INTERRUPT. If trapped for other
+ * traps, INTERRUPT trap will be eventually taken without returning to
+ * userland after the existing traps are finished by PTRACE_CONT.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static void ptrace_trap_notify(struct task_struct *t)
+{
+ WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
+ assert_spin_locked(&t->sighand->siglock);
+
+ /*
+ * @t is being ptraced and new SEIZE behavior is in effect.
+ * Schedule sticky trap which will clear on the next GETSIGINFO.
+ */
+ t->jobctl |= JOBCTL_TRAP_NOTIFY;
+
+ /*
+ * If @t is currently trapped for group stop or INTERRUPT
+ * (JOBCTL_TRAPPED set), it should re-trap with new exit_code
+ * indicating continuation so that the ptracer can notice the
+ * event; otherwise, use normal signal delivery wake up.
+ *
+ * The re-trapping sets JOBCTL_TRAPPING such that the transition is
+ * hidden from the ptracer.
+ *
+ * This means that if @t is trapped for other reasons than group
+ * stop or INTERRUPT, the notification trap won't be delievered
+ * until the current one is complete. This is the intended
+ * behavior.
+ */
+ if (task_is_traced(t) && (t->jobctl & JOBCTL_TRAPPED)) {
+ t->jobctl |= JOBCTL_TRAPPING;
+ signal_wake_up(t, true);
+ } else
+ signal_wake_up(t, false);
+}
+
/*
* Handle magic process-wide effects of stop/continue signals. Unlike
* the signal actions, these happen immediately at signal-generation
@@ -816,7 +864,10 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
do {
task_clear_jobctl_stop_pending(t);
rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
- wake_up_state(t, __TASK_STOPPED);
+ if (likely(!(t->ptrace & PT_SEIZED)))
+ wake_up_state(t, __TASK_STOPPED);
+ else
+ ptrace_trap_notify(t);
} while_each_thread(p, t);

/*
@@ -1928,7 +1979,10 @@ static int do_signal_stop(int signr)
if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
t->jobctl |= signr | gstop;
sig->group_stop_count++;
- signal_wake_up(t, 0);
+ if (likely(!(t->ptrace & PT_SEIZED)))
+ signal_wake_up(t, 0);
+ else
+ ptrace_trap_notify(t);
}
}
}
@@ -2093,8 +2147,14 @@ relock:
* Check for ptrace trap conditions. Jobctl traps are used
* to trap ptracee while staying transparent regarding
* signal and job control.
+ *
+ * If group stop is pending, give it priority. INTERRUPT
+ * is used for job control notifications and giving the
+ * actual group stop trap higher priority gives prettier
+ * stream of traps.
*/
- if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
+ if (unlikely(current->jobctl & JOBCTL_TRAP_MASK) &&
+ !(current->jobctl & JOBCTL_STOP_PENDING)) {
ptrace_notify_locked(SIGTRAP |
(PTRACE_EVENT_INTERRUPT << 8));
continue;
--
1.7.1

2011-05-08 21:58:25

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

On Sunday 08 May 2011 17:48, Tejun Heo wrote:
> Currently, there's no way to trap a running ptracee short of sending a
> signal which has various side effects. This patch implements
> PTRACE_INTERRUPT which traps ptracee without any signal or job control
> related side effect.

What are the rules for the userspace? You said about PTRACE_SEIZE:

> After PTRACE_SEIZE, tracee will trap. Which trap will happen isn't
> fixed. If other trap conditions exist (signal delivery or group
> stop), they might be taken; otherwise, a trap with exit_code SIGTRAP |
> (PTRACE_EVENT_INTERRUPT << 8) is taken. The followings are
> guaranteed.
>
> * A trap will happen in finite amount of userland time.
>
> * The trap can be PTRACE_EVENT_INTERRUPT which doesn't have any side
> effect. If a different trap is taken, no INTERRUPT trap is pending.
>
> IOW, no matter what, one trap will happen, which might be INTERRUPT.

Are rules for PTRACE_INTERRUPT the same? That is, what happens
if a different trap is taken?

Can you add API notes in the header, above corresponding defines? -

--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -48,6 +48,7 @@
#define PTRACE_SETREGSET 0x4205

<============================================ ADD COMMENT HERE
#define PTRACE_SEIZE 0x4206
<============================================ ADD COMMENT HERE
+#define PTRACE_INTERRUPT 0x4207


They will be much more visible and up-to-date there than in Documentation,
git logs etc...

And finally, thanks for working on the long-needed evolutionary fixes to ptrace!

--
vda

2011-05-08 22:27:46

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification

On Sunday 08 May 2011 17:48, Tejun Heo wrote:
> Hello,
>
> This patchset implements new ptrace requests SEIZE and INTERRUPT and
> also add group stop notification mechanism for ptracer. Combined,
> this implements "P4. PTRACE_SEIZE" and "P5. ^Z and fg for tracees" of
> the ptrace job control improvements proposal[1].
>
> Please note that there are some deviations from the proposal.

No battle plan survives contact with the enemy :)

> * As suggested by Oleg, PTRACE_SEIZE only serves as ATTACH without
> signal/job control side-effects. After attached, PTRACE_INTERRUPT
> should be used to trap tracee without side effect.

Hmm, in "[PATCH 02/11] ptrace: implement PTRACE_SEIZE" you are saying:

> After PTRACE_SEIZE, tracee will trap.

So which is it? Does PTRACE_SEIZE trap or not?


> * Group stop notification is implemented as sticky INTERRUPT trap
> which gets cleared on PTRACE_GETSIGINFO and notifies both start and
> end of group stops.
> ...
> * The trap condition is sticky until GETSIGINFO. This is necessary
> because generation of the event may race with CONT and ptracer may
> miss the trap.

As a userspace guy, I find this explanation unclear.

What is "sticky" exactly? siginfo.si_pt_flags contents?

What exactly "sticky" means? If PTRACE_GETSIGINFO is not queried
by userspace after it observed ptrace stop, what will happen?

Your example code in
"[PATCH 11/11] ptrace: implement group stop notification for ptracer"
does this:

waitid(P_PID, tracee, NULL, WSTOPPED);

if (!ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si)) {
if (!si.si_code) {
printf("tracer: SIG %d\n", si.si_signo);
ptrace(PTRACE_CONT, tracee, NULL,
(void *)(unsigned long)si.si_signo);
goto repeat;
}
stopped = !!si.si_status;
} else
stopped = 1;

if (stopped != last_stopped)
printf("tracer: stopped=%d\n", stopped);
last_stopped = stopped;

if (!stopped)
ptrace(PTRACE_CONT, tracee, NULL, NULL);

That is, it always queries PTRACE_GETSIGINFO, which means this example
doesn't demonstrate "stickiness".


Real world users of ptrace, such as strace, will likely avoid
PTRACE_GETSIGINFO'ing on every stop - they will examine wait status,
and query PTRACE_GETSIGINFO only if they know they have to -
such as when they see a job stop signal (SIGSTOP/SIGTSTP/etc),
not SIGTRAP.

Is it still possible, or PTRACE_GETSIGINFO will be a mandatory call
after each ptrace stop for any userspace usage which wants to track
job stop status of the tracee?


> Each patch implementing new feature includes test program showing its
> functionality. Notification would probably need a bit more polishing
> but all the needed functionalities are there.

This is great! Thanks!

I don't know the status of ptrace test suite
after I stopped working on strace (did the suite bit rot, or is it
maintained and still relevant?). If it is still in use,
I can adapt these tests and add them to it.

--
vda

2011-05-08 22:43:03

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer

On Sunday 08 May 2011 17:49, Tejun Heo wrote:
> Currently there's no way for ptracer to find out whether group stop
> that tracee was in finished other than polling with PTRACE_GETSIGINFO.
> Also, tracer can't detect new group stop started by an untraced thread
> if tracee is already trapped. This patch implements group stop
> notification for ptracer using INTERRUPT traps.
>
> When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
> is set, which triggers INTERRUPT trap but is sticky until the next
> PTRACE_GETSIGINFO. As GETSIGINFO exports the current group stop
> state, this guarantees that tracer checks the current group stop state
> at least once after group stop state change. Stickiness is necessary
> because notification trap may race with PTRACE_CONT for other traps
> and get lost.

To clarify on "sticky":

Does this mean that if tracer userspace won't query PTRACE_GETSIGINFO,
but will simply do:

waitid(P_PID, tracee, NULL, WSTOPPED);
ptrace(PTRACE_CONT, tracee, NULL, NULL);

then kernel will generate INTERRUPT trap again and again
(without letting tracee run) if job stop state of the tracee has changed?


--
vda

2011-05-09 09:48:46

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification

Hello,

On Mon, May 09, 2011 at 12:27:41AM +0200, Denys Vlasenko wrote:
> > After PTRACE_SEIZE, tracee will trap.
>
> So which is it? Does PTRACE_SEIZE trap or not?

PTRACE_SEIZE attaches to a new tracee and traps it. INTERRUPT traps
already attached tracee.

> > * Group stop notification is implemented as sticky INTERRUPT trap
> > which gets cleared on PTRACE_GETSIGINFO and notifies both start and
> > end of group stops.
> > ...
> > * The trap condition is sticky until GETSIGINFO. This is necessary
> > because generation of the event may race with CONT and ptracer may
> > miss the trap.
>
> As a userspace guy, I find this explanation unclear.
>
> What is "sticky" exactly? siginfo.si_pt_flags contents?
>
> What exactly "sticky" means? If PTRACE_GETSIGINFO is not queried
> by userspace after it observed ptrace stop, what will happen?

It means that INTERRUPT trap will happen repeatedly until the
condition is cleared by PTRACE_GETSIGINFO.

> Real world users of ptrace, such as strace, will likely avoid
> PTRACE_GETSIGINFO'ing on every stop - they will examine wait status,
> and query PTRACE_GETSIGINFO only if they know they have to -
> such as when they see a job stop signal (SIGSTOP/SIGTSTP/etc),
> not SIGTRAP.
>
> Is it still possible, or PTRACE_GETSIGINFO will be a mandatory call
> after each ptrace stop for any userspace usage which wants to track
> job stop status of the tracee?

You can tell INTERRUPT trap from the exit_code stored in si_status
(si_code CLD_TRAPPED and si_status SIGTRAP | (PTRACE_EVENT_INTERRUPT
<< 8)), so you can call GETSIGINFO only after INTERRUPT traps, which
should be enough for job control notifications, but you would also
have to do GETSIGINFO to discern signal delivery and actual group stop
traps (si_code zero and si_status is the signo for both).

> I don't know the status of ptrace test suite after I stopped working
> on strace (did the suite bit rot, or is it maintained and still
> relevant?). If it is still in use, I can adapt these tests and add
> them to it.

I have no idea but am planning write more complete example program.
The ones included in the commit messages are mostly just to show that
the basic functionality implemented by the commit works, so they
aren't necessarily good examples.

Thank you.

--
tejun

2011-05-09 10:09:37

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

Hello,

On Sun, May 08, 2011 at 11:58:20PM +0200, Denys Vlasenko wrote:
> Are rules for PTRACE_INTERRUPT the same? That is, what happens
> if a different trap is taken?

Oh, PTRACE_INTERRUPT is different. If it's scheduled, it will
eventually happen. Other traps may happen before but INTERRUPT trap
will take place before control is returned to userland.

> Can you add API notes in the header, above corresponding defines? -
>
> --- a/include/linux/ptrace.h
> +++ b/include/linux/ptrace.h
> @@ -48,6 +48,7 @@
> #define PTRACE_SETREGSET 0x4205
>
> <============================================ ADD COMMENT HERE
> #define PTRACE_SEIZE 0x4206
> <============================================ ADD COMMENT HERE
> +#define PTRACE_INTERRUPT 0x4207
>
>
> They will be much more visible and up-to-date there than in Documentation,
> git logs etc...

Hmmm... I was thinking about writing a proper ptrace API doc with
example programs under Documentation/. It's userland visible API so
it shouldn't change all that much and the amount of necessary
documentation would be too much for comments.

Thank you.

--
tejun

2011-05-09 10:11:01

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer

Hello,

On Mon, May 09, 2011 at 12:42:55AM +0200, Denys Vlasenko wrote:
> To clarify on "sticky":
>
> Does this mean that if tracer userspace won't query PTRACE_GETSIGINFO,
> but will simply do:
>
> waitid(P_PID, tracee, NULL, WSTOPPED);
> ptrace(PTRACE_CONT, tracee, NULL, NULL);
>
> then kernel will generate INTERRUPT trap again and again
> (without letting tracee run) if job stop state of the tracee has changed?

Correct, it will be in infinite trap loop.

Thanks.

--
tejun

2011-05-09 10:56:06

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

On Mon, May 9, 2011 at 12:09 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Sun, May 08, 2011 at 11:58:20PM +0200, Denys Vlasenko wrote:
>> Are rules for PTRACE_INTERRUPT the same? That is, what happens
>> if a different trap is taken?
>
> Oh, PTRACE_INTERRUPT is different. ?If it's scheduled, it will
> eventually happen. ?Other traps may happen before but INTERRUPT trap
> will take place before control is returned to userland.
>
>> Can you add API notes in the header, above corresponding defines? -
>>
>> --- a/include/linux/ptrace.h
>> +++ b/include/linux/ptrace.h
>> @@ -48,6 +48,7 @@
>> ?#define PTRACE_SETREGSET ? ? ? 0x4205
>>
>> <============================================ ADD COMMENT HERE
>> ?#define PTRACE_SEIZE ? ? ? ? ? 0x4206
>> <============================================ ADD COMMENT HERE
>> +#define PTRACE_INTERRUPT ? ? ? 0x4207
>>
>>
>> They will be much more visible and up-to-date there than in Documentation,
>> git logs etc...
>
> Hmmm... I was thinking about writing a proper ptrace API doc with
> example programs under Documentation/. ?It's userland visible API so
> it shouldn't change all that much

My motivation is mostly to prevent or make it less likely for someone
who will hack on this code in the future to misunderstand semantics
of these ops. A hacker changing PTRACE_INTERRUPT-related code
is much more likely to notice a comment above PTRACE_INTERRUPT
definition than a file buried in Documentation/.

> and the amount of necessary
> documentation would be too much for comments.

Yes, if you'd try to put there a comprehensive doc.
The comment I envision is short, yet captures the most important
traits of the operation. For example:

/* After PTRACE_SEIZE, tracee will trap - once. It may be a signal delivery
* or group stop, if one is already pending; otherwise,
* a SIGTRAP | (PTRACE_EVENT_INTERRUPT << 8) trap is taken.
* A trap will happen in finite amount of userland time.
*/
#define PTRACE_SEIZE 0x4206

--
vda

2011-05-09 16:20:26

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE

On 05/08, Tejun Heo wrote:
>
> After PTRACE_SEIZE, tracee will trap. Which trap will happen isn't
> fixed. If other trap conditions exist (signal delivery or group
> stop), they might be taken; otherwise, a trap with exit_code SIGTRAP |
> (PTRACE_EVENT_INTERRUPT << 8) is taken.
> guaranteed.

Personally, I think the new behaviour is fine. But, as usual, I'd like
to know what Jan/Denys think.


As for the implementation,

> -static int ptrace_attach(struct task_struct *task)
> +static int ptrace_attach(struct task_struct *task, long request,
> + unsigned long flags)
> {
> + bool seize = request == PTRACE_SEIZE;

Cough. I really hate the cosmetic nits but can't resist...

bool seize = (request == PTRACE_SEIZE);

looks more parseable, but feel free to ignore.

> @@ -247,6 +272,14 @@ static int ptrace_attach(struct task_struct *task)
> if (task_is_stopped(task)) {
> task->jobctl |= JOBCTL_STOP_PENDING | JOBCTL_TRAPPING;
> signal_wake_up(task, 1);
> + } else if (seize) {
> + /*
> + * Otherwise, SEIZE uses jobctl trap to put tracee into
> + * TASK_TRACED, which doesn't have the nasty side effects
> + * of sending SIGSTOP.
> + */
> + task->jobctl |= JOBCTL_TRAP_SEIZE;
> + signal_wake_up(task, 0);

OK... I am a bit worried we can set JOBCTL_TRAP_SEIZE even if the tracee
was already killed, and if it is killed later JOBCTL_TRAP_SEIZE won't be
cleared. Probably this is fine, ptrace_stop()->schedule() won't sleep in
this case.

Hmm. but see below.

> @@ -1752,12 +1752,13 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
> set_current_state(TASK_TRACED);
>
> /*
> - * We're committing to trapping. Clearing JOBCTL_TRAPPING and
> - * transition to TASK_TRACED should be atomic with respect to
> - * siglock. This should be done after the arch hook as siglock is
> - * released and regrabbed across it.
> + * We're committing to trapping. Adjust ->jobctl. Updates to
> + * these flags and transition to TASK_TRACED should be atomic with
> + * respect to siglock. This should be done after the arch hook as
> + * siglock may be released and regrabbed across it.
> */
> task_clear_jobctl_trapping(current);
> + current->jobctl &= ~JOBCTL_TRAP_SEIZE;

Yes. But, it seems, this is too late.

Suppose that the JOBCTL_TRAP_SEIZE tracee was SIGKILLED before it reports
PTRACE_EVENT_INTERRUPT. Now, if arch_ptrace_stop_needed() == T, ptrace_stop()
returns without clearing JOBCTL_TRAP_SEIZE/TIF_SIGPENDING. This means
get_signal_to_deliver() will loop forever.

I never understood why ptrace_stop()->sigkill_pending() logic, I think
we should check fatal_signal_pending() unconditionally. Oh, and we have
other subtle issues here.

> for (;;) {
> struct k_sigaction *ka;
> +
> + /*
> + * Check for ptrace trap conditions. Jobctl traps are used
> + * to trap ptracee while staying transparent regarding
> + * signal and job control.
> + */
> + if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
> + ptrace_notify_locked(SIGTRAP |
> + (PTRACE_EVENT_INTERRUPT << 8));
> + continue;

Shouldn't we recheck SIGNAL_CLD_MASK after ptrace_notify_locked() returns?
Probably not, but I am not sure...

In any case. This doesn't really matter, but can't we check JOBCTL_TRAP_MASK
outside of the main loop? Unless we drop ->siglock this bit can't be changed,
and every time we drop ->siglock we go to "relock".

Oleg.

2011-05-09 17:00:09

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

On 05/08, Tejun Heo wrote:
>
> Currently, there's no way to trap a running ptracee short of sending a
> signal which has various side effects. This patch implements
> PTRACE_INTERRUPT which traps ptracee without any signal or job control
> related side effect.
>
> The implementation is almost trivial. It uses the same trap site and
> event as PTRACE_SEIZE. A new trap flag JOBCTL_TRAP_INTERRUPT is
> added, which is set on PTRACE_INTERRUPT and cleared when tracee
> commits to INTERRUPT trap. As INTERRUPT should be useable regardless
> of the current state of tracee, task_is_traced() test in
> ptrace_check_attach() is skipped for INTERRUPT.

Heh. As usual, I can never review the patches in time. Will continue
tomorrow.

Right now I am a bit puzzled why do we have 2 bits, JOBCTL_TRAP_INTERRUPT
and JOBCTL_TRAP_SEIZE... But I didn't read this + other patches yet.

At first glance, JOBCTL_TRAP_INTERRUPT has the same problem with the
killed tracee. I think this is easy to fix.

Oleg.

2011-05-10 09:46:57

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE

Hey, Oleg.

On Mon, May 09, 2011 at 06:18:38PM +0200, Oleg Nesterov wrote:
> On 05/08, Tejun Heo wrote:
> > -static int ptrace_attach(struct task_struct *task)
> > +static int ptrace_attach(struct task_struct *task, long request,
> > + unsigned long flags)
> > {
> > + bool seize = request == PTRACE_SEIZE;
>
> Cough. I really hate the cosmetic nits but can't resist...
>
> bool seize = (request == PTRACE_SEIZE);
>
> looks more parseable, but feel free to ignore.

Heh yeah, I usually try to avoid unnecessary parantheses mostly
because that seems the easiest way to reach at least some level of
style uniformity. The minimum is well defined idndependently of
anyone's taste and can serve as a meaningful convergence point.

That said, it's true that there are times when things just look better
with extra parantheses. If the above bothers you, I'll update it. :-)

> > @@ -247,6 +272,14 @@ static int ptrace_attach(struct task_struct *task)
> > if (task_is_stopped(task)) {
> > task->jobctl |= JOBCTL_STOP_PENDING | JOBCTL_TRAPPING;
> > signal_wake_up(task, 1);
> > + } else if (seize) {
> > + /*
> > + * Otherwise, SEIZE uses jobctl trap to put tracee into
> > + * TASK_TRACED, which doesn't have the nasty side effects
> > + * of sending SIGSTOP.
> > + */
> > + task->jobctl |= JOBCTL_TRAP_SEIZE;
> > + signal_wake_up(task, 0);
>
> OK... I am a bit worried we can set JOBCTL_TRAP_SEIZE even if the tracee
> was already killed, and if it is killed later JOBCTL_TRAP_SEIZE won't be
> cleared. Probably this is fine, ptrace_stop()->schedule() won't sleep in
> this case.

Hmmm... if killed, the tracee would go through __ptrace_unlink() which
always clears JOBCTL_TRAP_MASK which includes JOBCTL_TRAP_SEIZE. What
am I missing?

> Hmm. but see below.
>
> > @@ -1752,12 +1752,13 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
> > set_current_state(TASK_TRACED);
> >
> > /*
> > - * We're committing to trapping. Clearing JOBCTL_TRAPPING and
> > - * transition to TASK_TRACED should be atomic with respect to
> > - * siglock. This should be done after the arch hook as siglock is
> > - * released and regrabbed across it.
> > + * We're committing to trapping. Adjust ->jobctl. Updates to
> > + * these flags and transition to TASK_TRACED should be atomic with
> > + * respect to siglock. This should be done after the arch hook as
> > + * siglock may be released and regrabbed across it.
> > */
> > task_clear_jobctl_trapping(current);
> > + current->jobctl &= ~JOBCTL_TRAP_SEIZE;
>
> Yes. But, it seems, this is too late.
>
> Suppose that the JOBCTL_TRAP_SEIZE tracee was SIGKILLED before it reports
> PTRACE_EVENT_INTERRUPT. Now, if arch_ptrace_stop_needed() == T, ptrace_stop()
> returns without clearing JOBCTL_TRAP_SEIZE/TIF_SIGPENDING. This means
> get_signal_to_deliver() will loop forever.

Argh... right it has an early exit path there.

> I never understood why ptrace_stop()->sigkill_pending() logic, I think
> we should check fatal_signal_pending() unconditionally.

Probably the best way to do it is adding fatal_signal_pending() into
may_ptrace_stop() so that the failure path shares most of the code
path instead of doing quick dirty exit? It's just nasty to have early
exit above there and walking through the normal code path wouldn't
hurt SIGKILL either.

> Oh, and we have other subtle issues here.
>
> > for (;;) {
> > struct k_sigaction *ka;
> > +
> > + /*
> > + * Check for ptrace trap conditions. Jobctl traps are used
> > + * to trap ptracee while staying transparent regarding
> > + * signal and job control.
> > + */
> > + if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
> > + ptrace_notify_locked(SIGTRAP |
> > + (PTRACE_EVENT_INTERRUPT << 8));
> > + continue;
>
> Shouldn't we recheck SIGNAL_CLD_MASK after ptrace_notify_locked() returns?
> Probably not, but I am not sure...

Yeah, I thought about that one too. I don't think it really matters
one way or the other. I mostly just wanted to avoid an extra
spin_unlock_irq() after ptrace_notify_locked().

> In any case. This doesn't really matter, but can't we check
> JOBCTL_TRAP_MASK outside of the main loop? Unless we drop ->siglock
> this bit can't be changed, and every time we drop ->siglock we go to
> "relock".

Yeah, sure. I basically just wanted to say "continue" there. :-) I'll
take it out of the loop and make it unlock and jump to relock.

Thank you.

--
tejun

2011-05-10 09:50:28

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

Hello,

On Mon, May 09, 2011 at 06:58:57PM +0200, Oleg Nesterov wrote:
> Right now I am a bit puzzled why do we have 2 bits, JOBCTL_TRAP_INTERRUPT
> and JOBCTL_TRAP_SEIZE... But I didn't read this + other patches yet.

It eventually ends up with three trap flags - SEIZE, INTERRUPT and
NOTIFY. They all use PTRACE_EVENT_INTERRUPT trap but are different as
for when they're cleared. SEIZE is cleared after any trap. INTERRUPT
is cleared after an INTERRUPT trap and NOTIFY is cleared after
GETSIGINFO. We can add different pending flags and adjust INTERRUPT
flag according to different pending conditions but I think it's
cleaner to have multiple trap flags than multiplexing things over
single trap flag.

> At first glance, JOBCTL_TRAP_INTERRUPT has the same problem with the
> killed tracee. I think this is easy to fix.

Again, isn't this cleared during __ptrace_unlink()?

Thanks.

--
tejun

2011-05-10 13:21:43

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE

On 05/10, Tejun Heo wrote:
>
> On Mon, May 09, 2011 at 06:18:38PM +0200, Oleg Nesterov wrote:
>
> > > @@ -247,6 +272,14 @@ static int ptrace_attach(struct task_struct *task)
> > > if (task_is_stopped(task)) {
> > > task->jobctl |= JOBCTL_STOP_PENDING | JOBCTL_TRAPPING;
> > > signal_wake_up(task, 1);
> > > + } else if (seize) {
> > > + /*
> > > + * Otherwise, SEIZE uses jobctl trap to put tracee into
> > > + * TASK_TRACED, which doesn't have the nasty side effects
> > > + * of sending SIGSTOP.
> > > + */
> > > + task->jobctl |= JOBCTL_TRAP_SEIZE;
> > > + signal_wake_up(task, 0);
> >
> > OK... I am a bit worried we can set JOBCTL_TRAP_SEIZE even if the tracee
> > was already killed, and if it is killed later JOBCTL_TRAP_SEIZE won't be
> > cleared. Probably this is fine, ptrace_stop()->schedule() won't sleep in
> > this case.
>
> Hmmm... if killed, the tracee would go through __ptrace_unlink() which
> always clears JOBCTL_TRAP_MASK which includes JOBCTL_TRAP_SEIZE. What
> am I missing?

No. If killed, the tracee becomes a zombie (but see below). __ptrace_unlink()
won't be called until the tracee does wait().

> > > @@ -1752,12 +1752,13 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
> > > set_current_state(TASK_TRACED);
> > >
> > > /*
> > > - * We're committing to trapping. Clearing JOBCTL_TRAPPING and
> > > - * transition to TASK_TRACED should be atomic with respect to
> > > - * siglock. This should be done after the arch hook as siglock is
> > > - * released and regrabbed across it.
> > > + * We're committing to trapping. Adjust ->jobctl. Updates to
> > > + * these flags and transition to TASK_TRACED should be atomic with
> > > + * respect to siglock. This should be done after the arch hook as
> > > + * siglock may be released and regrabbed across it.
> > > */
> > > task_clear_jobctl_trapping(current);
> > > + current->jobctl &= ~JOBCTL_TRAP_SEIZE;
> >
> > Yes. But, it seems, this is too late.
> >
> > Suppose that the JOBCTL_TRAP_SEIZE tracee was SIGKILLED before it reports
> > PTRACE_EVENT_INTERRUPT. Now, if arch_ptrace_stop_needed() == T, ptrace_stop()
> > returns without clearing JOBCTL_TRAP_SEIZE/TIF_SIGPENDING. This means
> > get_signal_to_deliver() will loop forever.
>
> Argh... right it has an early exit path there.

Yes, and thus the tracee will loop forever until the tracer detaches, so
it can't even become a zombie. And the tracer can't detach via ptrace().

> > I never understood why ptrace_stop()->sigkill_pending() logic, I think
> > we should check fatal_signal_pending() unconditionally.
>
> Probably the best way to do it is adding fatal_signal_pending() into
> may_ptrace_stop() so that the failure path shares most of the code
> path instead of doing quick dirty exit? It's just nasty to have early
> exit above there and walking through the normal code path wouldn't
> hurt SIGKILL either.

Yes, probably. But, apart from the user-visible change which needs another
discussion, there are more problems. fatal_signal_pending() can "wrongly"
return false if the caller is tracehook_report_exit(). OTOH, it can correctly
return true (exec), but we probably want to stop. Let's discuss this later,
I think this has nothing to do with your patches. The main problem is not
fatal_signal_pending(), we need the better definition of explicit SIGKILL
behaviour. Not for ptrace only, there are other issues.

Oleg.

2011-05-10 13:47:59

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE

Hey, Oleg.

On Tue, May 10, 2011 at 03:20:01PM +0200, Oleg Nesterov wrote:
> > Hmmm... if killed, the tracee would go through __ptrace_unlink() which
> > always clears JOBCTL_TRAP_MASK which includes JOBCTL_TRAP_SEIZE. What
> > am I missing?
>
> No. If killed, the tracee becomes a zombie (but see below). __ptrace_unlink()
> won't be called until the tracee does wait().

Ah, right.

> Yes, and thus the tracee will loop forever until the tracer detaches, so
> it can't even become a zombie. And the tracer can't detach via ptrace().

I think this basically is the same problem as the using TRAPPING on
PTRACE_DETACH one. Do you recall the original version where TRAPPING
was cleared on kill via clear_pending instead of at the end of
do_signal_stop()? I think that's what we should do. We should be
clearing these conditions when the actions which should override these
conditions happen instead of trying to put them in the tracee
expecting control to be at certain places and to me
task_clear_jobctl_stop_pending() seems to be the best place to do
that. We might need to rename it and maybe add a parameter tho.

How does it sound to you? It would solve the TRAPPING issue too.

> > > I never understood why ptrace_stop()->sigkill_pending() logic, I think
> > > we should check fatal_signal_pending() unconditionally.
> >
> > Probably the best way to do it is adding fatal_signal_pending() into
> > may_ptrace_stop() so that the failure path shares most of the code
> > path instead of doing quick dirty exit? It's just nasty to have early
> > exit above there and walking through the normal code path wouldn't
> > hurt SIGKILL either.
>
> Yes, probably. But, apart from the user-visible change which needs another
> discussion, there are more problems. fatal_signal_pending() can "wrongly"
> return false if the caller is tracehook_report_exit(). OTOH, it can correctly
> return true (exec), but we probably want to stop. Let's discuss this later,
> I think this has nothing to do with your patches. The main problem is not
> fatal_signal_pending(), we need the better definition of explicit SIGKILL
> behaviour. Not for ptrace only, there are other issues.

Alright, let's put it aside for now.

Thank you.

--
tejun

2011-05-10 14:07:46

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

Hi,

On 05/10, Tejun Heo wrote:
>
> Hello,
>
> On Mon, May 09, 2011 at 06:58:57PM +0200, Oleg Nesterov wrote:
> > Right now I am a bit puzzled why do we have 2 bits, JOBCTL_TRAP_INTERRUPT
> > and JOBCTL_TRAP_SEIZE... But I didn't read this + other patches yet.
>
> It eventually ends up with three trap flags - SEIZE, INTERRUPT and
> NOTIFY. They all use PTRACE_EVENT_INTERRUPT trap but are different as
> for when they're cleared. SEIZE is cleared after any trap. INTERRUPT
> is cleared after an INTERRUPT trap

Yes, I see, but still can't understand. OK, please ignore, let me read
other patches first.

Well, in fact I seem to understand why do we have 2 bits. Unless I misread
the code, PTRACE_INTERRUPT is sticky. It can't be lost even if the tracee
can report another event before. Contrary, SEIZE is cleared if the tracee
reports something else right after attach, but they both report the same
PTRACE_EVENT_INTERRUPT. So, if the user does

ptrace(PTRACE_SEIZE);
ptrace(PTRACE_INTERRUPT);

we need 2 bits to ensure PTRACE_EVENT_INTERRUPT will be reported in any
case.

And, you know, I am not sure this is very clear. What if we change the
rules so that PTRACE_SEIZE always reports PTRACE_EVENT_INTERRUPT like
PTRACE_INTERRUPT? This looks more symmetrical and simple to me. IOW,
ptrace(PTRACE_SEIZE) simply implies the implicit PTRACE_INTERRUPT.

Oh. And this reminds me the previous discussion. Should PTRACE_SEIZE
stop the tracee? Perhaps it should only attach or do PTRACE_INTERRUPT
depending on flags? Personally I think it should stop...

To clarify, personally I do not know. Jan, Denys, all, please comment.
If PTRACE_SEIZE doesn't stop the tracee, then we should probably pass
more options.



Either way, these changes do not handle the auto-attach case correctly.
tracehook_report_clone() shouldn't send SIGSTOP unconditionally, we
should check PT_SEIZED. And perhaps we can do auto-attach better, but
right now this is off-topic.


> > At first glance, JOBCTL_TRAP_INTERRUPT has the same problem with the
> > killed tracee. I think this is easy to fix.
>
> Again, isn't this cleared during __ptrace_unlink()?

Please see the previous email.

Also,

> @@ -693,6 +693,23 @@ int ptrace_request(struct task_struct *child, long request,
> ret = ptrace_setsiginfo(child, &siginfo);
> break;
>
> + case PTRACE_INTERRUPT:
> + if (!likely(child->ptrace & PT_SEIZED))
> + break;
> + /*
> + * Stop tracee without any side-effect on signal or job
> + * control. If @child is already trapped, the current trap
> + * is not disturbed and INTERRUPT trap will happen after
> + * the current trap is ended with PTRACE_CONT. Note that
> + * other traps may happen before the scheduled INTERRUPT.
> + */
> + spin_lock(&child->sighand->siglock);
> + child->jobctl |= JOBCTL_TRAP_INTERRUPT;
> + signal_wake_up(child, 0);
> + spin_unlock(&child->sighand->siglock);
> + ret = 0;

spin_lock() is not safe. we need _irq, and ->sighand can be NULL if our
sub-thread reaps the dead tracee. IOW, this needs lock_task_sighand().

Well. Perhaps PTRACE_INTERRUPT should return -EALREADY or something if
JOBCTL_TRAP_INTERRUPT is already set? Again, again, it is not that I think
this is really useful. But since we are going to add the new API it is
better to discuss every detail.

Oleg.

2011-05-10 14:20:40

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

Hey,

On Tue, May 10, 2011 at 04:06:20PM +0200, Oleg Nesterov wrote:
> > It eventually ends up with three trap flags - SEIZE, INTERRUPT and
> > NOTIFY. They all use PTRACE_EVENT_INTERRUPT trap but are different as
> > for when they're cleared. SEIZE is cleared after any trap. INTERRUPT
> > is cleared after an INTERRUPT trap
>
> Yes, I see, but still can't understand. OK, please ignore, let me read
> other patches first.

Heh, my communication skill sucks.

> Well, in fact I seem to understand why do we have 2 bits. Unless I misread
> the code, PTRACE_INTERRUPT is sticky. It can't be lost even if the tracee
> can report another event before. Contrary, SEIZE is cleared if the tracee
> reports something else right after attach, but they both report the same
> PTRACE_EVENT_INTERRUPT. So, if the user does
>
> ptrace(PTRACE_SEIZE);
> ptrace(PTRACE_INTERRUPT);
>
> we need 2 bits to ensure PTRACE_EVENT_INTERRUPT will be reported in any
> case.

But you've figured it out anyway. :-)

I used the term 'sticky' for PTRACE_NOTIFY which may cause more than
on trap, so using sticky for INTERRUPT could be a bit confusing. To
sum up,

SEIZE : schedules INTERRUPT trap, cleared on any trap

INTERRUPT : schedules INTERRUPT trap, cleared on INTERRUPT trap

NOTIFY : schedules INTERRUPT trap, cleared on GETSIGINFO,
may cause multiple INTERRUPT traps

> And, you know, I am not sure this is very clear. What if we change the
> rules so that PTRACE_SEIZE always reports PTRACE_EVENT_INTERRUPT like
> PTRACE_INTERRUPT? This looks more symmetrical and simple to me. IOW,
> ptrace(PTRACE_SEIZE) simply implies the implicit PTRACE_INTERRUPT.

Actually, that was my initial implementation but as we don't guarantee
that the first trap would be an INTERRUPT one, so it seemed nicer to
avoid double trapping after SEIZE. Maybe we can make sure that
INTERRUPT is always the trap to be taken but it would be a tad bit
more complex than the current implementation. I think it's doable
although there are some corner cases, I think. Would that be better?

> Oh. And this reminds me the previous discussion. Should PTRACE_SEIZE
> stop the tracee? Perhaps it should only attach or do PTRACE_INTERRUPT
> depending on flags? Personally I think it should stop...
>
> To clarify, personally I do not know. Jan, Denys, all, please comment.
> If PTRACE_SEIZE doesn't stop the tracee, then we should probably pass
> more options.

I'm rather strongly against not trapping and somebody would have to
give me _very_ convincing rationales to change my mind on the subject.

> Either way, these changes do not handle the auto-attach case correctly.
> tracehook_report_clone() shouldn't send SIGSTOP unconditionally, we
> should check PT_SEIZED. And perhaps we can do auto-attach better, but
> right now this is off-topic.

Oh yeah, that's something I meant to ask you but forgot. The SEIZE
behaviors are not complete after this patchset and thus there's no
patch to remove the DEVEL flag yet. ISTR you telling me there are two
or three conditions where the reporting/stopping isn't transparent in
a previous mail, which I can't find anymore. One apparently is clone.
What were the others?

> > + case PTRACE_INTERRUPT:
> > + if (!likely(child->ptrace & PT_SEIZED))
> > + break;
> > + /*
> > + * Stop tracee without any side-effect on signal or job
> > + * control. If @child is already trapped, the current trap
> > + * is not disturbed and INTERRUPT trap will happen after
> > + * the current trap is ended with PTRACE_CONT. Note that
> > + * other traps may happen before the scheduled INTERRUPT.
> > + */
> > + spin_lock(&child->sighand->siglock);
> > + child->jobctl |= JOBCTL_TRAP_INTERRUPT;
> > + signal_wake_up(child, 0);
> > + spin_unlock(&child->sighand->siglock);
> > + ret = 0;
>
> spin_lock() is not safe. we need _irq, and ->sighand can be NULL if our
> sub-thread reaps the dead tracee. IOW, this needs lock_task_sighand().

Definitely. Thanks for spotting that.

> Well. Perhaps PTRACE_INTERRUPT should return -EALREADY or something if
> JOBCTL_TRAP_INTERRUPT is already set? Again, again, it is not that I think
> this is really useful. But since we are going to add the new API it is
> better to discuss every detail.

At first I made PTRACE_INTERRUPT noop if it was already in INTERRUPT
trap but then changed my mind because guaranteeing that there will be
at least one PTRACE_EVENT_INTERRUPT trap after PTRACE_INTERRUPT seems
better than having subtle behavior difference which might be nicer in
limited use cases.

Thank you.

--
tejun

2011-05-10 16:58:07

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 06/11] ptrace: make group stop state visible via PTRACE_GETSIGINFO

On 05/08, Tejun Heo wrote:
>
> Ptracer can detect tracee entering group stop by watching for the
> group stop trap; however, there is no reliable way to find out when
> group stop ends - SIGCONT may be processed by another thread and
> signal delivery is blocked while tracee is trapped.

Confused.

> This patch adds siginfo.si_pt_flags and uses PTRACE_SI_STOPPED flag to
> indicate whether group stop is in effect or not. While tracee is
> trapped for anything other than signal delivery and group stop itself,
> tracer can use PTRACE_GETSIGINFO to access this information. Note
> that it's only available if tracee was seized.

IOW, if the tracee reports via ptrace_notify*, the tracee can look at
si_pt_flags == stop-in-effect. If the tracer reports a signal, the
tracer obviously lacks this info, hmm.

Probably I need more time to get used to this... But at first glance
this looks a bit unnatural. Say, can't we simply implement
PTRACE_GET_GROUP_STOP_STATUS request which returns this (and probably
more) info?

> Later patches will deal with
> notification and trap transition.

OK, probably I'll understand the intent later.

> __SI_TRAP is defined to implement copying of
> the new field to userland.

Heh. I am shy to admit, I didn't know copy_siginfo_to_user() trims
si_code, that is why your change is correct but I spent a lot of time
before I was able to understand this.

> int main(int argc, char **argv)
> {
> pid_t tracee, tracer;
> int i;
>
> tracee = fork();
> if (!tracee)
> while (1)
> nanosleep(&ts1s, NULL);
>
> tracer = fork();
> if (!tracer) {
> int last_stopped = 0, stopped;
> siginfo_t si;
>
> ptrace(PTRACE_SEIZE, tracee, NULL,
> (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
> repeat:
> waitid(P_PID, tracee, NULL, WSTOPPED);
>
> if (!ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si)) {
> if (si.si_code) {
> stopped = !!si.si_status;

In this case this "si_code != 0" check is correct, but how can the
tracer detect this case in general?

> @@ -540,6 +542,17 @@ static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
>
> error = 0;
> *info = *child->last_siginfo;
> +
> + /*
> + * If reporting ptrace trap for a seized tracee, enable reporting
> + * of info->si_pt_flags.
> + */
> + if ((child->ptrace & PT_SEIZED) &&
> + (info->si_code & (0x7f | ~0xffff)) == (__SI_TRAP | SIGTRAP)) {

Can't we simply check (from->si_code & __SI_MASK) == __SI_TRAP ?

> + /* report whether group stop is in effect w/ SI_STOPPED */
> + if (sig->group_stop_count || (sig->flags & SIGNAL_STOP_STOPPED))

We have more and more "group_stop_count || SIGNAL_STOP_STOPPED" checks,
perhaps we should make a helper. Or at least invent the short name to
denote the group-stopped-or-in-progress to simplify the discussions ;)


Still, this is strange. With this change ptrace_getsiginfo() reports
the extra "volatile" info which wasn't reported by the tracee itself.
If the tracer does PTRACE_SETSIGINFO twice in a row, it can see the
different si_pt_flags's.

Oleg.

2011-05-10 17:13:06

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 06/11] ptrace: make group stop state visible via PTRACE_GETSIGINFO

On 05/10, Oleg Nesterov wrote:
>
> > @@ -540,6 +542,17 @@ static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
> >
> > error = 0;
> > *info = *child->last_siginfo;
> > +
> > + /*
> > + * If reporting ptrace trap for a seized tracee, enable reporting
> > + * of info->si_pt_flags.
> > + */
> > + if ((child->ptrace & PT_SEIZED) &&
> > + (info->si_code & (0x7f | ~0xffff)) == (__SI_TRAP | SIGTRAP)) {
>
> Can't we simply check (from->si_code & __SI_MASK) == __SI_TRAP ?
>
> > + /* report whether group stop is in effect w/ SI_STOPPED */
> > + if (sig->group_stop_count || (sig->flags & SIGNAL_STOP_STOPPED))
>
> We have more and more "group_stop_count || SIGNAL_STOP_STOPPED" checks,
> perhaps we should make a helper. Or at least invent the short name to
> denote the group-stopped-or-in-progress to simplify the discussions ;)
>
>
> Still, this is strange. With this change ptrace_getsiginfo() reports
> the extra "volatile" info which wasn't reported by the tracee itself.
> If the tracer does PTRACE_SETSIGINFO twice in a row, it can see the
^^^^^^^^^^^^^^^^^
PTRACE_GETSIGINFO
> different si_pt_flags's.

Forgot to mention... Probably we can ignore this but the tracer can
set/clear __SI_TRAP via PTRACE_SETSIGINFO, after that PTRACE_GETSIGINFO
changes its behaviour.

Oleg.

2011-05-10 18:09:52

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

On 05/10, Tejun Heo wrote:
>
> > And, you know, I am not sure this is very clear. What if we change the
> > rules so that PTRACE_SEIZE always reports PTRACE_EVENT_INTERRUPT like
> > PTRACE_INTERRUPT? This looks more symmetrical and simple to me. IOW,
> > ptrace(PTRACE_SEIZE) simply implies the implicit PTRACE_INTERRUPT.
>
> Actually, that was my initial implementation but as we don't guarantee
> that the first trap would be an INTERRUPT one,

Yes, and ptrace(PTRACE_INTERRUPT) doesn't guarantee this too,

> so it seemed nicer to
> avoid double trapping after SEIZE.

Why? We can simply say that ptrace(PTRACE_SEIZE) == attach + INTERRUPT,
the rules are the same.

Hmm. Suddenly I got lost. Perhaps instead JOBCTL_TRAP_INTERRUPT should
be cleared on any trap too, like SEIZE.

Again, agaian, I don't have the strong opinion, I am asking.

> Maybe we can make sure that
> INTERRUPT is always the trap to be taken

I don't know. Personally, I do not think this will make the API much
better.

> but it would be a tad bit
> more complex than the current implementation.

Yes.

> > Oh. And this reminds me the previous discussion. Should PTRACE_SEIZE
> > stop the tracee? Perhaps it should only attach or do PTRACE_INTERRUPT
> > depending on flags? Personally I think it should stop...
> >
> > To clarify, personally I do not know. Jan, Denys, all, please comment.
> > If PTRACE_SEIZE doesn't stop the tracee, then we should probably pass
> > more options.
>
> I'm rather strongly against not trapping and somebody would have to
> give me _very_ convincing rationales to change my mind on the subject.

Yes, I agree.

> > Either way, these changes do not handle the auto-attach case correctly.
> > tracehook_report_clone() shouldn't send SIGSTOP unconditionally, we
> > should check PT_SEIZED. And perhaps we can do auto-attach better, but
> > right now this is off-topic.
>
> Oh yeah, that's something I meant to ask you but forgot. The SEIZE
> behaviors are not complete after this patchset and thus there's no
> patch to remove the DEVEL flag yet. ISTR you telling me there are two
> or three conditions where the reporting/stopping isn't transparent in
> a previous mail, which I can't find anymore.

Hmm. No, I can't recall anything related...

> One apparently is clone.
> What were the others?

clone() is special because (I think) auto-attach should obviously
inherit PT_SEIZED. ptrace_init_task() copies PT_SEIZED correctly, so
we should only change tracehook_report_clone().

Another special (and nasty!) case is PTRACE_TRACEME. I do not know
how often it is used, but probabaly it is important enough. At least,
iirc, it is used by strace. Probably we need PTRACE_SEIZEME as well.

> > Well. Perhaps PTRACE_INTERRUPT should return -EALREADY or something if
> > JOBCTL_TRAP_INTERRUPT is already set? Again, again, it is not that I think
> > this is really useful. But since we are going to add the new API it is
> > better to discuss every detail.
>
> At first I made PTRACE_INTERRUPT noop if it was already in INTERRUPT
> trap but then changed my mind because guaranteeing that there will be
> at least one PTRACE_EVENT_INTERRUPT trap after PTRACE_INTERRUPT seems
> better than having subtle behavior difference which might be nicer in
> limited use cases.

No, I meant -EALREADY if JOBCTL_TRAP_INTERRUPT is pending set but not
yet reported. But anyway, this is really minor.

Oleg.

2011-05-10 18:20:16

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE

On 05/10, Tejun Heo wrote:
>
> I think this basically is the same problem as the using TRAPPING on
> PTRACE_DETACH one. Do you recall the original version where TRAPPING
> was cleared on kill

Yes, and it was me who suggested to not do this ;)

> I think that's what we should do.

Yes, I thought about this too. But I am still not sure, _perhaps_ this
needs more thinking. Just a fuzzy feeling, I do not have any particular
idea right now.

Oleg.

2011-05-10 22:00:06

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

On Tuesday 10 May 2011 16:06, Oleg Nesterov wrote:
> On 05/10, Tejun Heo wrote:
> > On Mon, May 09, 2011 at 06:58:57PM +0200, Oleg Nesterov wrote:
> > > Right now I am a bit puzzled why do we have 2 bits, JOBCTL_TRAP_INTERRUPT
> > > and JOBCTL_TRAP_SEIZE... But I didn't read this + other patches yet.
> >
> > It eventually ends up with three trap flags - SEIZE, INTERRUPT and
> > NOTIFY. They all use PTRACE_EVENT_INTERRUPT trap but are different as
> > for when they're cleared. SEIZE is cleared after any trap. INTERRUPT
> > is cleared after an INTERRUPT trap
>
> Yes, I see, but still can't understand. OK, please ignore, let me read
> other patches first.
>
> Well, in fact I seem to understand why do we have 2 bits. Unless I misread
> the code, PTRACE_INTERRUPT is sticky. It can't be lost even if the tracee
> can report another event before. Contrary, SEIZE is cleared if the tracee
> reports something else right after attach, but they both report the same
> PTRACE_EVENT_INTERRUPT. So, if the user does
>
> ptrace(PTRACE_SEIZE);
> ptrace(PTRACE_INTERRUPT);
>
> we need 2 bits to ensure PTRACE_EVENT_INTERRUPT will be reported in any
> case.
>
> And, you know, I am not sure this is very clear. What if we change the
> rules so that PTRACE_SEIZE always reports PTRACE_EVENT_INTERRUPT like
> PTRACE_INTERRUPT? This looks more symmetrical and simple to me. IOW,
> ptrace(PTRACE_SEIZE) simply implies the implicit PTRACE_INTERRUPT.
>
> Oh. And this reminds me the previous discussion. Should PTRACE_SEIZE
> stop the tracee? Perhaps it should only attach or do PTRACE_INTERRUPT
> depending on flags? Personally I think it should stop...

Off the top of my head, I don't see the use case for non-stopping
SEIZE right not, but arguably some people might want that.

Regarding three SEIZE/INTERRUPT/NOTIFY bits. Let's see whether
we really need them.

Tejun, why exactly do you want userspace to always see INTERRUPT stop?

If tracer did ptrace(PTRACE_INTERRUPT), it wants tracee to stop.
It then goes to waitpid, and whatever stop it sees, it handles.
I don't see any problem if it instead happens to see, say, a signal delivery
stop, and no INTERRUPT stop after that, ever. No information is lost.

Therefore, we can merge SEIZE and INTERRUPT bits into one
(or drop SEIZE bit altogether, if we decide that SEIZE doesn't stop).

Then, NOTIFY bit.
Tejun, let us know why did you design group stop notification
in a "sticky" way. Is it because of some races with SIGSTOP/SIGCONT?
>From userspace POV, it's not obvious why we can't just have
*one* INTERRUPT stop (that is, non-sticky one) every time there
is a group stop state change. Tracer can retrieve status via
GETSIGINFO just like as provided by your patch, but it doesn't
absolutely has to: it can simply CONT the tracee.

(
No, a bit different. Not
"every time there is a group stop state change"
but
"every time there is a SIGCONT which releases tracee from group stop"
- because group stop notification is _already_ delivered
to the tracer, even by the current kernel's code,
and it is already detectable (by observing that GETSIGINFO
fails on it) and we can avoid changing this.
)

Therefore, NOTIFY bit is also not needed. Only INTERRUPT bit is.
Unless I miss something...

Another note: even though PTRACE_INTERRUPT solves the problem that
PTRACE_DETACH of a running tracee was butt-ugly thing to do correctly,
the "new" way is still a bit ugly: tracer needs PTRACE_INTERRUPT,
waitpid, and only then PTRACE_DETACH. Why not go all the way
and make PTRACE_DETACH work on running tracee too?

--
vda

2011-05-10 22:37:40

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer

On Sunday 08 May 2011 17:49, Tejun Heo wrote:
> Currently there's no way for ptracer to find out whether group stop
> that tracee was in finished other than polling with PTRACE_GETSIGINFO.
> Also, tracer can't detect new group stop started by an untraced thread
> if tracee is already trapped. This patch implements group stop
> notification for ptracer using INTERRUPT traps.

Group stop notification already is performed by current kernels.
What we don't have is "group cont notification".

> When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
> is set, which triggers INTERRUPT trap but is sticky until the next
> PTRACE_GETSIGINFO.

Why INTERRUPT trap? For group stops, we already have perfectly working
way to detect such a stop.

Can we just add a "group cont" notification which looks like
a waitpid result with WIFCONTINUED(waitpid_status) == 1 to the tracer?


> -EINVAL return from GETSIGINFO also clears the sticky trap. This is
> because -EINVAL clearly indicates that tracee is in group stop. To
> avoid unnecessarily taking INTERRUPT trap on the way to group stop, if
> JOBCTL_STOP_PENDING is set, INTERRUPT trap is not taken.

Exactly.


> Re-trapping is used only for group stop and INTERRUPT traps. If
> tracer wants to get notified about group stop, it either leaves tracee
> in the initial group stop trap or puts it into INTERRUPT trap. When
> INTERRUPT trap is scheduled while tracee is already in a trap,

Sane tracer has no need to do PTRACE_INTERRUPT on a tracee
which is already stopped (for whatever reason): it already knows
it's stopped, and why. PTRACE_INTERRUPT is useful to cleanly stop
_running_ tracees.


--
vda

2011-05-11 15:36:42

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 06/11] ptrace: make group stop state visible via PTRACE_GETSIGINFO

Hello,

On Tue, May 10, 2011 at 06:55:45PM +0200, Oleg Nesterov wrote:
> IOW, if the tracee reports via ptrace_notify*, the tracee can look at
> si_pt_flags == stop-in-effect. If the tracer reports a signal, the
> tracer obviously lacks this info, hmm.

Which indicates tracee is in group stop trap.

> Probably I need more time to get used to this... But at first glance
> this looks a bit unnatural. Say, can't we simply implement
> PTRACE_GET_GROUP_STOP_STATUS request which returns this (and probably
> more) info?

I don't know. PTRACE_GETSIGINFO seemed to already fit the bill and I
want to avoid introducing a new request if at all possible. It sure
is a bit quirky but doesn't compromisea functionality.

> > __SI_TRAP is defined to implement copying of
> > the new field to userland.
>
> Heh. I am shy to admit, I didn't know copy_siginfo_to_user() trims
> si_code, that is why your change is correct but I spent a lot of time
> before I was able to understand this.

Oh, don't be shy. I scratched my head for quite a while trying to
figure out why the hell the new flag field isn't getting out to
userland. It's an ugly piece of sh*t. :-)

> > if (!ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si)) {
> > if (si.si_code) {
> > stopped = !!si.si_status;
>
> In this case this "si_code != 0" check is correct, but how can the
> tracer detect this case in general?

This was quick hack. Proper test would look like,

si.si_code && (si.si_pt_flags & PTRACE_SI_STOPPED)

> > @@ -540,6 +542,17 @@ static int ptrace_getsiginfo(struct task_struct *child, siginfo_t *info)
> > + if ((child->ptrace & PT_SEIZED) &&
> > + (info->si_code & (0x7f | ~0xffff)) == (__SI_TRAP | SIGTRAP)) {
>
> Can't we simply check (from->si_code & __SI_MASK) == __SI_TRAP ?

Right, I originally lifted the test from ptrace_notify() before adding
__SI_TRAP and forgot to update it later. Will change.

> > + /* report whether group stop is in effect w/ SI_STOPPED */
> > + if (sig->group_stop_count || (sig->flags & SIGNAL_STOP_STOPPED))
>
> We have more and more "group_stop_count || SIGNAL_STOP_STOPPED" checks,
> perhaps we should make a helper. Or at least invent the short name to
> denote the group-stopped-or-in-progress to simplify the discussions ;)

Yeah, how about group_stop_in_effect()?

> Still, this is strange. With this change ptrace_getsiginfo() reports
> the extra "volatile" info which wasn't reported by the tracee itself.
> If the tracer does PTRACE_SETSIGINFO twice in a row, it can see the
> different si_pt_flags's.

(answering to both get/setsiginfo concerns)

* I think we better block PTRACE_SETSIGINFO for non signal delivery
traps. It doesn't make any sense. Let's just fail that with
-EINVAL if PT_SEIZED.

* I don't think PTRACE_GETSIGINFO returning volatile information to be
problematic. The information is generated on the fly on trap
anyway. For non signal delivery traps, PTRACE_GETSIGINFO is
basically (ab)using siginfo as a container for debugging
information. It might have been better if something else was used
from the beginning but the damage is already done and I don't see
too much benefit in making things pretty at this point.

Thank you.

--
tejun

2011-05-11 15:55:17

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

Hello,

On Tue, May 10, 2011 at 08:08:11PM +0200, Oleg Nesterov wrote:
> On 05/10, Tejun Heo wrote:
> > > And, you know, I am not sure this is very clear. What if we change the
> > > rules so that PTRACE_SEIZE always reports PTRACE_EVENT_INTERRUPT like
> > > PTRACE_INTERRUPT? This looks more symmetrical and simple to me. IOW,
> > > ptrace(PTRACE_SEIZE) simply implies the implicit PTRACE_INTERRUPT.
> >
> > Actually, that was my initial implementation but as we don't guarantee
> > that the first trap would be an INTERRUPT one,
>
> Yes, and ptrace(PTRACE_INTERRUPT) doesn't guarantee this too,

Yeah, sure.

> > so it seemed nicer to
> > avoid double trapping after SEIZE.
>
> Why? We can simply say that ptrace(PTRACE_SEIZE) == attach + INTERRUPT,
> the rules are the same.

I guess the thing I wanted to avoid was programs looping over
PTRACE_CONT waiting for the INTERRUPT trap to happen.

But, yeah, it might be easier to explain that way. Maybe I'll change
it. I don't know.

> Hmm. Suddenly I got lost. Perhaps instead JOBCTL_TRAP_INTERRUPT should
> be cleared on any trap too, like SEIZE.

I don't think that's a good idea especially because there are
functionality differences among different traps. ie. group stop and
interrupt traps support re-trapping on job control events while other
traps don't, so there will be cases where the debugger wants to put
tracee specifically into INTERRUPT trap. It's just cleaner to use and
say that if you ask for INTERRUPT, you get an INTERRUPT.

> > One apparently is clone. What were the others?
>
> clone() is special because (I think) auto-attach should obviously
> inherit PT_SEIZED. ptrace_init_task() copies PT_SEIZED correctly, so
> we should only change tracehook_report_clone().

Alright.

> Another special (and nasty!) case is PTRACE_TRACEME. I do not know
> how often it is used, but probabaly it is important enough. At least,
> iirc, it is used by strace. Probably we need PTRACE_SEIZEME as well.

I don't agree. PTRACE_TRACEME predates PTRACE_ATTACH and is
completely redundant. If you can make the child do PTRACE_TRACEME,
you might as well just make it do pause() and PTRACE_SEIZE yourself,
so unless there's something PTRACE_SEIZE can't do, I don't think I'll
be adding SEIZEME.

> > At first I made PTRACE_INTERRUPT noop if it was already in INTERRUPT
> > trap but then changed my mind because guaranteeing that there will be
> > at least one PTRACE_EVENT_INTERRUPT trap after PTRACE_INTERRUPT seems
> > better than having subtle behavior difference which might be nicer in
> > limited use cases.
>
> No, I meant -EALREADY if JOBCTL_TRAP_INTERRUPT is pending set but not
> yet reported. But anyway, this is really minor.

Sure, noop or -EALREADY or whatever. As I wrote above, I think it's
better not to add special cases, so the rule becomes "if you schedule
an INTERRUPT, you'll get at least one INTERRUPT in the future, no
matter what."

Thanks.

--
tejun

2011-05-11 15:49:03

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer

Hello, Denys.

On Wed, May 11, 2011 at 12:37:34AM +0200, Denys Vlasenko wrote:
> > When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
> > is set, which triggers INTERRUPT trap but is sticky until the next
> > PTRACE_GETSIGINFO.
>
> Why INTERRUPT trap? For group stops, we already have perfectly working
> way to detect such a stop.

Debugger may continue tracee and put it in a different trap. In such
cases, group stop may be initiated and lifted multiple times while a
tracee is trapped. It's much nicer to have symmetric notifications in
those cases.

Also, we might end up adding more status flags to si_pt_flags and it's
much better to be able to say "INTERRUPT becomes and stays pending
until the next GETSIGINFO if any of these flags may have chagned" than
"if XXX change from x to y, or YYY changes to x to y or y to x,
... blah blah".

> Can we just add a "group cont" notification which looks like
> a waitpid result with WIFCONTINUED(waitpid_status) == 1 to the tracer?

Consider the following scenario.

1. A syscall traced tracee enters group stop while returning from a
syscall. Tracer is notified.

2. SIGCONT generated. Tracee is woken up and wait state is generated.

3. Tracee traps for syscall completion before tracer has chance to
wait(2).

4. Oops, continued wait state lost.

Note that generation of another trap can't be used to determine end of
group stop. It might already have been PTRACE_CONT'd.

This happens because wait states don't queue and shows why edge (or
instance) notifications are usually worse than level ones. Think
about RT signals which are jokes.

> > Re-trapping is used only for group stop and INTERRUPT traps. If
> > tracer wants to get notified about group stop, it either leaves tracee
> > in the initial group stop trap or puts it into INTERRUPT trap. When
> > INTERRUPT trap is scheduled while tracee is already in a trap,
>
> Sane tracer has no need to do PTRACE_INTERRUPT on a tracee
> which is already stopped (for whatever reason): it already knows
> it's stopped, and why. PTRACE_INTERRUPT is useful to cleanly stop
> _running_ tracees.

1. User wants to continue stopped task to investigate.

2. Investigation done, now the user wants to wait for the end of group
stop which will happen externally.

3. Regardless of the current trap tracee is in, debugger can schedule
INTERRUPT and be sure that tracee will trap into INTERRUPT without
returning to userland and that it will be notified of group stop
state changes.

According to you, the above use case is insane, which might be true
but I don't really care. I still want to have gdb which knows how to
do the above.

Thanks.

--
tejun

2011-05-11 16:30:44

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

Hey,

On Tue, May 10, 2011 at 11:59:58PM +0200, Denys Vlasenko wrote:
> Tejun, why exactly do you want userspace to always see INTERRUPT stop?
>
> If tracer did ptrace(PTRACE_INTERRUPT), it wants tracee to stop.
> It then goes to waitpid, and whatever stop it sees, it handles.
> I don't see any problem if it instead happens to see, say, a signal delivery
> stop, and no INTERRUPT stop after that, ever. No information is lost.
>
> Therefore, we can merge SEIZE and INTERRUPT bits into one
> (or drop SEIZE bit altogether, if we decide that SEIZE doesn't stop).

First of all, I think it's cleaner that way - if you ask for
INTERRUPT, you get an INTERRUPT. Secondly, because INTERRUPT trap is
special regarding retrapping notifications and there will be cases
where debugger wants to put a tracee into INTERRUPT trap and it will
be pretty annoying to do that safely if INTERRUPT disappears on each
trap and/or INTERRUPT doesn't work if tracee is already trapped.

> Then, NOTIFY bit.
> Tejun, let us know why did you design group stop notification
> in a "sticky" way. Is it because of some races with SIGSTOP/SIGCONT?
> From userspace POV, it's not obvious why we can't just have
> *one* INTERRUPT stop (that is, non-sticky one) every time there
> is a group stop state change. Tracer can retrieve status via
> GETSIGINFO just like as provided by your patch, but it doesn't
> absolutely has to: it can simply CONT the tracee.

Design like that is fragile like hell. Going back to the debugging
stopped tracee case I talked about in another reply, let's say
debugger issues PTRACE_CONT at the same time SIGCONT is generated.
Tracee will continue execution and debugger would believe that the
trap was continued by it and be oblivious about the SIGCONT which
raced with PTRACE_CONT.

> (
> No, a bit different. Not
> "every time there is a group stop state change"
> but
> "every time there is a SIGCONT which releases tracee from group stop"
> - because group stop notification is _already_ delivered
> to the tracer, even by the current kernel's code,
> and it is already detectable (by observing that GETSIGINFO
> fails on it) and we can avoid changing this.
> )
> Therefore, NOTIFY bit is also not needed. Only INTERRUPT bit is.
> Unless I miss something...

As I replied in another message, group stop may also be initiated
while a tracee is INTERRUPT trapped.

Those different TRAP bits are invisible to userland. The only changes
visible from userland are new ptrace requests - PTRACE_SEIZE and
PTRACE_INTERRUPT, and a new trap condition - PTRACE_EVENT_INTERRUPT.

PTRACE_EVENT_INTERRUPT traps are mostly side-effect-less and I'm
planning on documenting specifically that it may seem spurious and the
debugger should only rely on the information obtained from GETSIGINFO.

> Another note: even though PTRACE_INTERRUPT solves the problem that
> PTRACE_DETACH of a running tracee was butt-ugly thing to do correctly,
> the "new" way is still a bit ugly: tracer needs PTRACE_INTERRUPT,
> waitpid, and only then PTRACE_DETACH. Why not go all the way
> and make PTRACE_DETACH work on running tracee too?

I don't think I'll change that. It's only three syscall sequence -
INTERRUPT, wait(STOPPED) and DETACH which will always work reliably
(unless tracee gets killed or something). It might be nice but
ultimately unnecessary and pre-SEIZE ptrace would have to be supported
for a very long time anyway so I don't see much point in deviating
DETACH behavior when there is no new capability to gain by doing so.

Thank you.

--
tejun

2011-05-11 15:48:09

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer

On Wed, May 11, 2011 at 11:05 AM, Tejun Heo <[email protected]> wrote:
> Hello, Denys.
>
> On Wed, May 11, 2011 at 12:37:34AM +0200, Denys Vlasenko wrote:
>> > When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
>> > is set, which triggers INTERRUPT trap but is sticky until the next
>> > PTRACE_GETSIGINFO.
>>
>> Why INTERRUPT trap? For group stops, we already have perfectly working
>> way to detect such a stop.
>
> Debugger may continue tracee and put it in a different trap. ?In such
> cases, group stop may be initiated and lifted multiple times while a
> tracee is trapped. ?It's much nicer to have symmetric notifications in
> those cases.

I don't understand what exactly you mean by this.
You mean that it's better to have one INTERRUPT trap per each
stop/cont transition? Such as:

tracer tracee 3rd process
waitpid <---------------some ptrace stop
<tracer is doing something,
not yet ptrace(CONT)ing>
<-----------------------kill -STOP
<-----------------------kill -CONT
<-----------------------kill -STOP
<-----------------------kill -CONT
ptrace(CONT)----------->
waitpid <---------------INTERRUPT
ptrace(GETSIGINFO)<-----"it's a group stop"
waitpid <---------------INTERRUPT
ptrace(GETSIGINFO)<-----"it's a cont"
ptrace(CONT)----------->
waitpid <---------------INTERRUPT
ptrace(GETSIGINFO)<-----"it's a group stop"
waitpid <---------------INTERRUPT
ptrace(GETSIGINFO)<-----"it's a cont"
ptrace(CONT)----------->
............................................................

I agree that it's not a bad thing to deliver ALL notifications.

But from what I understood, with your patch it won't do this either:
you don't remember every stop and cont in a queue, you have just one bit?

And second, I don't think we really need this. We only need this:
if tracer did see group stop, it MUST see cont (or kill) - cont
should never be missed. But it's ok to miss whole pairs of stop+cont
which were done in quick succession (and similarly for cont+stop
pairs on a stopped tracee, as in example above).

> Also, we might end up adding more status flags to si_pt_flags and it's
> much better to be able to say "INTERRUPT becomes and stays pending
> until the next GETSIGINFO if any of these flags may have chagned" than
> "if XXX change from x to y, or YYY changes to x to y or y to x,
> ... blah blah".

Why do you want this data to be sticky at all?
Do you think tracer will "forget" to retrieve it?
That'd be a bug in the tracer if it doesn't retrieve the data
and therefore mishandles stop and/or cont.

>> Can we just add a "group cont" notification which looks like
>> a waitpid result with WIFCONTINUED(waitpid_status) == 1 to the tracer?
>
> Consider the following scenario.
>
> 1. A syscall traced tracee enters group stop while returning from a
> ? syscall. ?Tracer is notified.
>
> 2. SIGCONT generated. ?Tracee is woken up and wait state is generated.
>
> 3. Tracee traps for syscall completion before tracer has chance to
> ? wait(2).

The order in (3) seems wrong:
Tracee doesn't enter group stop while it is in syscall.
It exits syscall first, and then it enters group stop.
Therefore in step (3) above it can't "trap for syscall completion".
It can trap for syscall entry at worst; however:

> 4. Oops, continued wait state lost.

Why is it lost? Why kernel signals syscall entry instead of CONT
in your scenario?

Here's the example how stop and syscall tracing interact in current kernels:

sh -c 'echo $$; exec strace -tt -s999 -oLOG strace sleep 999'
kill -STOP 24655
kill -KILL 24655

inner strace prints:

nanosleep({999, 0}, NULL) = ? ERESTART_RESTARTBLOCK (To
be restarted)
--- {si_signo=SIGSTOP, si_code=SI_USER, si_pid=24647, si_uid=0,
si_value={int=15, ptr=0xf}} (Stopped (signal)) ---
--- Stopped (signal) by SIGSTOP ---
restart_syscall(<... resuming interrupted call ...> <unfinished ...>
+++ killed by SIGKILL +++

inner strace's strace:

13:17:54.776431 write(2, "nanosleep({999, 0}, ", 20) = 20
13:17:54.776490 ptrace(PTRACE_SYSCALL, 24655, 0x1, SIG_0) = 0
13:17:54.776542 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
13:17:54.776591 wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}],
__WALL, NULL) = 24655
13:18:02.430991 --- {si_signo=SIGCHLD, si_code=CLD_TRAPPED,
si_pid=24655, si_status=SIGTRAP, si_utime=0, si_stime=0} (Child
exited) ---
13:18:02.431238 write(2, "NULL) = ?
ERESTART_RESTARTBLOCK (To be restarted)\n", 64) = 64
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
See? We got syscall exit notification *first*. And only now:

13:18:02.431369 ptrace(PTRACE_SYSCALL, 24655, 0x1, SIG_0) = 0
13:18:02.431461 --- {si_signo=SIGCHLD, si_code=CLD_TRAPPED,
si_pid=24655, si_status=SIGSTOP, si_utime=0, si_stime=0} (Child
exited) ---
13:18:02.431583 wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}],
__WALL, NULL) = 24655
13:18:02.431737 ptrace(PTRACE_GETSIGINFO, 24655, 0, {si_signo=SIGSTOP,
si_code=SI_USER, si_pid=24647, si_uid=0, si_value={int=15, ptr=0xf}})
= 0
13:18:02.431871 write(2, "--- {si_signo=SIGSTOP, si_code=SI_USER,
si_pid=24647, si_uid=0, si_value={int=15, ptr=0xf}} (Stopped (signal))
---\n", 115) = 115
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...we got signal notification. We inject it:

13:18:02.431995 ptrace(PTRACE_SYSCALL, 24655, 0x1, SIGSTOP) = 0
13:18:02.432068 --- {si_signo=SIGCHLD, si_code=CLD_STOPPED,
si_pid=24655, si_status=SIGSTOP, si_utime=0, si_stime=0} (Child
exited) ---
13:18:02.432183 wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}],
__WALL, NULL) = 24655
13:18:02.432338 ptrace(PTRACE_GETSIGINFO, 24655, 0, 0xbffc4fd8) = -1
EINVAL (Invalid argument)
13:18:02.432413 write(2, "--- Stopped (signal) by SIGSTOP ---\n", 36) = 36
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...and only now we got group stop notification. My point: ***we aren't
inside syscall***.
The rest isn't interesting (typical buggy handling of stop
notification in current strace/kernel):

13:18:02.432501 ptrace(PTRACE_SYSCALL, 24655, 0x1, SIGSTOP) = 0
13:18:02.432573 --- {si_signo=SIGCHLD, si_code=CLD_TRAPPED,
si_pid=24655, si_status=SIGTRAP, si_utime=0, si_stime=0} (Child
exited) ---
13:18:02.432697 wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}],
__WALL, NULL) = 24655
13:18:02.432993 write(2, "restart_syscall(<... resuming interrupted
call ...>", 51) = 51
13:18:02.433089 ptrace(PTRACE_SYSCALL, 24655, 0x1, SIG_0) = 0
13:18:02.433237 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) ==
SIGKILL}], __WALL, NULL) = 24655
13:18:12.031145 --- {si_signo=SIGCHLD, si_code=CLD_KILLED,
si_pid=24655, si_status=SIGKILL, si_utime=0, si_stime=0} (Child
exited) ---
13:18:12.031297 write(2, " <unfinished ...>\n", 18) = 18
13:18:12.031422 write(2, "+++ killed by SIGKILL +++\n", 26) = 26
13:18:12.031670 gettid() = 24654
13:18:12.031747 tgkill(24654, 24654, SIGKILL <unfinished ...>
13:18:12.031877 +++ killed by SIGKILL +++


> Note that generation of another trap can't be used to determine end of
> group stop. ?It might already have been PTRACE_CONT'd.
>
> This happens because wait states don't queue and shows why edge (or
> instance) notifications are usually worse than level ones. ?Think
> about RT signals which are jokes.

Can you describe your scenario in more details. I don't understand it.


>> > Re-trapping is used only for group stop and INTERRUPT traps. ?If
>> > tracer wants to get notified about group stop, it either leaves tracee
>> > in the initial group stop trap or puts it into INTERRUPT trap. ?When
>> > INTERRUPT trap is scheduled while tracee is already in a trap,
>>
>> Sane tracer has no need to do PTRACE_INTERRUPT on a tracee
>> which is already stopped (for whatever reason): it already knows
>> it's stopped, and why. PTRACE_INTERRUPT is useful to cleanly stop
>> _running_ tracees.
>
> 1. User wants to continue stopped task to investigate.

You mean like this? -
waitpid <---------------INTERRUPT
ptrace(GETSIGINFO)<-----"it's a group stop"
ptrace(CONT)----------->
...

> 2. Investigation done, now the user wants to wait for the end of group
> ? stop which will happen externally.

You mean, continuing above example like this? -

case 1: tracee stops
...
ptrace(CONT)----------->
waitpid <---------------some ptrace stop
(tracer doesnt PTRACE_CONT! it wants to wait till SIGCONT)
waitpid <---------------...

case 2: tracee runs
...
ptrace(CONT)----------->
waitpid <---------------...

> 3. Regardless of the current trap tracee is in, debugger can schedule
> ? INTERRUPT and be sure that tracee will trap into INTERRUPT without
> ? returning to userland and that it will be notified of group stop
> ? state changes.

Why tracer needs to schedule anything? As you see above,
at this point tracer is in waitpid anyway. It is ALREADY prepared
to get cont notification. It doesn't need to generate any extra
ptrace stops for that.

--
vda

2011-05-11 18:10:48

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

Hi Tejun,

On Wed, May 11, 2011 at 11:19 AM, Tejun Heo <[email protected]> wrote:
> On Tue, May 10, 2011 at 11:59:58PM +0200, Denys Vlasenko wrote:
>> Tejun, why exactly do you want userspace to always see INTERRUPT stop?
>>
>> If tracer did ptrace(PTRACE_INTERRUPT), it wants tracee to stop.
>> It then goes to waitpid, and whatever stop it sees, it handles.
>> I don't see any problem if it instead happens to see, say, a signal delivery
>> stop, and no INTERRUPT stop after that, ever. No information is lost.
>>
>> Therefore, we can merge SEIZE and INTERRUPT bits into one
>> (or drop SEIZE bit altogether, if we decide that SEIZE doesn't stop).
>
> First of all, I think it's cleaner that way - if you ask for
> INTERRUPT, you get an INTERRUPT. ?Secondly, because INTERRUPT trap is
> special regarding retrapping notifications and there will be cases
> where debugger wants to put a tracee into INTERRUPT trap and it will
> be pretty annoying to do that safely if INTERRUPT disappears on each
> trap and/or INTERRUPT doesn't work if tracee is already trapped.
>
>> Then, NOTIFY bit.
>> Tejun, let us know why did you design group stop notification
>> in a "sticky" way. Is it because of some races with SIGSTOP/SIGCONT?
>> From userspace POV, it's not obvious why we can't just have
>> *one* INTERRUPT stop (that is, non-sticky one) every time there
>> is a group stop state change. Tracer can retrieve status via
>> GETSIGINFO just like as provided by your patch, but it doesn't
>> absolutely has to: it can simply CONT the tracee.
>
> Design like that is fragile like hell. ?Going back to the debugging
> stopped tracee case I talked about in another reply, let's say
> debugger issues PTRACE_CONT at the same time SIGCONT is generated.

If tracer issues PTRACE_CONT, it means that tracee was in some
sort of ptrace stop anyway at that point.
Therefore "cont" notification, just like any other ptrace notification
(say, signal notification) which was generated while tracee
was ptrace-stopped, should be reported *at the next waitpid*.

If, as you say in this example, SIGCONT was generated while tracee
was ptrace-stopped, then next waitpid will return "cont" notification
to tracer. Tracer will know that it is a "cont" notification
by looking at GETSIGINFO result.

Why do you need stickiness of "cont" notification?
I don't see this need from userspace POV.

> Tracee will continue execution and debugger would believe that the
> trap was continued by it

Yes...

> and be oblivious about the SIGCONT which
> raced with PTRACE_CONT.

...yes, until tracer gets the next waitpid result, which
will inform tracer that "cont" has happened.

(After which, tracer will likely PTRACE_CONT,
get SIGCONT signal delivery notification, inject it via
PTRACE_CONT(SIGCONT) so that tracee can run the handler,
and so forth...)


>> (
>> No, a bit different. Not
>> "every time there is a group stop state change"
>> but
>> "every time there is a SIGCONT which releases tracee from group stop"
>> - because group stop notification is _already_ delivered
>> to the tracer, even by the current kernel's code,
>> and it is already detectable (by observing that GETSIGINFO
>> fails on it) and we can avoid changing this.
>> )
>> Therefore, NOTIFY bit is also not needed. Only INTERRUPT bit is.
>> Unless I miss something...
>
> As I replied in another message, group stop may also be initiated
> while a tracee is INTERRUPT trapped.
>
> Those different TRAP bits are invisible to userland.

Sure, I'm not arguing about details of kernel-side machinery.
If you need three bits (or 33) in task struct to make it work, so be it.
Oleg is your guy to discuss that.

>?The only changes
> visible from userland are new ptrace requests - PTRACE_SEIZE and
> PTRACE_INTERRUPT, and a new trap condition - PTRACE_EVENT_INTERRUPT.
>
> PTRACE_EVENT_INTERRUPT traps are mostly side-effect-less and I'm
> planning on documenting specifically that it may seem spurious and the
> debugger should only rely on the information obtained from GETSIGINFO.

I understand this. I am trying to understand what feature are you trying
to provide to userland, or what problematic race scenario you are trying
to make resolve-able *in userland* by making "stop" and "cont"
notifications sticky wrt GETSIGINFO. I just don't see this scenario.

--
vda

2011-05-11 16:03:11

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer

Hey,

On Wed, May 11, 2011 at 02:01:16PM +0200, Denys Vlasenko wrote:
> On Wed, May 11, 2011 at 11:05 AM, Tejun Heo <[email protected]> wrote:
> > Debugger may continue tracee and put it in a different trap. ?In such
> > cases, group stop may be initiated and lifted multiple times while a
> > tracee is trapped. ?It's much nicer to have symmetric notifications in
> > those cases.
>
> I don't understand what exactly you mean by this.
> You mean that it's better to have one INTERRUPT trap per each
> stop/cont transition? Such as:
>
> tracer tracee 3rd process
> waitpid <---------------some ptrace stop
> <tracer is doing something,
> not yet ptrace(CONT)ing>
> <-----------------------kill -STOP
> <-----------------------kill -CONT
> <-----------------------kill -STOP
> <-----------------------kill -CONT
> ptrace(CONT)----------->
> waitpid <---------------INTERRUPT
> ptrace(GETSIGINFO)<-----"it's a group stop"
> waitpid <---------------INTERRUPT
> ptrace(GETSIGINFO)<-----"it's a cont"
> ptrace(CONT)----------->
> waitpid <---------------INTERRUPT
> ptrace(GETSIGINFO)<-----"it's a group stop"
> waitpid <---------------INTERRUPT
> ptrace(GETSIGINFO)<-----"it's a cont"
> ptrace(CONT)----------->
> ............................................................
>
> I agree that it's not a bad thing to deliver ALL notifications.

No, not at all.

> But from what I understood, with your patch it won't do this either:
> you don't remember every stop and cont in a queue, you have just one bit?

What I want to guarantee is that after one or more events, at least
one notification is always delivered in finite userland time.

> And second, I don't think we really need this. We only need this:
> if tracer did see group stop, it MUST see cont (or kill) - cont
> should never be missed. But it's ok to miss whole pairs of stop+cont
> which were done in quick succession (and similarly for cont+stop
> pairs on a stopped tracee, as in example above).

Yeah, sure. There's no event queueing and I'm not gonna add one.

> > Also, we might end up adding more status flags to si_pt_flags and it's
> > much better to be able to say "INTERRUPT becomes and stays pending
> > until the next GETSIGINFO if any of these flags may have chagned" than
> > "if XXX change from x to y, or YYY changes to x to y or y to x,
> > ... blah blah".
>
> Why do you want this data to be sticky at all?
> Do you think tracer will "forget" to retrieve it?
> That'd be a bug in the tracer if it doesn't retrieve the data
> and therefore mishandles stop and/or cont.

Please read on.

> >> Can we just add a "group cont" notification which looks like
> >> a waitpid result with WIFCONTINUED(waitpid_status) == 1 to the tracer?
> >
> > Consider the following scenario.
> >
> > 1. A syscall traced tracee enters group stop while returning from a
> > ? syscall. ?Tracer is notified.
> >
> > 2. SIGCONT generated. ?Tracee is woken up and wait state is generated.
> >
> > 3. Tracee traps for syscall completion before tracer has chance to
> > ? wait(2).
>
> The order in (3) seems wrong:
> Tracee doesn't enter group stop while it is in syscall.
> It exits syscall first, and then it enters group stop.
> Therefore in step (3) above it can't "trap for syscall completion".
> It can trap for syscall entry at worst; however:

Okay, whatever other trap then.

> > 4. Oops, continued wait state lost.
>
> Why is it lost? Why kernel signals syscall entry instead of CONT
> in your scenario?

SIGCONT doesn't have to be delivered by the specific task being traced
and the userland can continue executing without any specific time
limit breaking the 'in finite userland time' part of the guarantee.

> The rest isn't interesting (typical buggy handling of stop
> notification in current strace/kernel):

The whole thing is uninteresting. The trap type doesn't matter at
all. The only thing matter is that PTRACE_CONT may race with end of
group stop and control may return to userland and event can be lost.

> Can you describe your scenario in more details. I don't understand it.

Stop thinking about strace(1). It's the most trivial case. A tracee
can be in any trap regardless of group stop state and tracer can issue
PTRACE_CONT anytime which can race with SIGCONT generation. You want
to deliver the notification to the tracer in finite userland time.
Think about SIGCONT and PTRACE_CONT racing each other and SIGCONT
being delivered to another task.

> Why tracer needs to schedule anything? As you see above,
> at this point tracer is in waitpid anyway. It is ALREADY prepared
> to get cont notification. It doesn't need to generate any extra
> ptrace stops for that.

The debugger doesn't want the debuggee to be running. It wants tracee
to be in basically the same state as in group stop after the temporary
inspection. It doesn't want to leave it running while group stop is
in effect but still wants to be notified of end of group stop.

Thanks.

--
tejun

2011-05-11 16:05:16

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

Hey,

On Wed, May 11, 2011 at 02:23:44PM +0200, Denys Vlasenko wrote:
> I understand this. I am trying to understand what feature are you trying
> to provide to userland, or what problematic race scenario you are trying
> to make resolve-able *in userland* by making "stop" and "cont"
> notifications sticky wrt GETSIGINFO. I just don't see this scenario.

If you still don't see how events can get lost, I'm afraid I can't
explain any better. You're repeating that exit_code can record the
event until it's fetched but it doesn't queue and any trap will
overwrite it and debuggers don't have to call GETSIGINFO after each
trap - it only has to do so for signal delivery, group stop and
INTERRUPT trap. So, it can just look at the exit_code (which doesn't
indicate continuation) and let tracee return to userland.

Maybe I'm horribly confused. Oleg, am I?

Thanks.

--
tejun

2011-05-11 15:50:21

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 08/11] ptrace: move fallback JOBCTL_TRAPPING clearing to get_signal_to_deliver()

On 05/08, Tejun Heo wrote:
>
> Move the fallback clearing to the end of get_signal_to_deliver() so
> that TRAPPING is maintained while tracee is inside signal delivery
> path. When killed, tracee is guaranteed to leave signal delivery path
> in finite amount of time and thus TRAPPING is still guaranteed to be
> cleared on kill.

Mostly yes, but we can race with freeze_processes() and deadlock.

> @@ -1978,9 +1981,6 @@ retry:
> goto retry;
> }
>
> - /* PTRACE_ATTACH might have raced with task killing, clear trapping */
> - task_clear_jobctl_trapping(current);
> -
> spin_unlock_irq(&current->sighand->siglock);
>
> tracehook_finish_jctl();
> @@ -2226,6 +2226,13 @@ relock:
> do_group_exit(info->si_signo);
> /* NOTREACHED */
> }
> +
> + /*
> + * PTRACE_ATTACH might have raced with task killing. Make sure
> + * trapping is clear before leaving signal delivery path.
> + */
> + task_clear_jobctl_trapping(current);

before the tracee does this, it returns from do_signal_stop(), goes to
relock:, and calls try_to_freeze(). If it becomes frozen,
try_to_freeze_tasks() can't succeed because the tracer waits for
!JOBCTL_TRAPPING.

And. The main problem is that "leave signal delivery path" is not true
at all. When the tracee dequeues SIGKILL it calls do_group_exit() inside
the main loop.

So far this all looks easily fixeable though...

Oleg.

2011-05-11 15:50:31

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 09/11] job control: reorganize wait_task_stopped()

On 05/08, Tejun Heo wrote:
>
> wait_task_stopped() tested task_stopped_code() without acquiring
> siglock and, if stop condition existed, called wait_task_stopped() and
> directly returned the result.
>
> it may race against SIGCONT generation.

Hmm. This is the plain bug, even if unlikely and minor.

> It seems that WNOHANG wait correctness has never been guaranteed and
> everybody has been happy with it for very long time.

Yes, the window is tiny. May be it was never noticed or never
reported because this is hard to diagnose/reproduced.

> As such,
> although this reorganization improves the situation a bit, I don't
> consider this to be a bug fix.

But it is?

Can't we push this patch ahead of these changes? I can merge it into
ptrace branch.


> static int wait_task_stopped(struct wait_opts *wo,
> int ptrace, struct task_struct *p)
> @@ -1397,6 +1409,9 @@ static int wait_task_stopped(struct wait_opts *wo,
> if (!ptrace && !(wo->wo_flags & WUNTRACED))
> return 0;
>
> + if (!task_stopped_code(p, ptrace))
> + return 0;
> +
> exit_code = 0;
> spin_lock_irq(&p->sighand->siglock);
>
> @@ -1607,8 +1622,9 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
> * Wait for stopped. Depending on @ptrace, different stopped state
> * is used and the two don't interact with each other.
> */
> - if (task_stopped_code(p, ptrace))
> - return wait_task_stopped(wo, ptrace, p);
> + ret = wait_task_stopped(wo, ptrace, p);
> + if (ret)
> + return ret;

Reviewed-by: Oleg Nesterov <[email protected]>

2011-05-11 16:20:55

by Bryan Donlan

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

On Wed, May 11, 2011 at 09:22, Tejun Heo <[email protected]> wrote:
> Hey,
>
> On Wed, May 11, 2011 at 02:23:44PM +0200, Denys Vlasenko wrote:
>> I understand this. I am trying to understand what feature are you trying
>> to provide to userland, or what problematic race scenario you are trying
>> to make resolve-able *in userland* by making "stop" and "cont"
>> notifications sticky wrt GETSIGINFO. I just don't see this scenario.
>
> If you still don't see how events can get lost, I'm afraid I can't
> explain any better. ?You're repeating that exit_code can record the
> event until it's fetched but it doesn't queue and any trap will
> overwrite it and debuggers don't have to call GETSIGINFO after each
> trap - it only has to do so for signal delivery, group stop and
> INTERRUPT trap. ?So, it can just look at the exit_code (which doesn't
> indicate continuation) and let tracee return to userland.
>
> Maybe I'm horribly confused. ?Oleg, am I?

If a process issues PTRACE_INTERRUPT, it just wants the target process
to stop. Presumably it doesn't really care _why_, as long as it
doesn't introduce spurious effects visible to the target. As such, if
the process stops on its own due to some other trap, that ought to be
good enough, right? Is there really a need for a trap that will be
reliably echoed back to the trapping process?

2011-05-11 17:41:59

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 10/11] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()

On 05/08, Tejun Heo wrote:
>
> TRAPPING will also be used to implement end of group stop retrapping,
> which can be initiated by tasks other than tracer. To allow this,

I didn't read the next patch yet, so I can't undestand/comment the
motivation.

But,

> this patch moves TRAPPING wait from attach completion path to
> operations which are actually affected by the transition - wait(2) and
> following ptrace(2) requests.

You know, I'd wish I could find the serious bugs in this patch. The
code becomes really hairy. -EAGAIN in do_wait() doesn't make it more
simple ;)

> Both wait and ptrace paths are updated to retry the operation after
> TRAPPING wait. Note that wait_task_stopped() now always grabs siglock
> for ptrace waits. This can be avoided with "task_stopped_code() ->
> rmb() -> TRAPPING -> rmb() -> task_stopped_code()" sequence

And so far I think this would be better, because it seems we can avoid
the retry logic.


First of all, this patch returns one of the user-visible and undesirable
changes. The tracer know that the task is stopped, attaches, and then it
can see the TASK_RUNNING tracee after ptrace(PTRACE_ATTACH) returns.

I agree, this looks minor. But if we can tolerate this, probably we can
tolerate another oddity: wait_task_stopped() can succeed and eat the
stop code before the tracee actually stopps, no?

IOW, ignoring mb's and read-ordering, suppose that we simply change
task_stopped_code:

if (ptrace) {
- if (task_is_stopped_or_traced(p))
+ if (task_is_traced(p) || JOBCTL_TRAPPING)
return &p->exit_code;
} else {

As for ptrace_check_attach(), it can simply do wait_event(), we
only need to verify the caller is the tracer. No need to play with
lock/unlock/retry.

What do you think?

Oleg.

2011-05-11 17:02:37

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 10/11] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()

On 05/11, Oleg Nesterov wrote:
>
> You know, I'd wish I could find the serious bugs in this patch. The
> code becomes really hairy. -EAGAIN in do_wait() doesn't make it more
> simple ;)

Mwahaha! I seem to see the bug ;)

bool ptrace_wait_trapping(struct task_struct *child)
__releases(&child->sighand->siglock)
__releases(&tasklist_lock)
{
if (likely(!(child->jobctl & JOBCTL_TRAPPING)))
return false;

spin_unlock_irq(&child->sighand->siglock);
get_task_struct(child);
read_unlock(&tasklist_lock);

--> WINDOW

wait_event(current->signal->wait_chldexit,
!(child->jobctl & JOBCTL_TRAPPING));
put_task_struct(child);
return true;
}

When the caller is do_wait(), we can't assume we are the tracer when
we drop tasklist. Original tracer can detach, then another unrelated
process can attach again and provoke JOBCTL_TRAPPING. wait_event()
can hang forever.

Unfortunately, this is easy to fix :/

Oleg.

2011-05-11 19:17:39

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 08/11] ptrace: move fallback JOBCTL_TRAPPING clearing to get_signal_to_deliver()

Hello,

On Wed, May 11, 2011 at 05:48:34PM +0200, Oleg Nesterov wrote:
> On 05/08, Tejun Heo wrote:
> > Move the fallback clearing to the end of get_signal_to_deliver() so
> > that TRAPPING is maintained while tracee is inside signal delivery
> > path. When killed, tracee is guaranteed to leave signal delivery path
> > in finite amount of time and thus TRAPPING is still guaranteed to be
> > cleared on kill.
>
> Mostly yes, but we can race with freeze_processes() and deadlock.

Ah... the try_to_freeze(). Nice spotting. It isn't necessarily a
deadlock tho. freeze_processes() will back out after a while. I
think the right thing to do here is making the TRAPPING sleep an
interruptible one and let the syscall restart logic deal with it.
This will also remove the dreaded unkillable tracer even if we get
something wrong somewhere. How does that sound?

> And. The main problem is that "leave signal delivery path" is not true
> at all. When the tracee dequeues SIGKILL it calls do_group_exit() inside
> the main loop.
>
> So far this all looks easily fixeable though...

Yeah, as I wrote before, I wanna do two things.

* Make TRAPPING wait INTERRUPTIBLE.

* Move clearing of pending group stop and traps to the actions which
require such clearing.

Thanks.

--
tejun

2011-05-11 19:24:25

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

Hello,

On Wed, May 11, 2011 at 12:20:11PM -0400, Bryan Donlan wrote:
> If a process issues PTRACE_INTERRUPT, it just wants the target process
> to stop. Presumably it doesn't really care _why_, as long as it
> doesn't introduce spurious effects visible to the target. As such, if
> the process stops on its own due to some other trap, that ought to be
> good enough, right? Is there really a need for a trap that will be
> reliably echoed back to the trapping process?

It can go either way. I just chose one. One use case which made me
favor not clearing INTERRUPT on other traps was that INTERRUPT, along
with group stop trap, is special in that it allows notification
re-traps. It serves as per-tracee job control trap site which
debugger can use as it pleases.

As such, there are times where debugger wants to put tracee into that
specific trap. If INTERRUPT gets cleared on every trap, it would have
to set INTERRUPT on each trap until it reaches INTERRUPT trap which
seemed kinda silly and given that INTERRUPT may happen for spurious
notification reasons, there didn't seem much point in avoiding the
trap because there was another trap.

That said, as it makes INTERRUPT behavior different from SEIZE and I
think occassional multiple traps on SEIZE could be uglier. I think
I'll change it such that it clears on any trap. If debugger wants to
put tracee specifically into INTERRUPT trap, it will have to raise it
after each unsuccessful trap. It would be annoying but I suppose it's
lesser of the two evils.

Thanks.

--
tejun

2011-05-11 19:29:10

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 09/11] job control: reorganize wait_task_stopped()

Hello,

On Wed, May 11, 2011 at 05:48:54PM +0200, Oleg Nesterov wrote:
> > It seems that WNOHANG wait correctness has never been guaranteed and
> > everybody has been happy with it for very long time.
>
> Yes, the window is tiny. May be it was never noticed or never
> reported because this is hard to diagnose/reproduced.

Yeah, most likely.

> > As such,
> > although this reorganization improves the situation a bit, I don't
> > consider this to be a bug fix.
>
> But it is?
>
> Can't we push this patch ahead of these changes? I can merge it into
> ptrace branch.

It doesn't really fix the problem tho. The whole thing is full of
holes and I think it would be better to just declare "WNOHANG might
fail even when it's not supposed to, retry later" than making the
locking heavier there, which could easily be much more relevant
regression. Of course, if we can fix it without adding extra locking
or too much complexity, it would be nice.

Anyways, yeah, sure. I'll resend it as a separate patch.

Thanks.

--
tejun

2011-05-11 19:46:04

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 10/11] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()

Hello,

On Wed, May 11, 2011 at 07:00:58PM +0200, Oleg Nesterov wrote:
> On 05/11, Oleg Nesterov wrote:
> >
> > You know, I'd wish I could find the serious bugs in this patch. The
> > code becomes really hairy. -EAGAIN in do_wait() doesn't make it more
> > simple ;)
>
> Mwahaha! I seem to see the bug ;)

:-)

> When the caller is do_wait(), we can't assume we are the tracer when
> we drop tasklist. Original tracer can detach, then another unrelated
> process can attach again and provoke JOBCTL_TRAPPING. wait_event()
> can hang forever.
>
> Unfortunately, this is easy to fix :/

Yeah, it's basically the same problem raised in other two patches.
Making wait interruptible and clearing pending stop/traps on
detach/kill/cont/whatever should do the trick. I'll reply the
previous message for the suggested change.

Thanks.

--
tejun

2011-05-11 19:53:39

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 10/11] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()

Hello,

On Wed, May 11, 2011 at 06:49:47PM +0200, Oleg Nesterov wrote:
> On 05/08, Tejun Heo wrote:
> > this patch moves TRAPPING wait from attach completion path to
> > operations which are actually affected by the transition - wait(2) and
> > following ptrace(2) requests.
>
> You know, I'd wish I could find the serious bugs in this patch. The
> code becomes really hairy. -EAGAIN in do_wait() doesn't make it more
> simple ;)

I don't know. Why is retrying hairy? The whole waiting logic is
built for clean retries. The suggested change just does it without
intervening sleeping and waking up. I don't see anything particularly
hairy there.

> > Both wait and ptrace paths are updated to retry the operation after
> > TRAPPING wait. Note that wait_task_stopped() now always grabs siglock
> > for ptrace waits. This can be avoided with "task_stopped_code() ->
> > rmb() -> TRAPPING -> rmb() -> task_stopped_code()" sequence
>
> And so far I think this would be better, because it seems we can avoid
> the retry logic.

Well, the above memory barrier dance wouldn't really change whether
retry logic is required or not and I'd _really_ like to avoid complex
barrier dances. Even the typical write-B wmb() write-A / read-A rmb()
read-B barriers often confuse people. I don't wanna throw in stacked
wmb()/rmb() pairs there even if that means an extra locking for ptrace
waits.

> First of all, this patch returns one of the user-visible and undesirable
> changes. The tracer know that the task is stopped, attaches, and then it
> can see the TASK_RUNNING tracee after ptrace(PTRACE_ATTACH) returns.

Yes, it does. Sorry about forgetting to mention it in the patch
description. I believe this is something we can swallow.

> I agree, this looks minor. But if we can tolerate this, probably we can
> tolerate another oddity: wait_task_stopped() can succeed and eat the
> stop code before the tracee actually stopps, no?
>
> IOW, ignoring mb's and read-ordering, suppose that we simply change
> task_stopped_code:
>
> if (ptrace) {
> - if (task_is_stopped_or_traced(p))
> + if (task_is_traced(p) || JOBCTL_TRAPPING)
> return &p->exit_code;
> } else {
>
> As for ptrace_check_attach(), it can simply do wait_event(), we
> only need to verify the caller is the tracer. No need to play with
> lock/unlock/retry.
>
> What do you think?

Hmmm... interesting. Yeah, the state is visible only through wait(2)
and ptrace(2) and for wait(2) TRAPPING is as good as STOPPED/TRACED
and we can wait all we want in ptrace_check_attach(). I'll think more
about it but seems like a nice idea.

Thank you.

--
tejun

2011-05-11 20:00:21

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer

On 05/08, Tejun Heo wrote:
>
> +static void ptrace_trap_notify(struct task_struct *t)
> +{
> + WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
> + assert_spin_locked(&t->sighand->siglock);
> +
> + /*
> + * @t is being ptraced and new SEIZE behavior is in effect.
> + * Schedule sticky trap which will clear on the next GETSIGINFO.
> + */
> + t->jobctl |= JOBCTL_TRAP_NOTIFY;
> +
> + /*
> + * If @t is currently trapped for group stop or INTERRUPT
> + * (JOBCTL_TRAPPED set), it should re-trap with new exit_code
> + * indicating continuation so that the ptracer can notice the
> + * event; otherwise, use normal signal delivery wake up.
> + *
> + * The re-trapping sets JOBCTL_TRAPPING such that the transition is
> + * hidden from the ptracer.
> + *
> + * This means that if @t is trapped for other reasons than group
> + * stop or INTERRUPT, the notification trap won't be delievered
> + * until the current one is complete. This is the intended
> + * behavior.
> + */
> + if (task_is_traced(t) && (t->jobctl & JOBCTL_TRAPPED)) {
> + t->jobctl |= JOBCTL_TRAPPING;
> + signal_wake_up(t, true);

and its tracer can be inside sys_ptrace().

No, I don't think this can be right. Otherwise, why ptrace_check_attach()
calls wait_task_inactive() ? The tracee can be scheduled but only if killed,
in this case we don't care.

I know very little about this low-level (and worse, arch dependant) magic,
but even on x86 this doesn't look safe. Suppose that the tracer changes the
fpu state of the tracer and __switch_to(next_p == tracee) does
__math_state_restore() before the tracer finishes.

Or PTRACE_GETSIGINFO can fail while it shouldn't, but this is minor.


Hmm. I think we need a bit more discussion, even ignoring the implementation
details.

Oleg.

2011-05-11 20:18:54

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer

Hey, Oleg.

On Wed, May 11, 2011 at 09:58:24PM +0200, Oleg Nesterov wrote:
> > + /*
> > + * If @t is currently trapped for group stop or INTERRUPT
> > + * (JOBCTL_TRAPPED set), it should re-trap with new exit_code
> > + * indicating continuation so that the ptracer can notice the
> > + * event; otherwise, use normal signal delivery wake up.
> > + *
> > + * The re-trapping sets JOBCTL_TRAPPING such that the transition is
> > + * hidden from the ptracer.
> > + *
> > + * This means that if @t is trapped for other reasons than group
> > + * stop or INTERRUPT, the notification trap won't be delievered
> > + * until the current one is complete. This is the intended
> > + * behavior.
> > + */
> > + if (task_is_traced(t) && (t->jobctl & JOBCTL_TRAPPED)) {
> > + t->jobctl |= JOBCTL_TRAPPING;
> > + signal_wake_up(t, true);
>
> and its tracer can be inside sys_ptrace().
>
> No, I don't think this can be right. Otherwise, why ptrace_check_attach()
> calls wait_task_inactive() ? The tracee can be scheduled but only if killed,
> in this case we don't care.
>
> I know very little about this low-level (and worse, arch dependant) magic,
> but even on x86 this doesn't look safe. Suppose that the tracer changes the
> fpu state of the tracer and __switch_to(next_p == tracee) does
> __math_state_restore() before the tracer finishes.

Yeah, yeah, it's dangerous to let ptrace operations while the target
task is still on CPU. It shouldn't happen.

> Or PTRACE_GETSIGINFO can fail while it shouldn't, but this is minor.
>
> Hmm. I think we need a bit more discussion, even ignoring the implementation
> details.

Hmmm... this whole re-trapping thing is turning out to be more
problematic than expected. Maybe it would be better to somehow notify
ptracer directly from prepare_signal().

Thanks.

--
tejun

2011-05-11 20:21:28

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer

On Wed, May 11, 2011 at 10:18:48PM +0200, Tejun Heo wrote:
> Hmmm... this whole re-trapping thing is turning out to be more
> problematic than expected. Maybe it would be better to somehow notify
> ptracer directly from prepare_signal().

I need to think more about this but basically we just need to modify
exit_code, last_siginfo and then wake up the parent, all of which can
be done safely while holding child's siglock. I'll play with it.

Thanks.

--
tejun

2011-05-12 10:23:42

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 10/11] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()

Hello, Oleg.

On Wed, May 11, 2011 at 09:53:33PM +0200, Tejun Heo wrote:
> > As for ptrace_check_attach(), it can simply do wait_event(), we
> > only need to verify the caller is the tracer. No need to play with
> > lock/unlock/retry.
> >
> > What do you think?
>
> Hmmm... interesting. Yeah, the state is visible only through wait(2)
> and ptrace(2) and for wait(2) TRAPPING is as good as STOPPED/TRACED
> and we can wait all we want in ptrace_check_attach(). I'll think more
> about it but seems like a nice idea.

Eh, problem. Please consider the following scenario.

* A task is in TASK_STOPPED. current->exit_code contains zero.

* Tracer seizes the task which triggers TRAPPING.

* Tracer wait(2)s which sees TRAPPING but tracee->exit_code is still
zero.

I think retrying and ensuring that tracee is in the expected state
after going through the regular trapping procedure is better way to
handle this. Bypassing wait(2) seems smart and simple but it entails
much more subtleties than simple stupid waiting/retrying.

Thanks.

--
tejun

2011-05-12 10:24:54

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer

On Wed, May 11, 2011 at 10:21:22PM +0200, Tejun Heo wrote:
> On Wed, May 11, 2011 at 10:18:48PM +0200, Tejun Heo wrote:
> > Hmmm... this whole re-trapping thing is turning out to be more
> > problematic than expected. Maybe it would be better to somehow notify
> > ptracer directly from prepare_signal().
>
> I need to think more about this but basically we just need to modify
> exit_code, last_siginfo and then wake up the parent, all of which can
> be done safely while holding child's siglock. I'll play with it.

This too involves much more subtleties than making tracee actually
retrap. I'll just make retrapping safe.

Thanks.

--
tejun

2011-05-12 15:41:44

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 08/11] ptrace: move fallback JOBCTL_TRAPPING clearing to get_signal_to_deliver()

On 05/11, Tejun Heo wrote:
>
> On Wed, May 11, 2011 at 05:48:34PM +0200, Oleg Nesterov wrote:
> > On 05/08, Tejun Heo wrote:
> > > Move the fallback clearing to the end of get_signal_to_deliver() so
> > > that TRAPPING is maintained while tracee is inside signal delivery
> > > path. When killed, tracee is guaranteed to leave signal delivery path
> > > in finite amount of time and thus TRAPPING is still guaranteed to be
> > > cleared on kill.
> >
> > Mostly yes, but we can race with freeze_processes() and deadlock.
>
> Ah... the try_to_freeze(). Nice spotting. It isn't necessarily a
> deadlock tho.

Yes, it is not "fatal" for the system.

> I
> think the right thing to do here is making the TRAPPING sleep an
> interruptible one and let the syscall restart logic deal with it.

Oh, yes, we can. But this adds more complications again. We can fix
this particular problem simpler. Say, freezer_do_not_count(). There
are other options afaics.

> * Make TRAPPING wait INTERRUPTIBLE.

I am starting to strongly dislike TRAPPING outside of do_signal_stop().

> * Move clearing of pending group stop and traps to the actions which
> require such clearing.

Not sure this is enough. OTOH, perhaps this is unnneeded.

I'll write another email to explain what I mean.

Oleg.

2011-05-12 15:44:30

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 09/11] job control: reorganize wait_task_stopped()

On 05/11, Tejun Heo wrote:
>
> > Can't we push this patch ahead of these changes? I can merge it into
> > ptrace branch.
>
> It doesn't really fix the problem tho. The whole thing is full of
> holes

Hmm. Could you explain? (unless you mean ptrace holes)

> Anyways, yeah, sure. I'll resend it as a separate patch.

Thanks, I'll aply the patch you sent today.

Oleg.

2011-05-12 16:00:34

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 10/11] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()

On 05/11, Tejun Heo wrote:
>
> On Wed, May 11, 2011 at 06:49:47PM +0200, Oleg Nesterov wrote:
> > On 05/08, Tejun Heo wrote:
> > > this patch moves TRAPPING wait from attach completion path to
> > > operations which are actually affected by the transition - wait(2) and
> > > following ptrace(2) requests.
> >
> > You know, I'd wish I could find the serious bugs in this patch. The
> > code becomes really hairy. -EAGAIN in do_wait() doesn't make it more
> > simple ;)
>
> I don't know. Why is retrying hairy? The whole waiting logic is
> built for clean retries. The suggested change just does it without
> intervening sleeping and waking up. I don't see anything particularly
> hairy there.

As always, this is subjective. But I didn't mean -EAGAIN itself. In fact
I was going to add this (simple) logic some time ago and kill the EXIT_DEAD
state. Hmm, and I'd still like to do this...

I meant the whole ptrace_wait_trapping() + lock dance + retry thing.
But of course I do not pretend my feeling is right.

Also. _Perhaps_ we can rethink the SIGCONT trapping, and perhaps in
this case do_wait() won't need any changes. May be.

> > > Both wait and ptrace paths are updated to retry the operation after
> > > TRAPPING wait. Note that wait_task_stopped() now always grabs siglock
> > > for ptrace waits. This can be avoided with "task_stopped_code() ->
> > > rmb() -> TRAPPING -> rmb() -> task_stopped_code()" sequence
> >
> > And so far I think this would be better, because it seems we can avoid
> > the retry logic.
>
> Well, the above memory barrier dance wouldn't really change whether
> retry logic is required or not and I'd _really_ like to avoid complex
> barrier dances.

Agreed, the barriers always complicate the understanding.

Oleg.

2011-05-12 16:03:00

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 09/11] job control: reorganize wait_task_stopped()

Hello,

On Thu, May 12, 2011 at 05:42:47PM +0200, Oleg Nesterov wrote:
> On 05/11, Tejun Heo wrote:
> >
> > > Can't we push this patch ahead of these changes? I can merge it into
> > > ptrace branch.
> >
> > It doesn't really fix the problem tho. The whole thing is full of
> > holes
>
> Hmm. Could you explain? (unless you mean ptrace holes)

I meant other cases, RUNNING -> STOPPED and EXIT_* transitions.
Sleeping wait(2) is reliable without grabbing siglock thanks to
setting TASK_INTERRUPTIBLE on start and events waking up the waiter
after updating the state, so wait(2) is guaranteed to check the states
at least once after change actually has happened.

WNOHANG disables that mechanism. We probably can add another set of
memory barrier hackery to fix that, but that's gonna be ugly.

Thanks.

--
tejun

2011-05-12 16:08:32

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 10/11] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()

On 05/12, Tejun Heo wrote:
>
> Hello, Oleg.
>
> On Wed, May 11, 2011 at 09:53:33PM +0200, Tejun Heo wrote:
> > > As for ptrace_check_attach(), it can simply do wait_event(), we
> > > only need to verify the caller is the tracer. No need to play with
> > > lock/unlock/retry.
> > >
> > > What do you think?
> >
> > Hmmm... interesting. Yeah, the state is visible only through wait(2)
> > and ptrace(2) and for wait(2) TRAPPING is as good as STOPPED/TRACED
> > and we can wait all we want in ptrace_check_attach(). I'll think more
> > about it but seems like a nice idea.
>
> Eh, problem. Please consider the following scenario.
>
> * A task is in TASK_STOPPED. current->exit_code contains zero.
>
> * Tracer seizes the task which triggers TRAPPING.
>
> * Tracer wait(2)s which sees TRAPPING but tracee->exit_code is still
> zero.

At first glance, this is easy to fix. do_signal_stop() can set
->exit_code unconditionally as it did before.

> I think retrying and ensuring that tracee is in the expected state
> after going through the regular trapping procedure is better way to
> handle this. Bypassing wait(2) seems smart and simple but it entails
> much more subtleties than simple stupid waiting/retrying.

Agreed. But this wait_event(!TRAPPING) logic has a lot of subtleties
too.

Oleg.

2011-05-12 16:07:18

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 10/11] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()

Hey,

On Thu, May 12, 2011 at 05:59:10PM +0200, Oleg Nesterov wrote:
> > I don't know. Why is retrying hairy? The whole waiting logic is
> > built for clean retries. The suggested change just does it without
> > intervening sleeping and waking up. I don't see anything particularly
> > hairy there.
>
> As always, this is subjective. But I didn't mean -EAGAIN itself. In fact
> I was going to add this (simple) logic some time ago and kill the EXIT_DEAD
> state. Hmm, and I'd still like to do this...

Oh, I've been updating code and it's now using interruptible sleep +
-ERESTARTNOINTR return. It's simpler and the freezer problem is gone
too.

> I meant the whole ptrace_wait_trapping() + lock dance + retry thing.
> But of course I do not pretend my feeling is right.

Yeah, it's more complex than I would have liked. The problem is that
ptrace(2) isn't really equipped with facilities to handle async
events, so it's a bit painful to add async mechanism into the existing
mechanics, but most problems seem solvable.

> Also. _Perhaps_ we can rethink the SIGCONT trapping, and perhaps in
> this case do_wait() won't need any changes. May be.

But, if there's a better way, sure.

Thanks.

--
tejun

2011-05-12 16:49:38

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 06/11] ptrace: make group stop state visible via PTRACE_GETSIGINFO

On 05/11, Tejun Heo wrote:
>
> On Tue, May 10, 2011 at 06:55:45PM +0200, Oleg Nesterov wrote:
> > IOW, if the tracee reports via ptrace_notify*, the tracee can look at
> > si_pt_flags == stop-in-effect. If the tracer reports a signal, the
> > tracer obviously lacks this info, hmm.
>
> Which indicates tracee is in group stop trap.

What do you mean?

si_pt_flags doesn't "exist" when the tracee reports the signal or
CLD_STOPPED. This doesn't look clean.

> > Probably I need more time to get used to this... But at first glance
> > this looks a bit unnatural. Say, can't we simply implement
> > PTRACE_GET_GROUP_STOP_STATUS request which returns this (and probably
> > more) info?
>
> I don't know. PTRACE_GETSIGINFO seemed to already fit the bill and I
> want to avoid introducing a new request if at all possible. It sure
> is a bit quirky but doesn't compromisea functionality.

I am not sure too, but the new request is much simpler to use, and it
is more extensible. We can report more info. Say, the state of
JOBCTL_STOP_CONSUME or something else.

> > > if (!ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si)) {
> > > if (si.si_code) {
> > > stopped = !!si.si_status;
> >
> > In this case this "si_code != 0" check is correct, but how can the
> > tracer detect this case in general?
>
> This was quick hack. Proper test would look like,
>
> si.si_code && (si.si_pt_flags & PTRACE_SI_STOPPED)

This doesn't look right too? How can we know we can trust si_pt_flags?
This needs some YES_YOU_CAN_CHECK_si_pt_flags(si_code), but I can't
understand what it should do right now...

> > > + /* report whether group stop is in effect w/ SI_STOPPED */
> > > + if (sig->group_stop_count || (sig->flags & SIGNAL_STOP_STOPPED))
> >
> > We have more and more "group_stop_count || SIGNAL_STOP_STOPPED" checks,
> > perhaps we should make a helper. Or at least invent the short name to
> > denote the group-stopped-or-in-progress to simplify the discussions ;)
>
> Yeah, how about group_stop_in_effect()?

Or may me signal_stop_stopped(struct signal_struct *sig), like
signal_group_exit/SIGNAL_GROUP_EXIT. But I am fine with
group_stop_in_effect, probably it is more explanatorily.

> > Still, this is strange. With this change ptrace_getsiginfo() reports
> > the extra "volatile" info which wasn't reported by the tracee itself.
> > If the tracer does PTRACE_SETSIGINFO twice in a row, it can see the
> > different si_pt_flags's.
>
> (answering to both get/setsiginfo concerns)
>
> * I think we better block PTRACE_SETSIGINFO for non signal delivery
> traps. It doesn't make any sense. Let's just fail that with
> -EINVAL if PT_SEIZED.

Oh I agree, it does not make any sense. Should we change the current
behaviour for PT_SEIZED? I don't really care, this looks minor.

> * I don't think PTRACE_GETSIGINFO returning volatile information to be
> problematic. The information is generated on the fly on trap
> anyway.

Yes. And I'd understand if si_pt_flags was filled by the tracee
during the trap (although I do not think this makes sense) to record
the state at the time of this trap.

But PTRACE_GETSIGINFO returns the dynamic info which reflects the
process-wide state at the time of syscall.

Oleg.

2011-05-12 17:08:00

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

On 05/11, Tejun Heo wrote:
>
> On Tue, May 10, 2011 at 08:08:11PM +0200, Oleg Nesterov wrote:
> >
> > Hmm. Suddenly I got lost. Perhaps instead JOBCTL_TRAP_INTERRUPT should
> > be cleared on any trap too, like SEIZE.
>
> I don't think that's a good idea especially because there are
> functionality differences among different traps. ie. group stop and
> interrupt traps support re-trapping on job control events while other
> traps don't, so there will be cases where the debugger wants to put
> tracee specifically into INTERRUPT trap. It's just cleaner to use and
> say that if you ask for INTERRUPT, you get an INTERRUPT.

Hmm. This is not clear to me... OK, I'll read other emails first.

> > Another special (and nasty!) case is PTRACE_TRACEME. I do not know
> > how often it is used, but probabaly it is important enough. At least,
> > iirc, it is used by strace. Probably we need PTRACE_SEIZEME as well.
>
> I don't agree. PTRACE_TRACEME predates PTRACE_ATTACH and is
> completely redundant. If you can make the child do PTRACE_TRACEME,
> you might as well just make it do pause() and PTRACE_SEIZE yourself,
> so unless there's something PTRACE_SEIZE can't do, I don't think I'll
> be adding SEIZEME.

Heh. I think that you are very right technically and I thought the
same. That is why I never mentioned PTRACE_TRACEME before. In fact
I never understood why PTRACE_TRACEME exists.

However. Perhaps this is wrong from the practical pov. SEIZEME can
simplify the conversion of the existing code. People are lazy, but
we need the users of PTRACE_SEIZE.

Anyway. SEIZEME is absolutely trivial. We can add it at any moment,
right now this is almost offtopic.

Oleg.

2011-05-12 17:16:00

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 06/11] ptrace: make group stop state visible via PTRACE_GETSIGINFO

Hello,

On Thu, May 12, 2011 at 06:47:45PM +0200, Oleg Nesterov wrote:
> On 05/11, Tejun Heo wrote:
> >
> > On Tue, May 10, 2011 at 06:55:45PM +0200, Oleg Nesterov wrote:
> > > IOW, if the tracee reports via ptrace_notify*, the tracee can look at
> > > si_pt_flags == stop-in-effect. If the tracer reports a signal, the
> > > tracer obviously lacks this info, hmm.
> >
> > Which indicates tracee is in group stop trap.
>
> What do you mean?

I meant that whether group stop is in effect or not can be determined
reliably in most cases. The only exception is signal delivery trap as
siginfo there isn't from trap but I think we can survive that.

> si_pt_flags doesn't "exist" when the tracee reports the signal or
> CLD_STOPPED. This doesn't look clean.

Sure, it ain't clean. Fully agreed.

> > I don't know. PTRACE_GETSIGINFO seemed to already fit the bill and I
> > want to avoid introducing a new request if at all possible. It sure
> > is a bit quirky but doesn't compromisea functionality.
>
> I am not sure too, but the new request is much simpler to use, and it
> is more extensible. We can report more info. Say, the state of
> JOBCTL_STOP_CONSUME or something else.

Functionality-wise, I don't think GETSIGINFO is worse. We can add
more flags and fields there too. It sure is cumbersome to use because
the information isn't there for signal delivery and group stop traps;
however, both aren't critical. Debugger can simply force INTERRUPT
trap and get it there.

> > This was quick hack. Proper test would look like,
> >
> > si.si_code && (si.si_pt_flags & PTRACE_SI_STOPPED)
>
> This doesn't look right too? How can we know we can trust si_pt_flags?
> This needs some YES_YOU_CAN_CHECK_si_pt_flags(si_code), but I can't
> understand what it should do right now...

Ah, okay, you were asking how to tell si has the SIGTRAP info. IIUC,
si_code is non-zero iff the information is generated from kernel so
testing lower bits against SIGTRAP should be good enough. It can't
happen by user generated signals. The only way to side step that
would be using PTRACE_SETSIGINFO - but that's the debugger shooting
its own foot. I don't feel particularly sympathetic.

> > > We have more and more "group_stop_count || SIGNAL_STOP_STOPPED" checks,
> > > perhaps we should make a helper. Or at least invent the short name to
> > > denote the group-stopped-or-in-progress to simplify the discussions ;)
> >
> > Yeah, how about group_stop_in_effect()?
>
> Or may me signal_stop_stopped(struct signal_struct *sig), like
> signal_group_exit/SIGNAL_GROUP_EXIT. But I am fine with
> group_stop_in_effect, probably it is more explanatorily.

Given that stop might be in progress (not complete yet), I think a
name which clearly notes that is preferable. signal_stpo_stopped()
sounds like it's already stopped.

> > * I think we better block PTRACE_SETSIGINFO for non signal delivery
> > traps. It doesn't make any sense. Let's just fail that with
> > -EINVAL if PT_SEIZED.
>
> Oh I agree, it does not make any sense. Should we change the current
> behaviour for PT_SEIZED? I don't really care, this looks minor.

Hmmm... I don't wanna change PTRACE_ATTACH behavior unless necessary.
The pt_flags part isn't even reported if ATTACHed.

> > * I don't think PTRACE_GETSIGINFO returning volatile information to be
> > problematic. The information is generated on the fly on trap
> > anyway.
>
> Yes. And I'd understand if si_pt_flags was filled by the tracee
> during the trap (although I do not think this makes sense) to record
> the state at the time of this trap.
>
> But PTRACE_GETSIGINFO returns the dynamic info which reflects the
> process-wide state at the time of syscall.

Hmmm... Well, if we're gonna introduce a new request, it's gonna have
to be able to replace PTRACE_GETSIGINFO. I don't wanna ask userland
to make two ptrace requests on each trap to tell what's going on.

Another point is that most programs would have to support pre-SEIZE
behavior as well so I'd like to avoid deviating the interface too much
if possible.

If you combine the two, we basically end up something which carries
both siginfo and something else at one go. Its semantics can
definitely be made prettier, I agree, but I'm not sure whether the
benefits would be enough to justify a new request. Maybe it would be.

If you can propose something sane, it would be awesome.

Thanks.

--
tejun

2011-05-12 17:21:39

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT

Hello, again.

On Thu, May 12, 2011 at 07:06:17PM +0200, Oleg Nesterov wrote:
> > I don't agree. PTRACE_TRACEME predates PTRACE_ATTACH and is
> > completely redundant. If you can make the child do PTRACE_TRACEME,
> > you might as well just make it do pause() and PTRACE_SEIZE yourself,
> > so unless there's something PTRACE_SEIZE can't do, I don't think I'll
> > be adding SEIZEME.
>
> Heh. I think that you are very right technically and I thought the
> same. That is why I never mentioned PTRACE_TRACEME before. In fact
> I never understood why PTRACE_TRACEME exists.
>
> However. Perhaps this is wrong from the practical pov. SEIZEME can
> simplify the conversion of the existing code. People are lazy, but
> we need the users of PTRACE_SEIZE.

Hmmm... given that the attaching part is somewhat isolated from the
rest of code. I'm hoping to get away with providing a sample code in
documentation. :-)

Thanks.

--
tejun

2011-05-12 17:26:50

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 09/11] job control: reorganize wait_task_stopped()

On 05/12, Tejun Heo wrote:
>
> Hello,
>
> On Thu, May 12, 2011 at 05:42:47PM +0200, Oleg Nesterov wrote:
> > On 05/11, Tejun Heo wrote:
> > >
> > > > Can't we push this patch ahead of these changes? I can merge it into
> > > > ptrace branch.
> > >
> > > It doesn't really fix the problem tho. The whole thing is full of
> > > holes
> >
> > Hmm. Could you explain? (unless you mean ptrace holes)
>
> I meant other cases, RUNNING -> STOPPED and EXIT_* transitions.
> Sleeping wait(2) is reliable without grabbing siglock thanks to
> setting TASK_INTERRUPTIBLE on start and events waking up the waiter
> after updating the state, so wait(2) is guaranteed to check the states
> at least once after change actually has happened.
>
> WNOHANG disables that mechanism.

Yes, this is clear. WNOHANG can "race" with the transitions above.
But we do not care, this is like reading the word which can be
changed by another thread, no?

But this bug is different. Say, the parent does wait(WNOWAIT) and
gets CLD_STOPPED. After that it has all rights to assume that
wait(WNOHANG) must report either STOPPED or CONTINUED.

Oleg.

2011-05-12 17:32:34

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 09/11] job control: reorganize wait_task_stopped()

Hello,

On Thu, May 12, 2011 at 07:25:06PM +0200, Oleg Nesterov wrote:
> > WNOHANG disables that mechanism.
>
> Yes, this is clear. WNOHANG can "race" with the transitions above.
> But we do not care, this is like reading the word which can be
> changed by another thread, no?
>
> But this bug is different. Say, the parent does wait(WNOWAIT) and
> gets CLD_STOPPED. After that it has all rights to assume that
> wait(WNOHANG) must report either STOPPED or CONTINUED.

They aren't that different. Please consider the following program.

#define PTRACE_SEIZE 0x4206
#define PTRACE_INTERRUPT 0x4207

#define PTRACE_SEIZE_DEVEL 0x80000000

static const struct timespec ts1ms = { .tv_nsec = 1000000 };

int main(int argc, char **argv)
{
pid_t child, control;

child = fork();
if (!child)
while (1)
pause();

kill(child, SIGSTOP);
waitid(P_PID, child, NULL, WSTOPPED | WNOWAIT);

control = fork();
if (!control) {
while (1) {
kill(child, SIGCONT);
nanosleep(&ts1ms, NULL);
kill(child, SIGSTOP);
nanosleep(&ts1ms, NULL);
}
}

while (1) {
siginfo_t si = {};

waitid(P_PID, child, &si,
WSTOPPED | WCONTINUED | WNOWAIT | WNOHANG);
if (!si.si_pid)
break;
}

kill(control, SIGKILL);
kill(child, SIGKILL);
return 0;
}

waitid(2) should always succeed as it's never consuming wait state,
but it does, with or without the patch. All transitions need to be
made water tight to remove the bug.

Thanks.

--
tejun

2011-05-12 17:33:27

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 09/11] job control: reorganize wait_task_stopped()

On Thu, May 12, 2011 at 07:32:28PM +0200, Tejun Heo wrote:
> waitid(2) should always succeed as it's never consuming wait state,
> but it does, with or without the patch. All transitions need to be

I meant "but it does fail (0 in si.si_pid)".

> made water tight to remove the bug.

--
tejun

2011-05-12 18:21:22

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 10/11] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()

On 05/12, Tejun Heo wrote:
>
> On Thu, May 12, 2011 at 05:59:10PM +0200, Oleg Nesterov wrote:
>
> > Also. _Perhaps_ we can rethink the SIGCONT trapping, and perhaps in
> > this case do_wait() won't need any changes. May be.
>
> But, if there's a better way, sure.

Unfortunately, I have no any particular ideas right now, and I doubt
I can invent something clean. But I'd like at least to try to think
a bit.


Now about GROUP_STOP_TRAPPING. To simplify the discussion, lets
forget about this series, please recall the previous
"ptrace: use GROUP_STOP_TRAPPING for PTRACE_DETACH too" change.

http://marc.info/?l=linux-kernel&m=130486589601593

In short, it changes __ptrace_unlink() to set _TRAPPING if needed,
and ptrace_attach() waits for !TRAPPING unconditionally.

Problem: TRAPPING can be set outside of do_signal_stop() paths, and
I think we should avoid this as much as possible.

(I am ignoring the problem this patch addresses temporary, I think
we can fix it a bit differently).

As we already discussed, this patch is not right, we have the problems
with KILL/CONT. The proposed solution is to clear TRAPPING on kill,
but I think this is not enough.

One particular example. Note that de_thread() waits ->notify_count == 0
in TASK_UNINTERRUPTIBLE. Btw, this is not good, we need TASK_KILLABLE,
but this doesn't matter in this discussion. The only imporant thing is
that it is practically impossible to make this path restartable.

Note also we have PT_TRACE_EXIT, the sub-thread stops even if killed by
the execing thread. (to clarify, this depends on /dev/random and should
be fixed, and in fact it is debatable whether it should stop, please
ignore).

Now suppose we are tracing 2 threads, T1 execs and kills T2, T2 reports
PTRACE_EVENT_EXIT. Now, if the tracer waits for !(T1 & TRAPPING), it will
wait forever.


Say, the thread group was stopped, the tracer PTRACE_CONT's T1, it calls
sys_execve() and reports the trap from syscall_trace_enter().

The tracer does ptrace(T1, DETACH) + ptrace(T1, SEIZE) and hangs forever.


Once again, this is only one example. coredump, vfork, probably something
else. In short: I think that TRAPPING-outside-of-do_signal_stop is the
can of worms.

Oleg.

2011-05-12 18:34:52

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 09/11] job control: reorganize wait_task_stopped()

On 05/12, Tejun Heo wrote:
>
> int main(int argc, char **argv)
> {
> pid_t child, control;
>
> child = fork();
> if (!child)
> while (1)
> pause();
>
> kill(child, SIGSTOP);
> waitid(P_PID, child, NULL, WSTOPPED | WNOWAIT);
>
> control = fork();
> if (!control) {
> while (1) {
> kill(child, SIGCONT);
> nanosleep(&ts1ms, NULL);
> kill(child, SIGSTOP);
> nanosleep(&ts1ms, NULL);

Damn, you are right, I think.

At first glance, do_wait() does

wait_task_stopped();
wait_task_continued();

and the state can be changed CONTINIUED -> STOPPED in between, right?
Or something else?

Thanks, for the explanation.

Oleg.

2011-05-13 08:46:18

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 09/11] job control: reorganize wait_task_stopped()

Hello,

On Thu, May 12, 2011 at 08:33:26PM +0200, Oleg Nesterov wrote:
> Damn, you are right, I think.

Yay, for once!

> At first glance, do_wait() does
>
> wait_task_stopped();
> wait_task_continued();
>
> and the state can be changed CONTINIUED -> STOPPED in between, right?
> Or something else?

Yeah and exit transitions too. There simply is no synchronization
there. We can probably solve it without acquiring siglock by adding
"clear this before making state transitions" flag followed by a mb()
and then making children clear it before transitions. Then, we can
retry WNOHANG wait if the flag is cleared after checking.

Thanks.

--
tejun

2011-05-13 09:13:18

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 10/11] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()

Hello,

On Thu, May 12, 2011 at 08:20:06PM +0200, Oleg Nesterov wrote:
> Now about GROUP_STOP_TRAPPING. To simplify the discussion, lets
> forget about this series, please recall the previous
> "ptrace: use GROUP_STOP_TRAPPING for PTRACE_DETACH too" change.
>
> http://marc.info/?l=linux-kernel&m=130486589601593
>
> In short, it changes __ptrace_unlink() to set _TRAPPING if needed,
> and ptrace_attach() waits for !TRAPPING unconditionally.
>
> Problem: TRAPPING can be set outside of do_signal_stop() paths, and
> I think we should avoid this as much as possible.

For PTRACE_DETACH, this was true but it isn't true for notification
re-traps. Notification re-traps happen iff JOBCTL_TRAPPED (now
renamed to JOBCTL_ALLOW_NOTIFY) is set and tracee sets it only while
it's inside signal delivery path - more specifically, when
ptrace_stop() is called from do_signal_stop() or for INTERRUPT trap,
both of which are always inside get_signal_to_deliver().

> (I am ignoring the problem this patch addresses temporary, I think
> we can fix it a bit differently).

Just leaving it alone also is an option. We can just mandate
ptrace(2) users to always perform sleeping wait(2) after PTRACE_SEIZE.
If it can be fixed easily, sure. If not, let's leave it alone.

> Say, the thread group was stopped, the tracer PTRACE_CONT's T1, it calls
> sys_execve() and reports the trap from syscall_trace_enter().
>
> The tracer does ptrace(T1, DETACH) + ptrace(T1, SEIZE) and hangs forever.
>
> Once again, this is only one example. coredump, vfork, probably something
> else. In short: I think that TRAPPING-outside-of-do_signal_stop is the
> can of worms.

Yeap, fully agreed. We need something different for detach case;
however, for re-trap, tracee is known to be in signal delivery path
and should be okay, I think. Right?

Thanks.

--
tejun

2011-05-13 17:23:12

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 09/11] job control: reorganize wait_task_stopped()

Hi,

On 05/13, Tejun Heo wrote:
>
> On Thu, May 12, 2011 at 08:33:26PM +0200, Oleg Nesterov wrote:
>
> > At first glance, do_wait() does
> >
> > wait_task_stopped();
> > wait_task_continued();
> >
> > and the state can be changed CONTINIUED -> STOPPED in between, right?
> > Or something else?
>
> Yeah and exit transitions too.

I am not sure... but probably this depends on definition.

We already checked ->exit_state != ZOMBIE, and we are holding tasklist.
The child can't exit. I mean, it can't change its ->exit_state.

However, SIGKILL can clear SIGNAL_STOP_STOPPED, and we can "miss" it.
But this looks correct, the child is no longer stopped but it is still
not dead. So I think in this case wait(WNOHANG | WEXITED | WSTOPPED)
can fail, notabug.

OTOH, perhaps SIGKILL should set SIGNAL_STOP_CONTINUED in this case?
And keep it if it was already set.

> There simply is no synchronization
> there. We can probably solve it without acquiring siglock by adding
> "clear this before making state transitions" flag followed by a mb()

perhaps even simpler if ->EXIT transition is fine.

Oleg.

2011-05-13 18:35:44

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 10/11] ptrace: move JOBCTL_TRAPPING wait to wait(2) and ptrace_check_attach()

On 05/13, Tejun Heo wrote:
>
> On Thu, May 12, 2011 at 08:20:06PM +0200, Oleg Nesterov wrote:
> > Now about GROUP_STOP_TRAPPING. To simplify the discussion, lets
> > forget about this series,
> >
> > ...
> >
> > Problem: TRAPPING can be set outside of do_signal_stop() paths, and
> > I think we should avoid this as much as possible.
>
> For PTRACE_DETACH, this was true but it isn't true for notification
> re-traps. Notification re-traps happen iff JOBCTL_TRAPPED

Yes I see, and task_clear_jobctl_trapping() was moved into
get_signal_to_deliver().

So, if we return to this series then "outside of do_signal_stop()" is
not accurate.

> > (I am ignoring the problem this patch addresses temporary, I think
> > we can fix it a bit differently).
>
> Just leaving it alone also is an option. We can just mandate
> ptrace(2) users to always perform sleeping wait(2) after PTRACE_SEIZE.
> If it can be fixed easily, sure. If not, let's leave it alone.

Let's leave it alone in this thread, unless I missed something this is
simple. Or I missed something and it is not that simple ;)

> > Say, the thread group was stopped, the tracer PTRACE_CONT's T1, it calls
> > sys_execve() and reports the trap from syscall_trace_enter().
> >
> > The tracer does ptrace(T1, DETACH) + ptrace(T1, SEIZE) and hangs forever.
> >
> > Once again, this is only one example. coredump, vfork, probably something
> > else. In short: I think that TRAPPING-outside-of-do_signal_stop is the
> > can of worms.
>
> Yeap, fully agreed. We need something different for detach case;
> however, for re-trap, tracee is known to be in signal delivery path
> and should be okay, I think. Right?

I think this is right, but there were so many issues, that is why I
suggested to ignore this series temporary. And, to be honest, because
I was a bit confused, somehow I thought you are going to add more
TRAPPING's.

So, we agree that we can only set TRAPPING iff we know what the tracee
should do "really soon". Great.

I have other concerns, but I noticed you have already sent V2. Let's
continue there.

Oleg.

2011-05-14 10:58:49

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 09/11] job control: reorganize wait_task_stopped()

Hello, Oleg.

On Fri, May 13, 2011 at 07:21:17PM +0200, Oleg Nesterov wrote:
> We already checked ->exit_state != ZOMBIE, and we are holding tasklist.
> The child can't exit. I mean, it can't change its ->exit_state.
>
> However, SIGKILL can clear SIGNAL_STOP_STOPPED, and we can "miss" it.
> But this looks correct, the child is no longer stopped but it is still
> not dead. So I think in this case wait(WNOHANG | WEXITED | WSTOPPED)
> can fail, notabug.

Hmmm... I don't know.

> OTOH, perhaps SIGKILL should set SIGNAL_STOP_CONTINUED in this case?
> And keep it if it was already set.

I'd rather avoid that. First of all, it's an extreme corner case and
I think introducing an extra state transition there is more likely to
cause trouble than helping anything. It might be theoretically
correct but there's no reason to introduce that at this point. If at
all possible, I think it would be better to make it either see STOPPED
or EXIT.

Thanks.

--
tejun

2011-05-15 14:04:42

by Jan Kratochvil

[permalink] [raw]
Subject: ptrace-testsuite status [Re: [PATCHSET ptrace] ptrace: implement PTRACE_SEIZE/INTERRUPT and group stop notification]

On Mon, 09 May 2011 00:27:41 +0200, Denys Vlasenko wrote:
> I don't know the status of ptrace test suite
> after I stopped working on strace (did the suite bit rot, or is it
> maintained and still relevant?). If it is still in use,
> I can adapt these tests and add them to it.

That is:
http://sourceware.org/systemtap/wiki/utrace/tests
cvs -d :pserver:anoncvs:[email protected]:/cvs/systemtap co ptrace-tests

The testsuite is actively in use but it has always been a regression testsuite
only and there are no known new regressions. So there is no need for
testsuite updates now.

With new ptrace features sure their testcases would be nice to include there.
Despite there has never been an attempt to cover existing ptrace functionality
by the testcases.


Thanks,
Jan

2011-05-15 14:05:20

by Jan Kratochvil

[permalink] [raw]
Subject: getter PTRACE_GETSIGINFO should not modify anything [Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer]

On Sun, 08 May 2011 17:49:05 +0200, Tejun Heo wrote:
> which triggers INTERRUPT trap but is sticky until the next
> PTRACE_GETSIGINFO.

PTRACE_GETSIGINFO is a getter. It must not modify anything. Or there should
be some other way how to query the siginfo_t state fully transparently (*).

(*) But if there exists such syscall it probably does not make sense to modify
anything by PTRACE_GETSIGINFO.

Imagine various LD_PRELOAD tools which try to wrap system/library calls and
operate with ptrace while keeping it transparent for the original debugger.
(I have a bunch of such libraries for testing gdb/strace/etc. written there.)

Also complicated debuggers with internal OO hierarchy would need to just wrap
PTRACE_GETSIGINFO into an internal function to make it transparent for calls
not intending to modify the debuggee state.


Thanks,
Jan

2011-05-15 14:28:34

by Tejun Heo

[permalink] [raw]
Subject: Re: getter PTRACE_GETSIGINFO should not modify anything [Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer]

Hello, Jan.

On Sun, May 15, 2011 at 04:02:32PM +0200, Jan Kratochvil wrote:
> Or there should be some other way how to query the siginfo_t state
> fully transparently (*).
>
> (*) But if there exists such syscall it probably does not make sense to modify
> anything by PTRACE_GETSIGINFO.

If there's a syscall which doesn't affect notification state, then we
need something which does clear it. Either way, I suppose you're
saying we need both something which clears the notification and
something which doesn't.

> Imagine various LD_PRELOAD tools which try to wrap system/library calls and
> operate with ptrace while keeping it transparent for the original debugger.
> (I have a bunch of such libraries for testing gdb/strace/etc. written there.)
>
> Also complicated debuggers with internal OO hierarchy would need to just wrap
> PTRACE_GETSIGINFO into an internal function to make it transparent for calls
> not intending to modify the debuggee state.

We can add a flag or new request for that but I don't know. Those are
pretty fringe use cases and they don't even strictly require such
feature - Even for LD_PRELOAD, it can simply keep scheduling INTERRUPT
until the application calls the wrapped GETSIGINFO when it detects the
stopped state has changed. It can be easily done from userland.

Thank you.

--
tejun

2011-05-15 14:40:40

by Jan Kratochvil

[permalink] [raw]
Subject: waitpid(WNOHANG) should report SIGCHLD-notified signals [Re: [PATCH 09/11] job control: reorganize wait_task_stopped()]

On Thu, 12 May 2011 19:32:28 +0200, Tejun Heo wrote:
[...]
> while (1) {
> siginfo_t si = {};
>
> waitid(P_PID, child, &si,
> WSTOPPED | WCONTINUED | WNOWAIT | WNOHANG);
> if (!si.si_pid)
> break;
> }
>
> kill(control, SIGKILL);
> kill(child, SIGKILL);
> return 0;
> }
>
> waitid(2) should always succeed as it's never consuming wait state,
> but it does, with or without the patch. All transitions need to be
> made water tight to remove the bug.

It may be not related, I do not inderstand the kernel internals being
discussed.

I was told by Roland McGrath that after SIGCHLD of a ptrace even from tracee
invokes sighandler for that SIGCHLD in the tracer then waitpid(WNOHANG) still
may return 0 as the signal is not yet ready. I was not able to reproduce it
with a testcase myself.

But if it is a case it should be fixed as there is no later notification when
to call waitpid(WNOHANG) again. And sure the debugger cannot busy-loop poll
it.


Thanks,
Jan

2011-05-15 15:56:31

by Jan Kratochvil

[permalink] [raw]
Subject: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

On Sun, 08 May 2011 17:48:56 +0200, Tejun Heo wrote:
> The usage is the same with PTRACE_ATTACH but it takes PTRACE_SEIZE_*
> flags in @data.

> After PTRACE_SEIZE, tracee will trap.

PTRACE_SEIZE does not need to stop, there is that new PTRACE_INTERRUPT for it.
This is not an improvement.

It was already addressed by me before so I will give more reasons:
https://lkml.org/lkml/2011/3/1/309

GDB already has mode `set observer on' (in this case we are interested in its
part `set may-interrupt off') - see: $ info '(gdb)Observer Mode'
# If you want to [...] observe program behavior without any chance of
# disruption by GDB
This is an increasingly requested feature as one of the ways of monitoring.

There are also requests to handle applications using 10000+ threads, which
currently have problems with GDB. One can imagine a needless
waitpid+PTRACE_CONT is not a help.

There could be a new PTRACE_SEIZE_INTERRUPT option in @data so that
applications does not have to use two syscalls (PTRACE_SEIZE
+ PTRACE_INTERRUPT) if the applications really want to perform some operations
on the tracee requiring having it stopped after the attachment. (Personally
I do not think this single vs. double syscall difference is worth the new
flag.)


> Which trap will happen isn't fixed. If other trap conditions exist (signal
> delivery or group stop), they might be taken; otherwise, a trap with
> exit_code SIGTRAP | (PTRACE_EVENT_INTERRUPT << 8) is taken.

What if PTRACE_INTERRUPT is called by tracer only after the tracee has stopped
on a signal delivery? It should be ignored in such case - as the first signal
will not be PTRACE_EVENT_INTERRUPT. (Sorry if you have stated it somewhere.)


Thanks,
Jan

2011-05-15 16:10:21

by Jan Kratochvil

[permalink] [raw]
Subject: PTRACE_DETACH without stop [Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT]

On Wed, 11 May 2011 11:19:55 +0200, Tejun Heo wrote:
> On Tue, May 10, 2011 at 11:59:58PM +0200, Denys Vlasenko wrote:
> > Another note: even though PTRACE_INTERRUPT solves the problem that
> > PTRACE_DETACH of a running tracee was butt-ugly thing to do correctly,
> > the "new" way is still a bit ugly: tracer needs PTRACE_INTERRUPT,
> > waitpid, and only then PTRACE_DETACH. Why not go all the way
> > and make PTRACE_DETACH work on running tracee too?
>
> I don't think I'll change that. It's only three syscall sequence -
> INTERRUPT, wait(STOPPED) and DETACH which will always work reliably
> (unless tracee gets killed or something).

I do not think this change is much related to this patchset.

But having to PTRACE_INTERRUPT the tracee before PTRACE_DETACH has no
advantage, it is just a performance (see transparent tracking of 10000+ thread
https://lkml.org/lkml/2011/5/15/115
) problem and also getting it correct. As when one wait()s and gets
WIFSTOPPED one needs to respawn to signal otherwise the signal gets lost on
PTRACE_ATTACH. How to respawn it? By PTRACE_INTERRUPT with DATA==signal?
Or PTRACE_CONT with DATA==signal? With rapid signalling of the tracee the
debugger may never have a chance to correctly quit. Handling other cases
transparently for the original parent also may not be fully clear.

It would be nice to write documentation already while discussing this patch,
I do not know if PTRACE_INTERRUPT respects DATA etc., it may show ptrace is
still tricky.



Thanks,
Jan

2011-05-15 16:26:36

by Tejun Heo

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

Hello, Jan.

On Sun, May 15, 2011 at 05:56:02PM +0200, Jan Kratochvil wrote:
> It was already addressed by me before so I will give more reasons:
> https://lkml.org/lkml/2011/3/1/309

Hmmm... I don't think the argument given in the message is valid. The
new trap doesn't have any side effect (other than possible syscall
restart (or -EINTR failure) and performance overhead) so there
wouldn't be any problem from correctness POV and the code to SEIZE and
establish initial state would be simple.

> GDB already has mode `set observer on' (in this case we are interested in its
> part `set may-interrupt off') - see: $ info '(gdb)Observer Mode'
> # If you want to [...] observe program behavior without any chance of
> # disruption by GDB
> This is an increasingly requested feature as one of the ways of monitoring.

Hmmm... transparency. Other than signal_wakeup (syscall restart /
-EINTR), there is no side effect of SEIZE - from transparency POV,
this should be considered fully transparent. signal_wakeup can happen
anytime even without signals. It's an integral part of the
programming interface. I don't know. Maybe.

> There are also requests to handle applications using 10000+ threads, which
> currently have problems with GDB. One can imagine a needless
> waitpid+PTRACE_CONT is not a help.

Yeap, performance could be a legitimate concern but then again I'm not
sure here either - the whole thing is quite slow and not trapping on
SEIZE might not amount to all that much. Again, maybe.

How long does it take to attach to / detach from 10000+ threads? If
you don't do it serially, it shouldn't take that long.

> > Which trap will happen isn't fixed. If other trap conditions exist (signal
> > delivery or group stop), they might be taken; otherwise, a trap with
> > exit_code SIGTRAP | (PTRACE_EVENT_INTERRUPT << 8) is taken.
>
> What if PTRACE_INTERRUPT is called by tracer only after the tracee has stopped
> on a signal delivery? It should be ignored in such case - as the first signal
> will not be PTRACE_EVENT_INTERRUPT. (Sorry if you have stated it somewhere.)

I think you're somewhat confused here. Well, ptrace manpage is pretty
confused too. Those are different traps. You can tell them apart
from userland and it doesn't matter which order or how many times
INTERRUPT occurs.

Thanks.

--
tejun

2011-05-15 16:35:29

by Tejun Heo

[permalink] [raw]
Subject: Re: PTRACE_DETACH without stop [Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT]

Hey, again.

On Sun, May 15, 2011 at 06:10:00PM +0200, Jan Kratochvil wrote:
> I do not think this change is much related to this patchset.
>
> But having to PTRACE_INTERRUPT the tracee before PTRACE_DETACH has no
> advantage, it is just a performance (see transparent tracking of 10000+ thread
> https://lkml.org/lkml/2011/5/15/115
> ) problem and also getting it correct. As when one wait()s and gets
> WIFSTOPPED one needs to respawn to signal otherwise the signal gets lost on
> PTRACE_ATTACH. How to respawn it? By PTRACE_INTERRUPT with DATA==signal?
> Or PTRACE_CONT with DATA==signal? With rapid signalling of the tracee the
> debugger may never have a chance to correctly quit. Handling other cases
> transparently for the original parent also may not be fully clear.
>
> It would be nice to write documentation already while discussing this patch,
> I do not know if PTRACE_INTERRUPT respects DATA etc., it may show ptrace is
> still tricky.

Hmmm... you seem to be a bit confused about different ptrace traps
(again, a lot of this probably stems from incorrect manpage). They
don't interact with each other that way and there is no correctness
issue (if there is, it's a but and should be fixed) of
INTERRUPT+DETACH compared to DETACH without interrupt. The only
difference is performance, so let's concentrate on that.

The reason why I'm reluctant to drop trapped requirement from both
SEIZE and DETACH is that removing those synchronization points opens
up a lot of corner cases. I'm they can all be dealt with but it's
gonna be measurably more complex and possibly fragile. Given that the
current implementation mandates traps on both ATTACH and DETACH, I'd
like to leave that part alone unless there are pretty strong
arguments, and, sure, performance might as well be strong enough.

But, I'd really like to know how much it actually costs to attach and
detach a lot of tasks before deciding anything.

Thank you.

--
tejun

2011-05-15 16:47:11

by Tejun Heo

[permalink] [raw]
Subject: Re: waitpid(WNOHANG) should report SIGCHLD-notified signals [Re: [PATCH 09/11] job control: reorganize wait_task_stopped()]

Hello, Jan.

On Sun, May 15, 2011 at 04:40:17PM +0200, Jan Kratochvil wrote:
> It may be not related, I do not inderstand the kernel internals being
> discussed.
>
> I was told by Roland McGrath that after SIGCHLD of a ptrace even from tracee
> invokes sighandler for that SIGCHLD in the tracer then waitpid(WNOHANG) still
> may return 0 as the signal is not yet ready. I was not able to reproduce it
> with a testcase myself.

Hmmm... I could easily be wrong but AFAICS that shouldn't happen. If
you can reproduce the problem, please let us know.

> But if it is a case it should be fixed as there is no later notification when
> to call waitpid(WNOHANG) again. And sure the debugger cannot busy-loop poll
> it.

But, the current WNOHANG wait is racy. It's unlikely but definitely
possible for WNOHANG to fail when it's expected to succeed (not the
above case but more convoluted ones).

But, just out of curiosity, is there any reason the ptracer itself
should be doing something other than waitpid() while tracee is
running? It's not like ptrace requests can be issued during that time
and sleeping waitpid() is way saner mechanism to wait for tracee
events than signal.

Thanks.

--
tejun

2011-05-15 17:01:45

by Tejun Heo

[permalink] [raw]
Subject: Re: waitpid(WNOHANG) should report SIGCHLD-notified signals [Re: [PATCH 09/11] job control: reorganize wait_task_stopped()]

Hello,

BTW, I'm pretty close to send the second take and the INTERRUPT trap
has changed quite a bit. It got renamed to PTRACE_EVENT_STOP and now
acts as both group stop trap and INTERRUPT trap. IOW,
PTRACE_INTERRUPT makes tracee go park at group stop trap site. The
whole thing is much prettier this way. Well, being ptrace, it's not
really pretty but at least it is much less disgusting.

Thanks.

--
tejun

2011-05-15 17:15:41

by Jan Kratochvil

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

On Sun, 15 May 2011 18:26:30 +0200, Tejun Heo wrote:
> the code to SEIZE and establish initial state would be simple.

In normal case yes; but one needs to handle all the corner cases when the
first signal is not INTERRUPT; which one usually does not handle as during
development (=in normal cases) it is always INTERRUPT.

Such thing is even difficult to test in QA testcases as in some cases one just
cannot reproduce the (in current case) non-SIGSTOP signal arriving as first
one.


> How long does it take to attach to / detach from 10000+ threads? If
> you don't do it serially, it shouldn't take that long.

It is not (such) a problem it takes time. It is a problem it stops the tracee
for a moment which completely changes the tracee's racy behavior one tries to
debug.


> You can tell them apart from userland and it doesn't matter which order or
> how many times INTERRUPT occurs.

I must know in which order they come to know when the tracee is still stopped
and I collect the signals to be displayed to the user and at which moment
there are no more signals in the queue and I start waiting on the debuggee
which started running.

Otherwise I can workaround it by various waitpid(NOHANG)s but it is better if
the ordering and when INTERRUPT is / is not reported is well defined.


Thanks,
Jan

2011-05-15 17:18:06

by Jan Kratochvil

[permalink] [raw]
Subject: Re: getter PTRACE_GETSIGINFO should not modify anything [Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer]

Hi Tejun,

just a general point for all the mails:

On Sun, 15 May 2011 16:28:27 +0200, Tejun Heo wrote:
> Even for LD_PRELOAD, it can simply keep scheduling INTERRUPT
> until the application calls the wrapped GETSIGINFO when it detects the
> stopped state has changed. It can be easily done from userland.

I thought the goal is to simplify the interface, not making it even more
complicated.


Thanks,
Jan

2011-05-15 17:25:12

by Tejun Heo

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

Hello,

On Sun, May 15, 2011 at 07:15:12PM +0200, Jan Kratochvil wrote:
> On Sun, 15 May 2011 18:26:30 +0200, Tejun Heo wrote:
> > the code to SEIZE and establish initial state would be simple.
>
> In normal case yes; but one needs to handle all the corner cases when the
> first signal is not INTERRUPT; which one usually does not handle as during
> development (=in normal cases) it is always INTERRUPT.
>
> Such thing is even difficult to test in QA testcases as in some cases one just
> cannot reproduce the (in current case) non-SIGSTOP signal arriving as first
> one.

Understood. AFAICS, there will be two different initial traps
(they're not signals!) which can happen - signal delivery trap and
INTERRUPT (renamed to STOP) trap. We should be able to exclude signal
delivery trap from happening as the first trap. Oleg, can you think
of other traps which could happen right after SEIZE?

Maybe this is best solved with a test case which can reliably trigger
different initial traps sites?

> > How long does it take to attach to / detach from 10000+ threads? If
> > you don't do it serially, it shouldn't take that long.
>
> It is not (such) a problem it takes time. It is a problem it stops the tracee
> for a moment which completely changes the tracee's racy behavior one tries to
> debug.

Okay, timing artifacts. Yes, it is a valid point but I'm still not
sure. Let's think about it.

> > You can tell them apart from userland and it doesn't matter which order or
> > how many times INTERRUPT occurs.
>
> I must know in which order they come to know when the tracee is still stopped
> and I collect the signals to be displayed to the user and at which moment
> there are no more signals in the queue and I start waiting on the debuggee
> which started running.
>
> Otherwise I can workaround it by various waitpid(NOHANG)s but it is better if
> the ordering and when INTERRUPT is / is not reported is well defined.

Hmmm... you should be able to tell that without resorting to WNOHANG
or depending on order of traps. That's the goal anyway. I'm a bit
confused tho. What do you mean by "the tracee is still stopped"?
Tracee is always stopped (or rather trapped) after reporting a trap.

Thanks.

--
tejun

2011-05-15 17:28:17

by Tejun Heo

[permalink] [raw]
Subject: Re: getter PTRACE_GETSIGINFO should not modify anything [Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer]

Hello,

On Sun, May 15, 2011 at 07:17:48PM +0200, Jan Kratochvil wrote:
> just a general point for all the mails:
>
> On Sun, 15 May 2011 16:28:27 +0200, Tejun Heo wrote:
> > Even for LD_PRELOAD, it can simply keep scheduling INTERRUPT
> > until the application calls the wrapped GETSIGINFO when it detects the
> > stopped state has changed. It can be easily done from userland.
>
> I thought the goal is to simplify the interface, not making it even more
> complicated.

It's a balancing act. The primary goal is to make it _functional_ as
the current ptrace is outright broken in many places. The second, at
least for me, is not deviating from the current behavior too much
unless required by the first goal or not doing so is extremely silly.

It would be nice to have a concise pretty interface but we already
have this ugly thing which works half ways and it's not like the
current users can drop support for old kernels quickly, so I don't see
a lot of benefit in making changes for prettiness, but please note
that I'm not trying to make it any more complicated. It won't be
pretty but won't be more complicated than now either.

After all, this is a pretty low level API which only a handful are
gonna use. If it involves some ugliness to use it, as long as no
functionality is compromised, I would go that way than introducing
larger behavior difference from the current one.

That said, give me strong enough reasons, I'll be happy to change the
behavior.

And, thanks a lot for your comments. I really appreciate them.

--
tejun

2011-05-15 17:40:07

by Jan Kratochvil

[permalink] [raw]
Subject: Re: PTRACE_DETACH without stop [Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT]

Hi Tejun,

On Sun, 15 May 2011 18:35:23 +0200, Tejun Heo wrote:
> On Sun, May 15, 2011 at 06:10:00PM +0200, Jan Kratochvil wrote:
> > But having to PTRACE_INTERRUPT the tracee before PTRACE_DETACH has no
> > advantage, it is just a performance (see transparent tracking of 10000+ thread
> > https://lkml.org/lkml/2011/5/15/115
> > ) problem and also getting it correct. As when one wait()s and gets
> > WIFSTOPPED one needs to respawn to signal otherwise the signal gets lost on
> > PTRACE_ATTACH. How to respawn it? By PTRACE_INTERRUPT with DATA==signal?
> > Or PTRACE_CONT with DATA==signal? With rapid signalling of the tracee the
> > debugger may never have a chance to correctly quit. Handling other cases
> > transparently for the original parent also may not be fully clear.
[...]
> Hmmm... you seem to be a bit confused about different ptrace traps
> (again, a lot of this probably stems from incorrect manpage). They
> don't interact with each other that way and there is no correctness
> issue (if there is, it's a but and should be fixed) of
> INTERRUPT+DETACH compared to DETACH without interrupt. The only
> difference is performance, so let's concentrate on that.

Here is a naive approach to PTRACE_DETACH with current kernels requiring
inferior to be stopped. You can see it FAILs. Still I think it is offtopic
problem for this specific patchset but it still should be fixed in ptrace:

#include <unistd.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <stdio.h>
#include <stdlib.h>
static int status;
static pid_t child;
void
handler (int signo)
{
puts ("PASS");
exit (0);
}
/* Ensure CHILD is stopped even if it is running now - for PTRACE_DETACH. */
void
stop_child_for_detach (void)
{
kill (child, SIGCONT); /* To be PTRACE_INTERRUPT in the future. */
/* The problem - here a signal may get lost. */
wait (&status);
}
int
main (void)
{
child = fork ();
switch (child)
{
case 0:
signal (SIGUSR2, handler);
ptrace (PTRACE_TRACEME, 0, NULL, NULL);
raise (SIGUSR1);
puts ("FAIL");
exit (1);
default:
wait (&status);
/* Sent by arbitrary external program. */
kill (child, SIGUSR2);
/* Comment out for PTRACE_DETACH not requiring stopped tracee. */
#if 1
ptrace (PTRACE_CONT, child, NULL, NULL);
stop_child_for_detach ();
#endif
ptrace (PTRACE_DETACH, child, NULL, NULL);
}
return 0;
}


> The reason why I'm reluctant to drop trapped requirement from both
> SEIZE and DETACH is that removing those synchronization points opens
> up a lot of corner cases.

They are either (currently) in the userland or they would be in the kernel.


Thanks,
Jan

2011-05-15 17:47:41

by Jan Kratochvil

[permalink] [raw]
Subject: Re: waitpid(WNOHANG) should report SIGCHLD-notified signals [Re: [PATCH 09/11] job control: reorganize wait_task_stopped()]

Hi Tejun,

On Sun, 15 May 2011 18:47:05 +0200, Tejun Heo wrote:
> Hmmm... I could easily be wrong but AFAICS that shouldn't happen. If
> you can reproduce the problem, please let us know.

I was unable to reproduce it before so OK, I will follow now your statement.


> > But if it is a case it should be fixed as there is no later notification when
> > to call waitpid(WNOHANG) again. And sure the debugger cannot busy-loop poll
> > it.
>
> But, the current WNOHANG wait is racy. It's unlikely but definitely
> possible for WNOHANG to fail when it's expected to succeed (not the
> above case but more convoluted ones).

OK, so FYI it breaks current GDB.


> But, just out of curiosity, is there any reason the ptracer itself
> should be doing something other than waitpid() while tracee is
> running? It's not like ptrace requests can be issued during that time
> and sleeping waitpid() is way saner mechanism to wait for tracee
> events than signal.

If the debugger wants to be single-threaded ("poll() model", not "threads
model") and it wants to communicate with user and examine debuggee symbols and
memory data it cannot use sleeping wait. GDB is single-threaded and it
supports `set target-async 1': info '(gdb)Background Execution'


Thanks,
Jan

2011-05-15 19:49:03

by Jan Kratochvil

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

Hi Tejun,

On Sun, 15 May 2011 19:25:05 +0200, Tejun Heo wrote:
> On Sun, May 15, 2011 at 07:15:12PM +0200, Jan Kratochvil wrote:
> > On Sun, 15 May 2011 18:26:30 +0200, Tejun Heo wrote:
> > > the code to SEIZE and establish initial state would be simple.
> >
> > In normal case yes; but one needs to handle all the corner cases when the
> > first signal is not INTERRUPT; which one usually does not handle as during
> > development (=in normal cases) it is always INTERRUPT.
[...]
> Maybe this is best solved with a test case which can reliably trigger
> different initial traps sites?

How to trigger it reliably? One can just try it in a loop but it takes minutes
and depends on hardware specifics making it I guess even unreproducible in
various configurations. We were allocating various machines for hours in the
farm but it may be unreproducible anyway.

http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/attach-into-signal.c?cvsroot=systemtap
// Even DEFAULT_LOOPS of 400 is not enough to catch it reliably.
// With TESTTIME=60 or more it should be close to 100%,
// but takes long time (~10 minutes).
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/attach-sigcont-wait.c?cvsroot=systemtap
/* Failure occurs either immediately or in about 20 runs.
But sometimes not. */
etc.


> > > You can tell them apart from userland and it doesn't matter which order or
> > > how many times INTERRUPT occurs.
> >
> > I must know in which order they come to know when the tracee is still stopped
> > and I collect the signals to be displayed to the user and at which moment
> > there are no more signals in the queue and I start waiting on the debuggee
> > which started running.
> >
> > Otherwise I can workaround it by various waitpid(NOHANG)s but it is better if
> > the ordering and when INTERRUPT is / is not reported is well defined.
>
> Hmmm... you should be able to tell that without resorting to WNOHANG
> or depending on order of traps. That's the goal anyway. I'm a bit
> confused tho. What do you mean by "the tracee is still stopped"?
> Tracee is always stopped (or rather trapped) after reporting a trap.

When debugging races in multithreaded applications a thread may get multiple
signals at once. GDB in the default all-stop mode (the other is non-stop mode)
stops all the other threads when it sees the first event on some thread.

#include <pthread.h>
#include <assert.h>
#include <asm/unistd.h>
#include <unistd.h>
#include <signal.h>
#define tkill(tid, sig) syscall (__NR_tkill, (tid), (sig))
#define gettid() syscall (__NR_gettid)
static volatile pid_t tid;
static void *
start (void *arg)
{
int i = (intptr_t) arg;
while (!tid);
sleep (1);
tkill (tid, i & 1 ? SIGUSR1 : SIGUSR2);
pause ();
return arg;
}
int main (void)
{
pthread_t thread;
int i;
for (i = 0; i < 10; i++)
pthread_create (&thread, NULL, start, (void *) (intptr_t) i);
tid = gettid (); /* line 25 */
sleep (1);
return pause ();
}

gdb -nx ./threadsigs -ex 'tb 25' -ex r -ex 'set debug lin-lwp 1' -ex c
GNU gdb (GDB) 7.3.50.20110514-cvs
This GDB was configured as "x86_64-unknown-linux-gnu".
Program received signal SIGUSR1, User defined signal 1.
(gdb) info threads
Id Target Id Frame
11 Thread 0x7ffff3019700 (LWP 31784) "threadsigs" 0x00007ffff7bcecfd in pause () at ../sysdeps/unix/syscall-template.S:82
... [ everything in pause () ]
2 Thread 0x7ffff7822700 (LWP 31775) "threadsigs" 0x00007ffff7bcecfd in pause () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 0x7ffff7fe6720 (LWP 31772) "threadsigs" 0x00007ffff78d0ced in nanosleep () at ../sysdeps/unix/syscall-template.S:82
(gdb) _

$ grep SigCgt /proc/31772/task/31772/status
SigCgt: 0000000180000000

OK, so the threads managed to deliver both SIGUSR1 and SIGUSR2 but GDB has
reported only SIGUSR1 to the user.

# The debugee does not handle SIGUSR1 so it would crash on its delivery:
(gdb) handle SIGUSR1 nopass
Signal Stop Print Pass to program Description
SIGUSR1 Yes Yes No User defined signal 1
(gdb) continue
Program received signal SIGUSR1, User defined signal 1.

OK, GDB has waitpid()ed SIGUSR1 already and still some thread has delivered
afterwards before GDB has managed to stop that thread.

(gdb) continue
Program received signal SIGUSR2, User defined signal 2.

Only now the user has found SIGUSR2 has also been delivered. The main thread
(receiving the signals) has not run yet been resumed at all. It would be nice
if GDB could display all the signals the inferior has received as the other
threads are stopped already after the signals were sent (in pause ()) - this
gives user a skewed picture of different state in time for each thread.

I would prefer if GDB would print all the signals at once on a single stop:

Program received signal SIGUSR1, User defined signal 1.
Program received signal SIGUSR2, User defined signal 2.
(gdb) _

(This is not a simple change for GDB as it has many operations bound to
receiving single signal.)

Currently when GDB receives SIGUSR1 it has to do PTRACE_CONT before waitpid()
and receiving SIGUSR2. The time it does PTRACE_CONT it does not know if then
waitpid() returns immediately or if the application will run for another hour.

There are similar problems GDB wanting to do something-like-INTERRUPT sends now
SIGSTOP and then it wants to remove that SIGSTOP from the inferior's queue as
it would confuse both user and the debuggee if left there. Fortunately this
paragraph's pain will no longer be needed with PTRACE_INTERRUPT.

For example if you guarantee that after PTRACE_INTERRUPT the INTERRUPT even
will always get delivered as the last one after all the other signals GDB could
safely operate on all the delivered signals without a risk of accidentally
resuming the debuggee before explicitly instructed to do so by the user.

This is not a real plan how it should be done - but I hope it gives a picture
debuggers are interested the processing all the already delivered signals.
GDB should probably check the SigCgt /proc field (it already does in some
cases) for the informational display of delivered threads.


Thanks,
Jan

2011-05-15 20:07:18

by Jan Kratochvil

[permalink] [raw]
Subject: Re: getter PTRACE_GETSIGINFO should not modify anything [Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer]

On Sun, 15 May 2011 19:28:11 +0200, Tejun Heo wrote:
> It's a balancing act. The primary goal is to make it _functional_ as
> the current ptrace is outright broken in many places. The second, at
> least for me, is not deviating from the current behavior too much
> unless required by the first goal or not doing so is extremely silly.

You are introducing new API which requires new codepaths in every
debugger-like program using it. I do not find the "not deviating" reason as
valid for making the _new_ API parts needlessly imperfect. Otherwise in the
next step we will want to fix the new imperfect parts and - there will be
three APIs that time to be supported in each debugger-like program depending
on how old kernel versions the debugger wants to support.


> After all, this is a pretty low level API which only a handful are
> gonna use.

People are now rather directed on #gdb@freenode to use gdb instead of dealing
with ptrace when coding various monitoring/debugging helpers. While sure
ptrace is not a mainstream syscall I would find great making it more easy.


Sure the changes should be still small - I do not ask for making unrelated new
changes; just making the already needed changes perfect in their scope.


Thanks,
Jan

2011-05-16 08:31:19

by Tejun Heo

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

Hello,

On Sun, May 15, 2011 at 09:48:29PM +0200, Jan Kratochvil wrote:
> How to trigger it reliably? One can just try it in a loop but it takes minutes
> and depends on hardware specifics making it I guess even unreproducible in
> various configurations. We were allocating various machines for hours in the
> farm but it may be unreproducible anyway.

First of all, I don't think hitting different trap spots would be that
difficult. Handling different traps after SEIZE might not be easy but
could just be something which needs to be done anyway. I'm reluctant
to treat the initial trap differently because it's essentially
identical to what happens when a program does PTRACE_INTERRUPT. It's
just something the program should be able to deal with, somewhat like
read(2) may return shorter amount of data than provided buffer.

> # The debugee does not handle SIGUSR1 so it would crash on its delivery:
> (gdb) handle SIGUSR1 nopass
> Signal Stop Print Pass to program Description
> SIGUSR1 Yes Yes No User defined signal 1
> (gdb) continue
> Program received signal SIGUSR1, User defined signal 1.
>
> OK, GDB has waitpid()ed SIGUSR1 already and still some thread has delivered
> afterwards before GDB has managed to stop that thread.

I can't understand the above sentence. A thread can't deliver signal
without going through tracer while ptraced. Can you elaborate a bit
more?

> (gdb) continue
> Program received signal SIGUSR2, User defined signal 2.
>
> Only now the user has found SIGUSR2 has also been delivered. The main thread
> (receiving the signals) has not run yet been resumed at all.

There's no distinction between main or sub threads in terms of signal
delivery unless signal itself is specifically directed to a thread.

> It would be nice if GDB could display all the signals the inferior
> has received as the other threads are stopped already after the
> signals were sent (in pause ()) - this gives user a skewed picture
> of different state in time for each thread.

Isn't that the signal pending mask?

> I would prefer if GDB would print all the signals at once on a single stop:
>
> Program received signal SIGUSR1, User defined signal 1.
> Program received signal SIGUSR2, User defined signal 2.
> (gdb) _

Ditto.

> (This is not a simple change for GDB as it has many operations bound to
> receiving single signal.)
>
> Currently when GDB receives SIGUSR1 it has to do PTRACE_CONT before waitpid()
> and receiving SIGUSR2. The time it does PTRACE_CONT it does not know if then
> waitpid() returns immediately or if the application will run for another hour.
>
> There are similar problems GDB wanting to do something-like-INTERRUPT sends now
> SIGSTOP and then it wants to remove that SIGSTOP from the inferior's queue as
> it would confuse both user and the debuggee if left there. Fortunately this
> paragraph's pain will no longer be needed with PTRACE_INTERRUPT.
>
> For example if you guarantee that after PTRACE_INTERRUPT the INTERRUPT even
> will always get delivered as the last one after all the other signals GDB could
> safely operate on all the delivered signals without a risk of accidentally
> resuming the debuggee before explicitly instructed to do so by the user.

Signal delivery is sequential in nature and delivering a signal which
has user specified signal handler involves roundtrip to userland. I'm
not following what you're suggesting.

> This is not a real plan how it should be done - but I hope it gives a picture
> debuggers are interested the processing all the already delivered signals.
> GDB should probably check the SigCgt /proc field (it already does in some
> cases) for the informational display of delivered threads.

Okay, I'm a bit confused, so let's clear things up a bit.

* Signal is sent to a group of threads of a specific thread. Note
that SIGCONT wakes up stopped process at this point.

* On the receipient, the signal becomes pending. The mask of pending
signals is visible through /proc.

* Signal is delievered when the receipient processes those pending
signals. This, of course, happens one signal after another.
Depending on signal and configuration, signal may be ignored, kill,
stop the process or trigger signal handler which involves roundtrip
to userland.

* ptrace is notified of and can alter signal delivery.

Given the different modes of signal deliveries, I don't think
prioritizing signal delivery to other traps makes sense.

Hmmm... but I think what you want can be achieved with simply calling
PTRACE_INTERRUPT on each signal delivery trap. The tracee will
deliver the signal and then immediately take INTERRUPT trap. ie.

* Check if there are pending signals which can be delivered by this
thread. Note that different threads may have different pending and
blocked masks so there isn't a single thread which can do
everything.

* If there are signals to deliver, CONT it and it will take the signal
trap (eventually). During signal trap, do PTRACE_INTERRUPT and then
let the tracee deliver the signal. Tracee will deliver the signal
and take STOP trap.

Is the above enough for your use case?

Thanks.

--
tejun

2011-05-16 08:43:55

by Tejun Heo

[permalink] [raw]
Subject: Re: getter PTRACE_GETSIGINFO should not modify anything [Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer]

Hello,

On Sun, May 15, 2011 at 10:06:54PM +0200, Jan Kratochvil wrote:
> You are introducing new API which requires new codepaths in every
> debugger-like program using it. I do not find the "not deviating" reason as
> valid for making the _new_ API parts needlessly imperfect. Otherwise in the
> next step we will want to fix the new imperfect parts and - there will be
> three APIs that time to be supported in each debugger-like program depending
> on how old kernel versions the debugger wants to support.

There's distinction between "broken" and "ugly". If it's ugly but
functional, you don't need to "fix" it. What we have is a broken ugly
one and my primary goal is fixing the broken part. I don't
necessarily object to making it pretty but that's not my primary goal.

> Sure the changes should be still small - I do not ask for making unrelated new
> changes; just making the already needed changes perfect in their scope.

Perfect is enemy of good. I'll listen to your and other's suggestions
but given the rather subpar history of ptrace and its development am
not too convinced that the existing ideas or directions were
especially effective.

Frankly, I think the biggest disease was this obsession with
perfection. Whether ptrace is slightly prettier or not, or whether it
can do the obscure feature three enterprise customers requested
doesn't matter all that much. Let's leave them alone and concentrate
on fixing mainstream use cases.

So, I'm gonna push back quite a bit unless it actually compromises
functionality or correctness.

Thank you.

--
tejun

2011-05-16 09:01:31

by Tejun Heo

[permalink] [raw]
Subject: Re: PTRACE_DETACH without stop [Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT]

Hey,

On Sun, May 15, 2011 at 07:39:40PM +0200, Jan Kratochvil wrote:
> #include <unistd.h>
> #include <sys/wait.h>
> #include <sys/ptrace.h>
> #include <stdio.h>
> #include <stdlib.h>
> static int status;
> static pid_t child;
> void
> handler (int signo)
> {
> puts ("PASS");
> exit (0);
> }
> /* Ensure CHILD is stopped even if it is running now - for PTRACE_DETACH. */
> void
> stop_child_for_detach (void)
> {
> kill (child, SIGCONT); /* To be PTRACE_INTERRUPT in the future. */
> /* The problem - here a signal may get lost. */
> wait (&status);
> }
> int
> main (void)
> {
> child = fork ();
> switch (child)
> {
> case 0:
> signal (SIGUSR2, handler);
> ptrace (PTRACE_TRACEME, 0, NULL, NULL);
> raise (SIGUSR1);
> puts ("FAIL");
> exit (1);
> default:
> wait (&status);
> /* Sent by arbitrary external program. */
> kill (child, SIGUSR2);
> /* Comment out for PTRACE_DETACH not requiring stopped tracee. */
> #if 1
> ptrace (PTRACE_CONT, child, NULL, NULL);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can lose signal here. You must check which trap happened why and
then take action accordingly. Argh... we really need better
documentation of ptrace behaviors and traps. Anyways, sans the
confusion between debugger sent signals and signals from other sources
and the side effects caused by debugger sent ones, the current ptrace
interface isn't broken to the point where you just lose signal.

> stop_child_for_detach ();
> #endif
> ptrace (PTRACE_DETACH, child, NULL, NULL);
> }
> return 0;
> }
>
>
> > The reason why I'm reluctant to drop trapped requirement from both
> > SEIZE and DETACH is that removing those synchronization points opens
> > up a lot of corner cases.
>
> They are either (currently) in the userland or they would be in the kernel.

Nope. What userland is currently dealing with isn't that type of
conditions. It's dealing with nasty side effects of implied and
required signals, which will be removed with the new interface. Those
attach/detach sync points are currently in the kernel and wouldn't
change with the proposed updates. You're suggesting to remove them.

So, it's not about moving them, at all. It's about removing them.

That said, it might not be such a bad idea. Let's see how difficult
or easy it actually is.

Thanks.

--
tejun

2011-05-16 09:13:24

by Tejun Heo

[permalink] [raw]
Subject: Re: waitpid(WNOHANG) should report SIGCHLD-notified signals [Re: [PATCH 09/11] job control: reorganize wait_task_stopped()]

Hey, again.

On Sun, May 15, 2011 at 07:47:22PM +0200, Jan Kratochvil wrote:
> > But, the current WNOHANG wait is racy. It's unlikely but definitely
> > possible for WNOHANG to fail when it's expected to succeed (not the
> > above case but more convoluted ones).
>
> OK, so FYI it breaks current GDB.

That might be true but the race cases are very obscure. Not sure
whether the race conditions could actually affect ptracer. With
SEIZE, it won't, I think.

> > But, just out of curiosity, is there any reason the ptracer itself
> > should be doing something other than waitpid() while tracee is
> > running? It's not like ptrace requests can be issued during that time
> > and sleeping waitpid() is way saner mechanism to wait for tracee
> > events than signal.
>
> If the debugger wants to be single-threaded ("poll() model", not "threads
> model") and it wants to communicate with user and examine debuggee symbols and
> memory data it cannot use sleeping wait. GDB is single-threaded and it
> supports `set target-async 1': info '(gdb)Background Execution'

I don't think target-async is necessarily related. It doesn't really
matter whether the execution per-se is async or not. The ptracer can
be a separate thread regardless and the interlocking can be added on
top (or not).

Anyways, I would recommend using sleeping wait(2)'s for ptrace event
tracking. Well, signal notification would work but I think that would
be much more error-prone and has much higher chance of being fragile.

Thanks.

--
tejun

2011-05-16 12:08:55

by Jan Kratochvil

[permalink] [raw]
Subject: Re: PTRACE_DETACH without stop [Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT]

Hi Tejun,

On Mon, 16 May 2011 11:01:25 +0200, Tejun Heo wrote:
> On Sun, May 15, 2011 at 07:39:40PM +0200, Jan Kratochvil wrote:
> > #include <unistd.h>
> > #include <sys/wait.h>
> > #include <sys/ptrace.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > static int status;
> > static pid_t child;
> > void
> > handler (int signo)
> > {
> > puts ("PASS");
> > exit (0);
> > }
> > /* Ensure CHILD is stopped even if it is running now - for PTRACE_DETACH. */
> > void
> > stop_child_for_detach (void)
> > {
> > kill (child, SIGCONT); /* To be PTRACE_INTERRUPT in the future. */
> > /* The problem - here a signal may get lost. */
> > wait (&status);
> > }
> > int
> > main (void)
> > {
> > child = fork ();
> > switch (child)
> > {
> > case 0:
> > signal (SIGUSR2, handler);
> > ptrace (PTRACE_TRACEME, 0, NULL, NULL);
> > raise (SIGUSR1);
> > puts ("FAIL");
> > exit (1);
> > default:
> > wait (&status);
> > /* Sent by arbitrary external program. */
> > kill (child, SIGUSR2);
> > /* Comment out for PTRACE_DETACH not requiring stopped tracee. */
> > #if 1
> > ptrace (PTRACE_CONT, child, NULL, NULL);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> You can lose signal here.

Not here but there should be PTRACE_CONT in stop_child_for_detach.

That SIGUSR1 was intended to be dropped - it was there just to get stopped by
PTRACE_TRACEME.


> You must check which trap happened why and then take action accordingly.
> Argh... we really need better documentation of ptrace behaviors and traps.

We could misunderstand each other here. Sure I know where is the problem.
Just demonstration that writing correct `stop_child_for_detach' is not easy
and naive programmer may write it looking right and working right during
development but in fact it will break debuggees in corner cases.


> Nope. What userland is currently dealing with isn't that type of
> conditions. It's dealing with nasty side effects of implied and
> required signals, which will be removed with the new interface.

They won't as there will be new INTERRUPT event and when one wants to trap it
one has to deal with various signals coming before or after it.


> Those attach/detach sync points are currently in the kernel and wouldn't
> change with the proposed updates. You're suggesting to remove them.

I am suggesting not to introduce the existing pain into new API.


Thanks,
Jan

2011-05-16 12:11:34

by Jan Kratochvil

[permalink] [raw]
Subject: Re: waitpid(WNOHANG) should report SIGCHLD-notified signals [Re: [PATCH 09/11] job control: reorganize wait_task_stopped()]

Hi Tejun,

On Mon, 16 May 2011 11:13:18 +0200, Tejun Heo wrote:
> On Sun, May 15, 2011 at 07:47:22PM +0200, Jan Kratochvil wrote:
> > If the debugger wants to be single-threaded ("poll() model", not "threads
> > model") and it wants to communicate with user and examine debuggee symbols and
> > memory data it cannot use sleeping wait. GDB is single-threaded and it
> > supports `set target-async 1': info '(gdb)Background Execution'
>
> I don't think target-async is necessarily related. It doesn't really
> matter whether the execution per-se is async or not. The ptracer can
> be a separate thread

It cannot as various GDB-supported platforms do not support threads properly.
And you want to have a common codebase to get it will supported.

Besides that it is a matter of coding style, I perfer "poll() model" even on
threads-supporting GNU/Linux.


> Anyways, I would recommend using sleeping wait(2)'s for ptrace event
> tracking.

You cannot as I described above.


Thanks,
Jan

2011-05-16 12:17:30

by Jan Kratochvil

[permalink] [raw]
Subject: Re: getter PTRACE_GETSIGINFO should not modify anything [Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer]

Hi Tejun,

On Mon, 16 May 2011 10:43:50 +0200, Tejun Heo wrote:
> There's distinction between "broken" and "ugly". If it's ugly but
> functional, you don't need to "fix" it.

The final goal is the user experience (such as the users of GDB), nothing else
matters. If it is so "ugly" the userland developers fail to use it the
project as a whole still broken.


> Frankly, I think the biggest disease was this obsession with
> perfection.

I try to suggest fixes which seem to be easier on the kernel side than trying
to workaround them in all the debugger-like applications. After various
strace fixes and for years gdb linux-nat fixes there is a need to move to
gdbserver which will mean to reimplement all the ptrace workarounds again.


> So, I'm gonna push back quite a bit unless it actually compromises
> functionality or correctness.

With your position "if it is workaroundable in userland let's make the new
kernel API broken again" it no longer makes sense to comment on it. Yes,
everything is workaroundable but that is usually not the goal of new APIs.


Thanks,
Jan

2011-05-16 12:24:20

by Tejun Heo

[permalink] [raw]
Subject: Re: PTRACE_DETACH without stop [Re: [PATCH 04/11] ptrace: implement PTRACE_INTERRUPT]

On Mon, May 16, 2011 at 02:08:22PM +0200, Jan Kratochvil wrote:
> > Nope. What userland is currently dealing with isn't that type of
> > conditions. It's dealing with nasty side effects of implied and
> > required signals, which will be removed with the new interface.
>
> They won't as there will be new INTERRUPT event and when one wants to trap it
> one has to deal with various signals coming before or after it.

I'd rather lean toward handling it properly from userland. Strictly
defining trap order is too fragile. I think the right thing to do
here is properly documenting how to recognize and handle different
types of traps.

Thanks.

--
tejun

2011-05-16 12:27:07

by Jan Kratochvil

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

Hi Tejun,

On Mon, 16 May 2011 10:31:13 +0200, Tejun Heo wrote:
> On Sun, May 15, 2011 at 09:48:29PM +0200, Jan Kratochvil wrote:
> > # The debugee does not handle SIGUSR1 so it would crash on its delivery:
> > (gdb) handle SIGUSR1 nopass
> > Signal Stop Print Pass to program Description
> > SIGUSR1 Yes Yes No User defined signal 1
> > (gdb) continue
> > Program received signal SIGUSR1, User defined signal 1.
> >
> > OK, GDB has waitpid()ed SIGUSR1 already and still some thread has delivered
> > afterwards before GDB has managed to stop that thread.
>
> I can't understand the above sentence. A thread can't deliver signal
> without going through tracer while ptraced. Can you elaborate a bit
> more?

I tried to explain why GDB will see SIGUSR1 twice. Despite it is not
a realtime signal and therefore the signal is "flag", it does not queue/count.
You know better than me why GDB sees SIGUSR1 twice.


> > (gdb) continue
> > Program received signal SIGUSR2, User defined signal 2.
> >
> > Only now the user has found SIGUSR2 has also been delivered. The main thread
> > (receiving the signals) has not run yet been resumed at all.
>
> There's no distinction between main or sub threads in terms of signal
> delivery unless signal itself is specifically directed to a thread.

This sample code uses only tkill to avoid any mess with which TID will get
which signal.


> > It would be nice if GDB could display all the signals the inferior
> > has received as the other threads are stopped already after the
> > signals were sent (in pause ()) - this gives user a skewed picture
> > of different state in time for each thread.
>
> Isn't that the signal pending mask?

Yes but how do you query siginfo_t (GDB $_siginfo) of a pending signal to make
it accessible to the user? You also need to mask out blocked signals and
properly order them like kernel does - which is not guaranteed by POSIX.
You need to reimplement part of the kernel functionality and if you implement
it a bit differently it will break transparency of the debugging.


> > I would prefer if GDB would print all the signals at once on a single stop:
> >
> > Program received signal SIGUSR1, User defined signal 1.
> > Program received signal SIGUSR2, User defined signal 2.
> > (gdb) _
>
> Ditto.
>
> > (This is not a simple change for GDB as it has many operations bound to
> > receiving single signal.)
> >
> > Currently when GDB receives SIGUSR1 it has to do PTRACE_CONT before waitpid()
> > and receiving SIGUSR2. The time it does PTRACE_CONT it does not know if then
> > waitpid() returns immediately or if the application will run for another hour.
> >
> > There are similar problems GDB wanting to do something-like-INTERRUPT sends now
> > SIGSTOP and then it wants to remove that SIGSTOP from the inferior's queue as
> > it would confuse both user and the debuggee if left there. Fortunately this
> > paragraph's pain will no longer be needed with PTRACE_INTERRUPT.
> >
> > For example if you guarantee that after PTRACE_INTERRUPT the INTERRUPT even
> > will always get delivered as the last one after all the other signals GDB could
> > safely operate on all the delivered signals without a risk of accidentally
> > resuming the debuggee before explicitly instructed to do so by the user.
>
> Signal delivery is sequential in nature and delivering a signal which
> has user specified signal handler involves roundtrip to userland. I'm
> not following what you're suggesting.
>
> > This is not a real plan how it should be done - but I hope it gives a picture
> > debuggers are interested the processing all the already delivered signals.
> > GDB should probably check the SigCgt /proc field (it already does in some
> > cases) for the informational display of delivered threads.
>
> Okay, I'm a bit confused, so let's clear things up a bit.
>
> * Signal is sent to a group of threads of a specific thread. Note
> that SIGCONT wakes up stopped process at this point.

Normally yes but this sample code uses tkill to avoid it.


> * On the receipient, the signal becomes pending. The mask of pending
> signals is visible through /proc.

But not their siginfo_t, not which are blocked, their ordering etc.


> * Signal is delievered when the receipient processes those pending
> signals. This, of course, happens one signal after another.
> Depending on signal and configuration, signal may be ignored, kill,
> stop the process or trigger signal handler which involves roundtrip
> to userland.
>
> * ptrace is notified of and can alter signal delivery.
>
> Given the different modes of signal deliveries, I don't think
> prioritizing signal delivery to other traps makes sense.
>
> Hmmm... but I think what you want can be achieved with simply calling
> PTRACE_INTERRUPT on each signal delivery trap. The tracee will
> deliver the signal and then immediately take INTERRUPT trap. ie.
>
> * Check if there are pending signals which can be delivered by this
> thread. Note that different threads may have different pending and
> blocked masks so there isn't a single thread which can do
> everything.
>
> * If there are signals to deliver,

This is the question if the debugger can reliably detect. Maybe it can.


> CONT it and it will take the signal
> trap (eventually). During signal trap, do PTRACE_INTERRUPT and then
> let the tracee deliver the signal. Tracee will deliver the signal
> and take STOP trap.
>
> Is the above enough for your use case?

If there is enough documentation - or one reads the soures - one can
reimplement the signal delivery login in userland to expect what will kernel
do. TBH I do not think it is the right API but you are right it is
workaroundable in userland.


Thanks,
Jan

2011-05-16 12:28:02

by Tejun Heo

[permalink] [raw]
Subject: Re: waitpid(WNOHANG) should report SIGCHLD-notified signals [Re: [PATCH 09/11] job control: reorganize wait_task_stopped()]

Hey,

On Mon, May 16, 2011 at 02:11:16PM +0200, Jan Kratochvil wrote:
> It cannot as various GDB-supported platforms do not support threads properly.
> And you want to have a common codebase to get it will supported.
>
> Besides that it is a matter of coding style, I perfer "poll() model" even on
> threads-supporting GNU/Linux.

The above wouldn't be linux kernel's top design concerns, right?

> > Anyways, I would recommend using sleeping wait(2)'s for ptrace event
> > tracking.
>
> You cannot as I described above.

If you prefer that, go ahead. It's not like new interface is gonna
break that, but I think it's gonna be more fragile. Well, it
shouldn't be worse than now.

Thanks.

--
tejun

2011-05-16 12:39:52

by Jan Kratochvil

[permalink] [raw]
Subject: Re: waitpid(WNOHANG) should report SIGCHLD-notified signals [Re: [PATCH 09/11] job control: reorganize wait_task_stopped()]

Hi Tejun,

On Mon, 16 May 2011 14:27:56 +0200, Tejun Heo wrote:
> On Mon, May 16, 2011 at 02:11:16PM +0200, Jan Kratochvil wrote:
> > It cannot as various GDB-supported platforms do not support threads properly.
> > And you want to have a common codebase to get it will supported.
> >
> > Besides that it is a matter of coding style, I perfer "poll() model" even on
> > threads-supporting GNU/Linux.
>
> The above wouldn't be linux kernel's top design concerns, right?

They are - as without threads one needs reliable WNOHANG - while which you say
it is not reliable.


> It's not like new interface is gonna break that, but I think it's gonna be
> more fragile. Well, it shouldn't be worse than now.

BTW I really do not think this WNOHANG issue is anyhow related to this whole
discussion.


Thanks,
Jan

2011-05-16 12:43:06

by Tejun Heo

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

Hello,

On Mon, May 16, 2011 at 02:26:42PM +0200, Jan Kratochvil wrote:
> > I can't understand the above sentence. A thread can't deliver signal
> > without going through tracer while ptraced. Can you elaborate a bit
> > more?
>
> I tried to explain why GDB will see SIGUSR1 twice. Despite it is not
> a realtime signal and therefore the signal is "flag", it does not queue/count.
> You know better than me why GDB sees SIGUSR1 twice.

Ah, okay. Well, there are five threads sending USR1 and five USR2,
right? The main thread would enter signal delivery path when the
first signal (be it USR1 or USR2) wakes it up and on dequeueing the
first signal (USR1 here), it would trap for signal delivery.
Depending on timing, this may happen after all signals are generated
but more likely to happen before some of them haven't finished sending
yet.

So, while the first USR1 is being delivered through ptrace and
whatnot, the remaining signals are sent, some of them are USR1, thus
making USR1 pending again.

Once you finish delivering USR1, signal delivery path is restarted and
the new pending USR1 is delivered and then USR2.

> > There's no distinction between main or sub threads in terms of signal
> > delivery unless signal itself is specifically directed to a thread.
>
> This sample code uses only tkill to avoid any mess with which TID will get
> which signal.

Yeap. Missed that.

> > Isn't that the signal pending mask?
>
> Yes but how do you query siginfo_t (GDB $_siginfo) of a pending signal to make
> it accessible to the user?

You can't, at least not yet, but wouldn't presenting list of pending
signals be helpful enough?

> You also need to mask out blocked signals

I thought we export this through /proc. Maybe not. I'll check.

> and properly order them like kernel does - which is not guaranteed
> by POSIX. You need to reimplement part of the kernel functionality
> and if you implement it a bit differently it will break transparency
> of the debugging.

I don't get why the delivery ordering matters.

> > * If there are signals to deliver,
>
> This is the question if the debugger can reliably detect. Maybe it can.

This shouldn't be too hard. You just need to know all the masks.

> > CONT it and it will take the signal
> > trap (eventually). During signal trap, do PTRACE_INTERRUPT and then
> > let the tracee deliver the signal. Tracee will deliver the signal
> > and take STOP trap.
> >
> > Is the above enough for your use case?
>
> If there is enough documentation - or one reads the soures - one can
> reimplement the signal delivery login in userland to expect what will kernel
> do. TBH I do not think it is the right API but you are right it is
> workaroundable in userland.

Oh, I would strongly recomment something like that. Don't depend on
implementation details. I still don't understand why you need to know
the order beforehand. Wouldn't pending list be enough? What are you
trying to achieve?

Thanks.

--
tejun

2011-05-16 12:46:30

by Tejun Heo

[permalink] [raw]
Subject: Re: waitpid(WNOHANG) should report SIGCHLD-notified signals [Re: [PATCH 09/11] job control: reorganize wait_task_stopped()]

Hello,

On Mon, May 16, 2011 at 02:39:30PM +0200, Jan Kratochvil wrote:
> > The above wouldn't be linux kernel's top design concerns, right?
>
> They are - as without threads one needs reliable WNOHANG - while which you say
> it is not reliable.

As I wrote before, I think it's reliable for ptrace. The problematic
ones are transitions between continued and stopped and running and I
don't think it would affect ptrace.

> > It's not like new interface is gonna break that, but I think it's gonna be
> > more fragile. Well, it shouldn't be worse than now.
>
> BTW I really do not think this WNOHANG issue is anyhow related to this whole
> discussion.

Probably not but I've seen gdb going out of sync with the expected
state of the tracee. Dunno whether it was kernel bug or gdb bug but
it seemed pretty fragile and adding group stop notifications might
make it a bit more difficult. I don't know. It was just my five
cents. Feel free to ignore.

Thanks.

--
tejun

2011-05-16 12:56:21

by Tejun Heo

[permalink] [raw]
Subject: Re: getter PTRACE_GETSIGINFO should not modify anything [Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer]

Hello,

On Mon, May 16, 2011 at 02:17:11PM +0200, Jan Kratochvil wrote:
> On Mon, 16 May 2011 10:43:50 +0200, Tejun Heo wrote:
> > There's distinction between "broken" and "ugly". If it's ugly but
> > functional, you don't need to "fix" it.
>
> The final goal is the user experience (such as the users of GDB), nothing else
> matters. If it is so "ugly" the userland developers fail to use it the
> project as a whole still broken.

To me, it seems the breakage and mountain of workarounds come more
from lack of proper documentation plus the current ptrace + job
control + signal interaction which is really broken. For example, it
seems nobody really understood how group stop and ptrace interacts and
the different types of traps being used - strace(2) thought the same
signal was being delivered twice.

> > So, I'm gonna push back quite a bit unless it actually compromises
> > functionality or correctness.
>
> With your position "if it is workaroundable in userland let's make the new
> kernel API broken again" it no longer makes sense to comment on it. Yes,
> everything is workaroundable but that is usually not the goal of new APIs.

First of all, let's distinguish "broken" and "ugly" properly and
please keep in mind that what's broken is broken but beauty is in the
eyes of the beholder.

Secondly, I'm listening to your comments and will incorporate them, so
please don't give up so easily.

Thank you.

--
tejun

2011-05-16 13:00:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: getter PTRACE_GETSIGINFO should not modify anything [Re: [PATCH 11/11] ptrace: implement group stop notification for ptracer]


* Tejun Heo <[email protected]> wrote:

> Hello,
>
> On Mon, May 16, 2011 at 02:17:11PM +0200, Jan Kratochvil wrote:
> > On Mon, 16 May 2011 10:43:50 +0200, Tejun Heo wrote:
> > > There's distinction between "broken" and "ugly". If it's ugly but
> > > functional, you don't need to "fix" it.
> >
> > The final goal is the user experience (such as the users of GDB), nothing else
> > matters. If it is so "ugly" the userland developers fail to use it the
> > project as a whole still broken.
>
> To me, it seems the breakage and mountain of workarounds come more from lack
> of proper documentation plus the current ptrace + job control + signal
> interaction which is really broken. For example, it seems nobody really
> understood how group stop and ptrace interacts and the different types of
> traps being used - strace(2) thought the same signal was being delivered
> twice.

Btw., i'd also like to offer the observation that even these days, running
brand-new Rawhide distribution with strace-4.5.20-2.fc15.x86_64 and a fresh
kernel, i *still* often see strace (or ptrace?) misbehavior like it
misprocessing Ctrl-C and leaving hung threads around which i have to clean up
manually ...

So if you could bring sanity in here i think we could live with a regression or
two, if the situation could be improved in the longer run - it's not like we
are protecting a stable bastion of robust functionality here ...

Thanks,

Ingo

2011-05-16 13:04:00

by Jan Kratochvil

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

Hi Tejun,

On Mon, 16 May 2011 14:42:59 +0200, Tejun Heo wrote:
> I still don't understand why you need to know the order beforehand.
> Wouldn't pending list be enough? What are you trying to achieve?

If you check the GDB debugging session transcript I gave GDB stopped in
a moment when all the threads already returned from tkill()s sending SIGUSR1s
and SIGUSR2.

All threads are stopped, user is investigating the situation. And GDB tells
the user (only) SIGUSR1 was delivered. The user has no chance to find out
SIGUSR2 is already pending / to be delivered. This is one of the many reasons
why debugging various racy cases is a nightmare.

I was trying to suggest some ways how to give user the complete overview of
the debuggee situation - where both SIGUSR1 and SIGUSR2 would be reported on
the first stop.

You are right GDB could examine SigCgt, SigBlk (not sure if others) and report
those signals. Maybe it is right that way and we can forget about it.


There is (was) a larger problem of signals reordering which I fixed in
[patch 3/4]#3 linux-nat: Do not respawn signals
http://sourceware.org/ml/gdb-patches/2010-09/msg00360.html
but that mess you should have fixed by PTRACE_INTERRUPT which no longer
interacts with signals. But it gave me the idea of another situation above
where the debugger may want to know all the currently pending signals at once.



Thanks,
Jan

2011-05-16 13:21:51

by Jan Kratochvil

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

Hi Tejun,

On Mon, 16 May 2011 10:31:13 +0200, Tejun Heo wrote:
> First of all, I don't think hitting different trap spots would be that
> difficult. Handling different traps after SEIZE might not be easy but
> could just be something which needs to be done anyway. I'm reluctant
> to treat the initial trap differently because it's essentially
> identical to what happens when a program does PTRACE_INTERRUPT. It's
> just something the program should be able to deal with, somewhat like
> read(2) may return shorter amount of data than provided buffer.

If not clear I would like also PTRACE_INTERRUPT to deliver the INTERRUPT event
always as the first one, if it is not clear.

tracer tracee foreign process
--------------- ------ ---------------
attaches tracee
normal run
tkill(tracee, SIGUSR1)
stopped, to report SIGUSR1 to tracer
PTRACE_INTERRUPT (but tracee is already stopped)
waitpid(tracee)

Currently that waitpid will report SIGUSR1 first and then the INTERRUPT trap.

I suggest - if possible - that tracer will get that INTERRUPT trap first.

But you made the optimization that the INTERRUPT gets suppressed in such case.
Which improves performance for the cost of more complicated debuggers.

I guess after PTRACE_SEIZE if the first signal/event is not INTERRUPT there
will not be any INTERRUPT later, as you said.


Thanks,
Jan

2011-05-16 13:45:16

by Tejun Heo

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

Hello, Jan.

On Mon, May 16, 2011 at 03:21:23PM +0200, Jan Kratochvil wrote:
> Currently that waitpid will report SIGUSR1 first and then the INTERRUPT trap.

Yes, it would.

> I suggest - if possible - that tracer will get that INTERRUPT trap first.
>
> But you made the optimization that the INTERRUPT gets suppressed in such case.
> Which improves performance for the cost of more complicated debuggers.

INTERRUPT is suppressed if group stop is pending not signal. That was
more about duplicity between INTERRUPT trap and group stop, which made
the whole thing uglier. I merged the two together so there's no such
distinction anymore.

> I guess after PTRACE_SEIZE if the first signal/event is not INTERRUPT there
> will not be any INTERRUPT later, as you said.

I replied in another message too but now SEIZE and INTERRUPT behave
identically. Any trap clears the condition.

I don't think INTERRUPT can be prioritized like that above existing
trap conditions. Traps are taken sometimes deep in the kernel
(e.g. fork/exec) and often after modifying states irrevocably
(e.g. signal is already dequeued on signal trap). I don't think how
it would be possible to rewind the state changes and replay it later.

FWIW, if INTERRUPT exists while signal is pending, INTERRUPT will be
taken first but that doesn't mean all that much as we can't roll back
from traps which already have happened and tracee might be in any trap
by the time PTRACE_INTERRUPT is issued.

Thank you.

--
tejun

2011-05-16 13:48:34

by Jan Kratochvil

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

Hi Tejun,

On Mon, 16 May 2011 15:45:10 +0200, Tejun Heo wrote:
> I don't think INTERRUPT can be prioritized like that above existing
> trap conditions. Traps are taken sometimes deep in the kernel
> (e.g. fork/exec) and often after modifying states irrevocably
> (e.g. signal is already dequeued on signal trap). I don't think how
> it would be possible to rewind the state changes and replay it later.

OK, so that closes one of the major issues I was trying to get "fixed".


Thanks for info,
Jan

2011-05-16 13:51:47

by Tejun Heo

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

Hello, Jan.

On Mon, May 16, 2011 at 03:03:39PM +0200, Jan Kratochvil wrote:
> If you check the GDB debugging session transcript I gave GDB stopped in
> a moment when all the threads already returned from tkill()s sending SIGUSR1s
> and SIGUSR2.

Yeap.

> All threads are stopped, user is investigating the situation. And GDB tells
> the user (only) SIGUSR1 was delivered. The user has no chance to find out
> SIGUSR2 is already pending / to be delivered. This is one of the many reasons
> why debugging various racy cases is a nightmare.
>
> I was trying to suggest some ways how to give user the complete overview of
> the debuggee situation - where both SIGUSR1 and SIGUSR2 would be reported on
> the first stop.
>
> You are right GDB could examine SigCgt, SigBlk (not sure if others) and report
> those signals. Maybe it is right that way and we can forget about it.

Yes, I think this is the correct way to deal with it. Multiple
signals can be pending and/or blocked but a single thread can only
deliver a single signal at any given time, which may involve userland
execution. Parallel delivery simply isn't defined, so I think what
you want here is the list of pending signals, not deliveries, and then
consulting the pending mask is the obvious thing to do.

> There is (was) a larger problem of signals reordering which I fixed in
> [patch 3/4]#3 linux-nat: Do not respawn signals
> http://sourceware.org/ml/gdb-patches/2010-09/msg00360.html
> but that mess you should have fixed by PTRACE_INTERRUPT which no longer
> interacts with signals. But it gave me the idea of another situation above
> where the debugger may want to know all the currently pending signals at once.

Yeap, I agree that it would be nice if gdb informs the user of the
pending signals when it stops for signal delivery.

Thank you.

--
tejun

2011-05-16 13:54:20

by Tejun Heo

[permalink] [raw]
Subject: Re: PTRACE_SEIZE should not stop [Re: [PATCH 02/11] ptrace: implement PTRACE_SEIZE]

Hello,

On Mon, May 16, 2011 at 03:48:08PM +0200, Jan Kratochvil wrote:
> On Mon, 16 May 2011 15:45:10 +0200, Tejun Heo wrote:
> > I don't think INTERRUPT can be prioritized like that above existing
> > trap conditions. Traps are taken sometimes deep in the kernel
> > (e.g. fork/exec) and often after modifying states irrevocably
> > (e.g. signal is already dequeued on signal trap). I don't think how
> > it would be possible to rewind the state changes and replay it later.
>
> OK, so that closes one of the major issues I was trying to get "fixed".

Sorry, this one was too difficult and, even if somehow I pulled it,
unlikely to make upstream. It's gonna be extremely fragile.
Unfortunately, userland would still have to deal with arbitrary order
of traps.

Thank you.

--
tejun