2010-12-24 14:01:07

by Tejun Heo

[permalink] [raw]
Subject: [PATCHSET RFC] ptrace,signal: clean transition between STOPPED and TRACED

Hello,

This patchset is spun off from "ptrace,signal: sane interaction
between ptrace and job control signals, take#2" patchset[1]. From the
series, four fix and cleanup patches were put into git branch
ptrace-reviewed[2]. This patchset is the subset which tries to fix
problems with the actual group stop mechanism and make the transition
between STOPPED and TRACED clean.

These changes haven't been agreed upon yet and require more
discussion. I'm posting this series so that the already pointed out
problems don't hinder in the discussion and we can separate this part
from the SIGCHLD notification changes.

Most changes are to reflect Oleg's review on the original patchset.

* 0001 is added to remove the deprecated CLONE_STOPPED.

* 0002-0005 only received changes for minor issues pointed out during
review.

* 0006 description updated to note the user visible behavior changes.

* 0007 now uses signal->wait_chldexit instead of waiting on bit.
Oleg, I kept TASK_UNINTERRUPTIBLE wait for now. I tried to switch
to TASK_KILLABLE but it makes things more subtle down the line. The
child can be left in the middle of transition and may end up
continuing running while the rest are stopped, which in itself is
okay but adds another layer of complexity on top of an already very
complex set of behaviors. As the transition is well defined and
lock-stepped, I think it would be better to just get it right and
remove the variable there.

* 0007 also waits for trapping on attach instead of the next ptrace
operation such that an immediately following WNOHANG(2) wait from
the ptracer would always succeed if the ptracee was already stopped.

* Comments added and other misc updates to 0007.

Most behavior differences caused by this series is mostly caused by
tracees stopping in TRACED instead of STOPPED when trapping for a
group stop. The two most notable ones are

1. When attaching to a STOPPED task or a traced task stops for group
stop, the tracee now enters TRACED instead of STOPPED. This is
visible via fs/proc but, more importantly, SIGCONT is ignored if a
task is TRACED.

The behavior before the change was quite erratic. The first ptrace
operation after the tracee enters STOPPED would silently transit
its state to TRACED behind its back bypassing arch_ptrace_stop().
This means that SIGCONT is honored until the first following ptrace
operation but ignored after that.

This may, for example, affect the operation of strace but given how
strace always need to issue further ptrace operations on trap to
determine what's going on, I doubt it would actually be worse.

2. The transition between STOPPED and TRACED involves a short window
of RUNNING inbetween. On attach, the transition is hidden from the
tracer using GROUP_STOP_TRAPPING but it still is visible to other
threads in the tracer's group. IOW, if another thread performs
WNOHANG wait(2) on the tracee while attach is in progress, the
wait(2) may fail even if the tracee is known to be in stopped state
before.

The same problem exists the other direction during detach.
Currently, the code doesn't try to hide this transition even from
the tracer. IOW, if the tracer attaches to a stopped task,
detaches, reattaches and then performs WNOHANG wait(2), the wait(2)
may fail. However, given the previous behavior where the tracee is
always woken up by wake_up_process() on detach, this is highly
unlikely to cause any problem.

This patchset contains the following seven patches.

0001-clone-kill-CLONE_STOPPED.patch
0002-ptrace-add-why-to-ptrace_stop.patch
0003-signal-fix-premature-completion-of-group-stop-when-i.patch
0004-signal-use-GROUP_STOP_PENDING-to-stop-once-for-a-sin.patch
0005-ptrace-participate-in-group-stop-from-ptrace_stop-if.patch
0006-ptrace-make-do_signal_stop-use-ptrace_stop-if-the-ta.patch
0007-ptrace-clean-transitions-between-TASK_STOPPED-and-TR.patch

and is available in the following git tree.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git ptrace-clean-transition

diffstat follows.

fs/exec.c | 1
include/linux/sched.h | 12 ++-
kernel/fork.c | 28 -------
kernel/ptrace.c | 49 +++++++++++-
kernel/signal.c | 192 +++++++++++++++++++++++++++++++++++++++-----------
5 files changed, 208 insertions(+), 74 deletions(-)

Thanks.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1072474
[2] git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git ptrace-reviewed


2010-12-24 14:01:13

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 2/7] ptrace: add @why to ptrace_stop()

To prepare for cleanup of the interaction between group stop and
ptrace, add @why to ptrace_stop(). Existing users are updateda such
that there is no behavior change.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/signal.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 7dc0ca2..4569801 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1617,7 +1617,7 @@ static int sigkill_pending(struct task_struct *tsk)
* If we actually decide not to stop at all because the tracer
* is gone, we keep current->exit_code unless clear_code.
*/
-static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
+static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
__releases(&current->sighand->siglock)
__acquires(&current->sighand->siglock)
{
@@ -1655,7 +1655,7 @@ static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
spin_unlock_irq(&current->sighand->siglock);
read_lock(&tasklist_lock);
if (may_ptrace_stop()) {
- do_notify_parent_cldstop(current, CLD_TRAPPED);
+ do_notify_parent_cldstop(current, why);
/*
* Don't want to allow preemption here, because
* sys_ptrace() needs this task to be inactive.
@@ -1714,7 +1714,7 @@ void ptrace_notify(int exit_code)

/* Let the debugger run. */
spin_lock_irq(&current->sighand->siglock);
- ptrace_stop(exit_code, 1, &info);
+ ptrace_stop(exit_code, CLD_TRAPPED, 1, &info);
spin_unlock_irq(&current->sighand->siglock);
}

@@ -1795,7 +1795,7 @@ static int ptrace_signal(int signr, siginfo_t *info,
ptrace_signal_deliver(regs, cookie);

/* Let the debugger run. */
- ptrace_stop(signr, 0, info);
+ ptrace_stop(signr, CLD_TRAPPED, 0, info);

/* We're back. Did the debugger cancel the sig? */
signr = current->exit_code;
--
1.7.1

2010-12-24 14:01:17

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 6/7] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced

A ptraced task would still stop at do_signal_stop() when it's stopping
for stop signals and do_signal_stop() behaves the same whether the
task is ptraced or not. However, in addition to stopping,
ptrace_stop() also does ptrace specific stuff like calling
architecture specific callbacks, so this behavior makes the code more
fragile and difficult to understand.

This patch makes do_signal_stop() test whether the task is ptraced and
use ptrace_stop() if so. This renders tracehook_notify_jctl() rather
pointless as the ptrace notification is now handled by ptrace_stop()
regardless of the return value from the tracehook. It probably is a
good idea to update it.

This doesn't solve the whole problem as tasks already in stopped state
would stay in the regular stop when ptrace attached. That part will
be handled by the next patch.

Oleg pointed out that this makes a userland-visible change. Before,
SIGCONT would be able to wake up a task in group stop even if the task
is ptraced if the tracer hasn't issued another ptrace command
afterwards (as the next ptrace commands transitions the state into
TASK_TRACED which ignores SIGCONT wakeups). With this and the next
patch, SIGCONT may race with the transition into TASK_TRACED and is
ignored if the tracee already entered TASK_TRACED.

Another userland visible change of this and the next patch is that the
ptracee's state would now be TASK_TRACED where it used to be
TASK_STOPPED, which is visible via fs/proc.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Jan Kratochvil <[email protected]>
---
kernel/signal.c | 43 +++++++++++++++++++++++++------------------
1 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 0b36f1b..6656145 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1783,7 +1783,6 @@ void ptrace_notify(int exit_code)
static int do_signal_stop(int signr)
{
struct signal_struct *sig = current->signal;
- int notify = 0;

if (!(current->group_stop & GROUP_STOP_PENDING)) {
unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
@@ -1813,29 +1812,37 @@ static int do_signal_stop(int signr)
} else
task_clear_group_stop(t);
}
- /*
- * If there are no other threads in the group, or if there is
- * a group stop in progress and we are the last to stop, report
- * to the parent. When ptraced, every thread reports itself.
- */
- if (task_participate_group_stop(current))
- notify = CLD_STOPPED;
- if (task_ptrace(current))
- notify = CLD_STOPPED;

current->exit_code = sig->group_exit_code;
__set_current_state(TASK_STOPPED);

- spin_unlock_irq(&current->sighand->siglock);
+ if (likely(!task_ptrace(current))) {
+ int notify = 0;

- if (notify) {
- read_lock(&tasklist_lock);
- do_notify_parent_cldstop(current, notify);
- read_unlock(&tasklist_lock);
- }
+ /*
+ * If there are no other threads in the group, or if there
+ * is a group stop in progress and we are the last to stop,
+ * report to the parent.
+ */
+ if (task_participate_group_stop(current))
+ notify = CLD_STOPPED;

- /* Now we don't run again until woken by SIGCONT or SIGKILL */
- schedule();
+ spin_unlock_irq(&current->sighand->siglock);
+
+ if (notify) {
+ read_lock(&tasklist_lock);
+ do_notify_parent_cldstop(current, notify);
+ read_unlock(&tasklist_lock);
+ }
+
+ /* Now we don't run again until woken by SIGCONT or SIGKILL */
+ schedule();
+
+ spin_lock_irq(&current->sighand->siglock);
+ } else
+ ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
+
+ spin_unlock_irq(&current->sighand->siglock);

tracehook_finish_jctl();
current->exit_code = 0;
--
1.7.1

2010-12-24 14:01:18

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 5/7] ptrace: participate in group stop from ptrace_stop() iff the task is trapping for group stop

Currently, ptrace_stop() unconditionally participates in group stop
bookkeeping. This is unnecessary and inaccurate. Make it only
participate if the task is trapping for group stop - ie. if @why is
CLD_STOPPED. As ptrace_stop() currently is not used when trapping for
group stop, this equals to disabling group stop participation from
ptrace_stop().

A visible behavior change is increased likelihood of delayed group
stop completion if the thread group contains one or more ptraced
tasks.

This is to preapre for further cleanup of the interaction between
group stop and ptrace.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
kernel/signal.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 7e85e42..0b36f1b 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1694,10 +1694,13 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
}

/*
- * If there is a group stop in progress,
- * we must participate in the bookkeeping.
+ * If @why is CLD_STOPPED, we're trapping to participate in a group
+ * stop. Do the bookkeeping. Note that if SIGCONT was delievered
+ * while siglock was released for the arch hook, PENDING could be
+ * clear now. We act as if SIGCONT is received after TASK_TRACED
+ * is entered - ignore it.
*/
- if (current->group_stop & GROUP_STOP_PENDING)
+ if (why == CLD_STOPPED && (current->group_stop & GROUP_STOP_PENDING))
task_participate_group_stop(current);

current->last_siginfo = info;
--
1.7.1

2010-12-24 14:01:32

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 7/7] ptrace: clean transitions between TASK_STOPPED and TRACED

Currently, if the task is STOPPED on ptrace attach, it's left alone
and the state is silently changed to TRACED on the next ptrace call.
The behavior breaks the assumption that arch_ptrace_stop() is called
before any task is poked by ptrace and is ugly in that a task
manipulates the state of another task directly.

With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
TRACED can be made clean. The tracer can use the flag to tell the
tracee to retry stop on attach and detach. On retry, the tracee will
enter the desired state in the correct way. The lower 16bits of
task->group_stop is used to remember the signal number which caused
the last group stop. This is used while retrying for ptrace attach as
the original group_exit_code could have been consumed with wait(2) by
then.

As the real parent may wait(2) and consume the group_exit_code
anytime, the group_exit_code needs to be saved separately so that it
can be used when switching from regular sleep to ptrace_stop(). This
is recorded in the lower 16bits of task->group_stop.

If a task is already stopped and there's no intervening SIGCONT, a
ptrace request immediately following a successful PTRACE_ATTACH should
always succeed even if the tracer doesn't wait(2) for attach
completion; however, with this change, the tracee might still be
TASK_RUNNING trying to enter TASK_TRACED which would cause the
following request to fail with -ESRCH.

This intermediate state is hidden from the ptracer by setting
GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait
for it to clear on its signal->wait_chldexit. Completing the
transition or any event which clears the group stop states of the task
clears the bit and wakes up the ptracer if waiting.

Note that the STOPPED -> RUNNING -> TRACED transition is still visible
to other threads which are in the same group as the ptracer and the
reverse transition is visible to all. Please read the comments for
details.

Oleg:

* Spotted a race condition where a task may retry group stop without
proper bookkeeping. Fixed by redoing bookkeeping on retry.

* Spotted that the transition is visible to userland in several
different ways. Most are fixed with GROUP_STOP_TRAPPING. Unhandled
corner case is documented.

* Pointed out not setting GROUP_STOP_SIGMASK on an already stopped
task would result in more consistent behavior.

* Pointed out that calling ptrace_stop() from do_signal_stop() in
TASK_STOPPED can race with group stop start logic and then confuse
the TRAPPING wait in ptrace_check_attach(). ptrace_stop() is now
called with TASK_RUNNING.

* Suggested using signal->wait_chldexit instead of bit wait.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Jan Kratochvil <[email protected]>
---
include/linux/sched.h | 2 +
kernel/ptrace.c | 49 ++++++++++++++++++++++++++++++---
kernel/signal.c | 72 ++++++++++++++++++++++++++++++++++++++++--------
3 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f97ba1b..8e6f9dc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1759,8 +1759,10 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
/*
* task->group_stop flags
*/
+#define GROUP_STOP_SIGMASK 0xffff /* signr of the last group stop */
#define GROUP_STOP_PENDING (1 << 16) /* task should stop for group stop */
#define GROUP_STOP_CONSUME (1 << 17) /* consume group stop count */
+#define GROUP_STOP_TRAPPING (1 << 18) /* switching from STOPPED to TRACED */

extern void task_clear_group_stop(struct task_struct *task);

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a8c9f26..31de220 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -49,14 +49,22 @@ static void ptrace_untrace(struct task_struct *child)
spin_lock(&child->sighand->siglock);
if (task_is_traced(child)) {
/*
- * If the group stop is completed or in progress,
- * this thread was already counted as stopped.
+ * If group stop is completed or in progress, it should
+ * participate in the group stop. Set GROUP_STOP_PENDING
+ * before kicking it.
+ *
+ * This involves TRACED -> RUNNING -> STOPPED transition
+ * which is similar to but in the opposite direction of
+ * what happens while attaching to a stopped task.
+ * However, in this direction, the intermediate RUNNING
+ * state is not hidden even from the current ptracer and if
+ * it immediately re-attaches and performs a WNOHANG
+ * wait(2), it may fail.
*/
if (child->signal->flags & SIGNAL_STOP_STOPPED ||
child->signal->group_stop_count)
- __set_task_state(child, TASK_STOPPED);
- else
- signal_wake_up(child, 1);
+ child->group_stop |= GROUP_STOP_PENDING;
+ signal_wake_up(child, 1);
}
spin_unlock(&child->sighand->siglock);
}
@@ -165,6 +173,7 @@ bool ptrace_may_access(struct task_struct *task, unsigned int mode)

int ptrace_attach(struct task_struct *task)
{
+ bool wait_trap = false;
int retval;

audit_ptrace(task);
@@ -204,12 +213,42 @@ int ptrace_attach(struct task_struct *task)
__ptrace_link(task, current);
send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);

+ spin_lock(&task->sighand->siglock);
+
+ /*
+ * If the task is already STOPPED, set GROUP_STOP_PENDING and
+ * TRAPPING, and kick it so that it transits to TRACED. TRAPPING
+ * will be cleared if the child completes the transition or any
+ * event which clears the group stop states happens. We'll wait
+ * for the transition to complete before returning from this
+ * function.
+ *
+ * This hides STOPPED -> RUNNING -> TRACED transition from the
+ * attaching thread but a different thread in the same group can
+ * still observe the transient RUNNING state. IOW, if another
+ * thread's WNOHANG wait(2) on the stopped tracee races against
+ * ATTACH, the wait(2) may fail due to the transient RUNNING.
+ *
+ * The following task_is_stopped() test is safe as both transitions
+ * in and out of STOPPED are protected by siglock.
+ */
+ if (task_is_stopped(task)) {
+ task->group_stop |= GROUP_STOP_PENDING | GROUP_STOP_TRAPPING;
+ signal_wake_up(task, 1);
+ wait_trap = true;
+ }
+
+ spin_unlock(&task->sighand->siglock);
+
retval = 0;
unlock_tasklist:
write_unlock_irq(&tasklist_lock);
unlock_creds:
mutex_unlock(&task->signal->cred_guard_mutex);
out:
+ if (wait_trap)
+ wait_event(current->signal->wait_chldexit,
+ !(task->group_stop & GROUP_STOP_TRAPPING));
return retval;
}

diff --git a/kernel/signal.c b/kernel/signal.c
index 6656145..574c3e9 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -224,10 +224,31 @@ static inline void print_dropped_signal(int sig)
}

/**
+ * task_clear_group_stop_trapping - clear group stop trapping bit
+ * @task: target task
+ *
+ * If GROUP_STOP_TRAPPING is set, a ptracer is waiting for us. Clear it
+ * and wake up the ptracer. Note that we don't need any further locking.
+ * @task->siglock guarantees that @task->parent points to the ptracer.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static void task_clear_group_stop_trapping(struct task_struct *task)
+{
+ if (unlikely(task->group_stop & GROUP_STOP_TRAPPING)) {
+ task->group_stop &= ~GROUP_STOP_TRAPPING;
+ __wake_up_sync(&task->parent->signal->wait_chldexit,
+ TASK_UNINTERRUPTIBLE, 1);
+ }
+}
+
+/**
* task_clear_group_stop - clear pending group stop
* @task: target task
*
- * Clear group stop states for @task.
+ * Clear group stop pending state for @task. All group stop states except
+ * for the recorded last stop signal are cleared.
*
* CONTEXT:
* Must be called with @task->sighand->siglock held.
@@ -235,6 +256,7 @@ static inline void print_dropped_signal(int sig)
void task_clear_group_stop(struct task_struct *task)
{
task->group_stop &= ~(GROUP_STOP_PENDING | GROUP_STOP_CONSUME);
+ task_clear_group_stop_trapping(task);
}

/**
@@ -1694,6 +1716,14 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
}

/*
+ * We're committing to trapping. Clearing GROUP_STOP_TRAPPING and
+ * transition to TASK_TRACED should be atomic with respect to
+ * siglock. Do it after the arch hook as siglock is released and
+ * regrabbed across it.
+ */
+ task_clear_group_stop_trapping(current);
+
+ /*
* If @why is CLD_STOPPED, we're trapping to participate in a group
* stop. Do the bookkeeping. Note that if SIGCONT was delievered
* while siglock was released for the arch hook, PENDING could be
@@ -1788,6 +1818,9 @@ static int do_signal_stop(int signr)
unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
struct task_struct *t;

+ /* signr will be recorded in task->group_stop for retries */
+ WARN_ON_ONCE(signr & ~GROUP_STOP_SIGMASK);
+
if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
unlikely(signal_group_exit(sig)))
return 0;
@@ -1797,25 +1830,27 @@ static int do_signal_stop(int signr)
*/
sig->group_exit_code = signr;

- current->group_stop = gstop;
+ current->group_stop &= ~GROUP_STOP_SIGMASK;
+ current->group_stop |= signr | gstop;
sig->group_stop_count = 1;
- for (t = next_thread(current); t != current; t = next_thread(t))
+ for (t = next_thread(current); t != current;
+ t = next_thread(t)) {
+ t->group_stop &= ~GROUP_STOP_SIGMASK;
/*
* Setting state to TASK_STOPPED for a group
* stop is always done with the siglock held,
* so this check has no races.
*/
if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
- t->group_stop = gstop;
+ t->group_stop |= signr | gstop;
sig->group_stop_count++;
signal_wake_up(t, 0);
- } else
+ } else {
task_clear_group_stop(t);
+ }
+ }
}
-
- current->exit_code = sig->group_exit_code;
- __set_current_state(TASK_STOPPED);
-
+retry:
if (likely(!task_ptrace(current))) {
int notify = 0;

@@ -1827,6 +1862,7 @@ static int do_signal_stop(int signr)
if (task_participate_group_stop(current))
notify = CLD_STOPPED;

+ __set_current_state(TASK_STOPPED);
spin_unlock_irq(&current->sighand->siglock);

if (notify) {
@@ -1839,13 +1875,25 @@ static int do_signal_stop(int signr)
schedule();

spin_lock_irq(&current->sighand->siglock);
- } else
- ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
+ } else {
+ ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
+ CLD_STOPPED, 0, NULL);
+ current->exit_code = 0;
+ }
+
+ /*
+ * GROUP_STOP_PENDING could be set if another group stop has
+ * started since being woken up or ptrace wants us to transit
+ * between TASK_STOPPED and TRACED. Retry group stop.
+ */
+ if (current->group_stop & GROUP_STOP_PENDING) {
+ WARN_ON_ONCE(!(current->group_stop & GROUP_STOP_SIGMASK));
+ goto retry;
+ }

spin_unlock_irq(&current->sighand->siglock);

tracehook_finish_jctl();
- current->exit_code = 0;

return 1;
}
--
1.7.1

2010-12-24 14:01:59

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 4/7] signal: use GROUP_STOP_PENDING to stop once for a single group stop

Currently task->signal->group_stop_count is used to decide whether to
stop for group stop. However, if there is a task in the group which
is taking a long time to stop, other tasks which are continued by
ptrace would repeatedly stop for the same group stop until the group
stop is complete.

Conversely, if a ptraced task is in TASK_TRACED state, the debugger
won't get notified of group stops which is inconsistent compared to
the ptraced task in any other state.

This patch introduces GROUP_STOP_PENDING which tracks whether a task
is yet to stop for the group stop in progress. The flag is set when a
group stop starts and cleared when the task stops the first time for
the group stop, and consulted whenever whether the task should
participate in a group stop needs to be determined. Note that now
tasks in TASK_TRACED also participate in group stop.

This results in the following behavior changes.

* For a single group stop, a ptracer would see at most one stop
reported.

* A ptracee in TASK_TRACED now also participates in group stop and the
tracer would get the notification. However, as a ptraced task could
be in TASK_STOPPED state or any ptrace trap could consume group
stop, the notification may still be missing. These will be
addressed with further patches.

* A ptracee may start a group stop while one is still in progress if
the tracer let it continue with stop signal delivery. Group stop
code handles this correctly.

Oleg:

* Spotted that a task might skip signal check even when its
GROUP_STOP_PENDING is set. Fixed by updating
recalc_sigpending_tsk() to check GROUP_STOP_PENDING instead of
group_stop_count.

* Pointed out that task->group_stop should be cleared whenever
task->signal->group_stop_count is cleared. Fixed accordingly.

* Pointed out the behavior inconsistency between TASK_TRACED and
RUNNING and the last behavior change.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
fs/exec.c | 1 +
include/linux/sched.h | 3 +++
kernel/signal.c | 36 +++++++++++++++++++++---------------
3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index c62efcb..03cafa3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1653,6 +1653,7 @@ static int zap_process(struct task_struct *start, int exit_code)

t = start;
do {
+ task_clear_group_stop(t);
if (t != current && t->mm) {
sigaddset(&t->pending.signal, SIGKILL);
signal_wake_up(t, 1);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4790cf9..f97ba1b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1759,8 +1759,11 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
/*
* task->group_stop flags
*/
+#define GROUP_STOP_PENDING (1 << 16) /* task should stop for group stop */
#define GROUP_STOP_CONSUME (1 << 17) /* consume group stop count */

+extern void task_clear_group_stop(struct task_struct *task);
+
#ifdef CONFIG_PREEMPT_RCU

#define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
diff --git a/kernel/signal.c b/kernel/signal.c
index 89aad36..7e85e42 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -124,7 +124,7 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)

static int recalc_sigpending_tsk(struct task_struct *t)
{
- if (t->signal->group_stop_count > 0 ||
+ if ((t->group_stop & GROUP_STOP_PENDING) ||
PENDING(&t->pending, &t->blocked) ||
PENDING(&t->signal->shared_pending, &t->blocked)) {
set_tsk_thread_flag(t, TIF_SIGPENDING);
@@ -232,19 +232,19 @@ static inline void print_dropped_signal(int sig)
* CONTEXT:
* Must be called with @task->sighand->siglock held.
*/
-static void task_clear_group_stop(struct task_struct *task)
+void task_clear_group_stop(struct task_struct *task)
{
- task->group_stop &= ~GROUP_STOP_CONSUME;
+ task->group_stop &= ~(GROUP_STOP_PENDING | GROUP_STOP_CONSUME);
}

/**
* task_participate_group_stop - participate in a group stop
* @task: task participating in a group stop
*
- * @task is participating in a group stop. Group stop states are cleared
- * and the group stop count is consumed if %GROUP_STOP_CONSUME was set. If
- * the consumption completes the group stop, the appropriate %SIGNAL_*
- * flags are set.
+ * @task has GROUP_STOP_PENDING set and is participating in a group stop.
+ * Group stop states are cleared and the group stop count is consumed if
+ * %GROUP_STOP_CONSUME was set. If the consumption completes the group
+ * stop, the appropriate %SIGNAL_* flags are set.
*
* CONTEXT:
* Must be called with @task->sighand->siglock held.
@@ -254,6 +254,8 @@ static bool task_participate_group_stop(struct task_struct *task)
struct signal_struct *sig = task->signal;
bool consume = task->group_stop & GROUP_STOP_CONSUME;

+ WARN_ON_ONCE(!(task->group_stop & GROUP_STOP_PENDING));
+
task_clear_group_stop(task);

if (!consume)
@@ -765,6 +767,9 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
t = p;
do {
unsigned int state;
+
+ task_clear_group_stop(t);
+
rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
/*
* If there is a handler for SIGCONT, we must make
@@ -906,6 +911,7 @@ static void complete_signal(int sig, struct task_struct *p, int group)
signal->group_stop_count = 0;
t = p;
do {
+ task_clear_group_stop(t);
sigaddset(&t->pending.signal, SIGKILL);
signal_wake_up(t, 1);
} while_each_thread(p, t);
@@ -1139,6 +1145,7 @@ int zap_other_threads(struct task_struct *p)
p->signal->group_stop_count = 0;

while_each_thread(p, t) {
+ task_clear_group_stop(t);
count++;

/* Don't bother with already dead threads */
@@ -1690,7 +1697,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
* If there is a group stop in progress,
* we must participate in the bookkeeping.
*/
- if (current->signal->group_stop_count > 0)
+ if (current->group_stop & GROUP_STOP_PENDING)
task_participate_group_stop(current);

current->last_siginfo = info;
@@ -1775,8 +1782,8 @@ static int do_signal_stop(int signr)
struct signal_struct *sig = current->signal;
int notify = 0;

- if (!sig->group_stop_count) {
- unsigned int gstop = GROUP_STOP_CONSUME;
+ if (!(current->group_stop & GROUP_STOP_PENDING)) {
+ unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
struct task_struct *t;

if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
@@ -1796,8 +1803,7 @@ static int do_signal_stop(int signr)
* stop is always done with the siglock held,
* so this check has no races.
*/
- if (!(t->flags & PF_EXITING) &&
- !task_is_stopped_or_traced(t)) {
+ if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
t->group_stop = gstop;
sig->group_stop_count++;
signal_wake_up(t, 0);
@@ -1926,8 +1932,8 @@ relock:
if (unlikely(signr != 0))
ka = return_ka;
else {
- if (unlikely(signal->group_stop_count > 0) &&
- do_signal_stop(0))
+ if (unlikely(current->group_stop &
+ GROUP_STOP_PENDING) && do_signal_stop(0))
goto relock;

signr = dequeue_signal(current, &current->blocked,
@@ -2073,7 +2079,7 @@ void exit_signals(struct task_struct *tsk)
if (!signal_pending(t) && !(t->flags & PF_EXITING))
recalc_sigpending_and_wake(t);

- if (unlikely(tsk->signal->group_stop_count) &&
+ if (unlikely(tsk->group_stop & GROUP_STOP_PENDING) &&
task_participate_group_stop(tsk))
group_stop = CLD_STOPPED;
out:
--
1.7.1

2010-12-24 14:02:01

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 3/7] signal: fix premature completion of group stop when interfered by ptrace

task->signal->group_stop_count is used to track the progress of group
stop. It's initialized to the number of tasks which need to stop for
group stop to finish and each stopping or trapping task decrements.
However, each task doesn't keep track of whether it decremented the
counter or not and if woken up before the group stop is complete and
stops again, it can decrement the counter multiple times.

Please consider the following example code.

static void *worker(void *arg)
{
while (1) ;
return NULL;
}

int main(void)
{
pthread_t thread;
pid_t pid;
int i;

pid = fork();
if (!pid) {
for (i = 0; i < 5; i++)
pthread_create(&thread, NULL, worker, NULL);
while (1) ;
return 0;
}

ptrace(PTRACE_ATTACH, pid, NULL, NULL);
while (1) {
waitid(P_PID, pid, NULL, WSTOPPED);
ptrace(PTRACE_SINGLESTEP, pid, NULL, (void *)(long)SIGSTOP);
}
return 0;
}

The child creates five threads and the parent continuously traps the
first thread and whenever the child gets a signal, SIGSTOP is
delivered. If an external process sends SIGSTOP to the child, all
other threads in the process should reliably stop. However, due to
the above bug, the first thread will often end up consuming
group_stop_count multiple times and SIGSTOP often ends up stopping
none or part of the other four threads.

This patch adds a new field task->group_stop which is protected by
siglock and uses GROUP_STOP_CONSUME flag to track which task is still
to consume group_stop_count to fix this bug.

task_clear_group_stop() and task_participate_group_stop() are added to
help manipulating group stop states. As ptrace_stop() now also uses
task_participate_group_stop(), it will set SIGNAL_STOP_STOPPED if it
completes a group stop.

There still are many issues regarding the interaction between group
stop and ptrace. Patches to address them will follow.

- Oleg spotted duplicate GROUP_STOP_CONSUME. Dropped.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
include/linux/sched.h | 6 ++++
kernel/signal.c | 62 ++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 653644d..4790cf9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1244,6 +1244,7 @@ struct task_struct {
int exit_state;
int exit_code, exit_signal;
int pdeath_signal; /* The signal sent when the parent dies */
+ unsigned int group_stop; /* GROUP_STOP_*, siglock protected */
/* ??? */
unsigned int personality;
unsigned did_exec:1;
@@ -1755,6 +1756,11 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
#define used_math() tsk_used_math(current)

+/*
+ * task->group_stop flags
+ */
+#define GROUP_STOP_CONSUME (1 << 17) /* consume group stop count */
+
#ifdef CONFIG_PREEMPT_RCU

#define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
diff --git a/kernel/signal.c b/kernel/signal.c
index 4569801..89aad36 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -223,6 +223,52 @@ static inline void print_dropped_signal(int sig)
current->comm, current->pid, sig);
}

+/**
+ * task_clear_group_stop - clear pending group stop
+ * @task: target task
+ *
+ * Clear group stop states for @task.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static void task_clear_group_stop(struct task_struct *task)
+{
+ task->group_stop &= ~GROUP_STOP_CONSUME;
+}
+
+/**
+ * task_participate_group_stop - participate in a group stop
+ * @task: task participating in a group stop
+ *
+ * @task is participating in a group stop. Group stop states are cleared
+ * and the group stop count is consumed if %GROUP_STOP_CONSUME was set. If
+ * the consumption completes the group stop, the appropriate %SIGNAL_*
+ * flags are set.
+ *
+ * CONTEXT:
+ * Must be called with @task->sighand->siglock held.
+ */
+static bool task_participate_group_stop(struct task_struct *task)
+{
+ struct signal_struct *sig = task->signal;
+ bool consume = task->group_stop & GROUP_STOP_CONSUME;
+
+ task_clear_group_stop(task);
+
+ if (!consume)
+ return false;
+
+ if (!WARN_ON_ONCE(sig->group_stop_count == 0))
+ sig->group_stop_count--;
+
+ if (!sig->group_stop_count) {
+ sig->flags = SIGNAL_STOP_STOPPED;
+ return true;
+ }
+ return false;
+}
+
/*
* allocate a new signal queue record
* - this may be called without locks if and only if t == current, otherwise an
@@ -1645,7 +1691,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
* we must participate in the bookkeeping.
*/
if (current->signal->group_stop_count > 0)
- --current->signal->group_stop_count;
+ task_participate_group_stop(current);

current->last_siginfo = info;
current->exit_code = exit_code;
@@ -1730,6 +1776,7 @@ static int do_signal_stop(int signr)
int notify = 0;

if (!sig->group_stop_count) {
+ unsigned int gstop = GROUP_STOP_CONSUME;
struct task_struct *t;

if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
@@ -1741,6 +1788,7 @@ static int do_signal_stop(int signr)
*/
sig->group_exit_code = signr;

+ current->group_stop = gstop;
sig->group_stop_count = 1;
for (t = next_thread(current); t != current; t = next_thread(t))
/*
@@ -1750,19 +1798,19 @@ static int do_signal_stop(int signr)
*/
if (!(t->flags & PF_EXITING) &&
!task_is_stopped_or_traced(t)) {
+ t->group_stop = gstop;
sig->group_stop_count++;
signal_wake_up(t, 0);
- }
+ } else
+ task_clear_group_stop(t);
}
/*
* If there are no other threads in the group, or if there is
* a group stop in progress and we are the last to stop, report
* to the parent. When ptraced, every thread reports itself.
*/
- if (!--sig->group_stop_count) {
- sig->flags = SIGNAL_STOP_STOPPED;
+ if (task_participate_group_stop(current))
notify = CLD_STOPPED;
- }
if (task_ptrace(current))
notify = CLD_STOPPED;

@@ -2026,10 +2074,8 @@ void exit_signals(struct task_struct *tsk)
recalc_sigpending_and_wake(t);

if (unlikely(tsk->signal->group_stop_count) &&
- !--tsk->signal->group_stop_count) {
- tsk->signal->flags = SIGNAL_STOP_STOPPED;
+ task_participate_group_stop(tsk))
group_stop = CLD_STOPPED;
- }
out:
spin_unlock_irq(&tsk->sighand->siglock);

--
1.7.1

2010-12-24 14:02:27

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 1/7] clone: kill CLONE_STOPPED

CLONE_STOPPED has been deprecated and generating warning messages
since 2.6.25 with recycling scheduled for 2.6.26. Remove it to
prepare for signal stop / ptrace cleanup.

For more details, please refer to the commit bdff746a (clone: prepare
to recycle CLONE_STOPPED).

Signed-off-by: Tejun Heo <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
include/linux/sched.h | 1 -
kernel/fork.c | 28 +---------------------------
2 files changed, 1 insertions(+), 28 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2238745..653644d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -21,7 +21,6 @@
#define CLONE_DETACHED 0x00400000 /* Unused, ignored */
#define CLONE_UNTRACED 0x00800000 /* set if the tracing process can't force CLONE_PTRACE on this clone */
#define CLONE_CHILD_SETTID 0x01000000 /* set the TID in the child */
-#define CLONE_STOPPED 0x02000000 /* Start in stopped state */
#define CLONE_NEWUTS 0x04000000 /* New utsname group? */
#define CLONE_NEWIPC 0x08000000 /* New ipcs */
#define CLONE_NEWUSER 0x10000000 /* New user namespace */
diff --git a/kernel/fork.c b/kernel/fork.c
index 5447dc7..0d38381 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1408,23 +1408,6 @@ long do_fork(unsigned long clone_flags,
}

/*
- * We hope to recycle these flags after 2.6.26
- */
- if (unlikely(clone_flags & CLONE_STOPPED)) {
- static int __read_mostly count = 100;
-
- if (count > 0 && printk_ratelimit()) {
- char comm[TASK_COMM_LEN];
-
- count--;
- printk(KERN_INFO "fork(): process `%s' used deprecated "
- "clone flags 0x%lx\n",
- get_task_comm(comm, current),
- clone_flags & CLONE_STOPPED);
- }
- }
-
- /*
* When called from kernel_thread, don't do user tracing stuff.
*/
if (likely(user_mode(regs)))
@@ -1462,16 +1445,7 @@ long do_fork(unsigned long clone_flags,
*/
p->flags &= ~PF_STARTING;

- if (unlikely(clone_flags & CLONE_STOPPED)) {
- /*
- * We'll start up with an immediate SIGSTOP.
- */
- sigaddset(&p->pending.signal, SIGSTOP);
- set_tsk_thread_flag(p, TIF_SIGPENDING);
- __set_task_state(p, TASK_STOPPED);
- } else {
- wake_up_new_task(p, clone_flags);
- }
+ wake_up_new_task(p, clone_flags);

tracehook_report_clone_complete(trace, regs,
clone_flags, nr, p);
--
1.7.1

2011-02-01 10:35:26

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCHSET RFC] ptrace,signal: clean transition between STOPPED and TRACED

Hello,

On Mon, Jan 31, 2011 at 05:07:24PM +0100, Tejun Heo wrote:
> I thought it was the first one but might be mistaken. I left it
> running for a couple of minutes several times but maybe it just needs
> more time. I'll try again.

So, it turned out my computer was too slow and/or I was too impatient.
make xcheck behaved the same before and after the patchset. All six
passed with one skipped.

Thanks.

--
tejun

2011-02-02 05:39:36

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCHSET RFC] ptrace,signal: clean transition between STOPPED and TRACED

> Also, the first test of xcheck seems to enter infinite loop.

It's a test for race-condition bugs, where the original test was to run the
infinite loop until it crashed. As now written, it runs the loop until
$TESTTIME seconds have passed, defaulting to 10. With various buggy
kernels in various configurations and on various machines, the time it
takes to get to a crash has varied widely.

> I see. Yeah, if there are users which expect /proc/pid/status to be
> certain value, we can either emulate it or delay TRACED transition to
> the next PTRACE call *after* ATTACH/wait(2) sequence, but I think both
> are quite ugly and would like to avoid if at all possible.

I agree. We just need to be more clearly sure about userland requirements
before we make changes. I hope the answer is that nothing other than these
synthetic test cases actually relies on which one /proc/pid/status reports.
(That is, if they do check it, they only notice the "T".)


Thanks,
Roland