2010-11-26 10:50:30

by Tejun Heo

Subject: [PATCHSET RFC] ptrace,signal: sane interaction between ptrace and job control signals

Hello,

This patchset is an attempt at cleaning up ptrace and signal
behaviors, especially the interaction between ptrace and group stop.
There are quite a few problems in this area, and the current behavior
is often racy, nondeterministic and sometimes outright buggy. This
patchset aims to clean up the muddy parts, and to define and
implement a clear interaction between ptrace and job control.

This patchset contains the following fourteen patches.

0001-signal-fix-SIGCONT-notification-code.patch
0002-freezer-fix-a-race-during-freezing-of-TASK_STOPPED-t.patch
0003-freezer-remove-superflous-try_to_freeze-loop-in-do_s.patch
0004-signal-don-t-notify-parent-if-not-stopping-after-tra.patch
0005-signal-fix-premature-completion-of-group-stop-when-i.patch
0006-signal-use-GROUP_STOP_PENDING-to-avoid-stopping-mult.patch
0007-ptrace-add-why-to-ptrace_stop.patch
0008-ptrace-make-do_signal_stop-use-ptrace_stop-if-the-ta.patch
0009-ptrace-clean-transitions-between-TASK_STOPPED-and-TR.patch
0010-ptrace-don-t-consume-group-count-from-ptrace_stop.patch
0011-ptrace-make-group-stop-notification-reliable-against.patch
0012-ptrace-reorganize-__ptrace_unlink-and-ptrace_untrace.patch
0013-ptrace-make-SIGCONT-notification-reliable-against-pt.patch
0014-ptrace-remove-the-extra-wake_up_process-from-ptrace_.patch

0001-0002 are self-contained fixes. 0003-0004 are preparation
patches.

0005 fixes over-consumption of group_stop_count by ptraced tasks which
can lead to premature completion of group stop with some of the tasks
in the group still running.

0006 prevents a ptraced task from being stopped multiple times for the
same group stop instance.

0007-0009 update the code such that a ptracee always enters and
leaves TASK_TRACED on its own. The ptracer no longer changes the
tracee's state underneath it; instead, it tells the tracee to enter
the target state. A TASK_TRACED task is guaranteed to be stopped
inside ptrace_stop() after executing the arch hooks, while a
TASK_STOPPED task is guaranteed to be stopped in do_signal_stop().

0010-0013 make CLD_STOPPED/CLD_CONTINUED notification reliable
against intervening ptrace. Whether a ptracee owes a notification to
its real parent is tracked, and the real parent is notified
accordingly on detach.

0014 kills the unnecessary wake_up_process(), which is wrong in many
ways, including the possibility of abruptly waking up a task in an
uninterruptible sleep. Now that the ptrace / job control interaction
is cleaned up, it really should go.

After the patchset, the interaction between ptrace and job control is
defined as,

* Regardless of ptrace, job control signals control the process-wide
stopped state. A ptracee receives and handles job control signals
the same as before, but no matter what ptrace does, the process-wide
state is unaffected.

* On ptrace detach, the task is put into the state which matches the
process-wide stopped state. If necessary, notification to the real
parent is reinstated.

Due to the implicit SIGSTOP sent on PTRACE_ATTACH, ptrace
attach/detach is still not transparent w.r.t. job control, but these
changes lay the groundwork for (almost) transparent ptracing.

The branch is on top of 2.6.37-rc3 and available in the following git
branch,

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git ptrace

and contains the following changes.

 include/linux/sched.h  |  11 +++
 kernel/freezer.c       |   9 ++-
 kernel/power/process.c |   6 ++
 kernel/ptrace.c        | 105 +++++++++++++++++++++++++-----------
 kernel/signal.c        | 140 +++++++++++++++++++++++++++++++++----------------
 5 files changed, 194 insertions(+), 77 deletions(-)

Thanks.

--
tejun


2010-11-26 10:50:10

by Tejun Heo

Subject: [PATCH 12/14] ptrace: reorganize __ptrace_unlink() and ptrace_untrace()

* Collapse ptrace_untrace() into __ptrace_unlink().

* Always do the whole unlinking inside siglock.

* Untracing is done before unlinking.

This is to prepare for further changes. As the whole
unlinking/untracing is done while holding both the tasklist lock and
siglock, the reordering doesn't cause any visible behavior
difference.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
kernel/ptrace.c | 40 ++++++++++++++++++----------------------
1 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 08b18f2..71141bf 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -38,45 +38,41 @@ void __ptrace_link(struct task_struct *child, struct task_struct *new_parent)
}

/*
- * Turn a tracing stop into a normal stop now, since with no tracer there
- * would be no way to wake it up with SIGCONT or SIGKILL. If there was a
- * signal sent that would resume the child, but didn't because it was in
- * TASK_TRACED, resume it now.
- * Requires that irqs be disabled.
+ * unptrace a task: move it back to its original parent and remove it
+ * from the ptrace list.
+ *
+ * Turn a tracing stop into a normal stop now, since with no tracer
+ * there would be no way to wake it up with SIGCONT or SIGKILL. If
+ * there was a signal sent that would resume the child, but didn't
+ * because it was in TASK_TRACED, resume it now. Requires that irqs
+ * be disabled.
+ *
+ * Must be called with the tasklist lock write-held.
*/
-static void ptrace_untrace(struct task_struct *child)
+void __ptrace_unlink(struct task_struct *child)
{
+ struct signal_struct *sig = child->signal;
+
+ BUG_ON(!child->ptrace);
+
spin_lock(&child->sighand->siglock);
+
if (task_is_traced(child)) {
/*
* If group stop is completed or in progress, the task
* should enter group stop. Set GROUP_STOP_PENDING
* before kicking it.
*/
- if (child->signal->flags & SIGNAL_STOP_STOPPED ||
- child->signal->group_stop_count)
+ if (sig->flags & SIGNAL_STOP_STOPPED || sig->group_stop_count)
child->group_stop |= GROUP_STOP_PENDING;
signal_wake_up(child, 1);
}
- spin_unlock(&child->sighand->siglock);
-}
-
-/*
- * unptrace a task: move it back to its original parent and
- * remove it from the ptrace list.
- *
- * Must be called with the tasklist lock write-held.
- */
-void __ptrace_unlink(struct task_struct *child)
-{
- BUG_ON(!child->ptrace);

child->ptrace = 0;
child->parent = child->real_parent;
list_del_init(&child->ptrace_entry);

- if (task_is_traced(child))
- ptrace_untrace(child);
+ spin_unlock(&child->sighand->siglock);
}

/*
--
1.7.1

2010-11-26 10:50:21

by Tejun Heo

Subject: [PATCH 08/14] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced

A ptraced task currently stops at do_signal_stop() when stopping for
stop signals, and do_signal_stop() behaves the same whether the task
is ptraced or not. However, in addition to stopping, ptrace_stop()
also does ptrace-specific work, such as calling the architecture
specific callbacks, so this split makes the code fragile and
difficult to understand.

This patch makes do_signal_stop() test whether the task is ptraced
and use ptrace_stop() if so. This renders tracehook_notify_jctl()
rather pointless, as the ptrace notification is now handled by
ptrace_stop() regardless of the return value from the tracehook; it
would probably be a good idea to update it.

This doesn't solve the whole problem as tasks already in stopped state
would stay in the regular stop when ptrace attached. That part will
be handled by the next patch.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
kernel/signal.c | 31 ++++++++++++++++++-------------
1 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 3b3e69c..6d78da6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1788,10 +1788,8 @@ static int do_signal_stop(int signr)
* we test GROUP_STOP_PENDING again. If SIGCONT or SIGKILL
* comes in between, it would be clear.
*/
- if (!(current->group_stop & GROUP_STOP_PENDING)) {
- spin_unlock_irq(&current->sighand->siglock);
- goto out;
- }
+ if (!(current->group_stop & GROUP_STOP_PENDING))
+ goto out_unlock;

if (consume_group_stop())
sig->flags = SIGNAL_STOP_STOPPED;
@@ -1800,17 +1798,24 @@ static int do_signal_stop(int signr)
current->group_stop &= ~GROUP_STOP_PENDING;
__set_current_state(TASK_STOPPED);

- spin_unlock_irq(&current->sighand->siglock);
+ if (likely(!task_ptrace(current))) {
+ spin_unlock_irq(&current->sighand->siglock);

- if (notify) {
- read_lock(&tasklist_lock);
- do_notify_parent_cldstop(current, notify);
- read_unlock(&tasklist_lock);
- }
+ if (notify) {
+ read_lock(&tasklist_lock);
+ do_notify_parent_cldstop(current, notify);
+ read_unlock(&tasklist_lock);
+ }
+
+ /* Now we don't run again until woken by SIGCONT or SIGKILL */
+ schedule();
+
+ spin_lock_irq(&current->sighand->siglock);
+ } else
+ ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
+out_unlock:
+ spin_unlock_irq(&current->sighand->siglock);

- /* Now we don't run again until woken by SIGCONT or SIGKILL */
- schedule();
-out:
tracehook_finish_jctl();
current->exit_code = 0;

--
1.7.1

2010-11-26 10:50:28

by Tejun Heo

Subject: [PATCH 01/14] signal: fix SIGCONT notification code

After a task receives SIGCONT, its parent is notified via SIGCHLD
with a siginfo describing the event. If SIGCONT is received while
the child process is stopped, the code should be CLD_CONTINUED. If
SIGCONT is received while the child process is in the process of
being stopped, it should be CLD_STOPPED. Which code to use is
determined in prepare_signal() and recorded in signal->flags using
the SIGNAL_CLD_CONTINUED and SIGNAL_CLD_STOPPED flags.

get_signal_to_deliver() should test these flags and notify
accordingly; however, it incorrectly tested SIGNAL_STOP_CONTINUED
instead of SIGNAL_CLD_CONTINUED, thus incorrectly notifying
CLD_CONTINUED if the signal is delivered before the task is
wait(2)ed and CLD_STOPPED if the state was fetched already.

Fix it by testing SIGNAL_CLD_CONTINUED. While at it, expand the ?:
test into an if/else clause for better readability.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
kernel/signal.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 4e3cff1..fe004b5 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1853,8 +1853,13 @@ relock:
* the CLD_ si_code into SIGNAL_CLD_MASK bits.
*/
if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
- int why = (signal->flags & SIGNAL_STOP_CONTINUED)
- ? CLD_CONTINUED : CLD_STOPPED;
+ int why;
+
+ if (signal->flags & SIGNAL_CLD_CONTINUED)
+ why = CLD_CONTINUED;
+ else
+ why = CLD_STOPPED;
+
signal->flags &= ~SIGNAL_CLD_MASK;

why = tracehook_notify_jctl(why, CLD_CONTINUED);
--
1.7.1

2010-11-26 10:50:41

by Tejun Heo

Subject: [PATCH 03/14] freezer: remove superflous try_to_freeze() loop in do_signal_stop()

do_signal_stop() is used only by get_signal_to_deliver(), which
always calls try_to_freeze() after a successful signal stop, so the
try_to_freeze() loop around schedule() in do_signal_stop() is
superfluous and confusing. Remove it.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
---
kernel/signal.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index fe004b5..0a6816a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1781,9 +1781,7 @@ static int do_signal_stop(int signr)
}

/* Now we don't run again until woken by SIGCONT or SIGKILL */
- do {
- schedule();
- } while (try_to_freeze());
+ schedule();

tracehook_finish_jctl();
current->exit_code = 0;
--
1.7.1

2010-11-26 10:50:36

by Tejun Heo

Subject: [PATCH 02/14] freezer: fix a race during freezing of TASK_STOPPED tasks

After calling freeze_task(), try_to_freeze_tasks() sees whether the
task is stopped or traced and, if so, considers it frozen; however,
nothing guarantees that either the task being frozen sees TIF_FREEZE
or that the freezer sees the TASK_STOPPED -> TASK_RUNNING
transition. The task being frozen may wake up without seeing
TIF_FREEZE while the freezer fails to notice the transition and
believes the task is still stopped.

This patch fixes the race by making freeze_task() always go through
fake_signal_wake_up() for applicable tasks. That function takes the
target task's scheduler lock and thus guarantees that either the
target sees TIF_FREEZE or try_to_freeze_tasks() sees TASK_RUNNING.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
---
kernel/freezer.c | 9 +++++++--
kernel/power/process.c | 6 ++++++
2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index bd1d42b..66ecd2e 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -104,8 +104,13 @@ bool freeze_task(struct task_struct *p, bool sig_only)
}

if (should_send_signal(p)) {
- if (!signal_pending(p))
- fake_signal_wake_up(p);
+ fake_signal_wake_up(p);
+ /*
+ * fake_signal_wake_up() goes through p's scheduler
+ * lock and guarantees that TASK_STOPPED/TRACED ->
+ * TASK_RUNNING transition can't race with task state
+ * testing in try_to_freeze_tasks().
+ */
} else if (sig_only) {
return false;
} else {
diff --git a/kernel/power/process.c b/kernel/power/process.c
index e50b4c1..eb2c88a 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -64,6 +64,12 @@ static int try_to_freeze_tasks(bool sig_only)
* perturb a task in TASK_STOPPED or TASK_TRACED.
* It is "frozen enough". If the task does wake
* up, it will immediately call try_to_freeze.
+ *
+ * Because freeze_task() goes through p's
+ * scheduler lock after setting TIF_FREEZE, it's
+ * guaranteed that either we see TASK_RUNNING or
+ * try_to_stop() after schedule() in ptrace/signal
+ * stop sees TIF_FREEZE.
*/
if (!task_is_stopped_or_traced(p) &&
!freezer_should_skip(p))
--
1.7.1

2010-11-26 10:50:26

by Tejun Heo

Subject: [PATCH 10/14] ptrace: don't consume group count from ptrace_stop()

Now that group stop pending and consumed states are properly tracked
per task, there is no need for ptrace_stop() to consume
group_stop_count, which was inaccurate anyway.

The only behavior change is an increased likelihood of missing group
stop notifications if the thread group contains one or more ptraced
tasks; such notifications never were reliable in the presence of
ptraced tasks anyway. With this change, consume_group_stop() is
called only if a group stop is pending. Make sure that holds by
adding WARN_ON_ONCE().

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
kernel/signal.c | 9 ++-------
1 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 8341667..c084ea8 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -225,6 +225,8 @@ static inline void print_dropped_signal(int sig)

static bool consume_group_stop(void)
{
+ WARN_ON_ONCE(!(current->group_stop & GROUP_STOP_PENDING));
+
if (!(current->group_stop & GROUP_STOP_CONSUME))
return false;

@@ -1657,13 +1659,6 @@ static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
return;
}

- /*
- * If there is a group stop in progress,
- * we must participate in the bookkeeping.
- */
- if (current->signal->group_stop_count > 0)
- consume_group_stop();
-
current->last_siginfo = info;
current->exit_code = exit_code;

--
1.7.1

2010-11-26 10:51:18

by Tejun Heo

Subject: [PATCH 04/14] signal: don't notify parent if not stopping after tracehook_notify_jctl() in do_signal_stop()

do_signal_stop() tests sig->group_stop_count one more time after
calling tracehook_notify_jctl(), as the tracehook is allowed to
release siglock. If group_stop_count has dropped to zero, the task
no longer stops but still notifies the parent. For both SIGCONT and
SIGKILL, which could cause this condition, the notification is
unnecessary.

SIGCONT will be reported to the parent when the task calls
get_signal_to_deliver() right after returning from do_signal_stop(),
which handles the collapsed notification correctly by itself; the
notification from do_signal_stop() in this case would only be a
duplicate. For SIGKILL, the imminent death of the task will be
reported to the parent, and reporting the skipped stop is completely
superfluous.

Also, tracehook_notify_jctl() currently doesn't release siglock, so
none of this matters at all yet.

This patch updates do_signal_stop() such that it jumps out of the
function if group_stop_count has dropped during
tracehook_notify_jctl().

This doesn't cause any behavior difference as the condition never
triggers in the current code.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
kernel/signal.c | 21 +++++++++++++--------
1 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 0a6816a..6f7407d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1763,15 +1763,20 @@ static int do_signal_stop(int signr)
notify = tracehook_notify_jctl(notify, CLD_STOPPED);
/*
* tracehook_notify_jctl() can drop and reacquire siglock, so
- * we keep ->group_stop_count != 0 before the call. If SIGCONT
- * or SIGKILL comes in between ->group_stop_count == 0.
+ * we test ->group_stop_count again. If SIGCONT or SIGKILL
+ * comes in between, ->group_stop_count == 0.
*/
- if (sig->group_stop_count) {
- if (!--sig->group_stop_count)
- sig->flags = SIGNAL_STOP_STOPPED;
- current->exit_code = sig->group_exit_code;
- __set_current_state(TASK_STOPPED);
+ if (!sig->group_stop_count) {
+ spin_unlock_irq(&current->sighand->siglock);
+ goto out;
}
+
+ if (!--sig->group_stop_count)
+ sig->flags = SIGNAL_STOP_STOPPED;
+
+ current->exit_code = sig->group_exit_code;
+ __set_current_state(TASK_STOPPED);
+
spin_unlock_irq(&current->sighand->siglock);

if (notify) {
@@ -1782,7 +1787,7 @@ static int do_signal_stop(int signr)

/* Now we don't run again until woken by SIGCONT or SIGKILL */
schedule();
-
+out:
tracehook_finish_jctl();
current->exit_code = 0;

--
1.7.1

2010-11-26 10:51:51

by Tejun Heo

Subject: [PATCH 07/14] ptrace: add @why to ptrace_stop()

To prepare for cleaning up the interaction between group stop and
ptrace, add @why to ptrace_stop(). Existing users are updated so
that there is no behavior change.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/signal.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index b859690..3b3e69c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1633,7 +1633,7 @@ static int sigkill_pending(struct task_struct *tsk)
* If we actually decide not to stop at all because the tracer
* is gone, we keep current->exit_code unless clear_code.
*/
-static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
+static void ptrace_stop(int exit_code, int why, int clear_code, siginfo_t *info)
__releases(&current->sighand->siglock)
__acquires(&current->sighand->siglock)
{
@@ -1671,7 +1671,7 @@ static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
spin_unlock_irq(&current->sighand->siglock);
read_lock(&tasklist_lock);
if (may_ptrace_stop()) {
- do_notify_parent_cldstop(current, CLD_TRAPPED);
+ do_notify_parent_cldstop(current, why);
/*
* Don't want to allow preemption here, because
* sys_ptrace() needs this task to be inactive.
@@ -1730,7 +1730,7 @@ void ptrace_notify(int exit_code)

/* Let the debugger run. */
spin_lock_irq(&current->sighand->siglock);
- ptrace_stop(exit_code, 1, &info);
+ ptrace_stop(exit_code, CLD_TRAPPED, 1, &info);
spin_unlock_irq(&current->sighand->siglock);
}

@@ -1826,7 +1826,7 @@ static int ptrace_signal(int signr, siginfo_t *info,
ptrace_signal_deliver(regs, cookie);

/* Let the debugger run. */
- ptrace_stop(signr, 0, info);
+ ptrace_stop(signr, CLD_TRAPPED, 0, info);

/* We're back. Did the debugger cancel the sig? */
signr = current->exit_code;
--
1.7.1

2010-11-26 10:51:16

by Tejun Heo

Subject: [PATCH 05/14] signal: fix premature completion of group stop when interfered by ptrace

task->signal->group_stop_count is used to track the progress of a
group stop. It's initialized to the number of tasks which need to
stop for the group stop to finish, and each stopping or trapping
task decrements it. However, a task doesn't keep track of whether
it has decremented the counter, so if it is woken up before the
group stop is complete and stops again, it can decrement the counter
multiple times.
Please consider the following example code.

#include <pthread.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

static void *worker(void *arg)
{
	while (1)
		;
	return NULL;
}

int main(void)
{
	pthread_t thread;
	pid_t pid;
	int i;

	pid = fork();
	if (!pid) {
		for (i = 0; i < 5; i++)
			pthread_create(&thread, NULL, worker, NULL);
		while (1)
			;
		return 0;
	}

	ptrace(PTRACE_ATTACH, pid, NULL, NULL);
	while (1) {
		waitid(P_PID, pid, NULL, WSTOPPED);
		ptrace(PTRACE_SINGLESTEP, pid, NULL,
		       (void *)(long)SIGSTOP);
	}
	return 0;
}

The child creates five threads and the parent continuously
single-steps the first thread, delivering SIGSTOP whenever the child
gets a signal. If an external process sends SIGSTOP to the child,
all the other threads in the process should reliably stop. However,
due to the bug described above, the first thread often ends up
consuming group_stop_count multiple times, and SIGSTOP often ends up
stopping none or only some of the other four threads.

To fix this, this patch adds a new field, task->group_stop, which is
protected by siglock, and uses the GROUP_STOP_CONSUME flag in it to
track which tasks still need to consume group_stop_count.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
include/linux/sched.h | 6 ++++++
kernel/signal.c | 32 +++++++++++++++++++++++++-------
2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2c79e92..93157a4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1245,6 +1245,7 @@ struct task_struct {
int exit_state;
int exit_code, exit_signal;
int pdeath_signal; /* The signal sent when the parent dies */
+ unsigned int group_stop; /* GROUP_STOP_*, siglock protected */
/* ??? */
unsigned int personality;
unsigned did_exec:1;
@@ -1756,6 +1757,11 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
#define used_math() tsk_used_math(current)

+/*
+ * task->group_stop flags
+ */
+#define GROUP_STOP_CONSUME (1 << 17) /* consume group stop count */
+
#ifdef CONFIG_PREEMPT_RCU

#define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
diff --git a/kernel/signal.c b/kernel/signal.c
index 6f7407d..7e2da0d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -223,6 +223,19 @@ static inline void print_dropped_signal(int sig)
current->comm, current->pid, sig);
}

+static bool consume_group_stop(void)
+{
+ if (!(current->group_stop & GROUP_STOP_CONSUME))
+ return false;
+
+ current->group_stop &= ~GROUP_STOP_CONSUME;
+
+ if (!WARN_ON_ONCE(current->signal->group_stop_count == 0))
+ current->signal->group_stop_count--;
+
+ return current->signal->group_stop_count == 0;
+}
+
/*
* allocate a new signal queue record
* - this may be called without locks if and only if t == current, otherwise an
@@ -1645,7 +1658,7 @@ static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
* we must participate in the bookkeeping.
*/
if (current->signal->group_stop_count > 0)
- --current->signal->group_stop_count;
+ consume_group_stop();

current->last_siginfo = info;
current->exit_code = exit_code;
@@ -1727,9 +1740,10 @@ void ptrace_notify(int exit_code)
static int do_signal_stop(int signr)
{
struct signal_struct *sig = current->signal;
- int notify;
+ int notify = 0;

if (!sig->group_stop_count) {
+ unsigned int gstop = GROUP_STOP_CONSUME;
struct task_struct *t;

if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
@@ -1741,6 +1755,7 @@ static int do_signal_stop(int signr)
*/
sig->group_exit_code = signr;

+ current->group_stop = gstop;
sig->group_stop_count = 1;
for (t = next_thread(current); t != current; t = next_thread(t))
/*
@@ -1750,16 +1765,20 @@ static int do_signal_stop(int signr)
*/
if (!(t->flags & PF_EXITING) &&
!task_is_stopped_or_traced(t)) {
+ t->group_stop = gstop;
sig->group_stop_count++;
signal_wake_up(t, 0);
- }
+ } else
+ t->group_stop = 0;
}
/*
* If there are no other threads in the group, or if there is
* a group stop in progress and we are the last to stop, report
* to the parent. When ptraced, every thread reports itself.
*/
- notify = sig->group_stop_count == 1 ? CLD_STOPPED : 0;
+ if (sig->group_stop_count == 1 &&
+ (current->group_stop & GROUP_STOP_CONSUME))
+ notify = CLD_STOPPED;
notify = tracehook_notify_jctl(notify, CLD_STOPPED);
/*
* tracehook_notify_jctl() can drop and reacquire siglock, so
@@ -1771,7 +1790,7 @@ static int do_signal_stop(int signr)
goto out;
}

- if (!--sig->group_stop_count)
+ if (consume_group_stop())
sig->flags = SIGNAL_STOP_STOPPED;

current->exit_code = sig->group_exit_code;
@@ -2036,8 +2055,7 @@ void exit_signals(struct task_struct *tsk)
if (!signal_pending(t) && !(t->flags & PF_EXITING))
recalc_sigpending_and_wake(t);

- if (unlikely(tsk->signal->group_stop_count) &&
- !--tsk->signal->group_stop_count) {
+ if (unlikely(tsk->signal->group_stop_count) && consume_group_stop()) {
tsk->signal->flags = SIGNAL_STOP_STOPPED;
group_stop = tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED);
}
--
1.7.1

2010-11-26 10:51:50

by Tejun Heo

Subject: [PATCH 06/14] signal: use GROUP_STOP_PENDING to avoid stopping multiple times for a single group stop

Currently, task->signal->group_stop_count is used to decide whether
to stop for a group stop. However, if a task in the group is taking
a long time to stop, other tasks which are continued by ptrace would
repeatedly stop for the same group stop until it completes.

This patch introduces GROUP_STOP_PENDING which tracks whether a task
is yet to stop for the group stop in progress. The flag is set when a
group stop starts and cleared when the task stops the first time for
the group stop, so the task won't stop multiple times for the same
group stop.

Note that, for now, GROUP_STOP_PENDING tracks the same state as
GROUP_STOP_CONSUME: both are always set and cleared together without
releasing siglock in between. This will change with future patches.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
include/linux/sched.h | 1 +
kernel/signal.c | 21 +++++++++++++--------
2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93157a4..1261993 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1760,6 +1760,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
/*
* task->group_stop flags
*/
+#define GROUP_STOP_PENDING (1 << 16) /* task should stop for group stop */
#define GROUP_STOP_CONSUME (1 << 17) /* consume group stop count */

#ifdef CONFIG_PREEMPT_RCU
diff --git a/kernel/signal.c b/kernel/signal.c
index 7e2da0d..b859690 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -732,6 +732,9 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
t = p;
do {
unsigned int state;
+
+ t->group_stop = 0;
+
rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
/*
* If there is a handler for SIGCONT, we must make
@@ -1742,8 +1745,8 @@ static int do_signal_stop(int signr)
struct signal_struct *sig = current->signal;
int notify = 0;

- if (!sig->group_stop_count) {
- unsigned int gstop = GROUP_STOP_CONSUME;
+ if (!(current->group_stop & GROUP_STOP_PENDING)) {
+ unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
struct task_struct *t;

if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
@@ -1782,10 +1785,10 @@ static int do_signal_stop(int signr)
notify = tracehook_notify_jctl(notify, CLD_STOPPED);
/*
* tracehook_notify_jctl() can drop and reacquire siglock, so
- * we test ->group_stop_count again. If SIGCONT or SIGKILL
- * comes in between, ->group_stop_count == 0.
+ * we test GROUP_STOP_PENDING again. If SIGCONT or SIGKILL
+ * comes in between, it would be clear.
*/
- if (!sig->group_stop_count) {
+ if (!(current->group_stop & GROUP_STOP_PENDING)) {
spin_unlock_irq(&current->sighand->siglock);
goto out;
}
@@ -1794,6 +1797,7 @@ static int do_signal_stop(int signr)
sig->flags = SIGNAL_STOP_STOPPED;

current->exit_code = sig->group_exit_code;
+ current->group_stop &= ~GROUP_STOP_PENDING;
__set_current_state(TASK_STOPPED);

spin_unlock_irq(&current->sighand->siglock);
@@ -1908,8 +1912,8 @@ relock:
if (unlikely(signr != 0))
ka = return_ka;
else {
- if (unlikely(signal->group_stop_count > 0) &&
- do_signal_stop(0))
+ if (unlikely(current->group_stop &
+ GROUP_STOP_PENDING) && do_signal_stop(0))
goto relock;

signr = dequeue_signal(current, &current->blocked,
@@ -2055,7 +2059,8 @@ void exit_signals(struct task_struct *tsk)
if (!signal_pending(t) && !(t->flags & PF_EXITING))
recalc_sigpending_and_wake(t);

- if (unlikely(tsk->signal->group_stop_count) && consume_group_stop()) {
+ if (unlikely(tsk->group_stop & GROUP_STOP_PENDING) &&
+ consume_group_stop()) {
tsk->signal->flags = SIGNAL_STOP_STOPPED;
group_stop = tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED);
}
--
1.7.1

2010-11-26 10:50:08

by Tejun Heo

Subject: [PATCH 09/14] ptrace: clean transitions between TASK_STOPPED and TRACED

Currently, if the task is already STOPPED on ptrace attach, it's
left alone and its state is silently changed to TRACED on the next
ptrace call. This breaks the assumption that arch_ptrace_stop() is
called before the task is poked by any ptrace operation and is ugly
in that one task directly manipulates the state of another.

With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
TASK_TRACED can be made clean. The tracer can use the flag to tell
the tracee to retry the stop on attach and detach; on retry, the
tracee enters the desired state the correct way.

As the real parent may wait(2) and consume group_exit_code at any
time, the signal number which caused the last group stop is saved
separately in the lower 16 bits of task->group_stop, so that it can
be used when retrying the stop on ptrace attach, i.e. when switching
from a regular stop to ptrace_stop().

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
include/linux/sched.h | 1 +
kernel/ptrace.c | 27 +++++++++++++++++++--------
kernel/signal.c | 23 +++++++++++++++++------
3 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1261993..e78b1e5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1760,6 +1760,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
/*
* task->group_stop flags
*/
+#define GROUP_STOP_SIGMASK 0xffff /* signr of the last group stop */
#define GROUP_STOP_PENDING (1 << 16) /* task should stop for group stop */
#define GROUP_STOP_CONSUME (1 << 17) /* consume group stop count */

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 99bbaa3..08b18f2 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -49,14 +49,14 @@ static void ptrace_untrace(struct task_struct *child)
spin_lock(&child->sighand->siglock);
if (task_is_traced(child)) {
/*
- * If the group stop is completed or in progress,
- * this thread was already counted as stopped.
+ * If group stop is completed or in progress, the task
+ * should enter group stop. Set GROUP_STOP_PENDING
+ * before kicking it.
*/
if (child->signal->flags & SIGNAL_STOP_STOPPED ||
child->signal->group_stop_count)
- __set_task_state(child, TASK_STOPPED);
- else
- signal_wake_up(child, 1);
+ child->group_stop |= GROUP_STOP_PENDING;
+ signal_wake_up(child, 1);
}
spin_unlock(&child->sighand->siglock);
}
@@ -101,9 +101,7 @@ int ptrace_check_attach(struct task_struct *child, int kill)
* does ptrace_unlink() before __exit_signal().
*/
spin_lock_irq(&child->sighand->siglock);
- if (task_is_stopped(child))
- child->state = TASK_TRACED;
- else if (!task_is_traced(child) && !kill)
+ if (!task_is_traced(child) && !kill)
ret = -ESRCH;
spin_unlock_irq(&child->sighand->siglock);
}
@@ -204,6 +202,19 @@ int ptrace_attach(struct task_struct *task)
__ptrace_link(task, current);
send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);

+ /*
+ * If the task is already STOPPED, set GROUP_STOP_PENDING and
+ * kick it so that it transits to TRACED. This is safe as
+ * both transitions in and out of STOPPED are protected by
+ * siglock.
+ */
+ spin_lock(&task->sighand->siglock);
+ if (task_is_stopped(task)) {
+ task->group_stop |= GROUP_STOP_PENDING;
+ signal_wake_up(task, 1);
+ }
+ spin_unlock(&task->sighand->siglock);
+
retval = 0;
unlock_tasklist:
write_unlock_irq(&tasklist_lock);
diff --git a/kernel/signal.c b/kernel/signal.c
index 6d78da6..8341667 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -733,7 +733,8 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
do {
unsigned int state;

- t->group_stop = 0;
+ /* clear all group_stop flags, only keep the signr */
+ t->group_stop &= GROUP_STOP_SIGMASK;

rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
/*
@@ -1749,6 +1750,9 @@ static int do_signal_stop(int signr)
unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
struct task_struct *t;

+ /* signr will be recorded in task->group_stop for retries */
+ WARN_ON_ONCE(signr & ~GROUP_STOP_SIGMASK);
+
if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
unlikely(signal_group_exit(sig)))
return 0;
@@ -1758,7 +1762,7 @@ static int do_signal_stop(int signr)
*/
sig->group_exit_code = signr;

- current->group_stop = gstop;
+ current->group_stop = signr | gstop;
sig->group_stop_count = 1;
for (t = next_thread(current); t != current; t = next_thread(t))
/*
@@ -1768,11 +1772,11 @@ static int do_signal_stop(int signr)
*/
if (!(t->flags & PF_EXITING) &&
!task_is_stopped_or_traced(t)) {
- t->group_stop = gstop;
+ t->group_stop = signr | gstop;
sig->group_stop_count++;
signal_wake_up(t, 0);
} else
- t->group_stop = 0;
+ t->group_stop = signr;
}
/*
* If there are no other threads in the group, or if there is
@@ -1793,7 +1797,7 @@ static int do_signal_stop(int signr)

if (consume_group_stop())
sig->flags = SIGNAL_STOP_STOPPED;
-
+retry:
current->exit_code = sig->group_exit_code;
current->group_stop &= ~GROUP_STOP_PENDING;
__set_current_state(TASK_STOPPED);
@@ -1805,6 +1809,7 @@ static int do_signal_stop(int signr)
read_lock(&tasklist_lock);
do_notify_parent_cldstop(current, notify);
read_unlock(&tasklist_lock);
+ notify = 0;
}

/* Now we don't run again until woken by SIGCONT or SIGKILL */
@@ -1812,7 +1817,13 @@ static int do_signal_stop(int signr)

spin_lock_irq(&current->sighand->siglock);
} else
- ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
+ ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
+ CLD_STOPPED, 0, NULL);
+
+ if (current->group_stop & GROUP_STOP_PENDING) {
+ WARN_ON_ONCE(!(current->group_stop & GROUP_STOP_SIGMASK));
+ goto retry;
+ }
out_unlock:
spin_unlock_irq(&current->sighand->siglock);

--
1.7.1

2010-11-26 10:50:16

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 13/14] ptrace: make SIGCONT notification reliable against ptrace

Currently, SIGCONT notifications which are pending on ptrace attach or
occur while ptraced are reported to the tracer and never make it to
the real parent.

This patch adds a new signal flag SIGNAL_NOTIFY_CONT which is set when
a task is woken up by SIGCONT and cleared once the event is notified
to the parent. SIGNAL_CLD_MASK bits are no longer cleared after
notification. Combined with clearing SIGNAL_CLD_MASK if
!SIGNAL_NOTIFY_CONT on ptrace attach, these bits are set on ptrace
detach iff the tracee owes a notification to the real parent.
__ptrace_unlink() is updated to check these bits and reschedule
SIGCONT notification if necessary.

This combined with the previous changes makes ptrace attach/detach
mostly transparent with respect to job control signal handling. The
remaining problems are the extra unconditional wake_up_process() from
ptrace_detach() and SIGSTOP generated by ptrace_attach() clearing
pending SIGCONT. These will be dealt with in future patches.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
include/linux/sched.h | 1 +
kernel/ptrace.c | 40 +++++++++++++++++++++++++++++++++++++++-
kernel/signal.c | 14 +++++++++-----
3 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3e40761..4b7f3ca 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -654,6 +654,7 @@ struct signal_struct {
#define SIGNAL_UNKILLABLE 0x00000040 /* for init: ignore fatal signals */

#define SIGNAL_NOTIFY_STOP 0x00000100 /* notify parent of group stop */
+#define SIGNAL_NOTIFY_CONT 0x00000200 /* notify parent of continuation */

/* If true, all threads except ->group_exit_task have pending SIGKILL */
static inline int signal_group_exit(const struct signal_struct *sig)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 71141bf..a6c92ac 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -52,6 +52,7 @@ void __ptrace_link(struct task_struct *child, struct task_struct *new_parent)
void __ptrace_unlink(struct task_struct *child)
{
struct signal_struct *sig = child->signal;
+ bool woken_up = false;

BUG_ON(!child->ptrace);

@@ -66,6 +67,33 @@ void __ptrace_unlink(struct task_struct *child)
if (sig->flags & SIGNAL_STOP_STOPPED || sig->group_stop_count)
child->group_stop |= GROUP_STOP_PENDING;
signal_wake_up(child, 1);
+ woken_up = true;
+ }
+
+ /*
+ * SIGNAL_CLD_MASK is cleared only on a stop signal or, if
+ * notification isn't pending, ptrace attach. If any bit is
+ * set,
+ *
+ * - SIGCONT notification was pending before attach or there
+ * was one or more SIGCONT notifications while tracing.
+ *
+ * - And, there hasn't been any stop signal since the last
+ * pending SIGCONT notification.
+ *
+ * Combined, it means that the tracee owes a SIGCONT
+ * notification to the real parent.
+ */
+ if (sig->flags & SIGNAL_CLD_MASK) {
+ sig->flags |= SIGNAL_NOTIFY_CONT;
+ /*
+ * Force the tracee into signal delivery path so that
+ * the notification is delivered ASAP. This wakeup
+ * is unintrusive as SIGCONT delivery would have
+ * caused the same effect.
+ */
+ if (!woken_up)
+ signal_wake_up(child, 0);
}

child->ptrace = 0;
@@ -198,17 +226,27 @@ int ptrace_attach(struct task_struct *task)
__ptrace_link(task, current);
send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);

+ spin_lock(&task->sighand->siglock);
+
/*
* If the task is already STOPPED, set GROUP_STOP_PENDING and
* kick it so that it transits to TRACED. This is safe as
* both transitions in and out of STOPPED are protected by
* siglock.
*/
- spin_lock(&task->sighand->siglock);
if (task_is_stopped(task)) {
task->group_stop |= GROUP_STOP_PENDING;
signal_wake_up(task, 1);
}
+
+ /*
+ * Clear SIGNAL_CLD_MASK if NOTIFY_CONT is not set. This is
+ * used to preserve SIGCONT notification across ptrace
+ * attach/detach. Read the comment in __ptrace_unlink().
+ */
+ if (!(task->signal->flags & SIGNAL_NOTIFY_CONT))
+ task->signal->flags &= ~SIGNAL_CLD_MASK;
+
spin_unlock(&task->sighand->siglock);

retval = 0;
diff --git a/kernel/signal.c b/kernel/signal.c
index f2da456..735bac5 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -781,7 +781,8 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
* will take ->siglock, notice SIGNAL_CLD_MASK, and
* notify its parent. See get_signal_to_deliver().
*/
- signal->flags = why | SIGNAL_STOP_CONTINUED;
+ why |= SIGNAL_STOP_CONTINUED | SIGNAL_NOTIFY_CONT;
+ signal->flags = why;
signal->group_stop_count = 0;
signal->group_exit_code = 0;
} else {
@@ -1895,7 +1896,7 @@ relock:
* we should notify the parent, prepare_signal(SIGCONT) encodes
* the CLD_ si_code into SIGNAL_CLD_MASK bits.
*/
- if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
+ if (unlikely(signal->flags & SIGNAL_NOTIFY_CONT)) {
int why;

if (signal->flags & SIGNAL_CLD_CONTINUED)
@@ -1903,7 +1904,7 @@ relock:
else
why = CLD_STOPPED;

- signal->flags &= ~SIGNAL_CLD_MASK;
+ signal->flags &= ~SIGNAL_NOTIFY_CONT;

why = tracehook_notify_jctl(why, CLD_CONTINUED);
spin_unlock_irq(&sighand->siglock);
@@ -1942,8 +1943,11 @@ relock:
if (signr != SIGKILL) {
signr = ptrace_signal(signr, info,
regs, cookie);
- if (!signr)
- continue;
+ if (!signr) {
+ /* NOTIFY_CONT might have changed */
+ spin_unlock_irq(&sighand->siglock);
+ goto relock;
+ }
}

ka = &sighand->action[signr-1];
--
1.7.1

2010-11-26 10:50:18

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 14/14] ptrace: remove the extra wake_up_process() from ptrace_detach()

This wake_up_process() has a turbulent history. It is a remnant of
the ancient ptrace implementation and is patently wrong. Commit
95a3540d (ptrace_detach: the wrong wakeup breaks the ERESTARTxxx
logic) removed it, but the change was later reverted by commit edaba2c5
(ptrace: revert "ptrace_detach: the wrong wakeup breaks the
ERESTARTxxx logic ") citing compatibility breakage and the general
brokenness of the whole group stop / ptrace interaction.

Digging through the mailing list archives, the compatibility breakage
doesn't seem to be critical in the sense that the behavior isn't well
defined or reliable to begin with, and it seems to have been agreed
that the wakeup should be removed along with a proper cleanup of the
whole thing.

Now that the group stop and its interaction with ptrace are cleaned up
and well defined, it's high time to finally kill this silliness.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
kernel/ptrace.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a6c92ac..63c6de6 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -357,8 +357,6 @@ int ptrace_detach(struct task_struct *child, unsigned int data)
if (child->ptrace) {
child->exit_code = data;
dead = __ptrace_detach(current, child);
- if (!child->exit_state)
- wake_up_process(child);
}
write_unlock_irq(&tasklist_lock);

--
1.7.1

2010-11-26 10:50:19

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 11/14] ptrace: make group stop notification reliable against ptrace

Group stop notifications are unreliable if one or more tasks of the
task group are being ptraced. If a ptraced task ends up finishing a
group stop, the notification is sent to the ptracer and the real
parent never gets notified.

This patch adds a new signal flag SIGNAL_NOTIFY_STOP which is set on
group stop completion and cleared after notification to the real
parent or together with other stopped flags on SIGCONT/KILL. This
guarantees that the real parent is notified correctly regardless of
ptrace. If a ptraced task is the last task to stop, the notification
is postponed till ptrace detach or canceled if SIGCONT/KILL is
received in between.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
include/linux/sched.h | 2 ++
kernel/signal.c | 20 +++++++++++++-------
2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e78b1e5..3e40761 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -653,6 +653,8 @@ struct signal_struct {

#define SIGNAL_UNKILLABLE 0x00000040 /* for init: ignore fatal signals */

+#define SIGNAL_NOTIFY_STOP 0x00000100 /* notify parent of group stop */
+
/* If true, all threads except ->group_exit_task have pending SIGKILL */
static inline int signal_group_exit(const struct signal_struct *sig)
{
diff --git a/kernel/signal.c b/kernel/signal.c
index c084ea8..f2da456 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1739,7 +1739,6 @@ void ptrace_notify(int exit_code)
static int do_signal_stop(int signr)
{
struct signal_struct *sig = current->signal;
- int notify = 0;

if (!(current->group_stop & GROUP_STOP_PENDING)) {
unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
@@ -1780,8 +1779,9 @@ static int do_signal_stop(int signr)
*/
if (sig->group_stop_count == 1 &&
(current->group_stop & GROUP_STOP_CONSUME))
- notify = CLD_STOPPED;
- notify = tracehook_notify_jctl(notify, CLD_STOPPED);
+ tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED);
+ else
+ tracehook_notify_jctl(0, CLD_STOPPED);
/*
* tracehook_notify_jctl() can drop and reacquire siglock, so
* we test GROUP_STOP_PENDING again. If SIGCONT or SIGKILL
@@ -1791,20 +1791,26 @@ static int do_signal_stop(int signr)
goto out_unlock;

if (consume_group_stop())
- sig->flags = SIGNAL_STOP_STOPPED;
+ sig->flags = SIGNAL_STOP_STOPPED | SIGNAL_NOTIFY_STOP;
retry:
current->exit_code = sig->group_exit_code;
current->group_stop &= ~GROUP_STOP_PENDING;
__set_current_state(TASK_STOPPED);

if (likely(!task_ptrace(current))) {
+ bool do_notify = false;
+
+ if (sig->flags & SIGNAL_NOTIFY_STOP) {
+ sig->flags &= ~SIGNAL_NOTIFY_STOP;
+ do_notify = true;
+ }
+
spin_unlock_irq(&current->sighand->siglock);

- if (notify) {
+ if (do_notify) {
read_lock(&tasklist_lock);
- do_notify_parent_cldstop(current, notify);
+ do_notify_parent_cldstop(current, CLD_STOPPED);
read_unlock(&tasklist_lock);
- notify = 0;
}

/* Now we don't run again until woken by SIGCONT or SIGKILL */
--
1.7.1

2010-11-26 10:55:55

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCHSET RFC] ptrace,signal: sane interaction between ptrace and job control signals

Hey, again.

Ah, crap, I missed a comma between Rafael's and Pavel's email
addresses. Sorry about that. Rafael, Pavel, the thread can be read
from the following URL. Probably only patch 02 and 03 are of interest
to you guys.

http://thread.gmane.org/gmane.linux.kernel/1068420

Thanks.

--
tejun

2010-11-26 13:56:46

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 01/14] signal: fix SIGCONT notification code

On 11/26, Tejun Heo wrote:
>
> get_signal_to_deliver() should test these flags and then notify
> accordingly; however, it incorrectly tested SIGNAL_STOP_CONTINUED
> instead of SIGNAL_CLD_CONTINUED,

OOPS! Yes, typo. Thanks.

This is even documented in prepare_signal(). And since it always sets
SIGNAL_STOP_CONTINUED along with SIGNAL_CLD_*, we shouldn't check
this bit, of course.

> Fix it by testing SIGNAL_CLD_CONTINUED. While at it, uncompress the
> ?: test into if/else clause for better readability.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Roland McGrath <[email protected]>

Reviewed-by: Oleg Nesterov <[email protected]>

> ---
> kernel/signal.c | 9 +++++++--
> 1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 4e3cff1..fe004b5 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1853,8 +1853,13 @@ relock:
> * the CLD_ si_code into SIGNAL_CLD_MASK bits.
> */
> if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
> - int why = (signal->flags & SIGNAL_STOP_CONTINUED)
> - ? CLD_CONTINUED : CLD_STOPPED;
> + int why;
> +
> + if (signal->flags & SIGNAL_CLD_CONTINUED)
> + why = CLD_CONTINUED;
> + else
> + why = CLD_STOPPED;
> +
> signal->flags &= ~SIGNAL_CLD_MASK;
>
> why = tracehook_notify_jctl(why, CLD_CONTINUED);
> --
> 1.7.1
>

2010-11-26 14:53:28

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 04/14] signal: don't notify parent if not stopping after tracehook_notify_jctl() in do_signal_stop()

On 11/26, Tejun Heo wrote:
>
> do_signal_stop() tests sig->group_stop_count one more time after
> calling tracehook_notify_jctl() as it's allowed release siglock. If
> group_stop_count has changed to zero, it no longer stops but still
> notifies the parent.

Only if tracehook_notify_jctl() returns nonzero. That was the
point, tracehook_ can ask to notify even if we are not going to
stop.

> Also, tracehook_notify_jctl() doesn't release siglock, so, currently,
> none of these matters at all.

Yes. But with this patch it is not possible to drop siglock in
tracehook_notify_jctl().

> This doesn't cause any behavior difference as the condition never
> triggers in the current code.

I don't understand the motivation then (probably I should read
the next patches).

If we kill the ability to drop ->siglock in tracehook_notify_jctl(),
we can simplify the code further. No need to start with
->group_stop_count = 1 and then decrement it again.

We could do something like

static int do_signal_stop(int signr)
{
	struct signal_struct *sig = current->signal;
	int notify = 0;

	if (sig->group_stop_count) {
		--sig->group_stop_count;
	} else {
		struct task_struct *t;

		if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
		    unlikely(signal_group_exit(sig)))
			return 0;
		/*
		 * There is no group stop already in progress.
		 * We must initiate one now.
		 */
		sig->group_exit_code = signr;

		for (t = next_thread(current); t != current; t = next_thread(t))
			/*
			 * Setting state to TASK_STOPPED for a group
			 * stop is always done with the siglock held,
			 * so this check has no races.
			 */
			if (!(t->flags & PF_EXITING) &&
			    !task_is_stopped_or_traced(t)) {
				sig->group_stop_count++;
				signal_wake_up(t, 0);
			}
	}

	if (!sig->group_stop_count) {
		sig->flags = SIGNAL_STOP_STOPPED;
		current->exit_code = sig->group_exit_code;
		__set_current_state(TASK_STOPPED);
		notify = CLD_STOPPED;
	}
	spin_unlock_irq(&current->sighand->siglock);

	if (notify || current->ptrace) {
		read_lock(&tasklist_lock);
		do_notify_parent_cldstop(current, notify);
		read_unlock(&tasklist_lock);
	}

	/* Now we don't run again until woken by SIGCONT or SIGKILL */
	schedule();

	tracehook_finish_jctl();
	current->exit_code = 0;

	return 1;
}

In short, this patch dismisses tracehook_notify_jctl().

Roland?

Oleg.

2010-11-26 15:05:25

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 04/14] signal: don't notify parent if not stopping after tracehook_notify_jctl() in do_signal_stop()

Hello, Oleg.

On 11/26/2010 03:46 PM, Oleg Nesterov wrote:
>> This doesn't cause any behavior difference as the condition never
>> triggers in the current code.
>
> I don't understand the motivation then (probably I should read
> the next patches).

The thing is that with the further changes in the series
tracehook_notify_jctl() becomes rather pointless as do_signal_stop()
becomes more involved with the ptrace logic and bypasses
tracehook_notify_jctl(), so I'm basically getting it out of the way.

tracehook might not be such a bad idea for parts which are further
away but for parts which are this close to ptrace implementation, it
seems more like obfuscation than helpful layering. Especially,
restrictions and capabilities of tracehooks without actual (even
proposed) users aren't very healthy to keep. Other people have to
keep guessing what the intentions behind the unused features are,
which is very frustrating and silly.

Anyways, we can deal with tracehooks later. Let's just ignore
tracehook related changes for now.

Thanks.

--
tejun

2010-11-26 15:46:51

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 05/14] signal: fix premature completion of group stop when interfered by ptrace

On 11/26, Tejun Heo wrote:
>
> task->signal->group_stop_count is used to track the progress of group
> stop. It's initialized to the number of tasks which need to stop for
> group stop to finish and each stopping or trapping task decrements it.
> However, each task doesn't keep track of whether it decremented the
> counter or not and if woken up before the group stop is complete and
> stops again, it can decrement the counter multiple times.

Everything is fine without ptrace, I hope.

(ignoring the deprecated CLONE_STOPPED)

> Please consider the following example code.

I didn't even read the test-case ;)

Yes, known problems. ptrace is very wrong when it comes to
group_stop_count/SIGNAL_STOP_STOPPED/etc. Almost everything
is wrong.

Cough, this is fixed in utrace ;) It doesn't use ptrace_stop/
ptrace_resume/etc at all.

> This patch adds a new field task->group_stop which is protected by
> siglock and uses GROUP_STOP_CONSUME flag to track which task is still
> to consume group_stop_count to fix this bug.

Yes, currently the tracee never knows how it should react to
->group_stop_count.

> @@ -1645,7 +1658,7 @@ static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
> * we must participate in the bookkeeping.
> */
> if (current->signal->group_stop_count > 0)
> - --current->signal->group_stop_count;
> + consume_group_stop();

This doesn't look exactly right. If we participate (decrement the
counter), we should stop even if we race with ptrace_detach().

And what if consume_group_stop() returns true? We should set
SIGNAL_STOP_STOPPED and notify ->parent.

Otherwise looks correct at first glance...


Of course, there are more problems. To me, even
ptrace_resume()->wake_up_process() is very wrong wrt jctl.


Cosmetic nit,

> +static bool consume_group_stop(void)
> +{
> + if (!(current->group_stop & GROUP_STOP_CONSUME))
> + return false;
> +
> + current->group_stop &= ~GROUP_STOP_CONSUME;
> +
> + if (!WARN_ON_ONCE(current->signal->group_stop_count == 0))
> + current->signal->group_stop_count--;

Every caller should check ->group_stop_count != 0. do_signal_stop()
does this too in fact. May be it would be cleaner to move this
check into consume_group_stop() and remove WARN_ON_ONCE().

This way it is more clear why prepare_signal(SIGCONT) do not
reset task->group_stop, it has no effect unless ->group_stop_count
is set by do_signal_stop() which updates ->group_stop for every
thread.

Probably consume_group_stop() should also set SIGNAL_STOP_STOPPED
if it returns true.

But, I didn't read the next patches yet...

Oleg.

2010-11-26 16:04:12

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 05/14] signal: fix premature completion of group stop when interfered by ptrace

Hey, Oleg.

On 11/26/2010 04:40 PM, Oleg Nesterov wrote:
>> task->signal->group_stop_count is used to track the progress of group
>> stop. It's initialized to the number of tasks which need to stop for
>> group stop to finish and each stopping or trapping task decrements it.
>> However, each task doesn't keep track of whether it decremented the
>> counter or not and if woken up before the group stop is complete and
>> stops again, it can decrement the counter multiple times.
>
> Everything is fine without ptrace, I hope.

AFAICS, w/o ptrace it should function correctly.

> (ignoring the deprecated CLONE_STOPPED)
>
>> Please consider the following example code.
>
> I didn't even read the test-case ;)
>
> Yes, known problems. ptrace is very wrong when it comes to
> group_stop_count/SIGNAL_STOP_STOPPED/etc. Almost everything
> is wrong.
>
> Cough, this is fixed in utrace ;) It doesn't use ptrace_stop/
> ptrace_resume/etc at all.

Yeah, I read about utrace in your previous posts but git grepping it
gave me nothing. Ah, okay, it's out of tree patchset. Are you
planning on merging it? Why not just fix up and extend ptrace? Is
there some fundamental problem?

I was thinking about adding PTRACE_SEIZE operation after this patchset
which allows nesting and transparent operation (without the nasty
implied SIGSTOP). As long as I can get that, I don't really mind
whether it's p or utrace, but still am curious why we need something
completely new.

>> This patch adds a new field task->group_stop which is protected by
>> siglock and uses GROUP_STOP_CONSUME flag to track which task is still
>> to consume group_stop_count to fix this bug.
>
> Yes, currently the tracee never knows how it should react to
> ->group_stop_count.
>
>> @@ -1645,7 +1658,7 @@ static void ptrace_stop(int exit_code, int clear_code, siginfo_t *info)
>> * we must participate in the bookkeeping.
>> */
>> if (current->signal->group_stop_count > 0)
>> - --current->signal->group_stop_count;
>> + consume_group_stop();
>
> This doesn't look exactly right. If we participate (decrement the
> counter), we should stop even if we race with ptrace_detach().
>
> And what if consume_group_stop() returns true? We should set
> SIGNAL_STOP_STOPPED and notify ->parent.

Yeah, exactly, both issues are dealt with later. I'm just fixing
things one by one.

> Otherwise looks correct at first glance...
>
> Of course, there are more problems. To me, even
> ptrace_resume()->wake_up_process() is very wrong wrt jctl.

Yeah, that one too. :-)

> Cosmetic nit,
>
>> +static bool consume_group_stop(void)
>> +{
>> + if (!(current->group_stop & GROUP_STOP_CONSUME))
>> + return false;
>> +
>> + current->group_stop &= ~GROUP_STOP_CONSUME;
>> +
>> + if (!WARN_ON_ONCE(current->signal->group_stop_count == 0))
>> + current->signal->group_stop_count--;
>
> Every caller should check ->group_stop_count != 0. do_signal_stop()
> does this too in fact. May be it would be cleaner to move this
> check into consume_group_stop() and remove WARN_ON_ONCE().
>
> This way it is more clear why prepare_signal(SIGCONT) do not
> reset task->group_stop, it has no effect unless ->group_stop_count
> is set by do_signal_stop() which updates ->group_stop for every
> thread.
>
> Probably consume_group_stop() should also set SIGNAL_STOP_STOPPED
> if it returns true.
>
> But, I didn't read the next patches yet...

Yeap, it changes later.

Thanks a lot for reviewing.

--
tejun

2010-11-26 18:06:19

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 06/14] signal: use GROUP_STOP_PENDING to avoid stopping multiple times for a single group stop

I am stuck at this point ;)

On 11/26, Tejun Heo wrote:
>
> Currently task->signal->group_stop_count is used to decide whether to
> stop for group stop. However, if there is a task in the group which
> is taking a long time to stop, other tasks which are continued by
> ptrace would repeatedly stop for the same group stop until the group
> stop is complete.

Yes, but the tracee won't abuse ->group_stop_count; this was fixed
by the previous patch.

But, otoh, what if the debugger resumes the tracee when the group
stop was completed by other sub-threads?

The tracee will run with GROUP_STOP_PENDING set. ->group_stop_count
is zero. If this tracee receives a signal (or spurious TIF_SIGPENDING),
suddenly it will notice GROUP_STOP_PENDING and report the stop to
debugger.

This looks a bit strange. OK, perhaps it makes sense to report the
stop to "ack" the group stop which wasn't acked in ptrace_stop().
Or, if it was untraced after resume, it makes sense to "silently"
stop as well.

But, in this case it shouldn't wait until signal_pending() is true?


> @@ -1742,8 +1745,8 @@ static int do_signal_stop(int signr)
> struct signal_struct *sig = current->signal;
> int notify = 0;
>
> - if (!sig->group_stop_count) {
> - unsigned int gstop = GROUP_STOP_CONSUME;
> + if (!(current->group_stop & GROUP_STOP_PENDING)) {
> + unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
> struct task_struct *t;

Hmm. This means the ptraced task can initiate a group stop
while one is already in progress...

Debugger can constantly resume a tracee while the group stop
is not finished. Finally this tracee can dequeue SIGSTOP without
GROUP_STOP_PENDING.

At first glance, nothing bad can happen, but I am not sure.
We can have other ptraced threads which were resumed after
ptrace_stop()/do_signal_stop().



> This will change with future patches.

Yes. I tried to study this series patch-by-patch. I think I should
read the whole series to really understand the intermediate changes.
I'll try to return on Monday.

Cough. I didn't expect I'd forgotten this code that much ;)

Oleg.

2010-11-26 18:40:25

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 06/14] signal: use GROUP_STOP_PENDING to avoid stopping multiple times for a single group stop

Hello, Oleg.

On 11/26/2010 06:59 PM, Oleg Nesterov wrote:
> I am stuck at this point ;)

:-)

> On 11/26, Tejun Heo wrote:
>> Currently task->signal->group_stop_count is used to decide whether to
>> stop for group stop. However, if there is a task in the group which
>> is taking a long time to stop, other tasks which are continued by
>> ptrace would repeatedly stop for the same group stop until the group
>> stop is complete.
>
> Yes. but the tracee won't abuse ->group_stop_count, this was fixed
> by the previous patch.

Yes.

> But, otoh, what if debugger resumes the tracee when the group stop
> was completed by other sub-threads ?

Well, then the tracee continues. What this patch does is make each
task in the group stop once for a single group stop instance. If the
ptracer decides to resume the tracee (w/o sending SIGCONT, that is),
it can do so and the tracee won't stop for the same group stop
again.

> The tracee will run with GROUP_STOP_PENDING set. ->group_stop_count
> is zero. If this tracee receives a signal (or spurious TIF_SIGPENDING),
> suddenly it will notice GROUP_STOP_PENDING and report the stop to
> debugger.

Yeah, of course. That's the tracee participating in the group stop.
Oh, the tracee _should_ always have TIF_SIGPENDING set or be
guaranteed to run get_signal_to_deliver(). I think there are traced
points where that is not true. We probably need to set TIF_SIGPENDING
together with GROUP_STOP_PENDING.

> This looks a bit strange. OK, perhaps it makes sense to report the
> stop to "ack" the group stop which wasn't acked in ptrace_stop().
> Or, if it was untraced after resume, it makes sense to "silently"
> stop as well.
>
> But, in this case it shouldn't wait until signal_pending() is true?

Yeap, thanks a lot for catching that one. :-)

>> @@ -1742,8 +1745,8 @@ static int do_signal_stop(int signr)
>> struct signal_struct *sig = current->signal;
>> int notify = 0;
>>
>> - if (!sig->group_stop_count) {
>> - unsigned int gstop = GROUP_STOP_CONSUME;
>> + if (!(current->group_stop & GROUP_STOP_PENDING)) {
>> + unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
>> struct task_struct *t;
>
> Hmm. This means, the ptraced task can initiate the group stop
> while it is already in progress...
>
> Debugger can constantly resume a tracee while the group stop
> is not finished. Finally this tracee can dequeue SIGSTOP without
> GROUP_STOP_PENDING.
>
> At first glance, nothing bad can happen, but I am not sure.
> We can have other ptraced threads which were resumed after
> ptrace_stop()/do_signal_stop().

Hmmm.... right. I think it is better to test for GROUP_STOP_PENDING
there. That happens on delivery of a new stop signal, so
semantic-wise, it's correct. Given the statelessness of group stop
across STOP/CONT attempts, I think it should be okay. I'll think
about it more.

>> This will change with future patches.
>
> Yes. I tried to study this series patch-by-patch. I think I should
> read the whole series to really understand the intermediate changes.
> I'll try to return on Monday.
>
> Cough. I didn't expect I forgot this code that much ;)

Heh, I thought adding transparent/nestable ptrace attach would take me
several days; instead, understanding the code and producing this
patchset took me two weeks filled with swearing. This is a truly
hairy piece of code. :-)

Thanks a lot for reviewing.

--
tejun

2010-11-26 19:41:31

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 02/14] freezer: fix a race during freezing of TASK_STOPPED tasks

On Friday, November 26, 2010, Tejun Heo wrote:
> After calling freeze_task(), try_to_freeze_tasks() sees whether the
> task is stopped or traced and, if so, considers it to be frozen;
> however, nothing guarantees that either the task being frozen sees
> TIF_FREEZE or the freezer sees TASK_STOPPED -> TASK_RUNNING
> transition. The task being frozen may wake up and not see TIF_FREEZE
> while the freezer fails to notice the transition and believes the task
> is still stopped.
>
> This patch fixes the race by making freeze_task() always go through
> fake_signal_wake_up() for applicable tasks. The function goes through
> the target task's scheduler lock and thus guarantees that either the
> target sees TIF_FREEZE or try_to_freeze_tasks() sees TASK_RUNNING.

Looks good. I'll take this one to my tree, if you don't mind.

Thanks,
Rafael


> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> ---
> kernel/freezer.c | 9 +++++++--
> kernel/power/process.c | 6 ++++++
> 2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/freezer.c b/kernel/freezer.c
> index bd1d42b..66ecd2e 100644
> --- a/kernel/freezer.c
> +++ b/kernel/freezer.c
> @@ -104,8 +104,13 @@ bool freeze_task(struct task_struct *p, bool sig_only)
> }
>
> if (should_send_signal(p)) {
> - if (!signal_pending(p))
> - fake_signal_wake_up(p);
> + fake_signal_wake_up(p);
> + /*
> + * fake_signal_wake_up() goes through p's scheduler
> + * lock and guarantees that TASK_STOPPED/TRACED ->
> + * TASK_RUNNING transition can't race with task state
> + * testing in try_to_freeze_tasks().
> + */
> } else if (sig_only) {
> return false;
> } else {
> diff --git a/kernel/power/process.c b/kernel/power/process.c
> index e50b4c1..eb2c88a 100644
> --- a/kernel/power/process.c
> +++ b/kernel/power/process.c
> @@ -64,6 +64,12 @@ static int try_to_freeze_tasks(bool sig_only)
> * perturb a task in TASK_STOPPED or TASK_TRACED.
> * It is "frozen enough". If the task does wake
> * up, it will immediately call try_to_freeze.
> + *
> + * Because freeze_task() goes through p's
> + * scheduler lock after setting TIF_FREEZE, it's
> + * guaranteed that either we see TASK_RUNNING or
> * try_to_freeze() after schedule() in ptrace/signal
> + * stop sees TIF_FREEZE.
> */
> if (!task_is_stopped_or_traced(p) &&
> !freezer_should_skip(p))
>

2010-11-26 19:42:58

by Rafael J. Wysocki

Subject: Re: [PATCH 03/14] freezer: remove superflous try_to_freeze() loop in do_signal_stop()

On Friday, November 26, 2010, Tejun Heo wrote:
> do_signal_stop() is used only by get_signal_to_deliver() and after a
> successful signal stop, it always calls try_to_freeze(), so the
> try_to_freeze() loop around schedule() in do_signal_stop() is
> superfluous and confusing. Remove it.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Roland McGrath <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>

Acked-by: Rafael J. Wysocki <[email protected]>

Thanks!

> ---
> kernel/signal.c | 4 +---
> 1 files changed, 1 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index fe004b5..0a6816a 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1781,9 +1781,7 @@ static int do_signal_stop(int signr)
> }
>
> /* Now we don't run again until woken by SIGCONT or SIGKILL */
> - do {
> - schedule();
> - } while (try_to_freeze());
> + schedule();
>
> tracehook_finish_jctl();
> current->exit_code = 0;
>

2010-11-26 19:59:42

by Tejun Heo

Subject: Re: [PATCH 02/14] freezer: fix a race during freezing of TASK_STOPPED tasks

On 11/26/2010 08:40 PM, Rafael J. Wysocki wrote:
> On Friday, November 26, 2010, Tejun Heo wrote:
>> After calling freeze_task(), try_to_freeze_tasks() sees whether the
>> task is stopped or traced and if so, considers it to be frozen;
>> however, nothing guarantees that either the task being frozen sees
>> TIF_FREEZE or the freezer sees TASK_STOPPED -> TASK_RUNNING
>> transition. The task being frozen may wake up and not see TIF_FREEZE
>> while the freezer fails to notice the transition and believes the task
>> is still stopped.
>>
>> This patch fixes the race by making freeze_task() always go through
>> fake_signal_wake_up() for applicable tasks. The function goes through
>> the target task's scheduler lock and thus guarantees that either the
>> target sees TIF_FREEZE or try_to_freeze_tasks() sees TASK_RUNNING.
>
> Looks good. I'll take this one to my tree, if you don't mind.

Sure thing.

Thanks.

--
tejun

2010-11-27 11:40:22

by Tejun Heo

Subject: [PATCH UPDATED 06/14] signal: use GROUP_STOP_PENDING to avoid stopping multiple times for a single group stop

Currently task->signal->group_stop_count is used to decide whether to
stop for group stop. However, if there is a task in the group which
is taking a long time to stop, other tasks which are continued by
ptrace would repeatedly stop for the same group stop until the group
stop is complete.

This patch introduces GROUP_STOP_PENDING which tracks whether a task
is yet to stop for the group stop in progress. The flag is set when a
group stop starts and cleared when the task stops the first time for
the group stop, so the task won't stop multiple times for the same
group stop.

Note that currently GROUP_STOP_PENDING tracks the same state as
GROUP_STOP_CONSUME. Both always get set and cleared together without
releasing siglock in between. This will change with future patches.

Oleg spotted that a task might skip the signal check even when its
GROUP_STOP_PENDING is set. Fixed by updating recalc_sigpending_tsk()
to check GROUP_STOP_PENDING instead of group_stop_count.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Roland McGrath <[email protected]>
---
recalc_sigpending_tsk() updated to test GROUP_STOP_PENDING too. This
makes sure a ptracee always checks it after woken up. Whether a task
already in TASK_STOPPED should have GROUP_STOP_PENDING set or not is a
different issue. I think it probably would be better to do so, so that
the ptracer is notified that the tracee is participating in the group
stop. The necessary change is simple; we just need to replace
task_is_stopped_or_traced() in the group stop init path with
task_is_stopped().

With later patches, CLD_STOPPED/CONTINUED notifications are made
reliable w.r.t. ptrace by delaying them until ptrace detach if
necessary, so letting ptracees participate in group stop doesn't cost
anything. Anyways, that's a separate issue.

The git tree is updated accordingly.

Thanks.

include/linux/sched.h | 1 +
kernel/signal.c | 23 ++++++++++++++---------
2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93157a4..1261993 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1760,6 +1760,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
/*
* task->group_stop flags
*/
+#define GROUP_STOP_PENDING (1 << 16) /* task should stop for group stop */
#define GROUP_STOP_CONSUME (1 << 17) /* consume group stop count */

#ifdef CONFIG_PREEMPT_RCU
diff --git a/kernel/signal.c b/kernel/signal.c
index 7e2da0d..273ee35 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -124,7 +124,7 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)

static int recalc_sigpending_tsk(struct task_struct *t)
{
- if (t->signal->group_stop_count > 0 ||
+ if ((t->group_stop & GROUP_STOP_PENDING) ||
PENDING(&t->pending, &t->blocked) ||
PENDING(&t->signal->shared_pending, &t->blocked)) {
set_tsk_thread_flag(t, TIF_SIGPENDING);
@@ -732,6 +732,9 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
t = p;
do {
unsigned int state;
+
+ t->group_stop = 0;
+
rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
/*
* If there is a handler for SIGCONT, we must make
@@ -1742,8 +1745,8 @@ static int do_signal_stop(int signr)
struct signal_struct *sig = current->signal;
int notify = 0;

- if (!sig->group_stop_count) {
- unsigned int gstop = GROUP_STOP_CONSUME;
+ if (!(current->group_stop & GROUP_STOP_PENDING)) {
+ unsigned int gstop = GROUP_STOP_PENDING | GROUP_STOP_CONSUME;
struct task_struct *t;

if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
@@ -1782,10 +1785,10 @@ static int do_signal_stop(int signr)
notify = tracehook_notify_jctl(notify, CLD_STOPPED);
/*
* tracehook_notify_jctl() can drop and reacquire siglock, so
- * we test ->group_stop_count again. If SIGCONT or SIGKILL
- * comes in between, ->group_stop_count == 0.
+ * we test GROUP_STOP_PENDING again. If SIGCONT or SIGKILL
+ * comes in between, it would be clear.
*/
- if (!sig->group_stop_count) {
+ if (!(current->group_stop & GROUP_STOP_PENDING)) {
spin_unlock_irq(&current->sighand->siglock);
goto out;
}
@@ -1794,6 +1797,7 @@ static int do_signal_stop(int signr)
sig->flags = SIGNAL_STOP_STOPPED;

current->exit_code = sig->group_exit_code;
+ current->group_stop &= ~GROUP_STOP_PENDING;
__set_current_state(TASK_STOPPED);

spin_unlock_irq(&current->sighand->siglock);
@@ -1908,8 +1912,8 @@ relock:
if (unlikely(signr != 0))
ka = return_ka;
else {
- if (unlikely(signal->group_stop_count > 0) &&
- do_signal_stop(0))
+ if (unlikely(current->group_stop &
+ GROUP_STOP_PENDING) && do_signal_stop(0))
goto relock;

signr = dequeue_signal(current, &current->blocked,
@@ -2055,7 +2059,8 @@ void exit_signals(struct task_struct *tsk)
if (!signal_pending(t) && !(t->flags & PF_EXITING))
recalc_sigpending_and_wake(t);

- if (unlikely(tsk->signal->group_stop_count) && consume_group_stop()) {
+ if (unlikely(tsk->group_stop & GROUP_STOP_PENDING) &&
+ consume_group_stop()) {
tsk->signal->flags = SIGNAL_STOP_STOPPED;
group_stop = tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED);
}
--
1.7.1

2010-11-28 19:14:20

by Oleg Nesterov

Subject: Re: [PATCH UPDATED 06/14] signal: use GROUP_STOP_PENDING to avoid stopping multiple times for a single group stop

On 11/27, Tejun Heo wrote:
>
> static int recalc_sigpending_tsk(struct task_struct *t)
> {
> - if (t->signal->group_stop_count > 0 ||
> + if ((t->group_stop & GROUP_STOP_PENDING) ||

OK, this makes the intent clear.

> @@ -732,6 +732,9 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
> t = p;
> do {
> unsigned int state;
> +
> + t->group_stop = 0;
> +

Yes.

But, afaics, this is not enough. Say, what about zap_other_threads() ?
We shouldn't allow sub-threads to stop in this case.

Basically, every time we clear ->group_stop_count we should also reset
->group_stop for every thread. Fortunately, every such place already
does while_each_thread().

Oleg.

2010-11-28 20:01:22

by Oleg Nesterov

Subject: Re: [PATCH 08/14] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced

On 11/26, Tejun Heo wrote:
>
> A ptraced task would still stop at do_signal_stop() when it's stopping
> for stop signals and do_signal_stop() behaves the same whether the
> task is ptraced or not. However, in addition to stopping,
> ptrace_stop() also does ptrace specific stuff like calling
> architecture specific callbacks, so this behavior makes the code more
> fragile and difficult to understand.

OK. This patch adds an obviously user-visible change. It looks
very minor, but I never know when it comes to ptrace.

To simplify, suppose that we have a single-thread tracee, and
debugger "acks" SIGSTOP, say, it does ptrace(PTRACE_CONT, SIGSTOP).

Before this patch, the tracee stops in TASK_STOPPED, now it calls
ptrace_stop() and goes to TASK_TRACED state.

Add Jan. I hope this is OK, but this might break the tracer if
it looks into fs/proc (probably only test-cases do this).


At least, with or without this patch ->last_siginfo is NULL, good.


The check in do_signal_stop() looks racy though, see the next
email.

Oleg.

2010-11-28 20:30:34

by Jan Kratochvil

Subject: Re: [PATCH 08/14] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced

On Sun, 28 Nov 2010 20:54:42 +0100, Oleg Nesterov wrote:
> To simplify, suppose that we have a single-thread tracee, and
> debugger "acks" SIGSTOP, say, it does ptrace(PTRACE_CONT, SIGSTOP).

I do not find this case useful. It happens with current GDB:
(gdb) handle SIGSTOP
Signal Stop Print Pass to program Description
SIGSTOP Yes Yes Yes Stopped (signal)
^^^
But it behaves weird anyway:

ptrace(PTRACE_CONT, 11799, 0x1, SIGSTOP) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], WNOHANG, NULL) = 11799
[...]
ptrace(PTRACE_CONT, 11799, 0x1, SIGSTOP) = 0
<no new signal received>:
State: S (sleeping)
TracerPid: 11797

So the first time it immediately gets reported and the second time it gets
lost. (kernel-2.6.35.6-48.fc14.x86_64)


> Before this patch, the tracee stops in TASK_STOPPED, now it calls
> ptrace_stop() and goes to TASK_TRACED state.
>
> Add Jan. I hope this is OK, but this might break the tracer if
> it looks into fs/proc (probably only test-cases do this).

ptrace(PTRACE_CONT, SIGSTOP) should just work somehow so that current GDB does
not break, as it calls it sometimes. Its behavior may change, I guess.

I only find it interesting that (PTRACE_DETACH, SIGSTOP) should really keep
it stopped, and also that attaching to `T (stopped)' should work with all
the issues of waited-for/unwaited-for SIGSTOP. Those cases are not being
discussed here.


Thanks,
Jan

2010-11-28 20:32:40

by Oleg Nesterov

Subject: Re: [PATCH 09/14] ptrace: clean transitions between TASK_STOPPED and TRACED

On 11/26, Tejun Heo wrote:
>
> Currently, if the task is STOPPED on ptrace attach, it's left alone
> and the state is silently changed to TRACED on the next ptrace call.
> The behavior breaks the assumption that arch_ptrace_stop() is called
> before any task is poked by ptrace operations and is ugly in that a
> task manipulates the state of another task directly.
>
> With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
> TRACED can be made clean. The tracer can use the flag to tell the
> tracee to retry stop on attach and detach. On retry, the tracee will
> enter the desired state the correct way. The lower 16bits of
> task->group_stop is used to remember the signal number which caused
> the last group stop. This is used while retrying for ptrace attach as
> the original group_exit_code could have been consumed with wait(2) by
> then.

This adds another user-visible change; this time it is more serious.
Again, I do not claim it breaks ptrace; I just do not know.

> @@ -204,6 +202,19 @@ int ptrace_attach(struct task_struct *task)
> __ptrace_link(task, current);
> send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
>
> + /*
> + * If the task is already STOPPED, set GROUP_STOP_PENDING and
> + * kick it so that it transits to TRACED. This is safe as
> + * both transitions in and out of STOPPED are protected by
> + * siglock.
> + */
> + spin_lock(&task->sighand->siglock);
> + if (task_is_stopped(task)) {
> + task->group_stop |= GROUP_STOP_PENDING;
> + signal_wake_up(task, 1);

OK. Now we have a window if the tracer attaches to the stopped task.

Say,

child = fork()

if (!child)
return child_do_something();

kill(child, SIGSTOP);
wait(); // <--- ensures it is stopped

ptrace(PTRACE_ATTACH, child);

assert(ptrace(PTRACE_WHATEVER, child) == 0);

Currently this code is correct. With this patch the assertion above
can fail, the child may be running, changing its state from STOPPED
to TRACED.


> @@ -1793,7 +1797,7 @@ static int do_signal_stop(int signr)
>
> if (consume_group_stop())
> sig->flags = SIGNAL_STOP_STOPPED;
> -
> +retry:
> current->exit_code = sig->group_exit_code;
> current->group_stop &= ~GROUP_STOP_PENDING;
> __set_current_state(TASK_STOPPED);
> @@ -1805,6 +1809,7 @@ static int do_signal_stop(int signr)
> read_lock(&tasklist_lock);
> do_notify_parent_cldstop(current, notify);
> read_unlock(&tasklist_lock);
> + notify = 0;
> }
>
> /* Now we don't run again until woken by SIGCONT or SIGKILL */
> @@ -1812,7 +1817,13 @@ static int do_signal_stop(int signr)
>
> spin_lock_irq(&current->sighand->siglock);
> } else
> - ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
> + ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
> + CLD_STOPPED, 0, NULL);
> +
> + if (current->group_stop & GROUP_STOP_PENDING) {
> + WARN_ON_ONCE(!(current->group_stop & GROUP_STOP_SIGMASK));
> + goto retry;

This doesn't look right even without ptrace.

Suppose we have two threads, T1 and T2, both stopped, SIGNAL_STOP_STOPPED
is set.

SIGCONT wakes them up.

To simplify the discussion, suppose that T1 takes a long preemption
when it returns from schedule(), right before it takes ->siglock again.

T2 sends CLD_CONTINUED to parent and dequeues another SIGSTOP. It
initiates another group stop, sees T1 as running, and stops with
->group_stop_count == 1. Now we are waiting for T1 which should
participate.

Finally T1 resumes and sees GROUP_STOP_PENDING. It goes to 'retry:'
and stops, but nobody notifies the parent and ->group_stop_count is
still non-zero.

Oleg.

2010-11-28 20:37:07

by Oleg Nesterov

Subject: Re: [PATCH 11/14] ptrace: make group stop notification reliable against ptrace

On 11/26, Tejun Heo wrote:
>
> Group stop notifications are unreliable if one or more tasks of the
> task group are being ptraced. If a ptraced task ends up finishing a
> group stop, the notification is sent to the ptracer and the real
> parent never gets notified.

Yes. I do not even know if this is a bug or not, but certainly I agree,
this doesn't look very nice.

> if (likely(!task_ptrace(current))) {
> + bool do_notify = false;
> +
> + if (sig->flags & SIGNAL_NOTIFY_STOP) {
> + sig->flags &= ~SIGNAL_NOTIFY_STOP;
> + do_notify = true;
> + }
> +
> spin_unlock_irq(&current->sighand->siglock);
>
> - if (notify) {
> + if (do_notify) {
> read_lock(&tasklist_lock);
> - do_notify_parent_cldstop(current, notify);
> + do_notify_parent_cldstop(current, CLD_STOPPED);

This can race with ptrace_attach() in between.

IOW, this notification can go to the new tracer anyway.

Oleg.

2010-11-28 20:51:12

by Oleg Nesterov

Subject: Re: [PATCH 14/14] ptrace: remove the extra wake_up_process() from ptrace_detach()

Today I lost concentration at 13/14 ;)

Will continue tomorrow. As for this patch,

On 11/26, Tejun Heo wrote:
>
> This wake_up_process() has a turbulent history. This is a remnant
> from ancient ptrace implementation and patently wrong. Commit
> 95a3540d (ptrace_detach: the wrong wakeup breaks the ERESTARTxxx
> logic) removed it

Yes. This obviously means I personally like this change. In fact,
I never understood this wakeup, and I was glad to find the reason
to send the patch.

> but the change was reverted later by commit edaba2c5
> (ptrace: revert "ptrace_detach: the wrong wakeup breaks the
> ERESTARTxxx logic ") citing compatibility breakage and general
> brokeness of the whole group stop / ptrace interaction.

Yes. Honestly, I completely forgot the reason, iirc 95a3540d
broke gdb somehow.

Add Jan. IIRC, we had a long discussion after that, and (iirc!)
Jan seems to agree we can kill this wakeup.

Oleg.

2010-11-28 20:53:17

by Jan Kratochvil

Subject: Re: [PATCH 09/14] ptrace: clean transitions between TASK_STOPPED and TRACED

On Sun, 28 Nov 2010 21:25:35 +0100, Oleg Nesterov wrote:
> On 11/26, Tejun Heo wrote:
> > + /*
> > + * If the task is already STOPPED, set GROUP_STOP_PENDING and
> > + * kick it so that it transits to TRACED. This is safe as
> > + * both transitions in and out of STOPPED are protected by
> > + * siglock.
> > + */
> > + spin_lock(&task->sighand->siglock);
> > + if (task_is_stopped(task)) {
> > + task->group_stop |= GROUP_STOP_PENDING;
> > + signal_wake_up(task, 1);
>
> OK. Now we have a window if the tracer attaches to the stopped task.
>
> Say,
>
> child = fork()
>
> if (!child)
> return child_do_something();
>
> kill(child, SIGSTOP);
> wait(); // <--- ensures it is stopped
>
> ptrace(PTRACE_ATTACH, child);
>
> assert(ptrace(PTRACE_WHATEVER, child) == 0);
>
> Currently this code is correct. With this patch the assertion above
> can fail, the child may be running, changing its state from STOPPED
> to TRACED.

GDB now has code (as rewritten by Daniel Jacobowitz):

ptrace (PTRACE_ATTACH) has been done and:
linux_nat_post_attach_wait:
if (pid_is_stopped (pid)) ### it means `State' is `T (stopped)'
{
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog,
"LNPAW: Attaching to a stopped process\n");

/* The process is definitely stopped. It is in a job control
stop, unless the kernel predates the TASK_STOPPED /
TASK_TRACED distinction, in which case it might be in a
ptrace stop. Make sure it is in a ptrace stop; from there we
can kill it, signal it, et cetera.

First make sure there is a pending SIGSTOP. Since we are
already attached, the process can not transition from stopped
to running without a PTRACE_CONT; so we know this signal will
go into the queue. The SIGSTOP generated by PTRACE_ATTACH is
probably already in the queue (unless this kernel is old
enough to use TASK_STOPPED for ptrace stops); but since SIGSTOP
is not an RT signal, it can only be queued once. */
kill_lwp (pid, SIGSTOP);

/* Finally, resume the stopped process. This will deliver the SIGSTOP
(or a higher priority signal, just like normal PTRACE_ATTACH). */
ptrace (PTRACE_CONT, pid, 0, 0); ### line B
}

### point A

/* Make sure the initial process is stopped. The user-level threads
layer might want to poke around in the inferior, and that won't
work if things haven't stabilized yet. */
new_pid = my_waitpid (pid, &status, 0); ### this is in fact waitpid()


The problem is this code is already racy. At `point A' someone may
`kill (tracee, SIGSTOP)', `waitpid (tracee)' -- eats SIGSTOP, and then
my_waitpid() will hang because the <pid_is_stopped (pid)> block was not
executed to prevent it.

So if we are already discussing ptrace race safety, I would prefer some
kernel ptrace API suggestion for how to safely, racelessly attach to a task,
with the task being in any state: unstopped, T (stopped) with a pending
SIGSTOP, and T (stopped) with an already-eaten SIGSTOP.

Specifically, as a reply to your mail: I guess `line B' could fail if the
tracee was very freshly `kill (tracee, SIGSTOP)'-ed and `waitpid (tracee)'-ed,
and thus the new `kill_lwp (pid, SIGSTOP)' delivery would not take effect
for the `my_waitpid()' line.



Thanks,
Jan

2010-11-28 20:59:51

by Oleg Nesterov

Subject: Re: [PATCH 08/14] ptrace: make do_signal_stop() use ptrace_stop() if the task is being ptraced

On 11/28, Jan Kratochvil wrote:
>
> On Sun, 28 Nov 2010 20:54:42 +0100, Oleg Nesterov wrote:
> > To simplify, suppose that we have a single-thread tracee, and
> > debugger "acks" SIGSTOP, say, it does ptrace(PTRACE_CONT, SIGSTOP).
>
> I do not find this case useful.

OK, thanks.

> It happens with current GDB:
> (gdb) handle SIGSTOP
> Signal Stop Print Pass to program Description
> SIGSTOP Yes Yes Yes Stopped (signal)
> ^^^
> But it behaves weird anyway:
>
> ptrace(PTRACE_CONT, 11799, 0x1, SIGSTOP) = 0
> wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], WNOHANG, NULL) = 11799
> [...]
> ptrace(PTRACE_CONT, 11799, 0x1, SIGSTOP) = 0
> <no new signal received>:
> State: S (sleeping)
> TracerPid: 11797
>
> So the first time it immediately gets reported and the second time it gets
> lost. (kernel-2.6.35.6-48.fc14.x86_64)

This is correct. (note that I am not saying this looks very nice ;)

The second ptrace(PTRACE_CONT, SIGSTOP) is equal to ptrace(PTRACE_CONT, 0),
exactly because ->last_siginfo is NULL: we do not have the signal
context. (Well, this explanation is not exactly right technically).

And. IIRC, something in user-space relies on the current behaviour.
At least, I do remember it was said that the ->last_siginfo != NULL
check (whether PTRACE_GETSIGINFO fails or not) should help to detect
the jctl stop.

Oleg.

2010-11-29 13:38:47

by Tejun Heo

Subject: Re: [PATCH UPDATED 06/14] signal: use GROUP_STOP_PENDING to avoid stopping multiple times for a single group stop

Hello, Oleg.

On 11/28/2010 08:07 PM, Oleg Nesterov wrote:
>> @@ -732,6 +732,9 @@ static int prepare_signal(int sig, struct task_struct *p, int from_ancestor_ns)
>> t = p;
>> do {
>> unsigned int state;
>> +
>> + t->group_stop = 0;
>> +
>
> Yes.
>
> But, afaics, this is not enough. Say, what about zap_other_threads() ?
> We shouldn't allow sub-threads to stop in this case.
>
> Basically, every time we clear ->group_stop_count we should also reset
> ->group_stop for every thread. Fortunately, every such place already
> does while_each_thread().

Yes, indeed. Will update accordingly.

Thanks.

--
tejun

2010-11-29 13:48:59

by Tejun Heo

Subject: Re: [PATCH 09/14] ptrace: clean transitions between TASK_STOPPED and TRACED

Hello, Oleg.

On 11/28/2010 09:25 PM, Oleg Nesterov wrote:
> This adds another user-visible change; this time it is more serious.
> Again, I do not claim it breaks ptrace; I just do not know.
>
>> @@ -204,6 +202,19 @@ int ptrace_attach(struct task_struct *task)
>> __ptrace_link(task, current);
>> send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
>>
>> + /*
>> + * If the task is already STOPPED, set GROUP_STOP_PENDING and
>> + * kick it so that it transits to TRACED. This is safe as
>> + * both transitions in and out of STOPPED are protected by
>> + * siglock.
>> + */
>> + spin_lock(&task->sighand->siglock);
>> + if (task_is_stopped(task)) {
>> + task->group_stop |= GROUP_STOP_PENDING;
>> + signal_wake_up(task, 1);
>
> OK. Now we have a window if the tracer attaches to the stopped task.
>
> Say,
>
> child = fork()
>
> if (!child)
> return child_do_something();
>
> kill(child, SIGSTOP);
> wait(); // <--- ensures it is stopped
>
> ptrace(PTRACE_ATTACH, child);
>
> assert(ptrace(PTRACE_WHATEVER, child) == 0);
>
> Currently this code is correct. With this patch the assertion above
> can fail, the child may be running, changing its state from STOPPED
> to TRACED.

Hmmm... rather convoluted, but yeap. I can update the code to wait
for the stopped -> traced transition to complete, so that the transition
is hidden from userland. How does that sound to you?

>> @@ -1812,7 +1817,13 @@ static int do_signal_stop(int signr)
>>
>> spin_lock_irq(&current->sighand->siglock);
>> } else
>> - ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
>> + ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
>> + CLD_STOPPED, 0, NULL);
>> +
>> + if (current->group_stop & GROUP_STOP_PENDING) {
>> + WARN_ON_ONCE(!(current->group_stop & GROUP_STOP_SIGMASK));
>> + goto retry;
>
> This doesn't look right even without ptrace.
>
> Suppose we have two threads, T1 and T2, both stopped, SIGNAL_STOP_STOPPED
> is set.
>
> SIGCONT wakes them up.
>
> To simplify the discussion, suppose that T1 takes a long preemption
> when it returns from schedule(), right before it takes ->siglock again.
>
> T2 sends CLD_CONTINUED to parent and dequeues another SIGSTOP. It
> initiates another group stop, sees T1 as running, and stops with
> ->group_stop_count == 1. Now we are waiting for T1 which should
> participate.
>
> Finally T1 resumes and sees GROUP_STOP_PENDING. It goes to 'retry:'
> and stops, but nobody notifies the parent and ->group_stop_count is
> still non-zero.

Good point. I'll think about how the problem can be solved. Maybe
the retry and pending states need to be separated.

Thanks.

--
tejun

2010-11-29 13:53:01

by Tejun Heo

Subject: Re: [PATCH 11/14] ptrace: make group stop notification reliable against ptrace

Hello,

On 11/28/2010 09:30 PM, Oleg Nesterov wrote:
> On 11/26, Tejun Heo wrote:
>>
>> Group stop notifications are unreliable if one or more tasks of the
>> task group are being ptraced. If a ptraced task ends up finishing a
>> group stop, the notification is sent to the ptracer and the real
>> parent never gets notified.
>
> Yes. I do not even know if this is bug or not, but certainly I agree,
> this doesn't look very nice.
>
>> if (likely(!task_ptrace(current))) {
>> + bool do_notify = false;
>> +
>> + if (sig->flags & SIGNAL_NOTIFY_STOP) {
>> + sig->flags &= ~SIGNAL_NOTIFY_STOP;
>> + do_notify = true;
>> + }
>> +
>> spin_unlock_irq(&current->sighand->siglock);
>>
>> - if (notify) {
>> + if (do_notify) {
>> read_lock(&tasklist_lock);
>> - do_notify_parent_cldstop(current, notify);
>> + do_notify_parent_cldstop(current, CLD_STOPPED);
>
> This can race with ptrace_attach() in between.
>
> IOW, this notification can go to the new tracer anyway.

Hmmm, okay, we should hold both locks when checking for notification
to remove that race, or we can just tell do_notify_parent_cldstop()
which parent to use.

--
tejun

2010-11-29 13:55:39

by Tejun Heo

Subject: Re: [PATCH 14/14] ptrace: remove the extra wake_up_process() from ptrace_detach()

Hello,

On 11/28/2010 09:44 PM, Oleg Nesterov wrote:
> Today I lost concentration at 13/14 ;)
>
> Will continue tomorrow. As for this patch,

Eh, well, you've already found enough holes in the patchset. It's
like Swiss cheese. I'll update and repost.

> On 11/26, Tejun Heo wrote:
>>
>> This wake_up_process() has a turbulent history. This is a remnant
>> from ancient ptrace implementation and patently wrong. Commit
>> 95a3540d (ptrace_detach: the wrong wakeup breaks the ERESTARTxxx
>> logic) removed it
>
> Yes. This obviously means I personally like this change. In fact,
> I never understood this wakeup, and I was glad to find the reason
> to send the patch.

Another thing is that the use of wake_up_process() in ptrace is rather
scary even in ptrace_resume(). It will wake up even uninterruptible
sleeps. Is there any reason it's not just waking up
TASK_INTERRUPTIBLE | STOPPED | TRACED?

Thanks.

--
tejun