2008-02-27 15:22:19

by Jiri Kosina

[permalink] [raw]
Subject: [PATCH] [RFC] fix missed SIGCONT cases

The SIGCONT-handling related comment in handle_stop_signal() states:

* If there is a handler for SIGCONT, we must make
* sure that no thread returns to user mode before
* we post the signal, in case it was the only
* thread eligible to run the signal handler--then
* it must not do anything between resuming and
* running the handler. With the TIF_SIGPENDING
* flag set, the thread will pause and acquire the
* siglock that we hold now and until we've queued
* the pending signal.
*
* Wake up the stopped thread _after_ setting
* TIF_SIGPENDING

Unfortunately, just a few lines below, the siglock is unlocked for a short
time, in order to call

do_notify_parent_cldstop(p, CLD_CONTINUED);

Therefore the synchronization on siglock doesn't work properly. If the
task is waiting on siglock and gets scheduled when the siglock is
unlocked, but before SIGCONT is queued (this is done later, after
handle_stop_signal() finishes), the process misses the SIGCONT signal (as
it has already run, but SIGCONT has not been queued yet).

This patch solves it by postponing the do_notify_parent_cldstop() call
(and therefore also unlocking of siglock) until the SIGCONT is properly
queued on the shared-pending queue, and therefore it is not missed by the
process' SIGCONT handler. Also updates the callsites.

Signed-off-by: Jiri Kosina <[email protected]>

diff --git a/kernel/signal.c b/kernel/signal.c
index 84917fe..24786ba 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -556,6 +556,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why);
* actual continuing for SIGCONT, but not the actual stopping for stop
* signals. The process stop is done as a signal action for SIG_DFL.
*/
+
static void handle_stop_signal(int sig, struct task_struct *p)
{
struct task_struct *t;
@@ -630,24 +631,6 @@ static void handle_stop_signal(int sig, struct task_struct *p)
t = next_thread(t);
} while (t != p);

- if (p->signal->flags & SIGNAL_STOP_STOPPED) {
- /*
- * We were in fact stopped, and are now continued.
- * Notify the parent with CLD_CONTINUED.
- */
- p->signal->flags = SIGNAL_STOP_CONTINUED;
- p->signal->group_exit_code = 0;
- spin_unlock(&p->sighand->siglock);
- do_notify_parent_cldstop(p, CLD_CONTINUED);
- spin_lock(&p->sighand->siglock);
- } else {
- /*
- * We are not stopped, but there could be a stop
- * signal in the middle of being processed after
- * being removed from the queue. Clear that too.
- */
- p->signal->flags = 0;
- }
} else if (sig == SIGKILL) {
/*
* Make sure that any pending stop signal already dequeued
@@ -657,6 +640,43 @@ static void handle_stop_signal(int sig, struct task_struct *p)
}
}

+/* This must not be called before the SIGCONT has been queued, otherwise
+ * the userspace task (waiting on siglock) might run too soon and miss
+ * the SIGCONT.
+ */
+static void handle_stop_signal_finish(int sig, struct task_struct *p)
+{
+
+ if (p->signal->flags & SIGNAL_GROUP_EXIT)
+ /*
+ * The process is in the middle of dying already.
+ */
+ return;
+
+ if (sig != SIGCONT)
+ return;
+
+ if (p->signal->flags & SIGNAL_STOP_STOPPED) {
+ /*
+ * We were in fact stopped, and are now continued.
+ * Notify the parent with CLD_CONTINUED.
+ */
+ p->signal->flags = SIGNAL_STOP_CONTINUED;
+ p->signal->group_exit_code = 0;
+ spin_unlock(&p->sighand->siglock);
+ do_notify_parent_cldstop(p, CLD_CONTINUED);
+ spin_lock(&p->sighand->siglock);
+ } else {
+ /*
+ * We are not stopped, but there could be a stop
+ * signal in the middle of being processed after
+ * being removed from the queue. Clear that too.
+ */
+ p->signal->flags = 0;
+ }
+}
+
+
static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
struct sigpending *signals)
{
@@ -946,6 +966,8 @@ __group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
if (unlikely(ret))
return ret;

+ handle_stop_signal_finish(sig, p);
+
__group_complete_signal(sig, p);
return 0;
}
@@ -1389,6 +1411,8 @@ send_group_sigqueue(int sig, struct sigqueue *q, struct task_struct *p)
list_add_tail(&q->list, &p->signal->shared_pending.list);
sigaddset(&p->signal->shared_pending.signal, sig);

+ handle_stop_signal_finish(sig, p);
+
__group_complete_signal(sig, p);
out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);


2008-02-27 21:04:34

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

Have you observed an actual problem? I don't think the "race" you seem to
be concerned about is a problem at all.

The comment refers to the necessary atomicity of posting the signal along
with doing the wakeups. Those are done together with the siglock held.
It does not matter that the siglock was dropped and reacquired before
there. All that matters is that we hold the siglock continuously from
before the wake_up_state calls made in handle_stop_signal through until
after the signal-posting done by its callers.

Can you enumerate the specific sequence of events that would result in
error under the current code?


Thanks,
Roland

2008-02-27 22:04:48

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

On Wed, 27 Feb 2008, Roland McGrath wrote:

> Have you observed an actual problem? I don't think the "race" you seem
> to be concerned about is a problem at all.
> The comment refers to the necessary atomicity of posting the signal along
> with doing the wakeups. Those are done together with the siglock held.
> It does not matter that the siglock was dropped and reacquired before
> there.

What happens here is that the proces that was woken-up is spinning on the
siglock even before the actual SIGCONT has been queued. When this lock is
released, the process continues executing, and its SIGCONT handler doesn't
run, even though it executes _just because_ it was woken up by SIGCONT.

Let's take this as an example:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>

volatile int sigcont_received = 0;

static void sigcont_handler(int signo)
{
sigcont_received = 1;
}

int main(int argc, char **argv)
{

struct sigaction action;

memset(&action, 0, sizeof(action));
action.sa_handler = sigcont_handler;
sigemptyset(&action.sa_mask);

if (sigaction(SIGCONT, &action, NULL) != 0) {
fprintf(stderr, "sigaction() failed\n");
return 1;
}

while (1) {
if (kill(getpid(), SIGSTOP) != 0) {
printf("could not send SIGSTOP to self\n");
return 1;
}

if (sigcont_received)
printf("finished (SIGCONT received)\n");
else
printf("finished (without SIGCONT)\n");
sigcont_received = 0;
}
return 0;
}

Do you agree that when you run this program, and once it gets stopped
(sends SIGSTOP to itself), send SIGCONT to it. It should always print

finished (SIGCONT received)

right? Without my patch, sometimes

finished (without SIGCONT)

can be observed (for some reason, it seems to trigger more often on ia64
machines than on any x86 hardware).

Thanks,

--
Jiri Kosina
SUSE Labs

2008-02-28 10:27:38

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

My apologies! I misread your original message and was looking only at the
do_notify_parent_cldstop before the wakeup (CLD_STOPPED case), and not the
one afterwards (CLD_CONTINUED) that you were referring to. I do now
understand the problem and see how it can happen as you describe.

I'm not very fond of your patch, though I think its behavior is probably
correct. I was hoping to find a solution that did not require changes
outside handle_stop_signal, as those additions make the code that much more
intricate and hard to follow. But I haven't found one yet; an idea like
moving the CLD_CONTINUED notification and its troublesome lock-dropping to
before the wakeups is attractive, but I think it opens another can of worms
not worth the trouble. Perhaps Oleg can think of something simpler.

I prefer at least to do some cleanup while taking the same essential
approach, and change the control flow only of the part that needs to move
to fix this problem. What do you think about the patch below? (I have not
tested this with your test case, nor much testing for other regressions.)


Thanks,
Roland

---
[PATCH] clean up and fix SIGCONT

This reorganizes some of the signals code, replacing handle_stop_signal()
with prepare_signal() and finish_signal(), called as a pair before and
after generating any signal. The CLD_CONTINUED notification to parent is
moved into finish_signal(), taking place after the signal is made pending
and siglock dropped. This fixes a race where a process waking from SIGCONT
could resume application code without running its SIGCONT handler first.

Signed-off-by: Roland McGrath <[email protected]>
---
kernel/signal.c | 74 ++++++++++++++++++++++++++++++++++++++++++++----------
1 files changed, 60 insertions(+), 14 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 84917fe..3a94657 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -555,16 +555,20 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why);
* time regardless of blocking, ignoring, or handling. This does the
* actual continuing for SIGCONT, but not the actual stopping for stop
* signals. The process stop is done as a signal action for SIG_DFL.
+ * Returns nonzero if CLD_CONTINUED notification should be done.
+ * finish_signal() should always be called with our return value
+ * after posting the signal (or failing to do so).
*/
-static void handle_stop_signal(int sig, struct task_struct *p)
+static int prepare_signal(int sig, struct task_struct *p)
{
struct task_struct *t;
+ int cont = 0;

if (p->signal->flags & SIGNAL_GROUP_EXIT)
/*
* The process is in the middle of dying already.
*/
- return;
+ return cont;

if (sig_kernel_stop(sig)) {
/*
@@ -635,11 +639,9 @@ static void handle_stop_signal(int sig, struct task_struct *p)
* We were in fact stopped, and are now continued.
* Notify the parent with CLD_CONTINUED.
*/
+ cont = 1;
p->signal->flags = SIGNAL_STOP_CONTINUED;
p->signal->group_exit_code = 0;
- spin_unlock(&p->sighand->siglock);
- do_notify_parent_cldstop(p, CLD_CONTINUED);
- spin_lock(&p->sighand->siglock);
} else {
/*
* We are not stopped, but there could be a stop
@@ -655,6 +657,18 @@ static void handle_stop_signal(int sig, struct task_struct *p)
*/
p->signal->flags = 0;
}
+
+ return cont;
+}
+
+/*
+ * This pairs with prepare_signal() and must always be called afterward,
+ * with the siglock already unlocked, but tasklist_lock still held.
+ */
+static void finish_signal(int cont, struct task_struct *p)
+{
+ if (cont)
+ do_notify_parent_cldstop(p, CLD_CONTINUED);
}

static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
@@ -921,13 +935,17 @@ __group_complete_signal(int sig, struct task_struct *p)
return;
}

-int
-__group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
+/*
+ * Internal entry point for sending group-wide signals.
+ * This must always be followed by finish_signal(*continued, p).
+ */
+static int generate_group_signal(int sig, struct siginfo *info,
+ struct task_struct *p, int *continued)
{
int ret = 0;

assert_spin_locked(&p->sighand->siglock);
- handle_stop_signal(sig, p);
+ *continued = prepare_signal(sig, p);

/* Short-circuit ignored signals. */
if (sig_ignored(p, sig))
@@ -951,6 +969,22 @@ __group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
}

/*
+ * This is called by a few places outside this file.
+ */
+int
+__group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
+{
+ int cont;
+ int ret = generate_group_signal(sig, info, p, &cont);
+ if (unlikely(cont)) {
+ spin_unlock(&p->sighand->siglock);
+ do_notify_parent_cldstop(p, CLD_CONTINUED);
+ spin_lock(&p->sighand->siglock);
+ }
+ return ret;
+}
+
+/*
* Nuke all other threads in the group.
*/
void zap_other_threads(struct task_struct *p)
@@ -1009,8 +1043,10 @@ int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
if (!ret && sig) {
ret = -ESRCH;
if (lock_task_sighand(p, &flags)) {
- ret = __group_send_sig_info(sig, info, p);
+ int cont;
+ ret = generate_group_signal(sig, info, p, &cont);
unlock_task_sighand(p, &flags);
+ finish_signal(cont, p);
}
}

@@ -1102,10 +1138,12 @@ int kill_pid_info_as_uid(int sig, struct siginfo *info, struct pid *pid,
if (ret)
goto out_unlock;
if (sig && p->sighand) {
+ int cont;
unsigned long flags;
spin_lock_irqsave(&p->sighand->siglock, flags);
- ret = __group_send_sig_info(sig, info, p);
+ ret = generate_group_signal(sig, info, p, &cont);
spin_unlock_irqrestore(&p->sighand->siglock, flags);
+ finish_signal(cont, p);
}
out_unlock:
read_unlock(&tasklist_lock);
@@ -1351,13 +1389,14 @@ send_group_sigqueue(int sig, struct sigqueue *q, struct task_struct *p)
{
unsigned long flags;
int ret = 0;
+ int cont;

BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));

read_lock(&tasklist_lock);
/* Since it_lock is held, p->sighand cannot be NULL. */
spin_lock_irqsave(&p->sighand->siglock, flags);
- handle_stop_signal(sig, p);
+ cont = prepare_signal(sig, p);

/* Short-circuit ignored signals. */
if (sig_ignored(p, sig)) {
@@ -1392,6 +1431,7 @@ send_group_sigqueue(int sig, struct sigqueue *q, struct task_struct *p)
__group_complete_signal(sig, p);
out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);
+ finish_signal(cont, p);
read_unlock(&tasklist_lock);
return ret;
}
@@ -1415,6 +1455,7 @@ void do_notify_parent(struct task_struct *tsk, int sig)
struct siginfo info;
unsigned long flags;
struct sighand_struct *psig;
+ int cont = 0;

BUG_ON(sig == -1);

@@ -1485,9 +1526,10 @@ void do_notify_parent(struct task_struct *tsk, int sig)
sig = 0;
}
if (valid_signal(sig) && sig > 0)
- __group_send_sig_info(sig, &info, tsk->parent);
+ generate_group_signal(sig, &info, tsk->parent, &cont);
__wake_up_parent(tsk, tsk->parent);
spin_unlock_irqrestore(&psig->siglock, flags);
+ finish_signal(cont, tsk->parent);
}

static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
@@ -1496,6 +1538,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
unsigned long flags;
struct task_struct *parent;
struct sighand_struct *sighand;
+ int cont = 0;

if (tsk->ptrace & PT_PTRACED)
parent = tsk->parent;
@@ -1538,12 +1581,13 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
spin_lock_irqsave(&sighand->siglock, flags);
if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
!(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
- __group_send_sig_info(SIGCHLD, &info, parent);
+ generate_group_signal(SIGCHLD, &info, parent, &cont);
/*
* Even if SIGCHLD is not generated, we must wake up wait4 calls.
*/
__wake_up_parent(tsk, parent);
spin_unlock_irqrestore(&sighand->siglock, flags);
+ finish_signal(cont, parent);
}

static inline int may_ptrace_stop(void)
@@ -2251,10 +2295,12 @@ static int do_tkill(int tgid, int pid, int sig)
* probe. No signal is actually delivered.
*/
if (!error && sig && p->sighand) {
+ int cont;
spin_lock_irq(&p->sighand->siglock);
- handle_stop_signal(sig, p);
+ cont = prepare_signal(sig, p);
error = specific_send_sig_info(sig, &info, p);
spin_unlock_irq(&p->sighand->siglock);
+ finish_signal(cont, p);
}
}
read_unlock(&tasklist_lock);

2008-02-28 11:42:20

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

On Thu, 28 Feb 2008, Roland McGrath wrote:

> [PATCH] clean up and fix SIGCONT
> This reorganizes some of the signals code, replacing handle_stop_signal()
> with prepare_signal() and finish_signal(), called as a pair before and
> after generating any signal. The CLD_CONTINUED notification to parent is
> moved into finish_signal(), taking place after the signal is made pending
> and siglock dropped. This fixes a race where a process waking from SIGCONT
> could resume application code without running its SIGCONT handler first.

Hi Roland,

I haven't tested what is the behavior with my patch, or without any
patches at all (will do), but with your patch applied, when I run the test
program under strace, it resumes immediately ... doesn't look particularly
OK to me:

$ strace -o /dev/null ./a.out | head -10
finished (without SIGCONT)
finished (without SIGCONT)
finished (without SIGCONT)
finished (without SIGCONT)
finished (without SIGCONT)
finished (without SIGCONT)
finished (without SIGCONT)
finished (without SIGCONT)
finished (without SIGCONT)
finished (without SIGCONT)

--
Jiri Kosina
SUSE Labs

2008-02-28 15:27:19

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

On 02/28, Roland McGrath wrote:
>
> -static void handle_stop_signal(int sig, struct task_struct *p)
> +static int prepare_signal(int sig, struct task_struct *p)
> {
> struct task_struct *t;
> + int cont = 0;
>
> if (p->signal->flags & SIGNAL_GROUP_EXIT)
> /*
> * The process is in the middle of dying already.
> */
> - return;
> + return cont;
>
> if (sig_kernel_stop(sig)) {
> /*
> @@ -635,11 +639,9 @@ static void handle_stop_signal(int sig, struct task_struct *p)
> * We were in fact stopped, and are now continued.
> * Notify the parent with CLD_CONTINUED.
> */
> + cont = 1;
> p->signal->flags = SIGNAL_STOP_CONTINUED;
> p->signal->group_exit_code = 0;
> - spin_unlock(&p->sighand->siglock);
> - do_notify_parent_cldstop(p, CLD_CONTINUED);
> - spin_lock(&p->sighand->siglock);
> } else {

Oh, I'd like to suggest something simpler, but I can't :(

BTW, I think we have the same problem when handle_stop_signal() does
do_notify_parent_cldstop(p, CLD_STOPPED) above. The multithreaded app
in the middle of the group stop can resume without actually seeing
SIGCONT, no?

Currently I have the very vagues idea, please see the "patch" below
(it is not right of course, just for illustration).

These unlock(->siglock)'s on the signal sending path are really nasty,
it would be wonderfull to avoid them. Note also sig_needs_tasklist(),
we take tasklist_lock() exactly because we will probably drop siglock.

What do you think about the approach at least?

Oleg.

--- include/linux/sched.h 2008-02-15 16:59:17.000000000 +0300
+++ include/linux/sched.h 2008-02-28 18:12:20.833069408 +0300
@@ -455,6 +455,10 @@ struct signal_struct {
int group_stop_count;
unsigned int flags; /* see SIGNAL_* flags below */

+ #define __CLD_STOPPED 1
+ #define __CLD_CONTINUED 2
+ unsigned int notify_parent; /* can live in fact in ->flags */
+
/* POSIX.1b Interval Timers */
struct list_head posix_timers;

--- kernel/signal.c 2008-02-15 16:59:17.000000000 +0300
+++ kernel/signal.c 2008-02-28 18:19:01.542067371 +0300
@@ -596,9 +596,7 @@ static void handle_stop_signal(int sig,
*/
p->signal->group_stop_count = 0;
p->signal->flags = SIGNAL_STOP_CONTINUED;
- spin_unlock(&p->sighand->siglock);
- do_notify_parent_cldstop(p, CLD_STOPPED);
- spin_lock(&p->sighand->siglock);
+ p->signal->notify_parent |= __CLD_STOPPED;
}
rm_from_queue(SIG_KERNEL_STOP_MASK, &p->signal->shared_pending);
t = p;
@@ -637,9 +635,7 @@ static void handle_stop_signal(int sig,
*/
p->signal->flags = SIGNAL_STOP_CONTINUED;
p->signal->group_exit_code = 0;
- spin_unlock(&p->sighand->siglock);
- do_notify_parent_cldstop(p, CLD_CONTINUED);
- spin_lock(&p->sighand->siglock);
+ p->signal->notify_parent |= __CLD_CONTINUED;
} else {
/*
* We are not stopped, but there could be a stop
@@ -1755,6 +1751,7 @@ int get_signal_to_deliver(siginfo_t *inf
struct pt_regs *regs, void *cookie)
{
sigset_t *mask = &current->blocked;
+ int notify_parent;
int signr = 0;

try_to_freeze();
@@ -1890,7 +1887,15 @@ relock:
do_group_exit(signr);
/* NOTREACHED */
}
+ notify_parent = current->signal->notify_parent;
+ current->signal->notify_parent = 0;
spin_unlock_irq(&current->sighand->siglock);
+
+ if (notify_parent & __CLD_STOPPED)
+ do_notify_parent_cldstop(p, CLD_STOPPED);
+ if (notify_parent & __CLD_CONTINUED)
+ do_notify_parent_cldstop(p, CLD_CONTINUED);
+
return signr;
}

2008-02-28 15:29:19

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

On 02/28, Jiri Kosina wrote:
>
> On Thu, 28 Feb 2008, Roland McGrath wrote:
>
> > [PATCH] clean up and fix SIGCONT
> > This reorganizes some of the signals code, replacing handle_stop_signal()
> > with prepare_signal() and finish_signal(), called as a pair before and
> > after generating any signal. The CLD_CONTINUED notification to parent is
> > moved into finish_signal(), taking place after the signal is made pending
> > and siglock dropped. This fixes a race where a process waking from SIGCONT
> > could resume application code without running its SIGCONT handler first.
>
> Hi Roland,
>
> I haven't tested what is the behavior with my patch, or without any
> patches at all (will do), but with your patch applied, when I run the test
> program under strace, it resumes immediately ... doesn't look particularly
> OK to me:
>
> $ strace -o /dev/null ./a.out | head -10
> finished (without SIGCONT)
> finished (without SIGCONT)
> finished (without SIGCONT)
> finished (without SIGCONT)
> finished (without SIGCONT)
> finished (without SIGCONT)
> finished (without SIGCONT)
> finished (without SIGCONT)
> finished (without SIGCONT)
> finished (without SIGCONT)

This is different, you can try this trivial program,

int main(int argc, char **argv)
{
kill(getpid(), SIGSTOP);
printf("I'm running, SIGSTOP doesn't really work with strace.\n");
return 0;
}

Oleg.

2008-02-28 15:32:49

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

On Thu, 28 Feb 2008, Oleg Nesterov wrote:

> This is different, you can try this trivial program,
>
> int main(int argc, char **argv)
> {
> kill(getpid(), SIGSTOP);
> printf("I'm running, SIGSTOP doesn't really work with strace.\n");
> return 0;
> }

Yes, I have meanwhile (to my surprise) found out that it is even
documented in the strace manpage ...

Strange thing, though.

--
Jiri Kosina

2008-02-28 15:40:58

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

On Thu, 28 Feb 2008, Oleg Nesterov wrote:

> BTW, I think we have the same problem when handle_stop_signal() does
> do_notify_parent_cldstop(p, CLD_STOPPED) above. The multithreaded app in
> the middle of the group stop can resume without actually seeing SIGCONT,
> no?

Yes, I think so, this would also need fixing.

> Currently I have the very vagues idea, please see the "patch" below (it
> is not right of course, just for illustration).
> These unlock(->siglock)'s on the signal sending path are really nasty,
> it would be wonderfull to avoid them. Note also sig_needs_tasklist(),
> we take tasklist_lock() exactly because we will probably drop siglock.

Whould it be problematic to change do_notify_parent_cldstop() so that it
doesn't acquire siglock, but asumes that is is called with siglock held
instead? I can't immediately see any reason why this shouldn't be
possible.

> What do you think about the approach at least?

So it basically does the same thing my patch did -- postpones signalling
the parent until it is safe, right?

Thanks,

--
Jiri Kosina

2008-02-28 16:05:14

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

On 02/28, Jiri Kosina wrote:
>
> On Thu, 28 Feb 2008, Oleg Nesterov wrote:
>
> > Currently I have the very vagues idea, please see the "patch" below (it
> > is not right of course, just for illustration).
> > These unlock(->siglock)'s on the signal sending path are really nasty,
> > it would be wonderfull to avoid them. Note also sig_needs_tasklist(),
> > we take tasklist_lock() exactly because we will probably drop siglock.
>
> Whould it be problematic to change do_notify_parent_cldstop() so that it
> doesn't acquire siglock, but asumes that is is called with siglock held
> instead? I can't immediately see any reason why this shouldn't be
> possible.

Please note that do_notify_parent_cldstop() needs _another_ ->siglock,
the parent's one. We drop target's one because we can't take both, this
is deadlockable.

> > What do you think about the approach at least?
>
> So it basically does the same thing my patch did -- postpones signalling
> the parent until it is safe, right?

Well, yes and no. Yes, it postpones signalling, but actually it removes
these do_notify_parent_cldstop()'s from the senfer path completely.
Instead, this task itself (the reciever) should notify the parent.

I am still not sure this doesn't have some fundamental problems, at least
I need to think more... Actually, I hope Roland will shed a light ;)

Hmm... Can't we make a simpler patch for the start? See below.
We can notify the parent earlier, before waking the TASK_STOPPED
child. This should fix this race, no?

Oleg.


--- kernel/signal.c 2008-02-15 16:59:17.000000000 +0300
+++ kernel/signal.c 2008-02-28 19:06:16.970253164 +0300
@@ -600,6 +600,25 @@ static void handle_stop_signal(int sig,
do_notify_parent_cldstop(p, CLD_STOPPED);
spin_lock(&p->sighand->siglock);
}
+ if (p->signal->flags & SIGNAL_STOP_STOPPED) {
+ /*
+ * We were in fact stopped, and are now continued.
+ * Notify the parent with CLD_CONTINUED.
+ */
+ p->signal->flags = SIGNAL_STOP_CONTINUED;
+ p->signal->group_exit_code = 0;
+ spin_unlock(&p->sighand->siglock);
+ do_notify_parent_cldstop(p, CLD_CONTINUED);
+ spin_lock(&p->sighand->siglock);
+ } else {
+ /*
+ * We are not stopped, but there could be a stop
+ * signal in the middle of being processed after
+ * being removed from the queue. Clear that too.
+ */
+ p->signal->flags = 0;
+ }
+
rm_from_queue(SIG_KERNEL_STOP_MASK, &p->signal->shared_pending);
t = p;
do {
@@ -629,25 +648,6 @@ static void handle_stop_signal(int sig,

t = next_thread(t);
} while (t != p);
-
- if (p->signal->flags & SIGNAL_STOP_STOPPED) {
- /*
- * We were in fact stopped, and are now continued.
- * Notify the parent with CLD_CONTINUED.
- */
- p->signal->flags = SIGNAL_STOP_CONTINUED;
- p->signal->group_exit_code = 0;
- spin_unlock(&p->sighand->siglock);
- do_notify_parent_cldstop(p, CLD_CONTINUED);
- spin_lock(&p->sighand->siglock);
- } else {
- /*
- * We are not stopped, but there could be a stop
- * signal in the middle of being processed after
- * being removed from the queue. Clear that too.
- */
- p->signal->flags = 0;
- }
} else if (sig == SIGKILL) {
/*
* Make sure that any pending stop signal already dequeued

2008-02-28 16:13:38

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

On Thu, 28 Feb 2008, Oleg Nesterov wrote:

> Hmm... Can't we make a simpler patch for the start? See below.
> We can notify the parent earlier, before waking the TASK_STOPPED
> child. This should fix this race, no?

Yes, this looks also like a correct fix to me (haven't tried to run the
actual test yet, but 'obviously' it should close the race).

Thanks,

--
Jiri Kosina
SUSE Labs

2008-02-28 16:42:08

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] handle_stop_signal: don't wake up the stopped task until it sees SIGCONT

On 02/28, Jiri Kosina wrote:
>
> On Thu, 28 Feb 2008, Oleg Nesterov wrote:
>
> > Hmm... Can't we make a simpler patch for the start? See below.
> > We can notify the parent earlier, before waking the TASK_STOPPED
> > child. This should fix this race, no?
>
> Yes, this looks also like a correct fix to me (haven't tried to run the
> actual test yet, but 'obviously' it should close the race).

Thanks, please ack this patch unless you have any objections.

I tested this patch with Jiri's test-case, seems to work.
The Roland's review is still needed ;)

-------------------------------------------------------------------------------
[PATCH] handle_stop_signal: don't wake up the stopped task until it sees SIGCONT

The bug was found by Jiri Kosina <[email protected]>, and the patch is based
on his ideas.

handle_stop_signal(SIGCONT) wakes up the stopped task and unlocks ->siglock
for do_notify_parent_cldstop(p, CLD_CONTINUED). The woken task returns from
do_signal_stop(), takes ->siglock and resumes to user-space without actually
seeing SIGCONT which may have a handler.

Move the code realated to do_notify_parent_cldstop(CLD_CONTINUED) up, before
"wake_up_state(t, state)".

NOTE: It is possible that the subsequent rm_from_queue(SIG_KERNEL_STOP_MASK)
removes SIGSTOP which comes after SIGCONT when we drop ->siglock. Not nice,
but possible even without this change. Hopefully we can remove the parent
notifying code from the sender path completely, but this needs more thinking.

Signed-off-by: Oleg Nesterov <[email protected]>

--- 25/kernel/signal.c~HSS_SIGCONT_RACE 2008-02-28 19:22:32.000000000 +0300
+++ 25/kernel/signal.c 2008-02-28 19:27:52.000000000 +0300
@@ -601,12 +601,31 @@ static void handle_stop_signal(int sig,
do_notify_parent_cldstop(p, CLD_STOPPED);
spin_lock(&p->sighand->siglock);
}
+
+ if (p->signal->flags & SIGNAL_STOP_STOPPED) {
+ /*
+ * We were in fact stopped, and are now continued.
+ * Notify the parent with CLD_CONTINUED.
+ */
+ p->signal->flags = SIGNAL_STOP_CONTINUED;
+ p->signal->group_exit_code = 0;
+ spin_unlock(&p->sighand->siglock);
+ do_notify_parent_cldstop(p, CLD_CONTINUED);
+ spin_lock(&p->sighand->siglock);
+ } else {
+ /*
+ * We are not stopped, but there could be a stop
+ * signal in the middle of being processed after
+ * being removed from the queue. Clear that too.
+ */
+ p->signal->flags = 0;
+ }
+
rm_from_queue(SIG_KERNEL_STOP_MASK, &p->signal->shared_pending);
t = p;
do {
unsigned int state;
rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
-
/*
* If there is a handler for SIGCONT, we must make
* sure that no thread returns to user mode before
@@ -630,25 +649,6 @@ static void handle_stop_signal(int sig,

t = next_thread(t);
} while (t != p);
-
- if (p->signal->flags & SIGNAL_STOP_STOPPED) {
- /*
- * We were in fact stopped, and are now continued.
- * Notify the parent with CLD_CONTINUED.
- */
- p->signal->flags = SIGNAL_STOP_CONTINUED;
- p->signal->group_exit_code = 0;
- spin_unlock(&p->sighand->siglock);
- do_notify_parent_cldstop(p, CLD_CONTINUED);
- spin_lock(&p->sighand->siglock);
- } else {
- /*
- * We are not stopped, but there could be a stop
- * signal in the middle of being processed after
- * being removed from the queue. Clear that too.
- */
- p->signal->flags = 0;
- }
} else if (sig == SIGKILL) {
/*
* Make sure that any pending stop signal already dequeued

2008-02-28 16:56:41

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH] handle_stop_signal: don't wake up the stopped task until it sees SIGCONT

On Thu, 28 Feb 2008, Oleg Nesterov wrote:

> -------------------------------------------------------------------------------
> [PATCH] handle_stop_signal: don't wake up the stopped task until it sees SIGCONT
>
> The bug was found by Jiri Kosina <[email protected]>, and the patch is based
> on his ideas.
> handle_stop_signal(SIGCONT) wakes up the stopped task and unlocks ->siglock
> for do_notify_parent_cldstop(p, CLD_CONTINUED). The woken task returns from
> do_signal_stop(), takes ->siglock and resumes to user-space without actually
> seeing SIGCONT which may have a handler.
> Move the code realated to do_notify_parent_cldstop(CLD_CONTINUED) up, before
> "wake_up_state(t, state)".
> NOTE: It is possible that the subsequent rm_from_queue(SIG_KERNEL_STOP_MASK)
> removes SIGSTOP which comes after SIGCONT when we drop ->siglock. Not nice,
> but possible even without this change. Hopefully we can remove the parent
> notifying code from the sender path completely, but this needs more thinking.
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Jiri Kosina <[email protected]>

--
Jiri Kosina
SUSE Labs

2008-02-29 02:39:49

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

> BTW, I think we have the same problem when handle_stop_signal() does
> do_notify_parent_cldstop(p, CLD_STOPPED) above. The multithreaded app
> in the middle of the group stop can resume without actually seeing
> SIGCONT, no?

I don't think I follow the scenario you have in mind there. When siglock
is dropped there, noone has been woken up yet. It's just as if the sending
of SIGCONT had not begun yet.

> Currently I have the very vagues idea, please see the "patch" below
> (it is not right of course, just for illustration).

Hmm. It's an interesting idea.

> These unlock(->siglock)'s on the signal sending path are really nasty,
> it would be wonderfull to avoid them.

I agree.

> Note also sig_needs_tasklist(), we take tasklist_lock() exactly because
> we will probably drop siglock.

do_notify_parent_cldstop will always need tasklist_lock because of its
parent link. With my patch below, the signal-posting path should never
drop and reacquire the siglock any more. So we could just make
do_notify_parent_cldstop take the lock (read_lock nests anyway) and the
signal-posting path callers don't need to take it.

> What do you think about the approach at least?

My first reaction was that it would have the wrong semantics. The special
effect of SIGCONT to resume the process happens at the time of signal
generation, not at signal delivery. The CLD_CONTINUED notification to the
parent has to happen when the resumption happens. If SIGCONT is blocked or
ignored, then the process is woken up but may never dequeue the signal.
However, in fact it will always already been in get_signal_to_deliver by
definition, since that's where stopping happens. So it will indeed always
come out and see your check before it resumes. The only difference then is
whether the SIGCHLD gets sent immediately at post time or only when the
resumed child actually gets scheduled. I don't think that's a problem.

Certainly it should just be a flag or two in signal_struct.flags. The
extra field is really too ugly. We need to think carefully about all the
places that clear/reset ->signal->flags, though. Note that doing two
do_notify_parent_cldstop calls in a two is redundant, since the signals
don't queue and the parent is not often going to respond so quickly that
the second one ever actually posts. I'd be inclined to just make that an
else if.

> Hmm... Can't we make a simpler patch for the start? See below.
> We can notify the parent earlier, before waking the TASK_STOPPED
> child. This should fix this race, no?

That is exactly what my first impulse was and what I described as opening
another can of worms. (I almost sent the identical patch yesterday.) Here
is what I was concerned about. This does the CLD_CONTINUED SIGCHLD before
any wakeups, so task_struct.state is still TASK_STOPPED. The parent can
see the SIGCHLD, call wait, and see the child still in TASK_STOPPED. In
wait_task_stopped, a few things can happen. It might by then have gotten
to the signal-sender's retaking of the siglock and wake_up_state; if so, it
sees p->state no longer TASK_STOPPED and/or p->exit_code no longer nonzero,
and returns 0. Then the parent's wait goes back to sleep (or reports
nothing for WNOHANG), and it never gets a wakeup corresponding to the
CLD_CONTINUED notification, which it expects in a call using WCONTINUED.
This is why the notification comes after the wakeup. It might be feasible
to get the right semantics with changes in wait. But like I said, a new
can of worms.

I don't dislike the direction of your flag-setting approach above. But
it does introduce more new subtleties that warrant more thought. Here is
another version of my patch, that also eliminates the lock-drop for the
CLD_STOPPED notification when a group stop is still in progress.
(Though I still haven't yet grokked how that case introduced any bug.)
What do you think?


Thanks,
Roland

---
[PATCH] clean up and fix SIGCONT

This reorganizes some of the signals code, replacing handle_stop_signal()
with prepare_signal() and finish_signal(), called as a pair before and
after generating any signal. The CLD_CONTINUED notification to parent is
moved into finish_signal(), taking place after the signal is made pending
and siglock dropped. This fixes a race where a process waking from SIGCONT
could resume application code without running its SIGCONT handler first.

Signed-off-by: Roland McGrath <[email protected]>
---
kernel/signal.c | 111 ++++++++++++++++++++++++++++++++++++++-----------------
1 files changed, 77 insertions(+), 34 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 84917fe..3a74955 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -555,16 +555,20 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why);
* time regardless of blocking, ignoring, or handling. This does the
* actual continuing for SIGCONT, but not the actual stopping for stop
* signals. The process stop is done as a signal action for SIG_DFL.
+ * Returns a (nonzero) CLD_* value if notification should be done.
+ * finish_signal() should always be called with our return value
+ * after posting the signal (or failing to do so).
*/
-static void handle_stop_signal(int sig, struct task_struct *p)
+static int prepare_signal(int sig, struct task_struct *p)
{
struct task_struct *t;
+ int notify = 0;

if (p->signal->flags & SIGNAL_GROUP_EXIT)
/*
* The process is in the middle of dying already.
*/
- return;
+ return notify;

if (sig_kernel_stop(sig)) {
/*
@@ -581,25 +585,6 @@ static void handle_stop_signal(int sig, struct task_struct *p)
* Remove all stop signals from all queues,
* and wake all threads.
*/
- if (unlikely(p->signal->group_stop_count > 0)) {
- /*
- * There was a group stop in progress. We'll
- * pretend it finished before we got here. We are
- * obliged to report it to the parent: if the
- * SIGSTOP happened "after" this SIGCONT, then it
- * would have cleared this pending SIGCONT. If it
- * happened "before" this SIGCONT, then the parent
- * got the SIGCHLD about the stop finishing before
- * the continue happened. We do the notification
- * now, and it's as if the stop had finished and
- * the SIGCHLD was pending on entry to this kill.
- */
- p->signal->group_stop_count = 0;
- p->signal->flags = SIGNAL_STOP_CONTINUED;
- spin_unlock(&p->sighand->siglock);
- do_notify_parent_cldstop(p, CLD_STOPPED);
- spin_lock(&p->sighand->siglock);
- }
rm_from_queue(SIG_KERNEL_STOP_MASK, &p->signal->shared_pending);
t = p;
do {
@@ -630,16 +615,30 @@ static void handle_stop_signal(int sig, struct task_struct *p)
t = next_thread(t);
} while (t != p);

- if (p->signal->flags & SIGNAL_STOP_STOPPED) {
+ if ((p->signal->flags & SIGNAL_STOP_STOPPED) ||
+ p->signal->group_stop_count > 0) {
/*
* We were in fact stopped, and are now continued.
* Notify the parent with CLD_CONTINUED.
+ *
+ * If we were in the middle of a group stop,
+ * we treat this as if we had already finished
+ * stopping. We are obliged to report that
+ * stop to the parent: if the SIGSTOP happened
+ * "after" this SIGCONT, then it would have
+ * cleared this pending SIGCONT. But since
+ * SIGCHLD does not queue, the CLD_STOPPED
+ * notification would have stayed and the new
+ * signal with CLD_CONTINUED would have been
+ * dropped. So make this notification be
+ * CLD_CONTINUED if the stop was complete, and
+ * CLD_STOPPED if it was in progress.
*/
+ notify = ((p->signal->flags & SIGNAL_STOP_STOPPED) ?
+ CLD_CONTINUED : CLD_STOPPED);
p->signal->flags = SIGNAL_STOP_CONTINUED;
+ p->signal->group_stop_count = 0;
p->signal->group_exit_code = 0;
- spin_unlock(&p->sighand->siglock);
- do_notify_parent_cldstop(p, CLD_CONTINUED);
- spin_lock(&p->sighand->siglock);
} else {
/*
* We are not stopped, but there could be a stop
@@ -655,6 +654,18 @@ static void handle_stop_signal(int sig, struct task_struct *p)
*/
p->signal->flags = 0;
}
+
+ return notify;
+}
+
+/*
+ * This pairs with prepare_signal() and must always be called afterward,
+ * with the siglock already unlocked, but tasklist_lock still held.
+ */
+static void finish_signal(int notify, struct task_struct *p)
+{
+ if (notify)
+ do_notify_parent_cldstop(p, notify);
}

static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
@@ -921,13 +932,17 @@ __group_complete_signal(int sig, struct task_struct *p)
return;
}

-int
-__group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
+/*
+ * Internal entry point for sending group-wide signals.
+ * This must always be followed by finish_signal(*notify, p).
+ */
+static int generate_group_signal(int sig, struct siginfo *info,
+ struct task_struct *p, int *notify)
{
int ret = 0;

assert_spin_locked(&p->sighand->siglock);
- handle_stop_signal(sig, p);
+ *notify = prepare_signal(sig, p);

/* Short-circuit ignored signals. */
if (sig_ignored(p, sig))
@@ -951,6 +966,22 @@ __group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
}

/*
+ * This is called by a few places outside this file.
+ */
+int
+__group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
+{
+ int notify;
+ int ret = generate_group_signal(sig, info, p, &notify);
+ if (unlikely(notify)) {
+ spin_unlock(&p->sighand->siglock);
+ do_notify_parent_cldstop(p, notify);
+ spin_lock(&p->sighand->siglock);
+ }
+ return ret;
+}
+
+/*
* Nuke all other threads in the group.
*/
void zap_other_threads(struct task_struct *p)
@@ -1009,8 +1040,10 @@ int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
if (!ret && sig) {
ret = -ESRCH;
if (lock_task_sighand(p, &flags)) {
- ret = __group_send_sig_info(sig, info, p);
+ int notify;
+ ret = generate_group_signal(sig, info, p, &notify);
unlock_task_sighand(p, &flags);
+ finish_signal(notify, p);
}
}

@@ -1102,10 +1135,12 @@ int kill_pid_info_as_uid(int sig, struct siginfo *info, struct pid *pid,
if (ret)
goto out_unlock;
if (sig && p->sighand) {
+ int notify;
unsigned long flags;
spin_lock_irqsave(&p->sighand->siglock, flags);
- ret = __group_send_sig_info(sig, info, p);
+ ret = generate_group_signal(sig, info, p, &notify);
spin_unlock_irqrestore(&p->sighand->siglock, flags);
+ finish_signal(notify, p);
}
out_unlock:
read_unlock(&tasklist_lock);
@@ -1351,13 +1386,14 @@ send_group_sigqueue(int sig, struct sigqueue *q, struct task_struct *p)
{
unsigned long flags;
int ret = 0;
+ int notify;

BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));

read_lock(&tasklist_lock);
/* Since it_lock is held, p->sighand cannot be NULL. */
spin_lock_irqsave(&p->sighand->siglock, flags);
- handle_stop_signal(sig, p);
+ notify = prepare_signal(sig, p);

/* Short-circuit ignored signals. */
if (sig_ignored(p, sig)) {
@@ -1392,6 +1428,7 @@ send_group_sigqueue(int sig, struct sigqueue *q, struct task_struct *p)
__group_complete_signal(sig, p);
out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);
+ finish_signal(notify, p);
read_unlock(&tasklist_lock);
return ret;
}
@@ -1415,6 +1452,7 @@ void do_notify_parent(struct task_struct *tsk, int sig)
struct siginfo info;
unsigned long flags;
struct sighand_struct *psig;
+ int notify = 0;

BUG_ON(sig == -1);

@@ -1485,9 +1523,10 @@ void do_notify_parent(struct task_struct *tsk, int sig)
sig = 0;
}
if (valid_signal(sig) && sig > 0)
- __group_send_sig_info(sig, &info, tsk->parent);
+ generate_group_signal(sig, &info, tsk->parent, &notify);
__wake_up_parent(tsk, tsk->parent);
spin_unlock_irqrestore(&psig->siglock, flags);
+ finish_signal(notify, tsk->parent);
}

static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
@@ -1496,6 +1535,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
unsigned long flags;
struct task_struct *parent;
struct sighand_struct *sighand;
+ int notify = 0;

if (tsk->ptrace & PT_PTRACED)
parent = tsk->parent;
@@ -1538,12 +1578,13 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
spin_lock_irqsave(&sighand->siglock, flags);
if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
!(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
- __group_send_sig_info(SIGCHLD, &info, parent);
+ generate_group_signal(SIGCHLD, &info, parent, &notify);
/*
* Even if SIGCHLD is not generated, we must wake up wait4 calls.
*/
__wake_up_parent(tsk, parent);
spin_unlock_irqrestore(&sighand->siglock, flags);
+ finish_signal(notify, parent);
}

static inline int may_ptrace_stop(void)
@@ -2251,10 +2292,12 @@ static int do_tkill(int tgid, int pid, int sig)
* probe. No signal is actually delivered.
*/
if (!error && sig && p->sighand) {
+ int notify;
spin_lock_irq(&p->sighand->siglock);
- handle_stop_signal(sig, p);
+ notify = prepare_signal(sig, p);
error = specific_send_sig_info(sig, &info, p);
spin_unlock_irq(&p->sighand->siglock);
+ finish_signal(notify, p);
}
}
read_unlock(&tasklist_lock);

2008-02-29 11:57:20

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

(Sorry Roland, re-sending with cc's)

On 02/28, Roland McGrath wrote:
>
> > BTW, I think we have the same problem when handle_stop_signal() does
> > do_notify_parent_cldstop(p, CLD_STOPPED) above. The multithreaded app
> > in the middle of the group stop can resume without actually seeing
> > SIGCONT, no?
>
> I don't think I follow the scenario you have in mind there. When siglock
> is dropped there, noone has been woken up yet. It's just as if the sending
> of SIGCONT had not begun yet.

Yes.

> > Note also sig_needs_tasklist(), we take tasklist_lock() exactly because
> > we will probably drop siglock.
>
> do_notify_parent_cldstop will always need tasklist_lock because of its
> parent link. With my patch below, the signal-posting path should never
> drop and reacquire the siglock any more. So we could just make
> do_notify_parent_cldstop take the lock (read_lock nests anyway) and the
> signal-posting path callers don't need to take it.

We have to take tasklist in advance to make sure that the target task
can't go away once we drop its ->siglock.

Otherwise, finish_signal() can do:

read_lock(tasklist);
if (p->signal)
do_notify_parent_cldstop(p, notify);
read_unlock(tasklist);

but the parent can miss the notification. Is it OK?

> > What do you think about the approach at least?
>
> My first reaction was that it would have the wrong semantics. The special
> effect of SIGCONT to resume the process happens at the time of signal
> generation, not at signal delivery. The CLD_CONTINUED notification to the
> parent has to happen when the resumption happens. If SIGCONT is blocked or
> ignored, then the process is woken up but may never dequeue the signal.
> However, in fact it will always already been in get_signal_to_deliver by
> definition, since that's where stopping happens. So it will indeed always
> come out and see your check before it resumes. The only difference then is
> whether the SIGCHLD gets sent immediately at post time or only when the
> resumed child actually gets scheduled. I don't think that's a problem.

Another difference is that the notification may come from another thread,
not from the signalled one. (this only matters if the task is ptraced).
Hopefully not a problem too?

> Certainly it should just be a flag or two in signal_struct.flags. The
> extra field is really too ugly. We need to think carefully about all the
> places that clear/reset ->signal->flags, though.

Yes, exactly.

> Note that doing two
> do_notify_parent_cldstop calls in a two is redundant, since the signals
> don't queue and the parent is not often going to respond so quickly that
> the second one ever actually posts. I'd be inclined to just make that an
> else if.

Yes, I also thought about that.

> > Hmm... Can't we make a simpler patch for the start? See below.
> > We can notify the parent earlier, before waking the TASK_STOPPED
> > child. This should fix this race, no?
>
> That is exactly what my first impulse was and what I described as opening
> another can of worms. (I almost sent the identical patch yesterday.) Here
> is what I was concerned about. This does the CLD_CONTINUED SIGCHLD before
> any wakeups, so task_struct.state is still TASK_STOPPED. The parent can
> see the SIGCHLD, call wait, and see the child still in TASK_STOPPED. In
> wait_task_stopped, a few things can happen. It might by then have gotten
> to the signal-sender's retaking of the siglock and wake_up_state; if so, it
> sees p->state no longer TASK_STOPPED and/or p->exit_code no longer nonzero,
> and returns 0. Then the parent's wait goes back to sleep (or reports
> nothing for WNOHANG), and it never gets a wakeup corresponding to the
> CLD_CONTINUED notification, which it expects in a call using WCONTINUED.
> This is why the notification comes after the wakeup.

Yes, you are right, thanks. But please note that do_wait() is very wrong
here, see http://marc.info/?l=linux-kernel&m=119713909207444 (the patch
needs more work, of course).


Btw, I tried to read the comment many times, but still I can't understand
why handle_stop_signal() reports CLD_STOPPED.

Let's look at your patch:

notify = ((p->signal->flags & SIGNAL_STOP_STOPPED) ?
CLD_CONTINUED : CLD_STOPPED);

Q: why can't we do

if (p->signal->flags & SIGNAL_STOP_STOPPED)
notify = CLD_CONTINUED;

instead?

IOW. Suppose that SIGCONT comes after SIGSTOP. Now, if SIGSTOP was not
dequeued yet, we report nothing. If it was dequeued by some thread which
initiates ->group_stop_count we report CLD_STOPPED. What is the difference?

> Here is
> another version of my patch, that also eliminates the lock-drop for the
> CLD_STOPPED notification when a group stop is still in progress.
> (Though I still haven't yet grokked how that case introduced any bug.)
> What do you think?

My only concern is that it complicates the code :(

However,

> I don't dislike the direction of your flag-setting approach above. But
> it does introduce more new subtleties that warrant more thought.

Yes sure.

> /*
> + * This is called by a few places outside this file.
> + */
> +int
> +__group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
> +{
> + int notify;
> + int ret = generate_group_signal(sig, info, p, &notify);
> + if (unlikely(notify)) {
> + spin_unlock(&p->sighand->siglock);
> + do_notify_parent_cldstop(p, notify);
> + spin_lock(&p->sighand->siglock);
> + }
> + return ret;
> +}

Perhaps it is better to export generate_group_signal() instead (not right now
of course). There is only one caller which does __group_send_sig_info(SIGCONT),
do_tty_hangup(). Any function which does unlock+lock is dangerous, the user
can miss the fact it doesn't hold the lock throughout.

Oleg.

2008-02-29 13:16:37

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

On 02/29, Oleg Nesterov wrote:
>
> On 02/28, Roland McGrath wrote:
> >
> > I don't dislike the direction of your flag-setting approach above. But
> > it does introduce more new subtleties that warrant more thought.
>
> Yes sure.

Still. The more I think about it, the more I like it. It is so simple,
and allows us to do further cleanups. Please look at the patch below.
Now we don't need tasklist to send the signal.

handle_stop_signal() should be cleanuped as you pointed out, but this
patch only does the requied minimum.

I am worried about ptrace, but hopefully there is no problem.

Oleg.

--- 25/include/linux/sched.h~SIG_NP 2008-02-25 19:13:05.000000000 +0300
+++ 25/include/linux/sched.h 2008-02-29 15:38:00.000000000 +0300
@@ -455,6 +455,8 @@ struct signal_struct {
int group_stop_count;
unsigned int flags; /* see SIGNAL_* flags below */

+ unsigned int notify_parent; /* will be moved into ->flags */
+
/* POSIX.1b Interval Timers */
struct list_head posix_timers;

--- 25/kernel/signal.c~SIG_NP 2008-02-29 15:28:28.000000000 +0300
+++ 25/kernel/signal.c 2008-02-29 15:54:54.000000000 +0300
@@ -597,9 +597,7 @@ static void handle_stop_signal(int sig,
*/
p->signal->group_stop_count = 0;
p->signal->flags = SIGNAL_STOP_CONTINUED;
- spin_unlock(&p->sighand->siglock);
- do_notify_parent_cldstop(p, CLD_STOPPED);
- spin_lock(&p->sighand->siglock);
+ p->signal->notify_parent = CLD_STOPPED;
}
rm_from_queue(SIG_KERNEL_STOP_MASK, &p->signal->shared_pending);
t = p;
@@ -638,9 +636,7 @@ static void handle_stop_signal(int sig,
*/
p->signal->flags = SIGNAL_STOP_CONTINUED;
p->signal->group_exit_code = 0;
- spin_unlock(&p->sighand->siglock);
- do_notify_parent_cldstop(p, CLD_CONTINUED);
- spin_lock(&p->sighand->siglock);
+ p->signal->notify_parent = CLD_CONTINUED;
} else {
/*
* We are not stopped, but there could be a stop
@@ -1748,6 +1744,21 @@ static int do_signal_stop(int signr)
return 1;
}

+static int handle_notify_parent(void)
+{
+ int why = current->signal->notify_parent;
+ if (likely(!why))
+ return 0;
+ current->signal->notify_parent = 0;
+ spin_unlock_irq(&current->sighand->siglock);
+
+ read_lock(&tasklist_lock);
+ do_notify_parent_cldstop(current, why);
+ read_unlock(&tasklist_lock);
+
+ return 1;
+}
+
int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
struct pt_regs *regs, void *cookie)
{
@@ -1761,6 +1772,9 @@ relock:
for (;;) {
struct k_sigaction *ka;

+ if (unlikely(handle_notify_parent()))
+ goto relock;
+
if (unlikely(current->signal->group_stop_count > 0) &&
do_signal_stop(0))
goto relock;

2008-03-01 02:00:19

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

> > I don't think I follow the scenario you have in mind there. When siglock
> > is dropped there, noone has been woken up yet. It's just as if the sending
> > of SIGCONT had not begun yet.
>
> Yes.

So you concur that there is no misbehavior from that lock-drop?
(I'm all for getting rid of it, just don't want to have a wrong
thought in my head about what the precise issues are.)

> We have to take tasklist in advance to make sure that the target task
> can't go away once we drop its ->siglock.

I see.

> Otherwise, finish_signal() can do:
>
> read_lock(tasklist);
> if (p->signal)
> do_notify_parent_cldstop(p, notify);
> read_unlock(tasklist);
>
> but the parent can miss the notification. Is it OK?

I certainly didn't have in mind permitting that. But if the only thing
that can happen is that if the parent called wait already to reap the
child, it never gets the SIGCHLD for it, then technically that is fine.
(In fact, we are not POSIX-conforming now because pending SIGCHLDs are
supposed to be cleared by a wait call made with SIGCHLD blocked when there
are no more unwaited-for children ready.) It seemed relevant to share that
nit, but I don't think we want to perturb the behavior here any time soon.

> Another difference is that the notification may come from another thread,
> not from the signalled one. (this only matters if the task is ptraced).
> Hopefully not a problem too?

The status quo is that the CLD_CONTINUED signal-sending call is made by an
unrelated task, the one sending the SIGCONT to the child, rather than with
child==current as other SIGCHLD's are. I don't think that should matter,
because nothing in the do_notify_parent_cldstop code path consults current
for anything. With your new plan, that path will be entered instead by
some thread in the group that was woken by generating SIGCONT. If it did
matter, that would presumably only be better.

I guess the issue you meant was that the task_struct argument to
do_notify_parent_cldstop is whichever thread woke up and took the siglock
first, rather than the recipient of the SIGCONT (usually the group leader).
Now I understand your reference to ptrace--it does tsk = tsk->group_leader
unless ptraced. AFAICT, what this affects is only the si_pid, si_uid,
si_[us]time values in the siginfo_t for the SIGCHLD with si_code =
CLD_CONTINUED. Not that ptracers really care one way or the other, but I
think the ptrace special case really only sensibly applies to CLD_STOPPED.
There aren't going to be CLD_CONTINUED notifications for each and every
thread anyway (and I don't think they would be all that useful to a ptracer).

So independent of all this, even as things stand now I would do:

if (why == CLD_STOPPED && (tsk->ptrace & PT_PTRACED))

to clear that up.

> Yes, you are right, thanks. But please note that do_wait() is very wrong
> here, see http://marc.info/?l=linux-kernel&m=119713909207444 (the patch
> needs more work, of course).

I haven't really reviewed that stuff properly (sorry about that, behind on
many things). Let's not tie up this can of worms with that one for now.

> Btw, I tried to read the comment many times, but still I can't understand
> why handle_stop_signal() reports CLD_STOPPED.
>
> Let's look at your patch:
>
> notify = ((p->signal->flags & SIGNAL_STOP_STOPPED) ?
> CLD_CONTINUED : CLD_STOPPED);
>
> Q: why can't we do
>
> if (p->signal->flags & SIGNAL_STOP_STOPPED)
> notify = CLD_CONTINUED;
>
> instead?
>
> IOW. Suppose that SIGCONT comes after SIGSTOP. Now, if SIGSTOP was not
> dequeued yet, we report nothing. If it was dequeued by some thread which
> initiates ->group_stop_count we report CLD_STOPPED. What is the difference?

If SIGSTOP was not dequeued yet, then it was pending and was cleared by
generating SIGCONT, so nothing happened: no stop, no continue, just a SIGCONT
is now pending. (There is also the example where it's SIGTSTP and it was
blocked, so there's a case that does not depend on a race happening.) If the
group stop has begun, then at least one thread has already begun stopping,
and experienced effects like a syscall returning -EINTR. Those effects
shouldn't be seen if "nothing happened". But they are expected if the
process stopped and then continued very quickly. If that's what happened,
then the CLD_STOPPED SIGCHLD was posted before the continue happened.

So that was my original rationale. But there are still races. (This can't
come up in the blocked-SIGTSTP case, where the behavior is reliably testable
from the application perspective.) An unblocked stop signal might have
caused a signal_wake_up on some thread that then lost the race for the
siglock. The SIGCONT-sender clears the pending SIGSTOP that was the reason
that thread's syscall got interrupted. Then that thread will see -EINTR and
such effects, but there will never be any CLD_STOPPED report to justify it.
If that same thread is not the one to immediately run the SIGCONT handler,
there is no other excuse for the side effects. Since that possibility still
exists (and is probably impossible to avoid), maybe I shouldn't worry about
the group-stop-in-progress case. Clearly this isn't something anyone else
ever really worries about but me.

Still, when group_stop_count > 0 that means that we know for sure that every
thread has TIF_SIGPENDING and so all those that were in syscalls will for
sure see -EINTR and such side effects. So I do lean towards having that
situation reported as "stopped and then continued" rather than "nothing
happened".

> Perhaps it is better to export generate_group_signal() instead (not right now
> of course). There is only one caller which does __group_send_sig_info(SIGCONT),
> do_tty_hangup(). Any function which does unlock+lock is dangerous, the user
> can miss the fact it doesn't hold the lock throughout.

I certainly think we should clean up all the callers to __group_send_sig_info
from outside signal.c, but I did not want to get into that cleanup in the
middle of this change. Once we finish hashing out any changes to the locking
requirements for calling into the signals code, we can clean those uses up.

> Still. The more I think about it, the more I like it. It is so simple,
> and allows us to do further cleanups. Please look at the patch below.
> Now we don't need tasklist to send the signal.

I like it too. My inclination is still to do the other cleanup first, and
also do enough other cleanups first so that we're confident of using a
signal_struct.flags bit for this right off.

> I am worried about ptrace, but hopefully there is no problem.

In practice no ptracer ever wanted a CLD_CONTINUED notification or cares
about them now. So even "breaking" it wouldn't get us in much trouble.
But I don't see any problem at all, especially if we change the
do_notify_parent_cldstop behavior to use tsk = tsk->group_leader as
I discussed above.

> @@ -1748,6 +1744,21 @@ static int do_signal_stop(int signr)
> return 1;
> }
>
> +static int handle_notify_parent(void)
> +{
> + int why = current->signal->notify_parent;
> + if (likely(!why))
> + return 0;
> + current->signal->notify_parent = 0;
> + spin_unlock_irq(&current->sighand->siglock);
> +
> + read_lock(&tasklist_lock);
> + do_notify_parent_cldstop(current, why);
> + read_unlock(&tasklist_lock);
> +
> + return 1;
> +}
> +
> int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
> struct pt_regs *regs, void *cookie)
> {
> @@ -1761,6 +1772,9 @@ relock:
> for (;;) {
> struct k_sigaction *ka;
>
> + if (unlikely(handle_notify_parent()))
> + goto relock;
> +
> if (unlikely(current->signal->group_stop_count > 0) &&
> do_signal_stop(0))
> goto relock;

This is only ever needed after we have just been in do_signal_stop and it
actually stopped (returned 1). That always gets to a goto relock. So this
could be outside the for loop. I'd just inline it rather than have another
function that conditionally unlocks, which sparse hates.

So:

spin_lock_irq(&current->sighand->siglock);

/*
* long and helpful comment
*/
if (unlikely(current->signal->notify_parent)) {
int why = current->signal->notify_parent;
current->signal->notify_parent = 0;
spin_unlock_irq(&current->sighand->siglock);
read_lock(&tasklist_lock);
do_notify_parent_cldstop(current, why);
read_unlock(&tasklist_lock);
goto relock;
}

for (;;) {


Thanks,
Roland

2008-03-01 16:37:28

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

On 02/29, Roland McGrath wrote:
>
> > > I don't think I follow the scenario you have in mind there. When siglock
> > > is dropped there, noone has been woken up yet. It's just as if the sending
> > > of SIGCONT had not begun yet.
> >
> > Yes.
>
> So you concur that there is no misbehavior from that lock-drop?
> (I'm all for getting rid of it, just don't want to have a wrong
> thought in my head about what the precise issues are.)

Let me first clarify what I meant when I said "I think we have the same problem".
Suppose that we have the "int sigcont_received" flag which is set by the SIGCONT
handler. Now, the main thread sends SIGSTOP to itself, and another thread does:

if (this_task_is_stopped("/proc/self/status") {
// If I'am here and running, I was woken by SIGCONT.
// Otherwise I must see the group-stop-in-progress
// "signal" after syscall
ASSERT(sigcont_received == 1);
}

This assertion abibe may fail. But of course, this example is completely
artificial, there is no real problem.

However, I think this lock-drop is really wrong by another reason. Suppose
that another sig_kernel_stop() signal comes when we drop ->siglock. The new
signal should dismiss the "pending" SIGCONT, but this doesn't happen.
When handle_stop_signal() re-takes ->siglock it removes the new signal.

Worse. Let's suppose that the new signal was already deueued by some thread
before handle_stop_signal() re-takes ->siglock. This thread initiates the
new group stop (note that we cleared ->group_stop_count). Now we are waking
up all TASK_STOPPED threads, but we don't clear ->group_stop_count. This
means that eventually the thread group will have SIGNAL_STOP_STOPPED set
but not all threads are stopped.

> > Otherwise, finish_signal() can do:
> >
> > read_lock(tasklist);
> > if (p->signal)
> > do_notify_parent_cldstop(p, notify);
> > read_unlock(tasklist);
> >
> > but the parent can miss the notification. Is it OK?
>
> I certainly didn't have in mind permitting that. But if the only thing
> that can happen is that if the parent called wait already to reap the
> child, it never gets the SIGCHLD for it, then technically that is fine.

Well yes, but what if CLONE_THREAD task autoreaps itself?

> > Another difference is that the notification may come from another thread,
> > not from the signalled one. (this only matters if the task is ptraced).
> > Hopefully not a problem too?
>
> I guess the issue you meant was that the task_struct argument to
> do_notify_parent_cldstop is whichever thread woke up and took the siglock
> first, rather than the recipient of the SIGCONT (usually the group leader).
> Now I understand your reference to ptrace--it does tsk = tsk->group_leader
> unless ptraced. AFAICT, what this affects is only the si_pid, si_uid,
> si_[us]time values in the siginfo_t for the SIGCHLD with si_code =
> CLD_CONTINUED. Not that ptracers really care one way or the other, but I
> think the ptrace special case really only sensibly applies to CLD_STOPPED.
> There aren't going to be CLD_CONTINUED notifications for each and every
> thread anyway (and I don't think they would be all that useful to a ptracer).

Thanks! I really hoped the answer will be like that.

> So independent of all this, even as things stand now I would do:
>
> if (why == CLD_STOPPED && (tsk->ptrace & PT_PTRACED))
>
> to clear that up.

This breaks ptrace_stop()->do_notify_parent_cldstop(CLD_TRAPPED) but I see
what you mean.

Anyway, I think we can clean up this separately. This CLD_STOPPED reported
from handle_stop_signal() is "group wide" anyway. Actually, I think that
get_signal_to_deliver() can just do

do_notify_parent_cldstop(current->group_leader, why);

, this even looks more sane to me.

> > IOW. Suppose that SIGCONT comes after SIGSTOP. Now, if SIGSTOP was not
> > dequeued yet, we report nothing. If it was dequeued by some thread which
> > initiates ->group_stop_count we report CLD_STOPPED. What is the difference?
>
> If SIGSTOP was not dequeued yet, then it was pending and was cleared by
> generating SIGCONT, so nothing happened: no stop, no continue, just a SIGCONT
> is now pending. (There is also the example where it's SIGTSTP and it was
> blocked, so there's a case that does not depend on a race happening.) If the
> group stop has begun, then at least one thread has already begun stopping,
> and experienced effects like a syscall returning -EINTR. Those effects
> shouldn't be seen if "nothing happened". But they are expected if the
> process stopped and then continued very quickly. If that's what happened,
> then the CLD_STOPPED SIGCHLD was posted before the continue happened.

OK, thanks. But let's look at this scenario with the different angle.
We pretend that the group stop was already finished, and report CLD_STOPPED.
But at the same time, this SIGCONT "cancels" this group stop, so what we have
is "stopped and then continued". Isn't it better to always report CLD_CONTINUED?
(yes, this looks as if the preceding CLD_STOPPED was somehow lost, but still).

To avoid a possible confusing, I am not really arguing, just curious. This
issue is minor anyway.

> > int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka,
> > struct pt_regs *regs, void *cookie)
> > {
> > @@ -1761,6 +1772,9 @@ relock:
> > for (;;) {
> > struct k_sigaction *ka;
> >
> > + if (unlikely(handle_notify_parent()))
> > + goto relock;
> > +
> > if (unlikely(current->signal->group_stop_count > 0) &&
> > do_signal_stop(0))
> > goto relock;
>
> This is only ever needed after we have just been in do_signal_stop and it
> actually stopped (returned 1). That always gets to a goto relock. So this
> could be outside the for loop. I'd just inline it rather than have another
> function that conditionally unlocks, which sparse hates.
>
> So:
>
> spin_lock_irq(&current->sighand->siglock);
>
> /*
> * long and helpful comment
> */
> if (unlikely(current->signal->notify_parent)) {
> int why = current->signal->notify_parent;
> current->signal->notify_parent = 0;
> spin_unlock_irq(&current->sighand->siglock);
> read_lock(&tasklist_lock);
> do_notify_parent_cldstop(current, why);
> read_unlock(&tasklist_lock);
> goto relock;
> }
>
> for (;;) {

Agreed, thanks!

Oleg.

2008-03-03 13:21:22

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

Actually, it is easy to re-use signal_struct->flags from the very beginning,
no need to introduce the new flag temporary, see the attached patches. I also
have a couple of simple cleanups on top of them (use cached ->signal value).

Perhaps, we can add the new helper which sets SIGNAL_GROUP_EXIT but preserves
SIGNAL_CLD_MASK, but I don't think this is really needed.

Oleg.


Attachments:
(No filename) (376.00 B)
1_SIGCONT_IMPL.patch (2.80 kB)
2_HSS_SIMPLIFY.patch (2.66 kB)
Download all attachments

2008-03-06 09:06:17

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

Oleg, I'm responding to all your messages mostly in chronological order. I
have seen the later ones with new code and ideas. But I like to follow up
each point so we continue to double-check each other's logic and rationale.

Here we're talking about the first drop of siglock in handle_stop_signal,
where it calls do_notify_parent_cldstop with CLD_STOPPED.

> Let me first clarify what I meant when I said "I think we have the same
> problem". Suppose that we have the "int sigcont_received" flag which is
> set by the SIGCONT handler. Now, the main thread sends SIGSTOP to itself,
> and another thread does:
>
> if (this_task_is_stopped("/proc/self/status") {
> // If I'am here and running, I was woken by SIGCONT.
> // Otherwise I must see the group-stop-in-progress
> // "signal" after syscall
> ASSERT(sigcont_received == 1);
> }
>
> This assertion abibe may fail. But of course, this example is completely
> artificial, there is no real problem.

Here you mean testing that the main thread got into TASK_STOPPED (and could
be so seen via /proc) while the group stop was still in progress but the
second thread running that read+assert has not stopped yet. Before
dropping the siglock, we abort the group stop in progress. So the second
thread never stops, and runs before the SIGCONT handler. What /proc
reports about individual threads is not something I'm trying to make
guarantees about. But I see there is a problem there because
wait_task_stopped checks group_stop_count and not SIGNAL_STOP_STOPPED,
so it would report the process as stopped to WUNTRACED wait.

> However, I think this lock-drop is really wrong by another reason. Suppose
> that another sig_kernel_stop() signal comes when we drop ->siglock. The new
> signal should dismiss the "pending" SIGCONT, but this doesn't happen.
> When handle_stop_signal() re-takes ->siglock it removes the new signal.

I don't think this is really a problem. This just means the SIGCONT is
effectively "after" the new signal, and the SIGCONT clears it rather than
vice versa.

> Worse. Let's suppose that the new signal was already deueued by some thread
> before handle_stop_signal() re-takes ->siglock. This thread initiates the
> new group stop (note that we cleared ->group_stop_count). Now we are waking
> up all TASK_STOPPED threads, but we don't clear ->group_stop_count. This
> means that eventually the thread group will have SIGNAL_STOP_STOPPED set
> but not all threads are stopped.

This is indeed a problem.

> > > Otherwise, finish_signal() can do:
> > >
> > > read_lock(tasklist);
> > > if (p->signal)
> > > do_notify_parent_cldstop(p, notify);
> > > read_unlock(tasklist);
> > >
> > > but the parent can miss the notification. Is it OK?
> >
> > I certainly didn't have in mind permitting that. But if the only thing
> > that can happen is that if the parent called wait already to reap the
> > child, it never gets the SIGCHLD for it, then technically that is fine.
>
> Well yes, but what if CLONE_THREAD task autoreaps itself?

Indeed then p->signal will be null and so we'll send no signal.
So no go. (No loss, we didn't like this road much anyway.)

> > So independent of all this, even as things stand now I would do:
> >
> > if (why == CLD_STOPPED && (tsk->ptrace & PT_PTRACED))
> >
> > to clear that up.
>
> This breaks ptrace_stop()->do_notify_parent_cldstop(CLD_TRAPPED) but I see
> what you mean.

Sure, so why != CLD_CONTINUED then.

> Anyway, I think we can clean up this separately. This CLD_STOPPED reported
> from handle_stop_signal() is "group wide" anyway. Actually, I think that
> get_signal_to_deliver() can just do
>
> do_notify_parent_cldstop(current->group_leader, why);
>
> , this even looks more sane to me.

Quite so.

> OK, thanks. But let's look at this scenario with the different angle.
> We pretend that the group stop was already finished, and report CLD_STOPPED.
> But at the same time, this SIGCONT "cancels" this group stop, so what we have
> is "stopped and then continued". Isn't it better to always report CLD_CONTINUED?
> (yes, this looks as if the preceding CLD_STOPPED was somehow lost, but still).

I think "stopped and then continued" means do exactly what would happen if
you really finished stopping and then continued without this race hitting.
If you did that and the parent had SIGCHLD blocked the whole time, then
waits two minutes, and then unblocks or sigwaits to get the siginfo_t, the
parent would see only CLD_STOPPED. (This is because SIGCHLD doesn't queue
and the first siginfo_t pending wins.) The other non-racy scenario that
could be is that the stop signal never got delivered at all before the
SIGCONT cleared it, when there would be no SIGCHLDs at all. As I said
before, I exclude that version of events from being what we represent to
userland as having happened because the syscall-interruption effects on some
threads may already have happened, and give the lie to this story. There is
only one story consistent with some sequence of events as specified in
POSIX, that explains all the possible userland-visible ramifications of this
particular internal race scenario. Seeing a CLD_CONTINUED without a
preceding CLD_STOPPED is a new sequence for userland that was never possible
before. In an application following POSIX there is no scenario possible
under the specification in which it could observe that sequence of events.
(Except that last one is probably not really true because I suspect it's
unspecified or implementation-defined whether when a non-queued signal is
posted twice, the first or second siginfo_t survives.) That is my pedantic
rationale. (Are you sorry you were curious? ;-)


Thanks,
Roland

2008-03-06 10:51:36

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] [RFC] fix missed SIGCONT cases

> Actually, it is easy to re-use signal_struct->flags from the very beginning,
> no need to introduce the new flag temporary, see the attached patches. I also
> have a couple of simple cleanups on top of them (use cached ->signal value).

Looks like a good start.

> Perhaps, we can add the new helper which sets SIGNAL_GROUP_EXIT but preserves
> SIGNAL_CLD_MASK, but I don't think this is really needed.

Actually, I think it is. This made think of another pedantic concern.
(Not that I'm sure we get this right now anyway.)

Say a child was previously stopped, parent had seen CLD_STOPPED and done
WSTOPPED wait. Now we send SIGCONT followed shortly by SIGTERM (unhandled
fatal signal). In the interval before the SIGTERM has yet been dequeued
(nor SIGKILLs sent to other threads by __group_complete_signal), the parent
does waitpid(pid, WSTOPPED|WCONTINUED|WEXITED|WNOHANG). It should see one
of: WIFSTOPPED (still); WIFSIGNALED (dead); WIFCONTINUED. Instead, waitpid
will return 0 (it's running, not stopped, not continued). This says
SIGNAL_STOP_CONTINUED ought to function as normal (i.e. WCONTINUED polling)
until we're actually ready for wait to report the process as dead.

We already break this pedantic nit because flags = SIGNAL_GROUP_EXIT clears
SIGNAL_STOP_CONTINUED. But if we clean up all flags usage, I think we
should make it preserve SIGNAL_STOP_CONTINUED.

I'll admit it's thin since a SIGKILL not yet dequeued or in the middle of
delivery brings the task to running so WNOHANG wait won't report it as
still stopped, won't report it for WCONTINUED at all, and won't report it
as dead yet. But if we were to make wait key on SIGNAL_STOP_STOPPED, as is
probably necessary to do WSTOPPED waits for processes with a zombie
group_leader, then this too could be made consistent. (Leave the
SIGNAL_STOP_STOPPED bit set throughout, so WSTOPPED|WNOHANG waits continues
to report it as stopped until it actually dies.)

> @@ -1761,6 +1757,19 @@ int get_signal_to_deliver(siginfo_t *inf
>
> relock:
> spin_lock_irq(&current->sighand->siglock);
> +
> + if (unlikely(current->signal->flags & SIGNAL_CLD_MASK)) {
> + int why = (current->signal->flags & SIGNAL_STOP_CONTINUED)

This should have a comment before the block. Explain that it is finishing
the report set up by handle_stop_signal while we were in TASK_STOPPED in a
previous iteration here inside get_signal_to_deliver.


Thanks,
Roland