2007-08-28 15:51:40

by Oleg Nesterov

[permalink] [raw]
Subject: PATCH? fix SIGNAL_STOP_DEQUEUED vs SIGCONT race

Two threads, T1 and T2, SIGTTIN is SIG_DFL, SIGTTOU has a handler.

T1 does get_signal_to_deliver(), dequeus SIGTTIN, unlocks ->siglock.

SIGCONT comes in, nothing to do yet, just clear SIGNAL_STOP_DEQUEUED.

SIGTTOU is sent, T2 dequeues it and sets SIGNAL_STOP_DEQUEUED again.
It has a handler, we shouldn't stop, but

T1 continues, takes ->siglock, and calls do_signal_stop().

Move the setting of SIGNAL_STOP_DEQUEUED from dequeue_signal() to
get_signal_to_deliver(), and set this flag only if we are really going to stop.
This also simplifies the code and makes the SIGNAL_STOP_DEQUEUED usage more
understandable.

However, this changes the behaviour when the task is ptraced. If the debugger
doesn't clear ->exit_code, SIGSTOP always succeeds after ptrace_stop(), even
if SIGCONT was sent in between. I can't decide whether this change is good
or bad, hopefully Roland can clarify.

Signed-off-by: Oleg Nesterov <[email protected]>

--- t/kernel/signal.c~S_G_S 2007-08-23 16:02:57.000000000 +0400
+++ t/kernel/signal.c 2007-08-28 19:15:28.000000000 +0400
@@ -409,22 +409,7 @@ int dequeue_signal(struct task_struct *t
}
if (likely(tsk == current))
recalc_sigpending();
- if (signr && unlikely(sig_kernel_stop(signr))) {
- /*
- * Set a marker that we have dequeued a stop signal. Our
- * caller might release the siglock and then the pending
- * stop signal it is about to process is no longer in the
- * pending bitmasks, but must still be cleared by a SIGCONT
- * (and overruled by a SIGKILL). So those cases clear this
- * shared flag after we've set it. Note that this flag may
- * remain set after the signal we return is ignored or
- * handled. That doesn't matter because its only purpose
- * is to alert stop-signal processing code when another
- * processor has come along and cleared the flag.
- */
- if (!(tsk->signal->flags & SIGNAL_GROUP_EXIT))
- tsk->signal->flags |= SIGNAL_STOP_DEQUEUED;
- }
+
if (signr && likely(tsk == current) &&
((info->si_code & __SI_MASK) == __SI_TIMER) &&
info->si_sys_private){
@@ -1683,9 +1668,6 @@ static int do_signal_stop(int signr)
struct signal_struct *sig = current->signal;
int stop_count;

- if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED))
- return 0;
-
if (sig->group_stop_count > 0) {
/*
* There is a group stop in progress. We don't need to
@@ -1849,6 +1831,9 @@ relock:
continue;

if (sig_kernel_stop(signr)) {
+ if (current->signal->flags & SIGNAL_GROUP_EXIT)
+ continue;
+
/*
* The default action is to stop all threads in
* the thread group. The job control signals
@@ -1860,14 +1845,20 @@ relock:
* We need to check for that and bail out if necessary.
*/
if (signr != SIGSTOP) {
+ /*
+ * Set a marker that we have dequeued a stop
+ * signal. We are going to unlock ->siglock,
+ * another signal can cancel group stop request
+ * during this window. In that case this marker
+ * will be cleared when we re-check below.
+ */
+ current->signal->flags |= SIGNAL_STOP_DEQUEUED;
spin_unlock_irq(&current->sighand->siglock);
-
- /* signals can be posted during this window */
-
if (is_current_pgrp_orphaned())
goto relock;
-
spin_lock_irq(&current->sighand->siglock);
+ if (!(current->signal->flags & SIGNAL_STOP_DEQUEUED))
+ continue;
}

if (likely(do_signal_stop(signr))) {


2007-09-01 08:27:25

by Roland McGrath

[permalink] [raw]
Subject: Re: PATCH? fix SIGNAL_STOP_DEQUEUED vs SIGCONT race

> Move the setting of SIGNAL_STOP_DEQUEUED from dequeue_signal() to
> get_signal_to_deliver(), and set this flag only if we are really going to stop.
> This also simplifies the code and makes the SIGNAL_STOP_DEQUEUED usage more
> understandable.

It looks like a nice cleanup to me.

> However, this changes the behaviour when the task is ptraced. If the debugger
> doesn't clear ->exit_code, SIGSTOP always succeeds after ptrace_stop(), even
> if SIGCONT was sent in between. I can't decide whether this change is good
> or bad, hopefully Roland can clarify.

Hmm. I think this is bad.

First, considering only the single-threaded case, there are debugger vs
SIGCONT races. Someone does kill(pid,SIGSTOP);kill(pid,SIGCONT); while pid
is debugged. The mandate for end user behavior here is that pid cannot
wind up sitting in job control stop in the end. Say the debugger is
e.g. strace, simply printing every signal and passing it through.
So say it goes:
T K D
merrily running ... blocked in wait4
kill(K, SIGSTOP)
dequeue SIGSTOP
-> ptrace_stop
wait4 -> K,{SIGSTOP}
kill(K, SIGCONT)
PTRACE_CONT,K,SIGSTOP
do_signal_stop(SIGSTOP)
wait4 -> K,{SIGSTOP}

The debugger did the obvious thing: just pass through what it got.
The killer did something POSIX guarantees leaves T running.
T is in job control stop, with a pending SIGCONT (normally impossible).
It's not the end of the world, since the next SIGKILL or SIGCONT will
always wake it up. But it's not right.

There are related issues with multi-threaded job control stop and with
suddenly killing the debugger. I could get into those in detail. But I
think the case above illustrates why we need a stop signal pending
consideration by the debugger to be like a stop signal pending locking for
the is_current_pgrp_orphaned() check when a SIGCONT/SIGKILL comes in between.

It's still probably a worthwhile cleanup to have the logic only in
get_signal_to_deliver, and to fix the problem you cited. It will take only
a little extra code to handle the ptrace case too, i.e.
if (sig_kernel_stop(signr) &&
current->sighand->action[signr-1] == SIG_DFL &&
!(current->signal->flags & SIGNAL_GROUP_EXIT))
current->signal->flags |= SIGNAL_STOP_DEQUEUED;
ptrace_stop(signr, signr, info);
if (sig_kernel_stop(signr) && current->exit_code == signr &&
!(current->signal->flags & SIGNAL_STOP_DEQUEUED) &&
current->sighand->action[signr-1] == SIG_DFL)
current->exit_code = 0;
maybe.


Thanks,
Roland

2007-09-01 11:41:21

by Oleg Nesterov

[permalink] [raw]
Subject: Re: PATCH? fix SIGNAL_STOP_DEQUEUED vs SIGCONT race

On 09/01, Roland McGrath wrote:
>
> > However, this changes the behaviour when the task is ptraced. If the debugger
> > doesn't clear ->exit_code, SIGSTOP always succeeds after ptrace_stop(), even
> > if SIGCONT was sent in between. I can't decide whether this change is good
> > or bad, hopefully Roland can clarify.
>
> Hmm. I think this is bad.
>
> First, considering only the single-threaded case, there are debugger vs
> SIGCONT races. Someone does kill(pid,SIGSTOP);kill(pid,SIGCONT); while pid
> is debugged. The mandate for end user behavior here is that pid cannot
> wind up sitting in job control stop in the end. Say the debugger is
> e.g. strace, simply printing every signal and passing it through.
> So say it goes:
> T K D
> merrily running ... blocked in wait4
> kill(K, SIGSTOP)
> dequeue SIGSTOP
> -> ptrace_stop
> wait4 -> K,{SIGSTOP}
> kill(K, SIGCONT)
> PTRACE_CONT,K,SIGSTOP
> do_signal_stop(SIGSTOP)
> wait4 -> K,{SIGSTOP}

Thanks Roland.

Yes, that is what I was worrying about.

> It's still probably a worthwhile cleanup to have the logic only in
> get_signal_to_deliver, and to fix the problem you cited. It will take only
> a little extra code to handle the ptrace case too, i.e.
> if (sig_kernel_stop(signr) &&
> current->sighand->action[signr-1] == SIG_DFL &&
> !(current->signal->flags & SIGNAL_GROUP_EXIT))
> current->signal->flags |= SIGNAL_STOP_DEQUEUED;
> ptrace_stop(signr, signr, info);
> if (sig_kernel_stop(signr) && current->exit_code == signr &&
> !(current->signal->flags & SIGNAL_STOP_DEQUEUED) &&
> current->sighand->action[signr-1] == SIG_DFL)
> current->exit_code = 0;

Yes, I also thought about something like this, but tried to avoid because
it adds some complications. OTOH, this is not the fast path.

I'll try to think a bit more about this, and update the patch according
to your comments. Looks like we don't need to check SIGNAL_GROUP_EXIT
here, we are doing this later anyway.

Thanks!

Oleg.

2007-09-04 02:25:01

by Roland McGrath

[permalink] [raw]
Subject: Re: PATCH? fix SIGNAL_STOP_DEQUEUED vs SIGCONT race

> Yes, I also thought about something like this, but tried to avoid because
> it adds some complications. OTOH, this is not the fast path.

I agree it's not real pleasing, and would be glad to find better ideas.

> I'll try to think a bit more about this, and update the patch according
> to your comments. Looks like we don't need to check SIGNAL_GROUP_EXIT
> here, we are doing this later anyway.

I look forward to your next version.


Thanks,
Roland