2008-06-08 17:20:34

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 1/3] sched: fix TASK_WAKEKILL vs SIGKILL race

schedule() has the special "TASK_INTERRUPTIBLE && signal_pending()" case,
this allows us to do

current->state = TASK_INTERRUPTIBLE;
schedule();

without fear to sleep with pending signal.

However, the code like

current->state = TASK_KILLABLE;
schedule();

is not right, schedule() doesn't take TASK_WAKEKILL into account. This means
that mutex_lock_killable(), wait_for_completion_killable(), down_killable(),
schedule_timeout_killable() can miss SIGKILL (and btw the second SIGKILL has
no effect).

Introduce the new helper, signal_pending_state(), and change schedule() to
use it. Hopefully it will have more users, that is why the task's state is
passed separately.

Note this "__TASK_STOPPED | __TASK_TRACED" check in signal_pending_state().
This is needed to preserve the current behaviour (ptrace_notify). I hope
this check will be removed soon, but this (afaics good) change needs the
separate discussion.

The fast path is "(state & (INTERRUPTIBLE | WAKEKILL)) + signal_pending(p)",
basically the same that schedule() does now. However, this patch of course
bloats schedule().

Signed-off-by: Oleg Nesterov <[email protected]>

include/linux/sched.h | 13 +++++++++++++
kernel/sched.c | 6 ++----
2 files changed, 15 insertions(+), 4 deletions(-)

--- 26-rc2/include/linux/sched.h~1_SP_STATE 2008-06-01 16:44:39.000000000 +0400
+++ 26-rc2/include/linux/sched.h 2008-06-01 16:44:39.000000000 +0400
@@ -2027,6 +2027,19 @@ static inline int fatal_signal_pending(s
return signal_pending(p) && __fatal_signal_pending(p);
}

+static inline int signal_pending_state(long state, struct task_struct *p)
+{
+ if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
+ return 0;
+ if (!signal_pending(p))
+ return 0;
+
+ if (state & (__TASK_STOPPED | __TASK_TRACED))
+ return 0;
+
+ return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
+}
+
static inline int need_resched(void)
{
return unlikely(test_tsk_need_resched(current));
--- 26-rc2/kernel/sched.c~1_SP_STATE 2008-06-08 16:15:25.000000000 +0400
+++ 26-rc2/kernel/sched.c 2008-06-08 16:52:23.000000000 +0400
@@ -4510,12 +4510,10 @@ need_resched_nonpreemptible:
clear_tsk_need_resched(prev);

if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
- if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
- signal_pending(prev))) {
+ if (unlikely(signal_pending_state(prev->state, prev)))
prev->state = TASK_RUNNING;
- } else {
+ else
deactivate_task(rq, prev, 1);
- }
switch_count = &prev->nvcsw;
}


2008-06-27 20:52:33

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH 1/3] sched: fix TASK_WAKEKILL vs SIGKILL race

I was also concerned about perturbing the schedule() hot path with the
original patch. With this version making signal_pending_state() inline,
I think it's good.

> Note this "__TASK_STOPPED | __TASK_TRACED" check in signal_pending_state().
> This is needed to preserve the current behaviour (ptrace_notify). I hope
> this check will be removed soon, but this (afaics good) change needs the
> separate discussion.

Agreed. I think it might actually already be safe to drop it, but we can
get to that after this settles.


Thanks,
Roland

2008-06-28 12:36:42

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 1/3] sched: fix TASK_WAKEKILL vs SIGKILL race

On 06/09, Roland McGrath wrote:
>
^^^^^
Oops, something is wrong! I received this message today, 06/28.

> > Note this "__TASK_STOPPED | __TASK_TRACED" check in signal_pending_state().
> > This is needed to preserve the current behaviour (ptrace_notify). I hope
> > this check will be removed soon, but this (afaics good) change needs the
> > separate discussion.
>
> Agreed. I think it might actually already be safe to drop it, but we can
> get to that after this settles.

Great! I'll re-send the patch which drops it in a minute.

Could you also look at other patches?

[PATCH 2/3] ptrace: never sleep in TASK_TRACED if SIGKILL'ed
http://marc.info/?l=linux-kernel&m=121362601402886

[PATCH 3/3] ptrace: kill may_ptrace_stop()
http://marc.info/?l=linux-kernel&m=121362601702897

I think ptrace_stop() really needs changes. But I don't know what is
the supposed behaviour of PT_TRACE_EXIT. Should we sleep even if the
task was killed? Should we sleep if the thread was killed because another
one does exit_group() or exec() ?

We can use "SIGNAL_GROUP_EXIT && (group_exit_code & 0x7f)" instead
of signal_group_exit() to be sure that task was killed by the fatal
signal.

Oleg.