2006-02-06 15:28:39

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] fix kill_proc_info() vs copy_process() race

kill_proc_info() takes taskllist_lock only for sig_kernel_stop()
case nowadays. I beleive it races with copy_process().

The first race is simple, copy_process:

/*
* Check for pending SIGKILL! The new thread should not be allowed
* to slip out of an OOM kill. (or normal SIGKILL.)
*/
if (sigismember(&current->pending.signal, SIGKILL))
return EINTR;

This relies on tasklist_lock and is racy now.


The second race is more tricky, copy_process:

attach_pid(p, PIDTYPE_PID, p->pid);
attach_pid(p, PIDTYPE_TGID, p->tgid);

This means that we can find a task in kill_proc_info()->find_task_by_pid()
which is not registered as part of thread group yet. Various bad things can
happen, note that handle_stop_signal(SIGCONT) and __group_complete_signal()
iterate over threads list. But p->pids[PIDTYPE_TGID] is a copy of current's
'struct pid' from dup_task_struct(), and if we don't have CLONE_THREAD here
we will use completely unreleated (parent's) thread list.

I think we can solve these problems by enlarging a ->siglock's scope in
copy_process(), but I don't know how to test this patch.

NOTE: release_task()->__unhash_process() path is safe, we already have
->sighand == NULL while detaching PIDTYPE_PID/PIDTYPE_TGID nonatomically.

Unless I missed something, I personally think this is a 2.6.16 material.

Signed-off-by: Oleg Nesterov <[email protected]>

--- RC-1/kernel/fork.c~ 2006-02-06 22:04:40.000000000 +0300
+++ RC-1/kernel/fork.c 2006-02-06 22:11:51.000000000 +0300
@@ -1050,7 +1050,7 @@ static task_t *copy_process(unsigned lon
sched_fork(p, clone_flags);

/* Need tasklist lock for parent etc handling! */
- write_lock_irq(&tasklist_lock);
+ write_lock(&tasklist_lock);

/*
* The task hasn't been attached yet, so its cpus_allowed mask will
@@ -1066,33 +1066,34 @@ static task_t *copy_process(unsigned lon
!cpu_online(task_cpu(p))))
set_task_cpu(p, smp_processor_id());

+ /* CLONE_PARENT re-uses the old parent */
+ if (clone_flags & (CLONE_PARENT|CLONE_THREAD))
+ p->real_parent = current->real_parent;
+ else
+ p->real_parent = current;
+ p->parent = p->real_parent;
+
+ spin_lock_irq(&current->sighand->siglock);
/*
* Check for pending SIGKILL! The new thread should not be allowed
* to slip out of an OOM kill. (or normal SIGKILL.)
*/
if (sigismember(&current->pending.signal, SIGKILL)) {
- write_unlock_irq(&tasklist_lock);
+ spin_unlock_irq(&current->sighand->siglock);
+ write_unlock(&tasklist_lock);
retval = -EINTR;
goto bad_fork_cleanup_namespace;
}

- /* CLONE_PARENT re-uses the old parent */
- if (clone_flags & (CLONE_PARENT|CLONE_THREAD))
- p->real_parent = current->real_parent;
- else
- p->real_parent = current;
- p->parent = p->real_parent;
-
if (clone_flags & CLONE_THREAD) {
- spin_lock(&current->sighand->siglock);
/*
* Important: if an exit-all has been started then
* do not create this new thread - the whole thread
* group is supposed to exit anyway.
*/
if (current->signal->flags & SIGNAL_GROUP_EXIT) {
- spin_unlock(&current->sighand->siglock);
- write_unlock_irq(&tasklist_lock);
+ spin_unlock_irq(&current->sighand->siglock);
+ write_unlock(&tasklist_lock);
retval = -EAGAIN;
goto bad_fork_cleanup_namespace;
}
@@ -1122,8 +1123,6 @@ static task_t *copy_process(unsigned lon
*/
p->it_prof_expires = jiffies_to_cputime(1);
}
-
- spin_unlock(&current->sighand->siglock);
}

/*
@@ -1151,7 +1150,9 @@ static task_t *copy_process(unsigned lon

nr_threads++;
total_forks++;
- write_unlock_irq(&tasklist_lock);
+ spin_unlock_irq(&current->sighand->siglock);
+ write_unlock(&tasklist_lock);
+
proc_fork_connector(p);
return p;


2006-02-06 15:53:22

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] fix kill_proc_info() vs copy_process() race

Oleg Nesterov wrote:
>
> This means that we can find a task in kill_proc_info()->find_task_by_pid()
> which is not registered as part of thread group yet. Various bad things can
> happen, note that handle_stop_signal(SIGCONT) and __group_complete_signal()
> iterate over threads list. But p->pids[PIDTYPE_TGID] is a copy of current's
> 'struct pid' from dup_task_struct(), and if we don't have CLONE_THREAD here
> we will use completely unreleated (parent's) thread list.
>
> I think we can solve these problems by enlarging a ->siglock's scope in
> copy_process(), but I don't know how to test this patch.
>
> NOTE: release_task()->__unhash_process() path is safe, we already have
> ->sighand == NULL while detaching PIDTYPE_PID/PIDTYPE_TGID nonatomically.

Sorry, I was wrong. Without CLONE_THREAD current->sighand.siglock can't help,
we need p->sighand.siglock, I beleive.

Am I correct that the bug exists at least?

Oleg.

2006-02-06 19:08:25

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] fix kill_proc_info() vs copy_process() race

Oleg Nesterov wrote:
>
> Sorry, I was wrong. Without CLONE_THREAD current->sighand.siglock can't help,
> we need p->sighand.siglock, I beleive.

Also, it is stupid to do write_lock(&tasklist_lock) without clearing irqs.

Ok, may be something like (untested, for review only) patch below ?

attach_pid() does wmb(), group_send_sig_info()->rcu_dereference() calls
rmb(), so we can just reverse PIDTYPE_PID/PIDTYPE_TGID attaching?

Note that now we check sigismember(->pending, SIGKILL) holding both
tasklist and ->sighand.siglock, this means we can kill SIGNAL_GROUP_EXIT
check under 'if (clone_flags & CLONE_THREAD)':

__group_complete_signal() and zap_other_threads() need at least
->sighand.siglock and they send SIGKILL without unlocking.

We can miss SIGNAL_GROUP_EXIT from do_coredump(), but it is possible
anyway. The new thread will be killed later, from zap_threads().

What do you think?

Oleg.

--- RC-1/kernel/fork.c~ 2006-02-07 01:41:14.000000000 +0300
+++ RC-1/kernel/fork.c 2006-02-07 02:13:10.000000000 +0300
@@ -1066,11 +1066,13 @@ static task_t *copy_process(unsigned lon
!cpu_online(task_cpu(p))))
set_task_cpu(p, smp_processor_id());

+ spin_lock(&current->sighand->siglock);
/*
* Check for pending SIGKILL! The new thread should not be allowed
* to slip out of an OOM kill. (or normal SIGKILL.)
*/
if (sigismember(&current->pending.signal, SIGKILL)) {
+ spin_unlock(&current->sighand->siglock);
write_unlock_irq(&tasklist_lock);
retval = -EINTR;
goto bad_fork_cleanup_namespace;
@@ -1084,18 +1086,6 @@ static task_t *copy_process(unsigned lon
p->parent = p->real_parent;

if (clone_flags & CLONE_THREAD) {
- spin_lock(&current->sighand->siglock);
- /*
- * Important: if an exit-all has been started then
- * do not create this new thread - the whole thread
- * group is supposed to exit anyway.
- */
- if (current->signal->flags & SIGNAL_GROUP_EXIT) {
- spin_unlock(&current->sighand->siglock);
- write_unlock_irq(&tasklist_lock);
- retval = -EAGAIN;
- goto bad_fork_cleanup_namespace;
- }
p->group_leader = current->group_leader;

if (current->signal->group_stop_count > 0) {
@@ -1122,8 +1112,6 @@ static task_t *copy_process(unsigned lon
*/
p->it_prof_expires = jiffies_to_cputime(1);
}
-
- spin_unlock(&current->sighand->siglock);
}

/*
@@ -1135,8 +1123,8 @@ static task_t *copy_process(unsigned lon
if (unlikely(p->ptrace & PT_PTRACED))
__ptrace_link(p, current->parent);

- attach_pid(p, PIDTYPE_PID, p->pid);
attach_pid(p, PIDTYPE_TGID, p->tgid);
+ attach_pid(p, PIDTYPE_PID, p->pid);
if (thread_group_leader(p)) {
p->signal->tty = current->signal->tty;
p->signal->pgrp = process_group(current);
@@ -1149,6 +1137,7 @@ static task_t *copy_process(unsigned lon

nr_threads++;
total_forks++;
+ spin_unlock(&current->sighand->siglock);
write_unlock_irq(&tasklist_lock);
proc_fork_connector(p);
return p;

2006-02-14 22:31:54

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] fix kill_proc_info() vs copy_process() race

On Mon, Feb 06, 2006 at 07:45:48PM +0300, Oleg Nesterov wrote:
> kill_proc_info() takes taskllist_lock only for sig_kernel_stop()
> case nowadays. I beleive it races with copy_process().

For SIGCONT as well as well as sig_kernel_stop(), but yes, tasklist_lock
is now acquired only sometimes. (And please accept my apologies for
the late reply -- trouble with my local email system that I could not
fix from 8,000 miles away...)

> The first race is simple, copy_process:
>
> /*
> * Check for pending SIGKILL! The new thread should not be allowed
> * to slip out of an OOM kill. (or normal SIGKILL.)
> */
> if (sigismember(&current->pending.signal, SIGKILL))
> return EINTR;
>
> This relies on tasklist_lock and is racy now.

Agreed, but is the race any worse than it was before? Since SIGKILL is
fatal, the bit can be set but never cleared. My belief, quite possibly
mistaken, is that this is a performance issue rather than a correctness
issue -- we would like to avoid the overhead of a fork() for a "walking
dead" process.

If my belief is incorrect, one fix would be to add SIGKILL to the checks
in kill_proc_info().

> The second race is more tricky, copy_process:
>
> attach_pid(p, PIDTYPE_PID, p->pid);
> attach_pid(p, PIDTYPE_TGID, p->tgid);
>
> This means that we can find a task in kill_proc_info()->find_task_by_pid()
> which is not registered as part of thread group yet. Various bad things can
> happen, note that handle_stop_signal(SIGCONT) and __group_complete_signal()
> iterate over threads list. But p->pids[PIDTYPE_TGID] is a copy of current's
> 'struct pid' from dup_task_struct(), and if we don't have CLONE_THREAD here
> we will use completely unreleated (parent's) thread list.

But handle_stop_signal() will not do anything except for sig_kernel_stop(),
SIGCONT, and SIGKILL. The first two will have tasklist_lock held just
as before. The latter (SIGKILL) does not iterate over anything, instead
it only sets p->signal->flags to zero.

On __group_complete_signal(), just to make sure I understand... For this
to happen, we would have to kill() a child process while it was being
fork()ed, but before the parent was aware what the child's pid was, right?
And the race you are noting is that we might find the embryo process just
as it is checking for thread_group_leader(p), but before p->signal->tty,
p->signal_pgrp, and p->signal->session are initialized?

> I think we can solve these problems by enlarging a ->siglock's scope in
> copy_process(), but I don't know how to test this patch.

I have not convinced myself that this is a bug, but it might well
be. The reason I am unconvinced is that if we have not done all the
attach_pid()s, the signal should not be able to find us. Yes, we do
have a copy of the parent's p->pids[PIDTYPE_TGID], but this process is
not linked into the lists -- the process can find the parent, but not
vice versa.

But I could easily be missing something, still a bit jetlagged. Could
you please lay out the exact sequence of events in the scenario that you
are thinking of?

And if there is a real problem, is it possible to fix it by changing
the order of the attach_pid() calls?

> NOTE: release_task()->__unhash_process() path is safe, we already have
> ->sighand == NULL while detaching PIDTYPE_PID/PIDTYPE_TGID nonatomically.

Agreed.

Thanx, Paul

> Unless I missed something, I personally think this is a 2.6.16 material.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- RC-1/kernel/fork.c~ 2006-02-06 22:04:40.000000000 +0300
> +++ RC-1/kernel/fork.c 2006-02-06 22:11:51.000000000 +0300
> @@ -1050,7 +1050,7 @@ static task_t *copy_process(unsigned lon
> sched_fork(p, clone_flags);
>
> /* Need tasklist lock for parent etc handling! */
> - write_lock_irq(&tasklist_lock);
> + write_lock(&tasklist_lock);
>
> /*
> * The task hasn't been attached yet, so its cpus_allowed mask will
> @@ -1066,33 +1066,34 @@ static task_t *copy_process(unsigned lon
> !cpu_online(task_cpu(p))))
> set_task_cpu(p, smp_processor_id());
>
> + /* CLONE_PARENT re-uses the old parent */
> + if (clone_flags & (CLONE_PARENT|CLONE_THREAD))
> + p->real_parent = current->real_parent;
> + else
> + p->real_parent = current;
> + p->parent = p->real_parent;
> +
> + spin_lock_irq(&current->sighand->siglock);
> /*
> * Check for pending SIGKILL! The new thread should not be allowed
> * to slip out of an OOM kill. (or normal SIGKILL.)
> */
> if (sigismember(&current->pending.signal, SIGKILL)) {
> - write_unlock_irq(&tasklist_lock);
> + spin_unlock_irq(&current->sighand->siglock);
> + write_unlock(&tasklist_lock);
> retval = -EINTR;
> goto bad_fork_cleanup_namespace;
> }
>
> - /* CLONE_PARENT re-uses the old parent */
> - if (clone_flags & (CLONE_PARENT|CLONE_THREAD))
> - p->real_parent = current->real_parent;
> - else
> - p->real_parent = current;
> - p->parent = p->real_parent;
> -
> if (clone_flags & CLONE_THREAD) {
> - spin_lock(&current->sighand->siglock);
> /*
> * Important: if an exit-all has been started then
> * do not create this new thread - the whole thread
> * group is supposed to exit anyway.
> */
> if (current->signal->flags & SIGNAL_GROUP_EXIT) {
> - spin_unlock(&current->sighand->siglock);
> - write_unlock_irq(&tasklist_lock);
> + spin_unlock_irq(&current->sighand->siglock);
> + write_unlock(&tasklist_lock);
> retval = -EAGAIN;
> goto bad_fork_cleanup_namespace;
> }
> @@ -1122,8 +1123,6 @@ static task_t *copy_process(unsigned lon
> */
> p->it_prof_expires = jiffies_to_cputime(1);
> }
> -
> - spin_unlock(&current->sighand->siglock);
> }
>
> /*
> @@ -1151,7 +1150,9 @@ static task_t *copy_process(unsigned lon
>
> nr_threads++;
> total_forks++;
> - write_unlock_irq(&tasklist_lock);
> + spin_unlock_irq(&current->sighand->siglock);
> + write_unlock(&tasklist_lock);
> +
> proc_fork_connector(p);
> return p;
>

2006-02-15 12:48:13

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] fix kill_proc_info() vs copy_process() race

Paul E. McKenney wrote:
>
> On Mon, Feb 06, 2006 at 07:45:48PM +0300, Oleg Nesterov wrote:
> > The first race is simple, copy_process:
> >
> > /*
> > * Check for pending SIGKILL! The new thread should not be allowed
> > * to slip out of an OOM kill. (or normal SIGKILL.)
> > */
> > if (sigismember(&current->pending.signal, SIGKILL))
> > return EINTR;
> >
> > This relies on tasklist_lock and is racy now.
>
> Agreed, but is the race any worse than it was before? Since SIGKILL is
> fatal, the bit can be set but never cleared. My belief, quite possibly
> mistaken, is that this is a performance issue rather than a correctness
> issue -- we would like to avoid the overhead of a fork() for a "walking
> dead" process.

My apologies, I was very unclear. I talked about this check because I tried
to unify it with 'if (SIGNAL_GROUP_EXIT)' below. Let me try again.

copy_process(CLONE_THREAD) __group_complete_signal(SIGKILL)

lock(->sighand);
if (->signal->flags & SIGNAL_GROUP_EXIT) // NO
...abort forking...
unlock(->sighand)

->signal->flags = SIGNAL_GROUP_EXIT;
// does not see the new thread yet
for_each_thread_in_thread_group(t) {
sigaddset(t->pending, SIGKILL);
signal_wake_up(t);
}

... finish clone ...


The new thread starts without TIF_SIGPENDING. When any of other threads calls
get_signal_to_deliver() it will notice SIGKILL and call do_group_exit(), which
does:

if (SIGNAL_GROUP_EXIT) // Yes, was set in group_complete_signal()
// don't call zap_other_threads()
do_exit();

So, thread group missed SIGKILL. The new thread runs with SIGNAL_GROUP_EXIT set
and has SIGKILL in ->shared_pending, so it can't be killed via sys_kill(SIGKILL),
and it can't be stopped.

This is not fatal, we can kill this thread via tkill(), even if it blocked other
signals, but still this is a bug (if I am right).

> > The second race is more tricky, copy_process:
> >
> > attach_pid(p, PIDTYPE_PID, p->pid);
> > attach_pid(p, PIDTYPE_TGID, p->tgid);
> >
> > This means that we can find a task in kill_proc_info()->find_task_by_pid()
> > which is not registered as part of thread group yet. Various bad things can
> > happen, note that handle_stop_signal(SIGCONT) and __group_complete_signal()
> > iterate over threads list. But p->pids[PIDTYPE_TGID] is a copy of current's
> > 'struct pid' from dup_task_struct(), and if we don't have CLONE_THREAD here
> > we will use completely unreleated (parent's) thread list.
>
> But I could easily be missing something, still a bit jetlagged. Could
> you please lay out the exact sequence of events in the scenario that you
> are thinking of?

Let's suppose that process with pid == 1000 does fork (no CLONE_THREAD bit),
and a bad boy does sys_kill(1001, SIGXXX)

copy_process:

// it is possible that p->pid == 1001
attach_pid(p, PIDTYPE_PID, p->pid);


kill_proc_info:

p = find_task_by_pid(1001); // Found!

__group_complete_signal(p):

// iterate over thread group
do {
...
} while (next_thread(t) != p)

The (one of) problem is that this loop never stops: next_thread() will iterate
over parent's threads list, because p have a copy of the parent's pids[PIDTYPE_TGID],
and p is not a member of this thread group. Unless I missed something, we have
an endless loop with interrupts disabled.

> And if there is a real problem, is it possible to fix it by changing
> the order of the attach_pid() calls?

I think yes, and I did exactly that in my next attempt to fix this problem.

Oleg.