2006-02-15 17:55:50

by Oleg Nesterov

Subject: [PATCH 1/2] fix kill_proc_info() vs CLONE_THREAD race

There is a window after copy_process() unlocks ->sighand.siglock
and before it adds the new thread to the thread list.

In that window, __group_complete_signal(SIGKILL) will not see the
new thread yet, so this thread will start running even though the
whole thread group was supposed to exit.

I believe we have another good reason to place attach_pid(PID/TGID)
under ->sighand.siglock. We can do the same for

release_task()->__unhash_process()

de_thread()->switch_exec_pids()

After that, we don't need tasklist_lock to iterate over the thread
list, and we can simplify things; see, for example, do_sigaction()
or sys_times().

Signed-off-by: Oleg Nesterov <[email protected]>

--- 2.6.16-rc3/kernel/fork.c~1_KILL 2006-02-15 22:52:07.000000000 +0300
+++ 2.6.16-rc3/kernel/fork.c 2006-02-15 23:21:51.000000000 +0300
@@ -1123,8 +1123,8 @@ static task_t *copy_process(unsigned lon
p->real_parent = current;
p->parent = p->real_parent;

+ spin_lock(&current->sighand->siglock);
if (clone_flags & CLONE_THREAD) {
- spin_lock(&current->sighand->siglock);
/*
* Important: if an exit-all has been started then
* do not create this new thread - the whole thread
@@ -1162,8 +1162,6 @@ static task_t *copy_process(unsigned lon
*/
p->it_prof_expires = jiffies_to_cputime(1);
}
-
- spin_unlock(&current->sighand->siglock);
}

/*
@@ -1189,6 +1187,7 @@ static task_t *copy_process(unsigned lon

nr_threads++;
total_forks++;
+ spin_unlock(&current->sighand->siglock);
write_unlock_irq(&tasklist_lock);
proc_fork_connector(p);
return p;


2006-02-16 19:19:03

by Paul E. McKenney

Subject: Re: [PATCH 1/2] fix kill_proc_info() vs CLONE_THREAD race

On Wed, Feb 15, 2006 at 10:13:24PM +0300, Oleg Nesterov wrote:
> There is a window after copy_process() unlocks ->sighand.siglock
> and before it adds the new thread to the thread list.
>
> In that window, __group_complete_signal(SIGKILL) will not see the
> new thread yet, so this thread will start running even though the
> whole thread group was supposed to exit.

The fix looks good to me!

> I believe we have another good reason to place attach_pid(PID/TGID)
> under ->sighand.siglock. We can do the same for
>
> release_task()->__unhash_process()
>
> de_thread()->switch_exec_pids()
>
> After that, we don't need tasklist_lock to iterate over the thread
> list, and we can simplify things; see, for example, do_sigaction()
> or sys_times().

The above proposal would require that we hold siglock during the
traversal, correct? Is that reasonable for non-signal-related traversals?
Or were you thinking of making this change only for signal code?

Thanx, Paul

Acked-by: <[email protected]>

2006-02-16 19:58:49

by Oleg Nesterov

Subject: Re: [PATCH 1/2] fix kill_proc_info() vs CLONE_THREAD race

"Paul E. McKenney" wrote:
>
> > After that, we don't need tasklist_lock to iterate over the thread
> > list, and we can simplify things; see, for example, do_sigaction()
> > or sys_times().
>
> The above proposal would require that we hold siglock during the
> traversal, correct?

Yes, of course.

> Is that reasonable for non-signal-related traversals?
> Or were you thinking of making this change only for signal code?

Yes, I think it may be useful for non-signal-related traversals.

Currently we need tasklist_lock in order to use next_thread().
I believe we can migrate to rcu_read_lock() plus ->sighand->siglock
in most cases.

Well, next_thread() itself is already safe, but it can return
threads that have already been zapped.

Oleg.