2006-02-15 17:55:55

by Oleg Nesterov

Subject: [PATCH 2/2] fix kill_proc_info() vs fork() theoretical race

copy_process:

attach_pid(p, PIDTYPE_PID, p->pid);
attach_pid(p, PIDTYPE_TGID, p->tgid);

What if kill_proc_info(p->pid) happens in between?

copy_process() holds current->sighand->siglock, so we are safe
in the CLONE_THREAD case, because current->sighand == p->sighand.

Otherwise, p->sighand is unlocked and the new process is already
visible to find_task_by_pid(), but still has a copy of the parent's
'struct pid' in ->pids[PIDTYPE_TGID].

This means that __group_complete_signal() may hang while doing

do ... while (next_thread() != p)

We can solve this problem if we reverse these 2 attach_pid()s:

attach_pid() does wmb()

group_send_sig_info() calls spin_lock(), which
provides a read barrier. // Yes ?

I don't think we can hit this race in practice, but still.

Signed-off-by: Oleg Nesterov <[email protected]>

--- 2.6.16-rc3/kernel/fork.c~2_HANG 2006-02-15 23:21:51.000000000 +0300
+++ 2.6.16-rc3/kernel/fork.c 2006-02-16 00:03:20.000000000 +0300
@@ -1173,8 +1173,6 @@ static task_t *copy_process(unsigned lon
 	if (unlikely(p->ptrace & PT_PTRACED))
 		__ptrace_link(p, current->parent);
 
-	attach_pid(p, PIDTYPE_PID, p->pid);
-	attach_pid(p, PIDTYPE_TGID, p->tgid);
 	if (thread_group_leader(p)) {
 		p->signal->tty = current->signal->tty;
 		p->signal->pgrp = process_group(current);
@@ -1184,6 +1182,8 @@ static task_t *copy_process(unsigned lon
 		if (p->pid)
 			__get_cpu_var(process_counts)++;
 	}
+	attach_pid(p, PIDTYPE_TGID, p->tgid);
+	attach_pid(p, PIDTYPE_PID, p->pid);
 
 	nr_threads++;
 	total_forks++;
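
The hang described in the changelog can be illustrated with a small
userspace sketch. This is not kernel code: struct task, next_in_group,
walk_group() and the iteration cap are hypothetical stand-ins, and the
real next_thread() takes a task_t. The sketch only assumes what the
changelog states, namely that the walk terminates when it gets back to p,
and that before the TGID attach the child holds a copy of the parent's
link rather than a link of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for task_t: only the thread-group ring link. */
struct task {
	struct task *next_in_group;	/* models the PIDTYPE_TGID linkage */
};

/* Models next_thread(): follow the thread-group ring one step. */
static struct task *next_thread(const struct task *t)
{
	return t->next_in_group;
}

/*
 * Models the "do ... while (next_thread() != p)" walk. The cap exists
 * only so that this demo terminates; the kernel loop has no such cap.
 * Returns the number of steps taken, or -1 if the walk did not get
 * back to p within max_steps.
 */
static int walk_group(struct task *p, int max_steps)
{
	struct task *t = p;
	int steps = 0;

	assert(p != NULL && max_steps > 0);
	do {
		t = next_thread(t);
		if (++steps > max_steps)
			return -1;
	} while (t != p);
	return steps;
}

/* The child carries a copy of the parent's link: p is not on the ring. */
static int walk_with_stale_link(void)
{
	struct task parent, sibling, child;

	parent.next_in_group = &sibling;
	sibling.next_in_group = &parent;
	child.next_in_group = parent.next_in_group;	/* copied, not attached */
	return walk_group(&child, 1000);
}

/* After the TGID attach, a new group leader is alone on its own ring. */
static int walk_after_attach(void)
{
	struct task child;

	child.next_in_group = &child;
	return walk_group(&child, 1000);
}
```

In this model walk_with_stale_link() gives up at the cap, because the walk
circles the parent's ring and never meets the child, while
walk_after_attach() gets back to p after a single step. Attaching
PIDTYPE_TGID before PIDTYPE_PID is meant to ensure that a task found by
pid is always in the second state.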


2006-02-16 19:25:48

by Paul E. McKenney

Subject: Re: [PATCH 2/2] fix kill_proc_info() vs fork() theoretical race

On Wed, Feb 15, 2006 at 10:13:26PM +0300, Oleg Nesterov wrote:
> copy_process:
>
> attach_pid(p, PIDTYPE_PID, p->pid);
> attach_pid(p, PIDTYPE_TGID, p->tgid);
>
> What if kill_proc_info(p->pid) happens in between?

Doesn't your patch 1/2 that expanded the scope of siglock in
copy_process() prevent this from happening?

o A new process is being created on CPU 0, and does the first
attach_pid() in copy_process(), but has not yet done
the second attach_pid().

o Meanwhile, on CPU 1, kill_proc_info() successfully looks up the
new process via find_task_by_pid().

o Also on CPU 1, kill_proc_info() calls group_send_sig_info(),
which checks permissions, locates the sighand structure,
then attempts to acquire siglock.

Given your patch 1/2, CPU 1 cannot proceed until CPU 0 gets
done with the remaining attach_pid() calls.

So, what am I missing this time? ;-)

Thanx, Paul

> copy_process() holds current->sighand->siglock, so we are safe
> in the CLONE_THREAD case, because current->sighand == p->sighand.
>
> Otherwise, p->sighand is unlocked and the new process is already
> visible to find_task_by_pid(), but still has a copy of the parent's
> 'struct pid' in ->pids[PIDTYPE_TGID].
>
> This means that __group_complete_signal() may hang while doing
>
> do ... while (next_thread() != p)
>
> We can solve this problem if we reverse these 2 attach_pid()s:
>
> attach_pid() does wmb()
>
> group_send_sig_info() calls spin_lock(), which
> provides a read barrier. // Yes ?
>
> I don't think we can hit this race in practice, but still.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- 2.6.16-rc3/kernel/fork.c~2_HANG 2006-02-15 23:21:51.000000000 +0300
> +++ 2.6.16-rc3/kernel/fork.c 2006-02-16 00:03:20.000000000 +0300
> @@ -1173,8 +1173,6 @@ static task_t *copy_process(unsigned lon
>  	if (unlikely(p->ptrace & PT_PTRACED))
>  		__ptrace_link(p, current->parent);
>  
> -	attach_pid(p, PIDTYPE_PID, p->pid);
> -	attach_pid(p, PIDTYPE_TGID, p->tgid);
>  	if (thread_group_leader(p)) {
>  		p->signal->tty = current->signal->tty;
>  		p->signal->pgrp = process_group(current);
> @@ -1184,6 +1182,8 @@ static task_t *copy_process(unsigned lon
>  		if (p->pid)
>  			__get_cpu_var(process_counts)++;
>  	}
> +	attach_pid(p, PIDTYPE_TGID, p->tgid);
> +	attach_pid(p, PIDTYPE_PID, p->pid);
>  
>  	nr_threads++;
>  	total_forks++;
>

2006-02-16 19:38:31

by Oleg Nesterov

Subject: Re: [PATCH 2/2] fix kill_proc_info() vs fork() theoretical race

"Paul E. McKenney" wrote:
>
> On Wed, Feb 15, 2006 at 10:13:26PM +0300, Oleg Nesterov wrote:
> > copy_process:
> >
> > attach_pid(p, PIDTYPE_PID, p->pid);
> > attach_pid(p, PIDTYPE_TGID, p->tgid);
> >
> > What if kill_proc_info(p->pid) happens in between?
>
> Doesn't your patch 1/2 that expanded the scope of siglock in
> copy_process() prevent this from happening?

I think not. Please see below.

> o A new process is being created on CPU 0, and does the first
> attach_pid() in copy_process(), but has not yet done
> the second attach_pid().
>
> o Meanwhile, on CPU 1, kill_proc_info() successfully looks up the
> new process via find_task_by_pid().
>
> o Also on CPU 1, kill_proc_info() calls group_send_sig_info(),
> which checks permissions, locates the sighand structure,
> then attempts to acquire siglock.

... and takes it. Without CLONE_THREAD (more precisely, CLONE_SIGHAND)
we have different ->sighand for parent (current) and for the new child.

copy_process() holds the parent's ->sighand lock, while
group_send_sig_info() takes the child's.

Oleg.
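
The lock mismatch can be shown with a toy model. This is only a sketch
under stated assumptions: struct sighand and struct task here are
hypothetical stand-ins, try_lock()/unlock() are a C11 atomic_flag toy
rather than the kernel spinlock API, and sender_gets_child_lock() is an
invented name. The one thing taken from the thread is that the fork path
holds the parent's siglock while the signal sender takes whatever
->sighand the child points at.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a toy lock and a sighand that owns one. */
struct sighand {
	atomic_flag siglock;
};

struct task {
	struct sighand *sighand;
};

/* Toy trylock: returns true if the lock was free and is now held. */
static bool try_lock(struct sighand *s)
{
	return !atomic_flag_test_and_set_explicit(&s->siglock,
						  memory_order_acquire);
}

static void unlock(struct sighand *s)
{
	atomic_flag_clear_explicit(&s->siglock, memory_order_release);
}

/*
 * The fork path holds the parent's siglock. Returns true if a signal
 * sender could still take the child's siglock at that moment.
 * share_sighand models CLONE_SIGHAND: parent and child use one sighand.
 */
static bool sender_gets_child_lock(bool share_sighand)
{
	struct sighand parent_sh, child_sh;
	struct task parent, child;
	bool held, got;

	atomic_flag_clear(&parent_sh.siglock);
	atomic_flag_clear(&child_sh.siglock);
	parent.sighand = &parent_sh;
	child.sighand = share_sighand ? &parent_sh : &child_sh;

	held = try_lock(parent.sighand);	/* fork path: current's lock */
	assert(held);

	got = try_lock(child.sighand);		/* sender: the child's lock */
	if (got)
		unlock(child.sighand);

	unlock(parent.sighand);
	return got;
}
```

With share_sighand false the sender's try_lock() succeeds even though the
parent's lock is held, which is the window discussed above. With
share_sighand true it fails, matching the CLONE_THREAD case in which the
sender has to wait for copy_process() to finish.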

2006-02-16 19:53:14

by Paul E. McKenney

Subject: Re: [PATCH 2/2] fix kill_proc_info() vs fork() theoretical race

On Thu, Feb 16, 2006 at 11:56:12PM +0300, Oleg Nesterov wrote:
> "Paul E. McKenney" wrote:
> >
> > On Wed, Feb 15, 2006 at 10:13:26PM +0300, Oleg Nesterov wrote:
> > > copy_process:
> > >
> > > attach_pid(p, PIDTYPE_PID, p->pid);
> > > attach_pid(p, PIDTYPE_TGID, p->tgid);
> > >
> > > What if kill_proc_info(p->pid) happens in between?
> >
> > Doesn't your patch 1/2 that expanded the scope of siglock in
> > copy_process() prevent this from happening?
>
> I think not. Please see below.
>
> > o A new process is being created on CPU 0, and does the first
> > attach_pid() in copy_process(), but has not yet done
> > the second attach_pid().
> >
> > o Meanwhile, on CPU 1, kill_proc_info() successfully looks up the
> > new process via find_task_by_pid().
> >
> > o Also on CPU 1, kill_proc_info() calls group_send_sig_info(),
> > which checks permissions, locates the sighand structure,
> > then attempts to acquire siglock.
>
> ... and takes it. Without CLONE_THREAD (more precisely, CLONE_SIGHAND)
> we have different ->sighand for parent (current) and for the new child.
>
> copy_process() holds the parent's ->sighand lock, while
> group_send_sig_info() takes the child's.

Good point!!!

The other thing to think through is tkill on a thread/process while it
is being created. I believe that this is OK, since thread-specific
kill must target a specific thread, so does not do the traversal.

Does this match your understanding?

Thanx, Paul

2006-02-16 20:02:25

by Oleg Nesterov

Subject: Re: [PATCH 2/2] fix kill_proc_info() vs fork() theoretical race

"Paul E. McKenney" wrote:
>
> The other thing to think through is tkill on a thread/process while it
> is being created. I believe that this is OK, since thread-specific
> kill must target a specific thread, so does not do the traversal.
>

Also, tkill was not converted to use rcu_read_lock yet, it still
takes tasklist_lock, so I think it is safe.

Oleg.

2006-02-18 02:06:31

by Paul E. McKenney

Subject: Re: [PATCH 2/2] fix kill_proc_info() vs fork() theoretical race

On Fri, Feb 17, 2006 at 12:20:08AM +0300, Oleg Nesterov wrote:
> "Paul E. McKenney" wrote:
> >
> > The other thing to think through is tkill on a thread/process while it
> > is being created. I believe that this is OK, since thread-specific
> > kill must target a specific thread, so does not do the traversal.
>
> Also, tkill was not converted to use rcu_read_lock yet, it still
> takes tasklist_lock, so I think it is safe.

I suspect that tkill will eventually need to avoid tasklist_lock... ;-)

Thanx, Paul

2006-02-18 17:01:48

by Oleg Nesterov

Subject: Re: [PATCH 2/2] fix kill_proc_info() vs fork() theoretical race

"Paul E. McKenney" wrote:
>
> On Fri, Feb 17, 2006 at 12:20:08AM +0300, Oleg Nesterov wrote:
> > "Paul E. McKenney" wrote:
> > >
> > > The other thing to think through is tkill on a thread/process while it
> > > is being created. I believe that this is OK, since thread-specific
> > > kill must target a specific thread, so does not do the traversal.
> >
> > Also, tkill was not converted to use rcu_read_lock yet, it still
> > takes tasklist_lock, so I think it is safe.
>
> I suspect that tkill will eventually need to avoid tasklist_lock... ;-)

Ok, I am sending a couple of preparation patches for this.

Paul, I didn't believe you when you started this work. Now I think
we can avoid tasklist_lock AND clean up the code in many places. I am
glad I was wrong.

Btw,
>
> firing off some steamroller tests on it.

Could you point me to these tests?

Oleg.

2006-02-20 17:49:16

by Paul E. McKenney

Subject: Re: [PATCH 2/2] fix kill_proc_info() vs fork() theoretical race

On Sat, Feb 18, 2006 at 09:19:36PM +0300, Oleg Nesterov wrote:
> "Paul E. McKenney" wrote:
> >
> > On Fri, Feb 17, 2006 at 12:20:08AM +0300, Oleg Nesterov wrote:
> > > "Paul E. McKenney" wrote:
> > > >
> > > > The other thing to think through is tkill on a thread/process while it
> > > > is being created. I believe that this is OK, since thread-specific
> > > > kill must target a specific thread, so does not do the traversal.
> > >
> > > Also, tkill was not converted to use rcu_read_lock yet, it still
> > > takes tasklist_lock, so I think it is safe.
> >
> > I suspect that tkill will eventually need to avoid tasklist_lock... ;-)
>
> Ok, I am sending a couple of preparation patches for this.
>
> Paul, I didn't believe you when you started this work. Now I think
> we can avoid tasklist_lock AND clean up the code in many places. I am
> glad I was wrong.

And I am very glad that you are working on this -- you have found some
approaches that are much better than those I would have come up with!

> Btw,
> >
> > firing off some steamroller tests on it.
>
> Could you point me to these tests?

http://www.rdrop.com/users/paulmck/projects/steamroller/

Contributions of additional tests are very welcome!

Thanx, Paul