copy_process:

	attach_pid(p, PIDTYPE_PID, p->pid);
	attach_pid(p, PIDTYPE_TGID, p->tgid);

What if kill_proc_info(p->pid) happens in between?

copy_process() holds current->sighand->siglock, so we are safe
in the CLONE_THREAD case, because current->sighand == p->sighand.

Otherwise, p->sighand is unlocked, the new process is already
visible to find_task_by_pid(), but has a copy of the parent's
'struct pid' in ->pids[PIDTYPE_TGID].

This means that __group_complete_signal() may hang while doing

	do ... while (next_thread() != p)

We can solve this problem if we reverse these 2 attach_pid()s:

	attach_pid() does wmb()

	group_send_sig_info() calls spin_lock(), which
	provides a read barrier. // Yes ?

I don't think we can hit this race in practice, but still.
Signed-off-by: Oleg Nesterov <[email protected]>
--- 2.6.16-rc3/kernel/fork.c~2_HANG 2006-02-15 23:21:51.000000000 +0300
+++ 2.6.16-rc3/kernel/fork.c 2006-02-16 00:03:20.000000000 +0300
@@ -1173,8 +1173,6 @@ static task_t *copy_process(unsigned lon
 	if (unlikely(p->ptrace & PT_PTRACED))
 		__ptrace_link(p, current->parent);
 
-	attach_pid(p, PIDTYPE_PID, p->pid);
-	attach_pid(p, PIDTYPE_TGID, p->tgid);
 	if (thread_group_leader(p)) {
 		p->signal->tty = current->signal->tty;
 		p->signal->pgrp = process_group(current);
@@ -1184,6 +1182,8 @@ static task_t *copy_process(unsigned lon
 		if (p->pid)
 			__get_cpu_var(process_counts)++;
 	}
+	attach_pid(p, PIDTYPE_TGID, p->tgid);
+	attach_pid(p, PIDTYPE_PID, p->pid);
 
 	nr_threads++;
 	total_forks++;
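
The window described in the changelog can be sketched in userspace. This is only a toy model, not kernel code: every toy_* name is hypothetical, the PIDTYPE_PID hash is reduced to a small array, and the PIDTYPE_TGID link is reduced to a single ring pointer. It assumes the child starts as a copy of the parent, so until the TGID attach happens its ring pointer still leads into the parent's group. The walk has the same shape as the loop in __group_complete_signal(), but it is bounded so that the sketch reports the hang instead of spinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_task {
	int pid;
	struct toy_task *next_in_group;	/* stands in for the PIDTYPE_TGID link */
};

#define TOY_PID_MAX 8
/* Stands in for the PIDTYPE_PID hash. */
static struct toy_task *toy_pid_table[TOY_PID_MAX];

/* Make the task visible to lookups by pid. */
static void toy_attach_pid(struct toy_task *t)
{
	toy_pid_table[t->pid] = t;
}

/* Give the task its own thread-group ring (a new group leader). */
static void toy_attach_tgid(struct toy_task *t)
{
	t->next_in_group = t;
}

static struct toy_task *toy_find_task_by_pid(int pid)
{
	return toy_pid_table[pid];
}

/* The child starts out as a copy of the parent, stale group link included. */
static void toy_dup_task(struct toy_task *child,
			 const struct toy_task *parent, int pid)
{
	*child = *parent;
	child->pid = pid;
}

/*
 * Same shape as `do ... while (next_thread() != p)`, but bounded so the
 * sketch returns false where an unbounded loop would never come back to p.
 */
static bool toy_group_walk_terminates(const struct toy_task *p, int max_steps)
{
	const struct toy_task *t = p;
	int steps = 0;

	do {
		if (++steps > max_steps)
			return false;
		t = t->next_in_group;
	} while (t != p);
	return true;
}
```

With the old order (toy_attach_pid() first), a lookup in the window returns a task whose walk never gets back to it. With the reversed order, a lookup either finds nothing or finds a task whose ring is already valid. The sketch is single-threaded, so it says nothing about the memory-barrier question raised above.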
On Wed, Feb 15, 2006 at 10:13:26PM +0300, Oleg Nesterov wrote:
> copy_process:
>
> attach_pid(p, PIDTYPE_PID, p->pid);
> attach_pid(p, PIDTYPE_TGID, p->tgid);
>
> What if kill_proc_info(p->pid) happens in between?
Doesn't your patch 1/2 that expanded the scope of siglock in
copy_process() prevent this from happening?
o A new process is being created on CPU 0, and does the first
attach_pid() in copy_process(), but has not yet done
the second attach_pid().
o Meanwhile, on CPU 1, kill_proc_info() successfully looks up the
new process via find_task_by_pid().
o Also on CPU 1, kill_proc_info() calls group_send_sig_info(),
which checks permissions, locates the sighand structure,
then attempts to acquire siglock.
Given your patch 1/2, CPU 1 cannot proceed until CPU 0 gets
done with the remaining attach_pid() calls.
So, what am I missing this time? ;-)
Thanx, Paul
> copy_process() holds current->sighand.siglock, so we are safe
> in CLONE_THREAD case, because current->sighand == p->sighand.
>
> Otherwise, p->sighand is unlocked, the new process is already
> visible to find_task_by_pid(), but has a copy of the parent's
> 'struct pid' in ->pids[PIDTYPE_TGID].
>
> This means that __group_complete_signal() may hang while doing
>
> do ... while (next_thread() != p)
>
> We can solve this problem if we reverse these 2 attach_pid()s:
>
> attach_pid() does wmb()
>
> group_send_sig_info() calls spin_lock(), which
> provides a read barrier. // Yes ?
>
> I don't think we can hit this race in practice, but still.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- 2.6.16-rc3/kernel/fork.c~2_HANG 2006-02-15 23:21:51.000000000 +0300
> +++ 2.6.16-rc3/kernel/fork.c 2006-02-16 00:03:20.000000000 +0300
> @@ -1173,8 +1173,6 @@ static task_t *copy_process(unsigned lon
> if (unlikely(p->ptrace & PT_PTRACED))
> __ptrace_link(p, current->parent);
>
> - attach_pid(p, PIDTYPE_PID, p->pid);
> - attach_pid(p, PIDTYPE_TGID, p->tgid);
> if (thread_group_leader(p)) {
> p->signal->tty = current->signal->tty;
> p->signal->pgrp = process_group(current);
> @@ -1184,6 +1182,8 @@ static task_t *copy_process(unsigned lon
> if (p->pid)
> __get_cpu_var(process_counts)++;
> }
> + attach_pid(p, PIDTYPE_TGID, p->tgid);
> + attach_pid(p, PIDTYPE_PID, p->pid);
>
> nr_threads++;
> total_forks++;
>
"Paul E. McKenney" wrote:
>
> On Wed, Feb 15, 2006 at 10:13:26PM +0300, Oleg Nesterov wrote:
> > copy_process:
> >
> > attach_pid(p, PIDTYPE_PID, p->pid);
> > attach_pid(p, PIDTYPE_TGID, p->tgid);
> >
> > What if kill_proc_info(p->pid) happens in between?
>
> Doesn't your patch 1/2 that expanded the scope of siglock in
> copy_process() prevent this from happening?
I think not. Please see below.
> o A new process is being created on CPU 0, and does the first
> attach_pid() in copy_process(), but has not yet done
> the second attach_pid().
>
> o Meanwhile, on CPU 1, kill_proc_info() successfully looks up the
> new process via find_task_by_pid().
>
> o Also on CPU 1, kill_proc_info() calls group_send_sig_info(),
> which checks permissions, locates the sighand structure,
> then attempts to acquire siglock.
... and takes it. Without CLONE_THREAD (more precisely, CLONE_SIGHAND)
we have different ->sighand for parent (current) and for the new child.

copy_process() holds the parent's ->sighand lock, while group_send_sig_info()
takes the child's.
Oleg.
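
The locking point above can be illustrated with a small userspace sketch. It is an assumption-laden toy, not kernel code: toy_sighand, toy_trylock() and the rest are hypothetical names, and siglock is modeled as a C11 atomic_flag rather than a real spinlock. The forking side takes the lock reached through current->sighand, then the sender tries the lock reached through p->sighand. The only thing that varies is whether the two pointers refer to the same structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model only: a one-flag lock standing in for sighand->siglock. */
struct toy_sighand {
	atomic_flag siglock;
};

struct toy_task {
	struct toy_sighand *sighand;
};

static void toy_sighand_init(struct toy_sighand *s)
{
	atomic_flag_clear(&s->siglock);
}

/* Returns true if the lock was free and is now held by the caller. */
static bool toy_trylock(struct toy_sighand *s)
{
	return !atomic_flag_test_and_set_explicit(&s->siglock,
						  memory_order_acquire);
}

static void toy_unlock(struct toy_sighand *s)
{
	atomic_flag_clear_explicit(&s->siglock, memory_order_release);
}

/*
 * The forking side holds the lock of current->sighand; the signal sender
 * then tries the lock of p->sighand. Returns true if the sender gets in.
 */
static bool toy_sender_gets_siglock(bool clone_sighand)
{
	struct toy_sighand parent_sighand, child_sighand;
	struct toy_task current_task, p;
	bool held, taken;

	toy_sighand_init(&parent_sighand);
	toy_sighand_init(&child_sighand);
	current_task.sighand = &parent_sighand;
	/* With CLONE_SIGHAND the child shares the parent's structure. */
	p.sighand = clone_sighand ? &parent_sighand : &child_sighand;

	held = toy_trylock(current_task.sighand);	/* copy_process() side */
	taken = toy_trylock(p.sighand);			/* sender side */

	if (taken)
		toy_unlock(p.sighand);
	if (held)
		toy_unlock(current_task.sighand);
	return taken;
}
```

In this model toy_sender_gets_siglock(true) is refused, which is the CLONE_SIGHAND case where the expanded siglock scope protects the child. toy_sender_gets_siglock(false) succeeds, which is the plain fork case where the sender is not held off at all.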
On Thu, Feb 16, 2006 at 11:56:12PM +0300, Oleg Nesterov wrote:
> "Paul E. McKenney" wrote:
> >
> > On Wed, Feb 15, 2006 at 10:13:26PM +0300, Oleg Nesterov wrote:
> > > copy_process:
> > >
> > > attach_pid(p, PIDTYPE_PID, p->pid);
> > > attach_pid(p, PIDTYPE_TGID, p->tgid);
> > >
> > > What if kill_proc_info(p->pid) happens in between?
> >
> > Doesn't your patch 1/2 that expanded the scope of siglock in
> > copy_process() prevent this from happening?
>
> I think not. Please see below.
>
> > o A new process is being created on CPU 0, and does the first
> > attach_pid() in copy_process(), but has not yet done
> > the second attach_pid().
> >
> > o Meanwhile, on CPU 1, kill_proc_info() successfully looks up the
> > new process via find_task_by_pid().
> >
> > o Also on CPU 1, kill_proc_info() calls group_send_sig_info(),
> > which checks permissions, locates the sighand structure,
> > then attempts to acquire siglock.
>
> ... and takes it. Without CLONE_THREAD (more precisely, CLONE_SIGHAND)
> we have different ->sighand for parent (current) and for the new child.
>
> copy_process() holds the parent's ->sighand lock, while group_send_sig_info()
> takes the child's.
Good point!!!
The other thing to think through is tkill on a thread/process while it
is being created. I believe that this is OK, since thread-specific
kill must target a specific thread, so does not do the traversal.
Does this match your understanding?
Thanx, Paul
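
The tkill argument can be sketched the same way. Again this is a toy with hypothetical names, not the kernel's delivery path: it only assumes that a thread-directed send marks the one target task and never follows the thread-group link, so a stale link left over from fork cannot make it loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_task {
	int pid;
	bool signal_pending;
	struct toy_task *next_in_group;	/* may still be stale during fork */
};

/*
 * Thread-directed delivery: marks only the target and never reads
 * next_in_group, so no group traversal takes place.
 */
static void toy_specific_send(struct toy_task *t)
{
	t->signal_pending = true;
}
```

Calling toy_specific_send() on a child whose next_in_group still points at the parent marks the child alone and returns, leaving the parent untouched.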
"Paul E. McKenney" wrote:
>
> The other thing to think through is tkill on a thread/process while it
> is being created. I believe that this is OK, since thread-specific
> kill must target a specific thread, so does not do the traversal.
>
Also, tkill was not converted to use rcu_read_lock yet, it still
takes tasklist_lock, so I think it is safe.
Oleg.
On Fri, Feb 17, 2006 at 12:20:08AM +0300, Oleg Nesterov wrote:
> "Paul E. McKenney" wrote:
> >
> > The other thing to think through is tkill on a thread/process while it
> > is being created. I believe that this is OK, since thread-specific
> > kill must target a specific thread, so does not do the traversal.
>
> Also, tkill was not converted to use rcu_read_lock yet, it still
> takes tasklist_lock, so I think it is safe.
I suspect that tkill will eventually need to avoid tasklist_lock... ;-)
Thanx, Paul
"Paul E. McKenney" wrote:
>
> On Fri, Feb 17, 2006 at 12:20:08AM +0300, Oleg Nesterov wrote:
> > "Paul E. McKenney" wrote:
> > >
> > > The other thing to think through is tkill on a thread/process while it
> > > is being created. I believe that this is OK, since thread-specific
> > > kill must target a specific thread, so does not do the traversal.
> >
> > Also, tkill was not converted to use rcu_read_lock yet, it still
> > takes tasklist_lock, so I think it is safe.
>
> I suspect that tkill will eventually need to avoid tasklist_lock... ;-)
Ok, I am sending a couple of preparation patches for this.
Paul, I didn't believe you when you started this work. Now I think
we can avoid tasklist_lock AND clean up the code in many places. I am glad
I was wrong.
Btw,
>
> firing off some steamroller tests on it.
Could you point me to these tests?
Oleg.
On Sat, Feb 18, 2006 at 09:19:36PM +0300, Oleg Nesterov wrote:
> "Paul E. McKenney" wrote:
> >
> > On Fri, Feb 17, 2006 at 12:20:08AM +0300, Oleg Nesterov wrote:
> > > "Paul E. McKenney" wrote:
> > > >
> > > > The other thing to think through is tkill on a thread/process while it
> > > > is being created. I believe that this is OK, since thread-specific
> > > > kill must target a specific thread, so does not do the traversal.
> > >
> > > Also, tkill was not converted to use rcu_read_lock yet, it still
> > > takes tasklist_lock, so I think it is safe.
> >
> > I suspect that tkill will eventually need to avoid tasklist_lock... ;-)
>
> Ok, I am sending a couple of preparation patches for this.
>
> Paul, I didn't believe you when you started this work. Now I think
> we can avoid tasklist_lock AND clean up the code in many places. I am glad
> I was wrong.
And I am very glad that you are working on this -- you have found some
approaches that are much better than those I would have come up with!
> Btw,
> >
> > firing off some steamroller tests on it.
>
> Could you point me to these tests?
http://www.rdrop.com/users/paulmck/projects/steamroller/
Contributions of additional tests very welcome!
Thanx, Paul