2008-01-29 16:37:57

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
but find_pid_ns()->hlist_for_each_entry_rcu() should be safe under tasklist.

Usually it is, detach_pid() is always called under write_lock(tasklist_lock),
but copy_process() calls free_pid() lockless.

"#ifdef CONFIG_PREEMPT_RCU" is added mostly as documentation, perhaps it is
too ugly and should be removed.

Signed-off-by: Oleg Nesterov <[email protected]>

--- MM/kernel/fork.c~PR_RCU 2008-01-27 17:09:47.000000000 +0300
+++ MM/kernel/fork.c 2008-01-29 19:23:44.000000000 +0300
@@ -1335,8 +1335,19 @@ static struct task_struct *copy_process(
return p;

bad_fork_free_pid:
- if (pid != &init_struct_pid)
+ if (pid != &init_struct_pid) {
+#ifdef CONFIG_PREEMPT_RCU
+ /*
+ * read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
+ * make sure find_pid() is safe under read_lock(tasklist).
+ */
+ write_lock_irq(&tasklist_lock);
+#endif
free_pid(pid);
+#ifdef CONFIG_PREEMPT_RCU
+ write_unlock_irq(&tasklist_lock);
+#endif
+ }
bad_fork_cleanup_namespaces:
exit_task_namespaces(p);
bad_fork_cleanup_keys:


2008-01-29 23:03:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

On Tue, 29 Jan 2008 19:40:19 +0300
Oleg Nesterov <[email protected]> wrote:

> With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply rcu_read_lock(),

I'm suspecting that we have other code which assumes that read_lock, write_lock
and spin_lock imply rcu_read_lock().

I wonder if there are any sane runtime checks we can put in there to find
such problems.

2008-01-29 23:09:24

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

On Tue, 29 Jan 2008 19:40:19 +0300
Oleg Nesterov <[email protected]> wrote:

> With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
> but find_pid_ns()->hlist_for_each_entry_rcu() should be safe under tasklist.
>
> Usually it is, detach_pid() is always called under write_lock(tasklist_lock),
> but copy_process() calls free_pid() lockless.
>
> "#ifdef CONFIG_PREEMPT_RCU" is added mostly as documentation, perhaps it is
> too ugly and should be removed.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- MM/kernel/fork.c~PR_RCU 2008-01-27 17:09:47.000000000 +0300
> +++ MM/kernel/fork.c 2008-01-29 19:23:44.000000000 +0300
> @@ -1335,8 +1335,19 @@ static struct task_struct *copy_process(
> return p;
>
> bad_fork_free_pid:
> - if (pid != &init_struct_pid)
> + if (pid != &init_struct_pid) {
> +#ifdef CONFIG_PREEMPT_RCU
> + /*
> + * read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
> + * make sure find_pid() is safe under read_lock(tasklist).
> + */
> + write_lock_irq(&tasklist_lock);
> +#endif
> free_pid(pid);
> +#ifdef CONFIG_PREEMPT_RCU
> + write_unlock_irq(&tasklist_lock);
> +#endif
> + }
> bad_fork_cleanup_namespaces:
> exit_task_namespaces(p);
> bad_fork_cleanup_keys:

My attempt to understand this change timed out.

kernel/pid.c is full of global but undocumented functions. What are the
locking requirements for free_pid()? free_pid_ns()? If it's just
caller-must-hold-rcu_read_lock() then why not use rcu_read_lock() here?

If the locking is "caller must hold write_lock_irq(tasklist_lock) then the
sole relevant comment in there (in free_pid()) is wrong.

Guys, more maintainable code please?

2008-01-30 02:23:47

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

Andrew Morton <[email protected]> writes:

> On Tue, 29 Jan 2008 19:40:19 +0300
> Oleg Nesterov <[email protected]> wrote:
>
>> With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply
> rcu_read_lock(),
>> but find_pid_ns()->hlist_for_each_entry_rcu() should be safe under tasklist.
>>
>> Usually it is, detach_pid() is always called under write_lock(tasklist_lock),
>> but copy_process() calls free_pid() lockless.
>>
>> "#ifdef CONFIG_PREEMPT_RCU" is added mostly as documentation, perhaps it is
>> too ugly and should be removed.
>>
>> Signed-off-by: Oleg Nesterov <[email protected]>
>>
>> --- MM/kernel/fork.c~PR_RCU 2008-01-27 17:09:47.000000000 +0300
>> +++ MM/kernel/fork.c 2008-01-29 19:23:44.000000000 +0300
>> @@ -1335,8 +1335,19 @@ static struct task_struct *copy_process(
>> return p;
>>
>> bad_fork_free_pid:
>> - if (pid != &init_struct_pid)
>> + if (pid != &init_struct_pid) {
>> +#ifdef CONFIG_PREEMPT_RCU
>> + /*
>> + * read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
>> + * make sure find_pid() is safe under read_lock(tasklist).
>> + */
>> + write_lock_irq(&tasklist_lock);
>> +#endif
>> free_pid(pid);
>> +#ifdef CONFIG_PREEMPT_RCU
>> + write_unlock_irq(&tasklist_lock);
>> +#endif
>> + }
>> bad_fork_cleanup_namespaces:
>> exit_task_namespaces(p);
>> bad_fork_cleanup_keys:
>
> My attempt to understand this change timed out.
>
> kernel/pid.c is full of global but undocumented functions. What are the
> locking requirements for free_pid()? free_pid_ns()? If it's just
> caller-must-hold-rcu_read_lock() then why not use rcu_read_lock() here?
>
> If the locking is "caller must hold write_lock_irq(tasklist_lock) then the
> sole relevant comment in there (in free_pid()) is wrong.
>
> Guys, more maintainable code please?

Well I took a quick look.

Yeah this looks complex.
Mutation of the hash table is protected by pidmap_lock.
But attachments of tasks to hash entries is protected task_lock.

And it looks like it has been that way since commit 92476d7fc0326a409ab1d3864a04093a6be9aca7

I thought free_pid did not have any requirements that a lock be held when
it was called, taking all of the needed locks.

Now how read_lock doesn't imply rcu_read_lock is another question.

Anyway I have to run. I will see about looking at this in a bit.

Eric

2008-01-30 03:29:51

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

Oleg Nesterov <[email protected]> writes:

> With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
> but find_pid_ns()->hlist_for_each_entry_rcu() should be safe under tasklist.
>
> Usually it is, detach_pid() is always called under write_lock(tasklist_lock),
> but copy_process() calls free_pid() lockless.
>
> "#ifdef CONFIG_PREEMPT_RCU" is added mostly as documentation, perhaps it is
> too ugly and should be removed.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- MM/kernel/fork.c~PR_RCU 2008-01-27 17:09:47.000000000 +0300
> +++ MM/kernel/fork.c 2008-01-29 19:23:44.000000000 +0300
> @@ -1335,8 +1335,19 @@ static struct task_struct *copy_process(
> return p;
>
> bad_fork_free_pid:
> - if (pid != &init_struct_pid)
> + if (pid != &init_struct_pid) {
> +#ifdef CONFIG_PREEMPT_RCU
> + /*
> + * read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
> + * make sure find_pid() is safe under read_lock(tasklist).
> + */
> + write_lock_irq(&tasklist_lock);
> +#endif
> free_pid(pid);
> +#ifdef CONFIG_PREEMPT_RCU
> + write_unlock_irq(&tasklist_lock);
> +#endif
> + }
> bad_fork_cleanup_namespaces:
> exit_task_namespaces(p);
> bad_fork_cleanup_keys:

Ok. I believe I see what problem you are trying to fix. That
a pid returned from find_pid might disappear if we are not rcu
protected.

This patch in the simplest form is wrong because it is confusing.

We currently appear to have two options.
1) Force all pid hash table access and pid accesses that
do not get a count to be covered under rcu_read_lock.
2) To modify the locking requirements for free_pid to require
the tasklist_lock.

However this second approach is horribly brittle, as it
will break if we ever have intermediate entries in the
hash table protected by pidmap_lock.

Using the tasklist_lock to still guarantee we see the list, the entire
list, and exactly the list for proper implementation of kill to
process groups and sessions still seems sane.

So let's just remove the guarantee of find_pid being usable with
just the tasklist_lock held.

Eric

diff --git a/include/linux/pid.h b/include/linux/pid.h
index e29a900..0ffb8cc 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -100,8 +100,7 @@ struct pid_namespace;
extern struct pid_namespace init_pid_ns;

/*
- * look up a PID in the hash table. Must be called with the tasklist_lock
- * or rcu_read_lock() held.
+ * look up a PID in the hash table. Must be called with the rcu_read_lock() held.
*
* find_pid_ns() finds the pid in the namespace specified
* find_pid() find the pid by its global id, i.e. in the init namespace

2008-01-30 04:56:52

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

On Tue, Jan 29, 2008 at 07:16:50PM -0700, Eric W. Biederman wrote:
> Andrew Morton <[email protected]> writes:
>
> > On Tue, 29 Jan 2008 19:40:19 +0300
> > Oleg Nesterov <[email protected]> wrote:
> >
> >> With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply
> > rcu_read_lock(),
> >> but find_pid_ns()->hlist_for_each_entry_rcu() should be safe under tasklist.
> >>
> >> Usually it is, detach_pid() is always called under write_lock(tasklist_lock),
> >> but copy_process() calls free_pid() lockless.
> >>
> >> "#ifdef CONFIG_PREEMPT_RCU" is added mostly as documentation, perhaps it is
> >> too ugly and should be removed.
> >>
> >> Signed-off-by: Oleg Nesterov <[email protected]>
> >>
> >> --- MM/kernel/fork.c~PR_RCU 2008-01-27 17:09:47.000000000 +0300
> >> +++ MM/kernel/fork.c 2008-01-29 19:23:44.000000000 +0300
> >> @@ -1335,8 +1335,19 @@ static struct task_struct *copy_process(
> >> return p;
> >>
> >> bad_fork_free_pid:
> >> - if (pid != &init_struct_pid)
> >> + if (pid != &init_struct_pid) {
> >> +#ifdef CONFIG_PREEMPT_RCU
> >> + /*
> >> + * read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
> >> + * make sure find_pid() is safe under read_lock(tasklist).
> >> + */
> >> + write_lock_irq(&tasklist_lock);
> >> +#endif
> >> free_pid(pid);
> >> +#ifdef CONFIG_PREEMPT_RCU
> >> + write_unlock_irq(&tasklist_lock);
> >> +#endif
> >> + }
> >> bad_fork_cleanup_namespaces:
> >> exit_task_namespaces(p);
> >> bad_fork_cleanup_keys:
> >
> > My attempt to understand this change timed out.
> >
> > kernel/pid.c is full of global but undocumented functions. What are the
> > locking requirements for free_pid()? free_pid_ns()? If it's just
> > caller-must-hold-rcu_read_lock() then why not use rcu_read_lock() here?
> >
> > If the locking is "caller must hold write_lock_irq(tasklist_lock) then the
> > sole relevant comment in there (in free_pid()) is wrong.
> >
> > Guys, more maintainable code please?
>
> Well I took a quick look.
>
> Yeah this looks complex.
> Mutation of the hash table is protected by pidmap_lock.
> But attachments of tasks to hash entries is protected task_lock.
>
> And it looks like it has been that way since commit 92476d7fc0326a409ab1d3864a04093a6be9aca7
>
> I thought free_pid did not have any requirements that a lock be held when
> it was called, taking all of the needed locks.
>
> Now how read_lock doesn't imply rcu_read_lock is another question.

Although read_lock() does accidentally imply rcu_read_lock() for
Classic RCU, it no longer does so for preemptible RCU.

But I thought that we had found these -- must have missed some...

Thanx, Paul

> Anyway I have to run. I will see about looking at this in a bit.
>
> Eric

2008-01-30 05:01:00

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

On Tue, Jan 29, 2008 at 08:24:17PM -0700, Eric W. Biederman wrote:
> Oleg Nesterov <[email protected]> writes:
>
> > With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
> > but find_pid_ns()->hlist_for_each_entry_rcu() should be safe under tasklist.
> >
> > Usually it is, detach_pid() is always called under write_lock(tasklist_lock),
> > but copy_process() calls free_pid() lockless.
> >
> > "#ifdef CONFIG_PREEMPT_RCU" is added mostly as documentation, perhaps it is
> > too ugly and should be removed.
> >
> > Signed-off-by: Oleg Nesterov <[email protected]>
> >
> > --- MM/kernel/fork.c~PR_RCU 2008-01-27 17:09:47.000000000 +0300
> > +++ MM/kernel/fork.c 2008-01-29 19:23:44.000000000 +0300
> > @@ -1335,8 +1335,19 @@ static struct task_struct *copy_process(
> > return p;
> >
> > bad_fork_free_pid:
> > - if (pid != &init_struct_pid)
> > + if (pid != &init_struct_pid) {
> > +#ifdef CONFIG_PREEMPT_RCU
> > + /*
> > + * read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
> > + * make sure find_pid() is safe under read_lock(tasklist).
> > + */
> > + write_lock_irq(&tasklist_lock);
> > +#endif
> > free_pid(pid);
> > +#ifdef CONFIG_PREEMPT_RCU
> > + write_unlock_irq(&tasklist_lock);
> > +#endif
> > + }
> > bad_fork_cleanup_namespaces:
> > exit_task_namespaces(p);
> > bad_fork_cleanup_keys:
>
> Ok. I believe I see what problem you are trying to fix. That
> a pid returned from find_pid might disappear if we are not rcu
> protected.
>
> This patch in the simplest form is wrong because it is confusing.
>
> We currently appear to have two options.
> 1) Force all pid hash table access and pid accesses that
> do not get a count to be covered under rcu_read_lock.
> 2) To modify the locking requirements for free_pid to require
> the tasklist_lock.
>
> However this second approach is horribly brittle, as it
> will break if we ever have intermediate entries in the
> hash table protected by pidmap_lock.
>
> Using the tasklist_lock to still guarantee we see the list, the entire
> list, and exactly the list for proper implementation of kill to
> process groups and sessions still seems sane.
>
> So let's just remove the guarantee of find_pid being usable with
> just the tasklist_lock held.

Makes sense to me -- it is totally permissible to hold rcu_read_lock()
across update code. ;-)

Thanx, Paul

> Eric
>
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index e29a900..0ffb8cc 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -100,8 +100,7 @@ struct pid_namespace;
> extern struct pid_namespace init_pid_ns;
>
> /*
> - * look up a PID in the hash table. Must be called with the tasklist_lock
> - * or rcu_read_lock() held.
> + * look up a PID in the hash table. Must be called with the rcu_read_lock() held.
> *
> * find_pid_ns() finds the pid in the namespace specified
> * find_pid() find the pid by its global id, i.e. in the init namespace

2008-01-30 09:25:47

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

"Paul E. McKenney" <[email protected]> writes:

> On Tue, Jan 29, 2008 at 08:24:17PM -0700, Eric W. Biederman wrote:
>> Oleg Nesterov <[email protected]> writes:
>>
>> > With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply
> rcu_read_lock(),
>> > but find_pid_ns()->hlist_for_each_entry_rcu() should be safe under tasklist.
>> >
>> > Usually it is, detach_pid() is always called under
> write_lock(tasklist_lock),
>> > but copy_process() calls free_pid() lockless.
>> >
>> > "#ifdef CONFIG_PREEMPT_RCU" is added mostly as documentation, perhaps it is
>> > too ugly and should be removed.
>> >
>> > Signed-off-by: Oleg Nesterov <[email protected]>
>> >
>> > --- MM/kernel/fork.c~PR_RCU 2008-01-27 17:09:47.000000000 +0300
>> > +++ MM/kernel/fork.c 2008-01-29 19:23:44.000000000 +0300
>> > @@ -1335,8 +1335,19 @@ static struct task_struct *copy_process(
>> > return p;
>> >
>> > bad_fork_free_pid:
>> > - if (pid != &init_struct_pid)
>> > + if (pid != &init_struct_pid) {
>> > +#ifdef CONFIG_PREEMPT_RCU
>> > + /*
>> > + * read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
>> > + * make sure find_pid() is safe under read_lock(tasklist).
>> > + */
>> > + write_lock_irq(&tasklist_lock);
>> > +#endif
>> > free_pid(pid);
>> > +#ifdef CONFIG_PREEMPT_RCU
>> > + write_unlock_irq(&tasklist_lock);
>> > +#endif
>> > + }
>> > bad_fork_cleanup_namespaces:
>> > exit_task_namespaces(p);
>> > bad_fork_cleanup_keys:
>>
>> Ok. I believe I see what problem you are trying to fix. That
>> a pid returned from find_pid might disappear if we are not rcu
>> protected.
>>
>> This patch in the simplest form is wrong because it is confusing.
>>
>> We currently appear to have two options.
>> 1) Force all pid hash table access and pid accesses that
>> do not get a count to be covered under rcu_read_lock.
>> 2) To modify the locking requirements for free_pid to require
>> the tasklist_lock.
>>
>> However this second approach is horribly brittle, as it
>> will break if we ever have intermediate entries in the
>> hash table protected by pidmap_lock.
>>
>> Using the tasklist_lock to still guarantee we see the list, the entire
>> list, and exactly the list for proper implementation of kill to
>> process groups and sessions still seems sane.
>>
>> So let's just remove the guarantee of find_pid being usable with
>> just the tasklist_lock held.
>
> Makes sense to me -- it is totally permissible to hold rcu_read_lock()
> across update code. ;-)

Let me rephrase so it is clear.

When dealing with pids there is exactly one case where we need
to take read_lock(&tasklist_lock);

Posix (and sanely handling corner cases) requires that when we send a
signal to a process group or a session we have a snapshot in time view
of the entire group. In particular this allows us to send SIGKILL to
every member of the group and to have the entire group die.

For all other cases (like /proc) we can safely and simply use the
rcu_read_lock and it improves scalability.

So for those cases where we are sending a signal to multiple processes
it looks like we need to go in and change the code to say:

read_lock(&tasklist_lock);
rcu_read_lock();
...
rcu_read_unlock();
read_unlock(&tasklist_lock);

Which is all sane, and should be correct and maintainable in the
future and is relatively easy to understand.

Of course if we start with find_get_pid and then later do put_pid we
are also fine. The current code is a little extra confused right now
because we have not completed the pid namespace conversion. Although
we are getting quite close.

Eric

2008-01-30 09:27:52

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

On 01/29, Eric W. Biederman wrote:
>
> Oleg Nesterov <[email protected]> writes:
>
> > With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
> > but find_pid_ns()->hlist_for_each_entry_rcu() should be safe under tasklist.
> >
> > Usually it is, detach_pid() is always called under write_lock(tasklist_lock),
> > but copy_process() calls free_pid() lockless.
> >
> > "#ifdef CONFIG_PREEMPT_RCU" is added mostly as documentation, perhaps it is
> > too ugly and should be removed.
> >
> > Signed-off-by: Oleg Nesterov <[email protected]>
> >
> > --- MM/kernel/fork.c~PR_RCU 2008-01-27 17:09:47.000000000 +0300
> > +++ MM/kernel/fork.c 2008-01-29 19:23:44.000000000 +0300
> > @@ -1335,8 +1335,19 @@ static struct task_struct *copy_process(
> > return p;
> >
> > bad_fork_free_pid:
> > - if (pid != &init_struct_pid)
> > + if (pid != &init_struct_pid) {
> > +#ifdef CONFIG_PREEMPT_RCU
> > + /*
> > + * read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
> > + * make sure find_pid() is safe under read_lock(tasklist).
> > + */
> > + write_lock_irq(&tasklist_lock);
> > +#endif
> > free_pid(pid);
> > +#ifdef CONFIG_PREEMPT_RCU
> > + write_unlock_irq(&tasklist_lock);
> > +#endif
> > + }
> > bad_fork_cleanup_namespaces:
> > exit_task_namespaces(p);
> > bad_fork_cleanup_keys:
>
> Ok. I believe I see what problem you are trying to fix. That
> a pid returned from find_pid might disappear if we are not rcu
> protected.

Not only.

Any find_pid() is unsafe under tasklist, even the find_pid(1).
Because free_pid() mangles the pid_hash[hash] while find_pid()
scans the same list.

> This patch in the simplest form is wrong because it is confusing.
>
> We currently appear to have two options.
> 1) Force all pid hash table access and pid accesses that
> do not get a count to be covered under rcu_read_lock.

I agree, we can (and should) convert most of these read_lock(tasklist)'s
to rcu_read_lock(). But we have a lot of them.

> 2) To modify the locking requirements for free_pid to require
> the tasklist_lock.
>
> However this second approach is horribly brittle, as it
> will break if we ever have intermediate entries in the
> hash table protected by pidmap_lock.
>
> Using the tasklist_lock to still guarantee we see the list, the entire
> list, and exactly the list for proper implementation of kill to
> process groups and sessions still seems sane.

And this means that attach_pid() and detach_pid() need write_lock(tasklist)
anyway.

So copy_process()->free_pid() is the only case when we modify the pid_hash[]
list without tasklist.

> So let's just remove the guarantee of find_pid being usable with
> just the tasklist_lock held.
>
> Eric
>
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index e29a900..0ffb8cc 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -100,8 +100,7 @@ struct pid_namespace;
> extern struct pid_namespace init_pid_ns;
>
> /*
> - * look up a PID in the hash table. Must be called with the tasklist_lock
> - * or rcu_read_lock() held.
> + * look up a PID in the hash table. Must be called with the rcu_read_lock() held.

Imho, we should first fix all users of read_lock(tasklist)+find_..._pid().

So I still think this patch makes sense as a trivial fix for now, until
we add the necessary rcu_read_lock()s. However this race is very unlikely,
perhaps we can live with it.

Oleg.

2008-01-30 09:45:39

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

On 01/30, Eric W. Biederman wrote:
>
> "Paul E. McKenney" <[email protected]> writes:
>
> > On Tue, Jan 29, 2008 at 08:24:17PM -0700, Eric W. Biederman wrote:
> >>
> >> Using the tasklist_lock to still guarantee we see the list, the entire
> >> list, and exactly the list for proper implementation of kill to
> >> process groups and sessions still seems sane.
> >>
> >> So let's just remove the guarantee of find_pid being usable with
> >> just the tasklist_lock held.
> >
> > Makes sense to me -- it is totally permissible to hold rcu_read_lock()
> > across update code. ;-)
>
> Let me rephrase so it is clear.
>
> When dealing with pids there is exactly one case where we need
> to take read_lock(&tasklist_lock);

Well, another example is sys_ioprio_set(IOPRIO_WHO_PGRP),

> Posix (and sanely handling corner cases) requires that when we send a
> signal to a process group or a session we have a snapshot in time view
> of the entire group. In particular this allows us to send SIGKILL to
> every member of the group and to have the entire group die.

but you are right of course. tasklist pins group/session/->tasks.

Oleg.

2008-01-30 14:18:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU


On Tue, 2008-01-29 at 15:02 -0800, Andrew Morton wrote:
> On Tue, 29 Jan 2008 19:40:19 +0300
> Oleg Nesterov <[email protected]> wrote:
>
> > With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
>
> I'm suspecting that we have other code which assumes that read_lock, write_lock
> and spin_lock imply rcu_read_lock().
>
> I wonder if there are any sane runtime checks we can put in there to find
> such problems.

I have a lockdep annotation that finds rcu_dereference() usages outside
of rcu_read_lock().

Trouble is the amazing amount of output, I haven't come round to going
through it and annotation the false positives (think rcu safe library
routines called from contexts where the rcu capability is not used).


2008-01-30 18:31:17

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

Oleg Nesterov <[email protected]> writes:

> On 01/29, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <[email protected]> writes:
>>
>> > With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply
> rcu_read_lock(),
>> > but find_pid_ns()->hlist_for_each_entry_rcu() should be safe under tasklist.
>> >
>> > Usually it is, detach_pid() is always called under
> write_lock(tasklist_lock),
>> > but copy_process() calls free_pid() lockless.
>> >
>> > "#ifdef CONFIG_PREEMPT_RCU" is added mostly as documentation, perhaps it is
>> > too ugly and should be removed.
>> >
>> > Signed-off-by: Oleg Nesterov <[email protected]>
>> >
>> > --- MM/kernel/fork.c~PR_RCU 2008-01-27 17:09:47.000000000 +0300
>> > +++ MM/kernel/fork.c 2008-01-29 19:23:44.000000000 +0300
>> > @@ -1335,8 +1335,19 @@ static struct task_struct *copy_process(
>> > return p;
>> >
>> > bad_fork_free_pid:
>> > - if (pid != &init_struct_pid)
>> > + if (pid != &init_struct_pid) {
>> > +#ifdef CONFIG_PREEMPT_RCU
>> > + /*
>> > + * read_lock(tasklist_lock) doesn't imply rcu_read_lock(),
>> > + * make sure find_pid() is safe under read_lock(tasklist).
>> > + */
>> > + write_lock_irq(&tasklist_lock);
>> > +#endif
>> > free_pid(pid);
>> > +#ifdef CONFIG_PREEMPT_RCU
>> > + write_unlock_irq(&tasklist_lock);
>> > +#endif
>> > + }
>> > bad_fork_cleanup_namespaces:
>> > exit_task_namespaces(p);
>> > bad_fork_cleanup_keys:
>>
>> Ok. I believe I see what problem you are trying to fix. That
>> a pid returned from find_pid might disappear if we are not rcu
>> protected.
>
> Not only.
>
> Any find_pid() is unsafe under tasklist, even the find_pid(1).
> Because free_pid() mangles the pid_hash[hash] while find_pid()
> scans the same list.

But we do it the rcu way.

So it isn't the mangling that is a problem. Just walking off into
freed memory. That is fundamentally the principle that the rcu
accessors work on.

Any find_pid is in trouble because any intermediate pid in the hash
chain may be freed as we walk the list, and if we don't wait the rcu
interval the pointers we need may go bad.

>> This patch in the simplest form is wrong because it is confusing.
>>
>> We currently appear to have two options.
>> 1) Force all pid hash table access and pid accesses that
>> do not get a count to be covered under rcu_read_lock.
>
> I agree, we can (and should) convert most of these read_lock(tasklist)'s
> to rcu_read_lock(). But we have a lot of them.

We have to touch and recheck all of that code anyway to ensure
the pid namespace stuff is correct and working well.

>> 2) To modify the locking requirements for free_pid to require
>> the tasklist_lock.
>>
>> However this second approach is horribly brittle, as it
>> will break if we ever have intermediate entries in the
>> hash table protected by pidmap_lock.
>>
>> Using the tasklist_lock to still guarantee we see the list, the entire
>> list, and exactly the list for proper implementation of kill to
>> process groups and sessions still seems sane.
>
> And this means that attach_pid() and detach_pid() need write_lock(tasklist)
> anyway.
>
> So copy_process()->free_pid() is the only case when we modify the pid_hash[]
> list without tasklist.

Nope. alloc_pid does also. The pid_hash table is fundamentally not
protected by task_lock, but by pidmap_lock.

task_list only protects detach (by fluke) and going from struct pid
to a task. It doesn't prevent additions to the hash chains at all.

>> So let's just remove the guarantee of find_pid being usable with
>> just the tasklist_lock held.
>>
>> Eric
>>
>> diff --git a/include/linux/pid.h b/include/linux/pid.h
>> index e29a900..0ffb8cc 100644
>> --- a/include/linux/pid.h
>> +++ b/include/linux/pid.h
>> @@ -100,8 +100,7 @@ struct pid_namespace;
>> extern struct pid_namespace init_pid_ns;
>>
>> /*
>> - * look up a PID in the hash table. Must be called with the tasklist_lock
>> - * or rcu_read_lock() held.
>> + * look up a PID in the hash table. Must be called with the rcu_read_lock()
> held.
>
> Imho, we should first fix all users of read_lock(tasklist)+find_..._pid().
>
> So I still think this patch makes sense as a trivial fix for now, until
> we add the necessary rcu_read_lock()s. However this race is very unlikely,
> perhaps we can live with it.

The patch is horrible because it works for all of the wrong reasons,
and obscures what is really going on. That can't be good.

Eric

2008-01-31 09:29:46

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

On 01/30, Eric W. Biederman wrote:
>
> Oleg Nesterov <[email protected]> writes:
>
> > On 01/29, Eric W. Biederman wrote:
> >>
> >> Ok. I believe I see what problem you are trying to fix. That
> >> a pid returned from find_pid might disappear if we are not rcu
> >> protected.
> >
> > Not only.
> >
> > Any find_pid() is unsafe under tasklist, even the find_pid(1).
> > Because free_pid() mangles the pid_hash[hash] while find_pid()
> > scans the same list.
>
> But we do it the rcu way.
>
> So it isn't the mangling that is a problem. Just walking off into
> freed memory. That is fundamentally the principle that the rcu
> accessors work on.

Yes, this is what I meant.

> >> We currently appear to have two options.
> >> 1) Force all pid hash table access and pid accesses that
> >> do not get a count to be covered under rcu_read_lock.
> >
> > I agree, we can (and should) convert most of these read_lock(tasklist)'s
> > to rcu_read_lock(). But we have a lot of them.
>
> We have to touch and recheck all of that code anyway to ensure
> the pid namespace stuff is correct and working well.

Sure.

> >> 2) To modify the locking requirements for free_pid to require
> >> the tasklist_lock.
> >>
> >> However this second approach is horribly brittle, as it
> >> will break if we ever have intermediate entries in the
> >> hash table protected by pidmap_lock.
> >>
> >> Using the tasklist_lock to still guarantee we see the list, the entire
> >> list, and exactly the list for proper implementation of kill to
> >> process groups and sessions still seems sane.
> >
> > And this means that attach_pid() and detach_pid() need write_lock(tasklist)
> > anyway.
> >
> > So copy_process()->free_pid() is the only case when we modify the pid_hash[]
> > list without tasklist.
>
> Nope. alloc_pid does also.

Doesn't matter, inserts are safe. hlist_add_head_rcu() does wmb()
after the full init, and find_pid() has rmb(). We don't really
need rcu_read_lock() to traverse the rcu-protected list as long as
we know the node can't go away.

> The pid_hash table is fundamentally not
> protected by task_lock, but by pidmap_lock.

Until this patch.

> task_list only protects detach (by fluke) and going from struct pid
> to a task. It doesn't prevent additions to the hash chains at all.

see above.

> >> - * look up a PID in the hash table. Must be called with the tasklist_lock
> >> - * or rcu_read_lock() held.
> >> + * look up a PID in the hash table. Must be called with the rcu_read_lock()
> > held.
> >
> > Imho, we should first fix all users of read_lock(tasklist)+find_..._pid().
> >
> > So I still think this patch makes sense as a trivial fix for now, until
> > we add the necessary rcu_read_lock()s. However this race is very unlikely,
> > perhaps we can live with it.
>
> The patch is horrible because it works for all of the wrong reasons,
> and obscures what is really going on. That can't be good.

Eric, perhaps you misunderstood me. I do agree, the patch is horrible,
I added this awful #ifdef's exactly because this must look as "FIXME!".

Let me explain. Until now, find_pid() under tasklist was legal and documented.
It was actually broken, but happened to work because read_lock() implied
rcu_read_lock(). This patch fixes tasklist+find_pid(), and since PREEMPT_RCU
was merged, imho it could go into 2.6.25.

Now we can change the rules, fix all users, and then revert this hack.

OK. please feel free to do what you think right, but unless you prove
I'm technically wrong I don't understand you. But I won't insist any
longer.

Oleg.

2008-01-31 13:33:25

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU


* Andrew Morton <[email protected]> wrote:

> On Tue, 29 Jan 2008 19:40:19 +0300
> Oleg Nesterov <[email protected]> wrote:
>
> > With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply
> > rcu_read_lock(),
>
> I'm suspecting that we have other code which assumes that read_lock,
> write_lock and spin_lock imply rcu_read_lock().
>
> I wonder if there are any sane runtime checks we can put in there to
> find such problems.

we usually caught them via the DEBUG_PREEMPT checks on PREEMPT_RT: stuff
that has such implicit reliance tends to use smp_processor_id() along
the way and that gets flagged if the non-preemptability guarantee of
spin_lock() vanishes.

Ingo

2009-12-14 02:15:38

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU

Hello.

Below change (found at http://lkml.org/lkml/2008/1/30/1 ) is not yet applied
as of 2.6.32-git10 . Any reason not to apply?

> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index e29a900..0ffb8cc 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -100,8 +100,7 @@ struct pid_namespace;
> extern struct pid_namespace init_pid_ns;
>
> /*
> - * look up a PID in the hash table. Must be called with the tasklist_lock
> - * or rcu_read_lock() held.
> + * look up a PID in the hash table. Must be called with the rcu_read_lock() held.
> *
> * find_pid_ns() finds the pid in the namespace specified
> * find_pid() find the pid by its global id, i.e. in the init namespace