2009-01-16 05:58:18

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 3/3] pids: refactor vnr/nr_ns helpers to make them safe

Inho, the safety rules for vnr/nr_ns helpers are horrible and buggy.

task_pid_nr_ns(task) needs rcu/tasklist depending on task == current.

As for "special" pids, vnr/nr_ns helpers always need rcu. However,
if task != current, they are unsafe even under rcu lock, we can't
trust task->group_leader without the special checks.

And almost every helper has a callsite which needs a fix.

Also, it is a bit annoying that the implementations of, say,
task_pgrp_vnr() and task_pgrp_nr_ns() are not "symmetrical".

This patch introduces the new helper, __task_pid_nr_ns(), which is
always safe to use, and turns all other helpers into the trivial
wrappers.

If this patch is acceptable, I'll send another one which converts
task_tgid_xxx() as well, there are a bit special.

Signed-off-by: Oleg Nesterov <[email protected]>

--- CUR/include/linux/sched.h~SP_3_NR 2009-01-16 02:13:50.000000000 +0100
+++ CUR/include/linux/sched.h 2009-01-16 04:16:12.000000000 +0100
@@ -1481,17 +1481,23 @@ struct pid_namespace;
*
* see also pid_nr() etc in include/linux/pid.h
*/
+pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
+ struct pid_namespace *ns);

static inline pid_t task_pid_nr(struct task_struct *tsk)
{
return tsk->pid;
}

-pid_t task_pid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns);
+static inline pid_t task_pid_nr_ns(struct task_struct *tsk,
+ struct pid_namespace *ns)
+{
+ return __task_pid_nr_ns(tsk, PIDTYPE_PID, ns);
+}

static inline pid_t task_pid_vnr(struct task_struct *tsk)
{
- return pid_vnr(task_pid(tsk));
+ return __task_pid_nr_ns(tsk, PIDTYPE_PID, NULL);
}


@@ -1513,11 +1519,15 @@ static inline pid_t task_pgrp_nr(struct
return tsk->signal->__pgrp;
}

-pid_t task_pgrp_nr_ns(struct task_struct *tsk, struct pid_namespace *ns);
+static inline pid_t task_pgrp_nr_ns(struct task_struct *tsk,
+ struct pid_namespace *ns)
+{
+ return __task_pid_nr_ns(tsk, PIDTYPE_PGID, ns);
+}

static inline pid_t task_pgrp_vnr(struct task_struct *tsk)
{
- return pid_vnr(task_pgrp(tsk));
+ return __task_pid_nr_ns(tsk, PIDTYPE_PGID, NULL);
}


@@ -1526,14 +1536,17 @@ static inline pid_t task_session_nr(stru
return tsk->signal->__session;
}

-pid_t task_session_nr_ns(struct task_struct *tsk, struct pid_namespace *ns);
+static inline pid_t task_session_nr_ns(struct task_struct *tsk,
+ struct pid_namespace *ns)
+{
+ return __task_pid_nr_ns(tsk, PIDTYPE_SID, ns);
+}

static inline pid_t task_session_vnr(struct task_struct *tsk)
{
- return pid_vnr(task_session(tsk));
+ return __task_pid_nr_ns(tsk, PIDTYPE_SID, NULL);
}

-
/**
* pid_alive - check that a task structure is not stale
* @p: Task structure to be checked.
--- CUR/kernel/pid.c~SP_3_NR 2009-01-16 02:54:26.000000000 +0100
+++ CUR/kernel/pid.c 2009-01-16 05:40:43.000000000 +0100
@@ -452,11 +452,24 @@ pid_t pid_vnr(struct pid *pid)
}
EXPORT_SYMBOL_GPL(pid_vnr);

-pid_t task_pid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns)
+pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
+ struct pid_namespace *ns)
{
- return pid_nr_ns(task_pid(tsk), ns);
+ pid_t nr = 0;
+
+ rcu_read_lock();
+ if (!ns)
+ ns = current->nsproxy->pid_ns;
+ if (likely(pid_alive(task))) {
+ if (type != PIDTYPE_PID)
+ task = task->group_leader;
+ nr = pid_nr_ns(task->pids[type].pid, ns);
+ }
+ rcu_read_unlock();
+
+ return nr;
}
-EXPORT_SYMBOL(task_pid_nr_ns);
+EXPORT_SYMBOL(__task_pid_nr_ns);

pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns)
{
@@ -464,18 +477,6 @@ pid_t task_tgid_nr_ns(struct task_struct
}
EXPORT_SYMBOL(task_tgid_nr_ns);

-pid_t task_pgrp_nr_ns(struct task_struct *tsk, struct pid_namespace *ns)
-{
- return pid_nr_ns(task_pgrp(tsk), ns);
-}
-EXPORT_SYMBOL(task_pgrp_nr_ns);
-
-pid_t task_session_nr_ns(struct task_struct *tsk, struct pid_namespace *ns)
-{
- return pid_nr_ns(task_session(tsk), ns);
-}
-EXPORT_SYMBOL(task_session_nr_ns);
-
struct pid_namespace *task_active_pid_ns(struct task_struct *tsk)
{
return ns_of_pid(task_pid(tsk));


2009-01-16 10:35:35

by Louis Rilling

[permalink] [raw]
Subject: Re: [PATCH 3/3] pids: refactor vnr/nr_ns helpers to make them safe

Hi Oleg,

On 16/01/09 6:55 +0100, Oleg Nesterov wrote:
> Inho, the safety rules for vnr/nr_ns helpers are horrible and buggy.
>
> task_pid_nr_ns(task) needs rcu/tasklist depending on task == current.
>
> As for "special" pids, vnr/nr_ns helpers always need rcu. However,
> if task != current, they are unsafe even under rcu lock, we can't
> trust task->group_leader without the special checks.
>
> And almost every helper has a callsite which needs a fix.
>
> Also, it is a bit annoying that the implementations of, say,
> task_pgrp_vnr() and task_pgrp_nr_ns() are not "symmetrical".
>
> This patch introduces the new helper, __task_pid_nr_ns(), which is
> always safe to use, and turns all other helpers into the trivial
> wrappers.
>
> If this patch is acceptable, I'll send another one which converts
> task_tgid_xxx() as well, there are a bit special.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>

[...]

> --- CUR/kernel/pid.c~SP_3_NR 2009-01-16 02:54:26.000000000 +0100
> +++ CUR/kernel/pid.c 2009-01-16 05:40:43.000000000 +0100
> @@ -452,11 +452,24 @@ pid_t pid_vnr(struct pid *pid)
> }
> EXPORT_SYMBOL_GPL(pid_vnr);
>
> -pid_t task_pid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns)
> +pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
> + struct pid_namespace *ns)
> {
> - return pid_nr_ns(task_pid(tsk), ns);
> + pid_t nr = 0;
> +
> + rcu_read_lock();
> + if (!ns)
> + ns = current->nsproxy->pid_ns;
> + if (likely(pid_alive(task))) {

I don't see what this pid_alive() check buys you. Since tasklist_lock is not
enforced, nothing prevents another CPU from detaching the pid right after the
check.

I'm also a bit puzzled by your description with using tasklist_lock when task !=
current, and not seeing tasklist_lock anywhere in the patch. Does this mean that
"safe" is for "no access to freed memory is done, but caller has to take
tasklist_lock or may get 0 as return value"?

Thanks,

Louis

> + if (type != PIDTYPE_PID)
> + task = task->group_leader;
> + nr = pid_nr_ns(task->pids[type].pid, ns);
> + }
> + rcu_read_unlock();
> +
> + return nr;
> }
> -EXPORT_SYMBOL(task_pid_nr_ns);
> +EXPORT_SYMBOL(__task_pid_nr_ns);
>
> pid_t task_tgid_nr_ns(struct task_struct *tsk, struct pid_namespace *ns)
> {

[...]

--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes


Attachments:
(No filename) (2.40 kB)
signature.asc (189.00 B)
Digital signature
Download all attachments

2009-01-16 20:48:28

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/3] pids: refactor vnr/nr_ns helpers to make them safe

Hi Louis,

On 01/16, Louis Rilling wrote:
>
> On 16/01/09 6:55 +0100, Oleg Nesterov wrote:
> > + struct pid_namespace *ns)
> > {
> > - return pid_nr_ns(task_pid(tsk), ns);
> > + pid_t nr = 0;
> > +
> > + rcu_read_lock();
> > + if (!ns)
> > + ns = current->nsproxy->pid_ns;
> > + if (likely(pid_alive(task))) {
>
> I don't see what this pid_alive() check buys you. Since tasklist_lock is not
> enforced, nothing prevents another CPU from detaching the pid right after the
> check.

pid_alive() should be renamed. We use it to make sure the task didn't pass
__unhash_process().

Yes, you are right, nothing prevents another CPU from detaching the pid right
after the check. But this is fine: we read ->pids[].pid under rcu_read_lock(),
and if it is NULL pid_nr_ns() returns. So, we don't need pid_alive() check at
all.

However, we can not use task->group_leader unless we verify the task is still
alive. That is why we need this check. We do not clear ->group_leader when
the task exits, so we can't do

rcu_read_lock();
if (task->group_leader)
do_something(task->group_leader);
rcu_unread_lock();

Instead we use pid_alive() before using ->group_leader.

> I'm also a bit puzzled by your description with using tasklist_lock when task !=
> current, and not seeing tasklist_lock anywhere in the patch. Does this mean that
> "safe" is for "no access to freed memory is done, but caller has to take
> tasklist_lock or may get 0 as return value"?

I am not sure I understand the question...

This patch doesn't use tasklist, it relies on rcu. With this patch the caller
doesn't need tasklist/rcu to call these helpers (but of course, the caller
must ensure that task_struct is stable).

But, whatever the caller does, it can get 0 as return value anyway if the
task exists, this is correct. Or I misunderstood you?

Oleg.

2009-01-16 22:22:35

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 3/3] pids: refactor vnr/nr_ns helpers to make them safe

On Fri, 16 Jan 2009 06:55:20 +0100
Oleg Nesterov <[email protected]> wrote:

> --- CUR/include/linux/sched.h~SP_3_NR 2009-01-16 02:13:50.000000000 +0100
> +++ CUR/include/linux/sched.h 2009-01-16 04:16:12.000000000 +0100
> @@ -1481,17 +1481,23 @@ struct pid_namespace;
> *
> * see also pid_nr() etc in include/linux/pid.h
> */
> +pid_t __task_pid_nr_ns(struct task_struct *task, enum pid_type type,
> + struct pid_namespace *ns);

Sometime it would be nice to pull all the pid-related helpers out of
sched.h, into a dedicated header file which is included only by those
.c files which actually use them.

2009-01-17 11:58:55

by Louis Rilling

[permalink] [raw]
Subject: Re: [PATCH 3/3] pids: refactor vnr/nr_ns helpers to make them safe

On Fri, Jan 16, 2009 at 09:45:40PM +0100, Oleg Nesterov wrote:
> Hi Louis,
>
> On 01/16, Louis Rilling wrote:
> >
> > On 16/01/09 6:55 +0100, Oleg Nesterov wrote:
> > > + struct pid_namespace *ns)
> > > {
> > > - return pid_nr_ns(task_pid(tsk), ns);
> > > + pid_t nr = 0;
> > > +
> > > + rcu_read_lock();
> > > + if (!ns)
> > > + ns = current->nsproxy->pid_ns;
> > > + if (likely(pid_alive(task))) {
> >
> > I don't see what this pid_alive() check buys you. Since tasklist_lock is not
> > enforced, nothing prevents another CPU from detaching the pid right after the
> > check.
>
> pid_alive() should be renamed. We use it to make sure the task didn't pass
> __unhash_process().
>
> Yes, you are right, nothing prevents another CPU from detaching the pid right
> after the check. But this is fine: we read ->pids[].pid under rcu_read_lock(),
> and if it is NULL pid_nr_ns() returns. So, we don't need pid_alive() check at
> all.
>
> However, we can not use task->group_leader unless we verify the task is still
> alive. That is why we need this check. We do not clear ->group_leader when
> the task exits, so we can't do
>
> rcu_read_lock();
> if (task->group_leader)
> do_something(task->group_leader);
> rcu_unread_lock();
>
> Instead we use pid_alive() before using ->group_leader.

Ok I see now. Since RCU is locked and pid_alive(task) has been true at
some point, task->group_leader cannot be freed because release_task()
does not release task->group_leader before releasing task. IOW
release_task() can't have started releasing task->group_leader before
rcu_read_lock().

Thank you for your explanation!

>
> > I'm also a bit puzzled by your description with using tasklist_lock when task !=
> > current, and not seeing tasklist_lock anywhere in the patch. Does this mean that
> > "safe" is for "no access to freed memory is done, but caller has to take
> > tasklist_lock or may get 0 as return value"?
>
> I am not sure I understand the question...
>
> This patch doesn't use tasklist, it relies on rcu. With this patch the caller
> doesn't need tasklist/rcu to call these helpers (but of course, the caller
> must ensure that task_struct is stable).
>
> But, whatever the caller does, it can get 0 as return value anyway if the
> task exists, this is correct. Or I misunderstood you?

My question was probably badly phrased. When reading the patch
description I thought that the point was to fix the helpers so that
tasklist_lock would be taken whenever task != current. Of course your
patch does not do this, and I'm perfectly fine with this.

Thanks,

Louis

--
Dr Louis Rilling Kerlabs - IRISA
Skype: louis.rilling Campus Universitaire de Beaulieu
Phone: (+33|0) 2 99 84 71 52 Avenue du General Leclerc
Fax: (+33|0) 2 99 84 71 71 35042 Rennes CEDEX - France
http://www.kerlabs.com/