2006-03-08 17:13:45

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] task: Make task list manipulations RCU safe.


While we can currently walk through thread groups, process groups, and
sessions with just the rcu_read_lock, this opens the door to walking
the entire task list.

We already have all of the other RCU guarantees so there is no cost
in doing this, this should be enough so that proc can stop taking the
tasklist lock during readdir.

prev_task was killed because it has no users, and using it will
miss new tasks when doing an rcu traversal.

Signed-off-by: Eric W. Biederman <[email protected]>


---

fs/exec.c | 2 +-
include/linux/sched.h | 3 +--
kernel/exit.c | 2 +-
kernel/fork.c | 2 +-
4 files changed, 4 insertions(+), 5 deletions(-)

51621c1a347edd8935a80fdf4c78840e03be5e03
diff --git a/fs/exec.c b/fs/exec.c
index d961639..c01150e 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -713,7 +713,7 @@ static int de_thread(struct task_struct
attach_pid(current, PIDTYPE_PID, current->pid);
attach_pid(current, PIDTYPE_PGID, current->signal->pgrp);
attach_pid(current, PIDTYPE_SID, current->signal->session);
- list_add_tail(&current->tasks, &init_task.tasks);
+ list_add_tail_rcu(&current->tasks, &init_task.tasks);

current->parent = current->real_parent = leader->real_parent;
leader->parent = leader->real_parent = child_reaper;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2856f52..e6fde1a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1187,8 +1187,7 @@ extern void wait_task_inactive(task_t *
#define remove_parent(p) list_del_init(&(p)->sibling)
#define add_parent(p) list_add_tail(&(p)->sibling,&(p)->parent->children)

-#define next_task(p) list_entry((p)->tasks.next, struct task_struct, tasks)
-#define prev_task(p) list_entry((p)->tasks.prev, struct task_struct, tasks)
+#define next_task(p) list_entry(rcu_dereference((p)->tasks.next), struct task_struct, tasks)

#define for_each_process(p) \
for (p = &init_task ; (p = next_task(p)) != &init_task ; )
diff --git a/kernel/exit.c b/kernel/exit.c
index 806a667..47add40 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -56,7 +56,7 @@ static void __unhash_process(struct task
detach_pid(p, PIDTYPE_PGID);
detach_pid(p, PIDTYPE_SID);

- list_del_init(&p->tasks);
+ list_del_rcu(&p->tasks);
__get_cpu_var(process_counts)--;
}

diff --git a/kernel/fork.c b/kernel/fork.c
index d0eebc2..e7eef2e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1210,7 +1210,7 @@ static task_t *copy_process(unsigned lon
attach_pid(p, PIDTYPE_PGID, process_group(p));
attach_pid(p, PIDTYPE_SID, p->signal->session);

- list_add_tail(&p->tasks, &init_task.tasks);
+ list_add_tail_rcu(&p->tasks, &init_task.tasks);
__get_cpu_var(process_counts)++;
}
attach_pid(p, PIDTYPE_TGID, p->tgid);
--
1.2.2.g709a-dirty


2006-03-14 17:39:56

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] task: Make task list manipulations RCU safe.

Some questions.

first_tgid:
...
for (; pos && pid_alive(pos); pos = next_task(pos))

I think this patch makes this 'pid_alive(pos)' unneeded?

next_tgid:
rcu_read_lock();
pos = start;
if (pid_alive(start))
pos = next_task(start);
if (pid_alive(pos) && (pos != &init_task)) {
get_task_struct(pos);
goto done;
}

The first 'pid_alive()' check is quite understandable.
What about the second one? I beleive, now it is unneeded
as well. The same for first_tid/next_tid.

Also, first_tid() does 'task_lock(leader)' while reading
->signal->count. Why? ->signal is protected by ->siglock,
but we don't need any locks because ->signal is rcu safe.
Same for proc_task_getattr(), s/task_lock/rcu_read_lock/.

Oleg.

2006-03-14 18:08:53

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] task: Make task list manipulations RCU safe.

Oleg Nesterov <[email protected]> writes:

> Some questions.
>
> first_tgid:
> ...
> for (; pos && pid_alive(pos); pos = next_task(pos))
>
> I think this patch makes this 'pid_alive(pos)' unneeded?

Close. The problem is that we could have slept with the
count elevated on start before we do rcu_read_lock().

> next_tgid:
> rcu_read_lock();
> pos = start;
> if (pid_alive(start))
> pos = next_task(start);
> if (pid_alive(pos) && (pos != &init_task)) {
> get_task_struct(pos);
> goto done;
> }
>
> The first 'pid_alive()' check is quite understandable.
> What about the second one? I beleive, now it is unneeded
> as well. The same for first_tid/next_tid.

Agreed. Since we are guaranteed that ->next will still
be valid we should be able to get this down to a single
pid_alive check. Although I'm not certain I would want
to return a task that had just died from either of these functions.
But I guess the race is there regardless.

> Also, first_tid() does 'task_lock(leader)' while reading
> ->signal->count. Why? ->signal is protected by ->siglock,
> but we don't need any locks because ->signal is rcu safe.
> Same for proc_task_getattr(), s/task_lock/rcu_read_lock/.

Probably my general paranoia. I know I didn't quite grok
rcu at the time I wrote that code, and I could have easily
gotten confused about what task_lock protects. Looks like
I need to generate a patch to cleanup that one.

Eric

2006-03-14 18:29:28

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] task: Make task list manipulations RCU safe.

"Eric W. Biederman" wrote:
>
> Oleg Nesterov <[email protected]> writes:
>
> > Some questions.
> >
> > first_tgid:
> > ...
> > for (; pos && pid_alive(pos); pos = next_task(pos))
> >
> > I think this patch makes this 'pid_alive(pos)' unneeded?
>
> Close. The problem is that we could have slept with the
> count elevated on start before we do rcu_read_lock().

Yes, we could have slept. But (unlike next_tgid) this loop
starts from pos=init_task or from pos=find_task_by_pid()
and we are doing find_task_by_pid under rcu_read_lock() ?

Oleg.

2006-03-14 20:17:33

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] task: Make task list manipulations RCU safe.

Oleg Nesterov <[email protected]> writes:

> "Eric W. Biederman" wrote:
>>
>> Oleg Nesterov <[email protected]> writes:
>>
>> > Some questions.
>> >
>> > first_tgid:
>> > ...
>> > for (; pos && pid_alive(pos); pos = next_task(pos))
>> >
>> > I think this patch makes this 'pid_alive(pos)' unneeded?
>>
>> Close. The problem is that we could have slept with the
>> count elevated on start before we do rcu_read_lock().
>
> Yes, we could have slept. But (unlike next_tgid) this loop
> starts from pos=init_task or from pos=find_task_by_pid()
> and we are doing find_task_by_pid under rcu_read_lock() ?

Yes. We are sorry. The bits of code blurred in the discussion.

Eric