2006-02-02 17:46:27

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] pidhash: Kill switch_exec_pids


Andrew my apologies for the patch thrash, but I'm not certain
which patches are in your tree at the moment (I know there are
things that don't appear in -mm4) or I would send you an incremental
patch. So this is a clean patch against linus's tree. After some
more review and seeing how my patches intersected with Oleg's it
appears the sane thing is to just kill switch_exec_pids().

switch_exec_pids is only called from de_thread by way of exec, and it
is only called when we are exec'ing from a non thread group leader.

Currently switch_exec_pids gives the leader the pid of the thread and
unhashes and rehashes all of the process groups. The leader is
already in the EXIT_DEAD state so no one cares about it's pids. The
only concern for the leader is that __unhash_process called from release_task
will function correctly. If we don't touch the leader at all we know
that __unhash_process will work fine so there is no need to touch the leader.

For the task becomming the thread group leader, we just need to
give it the pid of the old thread group leader, add it to the task list,
and attach it to the session and the process group of the thread group.

Currently de_thread is also adding the task to the task list which
is just silly.

Currently the only leader of __detach_pid besides detach_pid is
switch_exec_pids because of the ugly extra work that was being
performed.

So this patch removes switch_exec_pids because it is doing too much,
it is creating an unnecessary special case in pid.c, duing work duplicated
in de_thread, and generally obscuring what it is going on.

The necessary work is added to de_thread, and it seems to be a little
clearer there what is going on.

Signed-off-by: Eric W. Biederman <[email protected]>


---

fs/exec.c | 14 +++++++++++---
include/linux/pid.h | 1 -
kernel/pid.c | 30 ------------------------------
3 files changed, 11 insertions(+), 34 deletions(-)

e418b6a6f92419d947be8cc5778f1ad2b4d71275
diff --git a/fs/exec.c b/fs/exec.c
index 055378d..56d60c4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -699,7 +699,17 @@ static int de_thread(struct task_struct
remove_parent(current);
remove_parent(leader);

- switch_exec_pids(leader, current);
+
+ /* Become a process group leader with the old leader's pid.
+ * Note: The old leader also uses thispid until release_task
+ * is called. Odd but simple and correct.
+ */
+ detach_pid(current, PIDTYPE_PID);
+ current->pid = leader->pid;
+ attach_pid(current, PIDTYPE_PID, current->pid);
+ attach_pid(current, PIDTYPE_PGID, current->signal->pgrp);
+ attach_pid(current, PIDTYPE_SID, current->signal->session);
+ list_add_tail(&current->tasks, &init_task.tasks);

current->parent = current->real_parent = leader->real_parent;
leader->parent = leader->real_parent = child_reaper;
@@ -713,8 +723,6 @@ static int de_thread(struct task_struct
__ptrace_link(current, parent);
}

- list_del(&current->tasks);
- list_add_tail(&current->tasks, &init_task.tasks);
current->exit_signal = SIGCHLD;

BUG_ON(leader->exit_state != EXIT_ZOMBIE);
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 5b2fcb1..099e70e 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -38,7 +38,6 @@ extern struct pid *FASTCALL(find_pid(enu

extern int alloc_pidmap(void);
extern void FASTCALL(free_pidmap(int));
-extern void switch_exec_pids(struct task_struct *leader, struct task_struct *thread);

#define do_each_task_pid(who, type, task) \
if ((task = find_task_by_pid_type(type, who))) { \
diff --git a/kernel/pid.c b/kernel/pid.c
index 1acc072..7781d99 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -218,36 +218,6 @@ task_t *find_task_by_pid_type(int type,
EXPORT_SYMBOL(find_task_by_pid_type);

/*
- * This function switches the PIDs if a non-leader thread calls
- * sys_execve() - this must be done without releasing the PID.
- * (which a detach_pid() would eventually do.)
- */
-void switch_exec_pids(task_t *leader, task_t *thread)
-{
- __detach_pid(leader, PIDTYPE_PID);
- __detach_pid(leader, PIDTYPE_TGID);
- __detach_pid(leader, PIDTYPE_PGID);
- __detach_pid(leader, PIDTYPE_SID);
-
- __detach_pid(thread, PIDTYPE_PID);
- __detach_pid(thread, PIDTYPE_TGID);
-
- leader->pid = leader->tgid = thread->pid;
- thread->pid = thread->tgid;
-
- attach_pid(thread, PIDTYPE_PID, thread->pid);
- attach_pid(thread, PIDTYPE_TGID, thread->tgid);
- attach_pid(thread, PIDTYPE_PGID, thread->signal->pgrp);
- attach_pid(thread, PIDTYPE_SID, thread->signal->session);
- list_add_tail(&thread->tasks, &init_task.tasks);
-
- attach_pid(leader, PIDTYPE_PID, leader->pid);
- attach_pid(leader, PIDTYPE_TGID, leader->tgid);
- attach_pid(leader, PIDTYPE_PGID, leader->signal->pgrp);
- attach_pid(leader, PIDTYPE_SID, leader->signal->session);
-}
-
-/*
* The pid hash table is scaled according to the amount of memory in the
* machine. From a minimum of 16 slots up to 4096 slots at one gigabyte or
* more.
--
1.1.5.g3480


2006-02-02 19:08:31

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] pidhash: Kill switch_exec_pids

Eric W. Biederman wrote:
>
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -699,7 +699,17 @@ static int de_thread(struct task_struct
> remove_parent(current);
> remove_parent(leader);
>
> - switch_exec_pids(leader, current);
> +
> + /* Become a process group leader with the old leader's pid.
> + * Note: The old leader also uses thispid until release_task
> + * is called. Odd but simple and correct.
> + */
> + detach_pid(current, PIDTYPE_PID);
> + current->pid = leader->pid;
> + attach_pid(current, PIDTYPE_PID, current->pid);

What happens after de_thread() unlocks tasklist_lock and before
it is taken again in release_task() ?

In that window find_task_by_pid() will return dead leader, not
the new leader of thread group. This means we can miss tkill()
or ptrace(), for example.

Oleg.

2006-02-02 20:09:11

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] pidhash: Kill switch_exec_pids

Oleg Nesterov <[email protected]> writes:

> Eric W. Biederman wrote:
>>
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -699,7 +699,17 @@ static int de_thread(struct task_struct
>> remove_parent(current);
>> remove_parent(leader);
>>
>> - switch_exec_pids(leader, current);
>> +
>> + /* Become a process group leader with the old leader's pid.
>> + * Note: The old leader also uses thispid until release_task
>> + * is called. Odd but simple and correct.
>> + */
>> + detach_pid(current, PIDTYPE_PID);
>> + current->pid = leader->pid;
>> + attach_pid(current, PIDTYPE_PID, current->pid);
>
> What happens after de_thread() unlocks tasklist_lock and before
> it is taken again in release_task() ?
>
> In that window find_task_by_pid() will return dead leader, not
> the new leader of thread group. This means we can miss tkill()
> or ptrace(), for example.

A reasonable concern. For some reason I had it in my
head that find_pid didn't need that tasklist_lock
but it does. kthread and vmscan need to be fixed.

All I have done is enlarged the window where this
race is possible. So for tkill I am not concerned,
as it wants a particular thread. Nor am I concerned
about anything else that wants a particular thread.

The fact that the group_leader does not point
at the actual thread group leader might be a problem,
as I have opened a window where that is now the case.

For signals that is not a problem as signals are still shared.
This applies to most other resources as well.

Looking through all of the callers of find_task_by_pid
I couldn't see anything that looks like it cares.

So until we spot that case I'm ready to put this down
of one of those cases in de_thread that looks wrong
but happens to work. Now if there is a way to make
it work more cleanly that may be worth looking at.


Eric

2006-02-02 21:43:58

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] pidhash: Kill switch_exec_pids

[email protected] (Eric W. Biederman) wrote:
>
> Andrew my apologies for the patch thrash, but I'm not certain
> which patches are in your tree at the moment (I know there are
> things that don't appear in -mm4) or I would send you an incremental
> patch.

Yes, it has been pretty chaotic. Here's what I currently have:

exec-allow-init-to-exec-from-any-thread.patch
remove-dead-kill_sl-prototype-from-schedh.patch
do_tty_hangup-use-group_send_sig_info-not.patch
do_sak-dont-depend-on-session-id-0.patch
pidhash-dont-count-idle-threads.patch
pidhash-dont-use-zero-pids.patch
dont-touch-current-tasks-in-de_thread.patch

Plus a couple more patches from Oleg in the todo-queue: "choose_new_parent:
remove unused arg, sanitize exit_state check" and "simplify exec from
init's subthread", which we can assume I'll merge up.

2006-02-03 12:09:40

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] pidhash: Kill switch_exec_pids

"Eric W. Biederman" wrote:
>
> Oleg Nesterov <[email protected]> writes:
>
> > Eric W. Biederman wrote:
> >>
> >> + detach_pid(current, PIDTYPE_PID);
> >> + current->pid = leader->pid;
> >> + attach_pid(current, PIDTYPE_PID, current->pid);
> >
> > What happens after de_thread() unlocks tasklist_lock and before
> > it is taken again in release_task() ?
> >
> > In that window find_task_by_pid() will return dead leader, not
> > the new leader of thread group. This means we can miss tkill()
> > or ptrace(), for example.
>
> All I have done is enlarged the window where this
> race is possible. So for tkill I am not concerned,
> as it wants a particular thread. Nor am I concerned
> about anything else that wants a particular thread.

Yes, you are right, sorry for noise. We have exactly same situation
before de_thread() locks tasklist after killing the leader.

> The fact that the group_leader does not point
> at the actual thread group leader might be a problem,
> as I have opened a window where that is now the case.
>
> For signals that is not a problem as signals are still shared.
> This applies to most other resources as well.

Actually, now I think this patch fixes a small theoretical bug.
Currently we have a tiny window in switch_exec_pids() when it
detaches ->pid from PIDTYPE_PID namespace. RCU based kill_proc_info()
does not take tasklist, so we can miss a signal.

I have added Paul to the CC: list.

> So until we spot that case I'm ready to put this down
> of one of those cases in de_thread that looks wrong
> but happens to work. Now if there is a way to make
> it work more cleanly that may be worth looking at.

I think you are right.

Andrew, please drop this one:

dont-touch-current-tasks-in-de_thread.patch

Eric's patch includes this cleanup.

Oleg.

2006-02-03 13:00:29

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] pidhash: Kill switch_exec_pids

Oleg Nesterov <[email protected]> writes:

> "Eric W. Biederman" wrote:
>>
>>
>> All I have done is enlarged the window where this
>> race is possible. So for tkill I am not concerned,
>> as it wants a particular thread. Nor am I concerned
>> about anything else that wants a particular thread.
>
> Yes, you are right, sorry for noise. We have exactly same situation
> before de_thread() locks tasklist after killing the leader.

No problem. If de_thread was simple and obviously correct
we wouldn't be fixing it :)

>> The fact that the group_leader does not point
>> at the actual thread group leader might be a problem,
>> as I have opened a window where that is now the case.
>>
>> For signals that is not a problem as signals are still shared.
>> This applies to most other resources as well.
>
> Actually, now I think this patch fixes a small theoretical bug.
> Currently we have a tiny window in switch_exec_pids() when it
> detaches ->pid from PIDTYPE_PID namespace. RCU based kill_proc_info()
> does not take tasklist, so we can miss a signal.

Ok. I thought there was a RCU component to the readers. I just
lost track of where it was.

> I have added Paul to the CC: list.
>
>> So until we spot that case I'm ready to put this down
>> of one of those cases in de_thread that looks wrong
>> but happens to work. Now if there is a way to make
>> it work more cleanly that may be worth looking at.
>
> I think you are right.

I hope so.

> Andrew, please drop this one:
>
> dont-touch-current-tasks-in-de_thread.patch
>
> Eric's patch includes this cleanup.

I just tested this path against 2.6.16-rc1-mm5
and the patch applies with just a little fuzz.

Eric