2005-09-12 17:46:41

by Sripathi Kodi

[permalink] [raw]
Subject: [PATCH 2.6.13.1] Patch for invisible threads

Hi,

When the main thread of a multi-threaded program calls 'pthread_exit' before
other threads have exited, it results in the other threads becoming
'invisible' to commands like 'ps'. This problem was discussed here :
http://lkml.org/lkml/2004/10/5/234 and http://kerneltrap.org/node/3930, but
I can't find a patch or explanation for it anywhere. This problem is only
seen with NPTL and not with LinuxThreads, because Linuxthreads does not let
the main thread exit (puts it to sleep) until all other threads have exited.

The problem can be easily recreated with this simple program:
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>
void *run(void *arg)
{
sleep(60);
printf("Thread: Exiting\n");
pthread_exit(NULL);
}
int main()
{
pthread_t t;
pthread_create(&t, NULL, run, NULL);
sleep(20);
printf("Main: exiting\n");
pthread_exit(NULL);
}

After the main thread calls 'pthread_exit', it is shown to be defunct. We
can still see the directory /proc/<pid_of_main_thread>/task using 'ls',
'stat' on it returns success, but 'open' on that directory returns ENOENT.
Hence though the other thread is still running, it can't be seen.

The reason appears to be the call to __exit_fs from do_exit when the main
thread exits. This sets the 'fs' pointer in the task struct to NULL. It also
decrements the reference count on the fs structure, but does not release the
memory because the other thread still holds a reference (__put_fs_struct).
When we do open() on /proc/<pid>/task, proc_root_link() (flow is open_namei
- may_open - proc_permission - proc_check_root - proc_root_link) tries to
obtain the task_struct->fs of the main thread, which is now NULL. So it
returns ENOENT.

I think we can fix this problem by the following patch. We set the fs
pointer to NULL only if either the thread is not a thread group leader or if
the whole thread group has exited. If the main thread is the last to exit,
it will set the fs pointer to NULL. However, if it is not the last, it won't
set fs pointer to NULL so that other threads can still use it. Behavior of
__put_fs_struct is not affected.

Please let me know if this is reasonable or if there are other ways to fix
the problem.

Thanks and regards,
Sripathi.


Signed-off-by: Sripathi Kodi <[email protected]>

--- linux-2.6.13.1/kernel/exit.c 2005-09-12 02:46:26.000000000 -0500
+++ /home/sripathi/17794/patch_2.6.13.1/exit.c 2005-09-12 02:46:15.000000000
-0500
@@ -463,9 +463,11 @@ static inline void __exit_fs(struct task
struct fs_struct * fs = tsk->fs;

if (fs) {
- task_lock(tsk);
- tsk->fs = NULL;
- task_unlock(tsk);
+ if (!thread_group_leader(tsk) || !atomic_read(&tsk->signal->live)) {
+ task_lock(tsk);
+ tsk->fs = NULL;
+ task_unlock(tsk);
+ }
__put_fs_struct(fs);
}
}


2005-09-12 20:51:09

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

Sripathi Kodi <[email protected]> wrote:
>
> Hi,
>
> When the main thread of a multi-threaded program calls 'pthread_exit' before
> other threads have exited, it results in the other threads becoming
> 'invisible' to commands like 'ps'.

This stuff is subtle. Let me cc some subtle people.

> This problem was discussed here :
> http://lkml.org/lkml/2004/10/5/234 and http://kerneltrap.org/node/3930, but
> I can't find a patch or explanation for it anywhere. This problem is only
> seen with NPTL and not with LinuxThreads, because Linuxthreads does not let
> the main thread exit (puts it to sleep) until all other threads have exited.
>
> The problem can be easily recreated with this simple program:
> #include <pthread.h>
> #include <sys/types.h>
> #include <unistd.h>
> void *run(void *arg)
> {
> sleep(60);
> printf("Thread: Exiting\n");
> pthread_exit(NULL);
> }
> int main()
> {
> pthread_t t;
> pthread_create(&t, NULL, run, NULL);
> sleep(20);
> printf("Main: exiting\n");
> pthread_exit(NULL);
> }
>
> After the main thread calls 'pthread_exit', it is shown to be defunct. We
> can still see the directory /proc/<pid_of_main_thread>/task using 'ls',
> 'stat' on it returns success, but 'open' on that directory returns ENOENT.
> Hence though the other thread is still running, it can't be seen.
>
> The reason appears to be the call to __exit_fs from do_exit when the main
> thread exits. This sets the 'fs' pointer in the task struct to NULL. It also
> decrements the reference count on the fs structure, but does not release the
> memory because the other thread still holds a reference (__put_fs_struct).
> When we do open() on /proc/<pid>/task, proc_root_link() (flow is open_namei
> - may_open - proc_permission - proc_check_root - proc_root_link) tries to
> obtain the task_struct->fs of the main thread, which is now NULL. So it
> returns ENOENT.
>
> I think we can fix this problem by the following patch. We set the fs
> pointer to NULL only if either the thread is not a thread group leader or if
> the whole thread group has exited. If the main thread is the last to exit,
> it will set the fs pointer to NULL. However, if it is not the last, it won't
> set fs pointer to NULL so that other threads can still use it. Behavior of
> __put_fs_struct is not affected.
>
> Please let me know if this is reasonable or if there are other ways to fix
> the problem.
>
> Thanks and regards,
> Sripathi.
>
>
> Signed-off-by: Sripathi Kodi <[email protected]>
>
> --- linux-2.6.13.1/kernel/exit.c 2005-09-12 02:46:26.000000000 -0500
> +++ /home/sripathi/17794/patch_2.6.13.1/exit.c 2005-09-12 02:46:15.000000000
> -0500
> @@ -463,9 +463,11 @@ static inline void __exit_fs(struct task
> struct fs_struct * fs = tsk->fs;
>
> if (fs) {
> - task_lock(tsk);
> - tsk->fs = NULL;
> - task_unlock(tsk);
> + if (!thread_group_leader(tsk) || !atomic_read(&tsk->signal->live)) {
> + task_lock(tsk);
> + tsk->fs = NULL;
> + task_unlock(tsk);
> + }
> __put_fs_struct(fs);
> }
> }

A comment in there would be nice.

2005-09-13 13:12:29

by Sripathi Kodi

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

Andrew,

Andrew Morton wrote:
> Sripathi Kodi <[email protected]> wrote:
>
>>Hi,
>>
>>When the main thread of a multi-threaded program calls 'pthread_exit' before
>>other threads have exited, it results in the other threads becoming
>>'invisible' to commands like 'ps'.
>
>
> This stuff is subtle. Let me cc some subtle people.
>
>
>>Signed-off-by: Sripathi Kodi <[email protected]>
>>
>>--- linux-2.6.13.1/kernel/exit.c 2005-09-12 02:46:26.000000000 -0500
>>+++ /home/sripathi/17794/patch_2.6.13.1/exit.c 2005-09-12 02:46:15.000000000
>>-0500
>>@@ -463,9 +463,11 @@ static inline void __exit_fs(struct task
>> struct fs_struct * fs = tsk->fs;
>>
>> if (fs) {
>>- task_lock(tsk);
>>- tsk->fs = NULL;
>>- task_unlock(tsk);
>>+ if (!thread_group_leader(tsk) || !atomic_read(&tsk->signal->live)) {
>>+ task_lock(tsk);
>>+ tsk->fs = NULL;
>>+ task_unlock(tsk);
>>+ }
>> __put_fs_struct(fs);
>> }
>> }
>
>
> A comment in there would be nice.
>

Below is the patch with a comment.

Thanks and regards,
Sripathi.

Signed-off-by: Sripathi Kodi <[email protected]>

--- linux-2.6.13.1/kernel/exit.c 2005-09-13 15:39:48.738542872 -0500
+++ /home/sripathi/17794/patch_2.6.13.1/exit.c 2005-09-13 15:39:27.367791720
-0500
@@ -463,9 +463,13 @@ static inline void __exit_fs(struct task
struct fs_struct * fs = tsk->fs;

if (fs) {
- task_lock(tsk);
- tsk->fs = NULL;
- task_unlock(tsk);
+ /* If tsk is thread group leader and if group still has alive
+ * threads, those threads may use tsk->fs */
+ if (!thread_group_leader(tsk) || !atomic_read(&tsk->signal->live)) {
+ task_lock(tsk);
+ tsk->fs = NULL;
+ task_unlock(tsk);
+ }
__put_fs_struct(fs);
}
}

2005-09-13 14:54:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads



On Tue, 13 Sep 2005, Sripathi Kodi wrote:
>
> Thanks and regards,
> Sripathi.
>
> Signed-off-by: Sripathi Kodi <[email protected]>
>
> --- linux-2.6.13.1/kernel/exit.c 2005-09-13 15:39:48.738542872 -0500
> +++ /home/sripathi/17794/patch_2.6.13.1/exit.c 2005-09-13 15:39:27.367791720
> -0500
> @@ -463,9 +463,13 @@ static inline void __exit_fs(struct task
> struct fs_struct * fs = tsk->fs;
>
> if (fs) {
> - task_lock(tsk);
> - tsk->fs = NULL;
> - task_unlock(tsk);
> + /* If tsk is thread group leader and if group still has alive
> + * threads, those threads may use tsk->fs */
> + if (!thread_group_leader(tsk) || !atomic_read(&tsk->signal->live)) {
> + task_lock(tsk);
> + tsk->fs = NULL;
> + task_unlock(tsk);
> + }
> __put_fs_struct(fs);
> }
> }

This really is wrong. You "put" the fs without clearing it in that thread,
which means that now the reference counts no longer match the number of
pointers to it. This will inevitably result in using stale fs pointers in
/proc at some point. Not good. In fact, I almost guarantee that you can do
that by just having a thread group which doesn't share it's file
descriptors (which is possible, even though no _nice_ program does it.
Think DoS/security attack).

The sub-threads have a "->fs" of their own, and they'll happily continue
to use their own versions.

So this patch is _wrong_.

I think the problem is "proc_check_root()", which just refuses to do a lot
of things without a fs. Many of those things are unnecessary, afaik - we
should allow it. But allowing it means that some other paths may need more
checking..

So you can _try_ to just make proc_check_root() return 0 when
proc_root_link() returns an error...

Linus

2005-09-13 16:51:13

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

On Tue, Sep 13, 2005 at 07:53:40AM -0700, Linus Torvalds wrote:

> So this patch is _wrong_.

Definitely.

> I think the problem is "proc_check_root()", which just refuses to do a lot
> of things without a fs. Many of those things are unnecessary, afaik - we
> should allow it. But allowing it means that some other paths may need more
> checking..
>
> So you can _try_ to just make proc_check_root() return 0 when
> proc_root_link() returns an error...

I very much doubt the correctness of that.

The real problem here is obvious: it's about permissions on /proc/<pid>/task.
That's where the things go wrong - we use proc_permission() for it and we
have group leader as associated task.

Note that stuff _in_ proc/<pid>/task will keep working just fine, if we
manage to get to it - there we have other threads as associated tasks,
so everything works as it should.

What we need is to decide what kind of access control do we really want on
/proc/<pid>/task. That's it.

2005-09-13 17:02:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads



On Tue, 13 Sep 2005, Al Viro wrote:
>
> What we need is to decide what kind of access control do we really want on
> /proc/<pid>/task. That's it.

I don't think any controls at all. The real control should then be on the
/proc/<pid>/task/<tid> access, which should be the same as the /proc/<pid>
controls (except for thread <tid> rather than thread <pid>, of course)

Linus

2005-09-13 17:12:22

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

On Tue, Sep 13, 2005 at 10:01:58AM -0700, Linus Torvalds wrote:
>
>
> On Tue, 13 Sep 2005, Al Viro wrote:
> >
> > What we need is to decide what kind of access control do we really want on
> > /proc/<pid>/task. That's it.
>
> I don't think any controls at all. The real control should then be on the
> /proc/<pid>/task/<tid> access, which should be the same as the /proc/<pid>
> controls (except for thread <tid> rather than thread <pid>, of course)

Well... If exposing the list of tasks in a group is OK, we can just leave
->permission NULL for that sucker. If it's not (and arguably it can be
sensitive information), we have a bigger problem - right now chroot boundary
is the only control we have there; normally anyone can ls /proc/<whatever>/task
and see other threads.

2005-09-13 21:37:31

by Sripathi Kodi

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

Al Viro wrote:
>
> Well... If exposing the list of tasks in a group is OK, we can just leave
> ->permission NULL for that sucker. If it's not (and arguably it can be
> sensitive information), we have a bigger problem - right now chroot boundary
> is the only control we have there; normally anyone can ls /proc/<whatever>/task
> and see other threads.
>

Al, I understand that we can't set ->permission to NULL as it removes the
chroot boundary check. If I understood you correctly, we need to put
additional checks in proc_permission to ensure anyone doing ls
/proc/<pid>/task won't be able to see other threads.

Coming back to the problem of being able to see the threads of a process
whose main thread has done pthread_exit, it appears to me that the only
hurdle is getting the ->fs pointer from task struct. Since all threads of
the process point to the same fs structure, would it be okay if we try to
get it from some other thread? Below is the patch I tried for this:


Signed-off-by: Sripathi Kodi <[email protected]>

--- linux-2.6.13.1-orig/fs/proc/base.c 2005-09-13 23:53:06.000000000 -0500
+++ linux-2.6.13.1/fs/proc/base.c 2005-09-13 23:51:36.000000000 -0500
@@ -273,13 +273,25 @@ static int proc_cwd_link(struct inode *i

static int proc_root_link(struct inode *inode, struct dentry **dentry,
struct vfsmount **mnt)
{
- struct fs_struct *fs;
+ struct fs_struct *fs = NULL;
int result = -ENOENT;
- task_lock(proc_task(inode));
- fs = proc_task(inode)->fs;
- if(fs)
- atomic_inc(&fs->count);
- task_unlock(proc_task(inode));
+ struct task_struct *task = proc_task(inode);
+ struct task_struct *leader_task = task;
+
+ read_lock(&tasklist_lock);
+ if (pid_alive(task)) {
+ do {
+ task_lock(task);
+ if ((fs = task->fs)) {
+ atomic_inc(&fs->count);
+ task_unlock(task);
+ break;
+ }
+ task_unlock(task);
+ } while ((task = next_thread(task)) != leader_task);
+ }
+ read_unlock(&tasklist_lock);
+
if (fs) {
read_lock(&fs->lock);
*mnt = mntget(fs->rootmnt);

2005-09-13 21:56:49

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

It might suffice to move __exit_fs to the top of release_task. (Perhaps
that requires moving exit_namespace after it, I don't know if it matters.)
Then a zombie group leader won't release and clear ->fs before the whole
group dies.


Thanks,
Roland

2005-09-13 21:57:13

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

On Tue, Sep 13, 2005 at 04:30:43PM -0500, Sripathi Kodi wrote:
> Al Viro wrote:
> >
> >Well... If exposing the list of tasks in a group is OK, we can just leave
> >->permission NULL for that sucker. If it's not (and arguably it can be
> >sensitive information), we have a bigger problem - right now chroot
> >boundary
> >is the only control we have there; normally anyone can ls
> >/proc/<whatever>/task
> >and see other threads.
> >
>
> Al, I understand that we can't set ->permission to NULL as it removes the
> chroot boundary check. If I understood you correctly, we need to put
> additional checks in proc_permission to ensure anyone doing ls
> /proc/<pid>/task won't be able to see other threads.

Wrong. We need a separate function, _not_ modifying proc_permssion().
If we need ->permission() at all, that is - note that anyone can do
ls /proc/<pid>/task on other users' process.

2005-09-13 23:10:48

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads



On Tue, 13 Sep 2005, Sripathi Kodi wrote:
>
> Coming back to the problem of being able to see the threads of a process
> whose main thread has done pthread_exit, it appears to me that the only
> hurdle is getting the ->fs pointer from task struct. Since all threads of
> the process point to the same fs structure, would it be okay if we try to
> get it from some other thread? Below is the patch I tried for this:

I don't think this is wrong per se, but you shouldn't take the tasklist
lock normally. You're better off just doing

static struct fs_struct *get_fs(struct task_struct *tsk)
{
struct fs_struct *fs;

task_lock(tsk);
fs = task->fs;
if (fs) {
atomic_inc(&fs->count);
task_unlock(tsk);
return fs;
}
task_unlock(tsk);

read_lock(&tasklist_lock);
if (pid_alive(tsk)) {
struct task_struct *tmp = tsk;
while ((tmp = next_thread(tmp)) != tsk) {
task_lock(tmp);
fs = tmp->fs;
if (fs) {
atomic_inc(fs->count)
task_unlock(tmp);
break;
}
task_unlock(tmp);
}
}
read_unlock(&tasklist_lock);
return fs;
}

or something like that. You get the idea (totally untested, use at your
own risk, if your hair falls off and you get boils, it wasn't my fault).

Linus

2005-09-14 01:47:37

by Sripathi Kodi

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

Linus Torvalds wrote:

> I don't think this is wrong per se, but you shouldn't take the tasklist
> lock normally. You're better off just doing

Linus,

I incarporated the path that doesn't hold tasklist lock unnecessarily. The
patch is below. This seems to work without any problems for me.

If the decision is to remove ->permission, I can send a small patch I have
that removes .permission entry from proc_task_inode_operations. Either way
fixes the problem I found.

Thanks and regards,
Sripathi.

Signed-off-by: Sripathi Kodi <[email protected]>

--- linux-2.6.13.1-orig/fs/proc/base.c 2005-09-14 03:46:22.000000000 -0500
+++ linux-2.6.13.1/fs/proc/base.c 2005-09-14 03:48:35.000000000 -0500
@@ -275,11 +275,33 @@ static int proc_root_link(struct inode *
{
struct fs_struct *fs;
int result = -ENOENT;
- task_lock(proc_task(inode));
- fs = proc_task(inode)->fs;
- if(fs)
+ struct task_struct *leader = proc_task(inode);
+
+ task_lock(leader);
+ fs = leader->fs;
+ if (fs) {
atomic_inc(&fs->count);
- task_unlock(proc_task(inode));
+ task_unlock(leader);
+ } else {
+ /* Try to get fs from sub-threads */
+ task_unlock(leader);
+ struct task_struct *task = leader;
+ read_lock(&tasklist_lock);
+ if (pid_alive(task)) {
+ while ((task = next_thread(task)) != leader) {
+ task_lock(task);
+ fs = task->fs;
+ if (fs) {
+ atomic_inc(&fs->count);
+ task_unlock(task);
+ break;
+ }
+ task_unlock(task);
+ }
+ }
+ read_unlock(&tasklist_lock);
+ }
+
if (fs) {
read_lock(&fs->lock);
*mnt = mntget(fs->rootmnt);

2005-09-14 01:50:14

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

On Tue, Sep 13, 2005 at 04:10:21PM -0700, Linus Torvalds wrote:
>
> > get it from some other thread? Below is the patch I tried for this:

> I don't think this is wrong per se, but you shouldn't take the tasklist
> lock normally. You're better off just doing

Could you exlain why we might want to bother with that in the first place?
In any case, why would we want to put that stuff on the common codepath
instead of specialized ->permission()?

2005-09-14 01:53:00

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

On Tue, Sep 13, 2005 at 08:47:02PM -0500, Sripathi Kodi wrote:
> Linus Torvalds wrote:
>
> >I don't think this is wrong per se, but you shouldn't take the tasklist
> >lock normally. You're better off just doing
>
> Linus,
>
> I incarporated the path that doesn't hold tasklist lock unnecessarily. The
> patch is below. This seems to work without any problems for me.
>
> If the decision is to remove ->permission, I can send a small patch I have
> that removes .permission entry from proc_task_inode_operations. Either way
> fixes the problem I found.

NAK. Even if equivalent of that kludge might be tolerable for our purposes,
it does _not_ belong on any of the normal codepaths.

2005-09-14 14:31:39

by Bill Davidsen

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

Sripathi Kodi wrote:
> Linus Torvalds wrote:
>
>> I don't think this is wrong per se, but you shouldn't take the
>> tasklist lock normally. You're better off just doing
>
>
> Linus,
>
> I incarporated the path that doesn't hold tasklist lock unnecessarily.
> The patch is below. This seems to work without any problems for me.
>
> If the decision is to remove ->permission, I can send a small patch I
> have that removes .permission entry from proc_task_inode_operations.
> Either way fixes the problem I found.

Let me say that this solution, and any other which loops through all
threads of a task, isn't going to scale well. I don't have a magic O(1)
solution, if it were easy someone would have done that instead of the
while loop, just noting that a clever solution would be a win on servers.

>
> Thanks and regards,
> Sripathi.
>
> Signed-off-by: Sripathi Kodi <[email protected]>
>
> --- linux-2.6.13.1-orig/fs/proc/base.c 2005-09-14 03:46:22.000000000
> -0500
> +++ linux-2.6.13.1/fs/proc/base.c 2005-09-14 03:48:35.000000000 -0500
> @@ -275,11 +275,33 @@ static int proc_root_link(struct inode *
> {
> struct fs_struct *fs;
> int result = -ENOENT;
> - task_lock(proc_task(inode));
> - fs = proc_task(inode)->fs;
> - if(fs)
> + struct task_struct *leader = proc_task(inode);
> +
> + task_lock(leader);
> + fs = leader->fs;
> + if (fs) {
> atomic_inc(&fs->count);
> - task_unlock(proc_task(inode));
> + task_unlock(leader);
> + } else {
> + /* Try to get fs from sub-threads */
> + task_unlock(leader);
> + struct task_struct *task = leader;
> + read_lock(&tasklist_lock);
> + if (pid_alive(task)) {
> + while ((task = next_thread(task)) != leader) {
> + task_lock(task);
> + fs = task->fs;
> + if (fs) {
> + atomic_inc(&fs->count);
> + task_unlock(task);
> + break;
> + }
> + task_unlock(task);
> + }
> + }
> + read_unlock(&tasklist_lock);
> + }
> +
> if (fs) {
> read_lock(&fs->lock);
> *mnt = mntget(fs->rootmnt);
>

2005-09-15 00:30:24

by Sripathi Kodi

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

Bill Davidsen wrote:
>
> Let me say that this solution, and any other which loops through all
> threads of a task, isn't going to scale well. I don't have a magic O(1)
> solution, if it were easy someone would have done that instead of the
> while loop, just noting that a clever solution would be a win on servers.

Bill,

We will only have to go through the while loop in rare cases where main
thread has done pthread_exit() before other threads (and hence it's task->fs
is null). Also, even in most such cases, the very first iteration through
the while loop will get us the 'fs' pointer, so we won't have to loop
through all threads. So I think this won't have scalability problem. Am I right?

Thanks,
Sripathi.

2005-09-15 00:31:16

by Sripathi Kodi

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

Al Viro wrote:
> On Tue, Sep 13, 2005 at 04:10:21PM -0700, Linus Torvalds wrote:
>
>>I don't think this is wrong per se, but you shouldn't take the tasklist
>>lock normally. You're better off just doing
>
>
> Could you exlain why we might want to bother with that in the first place?
> In any case, why would we want to put that stuff on the common codepath
> instead of specialized ->permission()?
>
Al,

I can move this code from proc_root_link() to proc_check_root(), but it will
still not be completely limited to ->permission() path. I can create a
separate ->permission() for proc_task_inode_operations, and have this
additional code there. If I do that, I think I will have to duplicate much
of proc_check_root(). Or else, I will have to split proc_check_root() into
two functions to prevent code duplication. Please let me know if any of
these makes sense, and I will send another patch.

If you don't like this idea at all, please let me know if there any other
way of solving the invisible threads problem, short of taking out
->permission() altogether from proc_task_inode_operations.

Thanks,
Sripathi.

2005-09-15 00:55:57

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

> If you don't like this idea at all, please let me know if there any other
> way of solving the invisible threads problem, short of taking out
> ->permission() altogether from proc_task_inode_operations.

Have you investigated my suggestion to move __exit_fs from do_exit to
release_task?


Thanks,
Roland

2005-09-15 01:18:59

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

On Wed, Sep 14, 2005 at 07:31:12PM -0500, Sripathi Kodi wrote:
> I can move this code from proc_root_link() to proc_check_root(), but it
> will still not be completely limited to ->permission() path. I can create a
> separate ->permission() for proc_task_inode_operations, and have this
> additional code there. If I do that, I think I will have to duplicate much
> of proc_check_root(). Or else, I will have to split proc_check_root() into
> two functions to prevent code duplication. Please let me know if any of
> these makes sense, and I will send another patch.

The last variant would be preferable if we go in that direction...

> If you don't like this idea at all, please let me know if there any other
> way of solving the invisible threads problem, short of taking out
> ->permission() altogether from proc_task_inode_operations.

Frankly, I don't see the rationale for combination of
* allowing anyone see all processes in top-level directory and
visit their directories, chroot or not
* allowing anyone see /proc/<pid>/task/*, unless separated by
chroot (note that we allow that regardless of process ownership, etc.)
* disallowing to see /proc/<pid>/task/* if leader is or used to be
outside of our chroot.

IOW, it's either too weak or too strong; current rules make very little
sense, regardless of the behaviour when group leader dies.

2005-09-15 01:39:06

by Sripathi Kodi

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

Roland McGrath wrote:
>>If you don't like this idea at all, please let me know if there any other
>>way of solving the invisible threads problem, short of taking out
>>->permission() altogether from proc_task_inode_operations.
>
>
> Have you investigated my suggestion to move __exit_fs from do_exit to
> release_task?

Roland,

No, I had missed this completely. Sorry.

I just gave it a quick try and it seems to be working fine. I have only
moved __exit_fs to the top of release_task, not moved exit_namespace after
it. I will try to run some tests to see if this is working fine. Thanks a lot.

Thanks and regards,
Sripathi.

2005-09-15 02:12:49

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

On Wed, Sep 14, 2005 at 08:38:48PM -0500, Sripathi Kodi wrote:
> Roland McGrath wrote:
> >>If you don't like this idea at all, please let me know if there any other
> >>way of solving the invisible threads problem, short of taking out
> >>->permission() altogether from proc_task_inode_operations.
> >
> >
> >Have you investigated my suggestion to move __exit_fs from do_exit to
> >release_task?
>
> Roland,
>
> No, I had missed this completely. Sorry.
>
> I just gave it a quick try and it seems to be working fine. I have only
> moved __exit_fs to the top of release_task, not moved exit_namespace after
> it. I will try to run some tests to see if this is working fine. Thanks a
> lot.

Among other things, it means that zombies keep pinning their root and cwd
down, AFAICS. Not Good(tm).

2005-09-15 07:29:33

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

> Among other things, it means that zombies keep pinning their root and cwd
> down, AFAICS. Not Good(tm).

Good point. The technique that looks for the first nonzombie in the group
to fetch its fs might be best if it's necessary to use fs for the
permission checks. It should always find a winner on just one
iteration, or two or three in races, except for ptrace keeping zombie
threads alive (otherwise all zombies but the leader self-reap quickly).


Thanks,
Roland

2005-09-16 00:55:38

by Sripathi Kodi

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

Al Viro wrote:
> On Wed, Sep 14, 2005 at 07:31:12PM -0500, Sripathi Kodi wrote:
>
>>I can move this code from proc_root_link() to proc_check_root(), but it
>>will still not be completely limited to ->permission() path. I can create a
>>separate ->permission() for proc_task_inode_operations, and have this
>>additional code there. If I do that, I think I will have to duplicate much
>>of proc_check_root(). Or else, I will have to split proc_check_root() into
>>two functions to prevent code duplication. Please let me know if any of
>>these makes sense, and I will send another patch.
>
>
> The last variant would be preferable if we go in that direction...

Al,

I have done the following changes in this new patch:
1) Changed proc_permission to call new function proc_task_check_root instead
of proc_check_root. Other paths that call proc_check_root are not affected.
2) proc_task_check_root calls a new function proc_task_root_link to get fs.
This is same as proc_root_link, but additionally tries to get fs from one of
the other threads of the group when the main thread doesn't have it.
3) Moved the code that checks chroot boundary from proc_root_link to a new
function proc_check_chroot. Both proc_check_root and proc_task_check_root
call this function. This avoids duplication of this code.

With these changes, only ->permission path will encounter the new kludge code.

proc_root_link and proc_task_root_link still have some duplicated code. I
could have split these functions further to avoid duplication completely,
but that would move incrementing and decrementing fs->lock to two different
functions, which I think will be confusing.

The other way of implementing this that I could think of was to have a flag
to indicate that the call is from ->permission path and pass it all along.
This will avoid having to change many existing functions, but it will defeat
the purpose of limiting this kludge code to ->permission path.

Please let me know how it is looking now.

Further, about actual permission checks that we are doing, can we say: "A
process should be able to see /proc/<pid>/task/* of another process only if
they both belong to same uid or reader is root"? But any such change will
change the behavior of commands like 'ps', right?

Thanks,
Sripathi.


Signed-off-by: Sripathi Kodi <[email protected]>

--- linux-2.6.13.1-orig/fs/proc/base.c 2005-09-09 21:42:58.000000000 -0500
+++ linux-2.6.13.1/fs/proc/base.c 2005-09-16 00:57:31.000000000 -0500
@@ -291,6 +291,52 @@ static int proc_root_link(struct inode *
return result;
}

+
+/* Same as proc_root_link, but this addionally tries to get fs from other
+ * threads in the group */
+static int proc_task_root_link(struct inode *inode, struct dentry **dentry,
struct vfsmount **mnt)
+{
+ struct fs_struct *fs;
+ int result = -ENOENT;
+ struct task_struct *leader = proc_task(inode);
+
+ task_lock(leader);
+ fs = leader->fs;
+ if (fs) {
+ atomic_inc(&fs->count);
+ task_unlock(leader);
+ } else {
+ /* Try to get fs from other threads */
+ task_unlock(leader);
+ struct task_struct *task = leader;
+ read_lock(&tasklist_lock);
+ if (pid_alive(task)) {
+ while ((task = next_thread(task)) != leader) {
+ task_lock(task);
+ fs = task->fs;
+ if (fs) {
+ atomic_inc(&fs->count);
+ task_unlock(task);
+ break;
+ }
+ task_unlock(task);
+ }
+ }
+ read_unlock(&tasklist_lock);
+ }
+
+ if (fs) {
+ read_lock(&fs->lock);
+ *mnt = mntget(fs->rootmnt);
+ *dentry = dget(fs->root);
+ read_unlock(&fs->lock);
+ result = 0;
+ put_fs_struct(fs);
+ }
+ return result;
+}
+
+
#define MAY_PTRACE(task) \
(task == current || \
(task->parent == current && \
@@ -449,14 +495,14 @@ static int proc_oom_score(struct task_st

/* permission checks */

-static int proc_check_root(struct inode *inode)
+/* If the process being read is separated by chroot from the reading process,
+ * don't let the reader access the threads.
+ */
+static int proc_check_chroot(struct dentry *root, struct vfsmount *vfsmnt)
{
- struct dentry *de, *base, *root;
- struct vfsmount *our_vfsmnt, *vfsmnt, *mnt;
+ struct dentry *de, *base;
+ struct vfsmount *our_vfsmnt, *mnt;
int res = 0;
-
- if (proc_root_link(inode, &root, &vfsmnt)) /* Ewww... */
- return -ENOENT;
read_lock(&current->fs->lock);
our_vfsmnt = mntget(current->fs->rootmnt);
base = dget(current->fs->root);
@@ -489,11 +535,33 @@ out:
goto exit;
}

+static int proc_check_root(struct inode *inode)
+{
+ struct dentry *root;
+ struct vfsmount *vfsmnt;
+
+ if (proc_root_link(inode, &root, &vfsmnt)) /* Ewww... */
+ return -ENOENT;
+
+ return proc_check_chroot(root, vfsmnt);
+}
+
+static int proc_task_check_root(struct inode *inode)
+{
+ struct dentry *root;
+ struct vfsmount *vfsmnt;
+
+ if (proc_task_root_link(inode, &root, &vfsmnt)) /* Ewww... */
+ return -ENOENT;
+
+ return proc_check_chroot(root, vfsmnt);
+}
+
static int proc_permission(struct inode *inode, int mask, struct nameidata
*nd)
{
if (generic_permission(inode, mask, NULL) != 0)
return -EACCES;
- return proc_check_root(inode);
+ return proc_task_check_root(inode);
}

extern struct seq_operations proc_pid_maps_op;

2005-09-16 07:46:18

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

On Thu, Sep 15, 2005 at 07:54:56PM -0500, Sripathi Kodi wrote:
> proc_root_link and proc_task_root_link still have some duplicated code. I
> could have split these functions further to avoid duplication completely,
> but that would move incrementing and decrementing fs->lock to two different
> functions, which I think will be confusing.
>
> The other way of implementing this that I could think of was to have a flag
> to indicate that the call is from ->permission path and pass it all along.
> This will avoid having to change many existing functions, but it will
> defeat the purpose of limiting this kludge code to ->permission path.
>
> Please let me know how it is looking now.

Ugh... Considering that all of that is _only_ for /proc/<pid>/task and
that proc_permission() is a couple of function calls, why bother with
proc_task_check_root() instead of just adding proc_task_permission() with

{
struct dentry *root;
struct vfsmount *vfsmnt;

if (generic_permission(inode, mask, NULL) != 0)
return -EACCES;

/* or just open-code it here, for that matter */
if (proc_task_root_link(inode, &root, &vfsmnt))
return -ENOENT;

return proc_check_chroot(root, vfsmnt);
}

for a body and leaving proc_permission() without any changes at all?

> Further, about actual permission checks that we are doing, can we say: "A
> process should be able to see /proc/<pid>/task/* of another process only if
> they both belong to same uid or reader is root"? But any such change will
> change the behavior of commands like 'ps', right?

Right. The real question is whether the current behaviour makes any sense.
I've no objections to your patch + modification above, but I really wonder
if we should keep current rules in that area.

2005-09-16 15:08:31

by Sripathi Kodi

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

Al Viro wrote:
> Ugh... Considering that all of that is _only_ for /proc/<pid>/task and
> that proc_permission() is a couple of function calls, why bother with
> proc_task_check_root() instead of just adding proc_task_permission() with
>
> {
> struct dentry *root;
> struct vfsmount *vfsmnt;
>
> if (generic_permission(inode, mask, NULL) != 0)
> return -EACCES;
>
> /* or just open-code it here, for that matter */
> if (proc_task_root_link(inode, &root, &vfsmnt))
> return -ENOENT;
>
> return proc_check_chroot(root, vfsmnt);
> }
>
> for a body and leaving proc_permission() without any changes at all?

Al,
Done. Please find the patch below. I retained proc_task_root_link, because
it has significant amount of code in it.

> Right. The real question is whether the current behaviour makes any sense.
> I've no objections to your patch + modification above, but I really wonder
> if we should keep current rules in that area.

I don't know what would be the right behavior for this area. If you have any
ideas for changes we could introduce here, I am ready to code and test it out.

Thanks and regards,
Sripathi.


Signed-off-by: Sripathi Kodi <[email protected]>

--- linux-2.6.13.1-orig/fs/proc/base.c 2005-09-16 17:22:44.000000000 -0500
+++ linux-2.6.13.1/fs/proc/base.c 2005-09-16 17:08:18.000000000 -0500
@@ -291,6 +291,52 @@ static int proc_root_link(struct inode *
return result;
}

+
+/* Same as proc_root_link, but this addionally tries to get fs from other
+ * threads in the group */
+static int proc_task_root_link(struct inode *inode, struct dentry **dentry,
struct vfsmount **mnt)
+{
+ struct fs_struct *fs;
+ int result = -ENOENT;
+ struct task_struct *leader = proc_task(inode);
+
+ task_lock(leader);
+ fs = leader->fs;
+ if (fs) {
+ atomic_inc(&fs->count);
+ task_unlock(leader);
+ } else {
+ /* Try to get fs from other threads */
+ task_unlock(leader);
+ struct task_struct *task = leader;
+ read_lock(&tasklist_lock);
+ if (pid_alive(task)) {
+ while ((task = next_thread(task)) != leader) {
+ task_lock(task);
+ fs = task->fs;
+ if (fs) {
+ atomic_inc(&fs->count);
+ task_unlock(task);
+ break;
+ }
+ task_unlock(task);
+ }
+ }
+ read_unlock(&tasklist_lock);
+ }
+
+ if (fs) {
+ read_lock(&fs->lock);
+ *mnt = mntget(fs->rootmnt);
+ *dentry = dget(fs->root);
+ read_unlock(&fs->lock);
+ result = 0;
+ put_fs_struct(fs);
+ }
+ return result;
+}
+
+
#define MAY_PTRACE(task) \
(task == current || \
(task->parent == current && \
@@ -449,14 +495,14 @@ static int proc_oom_score(struct task_st

/* permission checks */

-static int proc_check_root(struct inode *inode)
+/* If the process being read is separated by chroot from the reading process,
+ * don't let the reader access the threads.
+ */
+static int proc_check_chroot(struct dentry *root, struct vfsmount *vfsmnt)
{
- struct dentry *de, *base, *root;
- struct vfsmount *our_vfsmnt, *vfsmnt, *mnt;
+ struct dentry *de, *base;
+ struct vfsmount *our_vfsmnt, *mnt;
int res = 0;
-
- if (proc_root_link(inode, &root, &vfsmnt)) /* Ewww... */
- return -ENOENT;
read_lock(&current->fs->lock);
our_vfsmnt = mntget(current->fs->rootmnt);
base = dget(current->fs->root);
@@ -489,6 +535,16 @@ out:
goto exit;
}

+static int proc_check_root(struct inode *inode)
+{
+ struct dentry *root;
+ struct vfsmount *vfsmnt;
+
+ if (proc_root_link(inode, &root, &vfsmnt)) /* Ewww... */
+ return -ENOENT;
+ return proc_check_chroot(root, vfsmnt);
+}
+
static int proc_permission(struct inode *inode, int mask, struct nameidata
*nd)
{
if (generic_permission(inode, mask, NULL) != 0)
@@ -496,6 +552,20 @@ static int proc_permission(struct inode
return proc_check_root(inode);
}

+static int proc_task_permission(struct inode *inode, int mask, struct
nameidata *nd)
+{
+ struct dentry *root;
+ struct vfsmount *vfsmnt;
+
+ if (generic_permission(inode, mask, NULL) != 0)
+ return -EACCES;
+
+ if (proc_task_root_link(inode, &root, &vfsmnt))
+ return -ENOENT;
+
+ return proc_check_chroot(root, vfsmnt);
+}
+
extern struct seq_operations proc_pid_maps_op;
static int maps_open(struct inode *inode, struct file *file)
{
@@ -1355,7 +1425,7 @@ static struct inode_operations proc_fd_i

static struct inode_operations proc_task_inode_operations = {
.lookup = proc_task_lookup,
- .permission = proc_permission,
+ .permission = proc_task_permission,
};

#ifdef CONFIG_SECURITY

2005-09-16 18:05:46

by Daniel Jacobowitz

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

On Fri, Sep 16, 2005 at 08:46:06AM +0100, Al Viro wrote:
> > Further, about actual permission checks that we are doing, can we say: "A
> > process should be able to see /proc/<pid>/task/* of another process only if
> > they both belong to same uid or reader is root"? But any such change will
> > change the behavior of commands like 'ps', right?
>
> Right. The real question is whether the current behaviour makes any sense.
> I've no objections to your patch + modification above, but I really wonder
> if we should keep current rules in that area.

Why should there be any more restrictions on /proc/<pid>/task than
there are in /proc? Threads are not listed in the latter, but that's
strictly for performance/usability; you can enumerate threads in /proc
by just trying all the valid PIDs.

--
Daniel Jacobowitz
CodeSourcery, LLC

2005-09-16 18:15:06

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 2.6.13.1] Patch for invisible threads

On Fri, Sep 16, 2005 at 02:05:35PM -0400, Daniel Jacobowitz wrote:
> On Fri, Sep 16, 2005 at 08:46:06AM +0100, Al Viro wrote:
> > > Further, about actual permission checks that we are doing, can we say: "A
> > > process should be able to see /proc/<pid>/task/* of another process only if
> > > they both belong to same uid or reader is root"? But any such change will
> > > change the behavior of commands like 'ps', right?
> >
> > Right. The real question is whether the current behaviour makes any sense.
> > I've no objections to your patch + modification above, but I really wonder
> > if we should keep current rules in that area.
>
> Why should there be any more restrictions on /proc/<pid>/task than
> there are in /proc? Threads are not listed in the latter, but that's
> strictly for performance/usability; you can enumerate threads in /proc
> by just trying all the valid PIDs.

But we *do* see processes outside of chroot jail in /proc. That's the
point - we have seriously inconsistent rules here.