2008-01-17 08:29:36

by Vinay Sridhar

[permalink] [raw]
Subject: [RFC] Per-thread getrusage

Hi All,

Last year, there was discussion about per-thread getrusage by adding
RUSAGE_THREAD flag to getrusage(). Please refer to the thread
http://lkml.org/lkml/2007/4/4/308. Ulrich had suggested that we should
design a better user-space API. Specifically, we need a
pthread_getrusage interface in the thread library, which accepts
pthread_t, converts pthread_t into the corresponding tid and passes it
down to the syscall.

There are two ways to implement this in the kernel:
1) Introduce an additional parameter 'tid' to sys_getrusage() and put
code in glibc to handle getrusage() and pthread_getrusage() calls
correctly.
2) Introduce a new system call to handle pthread_getrusage() and leave
sys_getrusage() untouched.

We implemented the second idea above, simply because it avoids touching
any existing code. We have implemented a new syscall, thread_getrusage()
and we have exposed pthread_getrusage() API to applications.

Could you please share your thoughts on this? Does the approach look
alright? The code is hardly complete. It is just a prototype that works
on IA32 at the moment.

kernel patch :

signed-off by : Vinay Sridhar <[email protected]>
signed-off by : Sripathi Kodi <[email protected]>

diff -Nuarp linux-2.6.24-rc6_org/arch/x86/ia32/ia32entry.S linux-2.6.24-rc6/arch/x86/ia32/ia32entry.S
--- linux-2.6.24-rc6_org/arch/x86/ia32/ia32entry.S 2008-01-10 17:16:05.000000000 +0530
+++ linux-2.6.24-rc6/arch/x86/ia32/ia32entry.S 2008-01-14 15:54:54.000000000 +0530
@@ -726,4 +726,5 @@ ia32_sys_call_table:
.quad compat_sys_timerfd
.quad sys_eventfd
.quad sys32_fallocate
+ .quad sys_thread_getrusage /* 325 */
ia32_syscall_end:
diff -Nuarp linux-2.6.24-rc6_org/arch/x86/kernel/syscall_table_32.S linux-2.6.24-rc6/arch/x86/kernel/syscall_table_32.S
--- linux-2.6.24-rc6_org/arch/x86/kernel/syscall_table_32.S 2008-01-10 17:16:05.000000000 +0530
+++ linux-2.6.24-rc6/arch/x86/kernel/syscall_table_32.S 2008-01-14 15:54:17.000000000 +0530
@@ -324,3 +324,5 @@ ENTRY(sys_call_table)
.long sys_timerfd
.long sys_eventfd
.long sys_fallocate
+ .long sys_thread_getrusage /* 325 */
+
diff -Nuarp linux-2.6.24-rc6_org/include/asm-x86/unistd_32.h linux-2.6.24-rc6/include/asm-x86/unistd_32.h
--- linux-2.6.24-rc6_org/include/asm-x86/unistd_32.h 2008-01-10 17:16:13.000000000 +0530
+++ linux-2.6.24-rc6/include/asm-x86/unistd_32.h 2008-01-14 15:58:35.000000000 +0530
@@ -330,10 +330,11 @@
#define __NR_timerfd 322
#define __NR_eventfd 323
#define __NR_fallocate 324
+#define __NR_thread_getrusage 325

#ifdef __KERNEL__

-#define NR_syscalls 325
+#define NR_syscalls 326

#define __ARCH_WANT_IPC_PARSE_VERSION
#define __ARCH_WANT_OLD_READDIR
diff -Nuarp linux-2.6.24-rc6_org/include/linux/syscalls.h linux-2.6.24-rc6/include/linux/syscalls.h
--- linux-2.6.24-rc6_org/include/linux/syscalls.h 2008-01-10 17:16:15.000000000 +0530
+++ linux-2.6.24-rc6/include/linux/syscalls.h 2008-01-14 15:59:12.000000000 +0530
@@ -611,7 +611,7 @@ asmlinkage long sys_timerfd(int ufd, int
const struct itimerspec __user *utmr);
asmlinkage long sys_eventfd(unsigned int count);
asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);
-
+asmlinkage long sys_thread_getrusage(int tid, struct rusage __user *ru);
int kernel_execve(const char *filename, char *const argv[], char *const envp[]);

#endif
diff -Nuarp linux-2.6.24-rc6_org/kernel/sys.c linux-2.6.24-rc6/kernel/sys.c
--- linux-2.6.24-rc6_org/kernel/sys.c 2008-01-10 17:16:10.000000000 +0530
+++ linux-2.6.24-rc6/kernel/sys.c 2008-01-17 11:00:18.000000000 +0530
@@ -33,6 +33,7 @@
#include <linux/task_io_accounting_ops.h>
#include <linux/seccomp.h>
#include <linux/cpu.h>
+#include <linux/sched.h>

#include <linux/compat.h>
#include <linux/syscalls.h>
@@ -1570,6 +1571,16 @@ static void k_getrusage(struct task_stru
}

switch (who) {
+ case RUSAGE_THREAD:
+ utime = p->utime;
+ stime = p->stime;
+ r->ru_nvcsw = p->nvcsw;
+ r->ru_nivcsw = p->nivcsw;
+ r->ru_minflt = p->min_flt;
+ r->ru_majflt = p->maj_flt;
+ r->ru_inblock = task_io_get_inblock(p);
+ r->ru_oublock = task_io_get_oublock(p);
+ break;
case RUSAGE_BOTH:
case RUSAGE_CHILDREN:
utime = p->signal->cutime;
@@ -1627,11 +1638,19 @@ int getrusage(struct task_struct *p, int

asmlinkage long sys_getrusage(int who, struct rusage __user *ru)
{
- if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN)
+ if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN &&
+ who != RUSAGE_THREAD)
return -EINVAL;
return getrusage(current, who, ru);
}

+asmlinkage long sys_thread_getrusage(int tid, struct rusage __user *ru)
+{
+ struct task_struct *tsk;
+ tsk = find_task_by_pid(tid);
+ return getrusage(tsk, RUSAGE_THREAD, ru);
+}
+
asmlinkage long sys_umask(int mask)
{
mask = xchg(&current->fs->umask, mask & S_IRWXUGO);


glibc patch :

signed-off by : Vinay Sridhar <[email protected]>
signed-off by : Sripathi Kodi <[email protected]>

diff -Nuarp glibc_cvs_20sep.orig/abilist/libpthread.abilist glibc_cvs_20sep/abilist/libpthread.abilist
--- glibc_cvs_20sep.orig/abilist/libpthread.abilist 2008-01-16 11:31:32.000000000 +0530
+++ glibc_cvs_20sep/abilist/libpthread.abilist 2008-01-16 11:26:37.000000000 +0530
@@ -102,6 +102,7 @@ GLIBC_2.0 i.86-.*-linux.*/notls i.86-.*-
pthread_key_create F
pthread_key_delete F
pthread_kill F
+ pthread_getrusage F
pthread_kill_other_threads_np F
pthread_mutex_destroy F
pthread_mutex_init F
diff -Nuarp glibc_cvs_20sep.orig/nptl/Makefile glibc_cvs_20sep/nptl/Makefile
--- glibc_cvs_20sep.orig/nptl/Makefile 2008-01-16 11:31:32.000000000 +0530
+++ glibc_cvs_20sep/nptl/Makefile 2008-01-16 11:28:10.000000000 +0530
@@ -85,7 +85,7 @@ libpthread-routines = init vars events v
pthread_barrierattr_setpshared \
pthread_key_create pthread_key_delete \
pthread_getspecific pthread_setspecific \
- pthread_sigmask pthread_kill \
+ pthread_sigmask pthread_kill pthread_getrusage \
pthread_cancel pthread_testcancel \
pthread_setcancelstate pthread_setcanceltype \
pthread_once \
diff -Nuarp glibc_cvs_20sep.orig/nptl/sysdeps/unix/sysv/linux/pthread_getrusage.c glibc_cvs_20sep/nptl/sysdeps/unix/sysv/linux/pthread_getrusage.c
--- glibc_cvs_20sep.orig/nptl/sysdeps/unix/sysv/linux/pthread_getrusage.c 1970-01-01 05:30:00.000000000 +0530
+++ glibc_cvs_20sep/nptl/sysdeps/unix/sysv/linux/pthread_getrusage.c 2008-01-16 11:34:56.000000000 +0530
@@ -0,0 +1,29 @@
+#include <sys/resource.h>
+#include <errno.h>
+#include <pthreadP.h>
+#include <tls.h>
+#include <sysdep.h>
+#include <kernel-features.h>
+#define __NR_pthread_getrusage 325
+
+
+int
+__pthread_getrusage (threadid, usage)
+ pthread_t threadid;
+ struct rusage *usage;
+{
+ int val;
+ pid_t tid;
+ struct pthread *pd = (const struct pthread *) threadid;
+ tid = atomic_forced_read (pd->tid);
+ if (__builtin_expect (tid <= 0, 0))
+ /* Not a valid thread handle. */
+ return ESRCH;
+ INTERNAL_SYSCALL_DECL (err);
+ val = INTERNAL_SYSCALL (pthread_getrusage, err, 2, tid, usage);
+ return (INTERNAL_SYSCALL_ERROR_P (val, err)
+ ? INTERNAL_SYSCALL_ERRNO (val, err) : 0);
+}
+
+strong_alias (__pthread_getrusage, pthread_getrusage)
+
diff -Nuarp glibc_cvs_20sep.orig/nptl/Versions glibc_cvs_20sep/nptl/Versions
--- glibc_cvs_20sep.orig/nptl/Versions 2008-01-16 11:31:32.000000000 +0530
+++ glibc_cvs_20sep/nptl/Versions 2008-01-16 11:27:37.000000000 +0530
@@ -60,7 +60,7 @@ libpthread {
pthread_cancel; pthread_testcancel;
pthread_setcancelstate; pthread_setcanceltype;

- pthread_sigmask; pthread_kill;
+ pthread_sigmask; pthread_kill; pthread_getrusage;

pthread_key_create; pthread_key_delete;
pthread_getspecific; pthread_setspecific;


2008-01-17 15:43:36

by Ulrich Drepper

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vinay Sridhar wrote:
> There are two ways to implement this in the kernel:
> 1) Introduce an additional parameter 'tid' to sys_getrusage() and put
> code in glibc to handle getrusage() and pthread_getrusage() calls
> correctly.
> 2) Introduce a new system call to handle pthread_getrusage() and leave
> sys_getrusage() untouched.

You're doing two things at once:

a) provide a way to get a thread's usage

b) provide a way to get another process's/thread's usage


The former is a trivial extension and I completely agree. RUSAGE_THREAD
is trivial to implement and should go in ASAP.

The second part isn't that easy. The first question is: do we really
need this? It is a new type of interface. We have the /proc filesystem
etc for programs which want to look at other process' data. Second,
more importantly right now, your patch seems not to include any security
support. Correct me if I'm wrong, but find_task_by_pid will always
succeed, regardless of whether the calling thread belongs to another UID
or not. I.e., your patch enables any process to read any other process'
usage. That's a no-no.


I suggest that you split the patch in two. The first should implement
RUSAGE_THREAD. You'll immediately get an ACK from me for that. The
second part then should introduce a way to get another process' usage.
This patch should only be used initially as a starting point for
discussions. You'll have to argue why it is necessary in the first place.

The argument might have to do with why you want a pthread_getrusage()
interface (which, btw, is a bad name since the interface is nothing like
getrusage, getrusage doesn't allow requesting any other process' data).
Yes, for intra-process lookups relying on /proc is no good idea. But
then, I have not seen any reason so far why such an API is needed and
why a thread cannot just be responsible for reading its own usage data.
Anyway, if pthread_getrusage (or whatever it'll be called) is the only
usage then the syscall should require that the TID parameter is from a
thread in the same process which would solve the security problem.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFHj3do2ijCOnn/RHQRAiKdAKCSooiEWcxr780hJGenElyDiWPWKgCdE+6Y
j6ibmGsPT4aYxhSfpimSdiw=
=jOC9
-----END PGP SIGNATURE-----

2008-01-19 01:14:35

by Roland McGrath

[permalink] [raw]
Subject: [PATCH] RUSAGE_THREAD


This adds the RUSAGE_THREAD option for the getrusage system call.
Solaris calls this RUSAGE_LWP and uses the same value (1).
That name is not a natural one for Linux, but we keep it as an alias.

Signed-off-by: Roland McGrath <[email protected]>
---
include/linux/resource.h | 2 ++
kernel/sys.c | 31 ++++++++++++++++++++++---------
2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/include/linux/resource.h b/include/linux/resource.h
index ae13db7..02b3377 100644
--- a/include/linux/resource.h
+++ b/include/linux/resource.h
@@ -19,6 +19,8 @@ struct task_struct;
#define RUSAGE_SELF 0
#define RUSAGE_CHILDREN (-1)
#define RUSAGE_BOTH (-2) /* sys_wait4() uses this */
+#define RUSAGE_THREAD 1 /* only the calling thread */
+#define RUSAGE_LWP RUSAGE_THREAD /* Solaris name for same */

struct rusage {
struct timeval ru_utime; /* user time used */
diff --git a/kernel/sys.c b/kernel/sys.c
index d1fe71e..6a62bc4 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1554,6 +1554,19 @@ out:
*
*/

+static void accumulate_thread_rusage(struct task_struct *t, struct rusage *r,
+ cputime_t *utimep, cputime_t *stimep)
+{
+ *utimep = cputime_add(*utimep, t->utime);
+ *stimep = cputime_add(*stimep, t->stime);
+ r->ru_nvcsw += t->nvcsw;
+ r->ru_nivcsw += t->nivcsw;
+ r->ru_minflt += t->min_flt;
+ r->ru_majflt += t->maj_flt;
+ r->ru_inblock += task_io_get_inblock(t);
+ r->ru_oublock += task_io_get_oublock(t);
+}
+
static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
{
struct task_struct *t;
@@ -1563,6 +1576,11 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
memset((char *) r, 0, sizeof *r);
utime = stime = cputime_zero;

+ if (who == RUSAGE_THREAD) {
+ accumulate_thread_rusage(p, r, &utime, &stime);
+ goto out;
+ }
+
rcu_read_lock();
if (!lock_task_sighand(p, &flags)) {
rcu_read_unlock();
@@ -1595,14 +1613,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_oublock += p->signal->oublock;
t = p;
do {
- utime = cputime_add(utime, t->utime);
- stime = cputime_add(stime, t->stime);
- r->ru_nvcsw += t->nvcsw;
- r->ru_nivcsw += t->nivcsw;
- r->ru_minflt += t->min_flt;
- r->ru_majflt += t->maj_flt;
- r->ru_inblock += task_io_get_inblock(t);
- r->ru_oublock += task_io_get_oublock(t);
+ accumulate_thread_rusage(t, r, &utime, &stime);
t = next_thread(t);
} while (t != p);
break;
@@ -1614,6 +1625,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
unlock_task_sighand(p, &flags);
rcu_read_unlock();

+out:
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
}
@@ -1627,7 +1639,8 @@ int getrusage(struct task_struct *p, int who, struct rusage __user *ru)

asmlinkage long sys_getrusage(int who, struct rusage __user *ru)
{
- if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN)
+ if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN &&
+ who != RUSAGE_THREAD)
return -EINVAL;
return getrusage(current, who, ru);
}

2008-01-19 01:14:49

by Roland McGrath

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

I agree that RUSAGE_THREAD is fine. (In fact, if you'd pressed me to
remember without looking, I would have assumed we put it in already.)
However, in the implementation, I would keep it cleaner by moving the
identical code from inside the loop under case RUSAGE_SELF into a shared
subfunction, rather than duplicating it. In fact, here you go (next posting).

As to getting arbitrary other threads' data, there are several problems
there. Adding a syscall is often more trouble than it's worth. Ulrich
cited the issues with that as the API. You also didn't handle compat for
it correctly. To warrant the code necessary to make this available by
whatever API, I think you need to say some more about what it's needed for.

Off hand, it seems most in keeping with other things to expose this via a
/proc file, i.e. /proc/tgid/task/tid/rusage and (/proc/tgid/rusage for the
RUSAGE_SELF behavior on a foreign process). There we already have the
infrastructure for dealing with the security issues uniformly with how we
control other similar information. Personally I tend to prefer a binary
interface, i.e. a virtual file whose contents are struct rusage; for that
you still need to do the extra compat work, since a 32-bit process should
have the 32-bit struct rusage layout in its /proc files. If you put the
numbers into ascii text as some /proc interfaces do, you don't need any
special considerations for CONFIG_COMPAT.


Thanks,
Roland

2008-01-19 06:22:21

by Ulrich Drepper

[permalink] [raw]
Subject: Re: [PATCH] RUSAGE_THREAD

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Roland McGrath wrote:
> +#define RUSAGE_LWP RUSAGE_THREAD /* Solaris name for same */

No need to clutter the kernel header with this, it'll be in the libc header.

Aside from that:

Acked-by: Ulrich Drepper <[email protected]>

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHkZbk2ijCOnn/RHQRAtohAKCyWgJsm20LSqxTznvff3LI8zplvgCgwttu
16eJFNgQXWNEk76b141uZvo=
=DzhA
-----END PGP SIGNATURE-----

2008-01-21 09:55:43

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] RUSAGE_THREAD

On Fri, Jan 18, 2008 at 05:14:18PM -0800, Roland McGrath wrote:
> +#define RUSAGE_LWP RUSAGE_THREAD /* Solaris name for same */

Please don't add this to the kernel header. If glibc really wants that
it can provide it in it's own headers.

Subject: Re: [PATCH] RUSAGE_THREAD

On Jan 19, 2008 2:14 AM, Roland McGrath <[email protected]> wrote:
>
> This adds the RUSAGE_THREAD option for the getrusage system call.
> Solaris calls this RUSAGE_LWP and uses the same value (1).
> That name is not a natural one for Linux, but we keep it as an alias.

Hey Roland,

Would you please CC at this address me on patches that change the
kernel-userland API.

Cheers,

Michael

2008-01-28 05:53:45

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

On Thu, 17 Jan 2008 13:57:05 +0530 Vinay Sridhar <[email protected]> wrote:

> Hi All,
>
> Last year, there was discussion about per-thread getrusage by adding
> RUSAGE_THREAD flag to getrusage(). Please refer to the thread
> http://lkml.org/lkml/2007/4/4/308. Ulrich had suggested that we should
> design a better user-space API. Specifically, we need a
> pthread_getrusage interface in the thread library, which accepts
> pthread_t, converts pthread_t into the corresponding tid and passes it
> down to the syscall.
>
> There are two ways to implement this in the kernel:
> 1) Introduce an additional parameter 'tid' to sys_getrusage() and put
> code in glibc to handle getrusage() and pthread_getrusage() calls
> correctly.
> 2) Introduce a new system call to handle pthread_getrusage() and leave
> sys_getrusage() untouched.
>
> We implemented the second idea above, simply because it avoids touching
> any existing code. We have implemented a new syscall, thread_getrusage()
> and we have exposed pthread_getrusage() API to applications.
>
> Could you please share your thoughts on this? Does the approach look
> alright? The code is hardly complete. It is just a prototype that works
> on IA32 at the moment.
>
> ...
>
> +asmlinkage long sys_thread_getrusage(int tid, struct rusage __user *ru);

What happens if `tid' refers to a thread in a different pid namespace?

2008-01-28 07:50:13

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

Andrew Morton wrote:
> On Thu, 17 Jan 2008 13:57:05 +0530 Vinay Sridhar <[email protected]> wrote:
>
>> Hi All,
>>
>> Last year, there was discussion about per-thread getrusage by adding
>> RUSAGE_THREAD flag to getrusage(). Please refer to the thread
>> http://lkml.org/lkml/2007/4/4/308. Ulrich had suggested that we should
>> design a better user-space API. Specifically, we need a
>> pthread_getrusage interface in the thread library, which accepts
>> pthread_t, converts pthread_t into the corresponding tid and passes it
>> down to the syscall.
>>
>> There are two ways to implement this in the kernel:
>> 1) Introduce an additional parameter 'tid' to sys_getrusage() and put
>> code in glibc to handle getrusage() and pthread_getrusage() calls
>> correctly.
>> 2) Introduce a new system call to handle pthread_getrusage() and leave
>> sys_getrusage() untouched.
>>
>> We implemented the second idea above, simply because it avoids touching
>> any existing code. We have implemented a new syscall, thread_getrusage()
>> and we have exposed pthread_getrusage() API to applications.
>>
>> Could you please share your thoughts on this? Does the approach look
>> alright? The code is hardly complete. It is just a prototype that works
>> on IA32 at the moment.
>>
>> ...
>>
>> +asmlinkage long sys_thread_getrusage(int tid, struct rusage __user *ru);
>
> What happens if `tid' refers to a thread in a different pid namespace?
>

That's impossible. I explicitly deny namespace creation in case the
CLONE_THREAD is specified. So all threads of a single process always
live in one pid namespace.

Thanks,
Pavel

2008-01-28 08:24:10

by Sripathi Kodi

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

Hi Andrew,

On Monday 28 January 2008 11:22, Andrew Morton wrote:
> On Thu, 17 Jan 2008 13:57:05 +0530 Vinay Sridhar
<[email protected]> wrote:
> > Hi All,
> >
> > Last year, there was discussion about per-thread getrusage by
> > adding RUSAGE_THREAD flag to getrusage(). Please refer to the
> > thread http://lkml.org/lkml/2007/4/4/308. Ulrich had suggested that
> > we should design a better user-space API. Specifically, we need a
> > pthread_getrusage interface in the thread library, which accepts
> > pthread_t, converts pthread_t into the corresponding tid and passes
> > it down to the syscall.
> >
> > There are two ways to implement this in the kernel:
> > 1) Introduce an additional parameter 'tid' to sys_getrusage() and
> > put code in glibc to handle getrusage() and pthread_getrusage()
> > calls correctly.
> > 2) Introduce a new system call to handle pthread_getrusage() and
> > leave sys_getrusage() untouched.
> >
> > We implemented the second idea above, simply because it avoids
> > touching any existing code. We have implemented a new syscall,
> > thread_getrusage() and we have exposed pthread_getrusage() API to
> > applications.
> >
> > Could you please share your thoughts on this? Does the approach
> > look alright? The code is hardly complete. It is just a prototype
> > that works on IA32 at the moment.
> >
> > ...
> >
> > +asmlinkage long sys_thread_getrusage(int tid, struct rusage __user
> > *ru);
>
> What happens if `tid' refers to a thread in a different pid
> namespace?

The code was only meant to be a base for discussions. It surely needs
work. Our idea for the final version was to be able to read a thread's
rusage from another thread strictly within the same process. The idea
came from applications that need a cost enforcement mechanism. Having a
mechanism for a thread to read it's own usage is essential. If there is
a way to read other threads' rusage, it is even better.

Does Roland's patch (http://lkml.org/lkml/2008/1/18/589) look good to go
in, provided Ulrich's comment (http://lkml.org/lkml/2008/1/19/15) is
addressed?

Thanks,
Sripathi.

2008-01-28 09:11:49

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

On Mon, 28 Jan 2008 10:48:23 +0300 Pavel Emelyanov <[email protected]> wrote:

> Andrew Morton wrote:
> > On Thu, 17 Jan 2008 13:57:05 +0530 Vinay Sridhar <[email protected]> wrote:
> >
> >> Hi All,
> >>
> >> Last year, there was discussion about per-thread getrusage by adding
> >> RUSAGE_THREAD flag to getrusage(). Please refer to the thread
> >> http://lkml.org/lkml/2007/4/4/308. Ulrich had suggested that we should
> >> design a better user-space API. Specifically, we need a
> >> pthread_getrusage interface in the thread library, which accepts
> >> pthread_t, converts pthread_t into the corresponding tid and passes it
> >> down to the syscall.
> >>
> >> There are two ways to implement this in the kernel:
> >> 1) Introduce an additional parameter 'tid' to sys_getrusage() and put
> >> code in glibc to handle getrusage() and pthread_getrusage() calls
> >> correctly.
> >> 2) Introduce a new system call to handle pthread_getrusage() and leave
> >> sys_getrusage() untouched.
> >>
> >> We implemented the second idea above, simply because it avoids touching
> >> any existing code. We have implemented a new syscall, thread_getrusage()
> >> and we have exposed pthread_getrusage() API to applications.
> >>
> >> Could you please share your thoughts on this? Does the approach look
> >> alright? The code is hardly complete. It is just a prototype that works
> >> on IA32 at the moment.
> >>
> >> ...
> >>
> >> +asmlinkage long sys_thread_getrusage(int tid, struct rusage __user *ru);
> >
> > What happens if `tid' refers to a thread in a different pid namespace?
> >
>
> That's impossible. I explicitly deny namespace creation in case the
> CLONE_THREAD is specified. So all threads of a single process always
> live in one pid namespace.
>

If the code was using find_task_by_vpid() then OK (I guess). But it is
looking the tids up in the init_pid_ns. Which I assume means that if it's
in a new namespace and is looking up a sibling thread it will simply fail?

Or am I missing something?

2008-01-28 09:23:53

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

On Mon, 28 Jan 2008 13:54:12 +0530 Sripathi Kodi <[email protected]> wrote:

> Does Roland's patch (http://lkml.org/lkml/2008/1/18/589) look good to go
> in, provided Ulrich's comment (http://lkml.org/lkml/2008/1/19/15) is
> addressed?

Sure, it looks sane - it avoids the problematic get_task_by_pid() too.

2008-01-28 09:46:18

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

Andrew Morton wrote:
> On Mon, 28 Jan 2008 10:48:23 +0300 Pavel Emelyanov <[email protected]> wrote:
>
>> Andrew Morton wrote:
>>> On Thu, 17 Jan 2008 13:57:05 +0530 Vinay Sridhar <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Last year, there was discussion about per-thread getrusage by adding
>>>> RUSAGE_THREAD flag to getrusage(). Please refer to the thread
>>>> http://lkml.org/lkml/2007/4/4/308. Ulrich had suggested that we should
>>>> design a better user-space API. Specifically, we need a
>>>> pthread_getrusage interface in the thread library, which accepts
>>>> pthread_t, converts pthread_t into the corresponding tid and passes it
>>>> down to the syscall.
>>>>
>>>> There are two ways to implement this in the kernel:
>>>> 1) Introduce an additional parameter 'tid' to sys_getrusage() and put
>>>> code in glibc to handle getrusage() and pthread_getrusage() calls
>>>> correctly.
>>>> 2) Introduce a new system call to handle pthread_getrusage() and leave
>>>> sys_getrusage() untouched.
>>>>
>>>> We implemented the second idea above, simply because it avoids touching
>>>> any existing code. We have implemented a new syscall, thread_getrusage()
>>>> and we have exposed pthread_getrusage() API to applications.
>>>>
>>>> Could you please share your thoughts on this? Does the approach look
>>>> alright? The code is hardly complete. It is just a prototype that works
>>>> on IA32 at the moment.
>>>>
>>>> ...
>>>>
>>>> +asmlinkage long sys_thread_getrusage(int tid, struct rusage __user *ru);
>>> What happens if `tid' refers to a thread in a different pid namespace?
>>>
>> That's impossible. I explicitly deny namespace creation in case the
>> CLONE_THREAD is specified. So all threads of a single process always
>> live in one pid namespace.
>>
>
> If the code was using find_task_by_vpid() then OK (I guess). But it is

Yup, find_task_by_vpid() will find the proper (i.e. in your namespace) task.

> looking the tids up in the init_pid_ns. Which I assume means that if it's
> in a new namespace and is looking up a sibling thread it will simply fail?

If it looks in the init_pid_ns, then it can either fail or obtain a task
from different namespace. The find_task_by_pid_ns() was intended to be used
in proc mainly, to get tasks from the namespace pointed by the super-block
being explored.

Please excuse my lamentable ignorance, but which code does such things with
init_pid_ns? I followed the 'per-thread rusage' thread and didn't find any.

> Or am I missing something?
>

Thanks,
Pavel

2008-01-28 09:46:48

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

On Mon, 28 Jan 2008 12:38:17 +0300 Pavel Emelyanov <[email protected]> wrote:

> > If the code was using find_task_by_vpid() then OK (I guess). But it is
>
> Yup, find_task_by_vpid() will find the proper (i.e. in your namespace) task.
>
> > looking the tids up in the init_pid_ns. Which I assume means that if it's
> > in a new namespace and is looking up a sibling thread it will simply fail?
>
> If it looks in the init_pid_ns, then it can either fail or obtain a task
> from different namespace. The find_task_by_pid_ns() was intended to be used
> in proc mainly, to get tasks from the namespace pointed by the super-block
> being explored.
>
> Please excuse my lamentable ignorance, but which code does such things with
> init_pid_ns? I followed the 'per-thread rusage' thread and didn't find any.

From: Vinay Sridhar <[email protected]>
To: [email protected], [email protected]
Cc: [email protected], [email protected], [email protected], [email protected]
Subject: [RFC] Per-thread getrusage
...
+asmlinkage long sys_thread_getrusage(int tid, struct rusage __user *ru)
+{
+ struct task_struct *tsk;
+ tsk = find_task_by_pid(tid);
+ return getrusage(tsk, RUSAGE_THREAD, ru);
+}

2008-01-28 10:04:15

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

Andrew Morton wrote:
> On Mon, 28 Jan 2008 12:38:17 +0300 Pavel Emelyanov <[email protected]> wrote:
>
>>> If the code was using find_task_by_vpid() then OK (I guess). But it is
>> Yup, find_task_by_vpid() will find the proper (i.e. in your namespace) task.
>>
>>> looking the tids up in the init_pid_ns. Which I assume means that if it's
>>> in a new namespace and is looking up a sibling thread it will simply fail?
>> If it looks in the init_pid_ns, then it can either fail or obtain a task
>> from different namespace. The find_task_by_pid_ns() was intended to be used
>> in proc mainly, to get tasks from the namespace pointed by the super-block
>> being explored.
>>
>> Please excuse my lamentable ignorance, but which code does such things with
>> init_pid_ns? I followed the 'per-thread rusage' thread and didn't find any.
>
> From: Vinay Sridhar <[email protected]>
> To: [email protected], [email protected]
> Cc: [email protected], [email protected], [email protected], [email protected]
> Subject: [RFC] Per-thread getrusage

Ouch. Thanks, I've missed that and looked just at the Roland's patch :(

> ...
> +asmlinkage long sys_thread_getrusage(int tid, struct rusage __user *ru)
> +{
> + struct task_struct *tsk;
> + tsk = find_task_by_pid(tid);
> + return getrusage(tsk, RUSAGE_THREAD, ru);
> +}

Well, the find_task_by_pid() is really wrong here.

Besides (just in case this system call is going to be developed further),
the tsk == NULL case is not checked inside the getrusage and may OOPS
even if the proper namespace is used.

Thanks,
Pavel

2008-01-28 20:45:36

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

Pavel Emelyanov <[email protected]> writes:
>> ...
>> +asmlinkage long sys_thread_getrusage(int tid, struct rusage __user *ru)
>> +{
>> + struct task_struct *tsk;
>> + tsk = find_task_by_pid(tid);
>> + return getrusage(tsk, RUSAGE_THREAD, ru);
>> +}
>
> Well, the find_task_by_pid() is really wrong here.

And find_task_by_pid should probably just be removed.

No need to provide function with the gun firmly pointed at our feet....

Eric

2008-01-28 21:58:42

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

On Mon, 28 Jan 2008 13:43:02 -0700
[email protected] (Eric W. Biederman) wrote:

> Pavel Emelyanov <[email protected]> writes:
> >> ...
> >> +asmlinkage long sys_thread_getrusage(int tid, struct rusage __user *ru)
> >> +{
> >> + struct task_struct *tsk;
> >> + tsk = find_task_by_pid(tid);
> >> + return getrusage(tsk, RUSAGE_THREAD, ru);
> >> +}
> >
> > Well, the find_task_by_pid() is really wrong here.
>
> And find_task_by_pid should probably just be removed.

That's what I was thinking.

> No need to provide function with the gun firmly pointed at our feet....

It still has a disturbingly large number of callers.

2008-01-29 08:24:55

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

Andrew Morton wrote:
> On Mon, 28 Jan 2008 13:43:02 -0700
> [email protected] (Eric W. Biederman) wrote:
>
>> Pavel Emelyanov <[email protected]> writes:
>>>> ...
>>>> +asmlinkage long sys_thread_getrusage(int tid, struct rusage __user *ru)
>>>> +{
>>>> + struct task_struct *tsk;
>>>> + tsk = find_task_by_pid(tid);
>>>> + return getrusage(tsk, RUSAGE_THREAD, ru);
>>>> +}
>>> Well, the find_task_by_pid() is really wrong here.
>> And find_task_by_pid should probably just be removed.
>
> That's what I was thinking.

find_task_by_pid and find_pid are to be removed, but this task
heavily depends on others.

E.g. to drop the find_pid() we need to kill the kill_proc()
function, which in turn depends on turning the usbatm, nfs and
lockd code into kthread API. We're currently working on this.

>> No need to provide function with the gun firmly pointed at our feet....
>
> It still has a disturbingly large number of callers.

Yes, but unfortunately simple conversion from find_xxx_pid into
find_xxx_vpid is not possible - each case is special.

Thanks,
Pavel

2008-01-29 08:31:07

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [RFC] Per-thread getrusage

Eric W. Biederman wrote:
> Pavel Emelyanov <[email protected]> writes:
>>> ...
>>> +asmlinkage long sys_thread_getrusage(int tid, struct rusage __user *ru)
>>> +{
>>> + struct task_struct *tsk;
>>> + tsk = find_task_by_pid(tid);
>>> + return getrusage(tsk, RUSAGE_THREAD, ru);
>>> +}
>> Well, the find_task_by_pid() is really wrong here.
>
> And find_task_by_pid should probably just be removed.
>
> No need to provide function with the gun firmly pointed at our feet....

We are working to uncock it. If you feel you know how to do it
faster, it would be just terrific to review your patches.

> Eric
>