2007-08-17 22:22:41

by Oleg Nesterov

[permalink] [raw]
Subject: Re: + proc-export-a-processes-resource-limits-via-proc-pid.patch added to -mm tree

Neil Horman wrote:
>
> +static int proc_pid_limits(struct task_struct *task, char *buffer)
> +{
> + unsigned int i;
> + int count = 0;
> + char *bufptr = buffer;
> +
> + struct rlimit rlim[RLIM_NLIMITS];
> +
> + read_lock(&tasklist_lock);
> + memcpy(rlim, task->signal->rlim, sizeof(struct rlimit) * RLIM_NLIMITS);
> + read_unlock(&tasklist_lock);

Please don't re-introduce tasklist_lock unless strictly needed. And in this case
it doesn't help, sys_getrlimit() changes ->rlim[] under task_lock().

Hovewer, I think the whole patch is not right. The "tsk" itself is pinned, but its
->signal is not stable and can be == NULL.

You can use lock_task_sighand() to access ->signal.

Oleg.


2007-08-18 11:59:08

by Neil Horman

[permalink] [raw]
Subject: Re: + proc-export-a-processes-resource-limits-via-proc-pid.patch added to -mm tree

On Sat, Aug 18, 2007 at 02:22:28AM +0400, Oleg Nesterov wrote:
> Neil Horman wrote:
> >
> > +static int proc_pid_limits(struct task_struct *task, char *buffer)
> > +{
> > + unsigned int i;
> > + int count = 0;
> > + char *bufptr = buffer;
> > +
> > + struct rlimit rlim[RLIM_NLIMITS];
> > +
> > + read_lock(&tasklist_lock);
> > + memcpy(rlim, task->signal->rlim, sizeof(struct rlimit) * RLIM_NLIMITS);
> > + read_unlock(&tasklist_lock);
>
> Please don't re-introduce tasklist_lock unless strictly needed. And in this case
> it doesn't help, sys_getrlimit() changes ->rlim[] under task_lock().
>
> Hovewer, I think the whole patch is not right. The "tsk" itself is pinned, but its
> ->signal is not stable and can be == NULL.
>
> You can use lock_task_sighand() to access ->signal.
>
You're right about the use of task_lock rather than tasklist_lock in getrlimit,
but the comment from lock_task_sighand indicates that its use is predicated on
the prerequisite of locking tasklist_lock, so I think the situation is not that
much of an issue. From what I see the use of lock_task_sighand is used when
modifying values in the signal struct, not when removing it entirely (IIRC it
needs to exist until such time as all sharing processes exit. The fact that we
have an outstanding task struct we are using here guarantees its continued
existence). Since we are only reading signal->rlimit, which is only written to
from sys_setrlimit, we should be safe from corrupted limit reads, which at worst
would cause an erroneous transient data read, rather than any sort of
panic/crash.

Regards
Neil

> Oleg.

--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/

2007-08-18 12:56:46

by Oleg Nesterov

[permalink] [raw]
Subject: Re: + proc-export-a-processes-resource-limits-via-proc-pid.patch added to -mm tree

On 08/18, Neil Horman wrote:
>
> On Sat, Aug 18, 2007 at 02:22:28AM +0400, Oleg Nesterov wrote:
> > Neil Horman wrote:
> > >
> > > +static int proc_pid_limits(struct task_struct *task, char *buffer)
> > > +{
> > > + unsigned int i;
> > > + int count = 0;
> > > + char *bufptr = buffer;
> > > +
> > > + struct rlimit rlim[RLIM_NLIMITS];
> > > +
> > > + read_lock(&tasklist_lock);
> > > + memcpy(rlim, task->signal->rlim, sizeof(struct rlimit) * RLIM_NLIMITS);
> > > + read_unlock(&tasklist_lock);
> >
> > Please don't re-introduce tasklist_lock unless strictly needed. And in this case
> > it doesn't help, sys_getrlimit() changes ->rlim[] under task_lock().
> >
> > Hovewer, I think the whole patch is not right. The "tsk" itself is pinned, but its
> > ->signal is not stable and can be == NULL.
> >
> > You can use lock_task_sighand() to access ->signal.
> >
> You're right about the use of task_lock rather than tasklist_lock in getrlimit,
> but the comment from lock_task_sighand indicates that its use is predicated on
> the prerequisite of locking tasklist_lock,

or rcu_read_lock(),

> so I think the situation is not that
> much of an issue. From what I see the use of lock_task_sighand is used when
> modifying values in the signal struct, not when removing it entirely (IIRC it
> needs to exist until such time as all sharing processes exit.

yes, it needs to exist until the whole process exits, but no, __exit_signal()
sets ->signal == NULL under sighand->siglock. This btw happens per thread.

> The fact that we
> have an outstanding task struct we are using here guarantees its continued
> existence).

No. proc_info_read() finds a "pid_alive()" task with a valid signal, and bumps
its ->usage. But nothing prevent this thread from exiting (in fact, it may be
already dead), after that the parent can reap that task, or it can reap itself
if it was detached thread.

This means that proc_read() can assume nothing about the task, except that its
task_struct can't disappear.

> Since we are only reading signal->rlimit, which is only written to
> from sys_setrlimit, we should be safe from corrupted limit reads, which at worst
> would cause an erroneous transient data read, rather than any sort of
> panic/crash.

->signal == NULL leads to panic().

Oleg.

2007-08-18 18:56:22

by Neil Horman

[permalink] [raw]
Subject: Re: + proc-export-a-processes-resource-limits-via-proc-pid.patch added to -mm tree


Ok, I think I see your point. Thanks for the input. New patch attached which
adds the use of the sighand lock to the patch.


Currently, there exists no method for a process to query the resource
limits of another process. They can be inferred via some mechanisms but
they cannot be explicitly determined. Given that this information can be
usefull to know during the debugging of an application, I've written this
patch which exports all of a processes limits via /proc/<pid>/limits.

Regards
Neil

Signed-off-by: Neil Horman <[email protected]>


base.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)



diff --git a/fs/proc/base.c b/fs/proc/base.c
index ed2b224..86130b0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -74,6 +74,7 @@
#include <linux/nsproxy.h>
#include <linux/oom.h>
#include <linux/elf.h>
+#include <linux/resource.h>
#include "internal.h"

/* NOTE:
@@ -323,6 +324,80 @@ static int proc_oom_score(struct task_struct *task, char *buffer)
return sprintf(buffer, "%lu\n", points);
}

+struct limit_names {
+ char *name;
+ char *unit;
+};
+
+static const struct limit_names lnames[RLIM_NLIMITS] = {
+ [RLIMIT_CPU] = {"Max cpu time", "ms"},
+ [RLIMIT_FSIZE] = {"Max file size", "bytes"},
+ [RLIMIT_DATA] = {"Max data size", "bytes"},
+ [RLIMIT_STACK] = {"Max stack size", "bytes"},
+ [RLIMIT_CORE] = {"Max core file size", "bytes"},
+ [RLIMIT_RSS] = {"Max resident set", "bytes"},
+ [RLIMIT_NPROC] = {"Max processes", "processes"},
+ [RLIMIT_NOFILE] = {"Max open files", "files"},
+ [RLIMIT_MEMLOCK] = {"Max locked memory", "bytes"},
+ [RLIMIT_AS] = {"Max address space", "bytes"},
+ [RLIMIT_LOCKS] = {"Max file locks", "locks"},
+ [RLIMIT_SIGPENDING] = {"Max pending signals", "signals"},
+ [RLIMIT_MSGQUEUE] = {"Max msgqueue size", "bytes"},
+ [RLIMIT_NICE] = {"Max nice priority", NULL},
+ [RLIMIT_RTPRIO] = {"Max realtime priority", NULL},
+};
+
+/* Display limits for a process */
+static int proc_pid_limits(struct task_struct *task, char *buffer)
+{
+ unsigned int i;
+ int count = 0;
+ unsigned long flags;
+ char *bufptr = buffer;
+
+ struct rlimit rlim[RLIM_NLIMITS];
+
+ read_lock(&tasklist_lock);
+ lock_task_sighand(task, &flags);
+ if (task->signal == NULL){
+ unlock_task_sighand(task, &flags);
+ read_unlock(&tasklist_lock);
+ return 0;
+ }
+ memcpy(rlim, task->signal->rlim, sizeof(struct rlimit) * RLIM_NLIMITS);
+ unlock_task_sighand(task, &flags);
+ read_unlock(&tasklist_lock);
+
+ /*
+ * print the file header
+ */
+ count += sprintf(&bufptr[count], "%-25s %-20s %-20s %-10s\n",
+ "Limit", "Soft Limit", "Hard Limit", "Units");
+
+ for (i = 0; i < RLIM_NLIMITS; i++) {
+ if (rlim[i].rlim_cur == RLIM_INFINITY)
+ count += sprintf(&bufptr[count], "%-25s %-20s ",
+ lnames[i].name, "unlimited");
+ else
+ count += sprintf(&bufptr[count], "%-25s %-20lu ",
+ lnames[i].name, rlim[i].rlim_cur);
+
+ if (rlim[i].rlim_max == RLIM_INFINITY)
+ count += sprintf(&bufptr[count], "%-20s ", "unlimited");
+ else
+ count += sprintf(&bufptr[count], "%-20lu ",
+ rlim[i].rlim_max);
+
+ if (lnames[i].unit)
+ count += sprintf(&bufptr[count], "%-10s\n",
+ lnames[i].unit);
+ else
+ count += sprintf(&bufptr[count], "\n");
+ }
+
+ return count;
+}
+
/************************************************************************/
/* Here the fs part begins */
/************************************************************************/
@@ -2017,6 +2092,7 @@ static const struct pid_entry tgid_base_stuff[] = {
INF("environ", S_IRUSR, pid_environ),
INF("auxv", S_IRUSR, pid_auxv),
INF("status", S_IRUGO, pid_status),
+ INF("limits", S_IRUSR, pid_limits),
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, pid_sched),
#endif
@@ -2310,6 +2386,7 @@ static const struct pid_entry tid_base_stuff[] = {
INF("environ", S_IRUSR, pid_environ),
INF("auxv", S_IRUSR, pid_auxv),
INF("status", S_IRUGO, pid_status),
+ INF("limits", S_IRUSR, pid_limits),
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, pid_sched),
#endif

--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/

2007-08-18 19:03:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: + proc-export-a-processes-resource-limits-via-proc-pid.patch added to -mm tree

On 08/18, Neil Horman wrote:
>
> +static int proc_pid_limits(struct task_struct *task, char *buffer)
> +{
> + unsigned int i;
> + int count = 0;
> + unsigned long flags;
> + char *bufptr = buffer;
> +
> + struct rlimit rlim[RLIM_NLIMITS];
> +
> + read_lock(&tasklist_lock);
> + lock_task_sighand(task, &flags);
> + if (task->signal == NULL){
> + unlock_task_sighand(task, &flags);
> + read_unlock(&tasklist_lock);

Neil, please, don't add tasklist_lock again. It was not easy to wipe it from
fs/proc/ :) Just change this code to use rcu_read_lock().

Oleg.

2007-08-18 23:24:41

by Neil Horman

[permalink] [raw]
Subject: Re: + proc-export-a-processes-resource-limits-via-proc-pid.patch added to -mm tree

>
> Neil, please, don't add tasklist_lock again. It was not easy to wipe it from
> fs/proc/ :) Just change this code to use rcu_read_lock().
>

Ok, done/tested. Thanks!

Currently, there exists no method for a process to query the resource
limits of another process. They can be inferred via some mechanisms but they
cannot be explicitly determined. Given that this information can be usefull to
know during the debugging of an application, I've written this patch which
exports all of a processes limits via /proc/<pid>/limits. Tested successfully
by myself on x86 on top of 2.6.23-rc2-mm1.


Signed-off-by: Neil Horman <[email protected]>


base.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)


diff --git a/fs/proc/base.c b/fs/proc/base.c
index ed2b224..86130b0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -74,6 +74,7 @@
#include <linux/nsproxy.h>
#include <linux/oom.h>
#include <linux/elf.h>
+#include <linux/resource.h>
#include "internal.h"

/* NOTE:
@@ -323,6 +324,80 @@ static int proc_oom_score(struct task_struct *task, char *buffer)
return sprintf(buffer, "%lu\n", points);
}

+struct limit_names {
+ char *name;
+ char *unit;
+};
+
+static const struct limit_names lnames[RLIM_NLIMITS] = {
+ [RLIMIT_CPU] = {"Max cpu time", "ms"},
+ [RLIMIT_FSIZE] = {"Max file size", "bytes"},
+ [RLIMIT_DATA] = {"Max data size", "bytes"},
+ [RLIMIT_STACK] = {"Max stack size", "bytes"},
+ [RLIMIT_CORE] = {"Max core file size", "bytes"},
+ [RLIMIT_RSS] = {"Max resident set", "bytes"},
+ [RLIMIT_NPROC] = {"Max processes", "processes"},
+ [RLIMIT_NOFILE] = {"Max open files", "files"},
+ [RLIMIT_MEMLOCK] = {"Max locked memory", "bytes"},
+ [RLIMIT_AS] = {"Max address space", "bytes"},
+ [RLIMIT_LOCKS] = {"Max file locks", "locks"},
+ [RLIMIT_SIGPENDING] = {"Max pending signals", "signals"},
+ [RLIMIT_MSGQUEUE] = {"Max msgqueue size", "bytes"},
+ [RLIMIT_NICE] = {"Max nice priority", NULL},
+ [RLIMIT_RTPRIO] = {"Max realtime priority", NULL},
+};
+
+/* Display limits for a process */
+static int proc_pid_limits(struct task_struct *task, char *buffer)
+{
+ unsigned int i;
+ int count = 0;
+ unsigned long flags;
+ char *bufptr = buffer;
+
+ struct rlimit rlim[RLIM_NLIMITS];
+
+ rcu_read_lock();
+ lock_task_sighand(task,&flags);
+ if (task->signal == NULL){
+ unlock_task_sighand(task, &flags);
+ rcu_read_unlock();
+ return 0;
+ }
+ memcpy(rlim, task->signal->rlim, sizeof(struct rlimit) * RLIM_NLIMITS);
+ unlock_task_sighand(task, &flags);
+ rcu_read_unlock();
+
+ /*
+ * print the file header
+ */
+ count += sprintf(&bufptr[count], "%-25s %-20s %-20s %-10s\n",
+ "Limit", "Soft Limit", "Hard Limit", "Units");
+
+ for (i = 0; i < RLIM_NLIMITS; i++) {
+ if (rlim[i].rlim_cur == RLIM_INFINITY)
+ count += sprintf(&bufptr[count], "%-25s %-20s ",
+ lnames[i].name, "unlimited");
+ else
+ count += sprintf(&bufptr[count], "%-25s %-20lu ",
+ lnames[i].name, rlim[i].rlim_cur);
+
+ if (rlim[i].rlim_max == RLIM_INFINITY)
+ count += sprintf(&bufptr[count], "%-20s ", "unlimited");
+ else
+ count += sprintf(&bufptr[count], "%-20lu ",
+ rlim[i].rlim_max);
+
+ if (lnames[i].unit)
+ count += sprintf(&bufptr[count], "%-10s\n",
+ lnames[i].unit);
+ else
+ count += sprintf(&bufptr[count], "\n");
+ }
+
+ return count;
+}
+
/************************************************************************/
/* Here the fs part begins */
/************************************************************************/
@@ -2017,6 +2092,7 @@ static const struct pid_entry tgid_base_stuff[] = {
INF("environ", S_IRUSR, pid_environ),
INF("auxv", S_IRUSR, pid_auxv),
INF("status", S_IRUGO, pid_status),
+ INF("limits", S_IRUSR, pid_limits),
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, pid_sched),
#endif
@@ -2310,6 +2386,7 @@ static const struct pid_entry tid_base_stuff[] = {
INF("environ", S_IRUSR, pid_environ),
INF("auxv", S_IRUSR, pid_auxv),
INF("status", S_IRUGO, pid_status),
+ INF("limits", S_IRUSR, pid_limits),
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, pid_sched),
#endif
--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/

2007-08-19 08:57:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: + proc-export-a-processes-resource-limits-via-proc-pid.patch added to -mm tree

On 08/18, Neil Horman wrote:
>
> +static int proc_pid_limits(struct task_struct *task, char *buffer)
> +{
> + unsigned int i;
> + int count = 0;
> + unsigned long flags;
> + char *bufptr = buffer;
> +
> + struct rlimit rlim[RLIM_NLIMITS];
> +
> + rcu_read_lock();
> + lock_task_sighand(task,&flags);
> + if (task->signal == NULL){
> + unlock_task_sighand(task, &flags);
> + rcu_read_unlock();
> + return 0;
> + }
> + memcpy(rlim, task->signal->rlim, sizeof(struct rlimit) * RLIM_NLIMITS);
> + unlock_task_sighand(task, &flags);
> + rcu_read_unlock();

No, no. If lock_task_sighand() fails, ->signal _should be_ == NULL, but the
"if (task->signal == NULL)" check is not reliable, we don't have any barriers
to serialize with __exit_signal().

More importantly, it is very wrong to do unlock_task_sighand() in that case,
this means NULL pointer dereference.

What we need is:

rcu_read_lock();
if (!lock_task_sighand(task, &flags)) {
rcu_read_unlock();
return 0;
}
memcpy(rlim, task->signal->rlim, sizeof(rlim));
unlock_task_sighand(task, &flags);
rcu_read_unlock();

No need to check ->signal != NULL if lock_task_sighand() succeeds.

Oleg.

2007-08-19 13:04:14

by Neil Horman

[permalink] [raw]
Subject: Re: + proc-export-a-processes-resource-limits-via-proc-pid.patch added to -mm tree

Copy That

Currently, there exists no method for a process to query the resource
limits of another process. They can be inferred via some mechanisms but
they cannot be explicitly determined. Given that this information can be
usefull to know during the debugging of an application, I've written this
patch which exports all of a processes limits via /proc/<pid>/limits.


Signed-off-by: Neil Horman <[email protected]>


base.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)


diff --git a/fs/proc/base.c b/fs/proc/base.c
index ed2b224..2eb197f 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -74,6 +74,7 @@
#include <linux/nsproxy.h>
#include <linux/oom.h>
#include <linux/elf.h>
+#include <linux/resource.h>
#include "internal.h"

/* NOTE:
@@ -323,6 +324,78 @@ static int proc_oom_score(struct task_struct *task, char *buffer)
return sprintf(buffer, "%lu\n", points);
}

+struct limit_names {
+ char *name;
+ char *unit;
+};
+
+static const struct limit_names lnames[RLIM_NLIMITS] = {
+ [RLIMIT_CPU] = {"Max cpu time", "ms"},
+ [RLIMIT_FSIZE] = {"Max file size", "bytes"},
+ [RLIMIT_DATA] = {"Max data size", "bytes"},
+ [RLIMIT_STACK] = {"Max stack size", "bytes"},
+ [RLIMIT_CORE] = {"Max core file size", "bytes"},
+ [RLIMIT_RSS] = {"Max resident set", "bytes"},
+ [RLIMIT_NPROC] = {"Max processes", "processes"},
+ [RLIMIT_NOFILE] = {"Max open files", "files"},
+ [RLIMIT_MEMLOCK] = {"Max locked memory", "bytes"},
+ [RLIMIT_AS] = {"Max address space", "bytes"},
+ [RLIMIT_LOCKS] = {"Max file locks", "locks"},
+ [RLIMIT_SIGPENDING] = {"Max pending signals", "signals"},
+ [RLIMIT_MSGQUEUE] = {"Max msgqueue size", "bytes"},
+ [RLIMIT_NICE] = {"Max nice priority", NULL},
+ [RLIMIT_RTPRIO] = {"Max realtime priority", NULL},
+};
+
+/* Display limits for a process */
+static int proc_pid_limits(struct task_struct *task, char *buffer)
+{
+ unsigned int i;
+ int count = 0;
+ unsigned long flags;
+ char *bufptr = buffer;
+
+ struct rlimit rlim[RLIM_NLIMITS];
+
+ rcu_read_lock();
+ if (!lock_task_sighand(task,&flags)) {
+ rcu_read_unlock();
+ return 0;
+ }
+ memcpy(rlim, task->signal->rlim, sizeof(struct rlimit) * RLIM_NLIMITS);
+ unlock_task_sighand(task, &flags);
+ rcu_read_unlock();
+
+ /*
+ * print the file header
+ */
+ count += sprintf(&bufptr[count], "%-25s %-20s %-20s %-10s\n",
+ "Limit", "Soft Limit", "Hard Limit", "Units");
+
+ for (i = 0; i < RLIM_NLIMITS; i++) {
+ if (rlim[i].rlim_cur == RLIM_INFINITY)
+ count += sprintf(&bufptr[count], "%-25s %-20s ",
+ lnames[i].name, "unlimited");
+ else
+ count += sprintf(&bufptr[count], "%-25s %-20lu ",
+ lnames[i].name, rlim[i].rlim_cur);
+
+ if (rlim[i].rlim_max == RLIM_INFINITY)
+ count += sprintf(&bufptr[count], "%-20s ", "unlimited");
+ else
+ count += sprintf(&bufptr[count], "%-20lu ",
+ rlim[i].rlim_max);
+
+ if (lnames[i].unit)
+ count += sprintf(&bufptr[count], "%-10s\n",
+ lnames[i].unit);
+ else
+ count += sprintf(&bufptr[count], "\n");
+ }
+
+ return count;
+}
+
/************************************************************************/
/* Here the fs part begins */
/************************************************************************/
@@ -2017,6 +2090,7 @@ static const struct pid_entry tgid_base_stuff[] = {
INF("environ", S_IRUSR, pid_environ),
INF("auxv", S_IRUSR, pid_auxv),
INF("status", S_IRUGO, pid_status),
+ INF("limits", S_IRUSR, pid_limits),
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, pid_sched),
#endif
@@ -2310,6 +2384,7 @@ static const struct pid_entry tid_base_stuff[] = {
INF("environ", S_IRUSR, pid_environ),
INF("auxv", S_IRUSR, pid_auxv),
INF("status", S_IRUGO, pid_status),
+ INF("limits", S_IRUSR, pid_limits),
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, pid_sched),
#endif
--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/

2007-08-19 15:17:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: + proc-export-a-processes-resource-limits-via-proc-pid.patch added to -mm tree

On 08/19, Neil Horman wrote:
> Copy That
>
> Currently, there exists no method for a process to query the resource
> limits of another process. They can be inferred via some mechanisms but
> they cannot be explicitly determined. Given that this information can be
> usefull to know during the debugging of an application, I've written this
> patch which exports all of a processes limits via /proc/<pid>/limits.

Looks good, thanks.

Oleg.