2008-12-30 11:15:45

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH for -mm] getrusage: fill ru_maxrss value

Hi

Oleg, Jiri, this is my getrusage testcase and proposal patch.
Could you please review it?



Changes Jiris's last version
- At wait_task_zombie(), parent process doesn't only collect child maxrss,
but also cmaxrss.
- ru_maxrss inherit at exec()
- style fixes.

Applied after: introduce-get_mm_hiwater_xxx-fix-taskstats-hiwater_xxx-accounting.patch
==
From: Signed-off-by: Jiri Pirko <[email protected]>
Subject: [PATCH for -mm] getrusage: fill ru_maxrss value

This patch makes ->ru_maxrss value in struct rusage filled accordingly to
rss hiwater mark. This struct is filled as a parameter to
getrusage syscall. ->ru_maxrss value is set to KBs which is the way it
is done in BSD systems. /usr/bin/time (gnu time) application converts
->ru_maxrss to KBs which seems to be incorrect behavior. Maintainer of
this util was notified by me with the patch which corrects it and cc'ed.

To make this happen we extend struct signal_struct by two fields. The
first one is ->maxrss which we use to store rss hiwater of the task. The
second one is ->cmaxrss which we use to store highest rss hiwater of all
task childs. These values are used in k_getrusage() to actually fill
->ru_maxrss. k_getrusage() uses current rss hiwater value directly
if mm struct exists.

Note:
exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss.
it is intetionally behavior. *BSD getrusage have exec() inheriting.


Test progmam and test case
===========================

getrusage.c
----
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>

static void consume(int mega)
{
size_t sz = mega * 1024 * 1024;
void *ptr;

ptr = malloc(sz);
memset(ptr, 0, sz);
usleep(1); /* BSD rusage statics need to sleep 1 tick */
}

static void show_rusage(char *prefix)
{
int err, err2;
struct rusage rusage_self;
struct rusage rusage_children;

printf("%s: ", prefix);
err = getrusage(RUSAGE_SELF, &rusage_self);
if (!err)
printf("self %ld ", rusage_self.ru_maxrss);
err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
if (!err2)
printf("children %ld ", rusage_children.ru_maxrss);

printf("\n");
}

int main(int argc, char** argv)
{
int status;
int c;
int need_sleep_before_wait = 0;
int consume_large_memory_at_first = 0;
int create_child_at_first = 0;
int sigign = 0;
int create_child_before_exec = 0;
int after_fork_test = 0;

while ((c = getopt(argc, argv, "ceflsz")) != -1) {
switch (c) {
case 'c':
create_child_at_first = 1;
break;
case 'e':
create_child_before_exec = 1;
break;
case 'f':
after_fork_test = 1;
break;
case 'l':
consume_large_memory_at_first = 1;
break;
case 's':
sigign = 1;
break;
case 'z':
need_sleep_before_wait = 1;
break;
default:
break;
}
}

if (consume_large_memory_at_first)
consume(100);

if (create_child_at_first)
system("./child -q");

if (sigign)
signal(SIGCHLD, SIG_IGN);

if (fork()) {
usleep(1);
if (need_sleep_before_wait)
sleep(3); /* children become zombie */
show_rusage("pre_wait");
wait(&status);
show_rusage("post_wait");
} else {
usleep(1);
show_rusage("fork");

if (after_fork_test) {
consume(30);
show_rusage("fork2");
}
if (create_child_before_exec) {
system("./child -lq");
usleep(1);
show_rusage("fork3");
}

execl("./child", "child", 0);
exit(0);
}

return 0;
}

child.c
----
#include <sys/types.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

static void consume(int mega)
{
size_t sz = mega * 1024 * 1024;
void *ptr;

ptr = malloc(sz);
memset(ptr, 0, sz);
usleep(1); /* BSD rusage statics need to sleep 1 tick */
}

static void show_rusage(char *prefix)
{
int err, err2;
struct rusage rusage_self;
struct rusage rusage_children;

printf("%s: ", prefix);
err = getrusage(RUSAGE_SELF, &rusage_self);
if (!err)
printf("self %ld ", rusage_self.ru_maxrss);
err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
if (!err2)
printf("children %ld ", rusage_children.ru_maxrss);

printf("\n");

}


int main(int argc, char** argv)
{
int status;
int c;
int silent = 0;
int light_weight = 0;

while ((c = getopt(argc, argv, "lq")) != -1) {
switch (c) {
case 'l':
light_weight = 1;
break;
case 'q':
silent = 1;
break;
default:
break;
}
}

if (!silent)
show_rusage("exec");

if (fork()) {
if (light_weight)
consume(400);
else
consume(700);
wait(&status);
} else {
if (light_weight)
consume(600);
else
consume(900);

exit(0);
}

return 0;
}

testcase
==================
1. inherit fork?

test way:
% ./getrusage -lc

bsd result:
fork line is "fork: self 0 children 0".

-> rusage sholdn't be inherit by fork.
(both RUSAGE_SELF and RUSAGE_CHILDREN)

2. inherit exec?

test way:
% ./getrusage -lce

bsd result:
fork3: self 103204 children 60000
exec: self 103204 children 60000

fork3 and exec line are the same.

-> rusage shold be inherit by exec.
(both RUSAGE_SELF and RUSAGE_CHILDREN)

3. getrusage(RUSAGE_CHILDREN) collect grandchild statics?

test way:
% ./getrusage

bsd result:
post_wait line is about "post_wait: self 0 children 90000".

-> RUSAGE_CHILDREN can collect grandchild.

4. zombie, but not waited children collect or not?

test way:
% ./getrusage -z

bsd result:
pre_wait line is "pre_wait: self 0 children 0".

-> zombie child process (not waited-for child process)
isn't accounted.

5. SIG_IGN collect or not

test way:
% ./getrusage -s

bsd result:
post_wait line is "post_wait: self 0 children 0".

-> if SIGCHLD is ignored, children isn't accounted.

6. fork and malloc
test way:
% ./getrusage -lcf

bsd result:
fork line is "fork: self 0 children 0".
fork2 line is about "fork: self 130000 children 0".

-> rusage sholdn't be inherit by fork.
(both RUSAGE_SELF and RUSAGE_CHILDREN)
but additional memory cunsumption cause right
maxrss calculation.


Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
fs/exec.c | 5 +++++
include/linux/sched.h | 1 +
kernel/exit.c | 6 ++++++
kernel/fork.c | 1 +
kernel/sys.c | 15 +++++++++++++++
5 files changed, 28 insertions(+)

Index: b/include/linux/sched.h
===================================================================
--- a/include/linux/sched.h 2008-12-29 23:27:59.000000000 +0900
+++ b/include/linux/sched.h 2008-12-30 03:25:23.000000000 +0900
@@ -562,6 +562,7 @@ struct signal_struct {
unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
unsigned long inblock, oublock, cinblock, coublock;
+ unsigned long maxrss, cmaxrss;
struct task_io_accounting ioac;

/*
Index: b/kernel/exit.c
===================================================================
--- a/kernel/exit.c 2008-12-29 23:27:59.000000000 +0900
+++ b/kernel/exit.c 2008-12-30 17:35:51.000000000 +0900
@@ -1053,6 +1053,10 @@ NORET_TYPE void do_exit(long code)
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
exit_itimers(tsk->signal);
+ if (tsk->mm) {
+ unsigned long maxrss = get_mm_hiwater_rss(tsk->mm);
+ tsk->signal->maxrss = max(maxrss, tsk->signal->maxrss);
+ }
}
acct_collect(code, group_dead);
if (group_dead)
@@ -1349,6 +1353,8 @@ static int wait_task_zombie(struct task_
psig->coublock +=
task_io_get_oublock(p) +
sig->oublock + sig->coublock;
+ psig->cmaxrss = max(max(sig->maxrss, sig->cmaxrss),
+ psig->cmaxrss);
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
spin_unlock_irq(&p->parent->sighand->siglock);
Index: b/kernel/fork.c
===================================================================
--- a/kernel/fork.c 2008-12-25 08:26:37.000000000 +0900
+++ b/kernel/fork.c 2008-12-30 03:48:09.000000000 +0900
@@ -849,6 +849,7 @@ static int copy_signal(unsigned long clo
sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
+ sig->maxrss = sig->cmaxrss = 0;
task_io_accounting_init(&sig->ioac);
taskstats_tgid_init(sig);

Index: b/kernel/sys.c
===================================================================
--- a/kernel/sys.c 2008-12-25 08:26:37.000000000 +0900
+++ b/kernel/sys.c 2008-12-30 04:04:18.000000000 +0900
@@ -1569,6 +1569,7 @@ static void k_getrusage(struct task_stru
r->ru_majflt = p->signal->cmaj_flt;
r->ru_inblock = p->signal->cinblock;
r->ru_oublock = p->signal->coublock;
+ r->ru_maxrss = p->signal->cmaxrss;

if (who == RUSAGE_CHILDREN)
break;
@@ -1583,6 +1584,8 @@ static void k_getrusage(struct task_stru
r->ru_majflt += p->signal->maj_flt;
r->ru_inblock += p->signal->inblock;
r->ru_oublock += p->signal->oublock;
+ if (r->ru_maxrss < p->signal->maxrss)
+ r->ru_maxrss = p->signal->maxrss;
t = p;
do {
accumulate_thread_rusage(t, r);
@@ -1598,6 +1601,18 @@ static void k_getrusage(struct task_stru
out:
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
+
+ if (who != RUSAGE_CHILDREN) {
+ struct mm_struct *mm = get_task_mm(p);
+ if (mm) {
+ unsigned long maxrss = get_mm_hiwater_rss(mm);
+
+ if (r->ru_maxrss < maxrss)
+ r->ru_maxrss = maxrss;
+ mmput(mm);
+ }
+ }
+ r->ru_maxrss <<= PAGE_SHIFT - 10;
}

int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
Index: b/fs/exec.c
===================================================================
--- a/fs/exec.c 2008-12-25 08:26:37.000000000 +0900
+++ b/fs/exec.c 2008-12-30 17:42:32.000000000 +0900
@@ -774,6 +774,7 @@ static int de_thread(struct task_struct
spinlock_t *lock = &oldsighand->siglock;
struct task_struct *leader = NULL;
int count;
+ unsigned long maxrss = 0;

if (thread_group_empty(tsk))
goto no_thread_group;
@@ -870,6 +871,10 @@ static int de_thread(struct task_struct
sig->notify_count = 0;

no_thread_group:
+ if (current->mm)
+ maxrss = get_mm_hiwater_rss(current->mm);
+ sig->maxrss = max(sig->maxrss, maxrss);
+
exit_itimers(sig);
flush_itimer_signals();
if (leader)


2008-12-31 10:09:31

by Jiri Pirko

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

On Tue, 30 Dec 2008 20:15:33 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> Hi
>
> Oleg, Jiri, this is my getrusage testcase and proposal patch.
> Could you please review it?
>
>
>
> Changes Jiris's last version
> - At wait_task_zombie(), parent process doesn't only collect child maxrss,
> but also cmaxrss.
Yes, this is what we were missing so far.

> - ru_maxrss inherit at exec()
> - style fixes.

New patch seems almost fine to me. Only I do not like usage of max()
macro in do_exit() and in de_thread(). It's unnecessary and little
wasting (yes only a little :)). This two spots would be maybe better
solved with if(<).

>
> Applied after: introduce-get_mm_hiwater_xxx-fix-taskstats-hiwater_xxx-accounting.patch
> ==
> From: Signed-off-by: Jiri Pirko <[email protected]>
> Subject: [PATCH for -mm] getrusage: fill ru_maxrss value
>
> This patch makes ->ru_maxrss value in struct rusage filled accordingly to
> rss hiwater mark. This struct is filled as a parameter to
> getrusage syscall. ->ru_maxrss value is set to KBs which is the way it
> is done in BSD systems. /usr/bin/time (gnu time) application converts
> ->ru_maxrss to KBs which seems to be incorrect behavior. Maintainer of
> this util was notified by me with the patch which corrects it and cc'ed.
>
> To make this happen we extend struct signal_struct by two fields. The
> first one is ->maxrss which we use to store rss hiwater of the task. The
> second one is ->cmaxrss which we use to store highest rss hiwater of all
> task childs. These values are used in k_getrusage() to actually fill
> ->ru_maxrss. k_getrusage() uses current rss hiwater value directly
> if mm struct exists.
>
> Note:
> exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss.
> it is intetionally behavior. *BSD getrusage have exec() inheriting.
>
>
> Test progmam and test case
> ===========================
>
> getrusage.c
> ----
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/types.h>
> #include <sys/time.h>
> #include <sys/resource.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <unistd.h>
> #include <signal.h>
>
> static void consume(int mega)
> {
> size_t sz = mega * 1024 * 1024;
> void *ptr;
>
> ptr = malloc(sz);
> memset(ptr, 0, sz);
> usleep(1); /* BSD rusage statics need to sleep 1 tick */
> }
>
> static void show_rusage(char *prefix)
> {
> int err, err2;
> struct rusage rusage_self;
> struct rusage rusage_children;
>
> printf("%s: ", prefix);
> err = getrusage(RUSAGE_SELF, &rusage_self);
> if (!err)
> printf("self %ld ", rusage_self.ru_maxrss);
> err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
> if (!err2)
> printf("children %ld ", rusage_children.ru_maxrss);
>
> printf("\n");
> }
>
> int main(int argc, char** argv)
> {
> int status;
> int c;
> int need_sleep_before_wait = 0;
> int consume_large_memory_at_first = 0;
> int create_child_at_first = 0;
> int sigign = 0;
> int create_child_before_exec = 0;
> int after_fork_test = 0;
>
> while ((c = getopt(argc, argv, "ceflsz")) != -1) {
> switch (c) {
> case 'c':
> create_child_at_first = 1;
> break;
> case 'e':
> create_child_before_exec = 1;
> break;
> case 'f':
> after_fork_test = 1;
> break;
> case 'l':
> consume_large_memory_at_first = 1;
> break;
> case 's':
> sigign = 1;
> break;
> case 'z':
> need_sleep_before_wait = 1;
> break;
> default:
> break;
> }
> }
>
> if (consume_large_memory_at_first)
> consume(100);
>
> if (create_child_at_first)
> system("./child -q");
>
> if (sigign)
> signal(SIGCHLD, SIG_IGN);
>
> if (fork()) {
> usleep(1);
> if (need_sleep_before_wait)
> sleep(3); /* children become zombie */
> show_rusage("pre_wait");
> wait(&status);
> show_rusage("post_wait");
> } else {
> usleep(1);
> show_rusage("fork");
>
> if (after_fork_test) {
> consume(30);
> show_rusage("fork2");
> }
> if (create_child_before_exec) {
> system("./child -lq");
> usleep(1);
> show_rusage("fork3");
> }
>
> execl("./child", "child", 0);
> exit(0);
> }
>
> return 0;
> }
>
> child.c
> ----
> #include <sys/types.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/types.h>
> #include <sys/time.h>
> #include <sys/resource.h>
>
> static void consume(int mega)
> {
> size_t sz = mega * 1024 * 1024;
> void *ptr;
>
> ptr = malloc(sz);
> memset(ptr, 0, sz);
> usleep(1); /* BSD rusage statics need to sleep 1 tick */
> }
>
> static void show_rusage(char *prefix)
> {
> int err, err2;
> struct rusage rusage_self;
> struct rusage rusage_children;
>
> printf("%s: ", prefix);
> err = getrusage(RUSAGE_SELF, &rusage_self);
> if (!err)
> printf("self %ld ", rusage_self.ru_maxrss);
> err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
> if (!err2)
> printf("children %ld ", rusage_children.ru_maxrss);
>
> printf("\n");
>
> }
>
>
> int main(int argc, char** argv)
> {
> int status;
> int c;
> int silent = 0;
> int light_weight = 0;
>
> while ((c = getopt(argc, argv, "lq")) != -1) {
> switch (c) {
> case 'l':
> light_weight = 1;
> break;
> case 'q':
> silent = 1;
> break;
> default:
> break;
> }
> }
>
> if (!silent)
> show_rusage("exec");
>
> if (fork()) {
> if (light_weight)
> consume(400);
> else
> consume(700);
> wait(&status);
> } else {
> if (light_weight)
> consume(600);
> else
> consume(900);
>
> exit(0);
> }
>
> return 0;
> }
>
> testcase
> ==================
> 1. inherit fork?
>
> test way:
> % ./getrusage -lc
>
> bsd result:
> fork line is "fork: self 0 children 0".
>
> -> rusage sholdn't be inherit by fork.
> (both RUSAGE_SELF and RUSAGE_CHILDREN)
>
> 2. inherit exec?
>
> test way:
> % ./getrusage -lce
>
> bsd result:
> fork3: self 103204 children 60000
> exec: self 103204 children 60000
>
> fork3 and exec line are the same.
>
> -> rusage shold be inherit by exec.
> (both RUSAGE_SELF and RUSAGE_CHILDREN)
>
> 3. getrusage(RUSAGE_CHILDREN) collect grandchild statics?
>
> test way:
> % ./getrusage
>
> bsd result:
> post_wait line is about "post_wait: self 0 children 90000".
>
> -> RUSAGE_CHILDREN can collect grandchild.
>
> 4. zombie, but not waited children collect or not?
>
> test way:
> % ./getrusage -z
>
> bsd result:
> pre_wait line is "pre_wait: self 0 children 0".
>
> -> zombie child process (not waited-for child process)
> isn't accounted.
>
> 5. SIG_IGN collect or not
>
> test way:
> % ./getrusage -s
>
> bsd result:
> post_wait line is "post_wait: self 0 children 0".
>
> -> if SIGCHLD is ignored, children isn't accounted.
>
> 6. fork and malloc
> test way:
> % ./getrusage -lcf
>
> bsd result:
> fork line is "fork: self 0 children 0".
> fork2 line is about "fork: self 130000 children 0".
>
> -> rusage sholdn't be inherit by fork.
> (both RUSAGE_SELF and RUSAGE_CHILDREN)
> but additional memory cunsumption cause right
> maxrss calculation.
>
Testcases look good. Thanks for doing this.

>
> Signed-off-by: Jiri Pirko <[email protected]>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> fs/exec.c | 5 +++++
> include/linux/sched.h | 1 +
> kernel/exit.c | 6 ++++++
> kernel/fork.c | 1 +
> kernel/sys.c | 15 +++++++++++++++
> 5 files changed, 28 insertions(+)
>
> Index: b/include/linux/sched.h
> ===================================================================
> --- a/include/linux/sched.h 2008-12-29 23:27:59.000000000 +0900
> +++ b/include/linux/sched.h 2008-12-30 03:25:23.000000000 +0900
> @@ -562,6 +562,7 @@ struct signal_struct {
> unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
> unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
> unsigned long inblock, oublock, cinblock, coublock;
> + unsigned long maxrss, cmaxrss;
> struct task_io_accounting ioac;
>
> /*
> Index: b/kernel/exit.c
> ===================================================================
> --- a/kernel/exit.c 2008-12-29 23:27:59.000000000 +0900
> +++ b/kernel/exit.c 2008-12-30 17:35:51.000000000 +0900
> @@ -1053,6 +1053,10 @@ NORET_TYPE void do_exit(long code)
> if (group_dead) {
> hrtimer_cancel(&tsk->signal->real_timer);
> exit_itimers(tsk->signal);
> + if (tsk->mm) {
> + unsigned long maxrss = get_mm_hiwater_rss(tsk->mm);
> + tsk->signal->maxrss = max(maxrss, tsk->signal->maxrss);
> + }
> }
> acct_collect(code, group_dead);
> if (group_dead)
> @@ -1349,6 +1353,8 @@ static int wait_task_zombie(struct task_
> psig->coublock +=
> task_io_get_oublock(p) +
> sig->oublock + sig->coublock;
> + psig->cmaxrss = max(max(sig->maxrss, sig->cmaxrss),
> + psig->cmaxrss);
> task_io_accounting_add(&psig->ioac, &p->ioac);
> task_io_accounting_add(&psig->ioac, &sig->ioac);
> spin_unlock_irq(&p->parent->sighand->siglock);
> Index: b/kernel/fork.c
> ===================================================================
> --- a/kernel/fork.c 2008-12-25 08:26:37.000000000 +0900
> +++ b/kernel/fork.c 2008-12-30 03:48:09.000000000 +0900
> @@ -849,6 +849,7 @@ static int copy_signal(unsigned long clo
> sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
> sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
> sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
> + sig->maxrss = sig->cmaxrss = 0;
> task_io_accounting_init(&sig->ioac);
> taskstats_tgid_init(sig);
>
> Index: b/kernel/sys.c
> ===================================================================
> --- a/kernel/sys.c 2008-12-25 08:26:37.000000000 +0900
> +++ b/kernel/sys.c 2008-12-30 04:04:18.000000000 +0900
> @@ -1569,6 +1569,7 @@ static void k_getrusage(struct task_stru
> r->ru_majflt = p->signal->cmaj_flt;
> r->ru_inblock = p->signal->cinblock;
> r->ru_oublock = p->signal->coublock;
> + r->ru_maxrss = p->signal->cmaxrss;
>
> if (who == RUSAGE_CHILDREN)
> break;
> @@ -1583,6 +1584,8 @@ static void k_getrusage(struct task_stru
> r->ru_majflt += p->signal->maj_flt;
> r->ru_inblock += p->signal->inblock;
> r->ru_oublock += p->signal->oublock;
> + if (r->ru_maxrss < p->signal->maxrss)
> + r->ru_maxrss = p->signal->maxrss;
> t = p;
> do {
> accumulate_thread_rusage(t, r);
> @@ -1598,6 +1601,18 @@ static void k_getrusage(struct task_stru
> out:
> cputime_to_timeval(utime, &r->ru_utime);
> cputime_to_timeval(stime, &r->ru_stime);
> +
> + if (who != RUSAGE_CHILDREN) {
> + struct mm_struct *mm = get_task_mm(p);
> + if (mm) {
> + unsigned long maxrss = get_mm_hiwater_rss(mm);
> +
> + if (r->ru_maxrss < maxrss)
> + r->ru_maxrss = maxrss;
> + mmput(mm);
> + }
> + }
> + r->ru_maxrss <<= PAGE_SHIFT - 10;
> }
>
> int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
> Index: b/fs/exec.c
> ===================================================================
> --- a/fs/exec.c 2008-12-25 08:26:37.000000000 +0900
> +++ b/fs/exec.c 2008-12-30 17:42:32.000000000 +0900
> @@ -774,6 +774,7 @@ static int de_thread(struct task_struct
> spinlock_t *lock = &oldsighand->siglock;
> struct task_struct *leader = NULL;
> int count;
> + unsigned long maxrss = 0;
>
> if (thread_group_empty(tsk))
> goto no_thread_group;
> @@ -870,6 +871,10 @@ static int de_thread(struct task_struct
> sig->notify_count = 0;
>
> no_thread_group:
> + if (current->mm)
> + maxrss = get_mm_hiwater_rss(current->mm);
> + sig->maxrss = max(sig->maxrss, maxrss);
> +
> exit_itimers(sig);
> flush_itimer_signals();
> if (leader)
>
>

2008-12-31 12:43:21

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

Hi!

> > Changes Jiris's last version
> > - At wait_task_zombie(), parent process doesn't only collect child maxrss,
> > but also cmaxrss.
> Yes, this is what we were missing so far.
>
> > - ru_maxrss inherit at exec()
> > - style fixes.
>
> New patch seems almost fine to me. Only I do not like usage of max()
> macro in do_exit() and in de_thread(). It's unnecessary and little
> wasting (yes only a little :)). This two spots would be maybe better
> solved with if(<).

Done!



Changelog
v1 -> v2
- Removed unnecessary max() macro
- To avoid ru_maxrss recalculation in k_getrusage()
- style fixes

Jiri's resend3 -> v1
- At wait_task_zombie(), parent process doesn't only collect child maxrss,
but also cmaxrss.
- ru_maxrss inherit at exec()
- style fixes


Applied after: introduce-get_mm_hiwater_xxx-fix-taskstats-hiwater_xxx-accounting.patch
==
From: Signed-off-by: Jiri Pirko <[email protected]>
Subject: [PATCH for -mm] getrusage: fill ru_maxrss value

This patch makes ->ru_maxrss value in struct rusage filled accordingly to
rss hiwater mark. This struct is filled as a parameter to
getrusage syscall. ->ru_maxrss value is set to KBs which is the way it
is done in BSD systems. /usr/bin/time (gnu time) application converts
->ru_maxrss to KBs which seems to be incorrect behavior. Maintainer of
this util was notified by me with the patch which corrects it and cc'ed.

To make this happen we extend struct signal_struct by two fields. The
first one is ->maxrss which we use to store rss hiwater of the task. The
second one is ->cmaxrss which we use to store highest rss hiwater of all
task childs. These values are used in k_getrusage() to actually fill
->ru_maxrss. k_getrusage() uses current rss hiwater value directly
if mm struct exists.

Note:
exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss.
it is intetionally behavior. *BSD getrusage have exec() inheriting.


Test progmam and test case
===========================

getrusage.c
----
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>

static void consume(int mega)
{
size_t sz = mega * 1024 * 1024;
void *ptr;

ptr = malloc(sz);
memset(ptr, 0, sz);
usleep(1); /* BSD rusage statics need to sleep 1 tick */
}

static void show_rusage(char *prefix)
{
int err, err2;
struct rusage rusage_self;
struct rusage rusage_children;

printf("%s: ", prefix);
err = getrusage(RUSAGE_SELF, &rusage_self);
if (!err)
printf("self %ld ", rusage_self.ru_maxrss);
err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
if (!err2)
printf("children %ld ", rusage_children.ru_maxrss);

printf("\n");
}

int main(int argc, char** argv)
{
int status;
int c;
int need_sleep_before_wait = 0;
int consume_large_memory_at_first = 0;
int create_child_at_first = 0;
int sigign = 0;
int create_child_before_exec = 0;
int after_fork_test = 0;

while ((c = getopt(argc, argv, "ceflsz")) != -1) {
switch (c) {
case 'c':
create_child_at_first = 1;
break;
case 'e':
create_child_before_exec = 1;
break;
case 'f':
after_fork_test = 1;
break;
case 'l':
consume_large_memory_at_first = 1;
break;
case 's':
sigign = 1;
break;
case 'z':
need_sleep_before_wait = 1;
break;
default:
break;
}
}

if (consume_large_memory_at_first)
consume(100);

if (create_child_at_first)
system("./child -q");

if (sigign)
signal(SIGCHLD, SIG_IGN);

if (fork()) {
usleep(1);
if (need_sleep_before_wait)
sleep(3); /* children become zombie */
show_rusage("pre_wait");
wait(&status);
show_rusage("post_wait");
} else {
usleep(1);
show_rusage("fork");

if (after_fork_test) {
consume(30);
show_rusage("fork2");
}
if (create_child_before_exec) {
system("./child -lq");
usleep(1);
show_rusage("fork3");
}

execl("./child", "child", 0);
exit(0);
}

return 0;
}

child.c
----
#include <sys/types.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

static void consume(int mega)
{
size_t sz = mega * 1024 * 1024;
void *ptr;

ptr = malloc(sz);
memset(ptr, 0, sz);
usleep(1); /* BSD rusage statics need to sleep 1 tick */
}

static void show_rusage(char *prefix)
{
int err, err2;
struct rusage rusage_self;
struct rusage rusage_children;

printf("%s: ", prefix);
err = getrusage(RUSAGE_SELF, &rusage_self);
if (!err)
printf("self %ld ", rusage_self.ru_maxrss);
err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
if (!err2)
printf("children %ld ", rusage_children.ru_maxrss);

printf("\n");

}


int main(int argc, char** argv)
{
int status;
int c;
int silent = 0;
int light_weight = 0;

while ((c = getopt(argc, argv, "lq")) != -1) {
switch (c) {
case 'l':
light_weight = 1;
break;
case 'q':
silent = 1;
break;
default:
break;
}
}

if (!silent)
show_rusage("exec");

if (fork()) {
if (light_weight)
consume(400);
else
consume(700);
wait(&status);
} else {
if (light_weight)
consume(600);
else
consume(900);

exit(0);
}

return 0;
}

testcase
==================
1. inherit fork?

test way:
% ./getrusage -lc

bsd result:
fork line is "fork: self 0 children 0".

-> rusage sholdn't be inherit by fork.
(both RUSAGE_SELF and RUSAGE_CHILDREN)

2. inherit exec?

test way:
% ./getrusage -lce

bsd result:
fork3: self 103204 children 60000
exec: self 103204 children 60000

fork3 and exec line are the same.

-> rusage shold be inherit by exec.
(both RUSAGE_SELF and RUSAGE_CHILDREN)

3. getrusage(RUSAGE_CHILDREN) collect grandchild statics?

test way:
% ./getrusage

bsd result:
post_wait line is about "post_wait: self 0 children 90000".

-> RUSAGE_CHILDREN can collect grandchild.

4. zombie, but not waited children collect or not?

test way:
% ./getrusage -z

bsd result:
pre_wait line is "pre_wait: self 0 children 0".

-> zombie child process (not waited-for child process)
isn't accounted.

5. SIG_IGN collect or not

test way:
% ./getrusage -s

bsd result:
post_wait line is "post_wait: self 0 children 0".

-> if SIGCHLD is ignored, children isn't accounted.

6. fork and malloc
test way:
% ./getrusage -lcf

bsd result:
fork line is "fork: self 0 children 0".
fork2 line is about "fork: self 130000 children 0".

-> rusage sholdn't be inherit by fork.
(both RUSAGE_SELF and RUSAGE_CHILDREN)
but additional memory cunsumption cause right
maxrss calculation.


Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
fs/exec.c | 7 +++++++
include/linux/sched.h | 1 +
kernel/exit.c | 10 ++++++++++
kernel/fork.c | 1 +
kernel/sys.c | 16 ++++++++++++++++
5 files changed, 35 insertions(+)

Index: b/include/linux/sched.h
===================================================================
--- a/include/linux/sched.h 2008-12-29 23:27:59.000000000 +0900
+++ b/include/linux/sched.h 2008-12-30 03:25:23.000000000 +0900
@@ -562,6 +562,7 @@ struct signal_struct {
unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
unsigned long inblock, oublock, cinblock, coublock;
+ unsigned long maxrss, cmaxrss;
struct task_io_accounting ioac;

/*
Index: b/kernel/exit.c
===================================================================
--- a/kernel/exit.c 2008-12-29 23:27:59.000000000 +0900
+++ b/kernel/exit.c 2008-12-31 21:08:08.000000000 +0900
@@ -1053,6 +1053,12 @@ NORET_TYPE void do_exit(long code)
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
exit_itimers(tsk->signal);
+ if (tsk->mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(tsk->mm);
+
+ if (tsk->signal->maxrss < hiwater_rss)
+ tsk->signal->maxrss = hiwater_rss;
+ }
}
acct_collect(code, group_dead);
if (group_dead)
@@ -1298,6 +1304,7 @@ static int wait_task_zombie(struct task_
struct signal_struct *psig;
struct signal_struct *sig;
struct task_cputime cputime;
+ unsigned long maxrss;

/*
* The resource counters for the group leader are in its
@@ -1349,6 +1356,9 @@ static int wait_task_zombie(struct task_
psig->coublock +=
task_io_get_oublock(p) +
sig->oublock + sig->coublock;
+ maxrss = max(sig->maxrss, sig->cmaxrss);
+ if (psig->cmaxrss < maxrss)
+ psig->cmaxrss = maxrss;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
spin_unlock_irq(&p->parent->sighand->siglock);
Index: b/kernel/fork.c
===================================================================
--- a/kernel/fork.c 2008-12-25 08:26:37.000000000 +0900
+++ b/kernel/fork.c 2008-12-30 03:48:09.000000000 +0900
@@ -849,6 +849,7 @@ static int copy_signal(unsigned long clo
sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
+ sig->maxrss = sig->cmaxrss = 0;
task_io_accounting_init(&sig->ioac);
taskstats_tgid_init(sig);

Index: b/kernel/sys.c
===================================================================
--- a/kernel/sys.c 2008-12-25 08:26:37.000000000 +0900
+++ b/kernel/sys.c 2008-12-31 21:06:03.000000000 +0900
@@ -1546,6 +1546,7 @@ static void k_getrusage(struct task_stru
unsigned long flags;
cputime_t utime, stime;
struct task_cputime cputime;
+ unsigned long maxrss = 0;

memset((char *) r, 0, sizeof *r);
utime = stime = cputime_zero;
@@ -1569,6 +1570,7 @@ static void k_getrusage(struct task_stru
r->ru_majflt = p->signal->cmaj_flt;
r->ru_inblock = p->signal->cinblock;
r->ru_oublock = p->signal->coublock;
+ maxrss = p->signal->cmaxrss;

if (who == RUSAGE_CHILDREN)
break;
@@ -1583,6 +1585,8 @@ static void k_getrusage(struct task_stru
r->ru_majflt += p->signal->maj_flt;
r->ru_inblock += p->signal->inblock;
r->ru_oublock += p->signal->oublock;
+ if (maxrss < p->signal->maxrss)
+ maxrss = p->signal->maxrss;
t = p;
do {
accumulate_thread_rusage(t, r);
@@ -1598,6 +1602,18 @@ static void k_getrusage(struct task_stru
out:
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
+
+ if (who != RUSAGE_CHILDREN) {
+ struct mm_struct *mm = get_task_mm(p);
+ if (mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(mm);
+
+ if (maxrss < hiwater_rss)
+ maxrss = hiwater_rss;
+ mmput(mm);
+ }
+ }
+ r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */
}

int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
Index: b/fs/exec.c
===================================================================
--- a/fs/exec.c 2008-12-25 08:26:37.000000000 +0900
+++ b/fs/exec.c 2008-12-31 21:11:28.000000000 +0900
@@ -870,6 +870,13 @@ static int de_thread(struct task_struct
sig->notify_count = 0;

no_thread_group:
+ if (current->mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(current->mm);
+
+ if (sig->maxrss < hiwater_rss)
+ sig->maxrss = hiwater_rss;
+ }
+
exit_itimers(sig);
flush_itimer_signals();
if (leader)

2008-12-31 18:09:19

by Jiri Pirko

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

On Wed, 31 Dec 2008 21:42:36 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> Hi!
>
> > > Changes Jiris's last version
> > > - At wait_task_zombie(), parent process doesn't only collect child maxrss,
> > > but also cmaxrss.
> > Yes, this is what we were missing so far.
> >
> > > - ru_maxrss inherit at exec()
> > > - style fixes.
> >
> > New patch seems almost fine to me. Only I do not like usage of max()
> > macro in do_exit() and in de_thread(). It's unnecessary and little
> > wasting (yes only a little :)). This two spots would be maybe better
> > solved with if(<).
>
> Done!
>
>
Okay - now I'm happy with this patch. Let's see with what Oleg will come :)

>
> Changelog
> v1 -> v2
> - Removed unnecessary max() macro
> - To avoid ru_maxrss recalculation in k_getrusage()
> - style fixes
>
> Jiri's resend3 -> v1
> - At wait_task_zombie(), parent process doesn't only collect child maxrss,
> but also cmaxrss.
> - ru_maxrss inherit at exec()
> - style fixes
>
>
> Applied after: introduce-get_mm_hiwater_xxx-fix-taskstats-hiwater_xxx-accounting.patch
> ==
> From: Signed-off-by: Jiri Pirko <[email protected]>
> Subject: [PATCH for -mm] getrusage: fill ru_maxrss value
>
> This patch makes ->ru_maxrss value in struct rusage filled accordingly to
> rss hiwater mark. This struct is filled as a parameter to
> getrusage syscall. ->ru_maxrss value is set to KBs which is the way it
> is done in BSD systems. /usr/bin/time (gnu time) application converts
> ->ru_maxrss to KBs which seems to be incorrect behavior. Maintainer of
> this util was notified by me with the patch which corrects it and cc'ed.
>
> To make this happen we extend struct signal_struct by two fields. The
> first one is ->maxrss which we use to store rss hiwater of the task. The
> second one is ->cmaxrss which we use to store highest rss hiwater of all
> task childs. These values are used in k_getrusage() to actually fill
> ->ru_maxrss. k_getrusage() uses current rss hiwater value directly
> if mm struct exists.
>
> Note:
> exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss.
> it is intetionally behavior. *BSD getrusage have exec() inheriting.
>
>
> Test progmam and test case
> ===========================
>
> getrusage.c
> ----
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/types.h>
> #include <sys/time.h>
> #include <sys/resource.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <unistd.h>
> #include <signal.h>
>
> static void consume(int mega)
> {
> size_t sz = mega * 1024 * 1024;
> void *ptr;
>
> ptr = malloc(sz);
> memset(ptr, 0, sz);
> usleep(1); /* BSD rusage statics need to sleep 1 tick */
> }
>
> static void show_rusage(char *prefix)
> {
> int err, err2;
> struct rusage rusage_self;
> struct rusage rusage_children;
>
> printf("%s: ", prefix);
> err = getrusage(RUSAGE_SELF, &rusage_self);
> if (!err)
> printf("self %ld ", rusage_self.ru_maxrss);
> err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
> if (!err2)
> printf("children %ld ", rusage_children.ru_maxrss);
>
> printf("\n");
> }
>
> int main(int argc, char** argv)
> {
> int status;
> int c;
> int need_sleep_before_wait = 0;
> int consume_large_memory_at_first = 0;
> int create_child_at_first = 0;
> int sigign = 0;
> int create_child_before_exec = 0;
> int after_fork_test = 0;
>
> while ((c = getopt(argc, argv, "ceflsz")) != -1) {
> switch (c) {
> case 'c':
> create_child_at_first = 1;
> break;
> case 'e':
> create_child_before_exec = 1;
> break;
> case 'f':
> after_fork_test = 1;
> break;
> case 'l':
> consume_large_memory_at_first = 1;
> break;
> case 's':
> sigign = 1;
> break;
> case 'z':
> need_sleep_before_wait = 1;
> break;
> default:
> break;
> }
> }
>
> if (consume_large_memory_at_first)
> consume(100);
>
> if (create_child_at_first)
> system("./child -q");
>
> if (sigign)
> signal(SIGCHLD, SIG_IGN);
>
> if (fork()) {
> usleep(1);
> if (need_sleep_before_wait)
> sleep(3); /* children become zombie */
> show_rusage("pre_wait");
> wait(&status);
> show_rusage("post_wait");
> } else {
> usleep(1);
> show_rusage("fork");
>
> if (after_fork_test) {
> consume(30);
> show_rusage("fork2");
> }
> if (create_child_before_exec) {
> system("./child -lq");
> usleep(1);
> show_rusage("fork3");
> }
>
> execl("./child", "child", 0);
> exit(0);
> }
>
> return 0;
> }
>
> child.c
> ----
> #include <sys/types.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/types.h>
> #include <sys/time.h>
> #include <sys/resource.h>
>
> static void consume(int mega)
> {
> size_t sz = mega * 1024 * 1024;
> void *ptr;
>
> ptr = malloc(sz);
> memset(ptr, 0, sz);
> usleep(1); /* BSD rusage statics need to sleep 1 tick */
> }
>
> static void show_rusage(char *prefix)
> {
> int err, err2;
> struct rusage rusage_self;
> struct rusage rusage_children;
>
> printf("%s: ", prefix);
> err = getrusage(RUSAGE_SELF, &rusage_self);
> if (!err)
> printf("self %ld ", rusage_self.ru_maxrss);
> err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
> if (!err2)
> printf("children %ld ", rusage_children.ru_maxrss);
>
> printf("\n");
>
> }
>
>
> int main(int argc, char** argv)
> {
> int status;
> int c;
> int silent = 0;
> int light_weight = 0;
>
> while ((c = getopt(argc, argv, "lq")) != -1) {
> switch (c) {
> case 'l':
> light_weight = 1;
> break;
> case 'q':
> silent = 1;
> break;
> default:
> break;
> }
> }
>
> if (!silent)
> show_rusage("exec");
>
> if (fork()) {
> if (light_weight)
> consume(400);
> else
> consume(700);
> wait(&status);
> } else {
> if (light_weight)
> consume(600);
> else
> consume(900);
>
> exit(0);
> }
>
> return 0;
> }
>
> testcase
> ==================
> 1. inherit fork?
>
> test way:
> % ./getrusage -lc
>
> bsd result:
> fork line is "fork: self 0 children 0".
>
> -> rusage sholdn't be inherit by fork.
> (both RUSAGE_SELF and RUSAGE_CHILDREN)
>
> 2. inherit exec?
>
> test way:
> % ./getrusage -lce
>
> bsd result:
> fork3: self 103204 children 60000
> exec: self 103204 children 60000
>
> fork3 and exec line are the same.
>
> -> rusage shold be inherit by exec.
> (both RUSAGE_SELF and RUSAGE_CHILDREN)
>
> 3. getrusage(RUSAGE_CHILDREN) collect grandchild statics?
>
> test way:
> % ./getrusage
>
> bsd result:
> post_wait line is about "post_wait: self 0 children 90000".
>
> -> RUSAGE_CHILDREN can collect grandchild.
>
> 4. zombie, but not waited children collect or not?
>
> test way:
> % ./getrusage -z
>
> bsd result:
> pre_wait line is "pre_wait: self 0 children 0".
>
> -> zombie child process (not waited-for child process)
> isn't accounted.
>
> 5. SIG_IGN collect or not
>
> test way:
> % ./getrusage -s
>
> bsd result:
> post_wait line is "post_wait: self 0 children 0".
>
> -> if SIGCHLD is ignored, children isn't accounted.
>
> 6. fork and malloc
> test way:
> % ./getrusage -lcf
>
> bsd result:
> fork line is "fork: self 0 children 0".
> fork2 line is about "fork: self 130000 children 0".
>
> -> rusage sholdn't be inherit by fork.
> (both RUSAGE_SELF and RUSAGE_CHILDREN)
> but additional memory cunsumption cause right
> maxrss calculation.
>
>
> Signed-off-by: Jiri Pirko <[email protected]>
> Signed-off-by: KOSAKI Motohiro <[email protected]>
> ---
> fs/exec.c | 7 +++++++
> include/linux/sched.h | 1 +
> kernel/exit.c | 10 ++++++++++
> kernel/fork.c | 1 +
> kernel/sys.c | 16 ++++++++++++++++
> 5 files changed, 35 insertions(+)
>
> Index: b/include/linux/sched.h
> ===================================================================
> --- a/include/linux/sched.h 2008-12-29 23:27:59.000000000 +0900
> +++ b/include/linux/sched.h 2008-12-30 03:25:23.000000000 +0900
> @@ -562,6 +562,7 @@ struct signal_struct {
> unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
> unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
> unsigned long inblock, oublock, cinblock, coublock;
> + unsigned long maxrss, cmaxrss;
> struct task_io_accounting ioac;
>
> /*
> Index: b/kernel/exit.c
> ===================================================================
> --- a/kernel/exit.c 2008-12-29 23:27:59.000000000 +0900
> +++ b/kernel/exit.c 2008-12-31 21:08:08.000000000 +0900
> @@ -1053,6 +1053,12 @@ NORET_TYPE void do_exit(long code)
> if (group_dead) {
> hrtimer_cancel(&tsk->signal->real_timer);
> exit_itimers(tsk->signal);
> + if (tsk->mm) {
> + unsigned long hiwater_rss = get_mm_hiwater_rss(tsk->mm);
> +
> + if (tsk->signal->maxrss < hiwater_rss)
> + tsk->signal->maxrss = hiwater_rss;
> + }
> }
> acct_collect(code, group_dead);
> if (group_dead)
> @@ -1298,6 +1304,7 @@ static int wait_task_zombie(struct task_
> struct signal_struct *psig;
> struct signal_struct *sig;
> struct task_cputime cputime;
> + unsigned long maxrss;
>
> /*
> * The resource counters for the group leader are in its
> @@ -1349,6 +1356,9 @@ static int wait_task_zombie(struct task_
> psig->coublock +=
> task_io_get_oublock(p) +
> sig->oublock + sig->coublock;
> + maxrss = max(sig->maxrss, sig->cmaxrss);
> + if (psig->cmaxrss < maxrss)
> + psig->cmaxrss = maxrss;
> task_io_accounting_add(&psig->ioac, &p->ioac);
> task_io_accounting_add(&psig->ioac, &sig->ioac);
> spin_unlock_irq(&p->parent->sighand->siglock);
> Index: b/kernel/fork.c
> ===================================================================
> --- a/kernel/fork.c 2008-12-25 08:26:37.000000000 +0900
> +++ b/kernel/fork.c 2008-12-30 03:48:09.000000000 +0900
> @@ -849,6 +849,7 @@ static int copy_signal(unsigned long clo
> sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
> sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
> sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
> + sig->maxrss = sig->cmaxrss = 0;
> task_io_accounting_init(&sig->ioac);
> taskstats_tgid_init(sig);
>
> Index: b/kernel/sys.c
> ===================================================================
> --- a/kernel/sys.c 2008-12-25 08:26:37.000000000 +0900
> +++ b/kernel/sys.c 2008-12-31 21:06:03.000000000 +0900
> @@ -1546,6 +1546,7 @@ static void k_getrusage(struct task_stru
> unsigned long flags;
> cputime_t utime, stime;
> struct task_cputime cputime;
> + unsigned long maxrss = 0;
>
> memset((char *) r, 0, sizeof *r);
> utime = stime = cputime_zero;
> @@ -1569,6 +1570,7 @@ static void k_getrusage(struct task_stru
> r->ru_majflt = p->signal->cmaj_flt;
> r->ru_inblock = p->signal->cinblock;
> r->ru_oublock = p->signal->coublock;
> + maxrss = p->signal->cmaxrss;
>
> if (who == RUSAGE_CHILDREN)
> break;
> @@ -1583,6 +1585,8 @@ static void k_getrusage(struct task_stru
> r->ru_majflt += p->signal->maj_flt;
> r->ru_inblock += p->signal->inblock;
> r->ru_oublock += p->signal->oublock;
> + if (maxrss < p->signal->maxrss)
> + maxrss = p->signal->maxrss;
> t = p;
> do {
> accumulate_thread_rusage(t, r);
> @@ -1598,6 +1602,18 @@ static void k_getrusage(struct task_stru
> out:
> cputime_to_timeval(utime, &r->ru_utime);
> cputime_to_timeval(stime, &r->ru_stime);
> +
> + if (who != RUSAGE_CHILDREN) {
> + struct mm_struct *mm = get_task_mm(p);
> + if (mm) {
> + unsigned long hiwater_rss = get_mm_hiwater_rss(mm);
> +
> + if (maxrss < hiwater_rss)
> + maxrss = hiwater_rss;
> + mmput(mm);
> + }
> + }
> + r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */
> }
>
> int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
> Index: b/fs/exec.c
> ===================================================================
> --- a/fs/exec.c 2008-12-25 08:26:37.000000000 +0900
> +++ b/fs/exec.c 2008-12-31 21:11:28.000000000 +0900
> @@ -870,6 +870,13 @@ static int de_thread(struct task_struct
> sig->notify_count = 0;
>
> no_thread_group:
> + if (current->mm) {
> + unsigned long hiwater_rss = get_mm_hiwater_rss(current->mm);
> +
> + if (sig->maxrss < hiwater_rss)
> + sig->maxrss = hiwater_rss;
> + }
> +
> exit_itimers(sig);
> flush_itimer_signals();
> if (leader)
>
>

2009-01-03 18:01:28

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

sorry for delay!

On 12/31, KOSAKI Motohiro wrote:
>
> Jiri's resend3 -> v1
> - At wait_task_zombie(), parent process doesn't only collect child maxrss,
> but also cmaxrss.

Ah yes, this looks very right to me.

> - ru_maxrss inherit at exec()

I must admit, I hate this ;)

That said, I agree with you point about compatibility. So I have to
agree with this change.

Still, I'd like to know what other people think ;)

And I also agree that xacct is linux specific feature, but I still
I dislike the fact that xacct and getrusage report different numbers.
Perhaps we should change xacct as well?

> --- a/kernel/exit.c 2008-12-29 23:27:59.000000000 +0900
> +++ b/kernel/exit.c 2008-12-31 21:08:08.000000000 +0900
> @@ -1053,6 +1053,12 @@ NORET_TYPE void do_exit(long code)
> if (group_dead) {
> hrtimer_cancel(&tsk->signal->real_timer);
> exit_itimers(tsk->signal);
> + if (tsk->mm) {
> + unsigned long hiwater_rss = get_mm_hiwater_rss(tsk->mm);
> +
> + if (tsk->signal->maxrss < hiwater_rss)
> + tsk->signal->maxrss = hiwater_rss;
> + }
[...snip...]
> --- a/fs/exec.c 2008-12-25 08:26:37.000000000 +0900
> +++ b/fs/exec.c 2008-12-31 21:11:28.000000000 +0900
> @@ -870,6 +870,13 @@ static int de_thread(struct task_struct
> sig->notify_count = 0;
>
> no_thread_group:
> + if (current->mm) {
> + unsigned long hiwater_rss = get_mm_hiwater_rss(current->mm);
> +
> + if (sig->maxrss < hiwater_rss)
> + sig->maxrss = hiwater_rss;
> + }

Perhaps it makes sense to factor out this code and make a helper?

Unfortunately, exit_mm() and exec_mmap() do not have the common
path which can update sig->maxrss, mm_release() can't do this...

> + if (who != RUSAGE_CHILDREN) {
> + struct mm_struct *mm = get_task_mm(p);
> + if (mm) {
> + unsigned long hiwater_rss = get_mm_hiwater_rss(mm);
> +
> + if (maxrss < hiwater_rss)
> + maxrss = hiwater_rss;
> + mmput(mm);
> + }
> + }
> + r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */

Hmm... So, RUSAGE_THREAD always report maxrss == get_mm_hiwater_rss(mm)
and ignores signal->maxrss. Doesn't look right to me...

Unless I missed something, Jiris's patch was fine, but given that now
we inherit maxrss at exec(), signal->maxrss can have the "inherited"
value?

Oleg.

2009-01-03 21:13:45

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

> sorry for delay!
>
>>
>> Jiri's resend3 -> v1
>> - At wait_task_zombie(), parent process doesn't only collect child maxrss,
>> but also cmaxrss.
>
> Ah yes, this looks very right to me.

hehe, therefore I like testcase driven development :)


>> - ru_maxrss inherit at exec()
>
> I must admit, I hate this ;)

Your feelings are understood well.


> That said, I agree with you point about compatibility. So I have to
> agree with this change.
>
> Still, I'd like to know what other people think ;)

Yeah, me too.


> And I also agree that xacct is linux specific feature, but I still
> I dislike the fact that xacct and getrusage report different numbers.
> Perhaps we should change xacct as well?

I don't have any strong opinion.
it seems good idea. but I don't know xacct so much.



>
>> --- a/kernel/exit.c 2008-12-29 23:27:59.000000000 +0900
>> +++ b/kernel/exit.c 2008-12-31 21:08:08.000000000 +0900
>> @@ -1053,6 +1053,12 @@ NORET_TYPE void do_exit(long code)
>> if (group_dead) {
>> hrtimer_cancel(&tsk->signal->real_timer);
>> exit_itimers(tsk->signal);
>> + if (tsk->mm) {
>> + unsigned long hiwater_rss = get_mm_hiwater_rss(tsk->mm);
>> +
>> + if (tsk->signal->maxrss < hiwater_rss)
>> + tsk->signal->maxrss = hiwater_rss;
>> + }
> [...snip...]
>> --- a/fs/exec.c 2008-12-25 08:26:37.000000000 +0900
>> +++ b/fs/exec.c 2008-12-31 21:11:28.000000000 +0900
>> @@ -870,6 +870,13 @@ static int de_thread(struct task_struct
>> sig->notify_count = 0;
>>
>> no_thread_group:
>> + if (current->mm) {
>> + unsigned long hiwater_rss = get_mm_hiwater_rss(current->mm);
>> +
>> + if (sig->maxrss < hiwater_rss)
>> + sig->maxrss = hiwater_rss;
>> + }
>
> Perhaps it makes sense to factor out this code and make a helper?
>
> Unfortunately, exit_mm() and exec_mmap() do not have the common
> path which can update sig->maxrss, mm_release() can't do this...

last problem are, function name and which file have it :)


>> + if (who != RUSAGE_CHILDREN) {
>> + struct mm_struct *mm = get_task_mm(p);
>> + if (mm) {
>> + unsigned long hiwater_rss = get_mm_hiwater_rss(mm);
>> +
>> + if (maxrss < hiwater_rss)
>> + maxrss = hiwater_rss;
>> + mmput(mm);
>> + }
>> + }
>> + r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */
>
> Hmm... So, RUSAGE_THREAD always report maxrss == get_mm_hiwater_rss(mm)
> and ignores signal->maxrss. Doesn't look right to me...
>
> Unless I missed something, Jiris's patch was fine, but given that now
> we inherit maxrss at exec(), signal->maxrss can have the "inherited"
> value?

you are right. thanks.
will fix.

2009-01-05 15:33:52

by Jiri Pirko

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

Changelog
v2 -> v3
- in k_getrusage() use (inherited) sig->maxrss value in case of
RUSAGE_THREAD

v1 -> v2
- Removed unnecessary max() macro
- To avoid ru_maxrss recalculation in k_getrusage()
- style fixes

Jiri's resend3 -> v1
- At wait_task_zombie(), parent process doesn't only collect child maxrss,
but also cmaxrss.
- ru_maxrss inherit at exec()
- style fixes


Applied after: introduce-get_mm_hiwater_xxx-fix-taskstats-hiwater_xxx-accounting.patch
==
From: Signed-off-by: Jiri Pirko <[email protected]>
Subject: [PATCH for -mm] getrusage: fill ru_maxrss value

This patch makes ->ru_maxrss value in struct rusage filled accordingly to
rss hiwater mark. This struct is filled as a parameter to
getrusage syscall. ->ru_maxrss value is set to KBs which is the way it
is done in BSD systems. /usr/bin/time (gnu time) application converts
->ru_maxrss to KBs which seems to be incorrect behavior. Maintainer of
this util was notified by me with the patch which corrects it and cc'ed.

To make this happen we extend struct signal_struct by two fields. The
first one is ->maxrss which we use to store rss hiwater of the task. The
second one is ->cmaxrss which we use to store highest rss hiwater of all
task childs. These values are used in k_getrusage() to actually fill
->ru_maxrss. k_getrusage() uses current rss hiwater value directly
if mm struct exists.

Note:
exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss.
it is intetionally behavior. *BSD getrusage have exec() inheriting.


Test progmam and test case
===========================

getrusage.c
----
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>

static void consume(int mega)
{
size_t sz = mega * 1024 * 1024;
void *ptr;

ptr = malloc(sz);
memset(ptr, 0, sz);
usleep(1); /* BSD rusage statics need to sleep 1 tick */
}

static void show_rusage(char *prefix)
{
int err, err2;
struct rusage rusage_self;
struct rusage rusage_children;

printf("%s: ", prefix);
err = getrusage(RUSAGE_SELF, &rusage_self);
if (!err)
printf("self %ld ", rusage_self.ru_maxrss);
err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
if (!err2)
printf("children %ld ", rusage_children.ru_maxrss);

printf("\n");
}

int main(int argc, char** argv)
{
int status;
int c;
int need_sleep_before_wait = 0;
int consume_large_memory_at_first = 0;
int create_child_at_first = 0;
int sigign = 0;
int create_child_before_exec = 0;
int after_fork_test = 0;

while ((c = getopt(argc, argv, "ceflsz")) != -1) {
switch (c) {
case 'c':
create_child_at_first = 1;
break;
case 'e':
create_child_before_exec = 1;
break;
case 'f':
after_fork_test = 1;
break;
case 'l':
consume_large_memory_at_first = 1;
break;
case 's':
sigign = 1;
break;
case 'z':
need_sleep_before_wait = 1;
break;
default:
break;
}
}

if (consume_large_memory_at_first)
consume(100);

if (create_child_at_first)
system("./child -q");

if (sigign)
signal(SIGCHLD, SIG_IGN);

if (fork()) {
usleep(1);
if (need_sleep_before_wait)
sleep(3); /* children become zombie */
show_rusage("pre_wait");
wait(&status);
show_rusage("post_wait");
} else {
usleep(1);
show_rusage("fork");

if (after_fork_test) {
consume(30);
show_rusage("fork2");
}
if (create_child_before_exec) {
system("./child -lq");
usleep(1);
show_rusage("fork3");
}

execl("./child", "child", 0);
exit(0);
}

return 0;
}

child.c
----
#include <sys/types.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

static void consume(int mega)
{
size_t sz = mega * 1024 * 1024;
void *ptr;

ptr = malloc(sz);
memset(ptr, 0, sz);
usleep(1); /* BSD rusage statics need to sleep 1 tick */
}

static void show_rusage(char *prefix)
{
int err, err2;
struct rusage rusage_self;
struct rusage rusage_children;

printf("%s: ", prefix);
err = getrusage(RUSAGE_SELF, &rusage_self);
if (!err)
printf("self %ld ", rusage_self.ru_maxrss);
err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
if (!err2)
printf("children %ld ", rusage_children.ru_maxrss);

printf("\n");

}


int main(int argc, char** argv)
{
int status;
int c;
int silent = 0;
int light_weight = 0;

while ((c = getopt(argc, argv, "lq")) != -1) {
switch (c) {
case 'l':
light_weight = 1;
break;
case 'q':
silent = 1;
break;
default:
break;
}
}

if (!silent)
show_rusage("exec");

if (fork()) {
if (light_weight)
consume(400);
else
consume(700);
wait(&status);
} else {
if (light_weight)
consume(600);
else
consume(900);

exit(0);
}

return 0;
}

testcase
==================
1. inherit fork?

test way:
% ./getrusage -lc

bsd result:
fork line is "fork: self 0 children 0".

-> rusage sholdn't be inherit by fork.
(both RUSAGE_SELF and RUSAGE_CHILDREN)

2. inherit exec?

test way:
% ./getrusage -lce

bsd result:
fork3: self 103204 children 60000
exec: self 103204 children 60000

fork3 and exec line are the same.

-> rusage shold be inherit by exec.
(both RUSAGE_SELF and RUSAGE_CHILDREN)

3. getrusage(RUSAGE_CHILDREN) collect grandchild statics?

test way:
% ./getrusage

bsd result:
post_wait line is about "post_wait: self 0 children 90000".

-> RUSAGE_CHILDREN can collect grandchild.

4. zombie, but not waited children collect or not?

test way:
% ./getrusage -z

bsd result:
pre_wait line is "pre_wait: self 0 children 0".

-> zombie child process (not waited-for child process)
isn't accounted.

5. SIG_IGN collect or not

test way:
% ./getrusage -s

bsd result:
post_wait line is "post_wait: self 0 children 0".

-> if SIGCHLD is ignored, children isn't accounted.

6. fork and malloc
test way:
% ./getrusage -lcf

bsd result:
fork line is "fork: self 0 children 0".
fork2 line is about "fork: self 130000 children 0".

-> rusage sholdn't be inherit by fork.
(both RUSAGE_SELF and RUSAGE_CHILDREN)
but additional memory cunsumption cause right
maxrss calculation.


Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
fs/exec.c | 7 +++++++
include/linux/sched.h | 1 +
kernel/exit.c | 10 ++++++++++
kernel/fork.c | 1 +
kernel/sys.c | 17 +++++++++++++++++
5 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 3ef9cf9..b939ef5 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -867,6 +867,13 @@ static int de_thread(struct task_struct *tsk)
sig->notify_count = 0;

no_thread_group:
+ if (current->mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(current->mm);
+
+ if (sig->maxrss < hiwater_rss)
+ sig->maxrss = hiwater_rss;
+ }
+
exit_itimers(sig);
flush_itimer_signals();

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ea41513..62a0f45 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -560,6 +560,7 @@ struct signal_struct {
unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
unsigned long inblock, oublock, cinblock, coublock;
+ unsigned long maxrss, cmaxrss;
struct task_io_accounting ioac;

/*
diff --git a/kernel/exit.c b/kernel/exit.c
index 1a8c22f..5c0d601 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1060,6 +1060,12 @@ NORET_TYPE void do_exit(long code)
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
exit_itimers(tsk->signal);
+ if (tsk->mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(tsk->mm);
+
+ if (tsk->signal->maxrss < hiwater_rss)
+ tsk->signal->maxrss = hiwater_rss;
+ }
}
acct_collect(code, group_dead);
if (group_dead)
@@ -1303,6 +1309,7 @@ static int wait_task_zombie(struct task_struct *p, int options,
struct signal_struct *psig;
struct signal_struct *sig;
struct task_cputime cputime;
+ unsigned long maxrss;

/*
* The resource counters for the group leader are in its
@@ -1354,6 +1361,9 @@ static int wait_task_zombie(struct task_struct *p, int options,
psig->coublock +=
task_io_get_oublock(p) +
sig->oublock + sig->coublock;
+ maxrss = max(sig->maxrss, sig->cmaxrss);
+ if (psig->cmaxrss < maxrss)
+ psig->cmaxrss = maxrss;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
spin_unlock_irq(&p->parent->sighand->siglock);
diff --git a/kernel/fork.c b/kernel/fork.c
index 43cbf30..35bec65 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -846,6 +846,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
+ sig->maxrss = sig->cmaxrss = 0;
task_io_accounting_init(&sig->ioac);
taskstats_tgid_init(sig);

diff --git a/kernel/sys.c b/kernel/sys.c
index d356d79..f5ca281 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1622,12 +1622,14 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
unsigned long flags;
cputime_t utime, stime;
struct task_cputime cputime;
+ unsigned long maxrss = 0;

memset((char *) r, 0, sizeof *r);
utime = stime = cputime_zero;

if (who == RUSAGE_THREAD) {
accumulate_thread_rusage(p, r);
+ maxrss = p->signal->maxrss;
goto out;
}

@@ -1645,6 +1647,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_majflt = p->signal->cmaj_flt;
r->ru_inblock = p->signal->cinblock;
r->ru_oublock = p->signal->coublock;
+ maxrss = p->signal->cmaxrss;

if (who == RUSAGE_CHILDREN)
break;
@@ -1659,6 +1662,8 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_majflt += p->signal->maj_flt;
r->ru_inblock += p->signal->inblock;
r->ru_oublock += p->signal->oublock;
+ if (maxrss < p->signal->maxrss)
+ maxrss = p->signal->maxrss;
t = p;
do {
accumulate_thread_rusage(t, r);
@@ -1674,6 +1679,18 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
out:
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
+
+ if (who != RUSAGE_CHILDREN) {
+ struct mm_struct *mm = get_task_mm(p);
+ if (mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(mm);
+
+ if (maxrss < hiwater_rss)
+ maxrss = hiwater_rss;
+ mmput(mm);
+ }


2009-01-05 22:13:49

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

On Mon, 5 Jan 2009 16:32:04 +0100
Jiri Pirko <[email protected]> wrote:

> Changelog
> v2 -> v3
> - in k_getrusage() use (inherited) sig->maxrss value in case of
> RUSAGE_THREAD

The patch which you sent was mysteriously truncated - the kernel/sys.c hunk
is partly missing. So I took that bit from the earlier version of the patch.

Please check that the below is still identical to your version 3.


diff -puN fs/exec.c~getrusage-fill-ru_maxrss-value fs/exec.c
--- a/fs/exec.c~getrusage-fill-ru_maxrss-value
+++ a/fs/exec.c
@@ -864,6 +864,13 @@ static int de_thread(struct task_struct
sig->notify_count = 0;

no_thread_group:
+ if (current->mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(current->mm);
+
+ if (sig->maxrss < hiwater_rss)
+ sig->maxrss = hiwater_rss;
+ }
+
exit_itimers(sig);
flush_itimer_signals();

diff -puN include/linux/sched.h~getrusage-fill-ru_maxrss-value include/linux/sched.h
--- a/include/linux/sched.h~getrusage-fill-ru_maxrss-value
+++ a/include/linux/sched.h
@@ -560,6 +560,7 @@ struct signal_struct {
unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
unsigned long inblock, oublock, cinblock, coublock;
+ unsigned long maxrss, cmaxrss;
struct task_io_accounting ioac;

/*
diff -puN kernel/exit.c~getrusage-fill-ru_maxrss-value kernel/exit.c
--- a/kernel/exit.c~getrusage-fill-ru_maxrss-value
+++ a/kernel/exit.c
@@ -1053,6 +1053,12 @@ NORET_TYPE void do_exit(long code)
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
exit_itimers(tsk->signal);
+ if (tsk->mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(tsk->mm);
+
+ if (tsk->signal->maxrss < hiwater_rss)
+ tsk->signal->maxrss = hiwater_rss;
+ }
}
acct_collect(code, group_dead);
if (group_dead)
@@ -1296,6 +1302,7 @@ static int wait_task_zombie(struct task_
struct signal_struct *psig;
struct signal_struct *sig;
struct task_cputime cputime;
+ unsigned long maxrss;

/*
* The resource counters for the group leader are in its
@@ -1347,6 +1354,9 @@ static int wait_task_zombie(struct task_
psig->coublock +=
task_io_get_oublock(p) +
sig->oublock + sig->coublock;
+ maxrss = max(sig->maxrss, sig->cmaxrss);
+ if (psig->cmaxrss < maxrss)
+ psig->cmaxrss = maxrss;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
spin_unlock_irq(&p->parent->sighand->siglock);
diff -puN kernel/fork.c~getrusage-fill-ru_maxrss-value kernel/fork.c
--- a/kernel/fork.c~getrusage-fill-ru_maxrss-value
+++ a/kernel/fork.c
@@ -851,6 +851,7 @@ static int copy_signal(unsigned long clo
sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
+ sig->maxrss = sig->cmaxrss = 0;
task_io_accounting_init(&sig->ioac);
taskstats_tgid_init(sig);

diff -puN kernel/sys.c~getrusage-fill-ru_maxrss-value kernel/sys.c
--- a/kernel/sys.c~getrusage-fill-ru_maxrss-value
+++ a/kernel/sys.c
@@ -1676,6 +1676,18 @@ static void k_getrusage(struct task_stru
out:
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
+
+ if (who != RUSAGE_CHILDREN) {
+ struct mm_struct *mm = get_task_mm(p);
+ if (mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(mm);
+
+ if (maxrss < hiwater_rss)
+ maxrss = hiwater_rss;
+ mmput(mm);
+ }
+ }
+ r->ru_maxrss <<= PAGE_SHIFT - 10;
}

int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
_

2009-01-06 09:49:18

by Jiri Pirko

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

On Mon, 5 Jan 2009 14:13:13 -0800
Andrew Morton <[email protected]> wrote:

> On Mon, 5 Jan 2009 16:32:04 +0100
> Jiri Pirko <[email protected]> wrote:
>
> > Changelog
> > v2 -> v3
> > - in k_getrusage() use (inherited) sig->maxrss value in case of
> > RUSAGE_THREAD
>
> The patch which you sent was mysteriously truncated - the kernel/sys.c hunk
> is partly missing. So I took that bit from the earlier version of the patch.
Sorry for this. I should probably toss this Claws into the garbage can :/
>
> Please check that the below is still identical to your version 3.
Nope - the tail is different. - Sending the patch again. Please use
the changelog text from previous send.
Thanks Andrew and again sorry for problems.


Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
---
fs/exec.c | 7 +++++++
include/linux/sched.h | 1 +
kernel/exit.c | 10 ++++++++++
kernel/fork.c | 1 +
kernel/sys.c | 17 +++++++++++++++++
5 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 3ef9cf9..b939ef5 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -867,6 +867,13 @@ static int de_thread(struct task_struct *tsk)
sig->notify_count = 0;

no_thread_group:
+ if (current->mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(current->mm);
+
+ if (sig->maxrss < hiwater_rss)
+ sig->maxrss = hiwater_rss;
+ }
+
exit_itimers(sig);
flush_itimer_signals();

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ea41513..62a0f45 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -560,6 +560,7 @@ struct signal_struct {
unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
unsigned long inblock, oublock, cinblock, coublock;
+ unsigned long maxrss, cmaxrss;
struct task_io_accounting ioac;

/*
diff --git a/kernel/exit.c b/kernel/exit.c
index 1a8c22f..5c0d601 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1060,6 +1060,12 @@ NORET_TYPE void do_exit(long code)
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
exit_itimers(tsk->signal);
+ if (tsk->mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(tsk->mm);
+
+ if (tsk->signal->maxrss < hiwater_rss)
+ tsk->signal->maxrss = hiwater_rss;
+ }
}
acct_collect(code, group_dead);
if (group_dead)
@@ -1303,6 +1309,7 @@ static int wait_task_zombie(struct task_struct *p, int options,
struct signal_struct *psig;
struct signal_struct *sig;
struct task_cputime cputime;
+ unsigned long maxrss;

/*
* The resource counters for the group leader are in its
@@ -1354,6 +1361,9 @@ static int wait_task_zombie(struct task_struct *p, int options,
psig->coublock +=
task_io_get_oublock(p) +
sig->oublock + sig->coublock;
+ maxrss = max(sig->maxrss, sig->cmaxrss);
+ if (psig->cmaxrss < maxrss)
+ psig->cmaxrss = maxrss;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
spin_unlock_irq(&p->parent->sighand->siglock);
diff --git a/kernel/fork.c b/kernel/fork.c
index 43cbf30..35bec65 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -846,6 +846,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
+ sig->maxrss = sig->cmaxrss = 0;
task_io_accounting_init(&sig->ioac);
taskstats_tgid_init(sig);

diff --git a/kernel/sys.c b/kernel/sys.c
index d356d79..f5ca281 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1622,12 +1622,14 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
unsigned long flags;
cputime_t utime, stime;
struct task_cputime cputime;
+ unsigned long maxrss = 0;

memset((char *) r, 0, sizeof *r);
utime = stime = cputime_zero;

if (who == RUSAGE_THREAD) {
accumulate_thread_rusage(p, r);
+ maxrss = p->signal->maxrss;
goto out;
}

@@ -1645,6 +1647,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_majflt = p->signal->cmaj_flt;
r->ru_inblock = p->signal->cinblock;
r->ru_oublock = p->signal->coublock;
+ maxrss = p->signal->cmaxrss;

if (who == RUSAGE_CHILDREN)
break;
@@ -1659,6 +1662,8 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_majflt += p->signal->maj_flt;
r->ru_inblock += p->signal->inblock;
r->ru_oublock += p->signal->oublock;
+ if (maxrss < p->signal->maxrss)
+ maxrss = p->signal->maxrss;
t = p;
do {
accumulate_thread_rusage(t, r);
@@ -1674,6 +1679,18 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
out:
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
+
+ if (who != RUSAGE_CHILDREN) {
+ struct mm_struct *mm = get_task_mm(p);
+ if (mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(mm);
+
+ if (maxrss < hiwater_rss)
+ maxrss = hiwater_rss;
+ mmput(mm);
+ }
+ }
+ r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */
}

int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
--
1.6.0.3

2009-04-02 20:57:26

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

On Tue, 6 Jan 2009 10:48:39 +0100
Jiri Pirko <[email protected]> wrote:

> On Mon, 5 Jan 2009 14:13:13 -0800
> Andrew Morton <[email protected]> wrote:
>
> > On Mon, 5 Jan 2009 16:32:04 +0100
> > Jiri Pirko <[email protected]> wrote:
> >
> > > Changelog
> > > v2 -> v3
> > > - in k_getrusage() use (inherited) sig->maxrss value in case of
> > > RUSAGE_THREAD
> >
> > The patch which you sent was mysteriously truncated - the kernel/sys.c hunk
> > is partly missing. So I took that bit from the earlier version of the patch.
> Sorry for this. I should probably toss this Claws into the garbage can :/
> >
> > Please check that the below is still identical to your version 3.
> Nope - the tail is different. - Sending the patch again. Please use
> the changelog text from previous send.
> Thanks Andrew and again sorry for problems.
>
>

I have a note here that this patch needs acks, but I didn't note who
from.

Someone ack it :)


From: Jiri Pirko <[email protected]>

Make ->ru_maxrss value in struct rusage filled accordingly to rss hiwater
mark. This struct is filled as a parameter to getrusage syscall.
->ru_maxrss value is set to KBs which is the way it is done in BSD
systems. /usr/bin/time (gnu time) application converts ->ru_maxrss to KBs
which seems to be incorrect behavior. Maintainer of this util was
notified by me with the patch which corrects it and cc'ed.

To make this happen we extend struct signal_struct by two fields. The
first one is ->maxrss which we use to store rss hiwater of the task. The
second one is ->cmaxrss which we use to store highest rss hiwater of all
task childs. These values are used in k_getrusage() to actually fill
->ru_maxrss. k_getrusage() uses current rss hiwater value directly if mm
struct exists.

Note:
exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss.
it is intetionally behavior. *BSD getrusage have exec() inheriting.

Test progmam and test case
===========================

getrusage.c
====
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>

static void consume(int mega)
{
size_t sz = mega * 1024 * 1024;
void *ptr;

ptr = malloc(sz);
memset(ptr, 0, sz);
usleep(1); /* BSD rusage statics need to sleep 1 tick */
}

static void show_rusage(char *prefix)
{
int err, err2;
struct rusage rusage_self;
struct rusage rusage_children;

printf("%s: ", prefix);
err = getrusage(RUSAGE_SELF, &rusage_self);
if (!err)
printf("self %ld ", rusage_self.ru_maxrss);
err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
if (!err2)
printf("children %ld ", rusage_children.ru_maxrss);

printf("\n");
}

int main(int argc, char** argv)
{
int status;
int c;
int need_sleep_before_wait = 0;
int consume_large_memory_at_first = 0;
int create_child_at_first = 0;
int sigign = 0;
int create_child_before_exec = 0;
int after_fork_test = 0;

while ((c = getopt(argc, argv, "ceflsz")) != -1) {
switch (c) {
case 'c':
create_child_at_first = 1;
break;
case 'e':
create_child_before_exec = 1;
break;
case 'f':
after_fork_test = 1;
break;
case 'l':
consume_large_memory_at_first = 1;
break;
case 's':
sigign = 1;
break;
case 'z':
need_sleep_before_wait = 1;
break;
default:
break;
}
}

if (consume_large_memory_at_first)
consume(100);

if (create_child_at_first)
system("./child -q");

if (sigign)
signal(SIGCHLD, SIG_IGN);

if (fork()) {
usleep(1);
if (need_sleep_before_wait)
sleep(3); /* children become zombie */
show_rusage("pre_wait");
wait(&status);
show_rusage("post_wait");
} else {
usleep(1);
show_rusage("fork");

if (after_fork_test) {
consume(30);
show_rusage("fork2");
}
if (create_child_before_exec) {
system("./child -lq");
usleep(1);
show_rusage("fork3");
}

execl("./child", "child", 0);
exit(0);
}

return 0;
}

child.c
====
#include <sys/types.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

static void consume(int mega)
{
size_t sz = mega * 1024 * 1024;
void *ptr;

ptr = malloc(sz);
memset(ptr, 0, sz);
usleep(1); /* BSD rusage statics need to sleep 1 tick */
}

static void show_rusage(char *prefix)
{
int err, err2;
struct rusage rusage_self;
struct rusage rusage_children;

printf("%s: ", prefix);
err = getrusage(RUSAGE_SELF, &rusage_self);
if (!err)
printf("self %ld ", rusage_self.ru_maxrss);
err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
if (!err2)
printf("children %ld ", rusage_children.ru_maxrss);

printf("\n");

}

int main(int argc, char** argv)
{
int status;
int c;
int silent = 0;
int light_weight = 0;

while ((c = getopt(argc, argv, "lq")) != -1) {
switch (c) {
case 'l':
light_weight = 1;
break;
case 'q':
silent = 1;
break;
default:
break;
}
}

if (!silent)
show_rusage("exec");

if (fork()) {
if (light_weight)
consume(400);
else
consume(700);
wait(&status);
} else {
if (light_weight)
consume(600);
else
consume(900);

exit(0);
}

return 0;
}

testcase
========

1. inherit fork?

test way:
% ./getrusage -lc

bsd result:
fork line is "fork: self 0 children 0".

-> rusage sholdn't be inherit by fork.
(both RUSAGE_SELF and RUSAGE_CHILDREN)

2. inherit exec?

test way:
% ./getrusage -lce

bsd result:
fork3: self 103204 children 60000
exec: self 103204 children 60000

fork3 and exec line are the same.

-> rusage shold be inherit by exec.
(both RUSAGE_SELF and RUSAGE_CHILDREN)

3. getrusage(RUSAGE_CHILDREN) collect grandchild statics?

test way:
% ./getrusage

bsd result:
post_wait line is about "post_wait: self 0 children 90000".

-> RUSAGE_CHILDREN can collect grandchild.

4. zombie, but not waited children collect or not?

test way:
% ./getrusage -z

bsd result:
pre_wait line is "pre_wait: self 0 children 0".

-> zombie child process (not waited-for child process)
isn't accounted.

5. SIG_IGN collect or not

test way:
% ./getrusage -s

bsd result:
post_wait line is "post_wait: self 0 children 0".

-> if SIGCHLD is ignored, children isn't accounted.

6. fork and malloc
test way:
% ./getrusage -lcf

bsd result:
fork line is "fork: self 0 children 0".
fork2 line is about "fork: self 130000 children 0".

-> rusage sholdn't be inherit by fork.
(both RUSAGE_SELF and RUSAGE_CHILDREN)
but additional memory cunsumption cause right
maxrss calculation.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

fs/exec.c | 7 +++++++
include/linux/sched.h | 1 +
kernel/exit.c | 10 ++++++++++
kernel/fork.c | 1 +
kernel/sys.c | 17 +++++++++++++++++
5 files changed, 36 insertions(+)

diff -puN fs/exec.c~getrusage-fill-ru_maxrss-value fs/exec.c
--- a/fs/exec.c~getrusage-fill-ru_maxrss-value
+++ a/fs/exec.c
@@ -871,6 +871,13 @@ static int de_thread(struct task_struct
sig->notify_count = 0;

no_thread_group:
+ if (current->mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(current->mm);
+
+ if (sig->maxrss < hiwater_rss)
+ sig->maxrss = hiwater_rss;
+ }
+
exit_itimers(sig);
flush_itimer_signals();

diff -puN include/linux/sched.h~getrusage-fill-ru_maxrss-value include/linux/sched.h
--- a/include/linux/sched.h~getrusage-fill-ru_maxrss-value
+++ a/include/linux/sched.h
@@ -588,6 +588,7 @@ struct signal_struct {
unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
unsigned long inblock, oublock, cinblock, coublock;
+ unsigned long maxrss, cmaxrss;
struct task_io_accounting ioac;

/*
diff -puN kernel/exit.c~getrusage-fill-ru_maxrss-value kernel/exit.c
--- a/kernel/exit.c~getrusage-fill-ru_maxrss-value
+++ a/kernel/exit.c
@@ -1056,6 +1056,12 @@ NORET_TYPE void do_exit(long code)
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
exit_itimers(tsk->signal);
+ if (tsk->mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(tsk->mm);
+
+ if (tsk->signal->maxrss < hiwater_rss)
+ tsk->signal->maxrss = hiwater_rss;
+ }
}
acct_collect(code, group_dead);
if (group_dead)
@@ -1301,6 +1307,7 @@ static int wait_task_zombie(struct task_
struct signal_struct *psig;
struct signal_struct *sig;
struct task_cputime cputime;
+ unsigned long maxrss;

/*
* The resource counters for the group leader are in its
@@ -1352,6 +1359,9 @@ static int wait_task_zombie(struct task_
psig->coublock +=
task_io_get_oublock(p) +
sig->oublock + sig->coublock;
+ maxrss = max(sig->maxrss, sig->cmaxrss);
+ if (psig->cmaxrss < maxrss)
+ psig->cmaxrss = maxrss;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
spin_unlock_irq(&p->parent->sighand->siglock);
diff -puN kernel/fork.c~getrusage-fill-ru_maxrss-value kernel/fork.c
--- a/kernel/fork.c~getrusage-fill-ru_maxrss-value
+++ a/kernel/fork.c
@@ -862,6 +862,7 @@ static int copy_signal(unsigned long clo
sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
+ sig->maxrss = sig->cmaxrss = 0;
task_io_accounting_init(&sig->ioac);
sig->sum_sched_runtime = 0;
taskstats_tgid_init(sig);
diff -puN kernel/sys.c~getrusage-fill-ru_maxrss-value kernel/sys.c
--- a/kernel/sys.c~getrusage-fill-ru_maxrss-value
+++ a/kernel/sys.c
@@ -1618,6 +1618,7 @@ static void k_getrusage(struct task_stru
unsigned long flags;
cputime_t utime, stime;
struct task_cputime cputime;
+ unsigned long maxrss = 0;

memset((char *) r, 0, sizeof *r);
utime = stime = cputime_zero;
@@ -1626,6 +1627,7 @@ static void k_getrusage(struct task_stru
utime = task_utime(current);
stime = task_stime(current);
accumulate_thread_rusage(p, r);
+ maxrss = p->signal->maxrss;
goto out;
}

@@ -1643,6 +1645,7 @@ static void k_getrusage(struct task_stru
r->ru_majflt = p->signal->cmaj_flt;
r->ru_inblock = p->signal->cinblock;
r->ru_oublock = p->signal->coublock;
+ maxrss = p->signal->cmaxrss;

if (who == RUSAGE_CHILDREN)
break;
@@ -1657,6 +1660,8 @@ static void k_getrusage(struct task_stru
r->ru_majflt += p->signal->maj_flt;
r->ru_inblock += p->signal->inblock;
r->ru_oublock += p->signal->oublock;
+ if (maxrss < p->signal->maxrss)
+ maxrss = p->signal->maxrss;
t = p;
do {
accumulate_thread_rusage(t, r);
@@ -1672,6 +1677,18 @@ static void k_getrusage(struct task_stru
out:
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
+
+ if (who != RUSAGE_CHILDREN) {
+ struct mm_struct *mm = get_task_mm(p);
+ if (mm) {
+ unsigned long hiwater_rss = get_mm_hiwater_rss(mm);
+
+ if (maxrss < hiwater_rss)
+ maxrss = hiwater_rss;
+ mmput(mm);
+ }
+ }
+ r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */
}

int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
_

2009-04-02 21:15:13

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value


* Andrew Morton <[email protected]> wrote:

> I have a note here that this patch needs acks, but I didn't note who
> from.
>
> Someone ack it :)

looks good to me at a quick glance. A stupid technicality. There's
repetitive patterns of:

> + if (current->mm) {
> + unsigned long hiwater_rss = get_mm_hiwater_rss(current->mm);
> +
> + if (sig->maxrss < hiwater_rss)
> + sig->maxrss = hiwater_rss;
> + }

in about 3 separate places. Wouldnt a helper along the lines of:

sig->maxrss = mm_hiwater_rss(current->mm, sig->maxrss);

be much more readable?

The helper could be something like:

static inline unsigned long
mm_hiwater_rss(struct mm_struct *mm, unsigned long maxrss)
{
return max(maxrss, mm ? get_mm_hiwater_rss(mm) : 0);
}

much nicer?

Ingo

2009-04-05 08:50:28

by Jiri Pirko

[permalink] [raw]
Subject: [PATCH for -mm] getrusage: fill ru_maxrss value

(resend, repetitive patterns put into an inline function - not using max macro
because it's was decided not to use it in previous conversation)

From: Jiri Pirko <[email protected]>

Make ->ru_maxrss value in struct rusage filled accordingly to rss hiwater
mark. This struct is filled as a parameter to getrusage syscall.
->ru_maxrss value is set to KBs which is the way it is done in BSD
systems. /usr/bin/time (gnu time) application converts ->ru_maxrss to KBs
which seems to be incorrect behavior. Maintainer of this util was
notified by me with the patch which corrects it and cc'ed.

To make this happen we extend struct signal_struct by two fields. The
first one is ->maxrss which we use to store rss hiwater of the task. The
second one is ->cmaxrss which we use to store highest rss hiwater of all
task childs. These values are used in k_getrusage() to actually fill
->ru_maxrss. k_getrusage() uses current rss hiwater value directly if mm
struct exists.

Note:
exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss.
it is intetionally behavior. *BSD getrusage have exec() inheriting.

Test progmam and test case
===========================

getrusage.c
====
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>

static void consume(int mega)
{
size_t sz = mega * 1024 * 1024;
void *ptr;

ptr = malloc(sz);
memset(ptr, 0, sz);
usleep(1); /* BSD rusage statics need to sleep 1 tick */
}

static void show_rusage(char *prefix)
{
int err, err2;
struct rusage rusage_self;
struct rusage rusage_children;

printf("%s: ", prefix);
err = getrusage(RUSAGE_SELF, &rusage_self);
if (!err)
printf("self %ld ", rusage_self.ru_maxrss);
err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
if (!err2)
printf("children %ld ", rusage_children.ru_maxrss);

printf("\n");
}

int main(int argc, char** argv)
{
int status;
int c;
int need_sleep_before_wait = 0;
int consume_large_memory_at_first = 0;
int create_child_at_first = 0;
int sigign = 0;
int create_child_before_exec = 0;
int after_fork_test = 0;

while ((c = getopt(argc, argv, "ceflsz")) != -1) {
switch (c) {
case 'c':
create_child_at_first = 1;
break;
case 'e':
create_child_before_exec = 1;
break;
case 'f':
after_fork_test = 1;
break;
case 'l':
consume_large_memory_at_first = 1;
break;
case 's':
sigign = 1;
break;
case 'z':
need_sleep_before_wait = 1;
break;
default:
break;
}
}

if (consume_large_memory_at_first)
consume(100);

if (create_child_at_first)
system("./child -q");

if (sigign)
signal(SIGCHLD, SIG_IGN);

if (fork()) {
usleep(1);
if (need_sleep_before_wait)
sleep(3); /* children become zombie */
show_rusage("pre_wait");
wait(&status);
show_rusage("post_wait");
} else {
usleep(1);
show_rusage("fork");

if (after_fork_test) {
consume(30);
show_rusage("fork2");
}
if (create_child_before_exec) {
system("./child -lq");
usleep(1);
show_rusage("fork3");
}

execl("./child", "child", 0);
exit(0);
}

return 0;
}

child.c
====
#include <sys/types.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

static void consume(int mega)
{
size_t sz = mega * 1024 * 1024;
void *ptr;

ptr = malloc(sz);
memset(ptr, 0, sz);
usleep(1); /* BSD rusage statics need to sleep 1 tick */
}

static void show_rusage(char *prefix)
{
int err, err2;
struct rusage rusage_self;
struct rusage rusage_children;

printf("%s: ", prefix);
err = getrusage(RUSAGE_SELF, &rusage_self);
if (!err)
printf("self %ld ", rusage_self.ru_maxrss);
err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
if (!err2)
printf("children %ld ", rusage_children.ru_maxrss);

printf("\n");

}

int main(int argc, char** argv)
{
int status;
int c;
int silent = 0;
int light_weight = 0;

while ((c = getopt(argc, argv, "lq")) != -1) {
switch (c) {
case 'l':
light_weight = 1;
break;
case 'q':
silent = 1;
break;
default:
break;
}
}

if (!silent)
show_rusage("exec");

if (fork()) {
if (light_weight)
consume(400);
else
consume(700);
wait(&status);
} else {
if (light_weight)
consume(600);
else
consume(900);

exit(0);
}

return 0;
}

testcase
========

1. inherit fork?

test way:
% ./getrusage -lc

bsd result:
fork line is "fork: self 0 children 0".

-> rusage sholdn't be inherit by fork.
(both RUSAGE_SELF and RUSAGE_CHILDREN)

2. inherit exec?

test way:
% ./getrusage -lce

bsd result:
fork3: self 103204 children 60000
exec: self 103204 children 60000

fork3 and exec line are the same.

-> rusage shold be inherit by exec.
(both RUSAGE_SELF and RUSAGE_CHILDREN)

3. getrusage(RUSAGE_CHILDREN) collect grandchild statics?

test way:
% ./getrusage

bsd result:
post_wait line is about "post_wait: self 0 children 90000".

-> RUSAGE_CHILDREN can collect grandchild.

4. zombie, but not waited children collect or not?

test way:
% ./getrusage -z

bsd result:
pre_wait line is "pre_wait: self 0 children 0".

-> zombie child process (not waited-for child process)
isn't accounted.

5. SIG_IGN collect or not

test way:
% ./getrusage -s

bsd result:
post_wait line is "post_wait: self 0 children 0".

-> if SIGCHLD is ignored, children isn't accounted.

6. fork and malloc
test way:
% ./getrusage -lcf

bsd result:
fork line is "fork: self 0 children 0".
fork2 line is about "fork: self 130000 children 0".

-> rusage sholdn't be inherit by fork.
(both RUSAGE_SELF and RUSAGE_CHILDREN)
but additional memory cunsumption cause right
maxrss calculation.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
CC: Ingo Molnar <[email protected]>
---

fs/exec.c | 3 +++
include/linux/sched.h | 10 ++++++++++
kernel/exit.c | 6 ++++++
kernel/fork.c | 1 +
kernel/sys.c | 14 ++++++++++++++
5 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 052a961..ddf6d1b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -872,6 +872,9 @@ static int de_thread(struct task_struct *tsk)
sig->notify_count = 0;

no_thread_group:
+ if (current->mm)
+ setmax_mm_hiwater_rss(&sig->maxrss, current->mm);
+
exit_itimers(sig);
flush_itimer_signals();

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9da5aa0..f138651 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -397,6 +397,15 @@ static inline unsigned long get_mm_hiwater_rss(struct mm_struct *mm)
return max(mm->hiwater_rss, get_mm_rss(mm));
}

+static inline void setmax_mm_hiwater_rss(unsigned long *maxrss,
+ struct mm_struct *mm)
+{
+ unsigned long hiwater_rss = get_mm_hiwater_rss(mm);
+
+ if (*maxrss < hiwater_rss)
+ *maxrss = hiwater_rss;
+}
+
static inline unsigned long get_mm_hiwater_vm(struct mm_struct *mm)
{
return max(mm->hiwater_vm, mm->total_vm);
@@ -567,6 +576,7 @@ struct signal_struct {
unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
unsigned long inblock, oublock, cinblock, coublock;
+ unsigned long maxrss, cmaxrss;
struct task_io_accounting ioac;

/*
diff --git a/kernel/exit.c b/kernel/exit.c
index 6686ed1..39600f6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -943,6 +943,8 @@ NORET_TYPE void do_exit(long code)
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
exit_itimers(tsk->signal);
+ if (tsk->mm)
+ setmax_mm_hiwater_rss(&tsk->signal->maxrss, tsk->mm);
}
acct_collect(code, group_dead);
if (group_dead)
@@ -1188,6 +1190,7 @@ static int wait_task_zombie(struct task_struct *p, int options,
struct signal_struct *psig;
struct signal_struct *sig;
struct task_cputime cputime;
+ unsigned long maxrss;

/*
* The resource counters for the group leader are in its
@@ -1239,6 +1242,9 @@ static int wait_task_zombie(struct task_struct *p, int options,
psig->coublock +=
task_io_get_oublock(p) +
sig->oublock + sig->coublock;
+ maxrss = max(sig->maxrss, sig->cmaxrss);
+ if (psig->cmaxrss < maxrss)
+ psig->cmaxrss = maxrss;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
spin_unlock_irq(&p->parent->sighand->siglock);
diff --git a/kernel/fork.c b/kernel/fork.c
index 660c2b8..70b3a55 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -848,6 +848,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
+ sig->maxrss = sig->cmaxrss = 0;
task_io_accounting_init(&sig->ioac);
sig->sum_sched_runtime = 0;
taskstats_tgid_init(sig);
diff --git a/kernel/sys.c b/kernel/sys.c
index 51dbb55..45611b0 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1626,6 +1626,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
unsigned long flags;
cputime_t utime, stime;
struct task_cputime cputime;
+ unsigned long maxrss = 0;

memset((char *) r, 0, sizeof *r);
utime = stime = cputime_zero;
@@ -1634,6 +1635,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
utime = task_utime(current);
stime = task_stime(current);
accumulate_thread_rusage(p, r);
+ maxrss = p->signal->maxrss;
goto out;
}

@@ -1651,6 +1653,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_majflt = p->signal->cmaj_flt;
r->ru_inblock = p->signal->cinblock;
r->ru_oublock = p->signal->coublock;
+ maxrss = p->signal->cmaxrss;

if (who == RUSAGE_CHILDREN)
break;
@@ -1665,6 +1668,8 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_majflt += p->signal->maj_flt;
r->ru_inblock += p->signal->inblock;
r->ru_oublock += p->signal->oublock;
+ if (maxrss < p->signal->maxrss)
+ maxrss = p->signal->maxrss;
t = p;
do {
accumulate_thread_rusage(t, r);
@@ -1680,6 +1685,15 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
out:
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
+
+ if (who != RUSAGE_CHILDREN) {
+ struct mm_struct *mm = get_task_mm(p);
+ if (mm) {
+ setmax_mm_hiwater_rss(&maxrss, mm);
+ mmput(mm);
+ }
+ }
+ r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */
}

int getrusage(struct task_struct *p, int who, struct rusage __user *ru)

2009-04-05 17:25:22

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

On Sun, 5 Apr 2009, Jiri Pirko wrote:
> (resend, repetitive patterns put into an inline function - not using max macro
> because it's was decided not to use it in previous conversation)
>
> From: Jiri Pirko <[email protected]>
>
> Make ->ru_maxrss value in struct rusage filled accordingly to rss hiwater
> mark. This struct is filled as a parameter to getrusage syscall.
> ->ru_maxrss value is set to KBs which is the way it is done in BSD
> systems. /usr/bin/time (gnu time) application converts ->ru_maxrss to KBs
> which seems to be incorrect behavior. Maintainer of this util was
> notified by me with the patch which corrects it and cc'ed.
>
> To make this happen we extend struct signal_struct by two fields. The
> first one is ->maxrss which we use to store rss hiwater of the task. The
> second one is ->cmaxrss which we use to store highest rss hiwater of all
> task childs. These values are used in k_getrusage() to actually fill
> ->ru_maxrss. k_getrusage() uses current rss hiwater value directly if mm
> struct exists.
>
> Note:
> exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss.
> it is intetionally behavior. *BSD getrusage have exec() inheriting.

Sorry, I'm finding myself quite unable to Ack this: that may well
be a measure of my density, rather than a criticism of your patch.

On the nitpicking level: I wish you'd put the args the other way
round in your setmax_mm_hiwater_rss(maxrss, mm): (mm, maxrss) would
be more conventional. Though we're gathering so many helpers for
these things, hard-to-distinguish without looking up in the header
file each time, that I wonder if they could all be refactored better.

But on the more serious level: I find I don't really understand what
this ru_maxrss number is supposed to be, so have a hard job telling
whether you've implemented it right (as to whether things are
updated in the right places, I'd rather rely on Oleg for that).

You convey the impression that it's supposed to be similar to
whatever it is in BSD, but little meaning beyond that. We agreed
before that it should be derived from hiwater_rss - though now I
google for "BSD getrusage" I find ru_maxrss described as "maximum
shared memory or current resident set", which leaves me thoroughly
confused.

I'm worrying particularly about the fork/exec issue you highlight.
You're exemplary in providing your test programs, but there's a big
omission: you don't mention that the first test, "./getrusage -lc",
gives a very different result on Linux than you say it does on BSD -
you say the BSD fork line is "fork: self 0 children 0", whereas
I find my Linux fork line is "fork: self 102636 children 0".

So after that discrepancy, I can't tell what to expect. Not that
I can make any sense of BSD's "self 0" there - I don't know how
you could present 0 there if this is related to hiwater_rss.

Now I'm seriously wondering if the ru_maxrss reported will generate
more bugreports from people puzzled as to how it should behave,
than help anyone in studying their process behaviour.

Sorry to be so negative after all this time: I genuinely hope others
will spring up to defend your patch and illustrate my stupidity.

Hugh

2009-04-06 00:21:56

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

Hi

> I'm worrying particularly about the fork/exec issue you highlight.
> You're exemplary in providing your test programs, but there's a big
> omission: you don't mention that the first test, "./getrusage -lc",
> gives a very different result on Linux than you say it does on BSD -
> you say the BSD fork line is "fork: self 0 children 0", whereas
> I find my Linux fork line is "fork: self 102636 children 0".

FreeBSD update rusage at tick updating point. (I think all bsd do that)
Then, bsd displaing 0 is bsd's problem :)

Do I must change test program?

> So after that discrepancy, I can't tell what to expect. Not that
> I can make any sense of BSD's "self 0" there - I don't know how
> you could present 0 there if this is related to hiwater_rss.
>
> Now I'm seriously wondering if the ru_maxrss reported will generate
> more bugreports from people puzzled as to how it should behave,
> than help anyone in studying their process behaviour.
>
> Sorry to be so negative after all this time: I genuinely hope others
> will spring up to defend your patch and illustrate my stupidity.




2009-04-06 07:37:52

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

On Mon, 6 Apr 2009, KOSAKI Motohiro wrote:
>
> > I'm worrying particularly about the fork/exec issue you highlight.
> > You're exemplary in providing your test programs, but there's a big
> > omission: you don't mention that the first test, "./getrusage -lc",
> > gives a very different result on Linux than you say it does on BSD -
> > you say the BSD fork line is "fork: self 0 children 0", whereas
> > I find my Linux fork line is "fork: self 102636 children 0".
>
> FreeBSD update rusage at tick updating point. (I think all bsd do that)
> Then, bsd displaing 0 is bsd's problem :)

Ah, thank you.

>
> Do I must change test program?

Apparently somebody needs to, please; though it appears to be already
well supplied with usleep(1)s - maybe they needed to be usleep(2)s?

And then change results shown in the changelog, and check conclusions
drawn from them (if BSD is behaving as we do, it should still show
maxrss not inherited over fork, but less obviously - the number goes
down slightly, because the history is lost, but nowhere near to zero).

Hugh

2009-06-17 20:22:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

On Mon, 6 Apr 2009 08:22:07 +0100 (BST)
Hugh Dickins <[email protected]> wrote:

> On Mon, 6 Apr 2009, KOSAKI Motohiro wrote:
> >
> > > I'm worrying particularly about the fork/exec issue you highlight.
> > > You're exemplary in providing your test programs, but there's a big
> > > omission: you don't mention that the first test, "./getrusage -lc",
> > > gives a very different result on Linux than you say it does on BSD -
> > > you say the BSD fork line is "fork: self 0 children 0", whereas
> > > I find my Linux fork line is "fork: self 102636 children 0".
> >
> > FreeBSD update rusage at tick updating point. (I think all bsd do that)
> > Then, bsd displaing 0 is bsd's problem :)
>
> Ah, thank you.
>
> >
> > Do I must change test program?
>
> Apparently somebody needs to, please; though it appears to be already
> well supplied with usleep(1)s - maybe they needed to be usleep(2)s?
>
> And then change results shown in the changelog, and check conclusions
> drawn from them (if BSD is behaving as we do, it should still show
> maxrss not inherited over fork, but less obviously - the number goes
> down slightly, because the history is lost, but nowhere near to zero).
>

afaik none of this happened, so I have the patch on hold.

2009-06-18 00:57:50

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

> On Mon, 6 Apr 2009 08:22:07 +0100 (BST)
> Hugh Dickins <[email protected]> wrote:
>
> > On Mon, 6 Apr 2009, KOSAKI Motohiro wrote:
> > >
> > > > I'm worrying particularly about the fork/exec issue you highlight.
> > > > You're exemplary in providing your test programs, but there's a big
> > > > omission: you don't mention that the first test, "./getrusage -lc",
> > > > gives a very different result on Linux than you say it does on BSD -
> > > > you say the BSD fork line is "fork: self 0 children 0", whereas
> > > > I find my Linux fork line is "fork: self 102636 children 0".
> > >
> > > FreeBSD update rusage at tick updating point. (I think all bsd do that)
> > > Then, bsd displaing 0 is bsd's problem :)
> >
> > Ah, thank you.
> >
> > >
> > > Do I must change test program?
> >
> > Apparently somebody needs to, please; though it appears to be already
> > well supplied with usleep(1)s - maybe they needed to be usleep(2)s?
> >
> > And then change results shown in the changelog, and check conclusions
> > drawn from them (if BSD is behaving as we do, it should still show
> > maxrss not inherited over fork, but less obviously - the number goes
> > down slightly, because the history is lost, but nowhere near to zero).
> >
>
> afaik none of this happened, so I have the patch on hold.

Grr, my fault.
I recognize it. sorry.




2009-09-07 02:58:40

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

> > On Mon, 6 Apr 2009 08:22:07 +0100 (BST)
> > Hugh Dickins <[email protected]> wrote:
> >
> > > On Mon, 6 Apr 2009, KOSAKI Motohiro wrote:
> > > >
> > > > > I'm worrying particularly about the fork/exec issue you highlight.
> > > > > You're exemplary in providing your test programs, but there's a big
> > > > > omission: you don't mention that the first test, "./getrusage -lc",
> > > > > gives a very different result on Linux than you say it does on BSD -
> > > > > you say the BSD fork line is "fork: self 0 children 0", whereas
> > > > > I find my Linux fork line is "fork: self 102636 children 0".
> > > >
> > > > FreeBSD update rusage at tick updating point. (I think all bsd do that)
> > > > Then, bsd displaing 0 is bsd's problem :)
> > >
> > > Ah, thank you.
> > >
> > > >
> > > > Do I must change test program?
> > >
> > > Apparently somebody needs to, please; though it appears to be already
> > > well supplied with usleep(1)s - maybe they needed to be usleep(2)s?
> > >
> > > And then change results shown in the changelog, and check conclusions
> > > drawn from them (if BSD is behaving as we do, it should still show
> > > maxrss not inherited over fork, but less obviously - the number goes
> > > down slightly, because the history is lost, but nowhere near to zero).
> > >
> >
> > afaik none of this happened, so I have the patch on hold.
>
> Grr, my fault.
> I recognize it. sorry.

I've finished my long pending homework ;)

Andrew, can you please replace following patch with getrusage-fill-ru_maxrss-value.patch
and getrusage-fill-ru_maxrss-value-update.patch?



ChangeLog
===============================
o Merge getrusage-fill-ru_maxrss-value.patch and getrusage-fill-ru_maxrss-value-update.patch
o rewrote test programs (older version hit FreeBSD bug and it obfuscate testcase intention, thanks Hugh)


========================================================
From: Jiri Pirko <[email protected]>
Subject: [PATCH] getrusage: fill ru_maxrss value

Make ->ru_maxrss value in struct rusage filled accordingly to rss hiwater
mark. This struct is filled as a parameter to getrusage syscall.
->ru_maxrss value is set to KBs which is the way it is done in BSD
systems. /usr/bin/time (gnu time) application converts ->ru_maxrss to KBs
which seems to be incorrect behavior. Maintainer of this util was
notified by me with the patch which corrects it and cc'ed.

To make this happen we extend struct signal_struct by two fields. The
first one is ->maxrss which we use to store rss hiwater of the task. The
second one is ->cmaxrss which we use to store highest rss hiwater of all
task childs. These values are used in k_getrusage() to actually fill
->ru_maxrss. k_getrusage() uses current rss hiwater value directly if mm
struct exists.

Note:
exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss.
it is intetionally behavior. *BSD getrusage have exec() inheriting.

test programs
========================================================

getrusage.c
-----------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>
#include <sys/mman.h>

#include "common.h"

#define err(str) perror(str), exit(1)

int main(int argc, char** argv)
{
int status;

printf("allocate 100MB\n");
consume(100);

printf("testcase1: fork inherit? \n");
printf(" expect: initial.self ~= child.self\n");
show_rusage("initial");
if (__fork()) {
wait(&status);
} else {
show_rusage("fork child");
_exit(0);
}
printf("\n");

printf("testcase2: fork inherit? (cont.) \n");
printf(" expect: initial.children ~= 100MB, but child.children = 0\n");
show_rusage("initial");
if (__fork()) {
wait(&status);
} else {
show_rusage("child");
_exit(0);
}
printf("\n");

printf("testcase3: fork + malloc \n");
printf(" expect: child.self ~= initial.self + 50MB\n");
show_rusage("initial");
if (__fork()) {
wait(&status);
} else {
printf("allocate +50MB\n");
consume(50);
show_rusage("fork child");
_exit(0);
}
printf("\n");

printf("testcase4: grandchild maxrss\n");
printf(" expect: post_wait.children ~= 300MB\n");
show_rusage("initial");
if (__fork()) {
wait(&status);
show_rusage("post_wait");
} else {
system("./child -n 0 -g 300");
_exit(0);
}
printf("\n");

printf("testcase5: zombie\n");
printf(" expect: pre_wait ~= initial, IOW the zombie process is not accounted.\n");
printf(" post_wait ~= 400MB, IOW wait() collect child's max_rss. \n");
show_rusage("initial");
if (__fork()) {
sleep(1); /* children become zombie */
show_rusage("pre_wait");
wait(&status);
show_rusage("post_wait");
} else {
system("./child -n 400");
_exit(0);
}
printf("\n");

printf("testcase6: SIG_IGN\n");
printf(" expect: initial ~= after_zombie (child's 500MB alloc should be ignored).\n");
show_rusage("initial");
signal(SIGCHLD, SIG_IGN);
if (__fork()) {
sleep(1); /* children become zombie */
show_rusage("after_zombie");
} else {
system("./child -n 500");
_exit(0);
}
printf("\n");
signal(SIGCHLD, SIG_DFL);

printf("testcase7: exec (without fork) \n");
printf(" expect: initial ~= exec \n");
show_rusage("initial");
execl("./child", "child", "-v", NULL);

return 0;
}

child.c
------------------------
#include <sys/types.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

#include "common.h"

int main(int argc, char** argv)
{
int status;
int c;
long consume_size = 0;
long grandchild_consume_size = 0;
int show = 0;

while ((c = getopt(argc, argv, "n:g:v")) != -1) {
switch (c) {
case 'n':
consume_size = atol(optarg);
break;
case 'v':
show = 1;
break;
case 'g':

grandchild_consume_size = atol(optarg);
break;
default:
break;
}
}

if (show)
show_rusage("exec");

if (consume_size) {
printf("child alloc %ldMB\n", consume_size);
consume(consume_size);
}

if (grandchild_consume_size) {
if (fork()) {
wait(&status);
} else {
printf("grandchild alloc %ldMB\n", grandchild_consume_size);
consume(grandchild_consume_size);

exit(0);
}
}

return 0;
}

common.c
--------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>
#include <sys/mman.h>

#include "common.h"
#define err(str) perror(str), exit(1)

void show_rusage(char *prefix)
{
int err, err2;
struct rusage rusage_self;
struct rusage rusage_children;

printf("%s: ", prefix);
err = getrusage(RUSAGE_SELF, &rusage_self);
if (!err)
printf("self %ld ", rusage_self.ru_maxrss);
err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
if (!err2)
printf("children %ld ", rusage_children.ru_maxrss);

printf("\n");
}

/* Some buggy OS need this worthless CPU waste. */
void make_pagefault(void)
{
void *addr;
int size = getpagesize();
int i;

for (i=0; i<1000; i++) {
addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
if (addr == MAP_FAILED)
err("make_pagefault");
memset(addr, 0, size);
munmap(addr, size);
}
}

void consume(int mega)
{
size_t sz = mega * 1024 * 1024;
void *ptr;

ptr = malloc(sz);
memset(ptr, 0, sz);
make_pagefault();
}

pid_t __fork(void)
{
pid_t pid;

pid = fork();
make_pagefault();

return pid;
}

common.h
----------------------
void show_rusage(char *prefix);
void make_pagefault(void);
void consume(int mega);
pid_t __fork(void);

FreeBSD result (expected result)
========================================================
allocate 100MB
testcase1: fork inherit?
expect: initial.self ~= child.self
initial: self 103492 children 0
fork child: self 103540 children 0

testcase2: fork inherit? (cont.)
expect: initial.children ~= 100MB, but child.children = 0
initial: self 103540 children 103540
child: self 103564 children 0

testcase3: fork + malloc
expect: child.self ~= initial.self + 50MB
initial: self 103564 children 103564
allocate +50MB
fork child: self 154860 children 0

testcase4: grandchild maxrss
expect: post_wait.children ~= 300MB
initial: self 103564 children 154860
grandchild alloc 300MB
post_wait: self 103564 children 308720

testcase5: zombie
expect: pre_wait ~= initial, IOW the zombie process is not accounted.
post_wait ~= 400MB, IOW wait() collect child's max_rss.
initial: self 103564 children 308720
child alloc 400MB
pre_wait: self 103564 children 308720
post_wait: self 103564 children 411312

testcase6: SIG_IGN
expect: initial ~= after_zombie (child's 500MB alloc should be ignored).
initial: self 103564 children 411312
child alloc 500MB
after_zombie: self 103624 children 411312

testcase7: exec (without fork)
expect: initial ~= exec
initial: self 103624 children 411312
exec: self 103624 children 411312

Linux result (actual test result)
========================================================
allocate 100MB
testcase1: fork inherit?
expect: initial.self ~= child.self
initial: self 102848 children 0
fork child: self 102572 children 0

testcase2: fork inherit? (cont.)
expect: initial.children ~= 100MB, but child.children = 0
initial: self 102876 children 102644
child: self 102572 children 0

testcase3: fork + malloc
expect: child.self ~= initial.self + 50MB
initial: self 102876 children 102644
allocate +50MB
fork child: self 153804 children 0

testcase4: grandchild maxrss
expect: post_wait.children ~= 300MB
initial: self 102876 children 153864
grandchild alloc 300MB
post_wait: self 102876 children 307536

testcase5: zombie
expect: pre_wait ~= initial, IOW the zombie process is not accounted.
post_wait ~= 400MB, IOW wait() collect child's max_rss.
initial: self 102876 children 307536
child alloc 400MB
pre_wait: self 102876 children 307536
post_wait: self 102876 children 410076

testcase6: SIG_IGN
expect: initial ~= after_zombie (child's 500MB alloc should be ignored).
initial: self 102876 children 410076
child alloc 500MB
after_zombie: self 102880 children 410076

testcase7: exec (without fork)
expect: initial ~= exec
initial: self 102880 children 410076
exec: self 102880 children 410076

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Hugh Dickins <[email protected]>
---
fs/exec.c | 3 +++
include/linux/sched.h | 10 ++++++++++
kernel/exit.c | 6 ++++++
kernel/fork.c | 1 +
kernel/sys.c | 14 ++++++++++++++
5 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index e51e7c2..a231af6 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -849,6 +849,9 @@ static int de_thread(struct task_struct *tsk)
sig->notify_count = 0;

no_thread_group:
+ if (current->mm)
+ setmax_mm_hiwater_rss(&sig->maxrss, current->mm);
+
exit_itimers(sig);
flush_itimer_signals();

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 27da112..3a2ef73 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -426,6 +426,15 @@ static inline unsigned long get_mm_hiwater_rss(struct mm_struct *mm)
return max(mm->hiwater_rss, get_mm_rss(mm));
}

+static inline void setmax_mm_hiwater_rss(unsigned long *maxrss,
+ struct mm_struct *mm)
+{
+ unsigned long hiwater_rss = get_mm_hiwater_rss(mm);
+
+ if (*maxrss < hiwater_rss)
+ *maxrss = hiwater_rss;
+}
+
static inline unsigned long get_mm_hiwater_vm(struct mm_struct *mm)
{
return max(mm->hiwater_vm, mm->total_vm);
@@ -637,6 +646,7 @@ struct signal_struct {
unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
unsigned long inblock, oublock, cinblock, coublock;
+ unsigned long maxrss, cmaxrss;
struct task_io_accounting ioac;

/*
diff --git a/kernel/exit.c b/kernel/exit.c
index 2d273fd..18735bc 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -948,6 +948,8 @@ NORET_TYPE void do_exit(long code)
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
exit_itimers(tsk->signal);
+ if (tsk->mm)
+ setmax_mm_hiwater_rss(&tsk->signal->maxrss, tsk->mm);
}
acct_collect(code, group_dead);
if (group_dead)
@@ -1203,6 +1205,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
if (likely(!traced) && likely(!task_detached(p))) {
struct signal_struct *psig;
struct signal_struct *sig;
+ unsigned long maxrss;

/*
* The resource counters for the group leader are in its
@@ -1251,6 +1254,9 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
psig->coublock +=
task_io_get_oublock(p) +
sig->oublock + sig->coublock;
+ maxrss = max(sig->maxrss, sig->cmaxrss);
+ if (psig->cmaxrss < maxrss)
+ psig->cmaxrss = maxrss;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
spin_unlock_irq(&p->real_parent->sighand->siglock);
diff --git a/kernel/fork.c b/kernel/fork.c
index fdc6467..58cd45a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -889,6 +889,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
+ sig->maxrss = sig->cmaxrss = 0;
task_io_accounting_init(&sig->ioac);
sig->sum_sched_runtime = 0;
taskstats_tgid_init(sig);
diff --git a/kernel/sys.c b/kernel/sys.c
index 41e02ef..a7e36b6 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1338,6 +1338,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
unsigned long flags;
cputime_t utime, stime;
struct task_cputime cputime;
+ unsigned long maxrss = 0;

memset((char *) r, 0, sizeof *r);
utime = stime = cputime_zero;
@@ -1346,6 +1347,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
utime = task_utime(current);
stime = task_stime(current);
accumulate_thread_rusage(p, r);
+ maxrss = p->signal->maxrss;
goto out;
}

@@ -1363,6 +1365,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_majflt = p->signal->cmaj_flt;
r->ru_inblock = p->signal->cinblock;
r->ru_oublock = p->signal->coublock;
+ maxrss = p->signal->cmaxrss;

if (who == RUSAGE_CHILDREN)
break;
@@ -1377,6 +1380,8 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_majflt += p->signal->maj_flt;
r->ru_inblock += p->signal->inblock;
r->ru_oublock += p->signal->oublock;
+ if (maxrss < p->signal->maxrss)
+ maxrss = p->signal->maxrss;
t = p;
do {
accumulate_thread_rusage(t, r);
@@ -1392,6 +1397,15 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
out:
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
+
+ if (who != RUSAGE_CHILDREN) {
+ struct mm_struct *mm = get_task_mm(p);
+ if (mm) {
+ setmax_mm_hiwater_rss(&maxrss, mm);
+ mmput(mm);
+ }
+ }
+ r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */
}

int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
--
1.6.0.6




2009-09-09 20:47:39

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

On Mon, 7 Sep 2009 11:58:36 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> > Grr, my fault.
> > I recognize it. sorry.
>
> I've finished my long pending homework ;)
>
> Andrew, can you please replace following patch with getrusage-fill-ru_maxrss-value.patch
> and getrusage-fill-ru_maxrss-value-update.patch?
>
>
>
> ChangeLog
> ===============================
> o Merge getrusage-fill-ru_maxrss-value.patch and getrusage-fill-ru_maxrss-value-update.patch
> o rewrote test programs (older version hit FreeBSD bug and it obfuscate testcase intention, thanks Hugh)

The code changes are unaltered, so I merely updated the changelog.

The changelog had lots of ^------- lines in it. But those are
conventionally the end-of-changelog separator so I rewrote them to
^=======

2009-09-09 23:17:27

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

> On Mon, 7 Sep 2009 11:58:36 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> > > Grr, my fault.
> > > I recognize it. sorry.
> >
> > I've finished my long pending homework ;)
> >
> > Andrew, can you please replace following patch with getrusage-fill-ru_maxrss-value.patch
> > and getrusage-fill-ru_maxrss-value-update.patch?
> >
> >
> >
> > ChangeLog
> > ===============================
> > o Merge getrusage-fill-ru_maxrss-value.patch and getrusage-fill-ru_maxrss-value-update.patch
> > o rewrote test programs (older version hit FreeBSD bug and it obfuscate testcase intention, thanks Hugh)
>
> The code changes are unaltered, so I merely updated the changelog.

I see. thanks.


> The changelog had lots of ^------- lines in it. But those are
> conventionally the end-of-changelog separator so I rewrote them to
> ^=======

sorry, I have stupid question.
I thought "--" and "---" have special meaning. but other length "-" are safe.
Is this incorrect?

or You mean it's easy confusing bad style?


2009-09-09 23:33:14

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

On Thu, 10 Sep 2009 08:17:27 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

>
> > The changelog had lots of ^------- lines in it. But those are
> > conventionally the end-of-changelog separator so I rewrote them to
> > ^=======
>
> sorry, I have stupid question.
> I thought "--" and "---" have special meaning. but other length "-" are safe.
> Is this incorrect?
>
> or You mean it's easy confusing bad style?

Ideally, ^---$ is the only pattern we need to worry about.

In the real world, ^-------- might trigger people's sloppy scripts so
it's best to be safe and avoid it altogether.

2009-09-09 23:37:56

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH for -mm] getrusage: fill ru_maxrss value

> On Thu, 10 Sep 2009 08:17:27 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> >
> > > The changelog had lots of ^------- lines in it. But those are
> > > conventionally the end-of-changelog separator so I rewrote them to
> > > ^=======
> >
> > sorry, I have stupid question.
> > I thought "--" and "---" have special meaning. but other length "-" are safe.
> > Is this incorrect?
> >
> > or You mean it's easy confusing bad style?
>
> Ideally, ^---$ is the only pattern we need to worry about.
>
> In the real world, ^-------- might trigger people's sloppy scripts so
> it's best to be safe and avoid it altogether.

Ah I see.
Thank you!