2008-12-18 13:00:26

by Jiri Pirko

[permalink] [raw]
Subject: [PATCH, RESEND3] getrusage: fill ru_maxrss value

(updated, updated changelog)

This patch makes ->ru_maxrss value in struct rusage filled accordingly to
rss hiwater mark. This struct is filled as a parameter to
getrusage syscall. ->ru_maxrss value is set to KBs which is the way it
is done in BSD systems. /usr/bin/time (gnu time) application converts
->ru_maxrss to KBs which seems to be incorrect behavior. Maintainer of
this util was notified by me with the patch which corrects it and cc'ed.

To make this happen we extend struct signal_struct by two fields. The
first one is ->maxrss which we use to store rss hiwater of the task. The
second one is ->cmaxrss which we use to store highest rss hiwater of all
task childs. These values are used in k_getrusage() to actually fill
->ru_maxrss. k_getrusage() uses current rss hiwater value directly
if mm struct exists.

We clear the ->maxrss as a part of flush_old_exec() to be consistent
because bprm_mm_init() does not copy ->hiwater_rss.

Note that we use recently introduced get_mm_hiwater_rss() helper to
actually get the rss hiwater value:
http://lkml.org/lkml/2008/12/12/172


Signed-off-by: Jiri Pirko <[email protected]>
---
fs/exec.c | 1 +
include/linux/sched.h | 1 +
kernel/exit.c | 4 ++++
kernel/fork.c | 1 +
kernel/sys.c | 15 +++++++++++++++
5 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index ec5df9a..8d3d0f9 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -870,6 +870,7 @@ static int de_thread(struct task_struct *tsk)
sig->notify_count = 0;

no_thread_group:
+ sig->maxrss = 0;
exit_itimers(sig);
flush_itimer_signals();
if (leader)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f4c70dc..41b04ee 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -563,6 +563,7 @@ struct signal_struct {
unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
unsigned long inblock, oublock, cinblock, coublock;
struct task_io_accounting ioac;
+ unsigned long maxrss, cmaxrss;

/*
* We don't bother to synchronize most readers of this at all,
diff --git a/kernel/exit.c b/kernel/exit.c
index 81b6372..61d622d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1053,6 +1053,8 @@ NORET_TYPE void do_exit(long code)
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
exit_itimers(tsk->signal);
+ if (tsk->mm)
+ tsk->signal->maxrss = get_mm_hiwater_rss(tsk->mm);
}
acct_collect(code, group_dead);
if (group_dead)
@@ -1351,6 +1353,8 @@ static int wait_task_zombie(struct task_struct *p, int options,
sig->oublock + sig->coublock;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
+ if (psig->cmaxrss < sig->maxrss)
+ psig->cmaxrss = sig->maxrss;
spin_unlock_irq(&p->parent->sighand->siglock);
}

diff --git a/kernel/fork.c b/kernel/fork.c
index 495da2e..36ac3e5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -850,6 +850,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
task_io_accounting_init(&sig->ioac);
+ sig->maxrss = sig->cmaxrss = 0;
taskstats_tgid_init(sig);

task_lock(current->group_leader);
diff --git a/kernel/sys.c b/kernel/sys.c
index 31deba8..f369099 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1569,6 +1569,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_majflt = p->signal->cmaj_flt;
r->ru_inblock = p->signal->cinblock;
r->ru_oublock = p->signal->coublock;
+ r->ru_maxrss = p->signal->cmaxrss;

if (who == RUSAGE_CHILDREN)
break;
@@ -1583,6 +1584,8 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_majflt += p->signal->maj_flt;
r->ru_inblock += p->signal->inblock;
r->ru_oublock += p->signal->oublock;
+ if (r->ru_maxrss < p->signal->maxrss)
+ r->ru_maxrss = p->signal->maxrss;
t = p;
do {
accumulate_thread_rusage(t, r);
@@ -1598,6 +1601,18 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
out:
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
+
+ if (who != RUSAGE_CHILDREN) {
+ task_lock(p);
+ if (p->mm) {
+ unsigned long maxrss = get_mm_hiwater_rss(p->mm);
+
+ if (r->ru_maxrss < maxrss)
+ r->ru_maxrss = maxrss;
+ }
+ task_unlock(p);
+ }
+ r->ru_maxrss <<= PAGE_SHIFT - 10;
}

int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
--
1.6.0.4


2008-12-24 07:38:26

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH, RESEND3] getrusage: fill ru_maxrss value

Hi

> diff --git a/fs/exec.c b/fs/exec.c
> index ec5df9a..8d3d0f9 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -870,6 +870,7 @@ static int de_thread(struct task_struct *tsk)
> sig->notify_count = 0;
>
> no_thread_group:
> + sig->maxrss = 0;
> exit_itimers(sig);
> flush_itimer_signals();
> if (leader)

I don't know getrusage correct behavior so detail.
Why don't update parent process's sig->cmaxrss ?


> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f4c70dc..41b04ee 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -563,6 +563,7 @@ struct signal_struct {
> unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
> unsigned long inblock, oublock, cinblock, coublock;
> struct task_io_accounting ioac;
> + unsigned long maxrss, cmaxrss;

unsigned long inblock, oublock, cinblock, coublock;
unsigned long maxrss, cmaxrss;
struct task_io_accounting ioac;

is better.
I like related member sit on nearly place.



> diff --git a/kernel/exit.c b/kernel/exit.c
> index 81b6372..61d622d 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1053,6 +1053,8 @@ NORET_TYPE void do_exit(long code)
> if (group_dead) {
> hrtimer_cancel(&tsk->signal->real_timer);
> exit_itimers(tsk->signal);
> + if (tsk->mm)
> + tsk->signal->maxrss = get_mm_hiwater_rss(tsk->mm);
> }
> acct_collect(code, group_dead);
> if (group_dead)

At first look, I think "hm, group_dead mean this thread is last thread.
hehe, this assignment is obiously meaningless".
but it is wrong. you inserted maxrss usage in wait_task_zombie().

I think this is a bit confusable code.
I hope you add kindly comment here.



> @@ -1351,6 +1353,8 @@ static int wait_task_zombie(struct task_struct *p, int options,
> sig->oublock + sig->coublock;
> task_io_accounting_add(&psig->ioac, &p->ioac);
> task_io_accounting_add(&psig->ioac, &sig->ioac);
> + if (psig->cmaxrss < sig->maxrss)
> + psig->cmaxrss = sig->maxrss;
> spin_unlock_irq(&p->parent->sighand->siglock);
> }

ditto. I like following order.

sig->oublock + sig->coublock;
if (psig->cmaxrss < sig->maxrss)
psig->cmaxrss = sig->maxrss;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);



> diff --git a/kernel/fork.c b/kernel/fork.c
> index 495da2e..36ac3e5 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -850,6 +850,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
> sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
> sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
> task_io_accounting_init(&sig->ioac);
> + sig->maxrss = sig->cmaxrss = 0;
> taskstats_tgid_init(sig);
>
> task_lock(current->group_leader);
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 31deba8..f369099 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1569,6 +1569,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
> r->ru_majflt = p->signal->cmaj_flt;
> r->ru_inblock = p->signal->cinblock;
> r->ru_oublock = p->signal->coublock;
> + r->ru_maxrss = p->signal->cmaxrss;
>
> if (who == RUSAGE_CHILDREN)
> break;
> @@ -1583,6 +1584,8 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
> r->ru_majflt += p->signal->maj_flt;
> r->ru_inblock += p->signal->inblock;
> r->ru_oublock += p->signal->oublock;
> + if (r->ru_maxrss < p->signal->maxrss)
> + r->ru_maxrss = p->signal->maxrss;
> t = p;
> do {
> accumulate_thread_rusage(t, r);
> @@ -1598,6 +1601,18 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
> out:
> cputime_to_timeval(utime, &r->ru_utime);
> cputime_to_timeval(stime, &r->ru_stime);
> +
> + if (who != RUSAGE_CHILDREN) {
> + task_lock(p);
> + if (p->mm) {
> + unsigned long maxrss = get_mm_hiwater_rss(p->mm);
> +
> + if (r->ru_maxrss < maxrss)
> + r->ru_maxrss = maxrss;
> + }
> + task_unlock(p);

get_task_mm() and mmput() instead task_lock() is better?

and, why don't this code move to "case RUSAGE_SELF" processing point?


> + }
> + r->ru_maxrss <<= PAGE_SHIFT - 10;
> }

using local variable is better because local variable can stay on
register by compiler easily, but indirect access doesn't.

r->ru_maxrss = maxrss << PAGE_SHIFT - 10;
(similar utime and r->ru_utime)

and we need good comment. e.g. /* Convert to KB */
or good macros (likely linux/fs/proc/meminfo.c::K() macro)


>
> int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
> --
> 1.6.0.4
>

In addision, you also need change man pages and notice to linux-api mailing list.



2008-12-24 11:18:00

by Jiri Pirko

[permalink] [raw]
Subject: Re: [PATCH, RESEND3] getrusage: fill ru_maxrss value

On Wed, 24 Dec 2008 16:37:55 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> Hi
Hi yourself.
>
> > diff --git a/fs/exec.c b/fs/exec.c
> > index ec5df9a..8d3d0f9 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -870,6 +870,7 @@ static int de_thread(struct task_struct *tsk)
> > sig->notify_count = 0;
> >
> > no_thread_group:
> > + sig->maxrss = 0;
> > exit_itimers(sig);
> > flush_itimer_signals();
> > if (leader)
>
> I don't know getrusage correct behavior so detail.
> Why don't update parent process's sig->cmaxrss ?
Because exec affects only this task and we want to forgot maxrss value.
That does not implicate that we want to forgot highest maxrss value of
our childs because exec does not affect them. I think this is right
behavior.


>
>
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index f4c70dc..41b04ee 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -563,6 +563,7 @@ struct signal_struct {
> > unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
> > unsigned long inblock, oublock, cinblock, coublock;
> > struct task_io_accounting ioac;
> > + unsigned long maxrss, cmaxrss;
>
> unsigned long inblock, oublock, cinblock, coublock;
> unsigned long maxrss, cmaxrss;
> struct task_io_accounting ioac;
>
> is better.
> I like related member sit on nearly place.
No problem with this.


>
>
>
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index 81b6372..61d622d 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -1053,6 +1053,8 @@ NORET_TYPE void do_exit(long code)
> > if (group_dead) {
> > hrtimer_cancel(&tsk->signal->real_timer);
> > exit_itimers(tsk->signal);
> > + if (tsk->mm)
> > + tsk->signal->maxrss = get_mm_hiwater_rss(tsk->mm);
> > }
> > acct_collect(code, group_dead);
> > if (group_dead)
>
> At first look, I think "hm, group_dead mean this thread is last thread.
> hehe, this assignment is obiously meaningless".
> but it is wrong. you inserted maxrss usage in wait_task_zombie().
>
> I think this is a bit confusable code.
> I hope you add kindly comment here.
>
Ok I will add comment here.


>
>
> > @@ -1351,6 +1353,8 @@ static int wait_task_zombie(struct task_struct *p, int options,
> > sig->oublock + sig->coublock;
> > task_io_accounting_add(&psig->ioac, &p->ioac);
> > task_io_accounting_add(&psig->ioac, &sig->ioac);
> > + if (psig->cmaxrss < sig->maxrss)
> > + psig->cmaxrss = sig->maxrss;
> > spin_unlock_irq(&p->parent->sighand->siglock);
> > }
>
> ditto. I like following order.
>
> sig->oublock + sig->coublock;
> if (psig->cmaxrss < sig->maxrss)
> psig->cmaxrss = sig->maxrss;
> task_io_accounting_add(&psig->ioac, &p->ioac);
> task_io_accounting_add(&psig->ioac, &sig->ioac);
No problem with this.


>
>
>
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 495da2e..36ac3e5 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -850,6 +850,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
> > sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
> > sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
> > task_io_accounting_init(&sig->ioac);
> > + sig->maxrss = sig->cmaxrss = 0;
> > taskstats_tgid_init(sig);
> >
> > task_lock(current->group_leader);
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 31deba8..f369099 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -1569,6 +1569,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
> > r->ru_majflt = p->signal->cmaj_flt;
> > r->ru_inblock = p->signal->cinblock;
> > r->ru_oublock = p->signal->coublock;
> > + r->ru_maxrss = p->signal->cmaxrss;
> >
> > if (who == RUSAGE_CHILDREN)
> > break;
> > @@ -1583,6 +1584,8 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
> > r->ru_majflt += p->signal->maj_flt;
> > r->ru_inblock += p->signal->inblock;
> > r->ru_oublock += p->signal->oublock;
> > + if (r->ru_maxrss < p->signal->maxrss)
> > + r->ru_maxrss = p->signal->maxrss;
> > t = p;
> > do {
> > accumulate_thread_rusage(t, r);
> > @@ -1598,6 +1601,18 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
> > out:
> > cputime_to_timeval(utime, &r->ru_utime);
> > cputime_to_timeval(stime, &r->ru_stime);
> > +
> > + if (who != RUSAGE_CHILDREN) {
> > + task_lock(p);
> > + if (p->mm) {
> > + unsigned long maxrss = get_mm_hiwater_rss(p->mm);
> > +
> > + if (r->ru_maxrss < maxrss)
> > + r->ru_maxrss = maxrss;
> > + }
> > + task_unlock(p);
>
> get_task_mm() and mmput() instead task_lock() is better?
Maybe it's better looking. I wanted to use these too. Oleg suggested to
optimize the way it is in the patch. I can change it, no problem.


>
> and, why don't this code move to "case RUSAGE_SELF" processing point?
Because we need this to be done for RUSAGE_THREAD too. Or don't we?


>
>
> > + }
> > + r->ru_maxrss <<= PAGE_SHIFT - 10;
> > }
>
> using local variable is better because local variable can stay on
> register by compiler easily, but indirect access doesn't.
OK


>
> r->ru_maxrss = maxrss << PAGE_SHIFT - 10;
> (similar utime and r->ru_utime)
>
> and we need good comment. e.g. /* Convert to KB */
> or good macros (likely linux/fs/proc/meminfo.c::K() macro)
OK


>
>
> >
> > int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
> > --
> > 1.6.0.4
> >
>
> In addision, you also need change man pages and notice to linux-api mailing list.
I cc'ed this list while I was sending the patch. I was not aware I
should change the man page. Will do that too.


Thanks a lot for comments Kosaki

Regards.

Jirka
>
>
>
>

2008-12-24 17:44:47

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH, RESEND3] getrusage: fill ru_maxrss value

On 12/24, Jiri Pirko wrote:
>
> On Wed, 24 Dec 2008 16:37:55 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> > > @@ -870,6 +870,7 @@ static int de_thread(struct task_struct *tsk)
> > > sig->notify_count = 0;
> > >
> > > no_thread_group:
> > > + sig->maxrss = 0;
> > > exit_itimers(sig);
> > > flush_itimer_signals();
> > > if (leader)
> >
> > I don't know getrusage correct behavior so detail.
> > Why don't update parent process's sig->cmaxrss ?
> Because exec affects only this task and we want to forgot maxrss value.
> That does not implicate that we want to forgot highest maxrss value of
> our childs because exec does not affect them. I think this is right
> behavior.

Well, I don't understand the explanation above, but I agree with the
code ;)

parent process's sig->cmaxrss, like other parent->signal->cxxxx fields
should only be changed by wait_task_zombie(), ->cmaxrss is not special.

The real question is why do we clear sig->maxrss on exec. Again, I agree
with the patch because this is consistent with xacct_add_tsk/etc, but
it is not clear to me whether this right or not.

Yes, technically this is right because we have the new ->mm after exec,
but this is "transparent" for the user-space. For example, we don't
clear ->min_flt on exec...

Perhaps it makes sense to change the bprm_mm_init's path to do

if (!PF_FORKNOEXEC)
new_mm->hiwater_xxx = old_mm->hiwater_xxx;

But once again, imho the patch does the right thing for now.

> > > unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
> > > unsigned long inblock, oublock, cinblock, coublock;
> > > struct task_io_accounting ioac;
> > > + unsigned long maxrss, cmaxrss;
> >
> > unsigned long inblock, oublock, cinblock, coublock;
> > unsigned long maxrss, cmaxrss;
> > struct task_io_accounting ioac;
> >
> > is better.
> > I like related member sit on nearly place.
> No problem with this.

Agreed.

> > > + if (who != RUSAGE_CHILDREN) {
> > > + task_lock(p);
> > > + if (p->mm) {
> > > + unsigned long maxrss = get_mm_hiwater_rss(p->mm);
> > > +
> > > + if (r->ru_maxrss < maxrss)
> > > + r->ru_maxrss = maxrss;
> > > + }
> > > + task_unlock(p);
> >
> > get_task_mm() and mmput() instead task_lock() is better?
> Maybe it's better looking. I wanted to use these too. Oleg suggested to
> optimize the way it is in the patch. I can change it, no problem.

I have no problem with get_task_mm() too, the optimization is minor.

(btw, that is why I didn't like the fact we discussed this patch privately,
imho it is always better to do on lkml).

> > > + r->ru_maxrss <<= PAGE_SHIFT - 10;
> > > }
> >
> > using local variable is better because local variable can stay on
> > register by compiler easily, but indirect access doesn't.
> OK

Not that I have a strong opinion, but the code looks simpler without
the local var. Up to you.

> > and we need good comment. e.g. /* Convert to KB */
> > or good macros (likely linux/fs/proc/meminfo.c::K() macro)

Yes! Please ;) I just can't parse the code above, I am not compiler.
Even

r->ru_maxrss *= (PAGE_SIZE / 1024); /* convert pages to KBs */

is much better, imho. Or at least just add the comment, so the
poor reader can understand what the code does without parsing.

> > In addision, you also need change man pages and notice to linux-api mailing list.
> I cc'ed this list while I was sending the patch.

Hmm. The biggest mistake with this patch is that you lost the
CC list completely ;) Adding Hugh.

> I was not aware I
> should change the man page. Will do that too.

Well. I don't think you must change the man page. Of course it
would be nice if you do, but in a separate patch please. But
you must cc Michael at least ;)

Oleg.

2008-12-25 00:11:25

by Jiri Pirko

[permalink] [raw]
Subject: Re: [PATCH, RESEND3] getrusage: fill ru_maxrss value

On Wed, 24 Dec 2008 18:42:48 +0100
Oleg Nesterov <[email protected]> wrote:


> > > and we need good comment. e.g. /* Convert to KB */
> > > or good macros (likely linux/fs/proc/meminfo.c::K() macro)
>
> Yes! Please ;) I just can't parse the code above, I am not compiler.
> Even
>
> r->ru_maxrss *= (PAGE_SIZE / 1024); /* convert pages to KBs */
>
> is much better, imho. Or at least just add the comment, so the
> poor reader can understand what the code does without parsing.
Fine
>
> > > In addision, you also need change man pages and notice to linux-api mailing list.
> > I cc'ed this list while I was sending the patch.
>
> Hmm. The biggest mistake with this patch is that you lost the
> CC list completely ;) Adding Hugh.
>
> > I was not aware I
> > should change the man page. Will do that too.
>
> Well. I don't think you must change the man page. Of course it
> would be nice if you do, but in a separate patch please. But
> you must cc Michael at least ;)
I cc'ed him with in my patch post - It has disappeared later in RE.
>
> Oleg.
>

2008-12-25 04:46:30

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH, RESEND3] getrusage: fill ru_maxrss value

> On Wed, 24 Dec 2008 16:37:55 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> > Hi
> Hi yourself.
> >
> > > diff --git a/fs/exec.c b/fs/exec.c
> > > index ec5df9a..8d3d0f9 100644
> > > --- a/fs/exec.c
> > > +++ b/fs/exec.c
> > > @@ -870,6 +870,7 @@ static int de_thread(struct task_struct *tsk)
> > > sig->notify_count = 0;
> > >
> > > no_thread_group:
> > > + sig->maxrss = 0;
> > > exit_itimers(sig);
> > > flush_itimer_signals();
> > > if (leader)
> >
> > I don't know getrusage correct behavior so detail.
> > Why don't update parent process's sig->cmaxrss ?
> Because exec affects only this task and we want to forgot maxrss value.
> That does not implicate that we want to forgot highest maxrss value of
> our childs because exec does not affect them. I think this is right
> behavior.

Hmm, "we want" is a bit ambiguously word.
We have three reviewing viewpoint.

1) this code is consistent to other linux kernel code.

this patch fill its requrement perfectly.

2) the behavior is enough surpriseless?

Honestly, I think this patch is a bit supriseful.
example,

1. process-A fork process-B
2. process-A wait by wait4()
3. process-B consume 1GB memory
4. process-B exec another program
5. process-B consume 100KB memory
6. process-B exit
7. process-A get maxrss=100KB

oh, 1GB consumption is disappeared.


However, if we choice process don't forget maxrss at exec,
another supriseful happend.
example,

1. process-A consume 1GB memroy
2. process-A fork process-B
3. process-A wait by wait4()
4. right after, process-B exec another program
5. process-B consume 100KB memory
6. process-B exit
7. process-A get maxrss=1GB

oh, this design cause large process can't get child maxrss.


So either choice have both merits and demerits.
Then, I suggest to take prior other os compatibility than own thinking
good behavior.


3) if the feature is other os compatibility feature, the bahavior is
the same other?

I think most important thing.
Some scripting language (e.g. perl, PHP) already can use getrusage()
return value. (but linux maxrss is always 0)
Any scripter don't like incompatibility behavior.

I don't know other os maxrss inheriting rule.
May I ask you investiate maxrss inheriting of exec of which os behavior?



> > > @@ -1598,6 +1601,18 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
> > > out:
> > > cputime_to_timeval(utime, &r->ru_utime);
> > > cputime_to_timeval(stime, &r->ru_stime);
> > > +
> > > + if (who != RUSAGE_CHILDREN) {
> > > + task_lock(p);
> > > + if (p->mm) {
> > > + unsigned long maxrss = get_mm_hiwater_rss(p->mm);
> > > +
> > > + if (r->ru_maxrss < maxrss)
> > > + r->ru_maxrss = maxrss;
> > > + }
> > > + task_unlock(p);
> >
> > get_task_mm() and mmput() instead task_lock() is better?
> Maybe it's better looking. I wanted to use these too. Oleg suggested to
> optimize the way it is in the patch. I can change it, no problem.

hm, performance is obiously important.
if you get much performance improvement, I'll withdraw my claim.


> > and, why don't this code move to "case RUSAGE_SELF" processing point?
> Because we need this to be done for RUSAGE_THREAD too. Or don't we?

Ah, I forgot RUSAGE_THREAD. thanks.
However I still think current code have unnecessary assumption.

if who==RUSAGE_BOTH,

correct behavior: max(hiwater_rss, mm_rss) + signal->cmaxrss)
this code: max(signal->maxrss + signal->cmaxrss, max(hiwater_rss, mm_rss))

1. if mm_rss == hiwater_rss, above two code is equivalent.
2. if signal->maxrss == 0, above two code is equivalent too.

So all current caller keep this two assumption. but I think it is
unnecessary assumption. imho we can write generic correct code.


2008-12-25 12:44:33

by Jiri Pirko

[permalink] [raw]
Subject: Re: [PATCH, RESEND3] getrusage: fill ru_maxrss value

On Thu, 25 Dec 2008 13:46:15 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> > On Wed, 24 Dec 2008 16:37:55 +0900 (JST)
> > KOSAKI Motohiro <[email protected]> wrote:
> >
> > > Hi
> > Hi yourself.
> > >
> > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > index ec5df9a..8d3d0f9 100644
> > > > --- a/fs/exec.c
> > > > +++ b/fs/exec.c
> > > > @@ -870,6 +870,7 @@ static int de_thread(struct task_struct *tsk)
> > > > sig->notify_count = 0;
> > > >
> > > > no_thread_group:
> > > > + sig->maxrss = 0;
> > > > exit_itimers(sig);
> > > > flush_itimer_signals();
> > > > if (leader)
> > >
> > > I don't know getrusage correct behavior so detail.
> > > Why don't update parent process's sig->cmaxrss ?
> > Because exec affects only this task and we want to forgot maxrss value.
> > That does not implicate that we want to forgot highest maxrss value of
> > our childs because exec does not affect them. I think this is right
> > behavior.
>
> Hmm, "we want" is a bit ambiguously word.
> We have three reviewing viewpoint.
>
> 1) this code is consistent to other linux kernel code.
>
> this patch fill its requrement perfectly.
>
> 2) the behavior is enough surpriseless?
>
> Honestly, I think this patch is a bit supriseful.
> example,
>
> 1. process-A fork process-B
> 2. process-A wait by wait4()
> 3. process-B consume 1GB memory
> 4. process-B exec another program
> 5. process-B consume 100KB memory
> 6. process-B exit
> 7. process-A get maxrss=100KB
>
> oh, 1GB consumption is disappeared.
>
>
> However, if we choice process don't forget maxrss at exec,
> another supriseful happend.
> example,
>
> 1. process-A consume 1GB memroy
> 2. process-A fork process-B
> 3. process-A wait by wait4()
> 4. right after, process-B exec another program
> 5. process-B consume 100KB memory
> 6. process-B exit
> 7. process-A get maxrss=1GB
>
> oh, this design cause large process can't get child maxrss.
I agree this is strange...

>
>
> So either choice have both merits and demerits.
> Then, I suggest to take prior other os compatibility than own thinking
> good behavior.
>
>
> 3) if the feature is other os compatibility feature, the bahavior is
> the same other?
>
> I think most important thing.
> Some scripting language (e.g. perl, PHP) already can use getrusage()
> return value. (but linux maxrss is always 0)
> Any scripter don't like incompatibility behavior.
>
> I don't know other os maxrss inheriting rule.
> May I ask you investiate maxrss inheriting of exec of which os behavior?
I looked into freebsd sources and there is not this resetting there.
They have the whole rusage structure in task structure and they leave
it untouched during the exec.

I agree with you that it would make more sense to remember rss hiwater
after exec. It's even logical from the userspace point of view. However
in the taskstats code the rss hiwater value is taken directly by
get_mm_hiwater_rss() macro which gets the value from new mm. I think
the behavior of this and behavior of this patch should be the same (and
in current version of the patch it is).

>
>
>
> > > > @@ -1598,6 +1601,18 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
> > > > out:
> > > > cputime_to_timeval(utime, &r->ru_utime);
> > > > cputime_to_timeval(stime, &r->ru_stime);
> > > > +
> > > > + if (who != RUSAGE_CHILDREN) {
> > > > + task_lock(p);
> > > > + if (p->mm) {
> > > > + unsigned long maxrss = get_mm_hiwater_rss(p->mm);
> > > > +
> > > > + if (r->ru_maxrss < maxrss)
> > > > + r->ru_maxrss = maxrss;
> > > > + }
> > > > + task_unlock(p);
> > >
> > > get_task_mm() and mmput() instead task_lock() is better?
> > Maybe it's better looking. I wanted to use these too. Oleg suggested to
> > optimize the way it is in the patch. I can change it, no problem.
>
> hm, performance is obiously important.
> if you get much performance improvement, I'll withdraw my claim.
The improvement is minimal.

>
>
> > > and, why don't this code move to "case RUSAGE_SELF" processing point?
> > Because we need this to be done for RUSAGE_THREAD too. Or don't we?
>
> Ah, I forgot RUSAGE_THREAD. thanks.
> However I still think current code have unnecessary assumption.
>
> if who==RUSAGE_BOTH,
>
> correct behavior: max(hiwater_rss, mm_rss) + signal->cmaxrss)
> this code: max(signal->maxrss + signal->cmaxrss, max(hiwater_rss, mm_rss))
I'm not sure I understand this correctly. Shouldn't here be max() instead of +() ?

>
> 1. if mm_rss == hiwater_rss, above two code is equivalent.
> 2. if signal->maxrss == 0, above two code is equivalent too.
>
> So all current caller keep this two assumption. but I think it is
> unnecessary assumption. imho we can write generic correct code.
But it might happen that mm struct is gone when k_getrusage is called
and then we need signal->maxrss value.



Regards

Jirka
>
>
>

2008-12-25 14:44:59

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH, RESEND3] getrusage: fill ru_maxrss value

On 12/25, KOSAKI Motohiro wrote:
>
> > On Wed, 24 Dec 2008 16:37:55 +0900 (JST)
> > KOSAKI Motohiro <[email protected]> wrote:
> >
> > > > @@ -870,6 +870,7 @@ static int de_thread(struct task_struct *tsk)
> > > > sig->notify_count = 0;
> > > >
> > > > no_thread_group:
> > > > + sig->maxrss = 0;
> > > > exit_itimers(sig);
> > > > flush_itimer_signals();
> > > > if (leader)
> > >
> > > I don't know getrusage correct behavior so detail.
> > > Why don't update parent process's sig->cmaxrss ?
> > Because exec affects only this task and we want to forgot maxrss value.
> > That does not implicate that we want to forgot highest maxrss value of
> > our childs because exec does not affect them. I think this is right
> > behavior.
>
> Hmm, "we want" is a bit ambiguously word.
> We have three reviewing viewpoint.
>
> 1) this code is consistent to other linux kernel code.
>
> this patch fill its requrement perfectly.
>
> 2) the behavior is enough surpriseless?
>
> Honestly, I think this patch is a bit supriseful.
> example,
>
> 1. process-A fork process-B
> 2. process-A wait by wait4()
> 3. process-B consume 1GB memory
> 4. process-B exec another program
> 5. process-B consume 100KB memory
> 6. process-B exit
> 7. process-A get maxrss=100KB
>
> oh, 1GB consumption is disappeared.

Yes, and I also think this is not right.

But this has nothing to do with parent->signal->cmaxrss, the question
is why we lost the info.

And please note we have the same problem with xacct_add_tsk(), that
is why I think this patch is right _for now_, but see below.

> However, if we choice process don't forget maxrss at exec,
> another supriseful happend.
> example,
>
> 1. process-A consume 1GB memroy
> 2. process-A fork process-B
> 3. process-A wait by wait4()
> 4. right after, process-B exec another program
> 5. process-B consume 100KB memory
> 6. process-B exit
> 7. process-A get maxrss=1GB
>
> oh, this design cause large process can't get child maxrss.

Even simpler. If we never reset marxrss, we start inherit this
value from /sbin/init which is obviously wrong.

That is why I suggested to change bprm_mm_init's path to keep
mm->hiwater_xxx but only if !PF_FORKNOEXEC (and in that case
we must kill "sig->maxrss = 0" from de_thread()).

What do you think?

> if who==RUSAGE_BOTH,

Ah. I forgot to mention what the code does to discuss...

> correct behavior: max(hiwater_rss, mm_rss) + signal->cmaxrss)

I disagree. This doesn't look correct to me. ->maxrss is a maximum,
we shouldn't report the sum.

> this code: max(signal->maxrss + signal->cmaxrss, max(hiwater_rss, mm_rss))

Afaics, no. this code reports

max(signal->cmaxrss, max(signal->maxrss, get_mm_hiwater_rss())

and this looks right to me.

But I agree, this is debateable.

Oleg.

2008-12-30 11:11:18

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH, RESEND3] getrusage: fill ru_maxrss value

Hi

sorry for late responce.

but I thought I should investigate BSD behavior at first.
and I've installed FreeBSD on my PC and tested it.

I'll post this test case and my proposal patch soon.


> On 12/25, KOSAKI Motohiro wrote:
> >
> > > On Wed, 24 Dec 2008 16:37:55 +0900 (JST)
> > > KOSAKI Motohiro <[email protected]> wrote:
> > >
> > > > > @@ -870,6 +870,7 @@ static int de_thread(struct task_struct *tsk)
> > > > > sig->notify_count = 0;
> > > > >
> > > > > no_thread_group:
> > > > > + sig->maxrss = 0;
> > > > > exit_itimers(sig);
> > > > > flush_itimer_signals();
> > > > > if (leader)
> > > >
> > > > I don't know getrusage correct behavior so detail.
> > > > Why don't update parent process's sig->cmaxrss ?
> > > Because exec affects only this task and we want to forgot maxrss value.
> > > That does not implicate that we want to forgot highest maxrss value of
> > > our childs because exec does not affect them. I think this is right
> > > behavior.
> >
> > Hmm, "we want" is a bit ambiguously word.
> > We have three reviewing viewpoint.
> >
> > 1) this code is consistent to other linux kernel code.
> >
> > this patch fill its requrement perfectly.
> >
> > 2) the behavior is enough surpriseless?
> >
> > Honestly, I think this patch is a bit supriseful.
> > example,
> >
> > 1. process-A fork process-B
> > 2. process-A wait by wait4()
> > 3. process-B consume 1GB memory
> > 4. process-B exec another program
> > 5. process-B consume 100KB memory
> > 6. process-B exit
> > 7. process-A get maxrss=100KB
> >
> > oh, 1GB consumption is disappeared.
>
> Yes, and I also think this is not right.
>
> But this has nothing to do with parent->signal->cmaxrss, the question
> is why we lost the info.

BSD inherit maxrss at exec(), but Jiri's patch doesn't.


> And please note we have the same problem with xacct_add_tsk(), that
> is why I think this patch is right _for now_, but see below.

xacct is linux specific feature.
it doesn't need other os compatibility.



> > However, if we choice process don't forget maxrss at exec,
> > another supriseful happend.
> > example,
> >
> > 1. process-A consume 1GB memroy
> > 2. process-A fork process-B
> > 3. process-A wait by wait4()
> > 4. right after, process-B exec another program
> > 5. process-B consume 100KB memory
> > 6. process-B exit
> > 7. process-A get maxrss=1GB
> >
> > oh, this design cause large process can't get child maxrss.
>
> Even simpler. If we never reset marxrss, we start inherit this
> value from /sbin/init which is obviously wrong.
>
> That is why I suggested to change bprm_mm_init's path to keep
> mm->hiwater_xxx but only if !PF_FORKNOEXEC (and in that case
> we must kill "sig->maxrss = 0" from de_thread()).
>
> What do you think?

your idea is nice. but it is incompatibility to bsd.
I beilieve incompatibility of compatibility feature is worthless.

BSD return 1GB at above case. then I like this behavior.




> > if who==RUSAGE_BOTH,
>
> Ah. I forgot to mention what the code does to discuss...
>
> > correct behavior: max(hiwater_rss, mm_rss) + signal->cmaxrss)
>
> I disagree. This doesn't look correct to me. ->maxrss is a maximum,
> we shouldn't report the sum.

you are absolutely right.
I was crap thinking. thanks clarification.


> > this code: max(signal->maxrss + signal->cmaxrss, max(hiwater_rss, mm_rss))
>
> Afaics, no. this code reports
>
> max(signal->cmaxrss, max(signal->maxrss, get_mm_hiwater_rss())
>
> and this looks right to me.
>
> But I agree, this is debateable.
>
> Oleg.
>