2008-12-02 12:18:45

by Jiri Pirko

[permalink] [raw]
Subject: [PATCH] getrusage: fill ru_maxrss value

Based on the patch previously posted by Frank Mayhar:
http://kerneltrap.org/mailarchive/linux-kernel/2007/9/20/264772

Changed to do not panic and to set the value properly. Also made some
modifications as Oleg suggested. maxrss value is set to pages as "time"
application converts it co KBs.

This patch enables "time" application to show maxresident value
correctly.

Note that even without this patch applied there is a race between two
parallel update_hiwater_rss() callers.

Signed-off-by: Jiri Pirko <[email protected]>
---
include/linux/sched.h | 1 +
kernel/exit.c | 4 ++++
kernel/fork.c | 1 +
kernel/sys.c | 17 +++++++++++++++++
4 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 644ffbd..818c37d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -560,6 +560,7 @@ struct signal_struct {
unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
unsigned long inblock, oublock, cinblock, coublock;
struct task_io_accounting ioac;
+ unsigned long maxrss, cmaxrss;

/*
* We don't bother to synchronize most readers of this at all,
diff --git a/kernel/exit.c b/kernel/exit.c
index 2d8be7e..f09d4e7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1051,6 +1051,8 @@ NORET_TYPE void do_exit(long code)
if (tsk->mm) {
update_hiwater_rss(tsk->mm);
update_hiwater_vm(tsk->mm);
+
+ tsk->signal->maxrss = tsk->mm->hiwater_rss;
}
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
@@ -1354,6 +1356,8 @@ static int wait_task_zombie(struct task_struct *p, int options,
sig->oublock + sig->coublock;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
+ if (psig->cmaxrss < sig->maxrss)
+ psig->cmaxrss = sig->maxrss;
spin_unlock_irq(&p->parent->sighand->siglock);
}

diff --git a/kernel/fork.c b/kernel/fork.c
index 2a372a0..bb881aa 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -847,6 +847,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
task_io_accounting_init(&sig->ioac);
+ sig->maxrss = sig->cmaxrss = 0;
taskstats_tgid_init(sig);

task_lock(current->group_leader);
diff --git a/kernel/sys.c b/kernel/sys.c
index 31deba8..f604b05 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1546,6 +1546,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
unsigned long flags;
cputime_t utime, stime;
struct task_cputime cputime;
+ unsigned long maxrss = 0;

memset((char *) r, 0, sizeof *r);
utime = stime = cputime_zero;
@@ -1555,6 +1556,17 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
goto out;
}

+ if (who == RUSAGE_SELF || who == RUSAGE_BOTH) {
+ struct mm_struct *mm;
+
+ mm = get_task_mm(p);
+ if (mm) {
+ update_hiwater_rss(mm);
+ maxrss = mm->hiwater_rss;
+ mmput(mm);
+ }
+ }
+
if (!lock_task_sighand(p, &flags))
return;

@@ -1569,6 +1581,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_majflt = p->signal->cmaj_flt;
r->ru_inblock = p->signal->cinblock;
r->ru_oublock = p->signal->coublock;
+ r->ru_maxrss = p->signal->cmaxrss;

if (who == RUSAGE_CHILDREN)
break;
@@ -1588,6 +1601,10 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
accumulate_thread_rusage(t, r);
t = next_thread(t);
} while (t != p);
+ if (r->ru_maxrss < maxrss)
+ r->ru_maxrss = maxrss;
+ if (r->ru_maxrss < p->signal->maxrss)
+ r->ru_maxrss = p->signal->maxrss;
break;

default:
--
1.5.6.5


2008-12-02 16:06:53

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] getrusage: fill ru_maxrss value

On 12/02, Jiri Pirko wrote:
>
> Based on the patch previously posted by Frank Mayhar:
> http://kerneltrap.org/mailarchive/linux-kernel/2007/9/20/264772
>
> Changed to do not panic and to set the value properly. Also made some
> modifications as Oleg suggested. maxrss value is set to pages as "time"
> application converts it co KBs.
>
> This patch enables "time" application to show maxresident value
> correctly.

I believe the changelog could be a bit more descriptive ;)
And I think the patch needs more CCs. (Hugh and Michael cc'ed).

What about exec? Should we preserve signal_struct->maxrss, or should
we reset it? I don't know what is the right behaviour, but imho it
should be consistent with task_mem()/xacct_add_tsk(). And since
bprm_mm_init() does not copy ->hiwater_xxx, perhaps we should reset.
Or we should change the exec_mmap() path to preserve ->hiwater_xxx,
I dunno.

> Note that even without this patch applied there is a race between two
> parallel update_hiwater_rss() callers.
>
> [...snip...]
> @@ -1051,6 +1051,8 @@ NORET_TYPE void do_exit(long code)
> if (tsk->mm) {
> update_hiwater_rss(tsk->mm);
> update_hiwater_vm(tsk->mm);
> +
> + tsk->signal->maxrss = tsk->mm->hiwater_rss;

Yes. But still it is not good the patch adds new racy calls
(k_getrusage() too). And in fact I think this code in do_exit()
should die, and we can update signal->maxrss in exit_mm().

Unless I missed something, the "if (tsk->mm) {}" code in
do_exit() is not needed because xacct_add_tsk() is not right
anyway. What we imho need is get_mm_hiwater_xxx(), can be
also used by task_mm().

I'll send the patch a bit later, then we can tweak this patch.

Do you agree?

(just in case, I don't mean that mm/ uses update_hiwater_xxx()
"safely", afaics try_to_unmap can race with unmap_region
for example, but still).

Oleg.

2008-12-02 16:50:58

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] getrusage: fill ru_maxrss value

On Tue, Dec 2, 2008 at 11:05 AM, Oleg Nesterov <[email protected]> wrote:
> On 12/02, Jiri Pirko wrote:
>>
>> Based on the patch previously posted by Frank Mayhar:
>> http://kerneltrap.org/mailarchive/linux-kernel/2007/9/20/264772
>>
>> Changed to do not panic and to set the value properly. Also made some
>> modifications as Oleg suggested. maxrss value is set to pages as "time"
>> application converts it co KBs.
>>
>> This patch enables "time" application to show maxresident value
>> correctly.
>
> I believe the changelog could be a bit more descriptive ;)
> And I think the patch needs more CCs. (Hugh and Michael cc'ed).

Thanks Oleg.

Jiri: linux-api@ should also be CCed on API changes; see
Documentation/SubmitChecklist and
http://thread.gmane.org/gmane.linux.ltp/5658

> What about exec? Should we preserve signal_struct->maxrss, or should
> we reset it? I don't know what is the right behaviour, but imho it
> should be consistent with task_mem()/xacct_add_tsk(). And since
> bprm_mm_init() does not copy ->hiwater_xxx, perhaps we should reset.
> Or we should change the exec_mmap() path to preserve ->hiwater_xxx,
> I dunno.
>
>> Note that even without this patch applied there is a race between two
>> parallel update_hiwater_rss() callers.
>>
>> [...snip...]
>> @@ -1051,6 +1051,8 @@ NORET_TYPE void do_exit(long code)
>> if (tsk->mm) {
>> update_hiwater_rss(tsk->mm);
>> update_hiwater_vm(tsk->mm);
>> +
>> + tsk->signal->maxrss = tsk->mm->hiwater_rss;
>
> Yes. But still it is not good the patch adds new racy calls
> (k_getrusage() too). And in fact I think this code in do_exit()
> should die, and we can update signal->maxrss in exit_mm().
>
> Unless I missed something, the "if (tsk->mm) {}" code in
> do_exit() is not needed because xacct_add_tsk() is not right
> anyway. What we imho need is get_mm_hiwater_xxx(), can be
> also used by task_mm().
>
> I'll send the patch a bit later, then we can tweak this patch.
>
> Do you agree?
>
> (just in case, I don't mean that mm/ uses update_hiwater_xxx()
> "safely", afaics try_to_unmap can race with unmap_region
> for example, but still).
>
> Oleg.
>
>



--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

2008-12-02 16:53:26

by Jiri Pirko

[permalink] [raw]
Subject: Re: [PATCH] getrusage: fill ru_maxrss value

On Tue, 2 Dec 2008 17:05:07 +0100
Oleg Nesterov <[email protected]> wrote:

> On 12/02, Jiri Pirko wrote:
> >
> > Based on the patch previously posted by Frank Mayhar:
> > http://kerneltrap.org/mailarchive/linux-kernel/2007/9/20/264772
> >
> > Changed to do not panic and to set the value properly. Also made some
> > modifications as Oleg suggested. maxrss value is set to pages as "time"
> > application converts it co KBs.
> >
> > This patch enables "time" application to show maxresident value
> > correctly.
>
> I believe the changelog could be a bit more descriptive ;)
> And I think the patch needs more CCs. (Hugh and Michael cc'ed).
>
> What about exec? Should we preserve signal_struct->maxrss, or should
> we reset it? I don't know what is the right behaviour, but imho it
> should be consistent with task_mem()/xacct_add_tsk(). And since
> bprm_mm_init() does not copy ->hiwater_xxx, perhaps we should reset.
> Or we should change the exec_mmap() path to preserve ->hiwater_xxx,
> I dunno.
>
> > Note that even without this patch applied there is a race between two
> > parallel update_hiwater_rss() callers.
> >
> > [...snip...]
> > @@ -1051,6 +1051,8 @@ NORET_TYPE void do_exit(long code)
> > if (tsk->mm) {
> > update_hiwater_rss(tsk->mm);
> > update_hiwater_vm(tsk->mm);
> > +
> > + tsk->signal->maxrss = tsk->mm->hiwater_rss;
>
> Yes. But still it is not good the patch adds new racy calls
> (k_getrusage() too). And in fact I think this code in do_exit()
> should die, and we can update signal->maxrss in exit_mm().
>
> Unless I missed something, the "if (tsk->mm) {}" code in
> do_exit() is not needed because xacct_add_tsk() is not right
> anyway. What we imho need is get_mm_hiwater_xxx(), can be
> also used by task_mm().
>
> I'll send the patch a bit later, then we can tweak this patch.
>
> Do you agree?
Okay.

Jirka
>
> (just in case, I don't mean that mm/ uses update_hiwater_xxx()
> "safely", afaics try_to_unmap can race with unmap_region
> for example, but still).
>
> Oleg.
>