2008-12-12 14:07:28

by Oleg Nesterov

Subject: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting

(changes: update the changelog/comments)

xacct_add_tsk() relies on do_exit()->update_hiwater_xxx() and uses
mm->hiwater_xxx directly; this leads to two problems:

- taskstats_user_cmd() can call fill_pid()->xacct_add_tsk()
at any moment before the task exits, so the current rss/vm
values have to be taken into account in any case.

- do_exit()->update_hiwater_xxx() calls are racy. An exiting
thread can be preempted right before mm->hiwater_xxx = new_val,
and another thread can use A_LOT of memory and exit in between.
When the first thread resumes, it may be the last thread in the
thread group; in that case we report wrong hiwater_xxx values
which do not take A_LOT into account.
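A minimal userspace sketch of the race in the second item. It assumes a simplified, hypothetical `struct mm_model` (not the kernel's mm_struct) and replays the interleaving step by step instead of using real threads. The starting state has the recorded peak already behind the current rss, which is what makes thread A's store worth doing in the first place.

```c
#include <assert.h>

/* Hypothetical stand-in for the mm_struct fields involved (units: pages). */
struct mm_model {
	unsigned long rss;          /* current resident set size */
	unsigned long hiwater_rss;  /* peak, refreshed lazily by update_hiwater_rss() */
};

/*
 * Replays the interleaving from the changelog one step at a time.
 * Returns the mm->hiwater_rss value that the old xacct_add_tsk()
 * would report once thread A is the last thread left.
 */
unsigned long replay_exit_race(unsigned long hiwater, unsigned long rss,
			       unsigned long a_lot)
{
	struct mm_model mm = { rss, hiwater };
	unsigned long a_new_val;

	/* Thread A, do_exit()->update_hiwater_rss(): read and compare ... */
	a_new_val = mm.rss;
	if (mm.hiwater_rss < a_new_val) {
		/* ... then A is preempted here, before the store. */

		/* Thread B maps A_LOT more pages and exits; its own
		 * update_hiwater_rss() records the real peak. */
		mm.rss += a_lot;
		if (mm.hiwater_rss < mm.rss)
			mm.hiwater_rss = mm.rss;

		/* Thread A resumes and stores its stale value. */
		mm.hiwater_rss = a_new_val;
	}
	return mm.hiwater_rss;
}
```

With `hiwater = 50`, `rss = 100` and `a_lot = 1000`, the replay leaves 100 in `hiwater_rss` although rss actually reached 1100. In this model the pages are still mapped afterwards, so a report that reads max(hiwater_rss, rss) would still see 1100.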

Introduce get_mm_hiwater_rss() and get_mm_hiwater_vm() helpers and
change xacct_add_tsk() to use them. The first helper will also be
used by rusage->ru_maxrss accounting.
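The idea behind the two helpers can be modelled in userspace. Treat the stored peak as a lower bound that may be stale, and fold in the current value at read time. The struct and function names below are hypothetical. `max_ul()` stands in for the kernel's `max()`.

```c
#include <assert.h>

/* Hypothetical stand-in for the mm_struct counters (units: pages). */
struct mm_model {
	unsigned long rss, hiwater_rss;
	unsigned long total_vm, hiwater_vm;
};

unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Read-side counterparts of get_mm_hiwater_rss()/get_mm_hiwater_vm():
 * the recorded peak may lag behind, so the current value is folded in. */
unsigned long model_hiwater_rss(const struct mm_model *mm)
{
	return max_ul(mm->hiwater_rss, mm->rss);
}

unsigned long model_hiwater_vm(const struct mm_model *mm)
{
	return max_ul(mm->hiwater_vm, mm->total_vm);
}
```

For example, with `hiwater_rss = 100` but a current rss of 300, `model_hiwater_rss()` returns 300 even though nobody refreshed the stored peak.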

Kill the do_exit()->update_hiwater_xxx() calls. Unless we are going to
decrease rss/vm, there is no point in updating mm->hiwater_xxx, and
nobody can look at this mm_struct when exit_mmap() actually unmaps
the memory.
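The reasoning in the last paragraph rests on an invariant. If the stored peak is refreshed before every decrease, then max(stored peak, current value) always equals the true peak, so increases need no bookkeeping. A teardown that nobody observes needs none either. Below is a hypothetical userspace model of that invariant. `model_shrink()` plays roughly the role update_hiwater_rss() plays before pages are unmapped.

```c
#include <assert.h>

/* Hypothetical model: current value plus a lazily maintained peak. */
struct mm_model {
	unsigned long rss;          /* current value (pages) */
	unsigned long hiwater_rss;  /* peak recorded so far, possibly stale */
};

unsigned long model_hiwater_rss(const struct mm_model *mm)
{
	return mm->hiwater_rss > mm->rss ? mm->hiwater_rss : mm->rss;
}

/* Growing needs no bookkeeping: the read-side max() sees the new rss. */
void model_grow(struct mm_model *mm, unsigned long pages)
{
	mm->rss += pages;
}

/* Shrinking is the only operation that can hide a peak, so the peak
 * is folded into hiwater_rss before rss goes down. */
void model_shrink(struct mm_model *mm, unsigned long pages)
{
	if (mm->hiwater_rss < mm->rss)
		mm->hiwater_rss = mm->rss;
	mm->rss -= pages;
}
```

After growing to 100, shrinking by 60 and growing by 30, the helper still reports 100. After a further growth of 200 it reports 270, without any extra update.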

Signed-off-by: Oleg Nesterov <[email protected]>

--- K-28/include/linux/sched.h~HIWATER 2008-12-02 17:12:40.000000000 +0100
+++ K-28/include/linux/sched.h 2008-12-03 18:17:18.000000000 +0100
@@ -388,6 +388,9 @@ extern void arch_unmap_area_topdown(stru
 		(mm)->hiwater_vm = (mm)->total_vm;	\
 } while (0)
 
+#define get_mm_hiwater_rss(mm) max((mm)->hiwater_rss, get_mm_rss(mm))
+#define get_mm_hiwater_vm(mm) max((mm)->hiwater_vm, (mm)->total_vm)
+
 extern void set_dumpable(struct mm_struct *mm, int value);
 extern int get_dumpable(struct mm_struct *mm);
 
--- K-28/kernel/tsacct.c~HIWATER 2008-10-10 00:13:53.000000000 +0200
+++ K-28/kernel/tsacct.c 2008-12-03 18:24:28.000000000 +0100
@@ -90,8 +90,8 @@ void xacct_add_tsk(struct taskstats *sta
 	mm = get_task_mm(p);
 	if (mm) {
 		/* adjust to KB unit */
-		stats->hiwater_rss = mm->hiwater_rss * PAGE_SIZE / KB;
-		stats->hiwater_vm = mm->hiwater_vm * PAGE_SIZE / KB;
+		stats->hiwater_rss = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
+		stats->hiwater_vm = get_mm_hiwater_vm(mm) * PAGE_SIZE / KB;
 		mmput(mm);
 	}
 	stats->read_char = p->ioac.rchar;
--- K-28/kernel/exit.c~HIWATER 2008-12-02 17:12:40.000000000 +0100
+++ K-28/kernel/exit.c 2008-12-03 18:21:06.000000000 +0100
@@ -1048,10 +1048,7 @@ NORET_TYPE void do_exit(long code)
 				preempt_count());
 
 	acct_update_integrals(tsk);
-	if (tsk->mm) {
-		update_hiwater_rss(tsk->mm);
-		update_hiwater_vm(tsk->mm);
-	}
+
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
 		hrtimer_cancel(&tsk->signal->real_timer);
--- K-28/mm/mmap.c~HIWATER 2008-12-02 17:12:40.000000000 +0100
+++ K-28/mm/mmap.c 2008-12-11 09:13:07.000000000 +0100
@@ -2103,7 +2103,7 @@ void exit_mmap(struct mm_struct *mm)
 	lru_add_drain();
 	flush_cache_mm(mm);
 	tlb = tlb_gather_mmu(mm, 1);
-	/* Don't update_hiwater_rss(mm) here, do_exit already did */
+	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
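The "adjust to KB unit" step in xacct_add_tsk() converts a page count to kilobytes. Here is a small sketch of that arithmetic. It assumes 4 KiB pages and KB defined as 1024, and both constants are stand-ins named MODEL_* to make that explicit. Unlike the kernel line, the sketch widens to unsigned long long before multiplying, so the product is computed in 64 bits.

```c
#include <assert.h>

/* Assumed values for illustration only. */
#define MODEL_PAGE_SIZE 4096UL
#define MODEL_KB        1024UL

/* pages -> KB, same order of operations as "x * PAGE_SIZE / KB". */
unsigned long long pages_to_kb(unsigned long pages)
{
	return (unsigned long long)pages * MODEL_PAGE_SIZE / MODEL_KB;
}
```

With these constants each page contributes 4 KB, so a peak of 256 pages is reported as 1024 KB.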


2008-12-12 15:55:44

by Hugh Dickins

Subject: Re: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting

On Fri, 12 Dec 2008, Oleg Nesterov wrote:

> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Hugh Dickins <[email protected]>

2008-12-13 02:35:29

by KOSAKI Motohiro

Subject: Re: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting

> Signed-off-by: Oleg Nesterov <[email protected]>

Thanks! looks good to me.
Reviewed-by: KOSAKI Motohiro <[email protected]>

> --- K-28/mm/mmap.c~HIWATER 2008-12-02 17:12:40.000000000 +0100
> +++ K-28/mm/mmap.c 2008-12-11 09:13:07.000000000 +0100
> @@ -2103,7 +2103,7 @@ void exit_mmap(struct mm_struct *mm)
>  	lru_add_drain();
>  	flush_cache_mm(mm);
>  	tlb = tlb_gather_mmu(mm, 1);
> -	/* Don't update_hiwater_rss(mm) here, do_exit already did */
> +	/* update_hiwater_rss(mm) here? but nobody should be looking */
>  	/* Use -1 here to ensure all VMAs in the mm are unmapped */
>  	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
>  	vm_unacct_memory(nr_accounted);

I also think the hiwater mark doesn't need to be updated here.

2008-12-13 03:49:19

by Balbir Singh

Subject: Re: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting

* KOSAKI Motohiro <[email protected]> [2008-12-13 11:34:53]:

> Thanks! looks good to me.
> Reviewed-by: KOSAKI Motohiro <[email protected]>

Me too, I am acking it, but you already have all the acks you need :)

Acked-by: Balbir Singh <[email protected]>

--
Balbir

2008-12-16 00:22:22

by Andrew Morton

Subject: Re: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting

On Fri, 12 Dec 2008 15:05:24 +0100
Oleg Nesterov <[email protected]> wrote:
>

> --- K-28/include/linux/sched.h~HIWATER 2008-12-02 17:12:40.000000000 +0100
> +++ K-28/include/linux/sched.h 2008-12-03 18:17:18.000000000 +0100

grumble

> +#define get_mm_hiwater_rss(mm) max((mm)->hiwater_rss, get_mm_rss(mm))

This evaluates its argument thrice.

> +#define get_mm_hiwater_vm(mm) max((mm)->hiwater_vm, (mm)->total_vm)

This evaluates its argument twice.


was sched.h the appropriate header in which to implement these? Maybe...

But they're only ever _used_ in kernel/tsacct.c, so do they actually
need to be implemented in any .h file?
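Andrew's counts can be reproduced with an argument that has a visible side effect. The sketch assumes that get_mm_rss() was itself a macro naming mm twice, as the "thrice" count implies. `lookup_mm()`, `max_ul()` and the `model_*` names are hypothetical stand-ins. `max_ul()` is a function, so like the kernel's max() it evaluates each operand once. The extra evaluations come from the macros that wrap it.

```c
#include <assert.h>

struct mm_model {
	unsigned long file_rss, anon_rss, hiwater_rss;
	unsigned long total_vm, hiwater_vm;
};

static int evaluations;          /* times the macro argument was evaluated */
static struct mm_model the_mm;

/* Argument expression with a side effect we can count. */
struct mm_model *lookup_mm(void)
{
	evaluations++;
	return &the_mm;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Assumed shape of get_mm_rss() at the time: mm appears twice. */
#define model_get_mm_rss(mm) ((mm)->file_rss + (mm)->anon_rss)

/* The patch's macros, with max() replaced by max_ul(). */
#define model_hiwater_rss(mm) max_ul((mm)->hiwater_rss, model_get_mm_rss(mm))
#define model_hiwater_vm(mm)  max_ul((mm)->hiwater_vm, (mm)->total_vm)

int count_rss_evaluations(void)
{
	evaluations = 0;
	(void)model_hiwater_rss(lookup_mm());   /* expands to 3 calls */
	return evaluations;
}

int count_vm_evaluations(void)
{
	evaluations = 0;
	(void)model_hiwater_vm(lookup_mm());    /* expands to 2 calls */
	return evaluations;
}
```

Passing a plain variable such as `mm` hides the problem. Any argument with side effects or real cost is evaluated once per textual occurrence.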

2008-12-16 10:39:14

by Oleg Nesterov

Subject: Re: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting

On 12/15, Andrew Morton wrote:
>
> On Fri, 12 Dec 2008 15:05:24 +0100
> Oleg Nesterov <[email protected]> wrote:
>
> > +#define get_mm_hiwater_rss(mm) max((mm)->hiwater_rss, get_mm_rss(mm))
>
> This evaluates its argument thrice.
>
> > +#define get_mm_hiwater_vm(mm) max((mm)->hiwater_vm, (mm)->total_vm)
>
> This evaluates its argument twice.

I thought that any user should be careful anyway...

OK, agreed, will send the cleanup.
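The thread does not show the cleanup patch itself. One plausible shape for it is sketched below against the same kind of hypothetical model. Turning the macros into static inline functions makes the argument an ordinary parameter, evaluated once at the call site, and it also gives the helpers a checked parameter type. All names here are illustrative.

```c
#include <assert.h>

struct mm_model {
	unsigned long file_rss, anon_rss, hiwater_rss;
	unsigned long total_vm, hiwater_vm;
};

static inline unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static inline unsigned long model_get_mm_rss(const struct mm_model *mm)
{
	return mm->file_rss + mm->anon_rss;
}

/* Function versions: mm is evaluated exactly once, at the call. */
static inline unsigned long model_get_hiwater_rss(const struct mm_model *mm)
{
	return max_ul(mm->hiwater_rss, model_get_mm_rss(mm));
}

static inline unsigned long model_get_hiwater_vm(const struct mm_model *mm)
{
	return max_ul(mm->hiwater_vm, mm->total_vm);
}

static int evaluations;
static struct mm_model the_mm = { 10, 20, 25, 100, 150 };

/* Side-effecting argument used to count evaluations. */
struct mm_model *lookup_mm(void)
{
	evaluations++;
	return &the_mm;
}
```

Here `model_get_hiwater_rss(lookup_mm())` returns 30 (10 + 20 beats the stored 25) and calls `lookup_mm()` only once.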

> was sched.h the appropriate header in which to implement these? Maybe...

Only because I'd like to keep them next to update_hiwater_xxx().

> But they're only ever _used_ in kernel/tsacct.c, so do they actually
> need to be implemented in any .h file?

Jiri is cooking a patch that implements rusage->ru_maxrss accounting;
it will use the first helper.

Oleg.

2008-12-16 10:44:22

by Jiri Pirko

Subject: Re: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting

On Mon, 15 Dec 2008 16:21:48 -0800
Andrew Morton <[email protected]> wrote:

>
> > +#define get_mm_hiwater_rss(mm) max((mm)->hiwater_rss, get_mm_rss(mm))
>
> This evaluates its argument thrice.
>
> > +#define get_mm_hiwater_vm(mm) max((mm)->hiwater_vm, (mm)->total_vm)
>
> This evaluates its argument twice.
>
>
> was sched.h the appropriate header in which to implement these? Maybe...
I think it was; there are similar helpers in the same place.
>
> But they're only ever _used_ in kernel/tsacct.c, so do they actually
> need to be implemented in any .h file?
Yes, because my patch (ru_maxrss filling) will be using
get_mm_hiwater_rss() from kernel/exit.c and kernel/sys.c
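Jiri's patch is not part of this thread. The sketch below is a hypothetical illustration only. An exit path could fold each mm's peak into a per-thread-group record and report it in kilobytes, which is the unit getrusage() uses for ru_maxrss on Linux. All names are made up for the sketch, and 4 KiB pages are assumed.

```c
#include <assert.h>

/* Hypothetical, simplified per-thread-group accounting record. */
struct signal_model {
	unsigned long maxrss;   /* largest peak RSS seen so far, in pages */
};

struct mm_model {
	unsigned long rss, hiwater_rss;
};

/* Same read-side rule as get_mm_hiwater_rss(): stored peak or current rss. */
static unsigned long model_get_mm_hiwater_rss(const struct mm_model *mm)
{
	return mm->hiwater_rss > mm->rss ? mm->hiwater_rss : mm->rss;
}

/* Hypothetical exit-time hook: fold this mm's peak into the group record. */
void model_account_maxrss(struct signal_model *sig, const struct mm_model *mm)
{
	unsigned long peak = model_get_mm_hiwater_rss(mm);

	if (sig->maxrss < peak)
		sig->maxrss = peak;
}

/* Report in kilobytes, assuming 4 KiB pages. */
long model_ru_maxrss_kb(const struct signal_model *sig)
{
	return (long)(sig->maxrss * (4096UL / 1024UL));
}
```

Folding in one mm whose stored peak is 300 pages and another whose current rss is 500 pages leaves `maxrss` at 500 pages, reported as 2000 KB.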