2003-05-28 10:41:40

by Ravikiran G Thirumalai

Subject: [patch] Inline vm_acct_memory

Hi Andrew,
I found that inlining vm_acct_memory speeds up vm_enough_memory.
Since vm_acct_memory is only called by vm_enough_memory,
theoretically inlining should help, and my tests proved so -- there was
an improvement of about 10% in profile ticks (vm_enough_memory) on a
4-processor PIII Xeon running kernbench. Here is the patch. Tested on 2.5.70.

Thanks,
Kiran


D:
D: Patch inlines use of vm_acct_memory
D:

diff -ruN -X dontdiff linux-2.5.69/include/linux/mman.h linux-2.5.69-vm_acct_inline/include/linux/mman.h
--- linux-2.5.69/include/linux/mman.h Mon May 5 05:23:02 2003
+++ linux-2.5.69-vm_acct_inline/include/linux/mman.h Tue May 20 12:18:34 2003
@@ -2,6 +2,8 @@
#define _LINUX_MMAN_H

#include <linux/config.h>
+#include <linux/smp.h>
+#include <linux/percpu.h>

#include <asm/atomic.h>
#include <asm/mman.h>
@@ -13,7 +15,27 @@
extern atomic_t vm_committed_space;

#ifdef CONFIG_SMP
-extern void vm_acct_memory(long pages);
+DECLARE_PER_CPU(long, committed_space);
+
+/*
+ * We tolerate a little inaccuracy to avoid ping-ponging the counter between
+ * CPUs
+ */
+#define ACCT_THRESHOLD max(16, NR_CPUS * 2)
+
+static inline void vm_acct_memory(long pages)
+{
+ long *local;
+
+ preempt_disable();
+ local = &__get_cpu_var(committed_space);
+ *local += pages;
+ if (*local > ACCT_THRESHOLD || *local < -ACCT_THRESHOLD) {
+ atomic_add(*local, &vm_committed_space);
+ *local = 0;
+ }
+ preempt_enable();
+}
#else
static inline void vm_acct_memory(long pages)
{
diff -ruN -X dontdiff linux-2.5.69/mm/mmap.c linux-2.5.69-vm_acct_inline/mm/mmap.c
--- linux-2.5.69/mm/mmap.c Mon May 5 05:23:32 2003
+++ linux-2.5.69-vm_acct_inline/mm/mmap.c Tue May 20 12:05:36 2003
@@ -53,6 +53,10 @@
int sysctl_overcommit_ratio = 50; /* default is 50% */
atomic_t vm_committed_space = ATOMIC_INIT(0);

+#ifdef CONFIG_SMP
+DEFINE_PER_CPU(long, committed_space) = 0;
+#endif
+
/*
* Check that a process has enough memory to allocate a new virtual
* mapping. 1 means there is enough memory for the allocation to
diff -ruN -X dontdiff linux-2.5.69/mm/swap.c linux-2.5.69-vm_acct_inline/mm/swap.c
--- linux-2.5.69/mm/swap.c Mon May 5 05:23:13 2003
+++ linux-2.5.69-vm_acct_inline/mm/swap.c Tue May 20 11:50:59 2003
@@ -349,30 +349,6 @@
}


-#ifdef CONFIG_SMP
-/*
- * We tolerate a little inaccuracy to avoid ping-ponging the counter between
- * CPUs
- */
-#define ACCT_THRESHOLD max(16, NR_CPUS * 2)
-
-static DEFINE_PER_CPU(long, committed_space) = 0;
-
-void vm_acct_memory(long pages)
-{
- long *local;
-
- preempt_disable();
- local = &__get_cpu_var(committed_space);
- *local += pages;
- if (*local > ACCT_THRESHOLD || *local < -ACCT_THRESHOLD) {
- atomic_add(*local, &vm_committed_space);
- *local = 0;
- }
- preempt_enable();
-}
-#endif
-

/*
* Perform any setup for the swap system
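
For anyone reading along outside the kernel tree: the patch keeps the existing
batching scheme, in which each CPU accumulates its delta in a private counter
and only folds it into the shared atomic once the local value crosses a
threshold, trading a little accuracy for fewer cache-line bounces on
vm_committed_space. Below is a minimal userspace sketch of the same pattern
(plain C11 with illustrative names; not kernel code):

#include <stdatomic.h>
#include <stdio.h>

#define ACCT_THRESHOLD 32	/* stand-in for max(16, NR_CPUS * 2) */

static atomic_long committed_space;		/* shared, like vm_committed_space */
static _Thread_local long local_committed;	/* private, like the per-CPU counter */

static inline void acct_memory(long pages)
{
	/* Accumulate locally; touch the shared counter only past the threshold. */
	local_committed += pages;
	if (local_committed > ACCT_THRESHOLD || local_committed < -ACCT_THRESHOLD) {
		atomic_fetch_add(&committed_space, local_committed);
		local_committed = 0;
	}
}

int main(void)
{
	int i;

	for (i = 0; i < 1000; i++)
		acct_memory(1);

	/* Fold in whatever is still batched locally before reading. */
	atomic_fetch_add(&committed_space, local_committed);
	local_committed = 0;

	printf("committed: %ld\n", atomic_load(&committed_space));
	return 0;
}

The preempt_disable()/preempt_enable() pair in the real code plays the role
that thread-local storage plays here: it keeps the task on one CPU while it
updates that CPU's counter.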


2003-05-28 10:51:17

by Andrew Morton

Subject: Re: [patch] Inline vm_acct_memory

Ravikiran G Thirumalai <[email protected]> wrote:
>
> I found that inlining vm_acct_memory speeds up vm_enough_memory.
> Since vm_acct_memory is only called by vm_enough_memory,
> theoretically inlining should help, and my tests proved so -- there was
> an improvement of about 10% in profile ticks (vm_enough_memory) on a
> 4-processor PIII Xeon running kernbench.

OK. But with just a single call site we may as well implement vm_acct_memory()
inside mm/mmap.c. Could you please make that change?

Thanks.

2003-05-28 15:30:08

by Hugh Dickins

Subject: Re: [patch] Inline vm_acct_memory

On Wed, 28 May 2003, Ravikiran G Thirumalai wrote:
> I found that inlining vm_acct_memory speeds up vm_enough_memory.
> Since vm_acct_memory is only called by vm_enough_memory,

No, linux/mman.h declares

static inline void vm_unacct_memory(long pages)
{
vm_acct_memory(-pages);
}

and I count 18 callsites for vm_unacct_memory.

I'm no judge of what's worth inlining, but Andrew is widely known
and feared as The Scourge of Inliners, so I'd advise you to hide...

Hugh

2003-05-29 02:10:11

by Andrew Morton

Subject: Re: [patch] Inline vm_acct_memory

Hugh Dickins <[email protected]> wrote:
>
> On Wed, 28 May 2003, Ravikiran G Thirumalai wrote:
> > I found that inlining vm_acct_memory speeds up vm_enough_memory.
> > Since vm_acct_memory is only called by vm_enough_memory,
>
> No, linux/mman.h declares
>
> static inline void vm_unacct_memory(long pages)
> {
> vm_acct_memory(-pages);
> }
>
> and I count 18 callsites for vm_unacct_memory.
>
> I'm no judge of what's worth inlining, but Andrew is widely known
> and feared as The Scourge of Inliners, so I'd advise you to hide...

Maybe I'm wrong. kernbench is not some silly tight-loop microbenchmark.
It is a "real" workload: gcc.

The thing about pagetable setup and teardown is that it tends to be called
in big lumps: for a while the process is establishing thousands of pages
and then later it is tearing down thousands. So the cache-thrashing impact
of having those eighteen instances sprinkled around the place is less
than it would be if they were all being called randomly. If you can believe
such waffle.

Kiran, do you still have the raw data available? profiles and runtimes?

2003-05-29 04:18:43

by Ravikiran G Thirumalai

Subject: Re: [patch] Inline vm_acct_memory

On Wed, May 28, 2003 at 07:23:30PM -0700, Andrew Morton wrote:
>...
> Maybe I'm wrong. kernbench is not some silly tight-loop microbenchmark.
> It is a "real" workload: gcc.
>
> The thing about pagetable setup and teardown is that it tends to be called
> in big lumps: for a while the process is establishing thousands of pages
> and then later it is tearing down thousands. So the cache-thrashing impact
> of having those eighteen instances sprinkled around the place is less
> than it would be if they were all being called randomly. If you can believe
> such waffle.
>
> Kiran, do you still have the raw data available? profiles and runtimes?

Yeah, I kinda overlooked vm_unacct_memory 'til I wondered why I had put
the inlines in the header file earlier :). But I do have the profiles,
and checking all calls to vm_unacct_memory, I find there is no
regression at the significant callsites.
I can provide the detailed profile if required,
but here is the summary for 4 runs of kernbench -- the numbers against
each routine are profile ticks (oprofile) for the 4 runs.


Vanilla
-------
vm_enough_memory 786 778 780 735
vm_acct_memory 870 952 884 898
-----------------------------------
tot of above 1656 1730 1664 1633

do_mmap_pgoff 3559 3673 3669 3807
expand_stack 27 34 33 42
unmap_region 236 267 280 280
do_brk 594 596 615 615
exit_mmap 51 52 44 52
mprotect_fixup 47 55 55 57
do_mremap 0 2 1 1
shmem_notify_change 0 0 0 0
shmem_notify_change 0 0 0 0
shmem_delete_inode 0 0 0 0
shmem_file_write 0 0 0 0
shmem_symlink 0 0 0 0
shmem_file_setup 0 0 0 0
sys_swapoff 0 0 0 0
poll_idle 101553 108687 89281 86251

Inline
------
vm_enough_memory 1539 1488 1488 1472
do_mmap_pgoff 3510 3523 3436 3475
expand_stack 27 33 32 27
unmap_region 295 340 311 349
do_brk 641 583 659 640
exit_mmap 33 52 44 42
mprotect_fixup 54 65 73 64
do_mremap 2 0 0 0
shmem_notify_change 0 0 0 0
shmem_notify_change 0 0 0 0
shmem_delete_inode 0 0 0 0
shmem_file_write 0 0 0 0
shmem_symlink 0 0 0 0
shmem_file_setup 0 0 0 0
sys_swapoff 0 0 0 0
poll_idle 98733 85659 104994 103096


Thanks,
Kiran

2003-05-29 05:44:25

by Hugh Dickins

Subject: Re: [patch] Inline vm_acct_memory

On Thu, 29 May 2003, Ravikiran G Thirumalai wrote:
>
> Yeah, I kinda overlooked vm_unacct_memory 'til I wondered why I had put
> the inlines in the header file earlier :). But I do have the profiles,
> and checking all calls to vm_unacct_memory, I find there is no
> regression at the significant callsites.
> I can provide the detailed profile if required,
> but here is the summary for 4 runs of kernbench -- the numbers against
> each routine are profile ticks (oprofile) for the 4 runs.
>
> Vanilla
> -------
> vm_enough_memory 786 778 780 735
> vm_acct_memory 870 952 884 898
> -----------------------------------
> tot of above 1656 1730 1664 1633
>
> do_mmap_pgoff 3559 3673 3669 3807
> expand_stack 27 34 33 42
> unmap_region 236 267 280 280
> do_brk 594 596 615 615
> exit_mmap 51 52 44 52
> mprotect_fixup 47 55 55 57
> do_mremap 0 2 1 1
> poll_idle 101553 108687 89281 86251
>
> Inline
> ------
> vm_enough_memory 1539 1488 1488 1472
> do_mmap_pgoff 3510 3523 3436 3475
> expand_stack 27 33 32 27
> unmap_region 295 340 311 349
> do_brk 641 583 659 640
> exit_mmap 33 52 44 42
> mprotect_fixup 54 65 73 64
> do_mremap 2 0 0 0
> poll_idle 98733 85659 104994 103096

There does look to be a regression in unmap_region.

Though I'd be reluctant to assign much general significance
to any of these numbers (suspect it might all come out quite
differently on another machine, another config, another run).

But the probable best course is just to inline vm_acct_memory
as you did, but also uninline vm_unacct_memory: placing the
static inline vm_acct_memory and then vm_unacct_memory just
before vm_enough_memory in mm/mmap.c.
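
For concreteness, a sketch (assuming the 2.5.69 per-CPU interfaces used in
the patch above) of what that arrangement could look like in mm/mmap.c, with
linux/mman.h then declaring vm_unacct_memory extern under CONFIG_SMP instead
of defining it inline:

#ifdef CONFIG_SMP
static DEFINE_PER_CPU(long, committed_space) = 0;

/*
 * We tolerate a little inaccuracy to avoid ping-ponging the counter
 * between CPUs.
 */
#define ACCT_THRESHOLD	max(16, NR_CPUS * 2)

static inline void vm_acct_memory(long pages)
{
	long *local;

	preempt_disable();
	local = &__get_cpu_var(committed_space);
	*local += pages;
	if (*local > ACCT_THRESHOLD || *local < -ACCT_THRESHOLD) {
		atomic_add(*local, &vm_committed_space);
		*local = 0;
	}
	preempt_enable();
}

/*
 * Kept out of line: the eighteen vm_unacct_memory() callers make one real
 * call here rather than each expanding the accounting body above.
 */
void vm_unacct_memory(long pages)
{
	vm_acct_memory(-pages);
}
#endif

That would keep vm_enough_memory()'s fast path call-free while leaving the
unaccount path as a single small function shared by all its callers.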

Hugh


2003-05-29 09:28:15

by Ravikiran G Thirumalai

Subject: Re: [patch] Inline vm_acct_memory

On Thu, May 29, 2003 at 06:59:47AM +0100, Hugh Dickins wrote:
> On Thu, 29 May 2003, Ravikiran G Thirumalai wrote:
> > ...
> > Vanilla
> > -------
> > vm_enough_memory 786 778 780 735
> > vm_acct_memory 870 952 884 898
> > -----------------------------------
> > tot of above 1656 1730 1664 1633
> >
> > do_mmap_pgoff 3559 3673 3669 3807
> > expand_stack 27 34 33 42
> > unmap_region 236 267 280 280
> > do_brk 594 596 615 615
> > exit_mmap 51 52 44 52
> > mprotect_fixup 47 55 55 57
> > do_mremap 0 2 1 1
> > poll_idle 101553 108687 89281 86251
> >
> > Inline
> > ------
> > vm_enough_memory 1539 1488 1488 1472
> > do_mmap_pgoff 3510 3523 3436 3475
> > expand_stack 27 33 32 27
> > unmap_region 295 340 311 349
> > do_brk 641 583 659 640
> > exit_mmap 33 52 44 42
> > mprotect_fixup 54 65 73 64
> > do_mremap 2 0 0 0
> > poll_idle 98733 85659 104994 103096
>
> There does look to be a regression in unmap_region.

Yeah, but there is an improvement of greater magnitude in do_mmap_pgoff,
by about 190 ticks. There is a regression of about 60 ticks in
unmap_region (taking the standard deviation into consideration). The change
in ticks for do_brk is not statistically significant.

>
> Though I'd be reluctant to assign much general significance
> to any of these numbers (suspect it might all come out quite
> differently on another machine, another config, another run).
>

I agree with this for vm_unacct_memory, since it is called from many
places. Depending on the workload, the numbers could come out differently
(if all the callsites are hit with comparable frequency).
However, if the workload stresses one particular caller much more
than the others, then maybe inlining helps, as in this case where
do_mmap_pgoff is stressed the most?

> But the probable best course is just to inline vm_acct_memory
> as you did, but also uninline vm_unacct_memory: placing the
> static inline vm_acct_memory and then vm_unacct_memory just
> before vm_enough_memory in mm/mmap.c.

Sounds reasonable. I'll give this a run and see how the profiles look.

Thanks,
Kiran