2007-05-04 10:53:22

by Nick Piggin

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Rik van Riel wrote:
> With lazy freeing of anonymous pages through MADV_FREE, performance of
> the MySQL sysbench workload more than doubles on my quad-core system.

OK, I've run some tests on a 16 core Opteron system, both sysbench with
MySQL 5.33 (set up as described in the freebsd vs linux page), and with
ebizzy.

What I found is that, on this system, MADV_FREE performance improvement
was in the noise when you look at it on top of the MADV_DONTNEED glibc
and down_read(mmap_sem) patch in sysbench.

In ebizzy it was slightly up at low loads and slightly down at high loads,
though I wouldn't put as much stock in ebizzy as in the real workload,
because the numbers are going to be highly dependent on access patterns.

Note that these numbers were collected under best-case conditions for
MADV_FREE, i.e. with no page reclaim going on. Once page reclaim comes
into play, there may be room for regressions.

So far, I'm not convinced this justifies the use of a page flag or the
added complexity. There are lots of ways we could improve performance
with a page flag (my recent PG_waiters, PG_mlock, PG_replicated, etc.),
so I think we need some more numbers.

(I'll be away for the weekend...)


The left-hand column is the number of threads; results are shown +/- their 99.9% confidence intervals.

sysbench transactions per sec (higher is better)

2.6.21
1, 453.092000 +/- 7.089284
2, 831.722000 +/- 13.138541
4, 1468.590000 +/- 40.160654
8, 2139.822000 +/- 62.223220
16, 2118.802000 +/- 83.247076
32, 1051.596000 +/- 62.455236
64, 917.078000 +/- 21.086954

new glibc
1, 466.376000 +/- 9.018054
2, 867.020000 +/- 26.163901
4, 1535.880000 +/- 25.784081
8, 2261.856000 +/- 53.350146
16, 2249.020000 +/- 120.361138
32, 1521.858000 +/- 110.236781
64, 1405.262000 +/- 85.260624

mmap_sem
1, 476.144000 +/- 15.865284
2, 871.778000 +/- 12.736486
4, 1529.348000 +/- 21.400517
8, 2235.590000 +/- 54.192125
16, 2177.422000 +/- 27.416498
32, 2120.986000 +/- 58.499708
64, 1949.362000 +/- 51.177977

madv_free
1, 475.056000 +/- 6.943168
2, 861.438000 +/- 22.101826
4, 1564.782000 +/- 55.190110
8, 2211.792000 +/- 59.843995
16, 2163.232000 +/- 46.031627
32, 2100.544000 +/- 86.744497
64, 1947.058000 +/- 62.392049
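Whether the MADV_FREE gain is "in the noise" can be checked mechanically: if the confidence intervals of two configurations overlap, the tables do not show a clear difference between them. The sketch below applies that rough screen (it is not a formal significance test) to the 64-thread sysbench rows above; the struct and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* One benchmark result: mean +/- half-width of its 99.9% confidence interval. */
struct result {
    double mean;
    double ci;
};

/* Hypothetical helper: true if the two confidence intervals overlap. */
static bool intervals_overlap(struct result a, struct result b)
{
    double a_lo = a.mean - a.ci, a_hi = a.mean + a.ci;
    double b_lo = b.mean - b.ci, b_hi = b.mean + b.ci;
    return a_lo <= b_hi && b_lo <= a_hi;
}

/* Relative change of b versus a, in percent. */
static double percent_change(struct result a, struct result b)
{
    return (b.mean - a.mean) / a.mean * 100.0;
}

/* 64-thread sysbench rows copied from the tables above (transactions/sec). */
static const struct result base_64      = { 917.078, 21.086954 };
static const struct result mmap_sem_64  = { 1949.362, 51.177977 };
static const struct result madv_free_64 = { 1947.058, 62.392049 };

/* Print the relative change and whether the intervals overlap. */
static void report(const char *label, struct result a, struct result b)
{
    printf("%-24s %+7.1f%%  %s\n", label, percent_change(a, b),
           intervals_overlap(a, b) ? "overlap (noise)" : "distinct");
}
```

Calling report("2.6.21 -> mmap_sem", base_64, mmap_sem_64) flags the glibc plus mmap_sem change as distinct, while report("mmap_sem -> madv_free", mmap_sem_64, madv_free_64) reports overlapping intervals.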


ebizzy elapsed time (lower is better)

mmap_sem
1, 45.544000 +/- 3.538529
4, 78.492000 +/- 8.881464
16, 224.538000 +/- 7.762784
64, 913.466000 +/- 53.506338

madv_free
1, 43.350000 +/- 0.778292
4, 68.190000 +/- 8.623731
16, 225.568000 +/- 14.940109
64, 899.136000 +/- 56.153209

--
SUSE Labs, Novell Inc.


2007-05-04 11:59:08

by Rik van Riel

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Nick Piggin wrote:
> Rik van Riel wrote:
>> With lazy freeing of anonymous pages through MADV_FREE, performance of
>> the MySQL sysbench workload more than doubles on my quad-core system.
>
> OK, I've run some tests on a 16 core Opteron system, both sysbench with
> MySQL 5.33 (set up as described in the freebsd vs linux page), and with
> ebizzy.
>
> What I found is that, on this system, MADV_FREE performance improvement
> was in the noise when you look at it on top of the MADV_DONTNEED glibc
> and down_read(mmap_sem) patch in sysbench.

Interesting, very different results from my system.

First, did you run with the properly TLB batched version of
the MADV_FREE patch? And did you make sure that MADV_FREE
takes the mmap_sem for reading? Without that, I did see
a similar thing to what you saw...

Secondly, I'll have to try some test runs on one of the larger
systems in the lab.

Maybe the results from my quad core Intel system are not
typical; maybe the results from your 16 core Opteron are
not typical. Either way, I want to find out :)

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.

2007-05-04 16:05:11

by Ulrich Drepper

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Nick Piggin wrote:
> What I found is that, on this system, MADV_FREE performance improvement
> was in the noise when you look at it on top of the MADV_DONTNEED glibc
> and down_read(mmap_sem) patch in sysbench.

I don't want to judge the numbers since I cannot, but I want to make an
observation: even if in the SMP case MADV_FREE turns out not to be a
bigger boost, there is still the UP case to keep in mind, where Rik
measured a significant speed-up. As long as the SMP case isn't hurt,
this is reason enough to use the patch. With more and more cores on one
processor, SMP systems are pushed ever more to the high end. You'll
find that many installations which use SMP today will be happy enough with
many-core UP machines.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖



2007-05-04 23:48:22

by Nick Piggin

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Ulrich Drepper wrote:
> Nick Piggin wrote:
>
>>What I found is that, on this system, MADV_FREE performance improvement
>>was in the noise when you look at it on top of the MADV_DONTNEED glibc
>>and down_read(mmap_sem) patch in sysbench.
>
>
> I don't want to judge the numbers since I cannot, but I want to make an
> observation: even if in the SMP case MADV_FREE turns out not to be a
> bigger boost, there is still the UP case to keep in mind, where Rik
> measured a significant speed-up. As long as the SMP case isn't hurt,
> this is reason enough to use the patch. With more and more cores on one
> processor, SMP systems are pushed ever more to the high end. You'll
> find that many installations which use SMP today will be happy enough with
> many-core UP machines.

OK, sure. I think we need more numbers though.

And even if this was a patch with _no_ possibility for regressions and it
was a completely trivial one that improves performance in some cases...
one big problem is that it uses another page flag.

I literally have about 4 or 5 new page flags I'd like to add today :) I
can't of course, because we have very few spare ones left.

From the MySQL numbers on this system, it seems like the MADV_FREE gain is
in the noise, and MADV_DONTNEED accounts for the _vast_ majority of the improvement.
This is also the case with Rik's benchmarks, and while he did see some
improvement, I found the runs to be quite variable, so it would be ideal
to get a larger sample.

And the fact that the poor behaviour of the old-style malloc/free went
unnoticed for so long indicates that it won't be the end of the world if
we don't merge MADV_FREE right now.

--
SUSE Labs, Novell Inc.

2007-05-04 23:50:11

by Nick Piggin

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Rik van Riel wrote:
> Nick Piggin wrote:
>
>> Rik van Riel wrote:
>>
>>> With lazy freeing of anonymous pages through MADV_FREE, performance of
>>> the MySQL sysbench workload more than doubles on my quad-core system.
>>
>>
>> OK, I've run some tests on a 16 core Opteron system, both sysbench with
>> MySQL 5.33 (set up as described in the freebsd vs linux page), and with
>> ebizzy.
>>
>> What I found is that, on this system, MADV_FREE performance improvement
>> was in the noise when you look at it on top of the MADV_DONTNEED glibc
>> and down_read(mmap_sem) patch in sysbench.
>
>
> Interesting, very different results from my system.
>
> First, did you run with the properly TLB batched version of
> the MADV_FREE patch? And did you make sure that MADV_FREE
> takes the mmap_sem for reading? Without that, I did see
> a similar thing to what you saw...

Yes and yes (I initially forgot to add MADV_FREE to the down_read
case and saw horrible performance!)


> Secondly, I'll have to try some test runs on one of the larger
> systems in the lab.
>
> Maybe the results from my quad core Intel system are not
> typical; maybe the results from your 16 core Opteron are
> not typical. Either way, I want to find out :)

Yep. We might have something like that here, and I'll try with
some other architectures as well next week, if I can get glibc
built.

--
SUSE Labs, Novell Inc.

2007-05-05 00:11:06

by Ulrich Drepper

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Nick Piggin wrote:
> I literally have about 4 or 5 new page flags I'd like to add today :) I
> can't of course, because we have very few spare ones left.

I remember Rik saying that if need be he can (try to?) think of a method
to implement it without a page flag.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖



2007-05-06 22:43:51

by Rik van Riel

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Nick Piggin wrote:

> OK, sure. I think we need more numbers though.

Thinking about the issue some more, I think I know just the
number we might want to know.

It is pretty obvious that the kernel needs to do less work
with the MADV_FREE code present. However, it is possible
that userspace needs to do more work, by accessing pages
that are not in the CPU cache, or in another CPU's cache.

In the test cases where you see similar performance on the
workload with and without the MADV_FREE code, are you by any
chance seeing lower system time and higher user time?
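One way to collect the split asked about here is to read the process's user and system CPU time with getrusage(2) before and after the workload. The sketch below wraps a hypothetical workload callback; it is a generic measurement harness under that assumption, not the sysbench setup used in this thread.

```c
#define _DEFAULT_SOURCE
#include <sys/resource.h>
#include <sys/time.h>

/* CPU time split for one measured run, in seconds. */
struct cpu_split {
    double user_s;
    double sys_s;
};

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Run a (hypothetical) workload callback and return the user and system
 * time it consumed, measured as the difference of two RUSAGE_SELF samples.
 * Returns 0 on success, -1 if getrusage() fails.
 */
static int measure_cpu_split(void (*workload)(void *), void *arg,
                             struct cpu_split *out)
{
    struct rusage before, after;

    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;
    workload(arg);
    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;

    out->user_s = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    out->sys_s  = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return 0;
}
```

Comparing user_s and sys_s for runs with and without MADV_FREE would show directly whether time moves from system to user mode, as suspected above.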

I think that maybe for 2.6.22 we should just alias MADV_FREE
to run with the MADV_DONTNEED functionality, so that the glibc
people can make the change on their side while we figure out
what will be the best thing to do on the kernel side.

I'll send in a patch that does that once Linus has committed
your most recent flood of patches. What do you think?


2007-05-07 02:42:51

by Ulrich Drepper

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Rik van Riel wrote:
> I think that maybe for 2.6.22 we should just alias MADV_FREE
> to run with the MADV_DONTNEED functionality, so that the glibc
> people can make the change on their side while we figure out
> what will be the best thing to do on the kernel side.

No need for that. We can later extend glibc to use MADV_FREE and fall
back on MADV_DONTNEED.
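The fallback described here can be sketched as a small wrapper: try MADV_FREE first and, if the running kernel rejects it with EINVAL, retry with MADV_DONTNEED. This is a minimal sketch assuming the system headers may or may not define MADV_FREE (the #ifdef covers both cases) and that the memory is private anonymous memory such as a malloc arena; it is not glibc's actual code, and release_pages is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Tell the kernel that [addr, addr + len) is no longer needed.
 * Intended for private anonymous memory only: on file-backed mappings
 * MADV_DONTNEED has different effects.
 * Prefer lazy freeing (MADV_FREE) when headers and kernel support it;
 * otherwise fall back to MADV_DONTNEED, whose "always give back a
 * zeroed page" behaviour is also a valid MADV_FREE outcome.
 * addr must be page aligned. Returns 0 on success, -1 with errno set.
 */
static int release_pages(void *addr, size_t len)
{
#ifdef MADV_FREE
    if (madvise(addr, len, MADV_FREE) == 0)
        return 0;
    if (errno != EINVAL)
        return -1;          /* a real error, not "unsupported advice" */
#endif
    return madvise(addr, len, MADV_DONTNEED);
}
```

If a kernel simply aliases MADV_FREE to MADV_DONTNEED, the wrapper still works; callers just always see the zero-filled outcome.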

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

2007-05-07 04:46:13

by Rik van Riel

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Ulrich Drepper wrote:
> Rik van Riel wrote:
>> I think that maybe for 2.6.22 we should just alias MADV_FREE
>> to run with the MADV_DONTNEED functionality, so that the glibc
>> people can make the change on their side while we figure out
>> what will be the best thing to do on the kernel side.
>
> No need for that. We can later extend glibc to use MADV_FREE and fall
> back on MADV_DONTNEED.

It's trivial to merge the MADV_FREE #defines into the kernel
though, and aliasing MADV_FREE to MADV_DONTNEED for the time
being is a one-liner - just an extra constant into the big
switch statement in sys_madvise().

2007-05-07 04:53:36

by Ulrich Drepper

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Rik van Riel wrote:
> It's trivial to merge the MADV_FREE #defines into the kernel
> though, and aliasing MADV_FREE to MADV_DONTNEED for the time
> being is a one-liner - just an extra constant into the big
> switch statement in sys_madvise().

Until the semantics of the implementation are set in stone by having it
in the kernel, I won't start using it.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

2007-05-07 16:52:15

by Rik van Riel

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Ulrich Drepper wrote:
> Rik van Riel wrote:
>> It's trivial to merge the MADV_FREE #defines into the kernel
>> though, and aliasing MADV_FREE to MADV_DONTNEED for the time
>> being is a one-liner - just an extra constant into the big
>> switch statement in sys_madvise().
>
> Until the semantics of the implementation are set in stone by having it
> in the kernel, I won't start using it.

The current MADV_DONTNEED implementation conforms to the
semantics of MADV_FREE :)

With MADV_FREE you can get back either your old data, or
a freshly zeroed out new page. Always getting back the
second alternative is conformant :)
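The contract described here can be phrased as a check on a released page when it is next read: each page must come back either with its old contents or entirely zero-filled. The sketch below encodes that check (page_is_old_or_zero is a hypothetical name) and uses MADV_DONTNEED on a private anonymous mapping, which always produces the zero-filled case.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical checker for the MADV_FREE contract: after release and
 * re-access, a page must hold either its old contents (here: every
 * byte == old) or be completely zero-filled. Mixed contents fail.
 */
static bool page_is_old_or_zero(const unsigned char *page, size_t pagesz,
                                unsigned char old)
{
    bool all_old = true, all_zero = true;

    for (size_t i = 0; i < pagesz; i++) {
        if (page[i] != old)
            all_old = false;
        if (page[i] != 0)
            all_zero = false;
    }
    return all_old || all_zero;
}

/*
 * Release one page with MADV_DONTNEED. On a private anonymous mapping
 * the next access faults in a fresh zero-filled page, which is one of
 * the two outcomes MADV_FREE allows.
 */
static int release_with_dontneed(void *page, size_t pagesz)
{
    return madvise(page, pagesz, MADV_DONTNEED);
}
```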


2007-05-08 03:52:13

by Rik van Riel

Subject: [PATCH] stub MADV_FREE implementation

include/asm-alpha/mman.h | 1 +
include/asm-generic/mman.h | 1 +
include/asm-mips/mman.h | 1 +
include/asm-parisc/mman.h | 1 +
include/asm-sparc/mman.h | 2 --
include/asm-sparc64/mman.h | 2 --
include/asm-xtensa/mman.h | 1 +
mm/madvise.c | 2 ++
8 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/asm-alpha/mman.h b/include/asm-alpha/mman.h
index 90d7c35..d47b5a3 100644
--- a/include/asm-alpha/mman.h
+++ b/include/asm-alpha/mman.h
@@ -42,6 +42,7 @@ #define MADV_SEQUENTIAL 2 /* expect seq
#define MADV_WILLNEED 3 /* will need these pages */
#define MADV_SPACEAVAIL 5 /* ensure resources are available */
#define MADV_DONTNEED 6 /* don't need these pages */
+#define MADV_FREE 7 /* don't need the pages or the data */

/* common/generic parameters */
#define MADV_REMOVE 9 /* remove these pages & resources */
diff --git a/include/asm-generic/mman.h b/include/asm-generic/mman.h
index 5e3dde2..34a9ff1 100644
--- a/include/asm-generic/mman.h
+++ b/include/asm-generic/mman.h
@@ -29,6 +29,7 @@ #define MADV_RANDOM 1 /* expect random
#define MADV_SEQUENTIAL 2 /* expect sequential page references */
#define MADV_WILLNEED 3 /* will need these pages */
#define MADV_DONTNEED 4 /* don't need these pages */
+#define MADV_FREE 5 /* don't need the pages or the data */

/* common parameters: try to keep these consistent across architectures */
#define MADV_REMOVE 9 /* remove these pages & resources */
diff --git a/include/asm-mips/mman.h b/include/asm-mips/mman.h
index e4d6f1f..68067ff 100644
--- a/include/asm-mips/mman.h
+++ b/include/asm-mips/mman.h
@@ -65,6 +65,7 @@ #define MADV_RANDOM 1 /* expect random
#define MADV_SEQUENTIAL 2 /* expect sequential page references */
#define MADV_WILLNEED 3 /* will need these pages */
#define MADV_DONTNEED 4 /* don't need these pages */
+#define MADV_FREE 5 /* don't need the pages or the data */

/* common parameters: try to keep these consistent across architectures */
#define MADV_REMOVE 9 /* remove these pages & resources */
diff --git a/include/asm-parisc/mman.h b/include/asm-parisc/mman.h
index defe752..347fbca 100644
--- a/include/asm-parisc/mman.h
+++ b/include/asm-parisc/mman.h
@@ -38,6 +38,7 @@ #define MADV_DONTNEED 4
#define MADV_SPACEAVAIL 5 /* insure that resources are reserved */
#define MADV_VPS_PURGE 6 /* Purge pages from VM page cache */
#define MADV_VPS_INHERIT 7 /* Inherit parents page size */
+#define MADV_FREE 8 /* don't need the pages or the data */

/* common/generic parameters */
#define MADV_REMOVE 9 /* remove these pages & resources */
diff --git a/include/asm-sparc/mman.h b/include/asm-sparc/mman.h
index b7dc40b..5ec7106 100644
--- a/include/asm-sparc/mman.h
+++ b/include/asm-sparc/mman.h
@@ -33,8 +33,6 @@ #define MC_UNLOCK 3 /* Unlock pag
#define MC_LOCKAS 5 /* Lock an entire address space of the calling process */
#define MC_UNLOCKAS 6 /* Unlock entire address space of calling process */

-#define MADV_FREE 0x5 /* (Solaris) contents can be freed */
-
#ifdef __KERNEL__
#ifndef __ASSEMBLY__
#define arch_mmap_check sparc_mmap_check
diff --git a/include/asm-sparc64/mman.h b/include/asm-sparc64/mman.h
index 8cc1860..03b05d5 100644
--- a/include/asm-sparc64/mman.h
+++ b/include/asm-sparc64/mman.h
@@ -33,8 +33,6 @@ #define MC_UNLOCK 3 /* Unlock pag
#define MC_LOCKAS 5 /* Lock an entire address space of the calling process */
#define MC_UNLOCKAS 6 /* Unlock entire address space of calling process */

-#define MADV_FREE 0x5 /* (Solaris) contents can be freed */
-
#ifdef __KERNEL__
#ifndef __ASSEMBLY__
#define arch_mmap_check sparc64_mmap_check
diff --git a/include/asm-xtensa/mman.h b/include/asm-xtensa/mman.h
index 9b92620..1345703 100644
--- a/include/asm-xtensa/mman.h
+++ b/include/asm-xtensa/mman.h
@@ -72,6 +72,7 @@ #define MADV_RANDOM 1 /* expect random
#define MADV_SEQUENTIAL 2 /* expect sequential page references */
#define MADV_WILLNEED 3 /* will need these pages */
#define MADV_DONTNEED 4 /* don't need these pages */
+#define MADV_FREE 5 /* don't need the pages or the data */

/* common parameters: try to keep these consistent across architectures */
#define MADV_REMOVE 9 /* remove these pages & resources */
diff --git a/mm/madvise.c b/mm/madvise.c
index e75096b..ad067f2 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -22,6 +22,7 @@ static int madvise_need_mmap_write(int b
case MADV_REMOVE:
case MADV_WILLNEED:
case MADV_DONTNEED:
+ case MADV_FREE:
return 0;
default:
/* be safe, default to 1. list exceptions explicitly */
@@ -234,6 +235,7 @@ madvise_vma(struct vm_area_struct *vma,
break;

case MADV_DONTNEED:
+ case MADV_FREE:
error = madvise_dontneed(vma, prev, start, end);
break;



2007-05-08 06:12:13

by Nick Piggin

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Rik van Riel wrote:
> Nick Piggin wrote:
>
>> OK, sure. I think we need more numbers though.
>
>
> Thinking about the issue some more, I think I know just the
> number we might want to know.
>
> It is pretty obvious that the kernel needs to do less work
> with the MADV_FREE code present. However, it is possible
> that userspace needs to do more work, by accessing pages
> that are not in the CPU cache, or in another CPU's cache.
>
> In the test cases where you see similar performance on the
> workload with and without the MADV_FREE code, are you by any
> chance seeing lower system time and higher user time?

I didn't actually check system and user times for the mysql
benchmark, but that's exactly what I had in mind when I
mentioned the poor cache behaviour this patch could cause. I
definitely did see user times go up in benchmarks where I
measured.

We have percpu and cache affine page allocators, so when
userspace just frees a page, it is likely to be cache hot, so
we want to free it up so it can be reused by this CPU ASAP.
Likewise, when we newly allocate a page, we want it to be one
that is cache hot on this CPU.
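The cache-affinity argument rests on the allocator handing out the most recently freed page first. A toy model of that policy is a per-CPU LIFO stack of free pages: the page freed last, and so most likely still in this CPU's cache, is the first one reused. The structure, names and capacity below are hypothetical and only illustrate the ordering; they are not the kernel's per-cpu page lists.

```c
#include <stdbool.h>
#include <stddef.h>

#define PCP_CAPACITY 8      /* hypothetical per-CPU list size */

/* Toy per-CPU free list: a LIFO stack of page frame numbers. */
struct pcp_list {
    unsigned long pfn[PCP_CAPACITY];
    size_t count;
};

/* Freeing pushes onto the top: this page is the "hottest". */
static bool pcp_free(struct pcp_list *l, unsigned long pfn)
{
    if (l->count == PCP_CAPACITY)
        return false;       /* a real allocator would spill to a global pool */
    l->pfn[l->count++] = pfn;
    return true;
}

/* Allocation pops from the top, returning the most recently freed page. */
static bool pcp_alloc(struct pcp_list *l, unsigned long *pfn)
{
    if (l->count == 0)
        return false;       /* a real allocator would refill from a global pool */
    *pfn = l->pfn[--l->count];
    return true;
}
```

In terms of this model, the concern is that a lazily freed page stays with the process instead of being pushed onto such a list, so the kernel's next allocation on that CPU may get a colder page.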


> I think that maybe for 2.6.22 we should just alias MADV_FREE
> to run with the MADV_DONTNEED functionality, so that the glibc
> people can make the change on their side while we figure out
> what will be the best thing to do on the kernel side.
>
> I'll send in a patch that does that once Linus has committed
> your most recent flood of patches. What do you think?

I'll let you and Ulrich decide on that. Keep in mind that older
kernels (without the mmap_sem patch for MADV_DONTNEED) still
seem to get a pretty decent improvement from using MADV_DONTNEED,
so it is possible glibc will want to start using that anyway.

--
SUSE Labs, Novell Inc.

2007-05-08 15:00:37

by Rik van Riel

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Nick Piggin wrote:

> We have percpu and cache affine page allocators, so when
> userspace just frees a page, it is likely to be cache hot, so
> we want to free it up so it can be reused by this CPU ASAP.
> Likewise, when we newly allocate a page, we want it to be one
> that is cache hot on this CPU.

Actually, isn't the clear page function capable of doing
some magic, when it writes all zeroes into the page, that
causes the zeroes to just live in CPU cache without the old
data ever being loaded from RAM?

That would sure be faster than touching RAM. Not sure if
we use/trigger that kind of magic, though :)


2007-05-08 18:35:51

by Jakub Jelinek

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

On Tue, May 08, 2007 at 04:12:00PM +1000, Nick Piggin wrote:
> I didn't actually check system and user times for the mysql
> benchmark, but that's exactly what I had in mind when I
> mentioned the poor cache behaviour this patch could cause. I
> definitely did see user times go up in benchmarks where I
> measured.
>
> We have percpu and cache affine page allocators, so when
> userspace just frees a page, it is likely to be cache hot, so
> we want to free it up so it can be reused by this CPU ASAP.
> Likewise, when we newly allocate a page, we want it to be one
> that is cache hot on this CPU.

malloc has per-thread arenas, so when using MADV_FREE the pages
should be local to the thread as well (and, unless the thread has
migrated to a different CPU, local to the CPU too), and in the case of
sysbench they should be cache hot as well (they are reused RSN). With
MADV_DONTNEED the kernel has to clear the pages, which is not necessary
with MADV_FREE.

Jakub

2007-05-08 23:06:20

by Andrew Morton

Subject: Re: [PATCH] stub MADV_FREE implementation

On Mon, 07 May 2007 23:51:47 -0400
Rik van Riel <[email protected]> wrote:

> Until we have better performance numbers on the lazy reclaim path,
> we can just alias MADV_FREE to MADV_DONTNEED with this trivial
> patch.
>
> This way glibc can go ahead with the optimization on their side
> and we can figure out the kernel side later.
>
> Signed-off-by: Rik van Riel <[email protected]>

Could someone please explain what is going on here?


And has Ulrich indicated that glibc would indeed go out ahead of
the kernel in this fashion?

2007-05-08 23:24:19

by Nick Piggin

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Rik van Riel wrote:
> Nick Piggin wrote:
>
>> We have percpu and cache affine page allocators, so when
>> userspace just frees a page, it is likely to be cache hot, so
>> we want to free it up so it can be reused by this CPU ASAP.
>> Likewise, when we newly allocate a page, we want it to be one
>> that is cache hot on this CPU.
>
>
> Actually, isn't the clear page function capable of doing
> some magic, when it writes all zeroes into the page, that
> causes the zeroes to just live in CPU cache without the old
> data ever being loaded from RAM?
>
> That would sure be faster than touching RAM. Not sure if
> we use/trigger that kind of magic, though :)
>

powerpc has and uses an instruction to zero a full cacheline, yes.

Not sure about x86-64 CPUs... I don't think they can do it.

--
SUSE Labs, Novell Inc.

2007-05-08 23:43:52

by Nick Piggin

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Jakub Jelinek wrote:
> On Tue, May 08, 2007 at 04:12:00PM +1000, Nick Piggin wrote:
>
>>I didn't actually check system and user times for the mysql
>>benchmark, but that's exactly what I had in mind when I
>>mentioned the poor cache behaviour this patch could cause. I
>>definitely did see user times go up in benchmarks where I
>>measured.
>>
>>We have percpu and cache affine page allocators, so when
>>userspace just frees a page, it is likely to be cache hot, so
>>we want to free it up so it can be reused by this CPU ASAP.
>>Likewise, when we newly allocate a page, we want it to be one
>>that is cache hot on this CPU.
>
>
> malloc has per-thread arenas, so when using MADV_FREE the pages
> should be local to the thread as well (and, unless the thread has
> migrated to a different CPU, local to the CPU too), and in the case of
> sysbench they should be cache hot as well (they are reused RSN).

Right, but the kernel also wants to use cache hot pages for other
things, and it also frees back its own cache hot pages into the
allocator.

The fact that sysbench is a good candidate for this but does not
show any improvements is telling... if the workload does not reuse
the pages RSN, or if it is reclaiming them, we could actually see
regressions.


> With MADV_DONTNEED the kernel has to
> clear the pages, which is not necessary with MADV_FREE.

With MADV_FREE, you don't need to zero the memory, but the page
is uninitialised. So you need to initialise it *somehow* (i.e. use
either a zeroing alloc, or initialise it with application specific
data). At that point, you have to touch the cachelines anyway, so
the extra zeroing is going to cost very little (and you can see
that single threaded performance isn't improved).
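The initialisation point can be made concrete from the allocator's side. A calloc-style allocator reusing a released chunk may skip its own clearing pass only when the memory is known to be zero, as it is after MADV_DONTNEED on private anonymous memory; after MADV_FREE the contents are unspecified (old data or zeroes), so it must clear them itself. The enum and function below are hypothetical bookkeeping for illustration, not glibc's implementation.

```c
#include <stddef.h>
#include <string.h>

/* How a chunk was last handed back to the kernel (hypothetical bookkeeping). */
enum release_kind {
    RELEASED_DONTNEED,  /* private anonymous memory: refaults as zero pages */
    RELEASED_FREE,      /* MADV_FREE: old data or zeroes, either is possible */
    NOT_RELEASED        /* still holds whatever the application wrote */
};

/*
 * Prepare a reused chunk for a zeroing allocation and return it.
 * Only the MADV_DONTNEED case lets the allocator skip the memset;
 * in every other case the chunk is cleared here.
 */
static void *reuse_zeroed(void *chunk, size_t len, enum release_kind kind)
{
    if (kind != RELEASED_DONTNEED)
        memset(chunk, 0, len);
    return chunk;
}
```

Per the argument above, the memset saved in the MADV_DONTNEED case is partly offset by the kernel having already zeroed the page, while in the MADV_FREE case the application pays for clearing or initialising the cachelines itself.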

--
SUSE Labs, Novell Inc.

2007-05-09 16:39:23

by Hugh Dickins

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

On Fri, 4 May 2007, Ulrich Drepper wrote:
>
> I don't want to judge the numbers since I cannot, but I want to make an
> observation: even if in the SMP case MADV_FREE turns out not to be a
> bigger boost, there is still the UP case to keep in mind, where Rik
> measured a significant speed-up. As long as the SMP case isn't hurt,
> this is reason enough to use the patch. With more and more cores on one
> processor, SMP systems are pushed ever more to the high end. You'll
> find that many installations which use SMP today will be happy enough with
> many-core UP machines.

Just remembered this mail from a few days ago, and how puzzled I'd been
by your last sentence or two: I seem to be reading it in the wrong way,
and don't understand why users of SMP kernels will be moving to UP?

UP in the sense of one processor but many cores? But that still needs
an SMP kernel to use all those cores. Or you're thinking of growing
virtualization? Would you please explain further?

Thanks,
Hugh

2007-05-09 17:15:55

by Ulrich Drepper

Subject: Re: [PATCH] stub MADV_FREE implementation

On 5/8/07, Andrew Morton <[email protected]> wrote:
> And has Ulrich indicated that glibc would indeed go out ahead of
> the kernel in this fashion?

Rik's concern is getting a glibc version that allows him to test the
improvements. That's really not a big problem. We already have a
patch for this and can provide appropriate RPMs easily.

I don't want to set a precedent for adding glibc support for phantom
features. So, I would not add support to the official glibc anyway
until there is a fixed implementation, which then also means a fixed
ABI. So, Andrew, applying the patch won't do any good.

2007-05-29 16:59:30

by Rik van Riel

Subject: Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

Nick Piggin wrote:
> Rik van Riel wrote:
>> With lazy freeing of anonymous pages through MADV_FREE, performance of
>> the MySQL sysbench workload more than doubles on my quad-core system.
>
> OK, I've run some tests on a 16 core Opteron system, both sysbench with
> MySQL 5.33 (set up as described in the freebsd vs linux page), and with
> ebizzy.
>
> What I found is that, on this system, MADV_FREE performance improvement
> was in the noise when you look at it on top of the MADV_DONTNEED glibc
> and down_read(mmap_sem) patch in sysbench.

It turns out that setting the pte accessed bit in hardware
can apparently take a few thousand CPU cycles - 3000 cycles
is the number I've heard for one CPU family.

This is a similar number of cycles as is needed to zero out
a page. Giving a cache hot page to userspace could cancel
out the rest of the cost of the page fault handling.
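As a rough cross-check of this comparison, assuming 4 KiB pages (the page size is not stated in the thread), clearing a page in about 3000 cycles corresponds to a throughput of roughly

```latex
\frac{4096\ \text{bytes}}{3000\ \text{cycles}} \approx 1.37\ \text{bytes/cycle}
```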

Let's stick with the simpler MADV_DONTNEED code for now and
save the page flag for something else...
