2004-10-14 16:50:52

by Martin K. Petersen

[permalink] [raw]
Subject: [PATCH] General purpose zeroed page slab


A while back Bill Irwin converted the page table code on ppc64 to use
a zeroed page slab. I recently did the same on ia64 and got a
significant performance improvement in terms of fault time (4 usec ->
700 nsec).

This cache needs to be initialized fairly early on and so far we've
called it from pgtable_cache_init() on both archs. However, Tony Luck
thought it might be useful to have a general purpose slab cache with
zeroed pages. And other architectures might decide to use it for
their page tables too.

Consequently here's a patch that puts this functionality in slab.c.

Signed-off-by: Martin K. Petersen <[email protected]>

--
Martin K. Petersen Wild Open Source, Inc.
[email protected] http://www.wildopensource.com/

diff -urN -X /usr/people/mkp/bin/dontdiff linux-pristine/include/linux/slab.h zero-slab/include/linux/slab.h
--- linux-pristine/include/linux/slab.h 2004-10-11 14:57:20.000000000 -0700
+++ zero-slab/include/linux/slab.h 2004-10-13 17:49:29.000000000 -0700
@@ -115,6 +115,7 @@
extern kmem_cache_t *signal_cachep;
extern kmem_cache_t *sighand_cachep;
extern kmem_cache_t *bio_cachep;
+extern kmem_cache_t *zero_page_cachep;

extern atomic_t slab_reclaim_pages;

diff -urN -X /usr/people/mkp/bin/dontdiff linux-pristine/mm/slab.c zero-slab/mm/slab.c
--- linux-pristine/mm/slab.c 2004-10-11 14:57:20.000000000 -0700
+++ zero-slab/mm/slab.c 2004-10-13 17:49:57.000000000 -0700
@@ -716,6 +716,13 @@

static struct notifier_block cpucache_notifier = { &cpuup_callback, NULL, 0 };

+kmem_cache_t *zero_page_cachep;
+
+static void zero_page_ctor(void *pte, kmem_cache_t *cache, unsigned long flags)
+{
+ memset(pte, 0, PAGE_SIZE);
+}
+
/* Initialisation.
* Called after the gfp() functions have been enabled, and before smp_init().
*/
@@ -837,6 +844,16 @@
/* The reap timers are started later, with a module init call:
* That part of the kernel is not yet operational.
*/
+
+ /* General purpose cache of zeroed pages */
+ zero_page_cachep = kmem_cache_create("zero_page_cache",
+ PAGE_SIZE, 0,
+ SLAB_HWCACHE_ALIGN |
+ SLAB_MUST_HWCACHE_ALIGN,
+ zero_page_ctor,
+ NULL);
+ if (!zero_page_cachep)
+ panic("could not create zero_page_cache!\n");
}

static int __init cpucache_init(void)


2004-10-14 18:29:30

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

On Thu, Oct 14, 2004 at 01:30:21PM -0400, Brian Gerst wrote:
> This doesn't work as you expect it does. The constructor is only called
> when a new slab is created, for each new object on the slab. It is
> _not_ run again when an object is freed. So if a page is freed then
> immediately reallocated it will contain garbage.

The user is responsible for zeroing the page before handing it back to
the slab allocator.

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

2004-10-14 18:55:05

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

On Thu, Oct 14, 2004 at 06:36:37PM +0100, Matthew Wilcox wrote:
> On Thu, Oct 14, 2004 at 01:30:21PM -0400, Brian Gerst wrote:
> > This doesn't work as you expect it does. The constructor is only called
> > when a new slab is created, for each new object on the slab. It is
> > _not_ run again when an object is freed. So if a page is freed then
> > immediately reallocated it will contain garbage.
>
> The user is responsible for zeroing the page before handing it back to
> the slab allocator.

That sounds like an accident waiting to happen.
How about a CONFIG_DEBUG option to check its zeroed on free ?

Dave

2004-10-14 19:21:51

by Denis Vlasenko

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

On Thursday 14 October 2004 21:49, Dave Jones wrote:
> On Thu, Oct 14, 2004 at 06:36:37PM +0100, Matthew Wilcox wrote:
> > On Thu, Oct 14, 2004 at 01:30:21PM -0400, Brian Gerst wrote:
> > > This doesn't work as you expect it does. The constructor is only called
> > > when a new slab is created, for each new object on the slab. It is
> > > _not_ run again when an object is freed. So if a page is freed then
> > > immediately reallocated it will contain garbage.
> >
> > The user is responsible for zeroing the page before handing it back to
> > the slab allocator.

BTW, zeroing with non-temporal stores may be a huge win here.
It is 300% faster on Athlon.

> That sounds like an accident waiting to happen.
> How about a CONFIG_DEBUG option to check its zeroed on free ?
--
vda

2004-10-14 19:57:23

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

On Thu, Oct 14, 2004 at 10:08:49PM +0300, Denis Vlasenko wrote:
> BTW, zeroing with non-temporal stores may be a huge win here.
> It is 300% faster on Athlon.

That assumes the page isn't going to be reused quickly. The point of
slab is that the page does get reused quickly. So you *want* the data
to be hot in cache.

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

2004-10-14 20:23:15

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

On Thu, Oct 14, 2004 at 12:50:43PM -0400, Martin K. Petersen wrote:
>
> A while back Bill Irwin converted the page table code on ppc64 to use
> a zeroed page slab. I recently did the same on ia64 and got a
> significant performance improvement in terms of fault time (4 usec ->
> 700 nsec).
>
> This cache needs to be initialized fairly early on and so far we've
> called it from pgtable_cache_init() on both archs. However, Tony Luck
> thought it might be useful to have a general purpose slab cache with
> zeroed pages. And other architectures might decide to use it for
> their page tables too.
>
> Consequently here's a patch that puts this functionality in slab.c.
>
> Signed-off-by: Martin K. Petersen <[email protected]>
>
> --
> Martin K. Petersen Wild Open Source, Inc.
> [email protected] http://www.wildopensource.com/
>
> diff -urN -X /usr/people/mkp/bin/dontdiff linux-pristine/include/linux/slab.h zero-slab/include/linux/slab.h
> --- linux-pristine/include/linux/slab.h 2004-10-11 14:57:20.000000000 -0700
> +++ zero-slab/include/linux/slab.h 2004-10-13 17:49:29.000000000 -0700
> @@ -115,6 +115,7 @@
> extern kmem_cache_t *signal_cachep;
> extern kmem_cache_t *sighand_cachep;
> extern kmem_cache_t *bio_cachep;
> +extern kmem_cache_t *zero_page_cachep;
>
> extern atomic_t slab_reclaim_pages;
>
> diff -urN -X /usr/people/mkp/bin/dontdiff linux-pristine/mm/slab.c zero-slab/mm/slab.c
> --- linux-pristine/mm/slab.c 2004-10-11 14:57:20.000000000 -0700
> +++ zero-slab/mm/slab.c 2004-10-13 17:49:57.000000000 -0700
> @@ -716,6 +716,13 @@
>
> static struct notifier_block cpucache_notifier = { &cpuup_callback, NULL, 0 };
>
> +kmem_cache_t *zero_page_cachep;
> +
> +static void zero_page_ctor(void *pte, kmem_cache_t *cache, unsigned long flags)
> +{
> + memset(pte, 0, PAGE_SIZE);
> +}

The means every user has to memset it to zero before free.
Add a comment for that at least.

Also that's pretty dumb. How about keeping track how much of the
page got non zeroed (e.g. by using a few free words in struct page
for a coarse grained dirty bitmap)

Then you could memset on free only the parts that got actually
changed, and never waste cache lines for anything else.

-Andi

2004-10-14 23:44:36

by Adam Heath

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

On Thu, 14 Oct 2004, Andi Kleen wrote:

> Also that's pretty dumb. How about keeping track how much of the
> page got non zeroed (e.g. by using a few free words in struct page
> for a coarse grained dirty bitmap)
>
> Then you could memset on free only the parts that got actually
> changed, and never waste cache lines for anything else.

That will fail when a struct is placed in the page, and only the beginning and
end of the struct was changed.

2004-10-15 00:18:39

by Brian Gerst

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

Martin K. Petersen wrote:
> A while back Bill Irwin converted the page table code on ppc64 to use
> a zeroed page slab. I recently did the same on ia64 and got a
> significant performance improvement in terms of fault time (4 usec ->
> 700 nsec).
>
> This cache needs to be initialized fairly early on and so far we've
> called it from pgtable_cache_init() on both archs. However, Tony Luck
> thought it might be useful to have a general purpose slab cache with
> zeroed pages. And other architectures might decide to use it for
> their page tables too.
>
> Consequently here's a patch that puts this functionality in slab.c.
>
> Signed-off-by: Martin K. Petersen <[email protected]>
>

This doesn't work as you expect it does. The constructor is only called
when a new slab is created, for each new object on the slab. It is
_not_ run again when an object is freed. So if a page is freed then
immediately reallocated it will contain garbage.

--
Brian Gerst

2004-10-15 01:19:59

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

On Thu, Oct 14, 2004 at 06:36:47PM -0500, Adam Heath wrote:
> On Thu, 14 Oct 2004, Andi Kleen wrote:
>
> > Also that's pretty dumb. How about keeping track how much of the
> > page got non zeroed (e.g. by using a few free words in struct page
> > for a coarse grained dirty bitmap)
> >
> > Then you could memset on free only the parts that got actually
> > changed, and never waste cache lines for anything else.
>
> That will fail when a struct is placed in the page, and only the beginning and
> end of the struct was changed.

Hmm? I think you're misunderstanding me. The dirty bitmap would cover
all areas that are potentially not zero. If someone changes a byte
without setting the bitmap they're buggy.


-Andi
>

2004-10-18 18:06:53

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

>>>>> "Andi" == Andi Kleen <[email protected]> writes:

[Sorry about dropping off of the planet for a couple of days. Had to
deal with some water damage on the house :/]

Andi> The means every user has to memset it to zero before free. Add
Andi> a comment for that at least.

Andi> Also that's pretty dumb. How about keeping track how much of the
Andi> page got non zeroed (e.g. by using a few free words in struct
Andi> page for a coarse grained dirty bitmap)

Andi> Then you could memset on free only the parts that got actually
Andi> changed, and never waste cache lines for anything else.

Ayup. I'll ponder this a bit. For now I think I'm going to leave the
page table cache stuff as is and make the general purpose slab a
separate project. It'll be easy to switch the page tables over later.

--
Martin K. Petersen Wild Open Source, Inc.
[email protected] http://www.wildopensource.com/

2004-10-18 18:45:54

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

On Mon, Oct 18, 2004 at 02:06:45PM -0400, Martin K. Petersen wrote:
> Andi> The means every user has to memset it to zero before free. Add
> Andi> a comment for that at least.
>
> Andi> Also that's pretty dumb. How about keeping track how much of the
> Andi> page got non zeroed (e.g. by using a few free words in struct
> Andi> page for a coarse grained dirty bitmap)
>
> Andi> Then you could memset on free only the parts that got actually
> Andi> changed, and never waste cache lines for anything else.
>
> Ayup. I'll ponder this a bit. For now I think I'm going to leave the
> page table cache stuff as is and make the general purpose slab a
> separate project. It'll be easy to switch the page tables over later.

It's probably worth doing this with a static cachep in slab.c and only
exposing a get_zeroed_page() / free_zeroed_page() interface, with the
latter doing the memset to 0. I disagree with Andi over the dumbness
of zeroing the whole page. That makes it cache-hot, which is what you
want from a page you allocate from slab.

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

2004-10-18 19:12:18

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

On Mon, 18 Oct 2004 19:42:10 +0100
Matthew Wilcox <[email protected]> wrote:

> I disagree with Andi over the dumbness
> of zeroing the whole page. That makes it cache-hot, which is what you
> want from a page you allocate from slab.

Not for a page table, which tends to be not fully used.
It would be true for a user page, but that doesn't use this mechanism anyways.

-Andi

2004-10-18 19:18:16

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

On Mon, Oct 18, 2004 at 12:03:04PM -0700, Luck, Tony wrote:
> >It's probably worth doing this with a static cachep in slab.c and only
> >exposing a get_zeroed_page() / free_zeroed_page() interface, with the
> >latter doing the memset to 0. I disagree with Andi over the dumbness
> >of zeroing the whole page. That makes it cache-hot, which is what you
> >want from a page you allocate from slab.
>
> We started this discussion with the plan of using this interface to
> allocate/free page tables at all levels in the page table hierarchy
> (rather than maintain a special purpose "quicklist" allocator for each
> level). This is a somewhat specialized usage in that we know that we
> have a completely zeroed page when we free ... so we really don't
> want the overhead of zeroing it again.

Ah, I'm not a VM weenie, so I didn't know this was guaranteed ;-)

> There is also somewhat limited
> benefit to the cache hotness argument here as most page tables (especially
> higher-order ones) are used very sparsely.
>
> That said, the idea to expose this slab only through a specific API
> should calm fears about accidental mis-use (with people freeing a page
> that isn't all zeroes).

So alloc_zeroed_page(), free_zeroed_page(), zero_and_free_page()
interfaces with a CONFIG option to check that the page passed to
free_zeroed_page() is already zero?

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

2004-10-18 19:22:47

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH] General purpose zeroed page slab

>It's probably worth doing this with a static cachep in slab.c and only
>exposing a get_zeroed_page() / free_zeroed_page() interface, with the
>latter doing the memset to 0. I disagree with Andi over the dumbness
>of zeroing the whole page. That makes it cache-hot, which is what you
>want from a page you allocate from slab.

We started this discussion with the plan of using this interface to
allocate/free page tables at all levels in the page table hierarchy
(rather than maintain a special purpose "quicklist" allocator for each
level). This is a somewhat specialized usage in that we know that we
have a completely zeroed page when we free ... so we really don't
want the overhead of zeroing it again. There is also somewhat limited
benefit to the cache hotness argument here as most page tables (especially
higher-order ones) are used very sparsely.

That said, the idea to expose this slab only through a specific API
should calm fears about accidental mis-use (with people freeing a page
that isn't all zeroes).

-Tony

2004-10-18 20:30:05

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

>>>>> "Matthew" == Matthew Wilcox <[email protected]> writes:

Matthew> It's probably worth doing this with a static cachep in slab.c
Matthew> and only exposing a get_zeroed_page() / free_zeroed_page()
Matthew> interface, with the latter doing the memset to 0.

*nod*


Matthew> I disagree with Andi over the dumbness of zeroing the whole
Matthew> page. That makes it cache-hot, which is what you want from a
Matthew> page you allocate from slab.

Yeah, plus the housekeeping may be more of a hassle than it's worth.

We'll see...

--
Martin K. Petersen Wild Open Source, Inc.
[email protected] http://www.wildopensource.com/

2004-10-18 21:16:54

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] General purpose zeroed page slab

> It's probably worth doing this with a static cachep in slab.c and only
> exposing a get_zeroed_page() / free_zeroed_page() interface, with the
> latter doing the memset to 0.

Putting a memset in there would be dumb because the mm cleanup already
zeroes the page tables.

My dirty bitmap proposal would make that faster however, same as
copy_page_range et.al.

> I disagree with Andi over the dumbness
> of zeroing the whole page. That makes it cache-hot, which is what you
> want from a page you allocate from slab.

It's already cache hot from the page table free and you only want one cache
line in it cache hot, not the whole page.

-Andi