2008-03-11 06:19:08

by Balbir Singh

Subject: [PATCH] Move memory controller allocations to their own slabs (v2)



Move the memory controller data structure page_cgroup to its own slab cache.
This saves space on the system, since allocations are no longer rounded up to
the next power-of-two size class, and it should provide performance benefits.
Users who disable the memory controller can also double-check that the memory
controller is not allocating page_cgroups.

Signed-off-by: Balbir Singh <[email protected]>
---

mm/memcontrol.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff -puN mm/memcontrol.c~memory-controller-move-to-own-slab mm/memcontrol.c
--- linux-2.6.25-rc4/mm/memcontrol.c~memory-controller-move-to-own-slab 2008-03-10 23:22:34.000000000 +0530
+++ linux-2.6.25-rc4-balbir/mm/memcontrol.c 2008-03-11 10:29:00.000000000 +0530
@@ -26,6 +26,7 @@
#include <linux/backing-dev.h>
#include <linux/bit_spinlock.h>
#include <linux/rcupdate.h>
+#include <linux/slab.h>
#include <linux/swap.h>
#include <linux/spinlock.h>
#include <linux/fs.h>
@@ -35,6 +36,7 @@

struct cgroup_subsys mem_cgroup_subsys;
static const int MEM_CGROUP_RECLAIM_RETRIES = 5;
+static struct kmem_cache *page_cgroup_cache;

/*
* Statistics for memory cgroup.
@@ -560,7 +562,7 @@ retry:
}
unlock_page_cgroup(page);

- pc = kzalloc(sizeof(struct page_cgroup), gfp_mask);
+ pc = kmem_cache_zalloc(page_cgroup_cache, gfp_mask);
if (pc == NULL)
goto err;

@@ -622,7 +624,7 @@ retry:
*/
res_counter_uncharge(&mem->res, PAGE_SIZE);
css_put(&mem->css);
- kfree(pc);
+ kmem_cache_free(page_cgroup_cache, pc);
goto retry;
}
page_assign_page_cgroup(page, pc);
@@ -637,7 +639,7 @@ done:
return 0;
out:
css_put(&mem->css);
- kfree(pc);
+ kmem_cache_free(page_cgroup_cache, pc);
err:
return -ENOMEM;
}
@@ -695,7 +697,7 @@ void mem_cgroup_uncharge_page(struct pag
res_counter_uncharge(&mem->res, PAGE_SIZE);
css_put(&mem->css);

- kfree(pc);
+ kmem_cache_free(page_cgroup_cache, pc);
return;
}

@@ -1020,6 +1022,8 @@ mem_cgroup_create(struct cgroup_subsys *
if (unlikely((cont->parent) == NULL)) {
mem = &init_mem_cgroup;
init_mm.mem_cgroup = mem;
+ page_cgroup_cache = KMEM_CACHE(page_cgroup,
+ SLAB_HWCACHE_ALIGN | SLAB_PANIC);
} else
mem = kzalloc(sizeof(struct mem_cgroup), GFP_KERNEL);

diff -puN include/linux/memcontrol.h~memory-controller-move-to-own-slab include/linux/memcontrol.h
_

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


2008-03-11 08:16:11

by Balbir Singh

Subject: Re: [PATCH] Move memory controller allocations to their own slabs (v2)

Pavel Emelyanov wrote:
> Balbir Singh wrote:
>> Move the memory controller data structure page_cgroup to its own slab cache.
>> This saves space on the system, since allocations are no longer rounded up to
>> the next power-of-two size class, and it should provide performance benefits.
>> Users who disable the memory controller can also double-check that the memory
>> controller is not allocating page_cgroups.
>
> Can you, please, check how many objects-per-page we have with and
> without this patch for SLAB and SLUB?
>
> Thanks.

I can check objects-per-page with this patch for SLUB and SLAB, but I am not
sure what to check without this patch. The machine is temporarily busy;
I'll check it once I get it back.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-03-11 08:16:29

by Pavel Emelyanov

Subject: Re: [PATCH] Move memory controller allocations to their own slabs (v2)

Balbir Singh wrote:
> Move the memory controller data structure page_cgroup to its own slab cache.
> This saves space on the system, since allocations are no longer rounded up to
> the next power-of-two size class, and it should provide performance benefits.
> Users who disable the memory controller can also double-check that the memory
> controller is not allocating page_cgroups.

Can you, please, check how many objects-per-page we have with and
without this patch for SLAB and SLUB?

Thanks.

> Signed-off-by: Balbir Singh <[email protected]>
> ---
>
> mm/memcontrol.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff -puN mm/memcontrol.c~memory-controller-move-to-own-slab mm/memcontrol.c
> --- linux-2.6.25-rc4/mm/memcontrol.c~memory-controller-move-to-own-slab 2008-03-10 23:22:34.000000000 +0530
> +++ linux-2.6.25-rc4-balbir/mm/memcontrol.c 2008-03-11 10:29:00.000000000 +0530
> @@ -26,6 +26,7 @@
> #include <linux/backing-dev.h>
> #include <linux/bit_spinlock.h>
> #include <linux/rcupdate.h>
> +#include <linux/slab.h>
> #include <linux/swap.h>
> #include <linux/spinlock.h>
> #include <linux/fs.h>
> @@ -35,6 +36,7 @@
>
> struct cgroup_subsys mem_cgroup_subsys;
> static const int MEM_CGROUP_RECLAIM_RETRIES = 5;
> +static struct kmem_cache *page_cgroup_cache;
>
> /*
> * Statistics for memory cgroup.
> @@ -560,7 +562,7 @@ retry:
> }
> unlock_page_cgroup(page);
>
> - pc = kzalloc(sizeof(struct page_cgroup), gfp_mask);
> + pc = kmem_cache_zalloc(page_cgroup_cache, gfp_mask);
> if (pc == NULL)
> goto err;
>
> @@ -622,7 +624,7 @@ retry:
> */
> res_counter_uncharge(&mem->res, PAGE_SIZE);
> css_put(&mem->css);
> - kfree(pc);
> + kmem_cache_free(page_cgroup_cache, pc);
> goto retry;
> }
> page_assign_page_cgroup(page, pc);
> @@ -637,7 +639,7 @@ done:
> return 0;
> out:
> css_put(&mem->css);
> - kfree(pc);
> + kmem_cache_free(page_cgroup_cache, pc);
> err:
> return -ENOMEM;
> }
> @@ -695,7 +697,7 @@ void mem_cgroup_uncharge_page(struct pag
> res_counter_uncharge(&mem->res, PAGE_SIZE);
> css_put(&mem->css);
>
> - kfree(pc);
> + kmem_cache_free(page_cgroup_cache, pc);
> return;
> }
>
> @@ -1020,6 +1022,8 @@ mem_cgroup_create(struct cgroup_subsys *
> if (unlikely((cont->parent) == NULL)) {
> mem = &init_mem_cgroup;
> init_mm.mem_cgroup = mem;
> + page_cgroup_cache = KMEM_CACHE(page_cgroup,
> + SLAB_HWCACHE_ALIGN | SLAB_PANIC);
> } else
> mem = kzalloc(sizeof(struct mem_cgroup), GFP_KERNEL);
>
> diff -puN include/linux/memcontrol.h~memory-controller-move-to-own-slab include/linux/memcontrol.h
> _
>

2008-03-11 08:44:20

by Pavel Emelyanov

Subject: Re: [PATCH] Move memory controller allocations to their own slabs (v2)

Balbir Singh wrote:
> Pavel Emelyanov wrote:
>> Balbir Singh wrote:
>>> Move the memory controller data structure page_cgroup to its own slab cache.
>>> This saves space on the system, since allocations are no longer rounded up to
>>> the next power-of-two size class, and it should provide performance benefits.
>>> Users who disable the memory controller can also double-check that the memory
>>> controller is not allocating page_cgroups.
>> Can you, please, check how many objects-per-page we have with and
>> without this patch for SLAB and SLUB?
>>
>> Thanks.
>
> I can check objects-per-page with this patch for SLUB and SLAB, but I am not
> sure what to check without this patch. The machine is temporarily busy;

Well, the objects-per-page without the patch is simply the objects-per-page
of the corresponding kmalloc cache :)

> I'll check it once I get it back.
>

Ok, thanks!

2008-03-11 11:09:54

by Balbir Singh

Subject: Re: [PATCH] Move memory controller allocations to their own slabs (v2)

Pavel Emelyanov wrote:
> Balbir Singh wrote:
>> Pavel Emelyanov wrote:
>>> Balbir Singh wrote:
>>>> Move the memory controller data structure page_cgroup to its own slab cache.
>>>> This saves space on the system, since allocations are no longer rounded up to
>>>> the next power-of-two size class, and it should provide performance benefits.
>>>> Users who disable the memory controller can also double-check that the memory
>>>> controller is not allocating page_cgroups.
>>> Can you, please, check how many objects-per-page we have with and
>>> without this patch for SLAB and SLUB?
>>>
>>> Thanks.
>> I can check objects-per-page with this patch for SLUB and SLAB, but I am not
>> sure what to check without this patch. The machine is temporarily busy,
>
> Well, the objects-per-page without the patch is simply the objects-per-page
> of the corresponding kmalloc cache :)
>

OK, so here is the data

On my 64 bit powerpc system (structure size could be different on other systems)

1. sizeof page_cgroup is 40 bytes
which means kmalloc will allocate 64 bytes
2. With 4K pagesize SLAB with HWCACHE_ALIGN, 59 objects are packed per slab
3. With SLUB the value is 102 per slab



--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-03-11 18:48:20

by Balbir Singh

Subject: Re: [PATCH] Move memory controller allocations to their own slabs (v2)

Hugh Dickins wrote:
> On Tue, 11 Mar 2008, Balbir Singh wrote:
>> Move the memory controller data structure page_cgroup to its own slab cache.
>> This saves space on the system, since allocations are no longer rounded up to
>> the next power-of-two size class, and it should provide performance benefits.
>> Users who disable the memory controller can also double-check that the memory
>> controller is not allocating page_cgroups.
>>
>> Signed-off-by: Balbir Singh <[email protected]>
>
> I certainly approve of giving page_cgroups their own kmem_cache
> (and agree with Kame that it was overkill for the zones).
>

Thanks, yes, I agreed with Kame as well.

> But I don't agree with the SLAB_HWCACHE_ALIGN you've slipped into
> this version. That'll be wasting a lot of (all? more than?) the
> space you're trying to save with a kmem_cache, won't it? Let me
> talk about that separately, in reply to the mail where you report
> the numbers.
>

I slipped in the SLAB_HWCACHE_ALIGN to reduce page_cgroup cache-line contention.
Yes, you are right, we do lose out on space due to the extra alignment. My test
results (lmbench) with kmalloc, SLUB and SLAB are not very conclusive about
performance; SLUB had the best results w.r.t. protection faults and context
switches.

> Are you proposing this page_cgroup_cache mod for 2.6.25 or for 2.6.26?

I am OK with any version as long as we can get good feedback. I suspect the
feature will be useful for 2.6.25.

> I ask because I want to build upon it to fix up some GFP_ flag issues:
> I think we end up claiming the page_cgroups are __GFP_MOVABLE when they
> should be called __GFP_RECLAIMABLE; but I don't know how seriously we
> take MOVABLE/RECLAIMABLE discrepancies at present.
>

That's a very good question. I am not sure about the correct answer to that
myself at the moment.

> There's a patch I'd also like to build upon from Christoph in -mm
> (remove-set_migrateflags.patch), which sheds light on a similar issue
> with radix_tree_nodes. Its importance is again dependent on how
> seriously we're taking MOVABLE/RECLAIMABLE discrepancies.
>

I'll look at Christoph's patch and see what it addresses to understand the
MOVABLE/RECLAIMABLE discrepancy better.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2008-03-11 19:17:47

by Hugh Dickins

Subject: Re: [PATCH] Move memory controller allocations to their own slabs (v2)

On Tue, 11 Mar 2008, Balbir Singh wrote:
>
> Move the memory controller data structure page_cgroup to its own slab cache.
> This saves space on the system, since allocations are no longer rounded up to
> the next power-of-two size class, and it should provide performance benefits.
> Users who disable the memory controller can also double-check that the memory
> controller is not allocating page_cgroups.
>
> Signed-off-by: Balbir Singh <[email protected]>

I certainly approve of giving page_cgroups their own kmem_cache
(and agree with Kame that it was overkill for the zones).

But I don't agree with the SLAB_HWCACHE_ALIGN you've slipped into
this version. That'll be wasting a lot of (all? more than?) the
space you're trying to save with a kmem_cache, won't it? Let me
talk about that separately, in reply to the mail where you report
the numbers.

Are you proposing this page_cgroup_cache mod for 2.6.25 or for 2.6.26?
I ask because I want to build upon it to fix up some GFP_ flag issues:
I think we end up claiming the page_cgroups are __GFP_MOVABLE when they
should be called __GFP_RECLAIMABLE; but I don't know how seriously we
take MOVABLE/RECLAIMABLE discrepancies at present.

There's a patch I'd also like to build upon from Christoph in -mm
(remove-set_migrateflags.patch), which sheds light on a similar issue
with radix_tree_nodes. Its importance is again dependent on how
seriously we're taking MOVABLE/RECLAIMABLE discrepancies.

Hugh

> ---
>
> mm/memcontrol.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff -puN mm/memcontrol.c~memory-controller-move-to-own-slab mm/memcontrol.c
> --- linux-2.6.25-rc4/mm/memcontrol.c~memory-controller-move-to-own-slab 2008-03-10 23:22:34.000000000 +0530
> +++ linux-2.6.25-rc4-balbir/mm/memcontrol.c 2008-03-11 10:29:00.000000000 +0530
> @@ -26,6 +26,7 @@
> #include <linux/backing-dev.h>
> #include <linux/bit_spinlock.h>
> #include <linux/rcupdate.h>
> +#include <linux/slab.h>
> #include <linux/swap.h>
> #include <linux/spinlock.h>
> #include <linux/fs.h>
> @@ -35,6 +36,7 @@
>
> struct cgroup_subsys mem_cgroup_subsys;
> static const int MEM_CGROUP_RECLAIM_RETRIES = 5;
> +static struct kmem_cache *page_cgroup_cache;
>
> /*
> * Statistics for memory cgroup.
> @@ -560,7 +562,7 @@ retry:
> }
> unlock_page_cgroup(page);
>
> - pc = kzalloc(sizeof(struct page_cgroup), gfp_mask);
> + pc = kmem_cache_zalloc(page_cgroup_cache, gfp_mask);
> if (pc == NULL)
> goto err;
>
> @@ -622,7 +624,7 @@ retry:
> */
> res_counter_uncharge(&mem->res, PAGE_SIZE);
> css_put(&mem->css);
> - kfree(pc);
> + kmem_cache_free(page_cgroup_cache, pc);
> goto retry;
> }
> page_assign_page_cgroup(page, pc);
> @@ -637,7 +639,7 @@ done:
> return 0;
> out:
> css_put(&mem->css);
> - kfree(pc);
> + kmem_cache_free(page_cgroup_cache, pc);
> err:
> return -ENOMEM;
> }
> @@ -695,7 +697,7 @@ void mem_cgroup_uncharge_page(struct pag
> res_counter_uncharge(&mem->res, PAGE_SIZE);
> css_put(&mem->css);
>
> - kfree(pc);
> + kmem_cache_free(page_cgroup_cache, pc);
> return;
> }
>
> @@ -1020,6 +1022,8 @@ mem_cgroup_create(struct cgroup_subsys *
> if (unlikely((cont->parent) == NULL)) {
> mem = &init_mem_cgroup;
> init_mm.mem_cgroup = mem;
> + page_cgroup_cache = KMEM_CACHE(page_cgroup,
> + SLAB_HWCACHE_ALIGN | SLAB_PANIC);
> } else
> mem = kzalloc(sizeof(struct mem_cgroup), GFP_KERNEL);
>

2008-03-12 00:04:23

by Hugh Dickins

Subject: Re: [PATCH] Move memory controller allocations to their own slabs (v2)

On Tue, 11 Mar 2008, Balbir Singh wrote:
>
> On my 64 bit powerpc system (structure size could be different on other systems)
>
> 1. sizeof page_cgroup is 40 bytes
> which means kmalloc will allocate 64 bytes
> 2. With 4K pagesize SLAB with HWCACHE_ALIGN, 59 objects are packed per slab
> 3. With SLUB the value is 102 per slab

I expect you got those numbers with 2.6.25-rc4? Towards the end of -rc5
there's a patch from Nick to make SLUB's treatment of HWCACHE_ALIGN the
same as SLAB's, so I expect you'd be back to a similar poor density with
SLUB too. (But I'm replying without actually testing it out myself.)

I think you'd need a strong reason to choose HWCACHE_ALIGN for these.

Consider: the (normal configuration) x86_64 struct page size was 56
bytes for a long time (and still is without MEM_RES_CTLR), but we've
never inserted padding to make that a round 64 bytes (and they would
benefit additionally from some simpler arithmetic, not the case with
page_cgroups). Though it's good to avoid unnecessary sharing and
multiple cacheline accesses, it's not so good as to justify almost
doubling the size of a very very common structure. I think.

Hugh

2008-03-12 03:38:44

by Nick Piggin

Subject: Re: [PATCH] Move memory controller allocations to their own slabs (v2)

On Wednesday 12 March 2008 00:05, Hugh Dickins wrote:
> On Tue, 11 Mar 2008, Balbir Singh wrote:
> > On my 64 bit powerpc system (structure size could be different on other
> > systems)
> >
> > 1. sizeof page_cgroup is 40 bytes
> > which means kmalloc will allocate 64 bytes
> > 2. With 4K pagesize SLAB with HWCACHE_ALIGN, 59 objects are packed per slab
> > 3. With SLUB the value is 102 per slab
>
> I expect you got those numbers with 2.6.25-rc4? Towards the end of -rc5
> there's a patch from Nick to make SLUB's treatment of HWCACHE_ALIGN the
> same as SLAB's, so I expect you'd be back to a similar poor density with
> SLUB too. (But I'm replying without actually testing it out myself.)

Yes, that will be the case.

With a 64 byte cacheline size,

page_cgroup: |----|----|----|----|----|----|----|----
cacheline: |-------|-------|-------|-------|-------

So if you are accessing each field in a random page_cgroup, then it
will statistically cost you 1.5 cachelines (really: half the time, it
takes 2 cachelines).

If you HWCACHE_ALIGN this, it will cost you just 1 line.

Long live the size/speed tradeoff ;)

Whether this improvement is actually noticeable is a different matter.


> I think you'd need a strong reason to choose HWCACHE_ALIGN for these.
>
> Consider: the (normal configuration) x86_64 struct page size was 56
> bytes for a long time (and still is without MEM_RES_CTLR), but we've
> never inserted padding to make that a round 64 bytes (and they would
> benefit additionally from some simpler arithmetic, not the case with
> page_cgroups).

I actually still think padding struct page will be a good idea. And
hopefully we can get rid of the page_cgroup pointer in struct page,
so that we can have both config cases padded to 64 bytes and use the
extra space for larger refcounters maybe ;)

I simply haven't tried to run a realistic workload where it would
matter a great deal. I have a feeling that the OLTP workloads could
see some improvement, because they're mainly just using the kernel
for direct IO; struct page is one of the few data structures that
they actually spend any cache on.


> Though it's good to avoid unnecessary sharing and
> multiple cacheline accesses, it's not so good as to justify almost
> doubling the size of a very very common structure. I think.

Anyway, back on topic from my rant, yes I agree with Hugh: I think
there should be some demonstration of a speedup (even in a "best
case" situation), before spending more bytes on a common data
structure.

2008-03-12 03:49:18

by Balbir Singh

Subject: Re: [PATCH] Move memory controller allocations to their own slabs (v2)

Hugh Dickins wrote:
> On Tue, 11 Mar 2008, Balbir Singh wrote:
>> On my 64 bit powerpc system (structure size could be different on other systems)
>>
>> 1. sizeof page_cgroup is 40 bytes
>> which means kmalloc will allocate 64 bytes
>> 2. With 4K pagesize SLAB with HWCACHE_ALIGN, 59 objects are packed per slab
>> 3. With SLUB the value is 102 per slab
>
> I expect you got those numbers with 2.6.25-rc4? Towards the end of -rc5
> there's a patch from Nick to make SLUB's treatment of HWCACHE_ALIGN the
> same as SLAB's, so I expect you'd be back to a similar poor density with
> SLUB too. (But I'm replying without actually testing it out myself.)
>

You are right, these numbers are against -rc4, and I do see the HWCACHE_ALIGN
change for SLUB in the latest kernel -git tree.

> I think you'd need a strong reason to choose HWCACHE_ALIGN for these.
>
> Consider: the (normal configuration) x86_64 struct page size was 56
> bytes for a long time (and still is without MEM_RES_CTLR), but we've
> never inserted padding to make that a round 64 bytes (and they would
> benefit additionally from some simpler arithmetic, not the case with
> page_cgroups). Though it's good to avoid unnecessary sharing and
> multiple cacheline accesses, it's not so good as to justify almost
> doubling the size of a very very common structure. I think.
>

Very good point. I suppose an overhead proportional to the memory on the system
is too much to digest. I think struct page is a good example for page_cgroup to
follow.


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL