2007-05-01 18:10:55

by Hugh Dickins

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Mon, 30 Apr 2007, Andrew Morton wrote:
>
> i386-use-page-allocator-to-allocate-thread_info-structure.patch
> slub-core.patch
>
> slub. Or part thereof. This is another patch series which got messed up by
> poor patch sequencing.
>
> make-page-private-usable-in-compound-pages-v1.patch
> optimize-compound_head-by-avoiding-a-shared-page.patch
> add-virt_to_head_page-and-consolidate-code-in-slab-and-slub.patch
> slub-fix-object-tracking.patch
> slub-enable-tracking-of-full-slabs.patch
> slub-validation-of-slabs-metadata-and-guard-zones.patch
> slub-add-min_partial.patch
> slub-add-ability-to-list-alloc--free-callers-per-slab.patch
> slub-free-slabs-and-sort-partial-slab-lists-in-kmem_cache_shrink.patch
> slub-remove-object-activities-out-of-checking-functions.patch
> slub-user-documentation.patch
> slub-add-slabinfo-tool.patch
>
> Most of the rest of slub. Will merge it all.

Merging slub already? I'm surprised. That's a very key piece of
infrastructure, and I doubt it's had the exposure it needs yet.

Just what has it been widely tested on so far? x86_64. Not many
of us have ia64, but I guess SGI people will have been trying it
on that. Not i386, that's excluded.

Not powerpc - hmm, I thought that was known, but looking I see no
ARCH_USES_SLAB_PAGE_STRUCT there: just built and tried to run it up,
crashes in slab_free from pgtable_free_tlb frpm free_pte_range from
free_pgd_range from free_pgtables from unmap_region form do_munmap.
That's 2.6.21-rc7-mm2.

slob has a justified place at the low end, but do we want some
people running with slab and some with slub? I'd expected slub
to stay in 2.6.22-mm, and have all the architectures cut over to
it in that time, before advancing to mainline.

I've nothing against slub in itself, though I'm wary of its
cache merging (more scope for one corrupting another) (and
sometimes I think Christoph spent one life uglifying slab for
NUMA, then another life ripping that all out to make slub ;)

Hugh


2007-05-01 19:25:56

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Tue, 1 May 2007, Hugh Dickins wrote:

> > Most of the rest of slub. Will merge it all.
>
> Merging slub already? I'm surprised. That's a very key piece of
> infrastructure, and I doubt it's had the exposure it needs yet.

Its not the default. Its just an alternative like SLOB. It will take some
time to test with various loads in order to see if it can really replace
SLAB in all scenarios.

> Just what has it been widely tested on so far? x86_64. Not many
> of us have ia64, but I guess SGI people will have been trying it
> on that. Not i386, that's excluded.

There is an i386 patch pending and I have used it on i386 for a while.

> Not powerpc - hmm, I thought that was known, but looking I see no
> ARCH_USES_SLAB_PAGE_STRUCT there: just built and tried to run it up,
> crashes in slab_free from pgtable_free_tlb frpm free_pte_range from
> free_pgd_range from free_pgtables from unmap_region form do_munmap.
> That's 2.6.21-rc7-mm2.

Hmmm... True I have not spend any time with that platform. We can set
ARCH_USES_SLAB_PAGE_STRUCT there to switch it off. SLUB is the default for
mm so I am a bit surprised that this did not surface earlier.

> I've nothing against slub in itself, though I'm wary of its
> cache merging (more scope for one corrupting another) (and

Yes but then SLUB has more diagnostics etc etc than SLAB to prevent any
issues. In debug mode all slabs are separate. The merge feature is very
stable these days and significantly reduces cache overhead problems
that plague SLAB and require it to have a complex object expiration
technique. As a result I was able to rip out all timers. SLUB has no cache
reaper nor any timer. Its silent if not in use.

> sometimes I think Christoph spent one life uglifying slab for
> NUMA, then another life ripping that all out to make slub ;)

SLAB has a certain paradigm of doing things (queues) and I had to work
within that framework. It was a group effort. SLUB is an answer to those
complaints and a result of the lessons learned through years of some
painful slab debugging. SLUB makes debugging extremely easy (and also the
design is very simple and comprehensible). No rebuilding of the kernel.
Just pop in a debug option on the command line which can even be targeted
to a slab cache if we know that things break there.

2007-05-01 19:56:23

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Tue, 1 May 2007 19:10:29 +0100 (BST)
Hugh Dickins <[email protected]> wrote:

> > Most of the rest of slub. Will merge it all.
>
> Merging slub already? I'm surprised.

My thinking here is "does slub have a future". I think the answer is
"yes", so we're reasonably safe getting it into mainline for the finishing
work. The kernel.org kernel will still default to slab.

Does that sound wrong?

2007-05-01 20:19:25

by Hugh Dickins

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Tue, 1 May 2007, Andrew Morton wrote:
> On Tue, 1 May 2007 19:10:29 +0100 (BST)
> Hugh Dickins <[email protected]> wrote:
>
> > > Most of the rest of slub. Will merge it all.
> >
> > Merging slub already? I'm surprised.
>
> My thinking here is "does slub have a future".
> I think the answer is "yes",

I think I agree with that,
though it's a judgement I'd leave to you and others.

> so we're reasonably safe getting it into mainline for the finishing
> work. The kernel.org kernel will still default to slab.
>
> Does that sound wrong?

Yes, to me it does. If it could be defaulted to on throughout the
-rcs, on every architecture, then I'd say that's "finishing work";
and we'd be safe knowing we could go back to slab in a hurry if
needed. But it hasn't reached that stage yet, I think.

Hugh

2007-05-01 20:37:49

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Tue, 1 May 2007 21:19:09 +0100 (BST)
Hugh Dickins <[email protected]> wrote:

> On Tue, 1 May 2007, Andrew Morton wrote:
> > On Tue, 1 May 2007 19:10:29 +0100 (BST)
> > Hugh Dickins <[email protected]> wrote:
> >
> > > > Most of the rest of slub. Will merge it all.
> > >
> > > Merging slub already? I'm surprised.
> >
> > My thinking here is "does slub have a future".
> > I think the answer is "yes",
>
> I think I agree with that,
> though it's a judgement I'd leave to you and others.
>
> > so we're reasonably safe getting it into mainline for the finishing
> > work. The kernel.org kernel will still default to slab.
> >
> > Does that sound wrong?
>
> Yes, to me it does. If it could be defaulted to on throughout the
> -rcs, on every architecture, then I'd say that's "finishing work";
> and we'd be safe knowing we could go back to slab in a hurry if
> needed. But it hasn't reached that stage yet, I think.
>

Given the current state and the current rate of development I'd expect slub
to have reached the level of completion which you're describing around -rc2
or -rc3. I think we'd be pretty safe making that assumption.

This is a bit unusual but there is of course some self-interest here: the
patch dependencies are getting awful and having this hanging around
out-of-tree will make 2.6.23 development harder for everyone.

So on balance, given that we _do_ expect slub to have a future, I'm
inclined to crash ahead with it. The worst that can happen will be a later
rm mm/slub.c which would be pretty simple to do.

otoh I could do some frantic patch mangling and make it easier to carry
slub out-of-tree, but do we gain much from that?

2007-05-01 20:46:34

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Tue, 1 May 2007, Andrew Morton wrote:

> otoh I could do some frantic patch mangling and make it easier to carry
> slub out-of-tree, but do we gain much from that?

Then we may loose all the slab API cleanups? Yuck. I really do not want
redo those....

2007-05-01 21:08:54

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Tue, 1 May 2007, Hugh Dickins wrote:

> Yes, to me it does. If it could be defaulted to on throughout the
> -rcs, on every architecture, then I'd say that's "finishing work";
> and we'd be safe knowing we could go back to slab in a hurry if
> needed. But it hasn't reached that stage yet, I think.

Why would we need to go back to SLAB if we have not switched to SLUB? SLUB
is marked experimental and not the default.

The only problems that I am aware of is(or was) the issue with arches
modifying page struct fields of slab pages that SLUB needs for its own
operations. And I thought it was all fixed since the powerpc guys were
quiet and the patch was in for i386.

2007-05-01 21:09:27

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Tue, 1 May 2007 13:46:26 -0700 (PDT)
Christoph Lameter <[email protected]> wrote:

> On Tue, 1 May 2007, Andrew Morton wrote:
>
> > otoh I could do some frantic patch mangling and make it easier to carry
> > slub out-of-tree, but do we gain much from that?
>
> Then we may loose all the slab API cleanups? Yuck. I really do not want
> redo those....

No, I meant that I'd look at splitting those patches up into
one-against-mainline and one-against-slub.

2007-05-02 12:45:59

by Hugh Dickins

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Tue, 1 May 2007, Christoph Lameter wrote:
> On Tue, 1 May 2007, Hugh Dickins wrote:
>
> > Yes, to me it does. If it could be defaulted to on throughout the
> > -rcs, on every architecture, then I'd say that's "finishing work";
> > and we'd be safe knowing we could go back to slab in a hurry if
> > needed. But it hasn't reached that stage yet, I think.
>
> Why would we need to go back to SLAB if we have not switched to SLUB? SLUB
> is marked experimental and not the default.

I said above that I thought SLUB ought to be defaulted to on throughout
the -rcs: if we don't do that, we're not going to learn much from having
it in Linus' tree.

And perhaps that line which appends "PREEMPT " to an oops report ought
to append "SLUB " too, for so long as there's a choice.

> The only problems that I am aware of is(or was) the issue with arches
> modifying page struct fields of slab pages that SLUB needs for its own
> operations. And I thought it was all fixed since the powerpc guys were
> quiet and the patch was in for i386.

You're forgetting your unions in struct page: in the SPLIT_PTLOCK
case (NR_CPUS >= 4) the pagetable code is using spinlock_t ptl,
which overlays SLUB's first_page and slab pointers.

I just tried rebuilding powerpc with the SPLIT_PTLOCK cutover
edited to 8 cpus instead, and then no crash.

I presume the answer is just to extend your quicklist work to
powerpc's lowest level of pagetables. The only other architecture
which is using kmem_cache for them is arm26, which has
"#error SMP is not supported", so won't be giving this problem.

Hugh

2007-05-02 12:55:09

by Hugh Dickins

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Tue, 1 May 2007, Andrew Morton wrote:
>
> Given the current state and the current rate of development I'd expect slub
> to have reached the level of completion which you're describing around -rc2
> or -rc3. I think we'd be pretty safe making that assumption.

Its developer does show signs of being active!

>
> This is a bit unusual but there is of course some self-interest here: the
> patch dependencies are getting awful and having this hanging around
> out-of-tree will make 2.6.23 development harder for everyone.

That is a very strong argument: a somewhat worrisome argument,
but a very strong one. Maintaining your sanity is important.

>
> So on balance, given that we _do_ expect slub to have a future, I'm
> inclined to crash ahead with it. The worst that can happen will be a later
> rm mm/slub.c which would be pretty simple to do.

Okay. And there's been no chorus to echo my concern.

But if Linus' tree is to be better than a warehouse to avoid
awkward merges, I still think we want it to default to on for
all the architectures, and for most if not all -rcs.

>
> otoh I could do some frantic patch mangling and make it easier to carry
> slub out-of-tree, but do we gain much from that?

No, keep away from that.

Hugh

2007-05-02 17:01:44

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Hugh Dickins wrote:

> > Why would we need to go back to SLAB if we have not switched to SLUB? SLUB
> > is marked experimental and not the default.
>
> I said above that I thought SLUB ought to be defaulted to on throughout
> the -rcs: if we don't do that, we're not going to learn much from having
> it in Linus' tree.

I'd rather be careful with that..... mm is enough for now. Why go to the
extremes immediately. If it is an option then people can gradually start
testing with it.

> > The only problems that I am aware of is(or was) the issue with arches
> > modifying page struct fields of slab pages that SLUB needs for its own
> > operations. And I thought it was all fixed since the powerpc guys were
> > quiet and the patch was in for i386.
>
> You're forgetting your unions in struct page: in the SPLIT_PTLOCK
> case (NR_CPUS >= 4) the pagetable code is using spinlock_t ptl,
> which overlays SLUB's first_page and slab pointers.

Uhhh.... Right. So SLUB wont work if the lowest page table block is
managed via slabs.

> I just tried rebuilding powerpc with the SPLIT_PTLOCK cutover
> edited to 8 cpus instead, and then no crash.
>
> I presume the answer is just to extend your quicklist work to
> powerpc's lowest level of pagetables. The only other architecture

I am not sure how PowerPCs lower pagetable pages work. If they are of
PAGE_SIZE then this is no problem.

> which is using kmem_cache for them is arm26, which has
> "#error SMP is not supported", so won't be giving this problem.

Ahh. Good.

But these are arch specific problems. We could use
ARCH_USES_SLAB_PAGE_STRUCT to disable SLUB on these platforms.

2007-05-02 17:03:53

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Hugh Dickins wrote:

> But if Linus' tree is to be better than a warehouse to avoid
> awkward merges, I still think we want it to default to on for
> all the architectures, and for most if not all -rcs.

At some point I dream that SLUB could become the default but I thought
this would take at least 6 month or so. If want to force this now then I
will certainly have some busy weeks ahead.

2007-05-02 17:25:50

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Hugh Dickins wrote:

> I presume the answer is just to extend your quicklist work to
> powerpc's lowest level of pagetables. The only other architecture
> which is using kmem_cache for them is arm26, which has
> "#error SMP is not supported", so won't be giving this problem.

In the meantime we would need something like this to disable SLUB in this
particular configuration. Note that I have not tested this and the <= for
the comparision with SPLIT_PTLOCK_CPUS may not work (Never seen such a
construct in a Kconfig file but it is needed here).



PowerPC: Disable SLUB for configurations in which slab page structs are modified

PowerPC uses the slab allocator to manage the lowest level of the page table.
In high cpu configurations we also use the page struct to split the page
table lock. Disallow the selection of SLUB for that case.

[Not tested: I am not familiar with powerpc build procedures etc]

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.21-rc7-mm2/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.21-rc7-mm2.orig/arch/powerpc/Kconfig 2007-05-02 10:07:34.000000000 -0700
+++ linux-2.6.21-rc7-mm2/arch/powerpc/Kconfig 2007-05-02 10:13:37.000000000 -0700
@@ -117,6 +117,19 @@ config GENERIC_BUG
default y
depends on BUG

+#
+# Powerpc uses the slab allocator to manage its ptes and the
+# page structs of ptes are used for splitting the page table
+# lock for configurations supporting more than SPLIT_PTLOCK_CPUS.
+#
+# In that special configuration the page structs of slabs are modified.
+# This setting disables the selection of SLUB as a slab allocator.
+#
+config ARCH_USES_SLAB_PAGE_STRUCT
+ bool
+ default y
+ depends on SPLIT_PTLOCK_CPUS <= NR_CPUS
+
config DEFAULT_UIMAGE
bool
help

2007-05-02 18:08:57

by Hugh Dickins

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Christoph Lameter wrote:
>
> But these are arch specific problems. We could use
> ARCH_USES_SLAB_PAGE_STRUCT to disable SLUB on these platforms.

As a quick hack, sure. But every ARCH_USES_SLAB_PAGE_STRUCT
diminishes the testing SLUB will get. If the idea is that we're
going to support both SLAB and SLUB, some arches with one, some
with another, some with either, for more than a single release,
then I'm back to saying SLUB is being pushed in too early.
I can understand people wanting pluggable schedulers,
but pluggable slab allocators?

Hugh

2007-05-02 18:28:28

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Hugh Dickins wrote:

> On Wed, 2 May 2007, Christoph Lameter wrote:
> >
> > But these are arch specific problems. We could use
> > ARCH_USES_SLAB_PAGE_STRUCT to disable SLUB on these platforms.
>
> As a quick hack, sure. But every ARCH_USES_SLAB_PAGE_STRUCT
> diminishes the testing SLUB will get. If the idea is that we're
> going to support both SLAB and SLUB, some arches with one, some
> with another, some with either, for more than a single release,
> then I'm back to saying SLUB is being pushed in too early.
> I can understand people wanting pluggable schedulers,
> but pluggable slab allocators?

This is a sensitive piece of the kernel as you say and we better allow the
running of two allocator for some time to make sure that it behaves in all
load situations. The design is fundamentally different so its performance
characteristics may diverge significantly and perhaps there will be corner
cases for each where they do the best job.

I have already reworked the slab API to allow for an easy implementation
of alternate slab allocators (released with 2.6.20) which only covered
SLAB and SLOB. This is continuing the cleanup work and adding a third one.


2007-05-02 18:36:33

by Hugh Dickins

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Christoph Lameter wrote:
> On Wed, 2 May 2007, Hugh Dickins wrote:
>
> > I presume the answer is just to extend your quicklist work to
> > powerpc's lowest level of pagetables. The only other architecture
> > which is using kmem_cache for them is arm26, which has
> > "#error SMP is not supported", so won't be giving this problem.
>
> In the meantime we would need something like this to disable SLUB in this
> particular configuration. Note that I have not tested this and the <= for
> the comparision with SPLIT_PTLOCK_CPUS may not work (Never seen such a
> construct in a Kconfig file but it is needed here).

I'm astonished and impressed, both with Kconfig and your use of it:
that does seem to work. Though I don't dare go so far as to give
the patch an ack, and don't like this way out at all. It needs a
proper (quicklist) solution, and by the time that solution comes
along, all the powerpc people will have CONFIG_SLAB=y in their
.config, and "make oldconfig" will just perpetuate that status quo,
instead of the switching over to CONFIG_SLUB=y. I think. Unless
we keep changing the config option names, or go through a phase
with no option.

I'd much rather be testing a quicklist patch:
I'd better give that a try.

Hugh

>
>
>
> PowerPC: Disable SLUB for configurations in which slab page structs are modified
>
> PowerPC uses the slab allocator to manage the lowest level of the page table.
> In high cpu configurations we also use the page struct to split the page
> table lock. Disallow the selection of SLUB for that case.
>
> [Not tested: I am not familiar with powerpc build procedures etc]
>
> Signed-off-by: Christoph Lameter <[email protected]>
>
> Index: linux-2.6.21-rc7-mm2/arch/powerpc/Kconfig
> ===================================================================
> --- linux-2.6.21-rc7-mm2.orig/arch/powerpc/Kconfig 2007-05-02 10:07:34.000000000 -0700
> +++ linux-2.6.21-rc7-mm2/arch/powerpc/Kconfig 2007-05-02 10:13:37.000000000 -0700
> @@ -117,6 +117,19 @@ config GENERIC_BUG
> default y
> depends on BUG
>
> +#
> +# Powerpc uses the slab allocator to manage its ptes and the
> +# page structs of ptes are used for splitting the page table
> +# lock for configurations supporting more than SPLIT_PTLOCK_CPUS.
> +#
> +# In that special configuration the page structs of slabs are modified.
> +# This setting disables the selection of SLUB as a slab allocator.
> +#
> +config ARCH_USES_SLAB_PAGE_STRUCT
> + bool
> + default y
> + depends on SPLIT_PTLOCK_CPUS <= NR_CPUS
> +
> config DEFAULT_UIMAGE
> bool
> help

2007-05-02 18:39:21

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Hugh Dickins wrote:

> I'm astonished and impressed, both with Kconfig and your use of it:

Thanks!

> I'd much rather be testing a quicklist patch:
> I'd better give that a try.

Great. But I certainly do not mind people use SLAB. I do not think that
one approach should be there for all. Choice is the way to have multiple
allocators compete. One reason that SLAB is so crusty is because it was
the only solution for so long.


2007-05-02 18:43:00

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007 11:28:26 -0700 (PDT)
Christoph Lameter <[email protected]> wrote:

> On Wed, 2 May 2007, Hugh Dickins wrote:
>
> > On Wed, 2 May 2007, Christoph Lameter wrote:
> > >
> > > But these are arch specific problems. We could use
> > > ARCH_USES_SLAB_PAGE_STRUCT to disable SLUB on these platforms.
> >
> > As a quick hack, sure. But every ARCH_USES_SLAB_PAGE_STRUCT
> > diminishes the testing SLUB will get. If the idea is that we're
> > going to support both SLAB and SLUB, some arches with one, some
> > with another, some with either, for more than a single release,
> > then I'm back to saying SLUB is being pushed in too early.
> > I can understand people wanting pluggable schedulers,
> > but pluggable slab allocators?
>
> This is a sensitive piece of the kernel as you say and we better allow the
> running of two allocator for some time to make sure that it behaves in all
> load situations. The design is fundamentally different so its performance
> characteristics may diverge significantly and perhaps there will be corner
> cases for each where they do the best job.

eek. We'd need to fix those corner cases then. Our endgame
here really must be rm mm/slab.c.

2007-05-02 18:53:21

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Andrew Morton wrote:

> > This is a sensitive piece of the kernel as you say and we better allow the
> > running of two allocator for some time to make sure that it behaves in all
> > load situations. The design is fundamentally different so its performance
> > characteristics may diverge significantly and perhaps there will be corner
> > cases for each where they do the best job.
>
> eek. We'd need to fix those corner cases then. Our endgame
> here really must be rm mm/slab.c.

First we need to discover them and I doubt that mm covers much more than
development loads. I hope we can get to a point where we have SLUB be
the primarily allocator soon but I would expect various performance issues
to show up.

On the other hand: I am pretty sure that SLUB can replace SLOB completely
given SLOBs limitations and SLUBs more efficient use of space. SLOB needs
8 bytes of overhead. SLUB needs none. We may just have to #ifdef out the
debugging support to make the code be of similar size to SLOB too. SLOB is
a general problem because its features are not compatible to SLAB. F.e. it
does not support DESTROY_BY_RCU and does not do reclaim the right way etc
etc. SLUB may turn out to be the ideal embedded slab allocator.

2007-05-02 18:54:15

by Suresh Siddha

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, May 02, 2007 at 05:54:53AM -0700, Hugh Dickins wrote:
> On Tue, 1 May 2007, Andrew Morton wrote:
> > So on balance, given that we _do_ expect slub to have a future, I'm
> > inclined to crash ahead with it. The worst that can happen will be a later
> > rm mm/slub.c which would be pretty simple to do.
>
> Okay. And there's been no chorus to echo my concern.

I have been looking into "slub" recently to avoid some of the NUMA alien
cache issues that we were encountering on the regular slab.

I am having some stability issues with slub on an ia64 NUMA platform and
didn't have time to dig further. I am hoping to look into it soon
and share the data/findings with Christoph.

We also did a quick perf collection on x86_64(atleast didn't hear
any stability issues from our team on regular x86_64 SMP), that we will be
sharing shortly.

> But if Linus' tree is to be better than a warehouse to avoid
> awkward merges, I still think we want it to default to on for
> all the architectures, and for most if not all -rcs.

I will not suggest for default on at this point.

thanks,
suresh

2007-05-02 18:57:45

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007 11:39:20 -0700 (PDT)
Christoph Lameter <[email protected]> wrote:

> On Wed, 2 May 2007, Hugh Dickins wrote:
>
> > I'm astonished and impressed, both with Kconfig and your use of it:
>
> Thanks!
>
> > I'd much rather be testing a quicklist patch:
> > I'd better give that a try.
>
> Great. But I certainly do not mind people use SLAB. I do not think that
> one approach should be there for all. Choice is the way to have multiple
> allocators compete. One reason that SLAB is so crusty is because it was
> the only solution for so long.
>

noooo, we don't want competing slab allocators, please. We should get slub
working well on all architectures then remove slab completely. Having to
maintain both slab.c and slub.c would be awful.

2007-05-02 18:58:08

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Siddha, Suresh B wrote:

> I have been looking into "slub" recently to avoid some of the NUMA alien
> cache issues that we were encountering on the regular slab.

Yes that is also our main concern.

> I am having some stability issues with slub on an ia64 NUMA platform and
> didn't have time to dig further. I am hoping to look into it soon
> and share the data/findings with Christoph.

There is at least one patch on top of 2.6.21-rc7-mm2 already in mm that
may be necessary for you.

2007-05-02 19:01:23

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Andrew Morton wrote:

> noooo, we don't want competing slab allocators, please. We should get slub
> working well on all architectures then remove slab completely. Having to
> maintain both slab.c and slub.c would be awful.

Owww... You throw my roadmap out of the window and may create too
high expectations of SLUB.

I am the one who has to maintain SLAB and SLUB it seems and I have been
dealing with the trio SLAB, SLOB and SLUB for awhile now. Its okay and it
will be much easier once the cleanups are in.

2007-05-02 19:11:46

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007 10:03:50 -0700 (PDT)
Christoph Lameter <[email protected]> wrote:

> On Wed, 2 May 2007, Hugh Dickins wrote:
>
> > But if Linus' tree is to be better than a warehouse to avoid
> > awkward merges, I still think we want it to default to on for
> > all the architectures, and for most if not all -rcs.
>
> At some point I dream that SLUB could become the default but I thought
> this would take at least 6 month or so. If want to force this now then I
> will certainly have some busy weeks ahead.

s/dream/promise/ ;)

Six months sounds reasonable - I was kind of hoping for less. Make it
default-to-on in 2.6.23-rc1, see how it goes.

2007-05-02 19:18:59

by Pekka Enberg

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On 5/2/07, Christoph Lameter <[email protected]> wrote:
> Owww... You throw my roadmap out of the window and may create too
> high expectations of SLUB.

Me too!

On 5/2/07, Christoph Lameter <[email protected]> wrote:
> I am the one who has to maintain SLAB and SLUB it seems and I have been
> dealing with the trio SLAB, SLOB and SLUB for awhile now. Its okay and it
> will be much easier once the cleanups are in.

And then there's patches such as kmemleak which would need to target
all three. Plus it doesn't really make sense for users to select
between three competiting implementations. Please don't take away our
high hopes of getting rid of mm/slab.c Christoph =)

Pekka

2007-05-02 19:34:44

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Pekka Enberg wrote:

> On 5/2/07, Christoph Lameter <[email protected]> wrote:
> > I am the one who has to maintain SLAB and SLUB it seems and I have been
> > dealing with the trio SLAB, SLOB and SLUB for awhile now. Its okay and it
> > will be much easier once the cleanups are in.
>
> And then there's patches such as kmemleak which would need to target
> all three. Plus it doesn't really make sense for users to select
> between three competiting implementations. Please don't take away our
> high hopes of getting rid of mm/slab.c Christoph =)

SLUB supports kmemleak (actually its quite improved). Switch debugging on
and try

cat /sys/slab/kmalloc-128/alloc_calls.

2007-05-02 19:42:55

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Andrew Morton wrote:

> > At some point I dream that SLUB could become the default but I thought
> > this would take at least 6 month or so. If want to force this now then I
> > will certainly have some busy weeks ahead.
>
> s/dream/promise/ ;)
>
> Six months sounds reasonable - I was kind of hoping for less. Make it
> default-to-on in 2.6.23-rc1, see how it goes.

Here is how I think the future could develop

Cycle SLAB SLUB SLOB SLxB

2.6.22 API fixes Stabilization API fixes

Major event: SLUB availability as experimental

2.6.23 API upgrades Perf. Valid. EOL

Major events: SLUB performance validation. Switch off
experimental (could even be the default)
Slab allocators support targeted reclaim for at
least one slab cache (dentry?)
(vacate/move all objects in a slab)

2.6.24 Earliest EOL Stable - Experiments

Major events: SLUB stable. Stable targeted reclaim
for all major reclaimable slabs.
Maybe experiments with another new allocator?

2.6.25 EOL default - ?

Death of SLAB. SLUB default. Hopefully new ideas on the horizon.

2007-05-02 19:43:45

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Pekka Enberg wrote:

> And then there's patches such as kmemleak which would need to target
> all three. Plus it doesn't really make sense for users to select
> between three competiting implementations. Please don't take away our
> high hopes of getting rid of mm/slab.c Christoph =)

You too, Brutus ...

2007-05-02 19:53:20

by Sam Ravnborg

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, May 02, 2007 at 12:42:54PM -0700, Christoph Lameter wrote:
> On Wed, 2 May 2007, Andrew Morton wrote:
>
> > > At some point I dream that SLUB could become the default but I thought
> > > this would take at least 6 month or so. If want to force this now then I
> > > will certainly have some busy weeks ahead.
> >
> > s/dream/promise/ ;)
> >
> > Six months sounds reasonable - I was kind of hoping for less. Make it
> > default-to-on in 2.6.23-rc1, see how it goes.
>
> Here is how I think the future could develop
>
> Cycle SLAB SLUB SLOB SLxB
>
> 2.6.22 API fixes Stabilization API fixes
>
> Major event: SLUB availability as experimental
>
> 2.6.23 API upgrades Perf. Valid. EOL
>
> Major events: SLUB performance validation. Switch off
> experimental (could even be the default)
> Slab allocators support targeted reclaim for at
> least one slab cache (dentry?)
> (vacate/move all objects in a slab)

To facilitate this do NOT introduce CONFIG_SLAB until we decide
that SLUB are default. In this way we can make CONFIG_SLUB be default
and people will not continue with CONFIG_SLAB because they had it in their
.config already.
Or just rename CONFIG_SLAB to CONFIG_SLAB_DEPRECATED or something.

The point is make sure that LSUB becomes default for people that does
an make oldconfig (explicit or implicit).

Sam

2007-05-02 20:14:12

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007, Sam Ravnborg wrote:

> To facilitate this do NOT introduce CONFIG_SLAB until we decide
> that SLUB are default. In this way we can make CONFIG_SLUB be default
> and people will not continue with CONFIG_SLAB because they had it in their
> config already.

We already have CONFIG_SLAB. If you use your existing .config then
you will stay with SLAB.

> The point is make sure that LSUB becomes default for people that does
> an make oldconfig (explicit or implicit).

Hmmmm... We can think about that when we actually want to make SLUB the
default.

2007-05-03 08:15:35

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Wed, 2 May 2007 10:25:47 -0700 (PDT) Christoph Lameter <[email protected]> wrote:

> On Wed, 2 May 2007, Hugh Dickins wrote:
>
> > I presume the answer is just to extend your quicklist work to
> > powerpc's lowest level of pagetables. The only other architecture
> > which is using kmem_cache for them is arm26, which has
> > "#error SMP is not supported", so won't be giving this problem.
>
> In the meantime we would need something like this to disable SLUB in this
> particular configuration. Note that I have not tested this and the <= for
> the comparision with SPLIT_PTLOCK_CPUS may not work (Never seen such a
> construct in a Kconfig file but it is needed here).
>
>
>
> PowerPC: Disable SLUB for configurations in which slab page structs are modified
>
> PowerPC uses the slab allocator to manage the lowest level of the page table.
> In high cpu configurations we also use the page struct to split the page
> table lock. Disallow the selection of SLUB for that case.
>
> [Not tested: I am not familiar with powerpc build procedures etc]
>
> Signed-off-by: Christoph Lameter <[email protected]>
>
> Index: linux-2.6.21-rc7-mm2/arch/powerpc/Kconfig
> ===================================================================
> --- linux-2.6.21-rc7-mm2.orig/arch/powerpc/Kconfig 2007-05-02 10:07:34.000000000 -0700
> +++ linux-2.6.21-rc7-mm2/arch/powerpc/Kconfig 2007-05-02 10:13:37.000000000 -0700
> @@ -117,6 +117,19 @@ config GENERIC_BUG
> default y
> depends on BUG
>
> +#
> +# Powerpc uses the slab allocator to manage its ptes and the
> +# page structs of ptes are used for splitting the page table
> +# lock for configurations supporting more than SPLIT_PTLOCK_CPUS.
> +#
> +# In that special configuration the page structs of slabs are modified.
> +# This setting disables the selection of SLUB as a slab allocator.
> +#
> +config ARCH_USES_SLAB_PAGE_STRUCT
> + bool
> + default y
> + depends on SPLIT_PTLOCK_CPUS <= NR_CPUS
> +

That all seems to work as intended.

However with NR_CPUS=8 SPLIT_PTLOCK_CPUS=4, enabling SLUB=y crashes the
machine early in boot.

Too early for netconsole, no serial console. Wedges up uselessly with
CONFIG_XMON=n, does mysterious repeated uncontrollable exceptions with
CONFIG_XMON=y. This is all fairly typical for a powerpc/G5 crash :(

However I was able to glimpse some stuff as it flew past. Crash started in
flush_old_exec and ended in pgtable_free_tlb -> kmem_cache_free. I don't know
how to do better than that I'm afraid, unless I'm to hunt down a PCIE serial
card, perhaps.

2007-05-03 08:26:46

by William Lee Irwin III

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Thu, May 03, 2007 at 01:15:15AM -0700, Andrew Morton wrote:
> That all seems to work as intended.
> However with NR_CPUS=8 SPLIT_PTLOCK_CPUS=4, enabling SLUB=y crashes the
> machine early in boot.
> Too early for netconsole, no serial console. Wedges up uselessly with
> CONFIG_XMON=n, does mysterious repeated uncontrollable exceptions with
> CONFIG_XMON=y. This is all fairly typical for a powerpc/G5 crash :(
> However I was able to glimpse some stuff as it flew past. Crash started in
> flush_old_exec and ended in pgtable_free_tlb -> kmem_cache_free. I don't know
> how to do better than that I'm afraid, unless I'm to hunt down a PCIE serial
> card, perhaps.

I've seen this crash in flush_old_exec() before. ISTR it being due to
slub vs. pagetable alignment or something on that order.


-- wli

2007-05-03 08:47:31

by Hugh Dickins

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Thu, 3 May 2007, Andrew Morton wrote:
> On Wed, 2 May 2007 10:25:47 -0700 (PDT) Christoph Lameter <[email protected]> wrote:
>
> > +config ARCH_USES_SLAB_PAGE_STRUCT
> > + bool
> > + default y
> > + depends on SPLIT_PTLOCK_CPUS <= NR_CPUS
> > +
>
> That all seems to work as intended.
>
> However with NR_CPUS=8 SPLIT_PTLOCK_CPUS=4, enabling SLUB=y crashes the
> machine early in boot.

I thought that if that worked as intended, you wouldn't even
get the chance to choose SLUB=y? That was how it was working
for me (but I realize I didn't try more than make oldconfig).

>
> Too early for netconsole, no serial console. Wedges up uselessly with
> CONFIG_XMON=n, does mysterious repeated uncontrollable exceptions with
> CONFIG_XMON=y. This is all fairly typical for a powerpc/G5 crash :(
>
> However I was able to glimpse some stuff as it flew past. Crash started in
> flush_old_exec and ended in pgtable_free_tlb -> kmem_cache_free. I don't know
> how to do better than that I'm afraid, unless I'm to hunt down a PCIE serial
> card, perhaps.

That sounds like what happens when SLUB's pagestruct use meets
SPLIT_PTLOCK's pagestruct use. Does your .config really show
CONFIG_SLUB=y together with CONFIG_ARCH_USES_SLAB_PAGE_STRUCT=y?

Hugh

2007-05-03 08:57:48

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Thu, 3 May 2007 09:46:32 +0100 (BST) Hugh Dickins <[email protected]> wrote:

> On Thu, 3 May 2007, Andrew Morton wrote:
> > On Wed, 2 May 2007 10:25:47 -0700 (PDT) Christoph Lameter <[email protected]> wrote:
> >
> > > +config ARCH_USES_SLAB_PAGE_STRUCT
> > > + bool
> > > + default y
> > > + depends on SPLIT_PTLOCK_CPUS <= NR_CPUS
> > > +
> >
> > That all seems to work as intended.
> >
> > However with NR_CPUS=8 SPLIT_PTLOCK_CPUS=4, enabling SLUB=y crashes the
> > machine early in boot.
>
> I thought that if that worked as intended, you wouldn't even
> get the chance to choose SLUB=y? That was how it was working
> for me (but I realize I didn't try more than make oldconfig).

Right. This can be tested on x86 without a cross-compiler:

ARCH=powerpc make mrproper
ARCH=powerpc make fooconfig

> >
> > Too early for netconsole, no serial console. Wedges up uselessly with
> > CONFIG_XMON=n, does mysterious repeated uncontrollable exceptions with
> > CONFIG_XMON=y. This is all fairly typical for a powerpc/G5 crash :(
> >
> > However I was able to glimpse some stuff as it flew past. Crash started in
> > flush_old_exec and ended in pgtable_free_tlb -> kmem_cache_free. I don't know
> > how to do better than that I'm afraid, unless I'm to hunt down a PCIE serial
> > card, perhaps.
>
> That sounds like what happens when SLUB's pagestruct use meets
> SPLIT_PTLOCK's pagestruct use. Does your .config really show
> CONFIG_SLUB=y together with CONFIG_ARCH_USES_SLAB_PAGE_STRUCT=y?
>

Nope.

g5:/usr/src/25> grep SLUB .config
CONFIG_SLUB=y
g5:/usr/src/25> grep SLAB .config
# CONFIG_SLAB is not set
g5:/usr/src/25> grep CPUS .config
CONFIG_NR_CPUS=8
# CONFIG_CPUSETS is not set
# CONFIG_IRQ_ALL_CPUS is not set
CONFIG_SPLIT_PTLOCK_CPUS=4

It's in http://userweb.kernel.org/~akpm/config-g5.txt

2007-05-03 09:16:46

by Hugh Dickins

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Thu, 3 May 2007, Andrew Morton wrote:
> On Thu, 3 May 2007 09:46:32 +0100 (BST) Hugh Dickins <[email protected]> wrote:
> > On Thu, 3 May 2007, Andrew Morton wrote:
> > > On Wed, 2 May 2007 10:25:47 -0700 (PDT) Christoph Lameter <[email protected]> wrote:
> > >
> > > > +config ARCH_USES_SLAB_PAGE_STRUCT
> > > > + bool
> > > > + default y
> > > > + depends on SPLIT_PTLOCK_CPUS <= NR_CPUS
> > > > +
> > >
> > > That all seems to work as intended.
> > >
> > > However with NR_CPUS=8 SPLIT_PTLOCK_CPUS=4, enabling SLUB=y crashes the
> > > machine early in boot.
> >
> > I thought that if that worked as intended, you wouldn't even
> > get the chance to choose SLUB=y? That was how it was working
> > for me (but I realize I didn't try more than make oldconfig).
> >
> > That sounds like what happens when SLUB's pagestruct use meets
> > SPLIT_PTLOCK's pagestruct use. Does your .config really show
> > CONFIG_SLUB=y together with CONFIG_ARCH_USES_SLAB_PAGE_STRUCT=y?
>
> Nope.
>
> g5:/usr/src/25> grep SLUB .config
> CONFIG_SLUB=y
> g5:/usr/src/25> grep SLAB .config
> # CONFIG_SLAB is not set
> g5:/usr/src/25> grep CPUS .config
> CONFIG_NR_CPUS=8
> # CONFIG_CPUSETS is not set
> # CONFIG_IRQ_ALL_CPUS is not set
> CONFIG_SPLIT_PTLOCK_CPUS=4
>
> It's in http://userweb.kernel.org/~akpm/config-g5.txt

Seems we're all wrong in thinking Christoph's Kconfiggery worked
as intended: maybe it just works some of the time. I'm not going
to hazard a guess as to how to fix it up, will resume looking at
the powerpc's quicklist potential later.

Hugh

2007-05-03 16:31:01

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

On Thu, 3 May 2007, William Lee Irwin III wrote:

> I've seen this crash in flush_old_exec() before. ISTR it being due to
> slub vs. pagetable alignment or something on that order.

>From from other discussion regarding SLAB: It may be necessary for
powerpc to set the default alignment to 8 bytes on 32 bit powerpc
because it requires that alignemnt for 64 bit value. SLUB will *not*
disable debugging like SLAB if you do that.

2007-05-03 16:45:44

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub

Hmmmm...There are a gazillion configs to choose from. It works fine with
cell_defconfig. If I switch to 2 processors I can enable SLUB if I switch
to 4 I cannot.

I saw some other config weirdness like being unable to set SMP if SLOB is
enabled with some configs. This should not work and does not work but
the menus are then vanishing and one can still configure lots of
processors while not having enabled SMP.

It works as far as I can tell... The rest is arch weirdness.


2007-05-03 21:04:28

by Hugh Dickins

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub on PowerPC

On Thu, 3 May 2007, Hugh Dickins wrote:
>
> Seems we're all wrong in thinking Christoph's Kconfiggery worked
> as intended: maybe it just works some of the time. I'm not going
> to hazard a guess as to how to fix it up, will resume looking at
> the powerpc's quicklist potential later.

Here's the patch I've been testing on G5, with 4k and with 64k pages,
with SLAB and with SLUB. But, though it doesn't crash, the pgd
kmem_cache in the 4k-page SLUB case is revealing SLUB's propensity
for using highorder allocations where SLAB would stick to order 0:
under load, exec's mm_init gets page allocation failure on order 4
- SLUB's calculate_order may need some retuning. (I'd expect it to
be going for order 3 actually, I'm not sure how order 4 comes about.)

I don't know how offensive Ben and Paulus may find this patch:
the kmem_cache use was nicely done and this messes it up a little.


The SLUB allocator relies on struct page fields first_page and slab,
overwritten by ptl when SPLIT_PTLOCK: so the SLUB allocator cannot then
be used for the lowest level of pagetable pages. This was obstructing
SLUB on PowerPC, which uses kmem_caches for its pagetables. So convert
its pte level to use quicklist pages (whereas pmd, pud and 64k-page pgd
want partpages, so continue to use kmem_caches for pmd, pud and pgd).
But to keep up appearances for pgtable_free, we still need PTE_CACHE_NUM.

Signed-off-by: Hugh Dickins <[email protected]>
---

arch/powerpc/Kconfig | 4 ++++
arch/powerpc/mm/init_64.c | 17 ++++++-----------
include/asm-powerpc/pgalloc.h | 26 +++++++++++---------------
3 files changed, 21 insertions(+), 26 deletions(-)

--- 2.6.21-rc7-mm2/arch/powerpc/Kconfig 2007-04-26 13:33:51.000000000 +0100
+++ linux/arch/powerpc/Kconfig 2007-05-03 20:45:12.000000000 +0100
@@ -31,6 +31,10 @@ config MMU
bool
default y

+config QUICKLIST
+ bool
+ default y
+
config GENERIC_HARDIRQS
bool
default y
--- 2.6.21-rc7-mm2/arch/powerpc/mm/init_64.c 2007-04-26 13:33:51.000000000 +0100
+++ linux/arch/powerpc/mm/init_64.c 2007-05-03 20:45:12.000000000 +0100
@@ -146,21 +146,16 @@ static void zero_ctor(void *addr, struct
memset(addr, 0, kmem_cache_size(cache));
}

-#ifdef CONFIG_PPC_64K_PAGES
-static const unsigned int pgtable_cache_size[3] = {
- PTE_TABLE_SIZE, PMD_TABLE_SIZE, PGD_TABLE_SIZE
-};
-static const char *pgtable_cache_name[ARRAY_SIZE(pgtable_cache_size)] = {
- "pte_pmd_cache", "pmd_cache", "pgd_cache",
-};
-#else
static const unsigned int pgtable_cache_size[2] = {
- PTE_TABLE_SIZE, PMD_TABLE_SIZE
+ PGD_TABLE_SIZE, PMD_TABLE_SIZE
};
static const char *pgtable_cache_name[ARRAY_SIZE(pgtable_cache_size)] = {
- "pgd_pte_cache", "pud_pmd_cache",
-};
+#ifdef CONFIG_PPC_64K_PAGES
+ "pgd_cache", "pmd_cache",
+#else
+ "pgd_cache", "pud_pmd_cache",
#endif /* CONFIG_PPC_64K_PAGES */
+};

#ifdef CONFIG_HUGETLB_PAGE
/* Hugepages need one extra cache, initialized in hugetlbpage.c. We
--- 2.6.21-rc7-mm2/include/asm-powerpc/pgalloc.h 2007-02-04 18:44:54.000000000 +0000
+++ linux/include/asm-powerpc/pgalloc.h 2007-05-03 20:45:12.000000000 +0100
@@ -10,21 +10,15 @@
#include <linux/slab.h>
#include <linux/cpumask.h>
#include <linux/percpu.h>
+#include <linux/quicklist.h>

extern struct kmem_cache *pgtable_cache[];

-#ifdef CONFIG_PPC_64K_PAGES
-#define PTE_CACHE_NUM 0
-#define PMD_CACHE_NUM 1
-#define PGD_CACHE_NUM 2
-#define HUGEPTE_CACHE_NUM 3
-#else
-#define PTE_CACHE_NUM 0
-#define PMD_CACHE_NUM 1
-#define PUD_CACHE_NUM 1
#define PGD_CACHE_NUM 0
+#define PUD_CACHE_NUM 1
+#define PMD_CACHE_NUM 1
#define HUGEPTE_CACHE_NUM 2
-#endif
+#define PTE_CACHE_NUM 3 /* from quicklist rather than kmem_cache */

/*
* This program is free software; you can redistribute it and/or
@@ -97,8 +91,7 @@ static inline void pmd_free(pmd_t *pmd)
static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
unsigned long address)
{
- return kmem_cache_alloc(pgtable_cache[PTE_CACHE_NUM],
- GFP_KERNEL|__GFP_REPEAT);
+ return quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
}

static inline struct page *pte_alloc_one(struct mm_struct *mm,
@@ -109,7 +102,7 @@ static inline struct page *pte_alloc_one

static inline void pte_free_kernel(pte_t *pte)
{
- kmem_cache_free(pgtable_cache[PTE_CACHE_NUM], pte);
+ quicklist_free(0, NULL, pte);
}

static inline void pte_free(struct page *ptepage)
@@ -136,7 +129,10 @@ static inline void pgtable_free(pgtable_
void *p = (void *)(pgf.val & ~PGF_CACHENUM_MASK);
int cachenum = pgf.val & PGF_CACHENUM_MASK;

- kmem_cache_free(pgtable_cache[cachenum], p);
+ if (cachenum == PTE_CACHE_NUM)
+ quicklist_free(0, NULL, p);
+ else
+ kmem_cache_free(pgtable_cache[cachenum], p);
}

extern void pgtable_free_tlb(struct mmu_gather *tlb, pgtable_free_t pgf);
@@ -153,7 +149,7 @@ extern void pgtable_free_tlb(struct mmu_
PUD_CACHE_NUM, PUD_TABLE_SIZE-1))
#endif /* CONFIG_PPC_64K_PAGES */

-#define check_pgt_cache() do { } while (0)
+#define check_pgt_cache() quicklist_trim(0, NULL, 25, 16)

#endif /* CONFIG_PPC64 */
#endif /* __KERNEL__ */

2007-05-03 21:15:17

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub on PowerPC

On Thu, 3 May 2007, Hugh Dickins wrote:

> On Thu, 3 May 2007, Hugh Dickins wrote:
> >
> > Seems we're all wrong in thinking Christoph's Kconfiggery worked
> > as intended: maybe it just works some of the time. I'm not going
> > to hazard a guess as to how to fix it up, will resume looking at
> > the powerpc's quicklist potential later.
>
> Here's the patch I've been testing on G5, with 4k and with 64k pages,
> with SLAB and with SLUB. But, though it doesn't crash, the pgd
> kmem_cache in the 4k-page SLUB case is revealing SLUB's propensity
> for using highorder allocations where SLAB would stick to order 0:
> under load, exec's mm_init gets page allocation failure on order 4
> - SLUB's calculate_order may need some retuning. (I'd expect it to
> be going for order 3 actually, I'm not sure how order 4 comes about.)

There are SLUB patches pending (not in rc7-mm2 as far as I can recall)
that reduce the default page order sizes to head off this issue. The
defaults were initially too large (and they still default to large
for testing if Mel's Antifrag work is detected to be active).

> - return kmem_cache_alloc(pgtable_cache[PTE_CACHE_NUM],
> - GFP_KERNEL|__GFP_REPEAT);
> + return quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);

__GFP_REPEAT is unusual here but this was carried over from the
kmem_cache_alloc it seems. Hmm... There is some variance on how we do this
between arches. Should we uniformly set or not set this flag?

clameter@schroedinger:~/software/linux-2.6.21-rc7-mm2$ grep quicklist_alloc include/asm-ia64/*
include/asm-ia64/pgalloc.h: return quicklist_alloc(0, GFP_KERNEL, NULL);
include/asm-ia64/pgalloc.h: return quicklist_alloc(0, GFP_KERNEL, NULL);
include/asm-ia64/pgalloc.h: return quicklist_alloc(0, GFP_KERNEL, NULL);
include/asm-ia64/pgalloc.h: void *pg = quicklist_alloc(0, GFP_KERNEL, NULL);
include/asm-ia64/pgalloc.h: return quicklist_alloc(0, GFP_KERNEL, NULL);
clameter@schroedinger:~/software/linux-2.6.21-rc7-mm2$ grep quicklist_alloc arch/i386/mm/*
arch/i386/mm/pgtable.c: pgd_t *pgd = quicklist_alloc(0, GFP_KERNEL, pgd_ctor);
clameter@schroedinger:~/software/linux-2.6.21-rc7-mm2$ grep quicklist_alloc include/asm-sparc64/*
include/asm-sparc64/pgalloc.h: return quicklist_alloc(0, GFP_KERNEL, NULL);
include/asm-sparc64/pgalloc.h: return quicklist_alloc(0, GFP_KERNEL, NULL);
include/asm-sparc64/pgalloc.h: return quicklist_alloc(0, GFP_KERNEL, NULL);
include/asm-sparc64/pgalloc.h: void *pg = quicklist_alloc(0, GFP_KERNEL, NULL);
clameter@schroedinger:~/software/linux-2.6.21-rc7-mm2$ grep quicklist_alloc include/asm-x86_64/*
include/asm-x86_64/pgalloc.h: return (pmd_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
include/asm-x86_64/pgalloc.h: return (pud_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
include/asm-x86_64/pgalloc.h: pgd_t *pgd = (pgd_t *)quicklist_alloc(QUICK_PGD,
include/asm-x86_64/pgalloc.h: return (pte_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);
include/asm-x86_64/pgalloc.h: void *p = (void *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL);

2007-05-03 22:41:34

by Hugh Dickins

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub on PowerPC

On Thu, 3 May 2007, Christoph Lameter wrote:
>
> There are SLUB patches pending (not in rc7-mm2 as far as I can recall)
> that reduce the default page order sizes to head off this issue. The
> defaults were initially too large (and they still default to large
> for testing if Mel's Antifrag work is detected to be active).

Sounds good.

> > - return kmem_cache_alloc(pgtable_cache[PTE_CACHE_NUM],
> > - GFP_KERNEL|__GFP_REPEAT);
> > + return quicklist_alloc(0, GFP_KERNEL|__GFP_REPEAT, NULL);
>
> __GFP_REPEAT is unusual here but this was carried over from the
> kmem_cache_alloc it seems. Hmm... There is some variance on how we do this
> between arches. Should we uniformly set or not set this flag?

Not something to get into in this patch, but it did surprise me too.
I believe __GFP_REPEAT should be avoided, and I don't see justification
for it here (but need to remember not to do a blind virt_to_page on the
result in some places if it might return NULL - which IIRC it actually
can do even if __GFP_REPEAT, when chosen for OOM kill). But I've a
suspicion it got put in there for some good reason I don't know about.

Hugh

2007-05-04 00:26:05

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub on PowerPC

On Thu, 2007-05-03 at 22:04 +0100, Hugh Dickins wrote:
> On Thu, 3 May 2007, Hugh Dickins wrote:
> >
> > Seems we're all wrong in thinking Christoph's Kconfiggery worked
> > as intended: maybe it just works some of the time. I'm not going
> > to hazard a guess as to how to fix it up, will resume looking at
> > the powerpc's quicklist potential later.
>
> Here's the patch I've been testing on G5, with 4k and with 64k pages,
> with SLAB and with SLUB. But, though it doesn't crash, the pgd
> kmem_cache in the 4k-page SLUB case is revealing SLUB's propensity
> for using highorder allocations where SLAB would stick to order 0:
> under load, exec's mm_init gets page allocation failure on order 4
> - SLUB's calculate_order may need some retuning. (I'd expect it to
> be going for order 3 actually, I'm not sure how order 4 comes about.)
>
> I don't know how offensive Ben and Paulus may find this patch:
> the kmem_cache use was nicely done and this messes it up a little.
>
>
> The SLUB allocator relies on struct page fields first_page and slab,
> overwritten by ptl when SPLIT_PTLOCK: so the SLUB allocator cannot then
> be used for the lowest level of pagetable pages. This was obstructing
> SLUB on PowerPC, which uses kmem_caches for its pagetables. So convert
> its pte level to use quicklist pages (whereas pmd, pud and 64k-page pgd
> want partpages, so continue to use kmem_caches for pmd, pud and pgd).
> But to keep up appearances for pgtable_free, we still need PTE_CACHE_NUM.

Interesting... I'll have a look asap.

Ben.


2007-05-04 00:54:28

by Christoph Lameter

[permalink] [raw]
Subject: Re: 2.6.22 -mm merge plans: slub on PowerPC

On Fri, 4 May 2007, Benjamin Herrenschmidt wrote:

> > The SLUB allocator relies on struct page fields first_page and slab,
> > overwritten by ptl when SPLIT_PTLOCK: so the SLUB allocator cannot then
> > be used for the lowest level of pagetable pages. This was obstructing
> > SLUB on PowerPC, which uses kmem_caches for its pagetables. So convert
> > its pte level to use quicklist pages (whereas pmd, pud and 64k-page pgd
> > want partpages, so continue to use kmem_caches for pmd, pud and pgd).
> > But to keep up appearances for pgtable_free, we still need PTE_CACHE_NUM.
>
> Interesting... I'll have a look asap.

I would also recommend looking at removing the constructors for the
remaining slabs. A constructor requires that SLUB never touch the object
(same situation as is resulting from enabling debugging). So it must
increase the object size in order to put the free pointer after the
object. In case of a order of 2 cache this has a particularly bad effect
of doubling object size. If the objects can be overwritten on free (no
constructor) then we can use the first word of the object as a freepointer
on kfree. Meaning we can use a hot cacheline so no cache miss. On
alloc we have already touched the first cacheline which also avoids a
cacheline fetch there. This is the optimal way of operation for SLUB.

Hmmm.... We could add an option to allow the use of a constructor while
keeping the free pointer at the beginning of the object? Then we would
have to zap the first word on alloc. Would work like quicklists.

Add SLAB_FREEPOINTER_MAY_OVERLAP?