1) Tbench shows about a 30% regression on kernel 2.6.23-rc4 compared with 2.6.22,
and about a 10% regression on 2.6.23-rc1. I investigated 2.6.22 and 2.6.23-rc4.
2) Testing environment: x86_64, quad-core, 2 physical processors (8 cores in
total), 8GB memory. The kernel is built with CONFIG_SLUB=y and CONFIG_SLUB_DEBUG=y.
3) In my environment, I started CPU_NUMBER*2 tbench client processes and the
same number of server processes, so 16 tbench and 16 tbench_srv processes run
with a 1:1 mapping. Each tbench client communicates with its tbench_srv
interactively over a TCP socket.
4) Oprofile data shows __slab_alloc accounts for about 15% of samples in
2.6.23-rc4 versus about 3.8% in 2.6.22.
5) Slabinfo shows kmalloc-4096 and skbuff_head_cache are the active caches;
other slabs are mostly quiet.
6) I collected data about slab_alloc. The data consists of:
   a) the number of calls to slab_alloc;
   b) the number of objects obtained from the per-cpu slab cache;
   c) the number of objects obtained from a new slab or a partial slab;
   d) the number of objects freed outside the per-cpu cache.
The data shows that skbuff_head_cache allocations mostly succeed in the
per-cpu cache, so they do not cause many __slab_alloc calls. kmalloc-4096 is
the slab that causes most of the __slab_alloc calls (a sketch of this kind of
instrumentation follows this item).
On 2.6.22, about 58% of kmalloc-4096 allocations succeed in the per-cpu slab cache.
On 2.6.23-rc4, only about 12.5% of kmalloc-4096 allocations succeed in the per-cpu slab cache.
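For illustration, the kind of per-cpu counters behind a)-d) could look like
the sketch below. The structure, counter names, and hook points are
assumptions made for this sketch, not the actual instrumentation patch used
for these measurements.

#include <linux/percpu.h>

/* Illustrative only: per-cpu counters matching items a)-d) above. */
struct slab_alloc_stats {
	unsigned long total_allocs;	/* a) calls into slab_alloc()             */
	unsigned long cpu_cache_hits;	/* b) objects taken from the per-cpu slab */
	unsigned long slow_allocs;	/* c) objects from a new or partial slab  */
	unsigned long remote_frees;	/* d) frees that miss the per-cpu cache   */
};

static DEFINE_PER_CPU(struct slab_alloc_stats, slab_alloc_stats);

/* Called from the fast path when the per-cpu slab satisfies the request. */
static inline void count_cpu_cache_hit(void)
{
	struct slab_alloc_stats *s = &__get_cpu_var(slab_alloc_stats);

	s->total_allocs++;
	s->cpu_cache_hits++;
}

/* Called from __slab_alloc() when a new or partial slab has to be used. */
static inline void count_slow_alloc(void)
{
	struct slab_alloc_stats *s = &__get_cpu_var(slab_alloc_stats);

	s->total_allocs++;
	s->slow_allocs++;
}

/* Called from the free path when the object does not return to the per-cpu slab. */
static inline void count_remote_free(void)
{
	__get_cpu_var(slab_alloc_stats).remote_frees++;
}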
7) By instrumenting the kernel, I found that these kmalloc-4096 objects are
always allocated at tcp_sendmsg=>sk_stream_alloc_pskb and freed at
tcp_ack=>tcp_clean_rtx_queue=>sk_stream_free_skb. When a tbench client
communicates with its tbench_srv, the sender allocates a kmalloc-4096 object
and the receiver frees it.
8) kmalloc-4096 uses order-1 slabs (two 4 KB pages), so one slab holds only
2 objects and a partial slab always has exactly one free object. On the other
hand, SLUB caches only one slab per cpu. If a tbench client process takes a
kmalloc-4096 object from a partial slab on a cpu and makes that slab the
per-cpu cache, the slab is left with no free objects. So when another tbench
process later asks for a kmalloc-4096 object on the same cpu, it cannot get a
free object from the per-cpu cache; it gets an object, usually from another
partial slab, which then replaces the per-cpu slab cache even though the new
slab also has no free objects left.
9) I collected more per-cpu data to check whether an object is freed on the
same cpu on which it was allocated. On both 2.6.22 and 2.6.23-rc3, an object
is almost always allocated and freed on the same cpu. That means the tbench
client and tbench_srv processes that communicate with each other mostly run
on the same cpu.
10) I ran both kernels with the boot parameter maxcpus=1 and found the
regression drops to about 10%.
11) On my machine, on average, each cpu runs 2 tbench client processes and
2 tbench_srv processes, so there are a couple of scheduling scenarios:
   a) Client 1 allocates a 4096-byte object and installs the new, now-full
slab as the per-cpu slab. tbench_srv 1 consumes the data and frees the
4096-byte object on the same cpu, so the per-cpu slab now has a free object.
Then tbench_srv 1 replies to client 1 by allocating a new 4096-byte object,
or client 2 allocates a 4096-byte object from the per-cpu slab cache to talk
to tbench_srv 2. This scenario is ideal.
   b) Client 1 allocates a 4096-byte object, installs the new, now-full slab
as the per-cpu slab, and then sleeps waiting for tbench_srv 1 to reply. But
client 2 allocates a 4096-byte object, finds the per-cpu slab has no free
object, so it takes an object from a partial slab and installs that new,
now-full slab as the per-cpu slab. When tbench_srv 1 is scheduled in, it
frees its kmalloc-4096 object back to a partial slab, because the slab it
came from is no longer the per-cpu cache. tbench_srv 1 then tries to allocate
a new kmalloc-4096 object to reply to client 1, but because the per-cpu slab
has no free object, it also has to take a free object from a partial slab and
replace the per-cpu slab cache. This scenario is very bad.
Under both scenarios, I think the scheduler wakes up the sleeping processes
on the same cpu. In scenario a), the woken process is scheduled to run
quickly (immediately?). In scenario b), the woken process is scheduled later.
I think kernel 2.6.22 creates scenario a) and 2.6.23-rc4 creates scenario b).
12) How can the issue be resolved? There are two directions:
   a) Change the process scheduler to schedule woken processes first.
   b) Change the SLUB per-cpu slab cache to hold more than one slab.
page->lru could be used to link the slab pages into a list anchored in
kmem_cache->cpu_slab[], whose members would need to become list_heads. The
number of slabs allowed in a per-cpu cache could be a sysfs parameter under
/sys/slab/XXX/, with a default of 1 to suit big machines. A minimal sketch
of this idea follows.
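To make direction b) concrete, here is a minimal sketch. The structure layout
and helper below are assumptions for illustration only; the real
kmem_cache_cpu layout, locking, and eviction policy would have to be worked out.

#include <linux/list.h>
#include <linux/mm.h>

/*
 * Sketch only: let each cpu cache a short list of slabs instead of a single
 * one, linking the slab pages through page->lru. Names are illustrative,
 * not the existing mm/slub.c structures.
 */
struct kmem_cache_cpu_sketch {
	struct list_head slabs;		/* per-cpu slabs, linked via page->lru       */
	unsigned int nr_slabs;		/* slabs currently cached on this cpu        */
	unsigned int max_slabs;		/* tunable via /sys/slab/<cache>/, default 1 */
};

/* Cache a slab on this cpu; push the oldest one back when over the limit. */
static void cpu_cache_add_slab(struct kmem_cache_cpu_sketch *c,
			       struct page *slab_page)
{
	list_add(&slab_page->lru, &c->slabs);
	if (++c->nr_slabs > c->max_slabs) {
		struct page *victim = list_entry(c->slabs.prev,
						 struct page, lru);

		list_del(&victim->lru);
		c->nr_slabs--;
		/* the victim slab would go back to the node's partial list here */
	}
}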
--yanmin
On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> 8) kmalloc-4096 uses order-1 slabs (two 4 KB pages), so one slab holds only 2 objects.
You can change that by booting with slub_max_order=0. Then we can also use
the per-cpu queues to get these order-0 objects, which may speed up the
allocations because we do not have to take zone locks on slab allocation.
Note also that Andrew's tree has a page allocator pass-through for SLUB
for 4k kmallocs, bypassing slab completely. That may also address the
issue.
If you want SLUB to handle more objects in the 4k kmalloc cache
without going to the page allocator, then you can boot, for example, with
slub_max_order=3 slub_min_objects=8
which will result in a kmalloc-4096 that caches 8 objects.
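(For reference: with 4 KB pages, slub_max_order=3 allows an order-3 slab of
2^3 = 8 pages = 32 KB, and 32 KB / 4 KB per object gives the 8 cached objects
mentioned above.)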
> b) Change the SLUB per-cpu slab cache to hold more than one slab. page->lru
> could be used to link the slab pages into a list anchored in
> kmem_cache->cpu_slab[], whose members would need to become list_heads. The
> number of slabs allowed in a per-cpu cache could be a sysfs parameter under
> /sys/slab/XXX/, with a default of 1 to suit big machines.
Try the ways to address the issue that I mentioned above.
On Tue, 2007-09-04 at 20:59 -0700, Christoph Lameter wrote:
> On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
>
> > 8) kmalloc-4096 uses order-1 slabs (two 4 KB pages), so one slab holds only 2 objects.
>
> You can change that by booting with slub_max_order=0. Then we can also use
> the per cpu queues to get these order 0 objects which may speed up the
> allocations because we do not have to take zone locks on slab allocation.
>
> Note also that Andrew's tree has a page allocator pass through for SLUB
> for 4k kmallocs bypassing slab completely. That may also address the
> issue.
>
> If you want SLUB to handle more objects in the 4k kmalloc cache
> without going to the page allocator then you can boot f.e. with
>
> slub_max_order=3 slub_min_objects=8
I tried this approach. The testing results show 2.6.23-rc4 is about
2.5% better than 2.6.22. It really resolves the issue.
However, this approach applies the same policy to all slabs. Could we
implement a per-slab approach like direction b)?
>
> which will result in a kmalloc-4096 that caches 8 objects.
>
> > b) Change the SLUB per-cpu slab cache to hold more than one slab. page->lru
> > could be used to link the slab pages into a list anchored in
> > kmem_cache->cpu_slab[], whose members would need to become list_heads. The
> > number of slabs allowed in a per-cpu cache could be a sysfs parameter under
> > /sys/slab/XXX/, with a default of 1 to suit big machines.
Direction b) above looks more flexible.
In addition, could the process scheduler be enhanced to schedule woken
processes first, or otherwise favor woken processes? From a cache-hotness
point of view, that might help performance, because the woken process and
the waker usually share some data.
> Try the ways to address the issue that I mentioned above.
I really appreciate your kind comments!
-yanmin
On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> On Tue, 2007-09-04 at 20:59 -0700, Christoph Lameter wrote:
> > On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> >
> > > 8) kmalloc-4096 uses order-1 slabs (two 4 KB pages), so one slab holds only 2 objects.
> >
> > You can change that by booting with slub_max_order=0. Then we can also use
> > the per cpu queues to get these order 0 objects which may speed up the
> > allocations because we do not have to take zone locks on slab allocation.
> >
> > Note also that Andrew's tree has a page allocator pass through for SLUB
> > for 4k kmallocs bypassing slab completely. That may also address the
> > issue.
> >
> > If you want SLUB to handle more objects in the 4k kmalloc cache
> > without going to the page allocator then you can boot f.e. with
> >
> > slub_max_order=3 slub_min_objects=8
> I tried this approach. The testing results show 2.6.23-rc4 is about
> 2.5% better than 2.6.22. It really resolves the issue.
>
> However, this approach applies the same policy to all slabs. Could we
> implement a per-slab approach like direction b)?
I am not sure what you mean by same policy. Same configuration for all
slabs?
> > Try the ways to address the issue that I mentioned above.
> I really appreciate your kind comments!
Would it be possible to try the two other approaches that I suggested? I
think both of those may also solve the issue. Try booting with
slub_max_order=0 and see what effect it has. The queues of the page
allocator can be much larger than what slab has for 4k pages. There is
really not much point in using a slab allocator for page-sized
allocations.
On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> > slub_max_order=3 slub_min_objects=8
> I tried this approach. The testing results show 2.6.23-rc4 is about
> 2.5% better than 2.6.22. It really resolves the issue.
Note also that the configuration you tried is the way SLUB is configured
in Andrew's tree.
On Tue, 2007-09-04 at 23:58 -0700, Christoph Lameter wrote:
> On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
>
> > On Tue, 2007-09-04 at 20:59 -0700, Christoph Lameter wrote:
> > > On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> > >
> > > > 8) kmalloc-4096 uses order-1 slabs (two 4 KB pages), so one slab holds only 2 objects.
> > >
> > > You can change that by booting with slub_max_order=0. Then we can also use
> > > the per cpu queues to get these order 0 objects which may speed up the
> > > allocations because we do not have to take zone locks on slab allocation.
> > >
> > > Note also that Andrew's tree has a page allocator pass through for SLUB
> > > for 4k kmallocs bypassing slab completely. That may also address the
> > > issue.
> > >
> > > If you want SLUB to handle more objects in the 4k kmalloc cache
> > > without going to the page allocator then you can boot f.e. with
> > >
> > > slub_max_order=3 slub_min_objects=8
> > I tried this approach. The testing results show 2.6.23-rc4 is about
> > 2.5% better than 2.6.22. It really resolves the issue.
> >
> > However, this approach applies the same policy to all slabs. Could we
> > implement a per-slab approach like direction b)?
>
> I am not sure what you mean by same policy. Same configuration for all
> slabs?
Yes.
>
> > > Try the ways to address the issue that I mentioned above.
> > I really appreciate your kind comments!
>
> Would it be possible to try the two other approaches that I suggested? I
> think both of those may also solve the issue. Try booting with
> > slub_max_order=0
1) I tried slub_max_order=0 and the regression becomes 12.5%. It's still not good.
2) I applied the patch slub-direct-pass-through-of-page-size-or-higher-kmalloc.patch
to kernel 2.6.23-rc4. The new testing result is much better, only 1% below
2.6.22.
So the best solution is booting the kernel with "slub_max_order=3 slub_min_objects=8".
> and see what effect it has. The queues of the page
> allocator can be much larger than what slab has for 4k pages. There is
> really not much of a point in using a slab allocator for page sized
> allocations.
On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> > > However, this approach applies the same policy to all slabs. Could we
> > > implement a per-slab approach like direction b)?
> >
> > I am not sure what you mean by same policy. Same configuration for all
> > slabs?
> Yes.
Ok. I could add the ability to specify parameters for some slabs.
> > Would it be possible to try the two other approaches that I suggested? I
> > think both of those may also solve the issue. Try booting with
> > slub_max_order=0
> 1) I tried slub_max_order=0 and the regression becomes 12.5%. It's still
> not good.
>
> 2) I applied the patch
> slub-direct-pass-through-of-page-size-or-higher-kmalloc.patch to kernel
> 2.6.23-rc4. The new testing result is much better, only 1% below
> 2.6.22.
Ok. That seems to indicate that we should improve the alloc path in the
page allocator. The page allocator's performance needs to be competitive
for page-sized allocations. The problem will largely go away when we
merge the pass-through patch in 2.6.24.
On Wed, 2007-09-05 at 03:45 -0700, Christoph Lameter wrote:
> On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
>
> > > > However, this approach applies the same policy to all slabs. Could we
> > > > implement a per-slab approach like direction b)?
> > >
> > > I am not sure what you mean by same policy. Same configuration for all
> > > slabs?
> > Yes.
>
> Ok. I could add the ability to specify parameters for some slabs.
Thanks. That will be more flexible.
>
> > > Would it be possible to try the two other approaches that I suggested? I
> > > think both of those may also solve the issue. Try booting with
> > > slub_max_order=0
> > 1) I tried slub_max_order=0 and the regression becomes 12.5%. It's still
> > not good.
> >
> > 2) I applied the patch
> > slub-direct-pass-through-of-page-size-or-higher-kmalloc.patch to kernel
> > 2.6.23-rc4. The new testing result is much better, only 1% below
> > 2.6.22.
I retested 2.6.22 booted with "slub_max_order=3 slub_min_objects=8".
The result is about 8.7% better than without the boot parameters.
So with both kernels booted with "slub_max_order=3 slub_min_objects=8", 2.6.22 is
about 5.8% better than 2.6.23-rc4. I suspect the process scheduler is responsible
for this remaining 5.8% regression.
>
> Ok. That seems to indicate that we should improve the alloc path in the
> page allocator. The page allocator performance needs to be competitive on
> page sized allocations. The problem will be largely going away when we
> merge the pass through patch in 2.6.24.
On Wednesday 05 September 2007 17:07, Christoph Lameter wrote:
> On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> > > slub_max_order=3 slub_min_objects=8
> >
> > I tried this approach. The testing results show 2.6.23-rc4 is about
> > 2.5% better than 2.6.22. It really resolves the issue.
>
> Note also that the configuration you tried is the way SLUB is configured
> in Andrew's tree.
It still doesn't sound like it is competitive with SLAB at the same sizes.
What's the problem?
On Sat, 2007-09-08 at 18:08 +1000, Nick Piggin wrote:
> On Wednesday 05 September 2007 17:07, Christoph Lameter wrote:
> > On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> > > > slub_max_order=3 slub_min_objects=8
> > >
> > > I tried this approach. The testing results show 2.6.23-rc4 is about
> > > 2.5% better than 2.6.22. It really resolves the issue.
> >
> > Note also that the configuration you tried is the way SLUB is configured
> > in Andrew's tree.
>
> It still doesn't sound like it is competitive with SLAB at the same sizes.
> What's the problem?
The process scheduler and the small SLUB per-cpu cache work together to create
the tbench regression.
Please see the start of the thread.
-yanmin
On Monday 10 September 2007 10:56, Zhang, Yanmin wrote:
> On Sat, 2007-09-08 at 18:08 +1000, Nick Piggin wrote:
> > On Wednesday 05 September 2007 17:07, Christoph Lameter wrote:
> > > On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> > > > > slub_max_order=3 slub_min_objects=8
> > > >
> > > > I tried this approach. The testing results show 2.6.23-rc4 is about
> > > > 2.5% better than 2.6.22. It really resolves the issue.
> > >
> > > Note also that the configuration you tried is the way SLUB is
> > > configured in Andrew's tree.
> >
> > It still doesn't sound like it is competitive with SLAB at the same
> > sizes. What's the problem?
>
> The process scheduler and the small SLUB per-cpu cache work together to create
> the tbench regression.
OK, so after isolating the scheduler, then SLUB should be as fast as SLAB
at the same allocation size. That's basically what we need to do before we
can replace SLAB with it, I think?
On Mon, 10 Sep 2007, Nick Piggin wrote:
> OK, so after isolating the scheduler, then SLUB should be as fast as SLAB
> at the same allocation size. That's basically what we need to do before we
> can replace SLAB with it, I think?
The regression is due to the limited number of objects in the per-cpu
"queue" in SLUB for 4k objects. With the .23 code this is one or two
(order-1 slab). So we have to call into the page allocator frequently, and
do it for order-1 pages, which requires the zone locks. Urgh.
I think the regression is best addressed by the page allocator pass-through
patch in mm, which makes the page allocator handle these objects.
They are single pages, so the pcp lists are used, and those provide much
larger queues than SLUB/SLAB.
IMHO >=4k objects should be handled by the page allocator. From the
numbers I have seen there is then still a 1% regression left. If
that is still the case after we have fixed the scheduler then maybe
we need to slim down the page allocator fast path.
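To make the pass-through idea concrete, a rough sketch follows. This is an
illustration under assumed names, not the actual patch in -mm: kmalloc()
requests of a page or more skip the slab layer and go straight to the page
allocator, whose per-cpu page lists then act as the queue.

#include <linux/mm.h>
#include <linux/slab.h>

/* Rough sketch of the page allocator pass-through idea; illustration only. */
static inline void *kmalloc_passthrough_sketch(size_t size, gfp_t flags)
{
	if (size >= PAGE_SIZE) {
		/*
		 * Page-sized and larger requests go straight to the page
		 * allocator; order-0 requests are served from the per-cpu
		 * (pcp) page lists, avoiding the zone lock. kfree() would
		 * need a matching check to hand such allocations back via
		 * free_pages().
		 */
		return (void *)__get_free_pages(flags, get_order(size));
	}

	/* Everything smaller stays in the slab allocator. */
	return kmalloc(size, flags);
}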
On Tuesday 11 September 2007 05:07, Christoph Lameter wrote:
> On Mon, 10 Sep 2007, Nick Piggin wrote:
> > OK, so after isolating the scheduler, then SLUB should be as fast as SLAB
> > at the same allocation size. That's basically what we need to do before
> > we can replace SLAB with it, I think?
>
> The regression is due to the limited number of objects in the per cpu
> "queue" in SLUB for 4k objects. With the .23 code this is one or two
> (order 1 slab). So we have to call into the page allocator frequently and
> do it for order 1 pages which requires the zone locks. Urgh.
The impression I got at vm meeting was that SLUB was good to go :(
> I think the regression is best addressed by the page allocator pass
> through patch in mm which makes the page allocator handle these objects.
> They are single pages so the pcp lists are in use which provide much
> larger queues than SLUB/SLAB.
>
> IMHO >=4k objects should be handled by the page allocator. From the
> numbers I have seen there is then still a 1% regression left. If
> that is still the case after we have fixed the scheduler then maybe
> we need to slim down the page allocator fast path.
It is trivial to test SLUB vs SLAB independently of the scheduler change.
And actually, a scheduler regression here might just never be fixed,
because it is likely to be a higher level thing where the scheduling just
happens not to interact with tbench so well (and either it would be
impossible to find out why, or no point tuning the scheduler for such a
case).
But slab allocations don't really control the macro behaviour of a
benchmark like that so much. So don't wait until something happens
with the scheduler, fix it now.
Yay, looks like we'll get yet more logic in the VM to polish the proverbial
turd that is higher order allocations :P
On Tue, 11 Sep 2007, Nick Piggin wrote:
> The impression I got at vm meeting was that SLUB was good to go :(
It's not? I have had Intel test this thoroughly and they assured me that it
is up to SLAB. This particular case is a synthetic test for a PAGE_SIZE
alloc, and SLUB was not optimized for that case because PAGE_SIZE
allocations should be handled by the page allocator. Quicklists were
introduced for the explicit purpose of getting these messy page-sized cases
out of the slab allocators.
> But slab allocations don't really control the macro behaviour of a
> benchmark like that so much. So don't wait until something happens
> with the scheduler, fix it now.
Ok, so you are for pushing the page allocator pass-through patch from mm
into rc6? Isn't it a bit late for such a change? I would think that 2.6.24
is early enough.
On Wednesday 12 September 2007 06:19, Christoph Lameter wrote:
> On Tue, 11 Sep 2007, Nick Piggin wrote:
> > The impression I got at vm meeting was that SLUB was good to go :(
>
> It's not? I have had Intel test this thoroughly and they assured me that it
> is up to SLAB. This particular case is a synthetic test for a PAGE_SIZE
> alloc, and SLUB was not optimized for that case because PAGE_SIZE
> allocations should be handled by the page allocator. Quicklists were
> introduced for the explicit purpose of getting these messy page-sized cases
> out of the slab allocators.
I heard from one person at KS and one person here that it is not. If they're
simply missing some patch that's in -mm, and there is no longer a SLUB vs
SLAB regression when using equivalent page allocation order, then that's
fine.
On Tue, Sep 11, 2007 at 01:19:30PM -0700, Christoph Lameter wrote:
> On Tue, 11 Sep 2007, Nick Piggin wrote:
>
> > The impression I got at vm meeting was that SLUB was good to go :(
>
> It's not? I have had Intel test this thoroughly and they assured me that it
> is up to SLAB.
Christoph, I'm not sure if you are referring to me here. But our
tests (at least with the database workloads) from approximately 1.5 months back
showed that on ia64 slub was on par with slab, and on x86_64 slub was 9% down.
After changing the slub min order and max order, slub performance on x86_64 is
down approximately 3.5% or so compared to slab.
While I don't rule out large allocations like PAGE_SIZE, I am mostly
certain that the critical allocations in this workload are not PAGE_SIZE
based. They are mostly in the range of 300-500 bytes or less.
Are there any changes in recent slub that take the pressure off the page
allocator, especially for architectures with smaller page sizes? If so, we can
redo some of the experiments. Looking at this thread, it doesn't sound like it?
thanks,
suresh
On Wed, 12 Sep 2007, Siddha, Suresh B wrote:
> Christoph, Not sure if you are referring to me or not here. But our
> tests (at least with the database workloads) approx 1.5 months or so back
> showed that on ia64 slub was on par with slab and on x86_64, slub was 9% down.
> And after changing the slub min order and max order, slub perf on x86_64 is
> down approx 3.5% or so compared to slab.
No, I was referring to another talk that I had at the OLS with Corey
Gough. I keep getting confusing information from Intel. Last I heard was
that IA64 had a regression and x86_64 was fine (but they were not allowed
to tell me details). Would you please straighten out your story and give
me details?
AFAIK the two of us discussed some issues related to object handover
between processors that cause cache line bouncing and I sent you a
patchset for testing but I did not get any feedback. The patches that were
discussed are now in mm.
> While I don't rule out large sized allocations like PAGE_SIZE, I am mostly
> certain that the critical allocations in this workload are not PAGE_SIZE
> based. Mostly they are in the range less than 300-500 bytes or so.
>
> Any changes in the recent slub which takes the pressure away from the page
> allocator especially for smaller page sized architectures? If so, we can
> redo some of the experiments. Looking at this thread, it doesn't sound like?
It's too late for 2.6.23. But we can certainly do things for .24. Could you
please test the patches queued up in Andrew's tree? In particular the page
allocator pass-through and the per-cpu structure optimizations?
There is more work out of tree to optimize the fastpath, mostly
driven by Mathieu Desnoyers. I hope to get that into mm in the next few weeks,
but I do not think that it is going to be available before .25.
The work of Mathieu also has implications for the page allocator. We may
be able to significantly speed up the fastpath there as well.
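For context, the general idea behind that fastpath work, as a hedged sketch
(it assumes a cpu-local cmpxchg_local() primitive is provided by the
architecture; this is an illustration of the idea, not Mathieu's actual
patches): pop objects off the per-cpu freelist without disabling interrupts.

/*
 * Sketch only: lockless per-cpu freelist pop using a cpu-local
 * compare-and-exchange. cmpxchg_local() is assumed to be available.
 */
static inline void *freelist_pop_sketch(void **freelist)
{
	void *object, *next;

	do {
		object = *freelist;
		if (!object)
			return NULL;		/* empty: fall back to the slow path */
		next = *(void **)object;	/* a free object stores the next pointer */
	} while (cmpxchg_local(freelist, object, next) != object);

	return object;
}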
Christoph,
On Thu, Sep 13, 2007 at 11:03:53AM -0700, Christoph Lameter wrote:
> On Wed, 12 Sep 2007, Siddha, Suresh B wrote:
>
> > Christoph, Not sure if you are referring to me or not here. But our
> > tests (at least with the database workloads) approx 1.5 months or so back
> > showed that on ia64 slub was on par with slab and on x86_64, slub was 9% down.
> > And after changing the slub min order and max order, slub perf on x86_64 is
> > down approx 3.5% or so compared to slab.
>
> No, I was referring to another talk that I had at the OLS with Corey
> Gough. I keep getting confusing information from Intel. Last I heard was
Please don't go by informal talks and discussions. Please demand the numbers
and make decisions and conclusions based on those numbers. AFAIK, we haven't
posted confusing numbers so far.
> that IA64 had a regression and x86_64 was fine (but they were not allowed
> to tell me details). Would you please straighten out your story and give
> me details?
The numbers I posted in the previous e-mail are the only story we have so far.
> AFAIK the two of us discussed some issues related to object handover
> between processors that cause cache line bouncing and I sent you a
> patchset for testing but I did not get any feedback. The patches that were
Sorry, these systems are huge and availability is limited. We are raising the
priority with the performance team to test the latest slub patches.
> discussed are now in mm.
>
> > While I don't rule out large sized allocations like PAGE_SIZE, I am mostly
> > certain that the critical allocations in this workload are not PAGE_SIZE
> > based. Mostly they are in the range less than 300-500 bytes or so.
> >
> > Any changes in the recent slub which takes the pressure away from the page
> > allocator especially for smaller page sized architectures? If so, we can
> > redo some of the experiments. Looking at this thread, it doesn't sound like?
>
> It's too late for 2.6.23. But we can certainly do things for .24. Could you
> please test the patches queued up in Andrew's tree? In particular the page
> allocator pass through and the per cpu structures optimizations?
We are trying to get the latest data with 2.6.23-rc4-mm1 with and without
slub. Is this good enough?
>
> There is more work out of tree to optimize the fastpath that is mostly
> driven by Mathieu Desnoyers. I hope to get that into mm in the next weeks
> but I do not think that it is going to be available before .25.
>
> The work of Mathieu also has implications for the page allocator. We may
> be able to significantly speed up the fastpath there as well.
Ok. At least until all the regressions are addressed and all these patches are
well tested, we shouldn't do away with slab in mainline anytime soon.
Other than us, who else are you counting on to analyse slub? Do
you have any numbers that you can share which show where slub
is good or bad?
thanks,
suresh
On Fri, 14 Sep 2007, Siddha, Suresh B wrote:
> The numbers I posted in the previous e-mail are the only story we have so far.
It would be interesting to know more about how the allocator is used
there.
> Sorry, These systems are huge and limited. We are raising the priority
> with the performance team to do the latest slub patch testing.
Ok. Thanks.
> > It's too late for 2.6.23. But we can certainly do things for .24. Could you
> > please test the patches queued up in Andrew's tree? In particular the page
> > allocator pass through and the per cpu structures optimizations?
>
> We are trying to get the latest data with 2.6.23-rc4-mm1 with and without
> slub. Is this good enough?
Good enough. If you are concerned about the page allocator pass-through,
then you may want to test the page allocator pass-through patchset
separately. The fastpath of the page allocator is currently not
competitive if you always free and allocate a single page. If contiguous
pages are allocated then the pass-through is superior.
> > The work of Mathieu also has implications for the page allocator. We may
> > be able to significantly speed up the fastpath there as well.
>
> Ok. At least until all the regressions are addressed and all these patches are
> well tested, we shouldn't do away with slab in mainline anytime soon.
Ok. We will hold off. It has been so quiet about this issue, though, that from
the talk with Corey I may have wrongly concluded that this was because the
issues were resolved.
> Other than us, who else are you banking on for analysing slub? Do
> you have any numbers that you can share, which show where slub
> is good or bad...
http://lwn.net/Articles/246927/ contains some cycle measurements for the
per-cpu patchset and also for the page allocator pass-through.
If there is a problem with certain sizes for the page allocator pass-through,
then we may want to increase the boundary so that the page allocator is
only called for objects larger than page size.
On Fri, Sep 14, 2007 at 12:51:34PM -0700, Christoph Lameter wrote:
> On Fri, 14 Sep 2007, Siddha, Suresh B wrote:
> > We are trying to get the latest data with 2.6.23-rc4-mm1 with and without
> > slub. Is this good enough?
>
> Good enough. If you are concerned about the page allocator pass through
> then you may want to test the page allocator pass through patchset
> separately. The fastpath of the page allocator is currently not
> competitive if you always free and allocate a single page. If contiguous
> pages are allocated then the pass through is superior.
We are having all sorts of stability issues with -mm kernels, let alone
perf testing :(
For now, we are trying to do slab vs. slub comparisons for the mainline kernels.
Let's see how that goes.
Meanwhile, is there any chance that you can point us at relevant recent
patches/fixes that are in -mm and that could perhaps be applied to the
mainline kernel?
thanks,
suresh
On Tue, 18 Sep 2007, Siddha, Suresh B wrote:
> For now, we are trying to do slab Vs slub comparisons for the mainline kernels.
> Let's see how that goes.
>
> Meanwhile, any chance that you can point us at relevant recent patches/fixes
> that are in -mm and perhaps that can be applied to mainline kernel?
Those can be found in the performance branch of the slab git tree.
See
http://git.kernel.org/?p=linux/kernel/git/christoph/slab.git;a=log;h=performance