Branch profiling on my Nehalem machine showed 99% incorrect branch hints:
28459  7678524  99  __cache_alloc_node  slab.c  3551
Signed-off-by: Tim Blechmann <[email protected]>
---
mm/slab.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/slab.c b/mm/slab.c
index f70b326..4125fcd 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3548,7 +3548,7 @@ __cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
slab_irq_save(save_flags, this_cpu);
this_node = cpu_to_node(this_cpu);
- if (unlikely(nodeid == -1))
+ if (nodeid == -1)
nodeid = this_node;
if (unlikely(!cachep->nodelists[nodeid])) {
--
1.6.4.2
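[For context: likely()/unlikely() are the kernel's wrappers around GCC's
__builtin_expect(), which biases code layout and static branch prediction
toward the hinted outcome. A minimal sketch of the idea; the real
definitions in include/linux/compiler.h additionally hook into the ftrace
branch profiler that produced the statistics above (the columns are,
roughly: correct hits, incorrect hits, percent incorrect, function, file,
line):

	/* Simplified sketch, not the verbatim kernel definitions: */
	#define likely(x)	__builtin_expect(!!(x), 1)	/* expect x true */
	#define unlikely(x)	__builtin_expect(!!(x), 0)	/* expect x false */

A hint that is wrong 99% of the time pessimizes exactly the path the CPU
takes most often, which is the motivation for dropping the annotation.]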
(Pekka Cc:-ed)
* Tim Blechmann <[email protected]> wrote:
> Branch profiling on my Nehalem machine showed 99% incorrect branch hints:
>
> 28459  7678524  99  __cache_alloc_node  slab.c  3551
>
> [full patch quoted; snipped]
On Tue, Nov 24, 2009 at 1:20 PM, Ingo Molnar <[email protected]> wrote:
> (Pekka Cc:-ed)
>
> * Tim Blechmann <[email protected]> wrote:
>
>> Branch profiling on my Nehalem machine showed 99% incorrect branch hints:
>>
>> 28459  7678524  99  __cache_alloc_node  slab.c  3551
>>
>> [full patch quoted; snipped]
That sounds odd to me. Can you see where the incorrectly predicted
calls are coming from? Calling kmem_cache_alloc_node() with node set
to -1 most of the time could be a real bug somewhere.
* Pekka Enberg <[email protected]> wrote:
> On Tue, Nov 24, 2009 at 1:20 PM, Ingo Molnar <[email protected]> wrote:
> > (Pekka Cc:-ed)
> >
> > [Tim's patch quoted in full; snipped]
>
> That sounds odd to me. Can you see where the incorrectly predicted
> calls are coming from? Calling kmem_cache_alloc_node() with node set
> to -1 most of the time could be a real bug somewhere.
I think it could be an artifact of overly limited tests: the branch
prediction looks 'wrong' in certain tests while being OK in general.
Is there some easy to run workload you consider more or less
representative of typical SLAB patterns?
<plug> You might want to pull even with the scheduler subsystem and, in
addition to 'perf bench sched', add a 'perf bench slab' set of
interesting testcases for SLAB performance testing. :-)
</plug>
Ingo
On 11/24/2009 12:28 PM, Pekka Enberg wrote:
> On Tue, Nov 24, 2009 at 1:20 PM, Ingo Molnar <[email protected]> wrote:
>> (Pekka Cc:-ed)
>>
>> [Tim's patch quoted in full; snipped]
>
> That sounds odd to me. Can you see where the incorrectly predicted
> calls are coming from? Calling kmem_cache_alloc_node() with node set
> to -1 most of the time could be a real bug somewhere.
I don't know if there is any facility in the ftrace branch profiler to
get call-graph information, but I can try to manually dump backtraces in
this branch of the code ...
It could be specific to my machine, though ...
tim
Ingo Molnar wrote:
>> That sounds odd to me. Can you see where the incorrectly predicted
>> calls are coming from? Calling kmem_cache_alloc_node() with node set
>> to -1 most of the time could be a real bug somewhere.
>
> I think it could be an artifact of overly limited tests: the branch
> prediction looks 'wrong' in certain tests while being OK in general.
>
> Is there some easy to run workload you consider more or less
> representative of typical SLAB patterns?
I can think of three main classes: VFS-, SCSI-, and network-intensive
applications and benchmarks tend to bring out the worst in SLAB. There
are probably also some interesting NUMA-related cases that I'm not really
aware of.
On 11/24/2009 12:28 PM, Pekka Enberg wrote:
> On Tue, Nov 24, 2009 at 1:20 PM, Ingo Molnar <[email protected]> wrote:
>> (Pekka Cc:-ed)
>>
>> [Tim's patch quoted in full; snipped]
>
> That sounds odd to me. Can you see where the incorrectly predicted
> calls are coming from? Calling kmem_cache_alloc_node() with node set
> to -1 most of the time could be a real bug somewhere.
When dumping the stack for the incorrectly hinted branch, I get the
attached stack traces...
hth, tim
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3548,8 +3548,10 @@ __cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
slab_irq_save(save_flags, this_cpu);
this_node = cpu_to_node(this_cpu);
- if (nodeid == -1)
+ if (nodeid == -1) {
+ dump_stack();
nodeid = this_node;
+ }
if (unlikely(!cachep->nodelists[nodeid])) {
/* Node not bootstrapped yet */
Tim Blechmann wrote:
> On 11/24/2009 12:28 PM, Pekka Enberg wrote:
>> On Tue, Nov 24, 2009 at 1:20 PM, Ingo Molnar <[email protected]> wrote:
>>> [Tim's patch quoted in full; snipped]
>> That sounds odd to me. Can you see where the incorrectly predicted
>> calls are coming from? Calling kmem_cache_alloc_node() with node set
>> to -1 most of the time could be a real bug somewhere.
>
> When dumping the stack for the incorrectly hinted branch, I get the
> attached stack traces...
>
> hth, tim
>
> [instrumentation patch quoted in full; snipped]
OK, so it's the generic alloc_skb() function that keeps hitting
kmem_cache_alloc_node() with "-1". Christoph, are you okay with removing
the unlikely() annotation from __cache_alloc_node()?
Pekka
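[For reference, the call path Pekka identified: in kernels of this
vintage, alloc_skb() is a thin inline wrapper that passes -1 ("no
preferred node") straight down to the slab allocator. A simplified
sketch, not verbatim kernel code:

	/* include/linux/skbuff.h (simplified): */
	static inline struct sk_buff *alloc_skb(unsigned int size, gfp_t priority)
	{
		return __alloc_skb(size, priority, 0, -1);	/* node == -1 */
	}

	/* net/core/skbuff.c: __alloc_skb() then allocates the head via: */
	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);

So every ordinary alloc_skb() call takes the nodeid == -1 branch in
__cache_alloc_node(), which is why the unlikely() hint mispredicts under
network load.]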
On Mon, 30 Nov 2009, Pekka Enberg wrote:
> OK, so it's the generic alloc_skb() function that keeps hitting
> kmem_cache_alloc_node() with "-1". Christoph, are you okay with removing the
> unlikely() annotation from __cache_alloc_node()?
Yes. Let's look for other cases in the allocators too.
kmem_cache_alloc_node() used to be mainly used for off-node allocations,
but the network alloc_skb() case shows that this is changing now.
I hope the users of kmem_cache_alloc_node() realize that using -1 is not
equivalent to kmem_cache_alloc(): kmem_cache_alloc() follows NUMA policies
for memory allocations, while kmem_cache_alloc_node() does not.
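[A hypothetical illustration of the semantic difference Christoph
describes; not actual kernel code:

	/*
	 * Both requests use GFP_KERNEL, but they are not interchangeable.
	 */
	void *alloc_example(struct kmem_cache *cachep)
	{
		/* Honors the calling task's NUMA memory policy
		 * (interleave, bind, preferred node, ...): */
		void *a = kmem_cache_alloc(cachep, GFP_KERNEL);

		/* node == -1 simply falls back to the local node;
		 * the task's NUMA policy is not consulted: */
		void *b = kmem_cache_alloc_node(cachep, GFP_KERNEL, -1);

		kmem_cache_free(cachep, a);
		return b;
	}
]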
On Mon, Nov 30, 2009 at 6:09 PM, Christoph Lameter
<[email protected]> wrote:
> On Mon, 30 Nov 2009, Pekka Enberg wrote:
>
>> OK, so it's the generic alloc_skb() function that keeps hitting
>> kmem_cache_alloc_node() with "-1". Christoph, are you okay with removing the
>> unlikely() annotation from __cache_alloc_node()?
>
> Yes. Let's look for other cases in the allocators too.
> kmem_cache_alloc_node() used to be mainly used for off-node allocations,
> but the network alloc_skb() case shows that this is changing now.
Tim, can you please re-send the patch to me so I can apply it?
On Mon, 30 Nov 2009, Pekka Enberg wrote:
> Tim, can you please re-send the patch to me so I can apply it?
SLUB has no issue since NUMA decisions are deferred to the page allocator.
I have regenerated the patch for tip/tip ...
--
28459 7678524 99 __cache_alloc_node slab.c 3551
Discussion on LKML [1] led to the decision to remove this hint.
[1] http://patchwork.kernel.org/patch/63517/
Signed-off-by: Tim Blechmann <[email protected]>
---
mm/slab.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
Tim Blechmann wrote:
> I have regenerated the patch for tip/tip ...
>
> [patch quoted in full; snipped]
Applied, thanks.
Please use plain text email for sending patches in the future.