2009-11-25 18:21:31

by Pekka Enberg

Subject: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation

The unlikely() annotation in slab_alloc() covers too much of the expression.
It's actually very likely that the object is not NULL, so use unlikely() only
for the __GFP_ZERO expression, as SLAB does.

The patch reduces kernel text by 29 bytes on x86-64:

   text    data     bss     dec     hex filename
  24185    8560     176   32921    8099 mm/slub.o.orig
  24156    8560     176   32892    807c mm/slub.o

Cc: Christoph Lameter <[email protected]>
Signed-off-by: Pekka Enberg <[email protected]>
---
mm/slub.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 4996fc7..0956396 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1735,7 +1735,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
 	}
 	local_irq_restore(flags);
 
-	if (unlikely((gfpflags & __GFP_ZERO) && object))
+	if (unlikely(gfpflags & __GFP_ZERO) && object)
 		memset(object, 0, objsize);
 
 	kmemcheck_slab_alloc(s, gfpflags, object, c->objsize);
--
1.6.3.3
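
For reference, likely() and unlikely() are the kernel's thin wrappers
around GCC's __builtin_expect(). A minimal standalone sketch (not the
kernel sources; the definition below is what <linux/compiler.h> boils
down to) of what the two placements hint to the compiler:

#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Before the patch: the whole condition, NULL check included, is
 * hinted as almost-never true, even though a non-NULL object is by
 * far the common case on a successful allocation. */
if (unlikely((gfpflags & __GFP_ZERO) && object))
	memset(object, 0, objsize);

/* After the patch: only the __GFP_ZERO test carries the hint; the
 * compiler lays out the NULL check with its normal heuristics. */
if (unlikely(gfpflags & __GFP_ZERO) && object)
	memset(object, 0, objsize);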


2009-11-25 18:27:13

by Joe Perches

Subject: Re: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation

On Wed, 2009-11-25 at 20:21 +0200, Pekka Enberg wrote:
> The unlikely() annotation in slab_alloc() covers too much of the expression.
> It's actually very likely that the object is not NULL, so use unlikely() only
> for the __GFP_ZERO expression, as SLAB does.
[]
> +++ b/mm/slub.c
> @@ -1735,7 +1735,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
> -	if (unlikely((gfpflags & __GFP_ZERO) && object))
> +	if (unlikely(gfpflags & __GFP_ZERO) && object)
>  		memset(object, 0, objsize);

so why not use

if (unlikely(gfpflags & __GFP_ZERO) && likely(object))

2009-11-25 18:30:39

by Pekka Enberg

Subject: Re: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation

Joe Perches wrote:
> On Wed, 2009-11-25 at 20:21 +0200, Pekka Enberg wrote:
>> The unlikely() annotation in slab_alloc() covers too much of the expression.
>> It's actually very likely that the object is not NULL, so use unlikely() only
>> for the __GFP_ZERO expression, as SLAB does.
> []
>> +++ b/mm/slub.c
>> @@ -1735,7 +1735,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
>> -	if (unlikely((gfpflags & __GFP_ZERO) && object))
>> +	if (unlikely(gfpflags & __GFP_ZERO) && object)
>>  		memset(object, 0, objsize);
>
> so why not use
>
> if (unlikely(gfpflags & __GFP_ZERO) && likely(object))

Because that has no effect on the generated code.
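
The claim is easy to check outside the tree. A minimal standalone
sketch, assuming gcc: the function names are made up for the
experiment, __GFP_ZERO is hard-coded to its 2.6.32 value, and
__builtin_memset stands in for the kernel's memset. Compile with
"gcc -O2 -S" and diff the assembly of the two variants:

#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)
#define __GFP_ZERO	0x8000u		/* as in 2.6.32 gfp.h */

/* Variant merged by the patch: bare NULL check. */
void zero_plain(unsigned int gfpflags, void *object, unsigned long size)
{
	if (unlikely(gfpflags & __GFP_ZERO) && object)
		__builtin_memset(object, 0, size);
}

/* Joe's variant: NULL check wrapped in likely(). */
void zero_likely(unsigned int gfpflags, void *object, unsigned long size)
{
	if (unlikely(gfpflags & __GFP_ZERO) && likely(object))
		__builtin_memset(object, 0, size);
}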

2009-11-25 20:15:34

by Christoph Lameter

Subject: Re: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation

On Wed, 25 Nov 2009, Pekka Enberg wrote:

> Because that has no effect on the generated code.

Does this affect the cycle count in slab_alloc?

2009-11-25 20:19:41

by Pekka Enberg

Subject: Re: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation

On Wed, Nov 25, 2009 at 10:15 PM, Christoph Lameter
<[email protected]> wrote:
> On Wed, 25 Nov 2009, Pekka Enberg wrote:
>
>> Because that has no effect on the generated code.
>
> Does this affect the cycle count in slab_alloc?

Joe's suggestion doesn't change the generated code; my patch does, and
it probably changes the cycle count. Let me see if I can run the
in-kernel slab tests.

2009-11-26 20:27:42

by Pekka Enberg

Subject: Re: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation

On Wed, Nov 25, 2009 at 10:15 PM, Christoph Lameter
<[email protected]> wrote:
> On Wed, 25 Nov 2009, Pekka Enberg wrote:
>
>> Because that has no effect on the generated code.
>
> Does this affect the cycle count in slab_alloc?

I ran the synthetic benchmarks here and cycle count is roughly the
same for both versions.

SLUB 2.6.32-rc8:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 118 cycles kfree -> 158 cycles
10000 times kmalloc(16) -> 129 cycles kfree -> 162 cycles
10000 times kmalloc(32) -> 136 cycles kfree -> 149 cycles
10000 times kmalloc(64) -> 148 cycles kfree -> 168 cycles
10000 times kmalloc(128) -> 189 cycles kfree -> 189 cycles
10000 times kmalloc(256) -> 217 cycles kfree -> 242 cycles
10000 times kmalloc(512) -> 254 cycles kfree -> 377 cycles
10000 times kmalloc(1024) -> 284 cycles kfree -> 468 cycles
10000 times kmalloc(2048) -> 304 cycles kfree -> 509 cycles
10000 times kmalloc(4096) -> 388 cycles kfree -> 567 cycles
10000 times kmalloc(8192) -> 523 cycles kfree -> 658 cycles
10000 times kmalloc(16384) -> 790 cycles kfree -> 843 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 199 cycles
10000 times kmalloc(16)/kfree -> 197 cycles
10000 times kmalloc(32)/kfree -> 197 cycles
10000 times kmalloc(64)/kfree -> 196 cycles
10000 times kmalloc(128)/kfree -> 197 cycles
10000 times kmalloc(256)/kfree -> 197 cycles
10000 times kmalloc(512)/kfree -> 210 cycles
10000 times kmalloc(1024)/kfree -> 201 cycles
10000 times kmalloc(2048)/kfree -> 186 cycles
10000 times kmalloc(4096)/kfree -> 204 cycles
10000 times kmalloc(8192)/kfree -> 202 cycles
10000 times kmalloc(16384)/kfree -> 871 cycles
Concurrent allocs
=================
Kmalloc N*alloc N*free(8): 0=160/182 1=160/179 Average=160/180
Kmalloc N*alloc N*free(16): 0=164/189 1=163/186 Average=164/188
Kmalloc N*alloc N*free(32): 0=177/194 1=176/192 Average=177/193
Kmalloc N*alloc N*free(64): 0=206/211 1=202/209 Average=204/210
Kmalloc N*alloc N*free(128): 0=238/283 1=238/288 Average=238/285
Kmalloc N*alloc N*free(256): 0=286/465 1=282/462 Average=284/463
Kmalloc N*alloc N*free(512): 0=385/604 1=382/600 Average=384/602
Kmalloc N*alloc N*free(1024): 0=466/606 1=405/604 Average=436/605
Kmalloc N*alloc N*free(2048): 0=417/650 1=415/659 Average=416/654
Kmalloc N*alloc N*free(4096): 0=574/838 1=574/827 Average=574/833
Kmalloc N*(alloc free)(8): 0=222 1=219 Average=221
Kmalloc N*(alloc free)(16): 0=222 1=220 Average=221
Kmalloc N*(alloc free)(32): 0=222 1=219 Average=221
Kmalloc N*(alloc free)(64): 0=185 1=184 Average=184
Kmalloc N*(alloc free)(128): 0=222 1=219 Average=221
Kmalloc N*(alloc free)(256): 0=224 1=221 Average=223
Kmalloc N*(alloc free)(512): 0=225 1=221 Average=223
Kmalloc N*(alloc free)(1024): 0=225 1=223 Average=224
Kmalloc N*(alloc free)(2048): 0=225 1=223 Average=224
Kmalloc N*(alloc free)(4096): 0=225 1=222 Average=224
Remote free test
================
N*remote free(8): 0=8/187 1=161/0 Average=84/93
N*remote free(16): 0=8/189 1=163/0 Average=86/94
N*remote free(32): 0=8/198 1=177/0 Average=92/99
N*remote free(64): 0=8/197 1=210/0 Average=109/98
N*remote free(128): 0=8/214 1=249/0 Average=129/107
N*remote free(256): 0=8/273 1=267/0 Average=137/136
N*remote free(512): 0=8/380 1=283/0 Average=145/190
N*remote free(1024): 0=8/466 1=300/0 Average=154/233
N*remote free(2048): 0=8/486 1=325/0 Average=166/243
N*remote free(4096): 0=8/564 1=420/0 Average=214/282
1 alloc N free test
===================
1 alloc N free(8): 0=211 1=152 Average=181
1 alloc N free(16): 0=217 1=155 Average=186
1 alloc N free(32): 0=243 1=165 Average=204
1 alloc N free(64): 0=352 1=233 Average=293
1 alloc N free(128): 0=356 1=232 Average=294
1 alloc N free(256): 0=448 1=296 Average=372
1 alloc N free(512): 0=468 1=308 Average=388
1 alloc N free(1024): 0=450 1=282 Average=366
1 alloc N free(2048): 0=391 1=218 Average=305
1 alloc N free(4096): 0=508 1=286 Average=397


SLUB 2.6.32-rc8 + GFP_ZERO fix:

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 114 cycles kfree -> 146 cycles
10000 times kmalloc(16) -> 131 cycles kfree -> 161 cycles
10000 times kmalloc(32) -> 139 cycles kfree -> 186 cycles
10000 times kmalloc(64) -> 162 cycles kfree -> 168 cycles
10000 times kmalloc(128) -> 193 cycles kfree -> 190 cycles
10000 times kmalloc(256) -> 218 cycles kfree -> 246 cycles
10000 times kmalloc(512) -> 254 cycles kfree -> 390 cycles
10000 times kmalloc(1024) -> 276 cycles kfree -> 450 cycles
10000 times kmalloc(2048) -> 284 cycles kfree -> 492 cycles
10000 times kmalloc(4096) -> 406 cycles kfree -> 536 cycles
10000 times kmalloc(8192) -> 495 cycles kfree -> 630 cycles
10000 times kmalloc(16384) -> 797 cycles kfree -> 788 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 197 cycles
10000 times kmalloc(16)/kfree -> 224 cycles
10000 times kmalloc(32)/kfree -> 199 cycles
10000 times kmalloc(64)/kfree -> 199 cycles
10000 times kmalloc(128)/kfree -> 199 cycles
10000 times kmalloc(256)/kfree -> 199 cycles
10000 times kmalloc(512)/kfree -> 199 cycles
10000 times kmalloc(1024)/kfree -> 199 cycles
10000 times kmalloc(2048)/kfree -> 178 cycles
10000 times kmalloc(4096)/kfree -> 208 cycles
10000 times kmalloc(8192)/kfree -> 233 cycles
10000 times kmalloc(16384)/kfree -> 875 cycles
Concurrent allocs
=================
Kmalloc N*alloc N*free(8): 0=157/182 1=157/178 Average=157/180
Kmalloc N*alloc N*free(16): 0=161/188 1=161/184 Average=161/186
Kmalloc N*alloc N*free(32): 0=173/194 1=174/192 Average=173/193
Kmalloc N*alloc N*free(64): 0=201/212 1=197/211 Average=199/211
Kmalloc N*alloc N*free(128): 0=233/281 1=233/283 Average=233/282
Kmalloc N*alloc N*free(256): 0=279/446 1=283/447 Average=281/446
Kmalloc N*alloc N*free(512): 0=374/565 1=372/570 Average=373/567
Kmalloc N*alloc N*free(1024): 0=394/591 1=391/593 Average=393/592
Kmalloc N*alloc N*free(2048): 0=407/621 1=404/629 Average=405/625
Kmalloc N*alloc N*free(4096): 0=559/786 1=558/788 Average=559/787
Kmalloc N*(alloc free)(8): 0=216 1=214 Average=215
Kmalloc N*(alloc free)(16): 0=216 1=214 Average=215
Kmalloc N*(alloc free)(32): 0=216 1=214 Average=215
Kmalloc N*(alloc free)(64): 0=216 1=214 Average=215
Kmalloc N*(alloc free)(128): 0=216 1=214 Average=215
Kmalloc N*(alloc free)(256): 0=216 1=214 Average=215
Kmalloc N*(alloc free)(512): 0=216 1=215 Average=216
Kmalloc N*(alloc free)(1024): 0=217 1=215 Average=216
Kmalloc N*(alloc free)(2048): 0=217 1=215 Average=216
Kmalloc N*(alloc free)(4096): 0=217 1=215 Average=216
Remote free test
================
N*remote free(8): 0=8/188 1=156/0 Average=82/94
N*remote free(16): 0=8/189 1=159/0 Average=84/94
N*remote free(32): 0=8/191 1=171/0 Average=89/95
N*remote free(64): 0=8/197 1=200/0 Average=104/98
N*remote free(128): 0=8/214 1=243/0 Average=125/107
N*remote free(256): 0=8/266 1=258/0 Average=133/133
N*remote free(512): 0=8/379 1=274/0 Average=141/189
N*remote free(1024): 0=8/443 1=291/0 Average=150/221
N*remote free(2048): 0=8/472 1=318/0 Average=163/236
N*remote free(4096): 0=8/546 1=416/0 Average=212/273
1 alloc N free test
===================
1 alloc N free(8): 0=208 1=149 Average=179
1 alloc N free(16): 0=219 1=158 Average=189
1 alloc N free(32): 0=241 1=165 Average=203
1 alloc N free(64): 0=311 1=207 Average=259
1 alloc N free(128): 0=350 1=226 Average=288
1 alloc N free(256): 0=439 1=292 Average=366
1 alloc N free(512): 0=470 1=314 Average=392
1 alloc N free(1024): 0=361 1=203 Average=282
1 alloc N free(2048): 0=385 1=216 Average=300
1 alloc N free(4096): 0=492 1=278 Average=385

2009-11-27 19:26:42

by Christoph Lameter

Subject: Re: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation

On Thu, 26 Nov 2009, Pekka Enberg wrote:

> I ran the synthetic benchmarks here and cycle count is roughly the
> same for both versions.

Ok.

Acked-by: Christoph Lameter <[email protected]>