2007-06-18 10:00:48

by Christoph Lameter

[permalink] [raw]
Subject: [patch 00/26] Current slab allocator / SLUB patch queue

These contain the following groups of patches:

1. Slab allocator code consolidation and fixing of inconsistencies

This makes ZERO_SIZE_PTR generic so that it works in all
slab allocators.

It adds __GFP_ZERO support to all slab allocators and
cleans up the zeroing in the slabs and provides modifications
to remove explicit zeroing following kmalloc_node and
kmem_cache_alloc_node calls.

2. SLUB improvements

Inline some small functions to reduce code size. Some more memory
optimizations using CONFIG_SLUB_DEBUG. Changes to handling of the
slub_lock and an optimization of runtime determination of kmalloc slabs
(replaces ilog2 patch that failed with gcc 3.3 on powerpc).

3. Slab defragmentation

This is V3 of the patchset with the one fix for the locking problem that
showed up during testing.

4. Performance optimizations

These patches have a long history since the early drafts of SLUB. The
problem with these patches is that they require the touching of additional
cachelines (only for read) and SLUB was designed for minimal cacheline
touching. In doing so we may be able to remove cacheline bouncing in
particular for remote alloc/ free situations where I have had reports of
issues that I was not able to confirm for lack of specificity. The tradeoffs
here are not clear. Certainly the larger cacheline footprint will hurt the
casual slab user somewhat but it will benefit processes that perform these
local/remote alloc/free operations.

I'd appreciate if someone could evaluate these.

The complete patchset against 2.6.22-rc4-mm2 is available at

http://ftp.kernel.org/pub/linux/kernel/people/christoph/slub/2.6.22-rc4-mm2

Tested on

x86_64 SMP
x86_64 NUMA emulation
IA64 emulator
Altix 64p/128G NUMA system.
Altix 8p/6G asymmetric NUMA system.


--


2007-06-18 11:58:43

by Michal Piotrowski

[permalink] [raw]
Subject: Re: [patch 00/26] Current slab allocator / SLUB patch queue

Hi,

[email protected] pisze:
> These contain the following groups of patches:
>
> 1. Slab allocator code consolidation and fixing of inconsistencies
>
> This makes ZERO_SIZE_PTR generic so that it works in all
> slab allocators.
>
> It adds __GFP_ZERO support to all slab allocators and
> cleans up the zeroing in the slabs and provides modifications
> to remove explicit zeroing following kmalloc_node and
> kmem_cache_alloc_node calls.
>
> 2. SLUB improvements
>
> Inline some small functions to reduce code size. Some more memory
> optimizations using CONFIG_SLUB_DEBUG. Changes to handling of the
> slub_lock and an optimization of runtime determination of kmalloc slabs
> (replaces ilog2 patch that failed with gcc 3.3 on powerpc).
>
> 3. Slab defragmentation
>
> This is V3 of the patchset with the one fix for the locking problem that
> showed up during testing.
>
> 4. Performance optimizations
>
> These patches have a long history since the early drafts of SLUB. The
> problem with these patches is that they require the touching of additional
> cachelines (only for read) and SLUB was designed for minimal cacheline
> touching. In doing so we may be able to remove cacheline bouncing in
> particular for remote alloc/ free situations where I have had reports of
> issues that I was not able to confirm for lack of specificity. The tradeoffs
> here are not clear. Certainly the larger cacheline footprint will hurt the
> casual slab user somewhat but it will benefit processes that perform these
> local/remote alloc/free operations.
>
> I'd appreciate if someone could evaluate these.
>
> The complete patchset against 2.6.22-rc4-mm2 is available at
>
> http://ftp.kernel.org/pub/linux/kernel/people/christoph/slub/2.6.22-rc4-mm2
>
> Tested on
>
> x86_64 SMP
> x86_64 NUMA emulation
> IA64 emulator
> Altix 64p/128G NUMA system.
> Altix 8p/6G asymmetric NUMA system.
>
>

Testcase:

#! /bin/sh

for i in `find /sys/ -type f`
do
echo "wy?wietlam $i"
sudo cat $i > /dev/null
# sleep 1s
done

Result:

[ 212.247759] WARNING: at lib/vsprintf.c:280 vsnprintf()
[ 212.253263] [<c04052ad>] dump_trace+0x63/0x1eb
[ 212.259042] [<c040544f>] show_trace_log_lvl+0x1a/0x2f
[ 212.266672] [<c040608d>] show_trace+0x12/0x14
[ 212.271622] [<c04060a5>] dump_stack+0x16/0x18
[ 212.276663] [<c050d512>] vsnprintf+0x6b/0x48c
[ 212.281325] [<c050d9f0>] scnprintf+0x20/0x2d
[ 212.286707] [<c0508dbc>] bitmap_scnlistprintf+0xa8/0xec
[ 212.292508] [<c0480d40>] list_locations+0x24c/0x2a2
[ 212.298241] [<c0480dde>] alloc_calls_show+0x1f/0x26
[ 212.303459] [<c047e72e>] slab_attr_show+0x1c/0x20
[ 212.309469] [<c04c1cf9>] sysfs_read_file+0x94/0x105
[ 212.315519] [<c0485933>] vfs_read+0xcf/0x158
[ 212.320215] [<c0485d99>] sys_read+0x3d/0x72
[ 212.327539] [<c040420c>] syscall_call+0x7/0xb
[ 212.332203] [<b7f74410>] 0xb7f74410
[ 212.336229] =======================

Unfortunately, I don't know which file was cat'ed

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.22-rc4-mm2-slub/slub-config
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.22-rc4-mm2-slub/slub-dmesg

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-18 16:46:21

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch 00/26] Current slab allocator / SLUB patch queue

On Mon, 18 Jun 2007, Michal Piotrowski wrote:

> Result:
>
> [ 212.247759] WARNING: at lib/vsprintf.c:280 vsnprintf()
> [ 212.253263] [<c04052ad>] dump_trace+0x63/0x1eb
> [ 212.259042] [<c040544f>] show_trace_log_lvl+0x1a/0x2f
> [ 212.266672] [<c040608d>] show_trace+0x12/0x14
> [ 212.271622] [<c04060a5>] dump_stack+0x16/0x18
> [ 212.276663] [<c050d512>] vsnprintf+0x6b/0x48c
> [ 212.281325] [<c050d9f0>] scnprintf+0x20/0x2d
> [ 212.286707] [<c0508dbc>] bitmap_scnlistprintf+0xa8/0xec
> [ 212.292508] [<c0480d40>] list_locations+0x24c/0x2a2
> [ 212.298241] [<c0480dde>] alloc_calls_show+0x1f/0x26
> [ 212.303459] [<c047e72e>] slab_attr_show+0x1c/0x20
> [ 212.309469] [<c04c1cf9>] sysfs_read_file+0x94/0x105
> [ 212.315519] [<c0485933>] vfs_read+0xcf/0x158
> [ 212.320215] [<c0485d99>] sys_read+0x3d/0x72
> [ 212.327539] [<c040420c>] syscall_call+0x7/0xb
> [ 212.332203] [<b7f74410>] 0xb7f74410
> [ 212.336229] =======================
>
> Unfortunately, I don't know which file was cat'ed

The dump shows that it was alloc_calls. But the issue is not related to
this patchset.

Looks like we overflowed the buffer available for /sys output. The calls
in list_location to format cpulist and node lists attempt to allow very
long lists by trying to calculate how many bytes are remaining in the
page. If we are beyond the space left over by them then we may pass a
negative size to the scn_printf functions.

So we need to check first if there are enough bytes remaining before
doing the calculation of how many remaining bytes can be used to
format these lists.

Does this patch fix the issue?

Index: linux-2.6.22-rc4-mm2/mm/slub.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/mm/slub.c 2007-06-18 09:37:41.000000000 -0700
+++ linux-2.6.22-rc4-mm2/mm/slub.c 2007-06-18 09:44:38.000000000 -0700
@@ -3649,13 +3649,15 @@ static int list_locations(struct kmem_ca
n += sprintf(buf + n, " pid=%ld",
l->min_pid);

- if (num_online_cpus() > 1 && !cpus_empty(l->cpus)) {
+ if (num_online_cpus() > 1 && !cpus_empty(l->cpus) &&
+ n < PAGE_SIZE - n - 57) {
n += sprintf(buf + n, " cpus=");
n += cpulist_scnprintf(buf + n, PAGE_SIZE - n - 50,
l->cpus);
}

- if (num_online_nodes() > 1 && !nodes_empty(l->nodes)) {
+ if (num_online_nodes() > 1 && !nodes_empty(l->nodes) &&
+ n < PAGE_SIZE - n - 57) {
n += sprintf(buf + n, " nodes=");
n += nodelist_scnprintf(buf + n, PAGE_SIZE - n - 50,
l->nodes);





2007-06-18 17:38:32

by Michal Piotrowski

[permalink] [raw]
Subject: Re: [patch 00/26] Current slab allocator / SLUB patch queue

On 18/06/07, Christoph Lameter <[email protected]> wrote:
> On Mon, 18 Jun 2007, Michal Piotrowski wrote:
>
> > Result:
> >
> > [ 212.247759] WARNING: at lib/vsprintf.c:280 vsnprintf()
> > [ 212.253263] [<c04052ad>] dump_trace+0x63/0x1eb
> > [ 212.259042] [<c040544f>] show_trace_log_lvl+0x1a/0x2f
> > [ 212.266672] [<c040608d>] show_trace+0x12/0x14
> > [ 212.271622] [<c04060a5>] dump_stack+0x16/0x18
> > [ 212.276663] [<c050d512>] vsnprintf+0x6b/0x48c
> > [ 212.281325] [<c050d9f0>] scnprintf+0x20/0x2d
> > [ 212.286707] [<c0508dbc>] bitmap_scnlistprintf+0xa8/0xec
> > [ 212.292508] [<c0480d40>] list_locations+0x24c/0x2a2
> > [ 212.298241] [<c0480dde>] alloc_calls_show+0x1f/0x26
> > [ 212.303459] [<c047e72e>] slab_attr_show+0x1c/0x20
> > [ 212.309469] [<c04c1cf9>] sysfs_read_file+0x94/0x105
> > [ 212.315519] [<c0485933>] vfs_read+0xcf/0x158
> > [ 212.320215] [<c0485d99>] sys_read+0x3d/0x72
> > [ 212.327539] [<c040420c>] syscall_call+0x7/0xb
> > [ 212.332203] [<b7f74410>] 0xb7f74410
> > [ 212.336229] =======================
> >
> > Unfortunately, I don't know which file was cat'ed
>
> The dump shows that it was alloc_calls. But the issue is not related to
> this patchset.
>
> Looks like we overflowed the buffer available for /sys output. The calls
> in list_location to format cpulist and node lists attempt to allow very
> long lists by trying to calculate how many bytes are remaining in the
> page. If we are beyond the space left over by them then we may pass a
> negative size to the scn_printf functions.
>
> So we need to check first if there are enough bytes remaining before
> doing the calculation of how many remaining bytes can be used to
> format these lists.
>
> Does this patch fix the issue?
>

Unfortunately no.

AFAIR I didn't see it in 2.6.22-rc4-mm2

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-18 18:06:10

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch 00/26] Current slab allocator / SLUB patch queue

On Mon, 18 Jun 2007, Michal Piotrowski wrote:

> > Does this patch fix the issue?
> Unfortunately no.
>
> AFAIR I didn't see it in 2.6.22-rc4-mm2

Seems that I miscounted. We need a larger safe area.


SLUB: Fix behavior if the text output of list_locations overflows PAGE_SIZE

If slabs are allocated or freed from a large set of call sites (typical
for the kmalloc area) then we may create more output than fits into
a single PAGE and sysfs only gives us one page. The output should be
truncated. This patch fixes the checks to do the truncation properly.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.22-rc4-mm2/mm/slub.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/mm/slub.c 2007-06-18 09:37:41.000000000 -0700
+++ linux-2.6.22-rc4-mm2/mm/slub.c 2007-06-18 11:02:19.000000000 -0700
@@ -3649,13 +3649,15 @@ static int list_locations(struct kmem_ca
n += sprintf(buf + n, " pid=%ld",
l->min_pid);

- if (num_online_cpus() > 1 && !cpus_empty(l->cpus)) {
+ if (num_online_cpus() > 1 && !cpus_empty(l->cpus) &&
+ n < PAGE_SIZE - n - 60) {
n += sprintf(buf + n, " cpus=");
n += cpulist_scnprintf(buf + n, PAGE_SIZE - n - 50,
l->cpus);
}

- if (num_online_nodes() > 1 && !nodes_empty(l->nodes)) {
+ if (num_online_nodes() > 1 && !nodes_empty(l->nodes) &&
+ n < PAGE_SIZE - n - 60) {
n += sprintf(buf + n, " nodes=");
n += nodelist_scnprintf(buf + n, PAGE_SIZE - n - 50,
l->nodes);

2007-06-18 18:59:07

by Michal Piotrowski

[permalink] [raw]
Subject: Re: [patch 00/26] Current slab allocator / SLUB patch queue

On 18/06/07, Christoph Lameter <[email protected]> wrote:
> On Mon, 18 Jun 2007, Michal Piotrowski wrote:
>
> > > Does this patch fix the issue?
> > Unfortunately no.
> >
> > AFAIR I didn't see it in 2.6.22-rc4-mm2
>
> Seems that I miscounted. We need a larger safe area.
>

Still the same.

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-18 19:00:40

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch 00/26] Current slab allocator / SLUB patch queue

On Mon, 18 Jun 2007, Michal Piotrowski wrote:

> Still the same.

Is it still exactly the same strack trace? There could be multiple issue
if we overflow PAGE_SIZE there.

2007-06-18 19:09:28

by Michal Piotrowski

[permalink] [raw]
Subject: Re: [patch 00/26] Current slab allocator / SLUB patch queue

On 18/06/07, Christoph Lameter <[email protected]> wrote:
> On Mon, 18 Jun 2007, Michal Piotrowski wrote:
>
> > Still the same.
>
> Is it still exactly the same strack trace?

Not exactly the same
[<c0480d4b>] list_locations+0x257/0x2ad
is the only difference

l *list_locations+0x257
0xc1080d4b is in list_locations (mm/slub.c:3655).
3650 l->min_pid);
3651
3652 if (num_online_cpus() > 1 && !cpus_empty(l->cpus) &&
3653 n < PAGE_SIZE - n - 60) {
3654 n += sprintf(buf + n, " cpus=");
3655 n += cpulist_scnprintf(buf + n,
PAGE_SIZE - n - 50,
3656 l->cpus);
3657 }
3658
3659 if (num_online_nodes() > 1 && !nodes_empty(l->nodes) &&


> There could be multiple issue
> if we overflow PAGE_SIZE there.

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/

2007-06-18 19:19:21

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch 00/26] Current slab allocator / SLUB patch queue

Stupid me. n on both sides of the comparison. Tried to run your script
here but I cannot trigger it.

Next attempt: Sorry for the churn.

SLUB: Fix behavior if the text output of list_locations overflows PAGE_SIZE

If slabs are allocated or freed from a large set of call sites (typical
for the kmalloc area) then we may create more output than fits into
a single PAGE and sysfs only gives us one page. The output should be
truncated. This patch fixes the checks to do the truncation properly.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.22-rc4-mm2/mm/slub.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/mm/slub.c 2007-06-18 12:13:48.000000000 -0700
+++ linux-2.6.22-rc4-mm2/mm/slub.c 2007-06-18 12:15:10.000000000 -0700
@@ -3649,13 +3649,15 @@ static int list_locations(struct kmem_ca
n += sprintf(buf + n, " pid=%ld",
l->min_pid);

- if (num_online_cpus() > 1 && !cpus_empty(l->cpus)) {
+ if (num_online_cpus() > 1 && !cpus_empty(l->cpus) &&
+ n < PAGE_SIZE - 60) {
n += sprintf(buf + n, " cpus=");
n += cpulist_scnprintf(buf + n, PAGE_SIZE - n - 50,
l->cpus);
}

- if (num_online_nodes() > 1 && !nodes_empty(l->nodes)) {
+ if (num_online_nodes() > 1 && !nodes_empty(l->nodes) &&
+ n < PAGE_SIZE - 60) {
n += sprintf(buf + n, " nodes=");
n += nodelist_scnprintf(buf + n, PAGE_SIZE - n - 50,
l->nodes);

2007-06-18 20:43:25

by Michal Piotrowski

[permalink] [raw]
Subject: Re: [patch 00/26] Current slab allocator / SLUB patch queue

On 18/06/07, Christoph Lameter <[email protected]> wrote:
> Stupid me. n on both sides of the comparison. Tried to run your script
> here but I cannot trigger it.
>
> Next attempt: Sorry for the churn.

Problem fixed. Thanks!

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/