2006-10-29 05:58:36

by Martin Bligh

Subject: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)

-git4 was fine. -git5 is broken (on PPC64 blade)

As -rc2-mm2 seemed fine on this box, I'm guessing it's something
that didn't go via Andrew ;-( Looks like it might be something
JFS or slab specific. Bigger PPC64 box with different config
was OK though.

Full log is here: http://test.kernel.org/abat/59046/debug/console.log
Good -git4 run: http://test.kernel.org/abat/58997/debug/console.log

kernel BUG in cache_grow at mm/slab.c:2705!
cpu 0x1: Vector: 700 (Program Check) at [c0000000fffb7710]
pc: c0000000000c8ad4: .cache_grow+0x64/0x4f0
lr: c0000000000c91a8: .cache_alloc_refill+0x248/0x2cc
sp: c0000000fffb7990
msr: 8000000000021032
current = 0xc0000000fffab800
paca = 0xc00000000047e780
pid = 1, comm = swapper
kernel BUG in cache_grow at mm/slab.c:2705!
enter ? for help
[c0000000fffb7a60] c0000000000c91a8 .cache_alloc_refill+0x248/0x2cc
[c0000000fffb7b20] c0000000000c9708 .kmem_cache_alloc_node+0xd0/0x10c
[c0000000fffb7bc0] c0000000000b69cc .__get_vm_area_node+0xcc/0x230
[c0000000fffb7c70] c0000000000b7640 .__vmalloc_node+0x60/0xc0
[c0000000fffb7d10] c0000000001ad4c8 .txInit+0x2a0/0x3a8
[c0000000fffb7e20] c00000000044c1ec .init_jfs_fs+0x78/0x27c
[c0000000fffb7ec0] c0000000000094c0 .init+0x1f4/0x3e4
[c0000000fffb7f90] c000000000027270 .kernel_thread+0x4c/0x68


2006-10-29 06:58:36

by Pekka Enberg

Subject: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)

Hi,

On 10/29/06, Martin J. Bligh <[email protected]> wrote:
> -git4 was fine. -git5 is broken (on PPC64 blade)
>
> As -rc2-mm2 seemed fine on this box, I'm guessing it's something
> that didn't go via Andrew ;-( Looks like it might be something
> JFS or slab specific. Bigger PPC64 box with different config
> was OK though.
>
> Full log is here: http://test.kernel.org/abat/59046/debug/console.log
> Good -git4 run: http://test.kernel.org/abat/58997/debug/console.log
>
> kernel BUG in cache_grow at mm/slab.c:2705!
> cpu 0x1: Vector: 700 (Program Check) at [c0000000fffb7710]
> pc: c0000000000c8ad4: .cache_grow+0x64/0x4f0
> lr: c0000000000c91a8: .cache_alloc_refill+0x248/0x2cc
> sp: c0000000fffb7990
> msr: 8000000000021032
> current = 0xc0000000fffab800
> paca = 0xc00000000047e780
> pid = 1, comm = swapper
> kernel BUG in cache_grow at mm/slab.c:2705!
> enter ? for help
> [c0000000fffb7a60] c0000000000c91a8 .cache_alloc_refill+0x248/0x2cc
> [c0000000fffb7b20] c0000000000c9708 .kmem_cache_alloc_node+0xd0/0x10c
> [c0000000fffb7bc0] c0000000000b69cc .__get_vm_area_node+0xcc/0x230
> [c0000000fffb7c70] c0000000000b7640 .__vmalloc_node+0x60/0xc0
> [c0000000fffb7d10] c0000000001ad4c8 .txInit+0x2a0/0x3a8
> [c0000000fffb7e20] c00000000044c1ec .init_jfs_fs+0x78/0x27c
> [c0000000fffb7ec0] c0000000000094c0 .init+0x1f4/0x3e4
> [c0000000fffb7f90] c000000000027270 .kernel_thread+0x4c/0x68

I only skimmed through this briefly but it looks like due to
52fd24ca1db3a741f144bbc229beefe044202cac __get_vm_area_node is passing
GFP_HIGHMEM to kmem_cache_alloc_node which is a no-no.
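
A minimal userspace sketch of the failure mode described above, for readers following along. The DEMO_* bit values and the demo_* function names are hypothetical stand-ins and are not taken from the kernel headers; the only things borrowed from the thread are the call order (vm area setup calling into the slab allocator) and the fact that the slab side rejects a highmem request.

```c
#include <assert.h>

typedef unsigned int demo_gfp_t;

/* Hypothetical bit values, chosen for illustration only. */
#define DEMO_GFP_HIGHMEM 0x02u
#define DEMO_GFP_WAIT    0x10u
#define DEMO_GFP_IO      0x40u
#define DEMO_GFP_FS      0x80u
#define DEMO_GFP_KERNEL  (DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)

/* Bits the modelled slab layer accepts; anything else is a caller bug. */
#define DEMO_SLAB_OK_MASK DEMO_GFP_KERNEL

/* Stand-in for the slab sanity check: returns 0 if the flags are
 * acceptable, -1 where the real allocator would hit its BUG. */
int demo_slab_alloc(demo_gfp_t flags)
{
    return (flags & ~DEMO_SLAB_OK_MASK) ? -1 : 0;
}

/* Stand-in for the vm area setup: it hands the caller's flags to the
 * slab layer unchanged, which is the behaviour identified above. */
int demo_get_vm_area(demo_gfp_t flags)
{
    return demo_slab_alloc(flags);
}

/* Usage: plain kernel flags pass, vmalloc-style flags trip the check. */
void demo_show_failure(void)
{
    assert(demo_get_vm_area(DEMO_GFP_KERNEL) == 0);
    assert(demo_get_vm_area(DEMO_GFP_KERNEL | DEMO_GFP_HIGHMEM) == -1);
}
```

In this model the fix is a one-line mask in demo_get_vm_area before the flags reach the slab stand-in; which mask to use is what the rest of the thread discusses.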

2006-10-29 07:05:24

by Andrew Morton

Subject: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)

On Sat, 28 Oct 2006 22:57:48 -0700
"Martin J. Bligh" <[email protected]> wrote:

> -git4 was fine. -git5 is broken (on PPC64 blade)
>
> As -rc2-mm2 seemed fine on this box, I'm guessing it's something
> that didn't go via Andrew ;-( Looks like it might be something
> JFS or slab specific. Bigger PPC64 box with different config
> was OK though.
>
> Full log is here: http://test.kernel.org/abat/59046/debug/console.log
> Good -git4 run: http://test.kernel.org/abat/58997/debug/console.log
>
> kernel BUG in cache_grow at mm/slab.c:2705!

This?

--- a/mm/vmalloc.c~__vmalloc_area_node-fix
+++ a/mm/vmalloc.c
@@ -428,7 +428,8 @@ void *__vmalloc_area_node(struct vm_stru
 	area->nr_pages = nr_pages;
 	/* Please note that the recursion is strictly bounded. */
 	if (array_size > PAGE_SIZE) {
-		pages = __vmalloc_node(array_size, gfp_mask, PAGE_KERNEL, node);
+		pages = __vmalloc_node(array_size, gfp_mask & ~__GFP_HIGHMEM,
+					PAGE_KERNEL, node);
 		area->flags |= VM_VPAGES;
 	} else {
 		pages = kmalloc_node(array_size,
_

2006-10-29 09:18:13

by Nick Piggin

Subject: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)

Andrew Morton wrote:
> On Sat, 28 Oct 2006 22:57:48 -0700
> "Martin J. Bligh" <[email protected]> wrote:
>
>
>>-git4 was fine. -git5 is broken (on PPC64 blade)
>>
>>As -rc2-mm2 seemed fine on this box, I'm guessing it's something
>>that didn't go via Andrew ;-( Looks like it might be something
>>JFS or slab specific. Bigger PPC64 box with different config
>>was OK though.
>>
>>Full log is here: http://test.kernel.org/abat/59046/debug/console.log
>>Good -git4 run: http://test.kernel.org/abat/58997/debug/console.log
>>
>>kernel BUG in cache_grow at mm/slab.c:2705!
>
>
> This?
>
> --- a/mm/vmalloc.c~__vmalloc_area_node-fix
> +++ a/mm/vmalloc.c
> @@ -428,7 +428,8 @@ void *__vmalloc_area_node(struct vm_stru
> area->nr_pages = nr_pages;
> /* Please note that the recursion is strictly bounded. */
> if (array_size > PAGE_SIZE) {
> - pages = __vmalloc_node(array_size, gfp_mask, PAGE_KERNEL, node);
> + pages = __vmalloc_node(array_size, gfp_mask & ~__GFP_HIGHMEM,
> + PAGE_KERNEL, node);
> area->flags |= VM_VPAGES;
> } else {
> pages = kmalloc_node(array_size,

Don't you actually *want* the page array to be allocated from highmem? So the
gfp mask here should be just for whether we're allowed to sleep / reclaim (ie
gfp_mask & ~(__GFP_DMA|__GFP_DMA32) | (__GFP_HIGHMEM))?

Slab allocations should be (gfp_mask & ~(__GFP_DMA|__GFP_DMA32|__GFP_HIGHMEM)),
which you could mask in __get_vm_area_node
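
The two masks suggested above can be sketched in userspace C as follows. This is an illustration under stated assumptions: the DEMO_* bit values and the function names are hypothetical, not kernel definitions. The explicit parentheses in the first function spell out how the expression in the message parses, since & binds tighter than | in C.

```c
#include <assert.h>

typedef unsigned int demo_gfp_t;

/* Hypothetical bit values, chosen for illustration only. */
#define DEMO_GFP_DMA     0x01u
#define DEMO_GFP_HIGHMEM 0x02u
#define DEMO_GFP_DMA32   0x04u
#define DEMO_GFP_WAIT    0x10u

/* Page-array allocation: drop the DMA zone restrictions, then allow
 * highmem, keeping only the caller's sleep/reclaim behaviour bits. */
demo_gfp_t demo_page_array_flags(demo_gfp_t gfp_mask)
{
    return (gfp_mask & ~(DEMO_GFP_DMA | DEMO_GFP_DMA32)) | DEMO_GFP_HIGHMEM;
}

/* Slab allocation: drop every zone modifier, including highmem. */
demo_gfp_t demo_slab_flags(demo_gfp_t gfp_mask)
{
    return gfp_mask & ~(DEMO_GFP_DMA | DEMO_GFP_DMA32 | DEMO_GFP_HIGHMEM);
}

/* Usage: the same caller flags yield different masks per allocator. */
void demo_check_split(void)
{
    demo_gfp_t in = DEMO_GFP_WAIT | DEMO_GFP_DMA32;

    assert(demo_page_array_flags(in) == (DEMO_GFP_WAIT | DEMO_GFP_HIGHMEM));
    assert(demo_slab_flags(in | DEMO_GFP_HIGHMEM) == DEMO_GFP_WAIT);
}
```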

--
SUSE Labs, Novell Inc.

2006-10-29 12:46:57

by Giridhar Pemmasani

Subject: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)

--- Pekka Enberg <[email protected]> wrote:

> Hi,
>
> On 10/29/06, Martin J. Bligh <[email protected]> wrote:
> > -git4 was fine. -git5 is broken (on PPC64 blade)
> >
> > As -rc2-mm2 seemed fine on this box, I'm guessing it's something
> > that didn't go via Andrew ;-( Looks like it might be something
> > JFS or slab specific. Bigger PPC64 box with different config
> > was OK though
> >
> > Full log is here: http://test.kernel.org/abat/59046/debug/console.log
> > Good -git4 run: http://test.kernel.org/abat/58997/debug/console.log
> >
> > kernel BUG in cache_grow at mm/slab.c:2705!
> > cpu 0x1: Vector: 700 (Program Check) at [c0000000fffb7710]
> > pc: c0000000000c8ad4: .cache_grow+0x64/0x4f0
> > lr: c0000000000c91a8: .cache_alloc_refill+0x248/0x2cc
> > sp: c0000000fffb7990
> > msr: 8000000000021032
> > current = 0xc0000000fffab800
> > paca = 0xc00000000047e780
> > pid = 1, comm = swapper
> > kernel BUG in cache_grow at mm/slab.c:2705!
> > enter ? for help
> > [c0000000fffb7a60] c0000000000c91a8 .cache_alloc_refill+0x248/0x2cc
> > [c0000000fffb7b20] c0000000000c9708 .kmem_cache_alloc_node+0xd0/0x10c
> > [c0000000fffb7bc0] c0000000000b69cc .__get_vm_area_node+0xcc/0x230
> > [c0000000fffb7c70] c0000000000b7640 .__vmalloc_node+0x60/0xc0
> > [c0000000fffb7d10] c0000000001ad4c8 .txInit+0x2a0/0x3a8
> > [c0000000fffb7e20] c00000000044c1ec .init_jfs_fs+0x78/0x27c
> > [c0000000fffb7ec0] c0000000000094c0 .init+0x1f4/0x3e4
> > [c0000000fffb7f90] c000000000027270 .kernel_thread+0x4c/0x68
>
> I only skimmed through this briefly but it looks like due to
> 52fd24ca1db3a741f144bbc229beefe044202cac __get_vm_area_node is passing
> GFP_HIGHMEM to kmem_cache_alloc_node which is a no-no.
>

I haven't been able to reproduce this, although I understand why it happens:
vmalloc allocates memory with

GFP_KERNEL | __GFP_HIGHMEM

and with git5, the same flags are passed down to cache_alloc_refill, causing
the BUG. The following patch against 2.6.19-rc3-git5 (also attached, as this
mailer may mess up inline copying) should fix it.

Note that when calling kmalloc_node, I am masking off __GFP_HIGHMEM with
GFP_LEVEL_MASK, whereas __vmalloc_area_node does the same with

~(__GFP_HIGHMEM | __GFP_ZERO).

IMHO, using GFP_LEVEL_MASK is preferable, but either should fix this problem.
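
To make the difference between the two masking styles concrete, here is a small userspace sketch. It is an illustration only: DEMO_LEVEL_MASK is a reduced, hypothetical stand-in for GFP_LEVEL_MASK, and the DEMO_* bit values are made up rather than copied from the kernel. It shows that both styles strip the highmem and zero bits, but only the allow-list style also strips a bit that neither expression names.

```c
#include <assert.h>

typedef unsigned int demo_gfp_t;

/* Hypothetical bit values, chosen for illustration only. */
#define DEMO_GFP_DMA     0x01u
#define DEMO_GFP_HIGHMEM 0x02u
#define DEMO_GFP_WAIT    0x10u
#define DEMO_GFP_IO      0x40u
#define DEMO_GFP_FS      0x80u
#define DEMO_GFP_ZERO    0x8000u
#define DEMO_GFP_KERNEL  (DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)

/* Reduced stand-in for an allow-list mask: behaviour bits only. */
#define DEMO_LEVEL_MASK  DEMO_GFP_KERNEL

/* Allow-list style: keep only the bits that are explicitly listed. */
demo_gfp_t demo_mask_allow(demo_gfp_t flags)
{
    return flags & DEMO_LEVEL_MASK;
}

/* Deny-list style: remove only the bits that are explicitly listed. */
demo_gfp_t demo_mask_deny(demo_gfp_t flags)
{
    return flags & ~(DEMO_GFP_HIGHMEM | DEMO_GFP_ZERO);
}

/* Usage: identical results for vmalloc-style flags, different results
 * once a bit outside both lists (here the DMA stand-in) is present. */
void demo_compare_masks(void)
{
    demo_gfp_t vmalloc_flags = DEMO_GFP_KERNEL | DEMO_GFP_HIGHMEM | DEMO_GFP_ZERO;

    assert(demo_mask_allow(vmalloc_flags) == DEMO_GFP_KERNEL);
    assert(demo_mask_deny(vmalloc_flags) == DEMO_GFP_KERNEL);
    assert(demo_mask_allow(vmalloc_flags | DEMO_GFP_DMA) == DEMO_GFP_KERNEL);
    assert(demo_mask_deny(vmalloc_flags | DEMO_GFP_DMA) ==
           (DEMO_GFP_KERNEL | DEMO_GFP_DMA));
}
```

Whether stripping such an extra bit is desirable in the real allocator is not something this sketch settles; it only shows why the two expressions are not interchangeable in general.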

Signed-off-by: Giridhar Pemmasani ([email protected])

diff -Naur linux-2.6.19-rc3-git5.orig/mm/vmalloc.c linux-2.6.19-rc3-git5/mm/vmalloc.c
--- linux-2.6.19-rc3-git5.orig/mm/vmalloc.c	2006-10-29 07:26:34.000000000 -0500
+++ linux-2.6.19-rc3-git5/mm/vmalloc.c	2006-10-29 07:28:12.000000000 -0500
@@ -182,7 +182,7 @@
 	addr = ALIGN(start, align);
 	size = PAGE_ALIGN(size);
 
-	area = kmalloc_node(sizeof(*area), gfp_mask, node);
+	area = kmalloc_node(sizeof(*area), gfp_mask & GFP_LEVEL_MASK, node);
 	if (unlikely(!area))
 		return NULL;





Attachments:
__get_vm_area_node-should-mask-off-gfp-highmem.patch (486.00 B)
16165293-__get_vm_area_node-should-mask-off-gfp-highmem.patch

2006-10-29 15:22:44

by Martin Bligh

Subject: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)

>> I only skimmed through this briefly but it looks like due to
>> 52fd24ca1db3a741f144bbc229beefe044202cac __get_vm_area_node is passing
>> GFP_HIGHMEM to kmem_cache_alloc_node which is a no-no.
>
> I haven't been able to reproduce this, although I understand why it happens:
> vmalloc allocates memory with
>
> GFP_KERNEL | __GFP_HIGHMEM
>
> and with git5, the same flags are passed down to cache_alloc_refill, causing
> the BUG. The following patch against 2.6.19-rc3-git5 (also attached,
> as this mailer may mess up inline copying) should fix it.

Thanks for the patch ... but more worrying is how this got broken.
Wasn't the point of having the -mm tree that patches like this went
through it for testing, so that we avoid breaking mainline? Especially
this late in the -rc cycle.

M.

2006-10-29 15:53:56

by Giridhar Pemmasani

Subject: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)

--- "Martin J. Bligh" <[email protected]> wrote:

> Thanks for the patch ... but more worrying is how this got broken.
> Wasn't the point of having the -mm tree that patches like this went
> through it for testing, and we avoid breaking mainline? especially
> this late in the -rc cycle.

I don't know how it got into Linus's tree, but the breakage was due to my
earlier patch - sorry.

Giri




2006-10-29 17:48:10

by Andy Whitcroft

Subject: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)

Andrew Morton wrote:
> On Sat, 28 Oct 2006 22:57:48 -0700
> "Martin J. Bligh" <[email protected]> wrote:
>
>> -git4 was fine. -git5 is broken (on PPC64 blade)
>>
>> As -rc2-mm2 seemed fine on this box, I'm guessing it's something
>> that didn't go via Andrew ;-( Looks like it might be something
>> JFS or slab specific. Bigger PPC64 box with different config
>> was OK though.
>>
>> Full log is here: http://test.kernel.org/abat/59046/debug/console.log
>> Good -git4 run: http://test.kernel.org/abat/58997/debug/console.log
>>
>> kernel BUG in cache_grow at mm/slab.c:2705!
>
> This?
>
> --- a/mm/vmalloc.c~__vmalloc_area_node-fix
> +++ a/mm/vmalloc.c
> @@ -428,7 +428,8 @@ void *__vmalloc_area_node(struct vm_stru
> area->nr_pages = nr_pages;
> /* Please note that the recursion is strictly bounded. */
> if (array_size > PAGE_SIZE) {
> - pages = __vmalloc_node(array_size, gfp_mask, PAGE_KERNEL, node);
> + pages = __vmalloc_node(array_size, gfp_mask & ~__GFP_HIGHMEM,
> + PAGE_KERNEL, node);
> area->flags |= VM_VPAGES;
> } else {
> pages = kmalloc_node(array_size,
> _

/me shoves it into the tests... results in a couple of hours.

-apw


2006-10-29 20:53:44

by Giridhar Pemmasani

Subject: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)

Nick Piggin wrote:

> Andrew Morton wrote:
>> --- a/mm/vmalloc.c~__vmalloc_area_node-fix
>> +++ a/mm/vmalloc.c
>> @@ -428,7 +428,8 @@ void *__vmalloc_area_node(struct vm_stru
>> area->nr_pages = nr_pages;
>> /* Please note that the recursion is strictly bounded. */
>> if (array_size > PAGE_SIZE) {
>> - pages = __vmalloc_node(array_size, gfp_mask, PAGE_KERNEL, node);
>> + pages = __vmalloc_node(array_size, gfp_mask & ~__GFP_HIGHMEM,
>> + PAGE_KERNEL, node);
>> area->flags |= VM_VPAGES;
>> } else {
>> pages = kmalloc_node(array_size,
>
> Don't you actually *want* the page array to be allocated from highmem? So
> the gfp mask here should be just for whether we're allowed to sleep /
> reclaim (ie gfp_mask & ~(__GFP_DMA|__GFP_DMA32) | (__GFP_HIGHMEM))?
>
> Slab allocations should be (gfp_mask &
> ~(__GFP_DMA|__GFP_DMA32|__GFP_HIGHMEM)), which you could mask in
> __get_vm_area_node
>

Since gfp_mask there would also have __GFP_ZERO, we need to mask that off too.
How about my earlier suggestion of masking off flags in __get_vm_area_node
with GFP_LEVEL_MASK?

Giri

PS: I am not sure if this mail gets to all recipients in the original
thread - I am not subscribed to lkml and I haven't found a way to reply to
all people and the group.

2006-10-29 23:00:18

by Martin Bligh

Subject: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)


>>> kernel BUG in cache_grow at mm/slab.c:2705!
>> This?
>>
>> --- a/mm/vmalloc.c~__vmalloc_area_node-fix
>> +++ a/mm/vmalloc.c
>> @@ -428,7 +428,8 @@ void *__vmalloc_area_node(struct vm_stru
>> area->nr_pages = nr_pages;
>> /* Please note that the recursion is strictly bounded. */
>> if (array_size > PAGE_SIZE) {
>> - pages = __vmalloc_node(array_size, gfp_mask, PAGE_KERNEL, node);
>> + pages = __vmalloc_node(array_size, gfp_mask & ~__GFP_HIGHMEM,
>> + PAGE_KERNEL, node);
>> area->flags |= VM_VPAGES;
>> } else {
>> pages = kmalloc_node(array_size,
>> _
>
> /me shoves it into the tests... results in a couple of hours.

Seems like that doesn't fix it, I'm afraid.

M.

2006-10-30 01:19:55

by Linus Torvalds

Subject: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)



On Sun, 29 Oct 2006, Martin J. Bligh wrote:
>
> Seems like that doesn't fix it, I'm afraid.

Does the one in the current -git tree? It's commit
5211e6e6c671f0d4b1e1a1023384d20227d8ee65, as below..

Linus

---
commit 5211e6e6c671f0d4b1e1a1023384d20227d8ee65
Author: Giridhar Pemmasani <[email protected]>
Date: Sun Oct 29 04:46:55 2006 -0800

[PATCH] Fix GFP_HIGHMEM slab panic

As reported by Martin J. Bligh <[email protected]>, we let through some
non-slab bits to slab allocation through __get_vm_area_node when doing a
vmalloc.

I haven't been able to reproduce this, although I understand why it
happens: vmalloc allocates memory with

GFP_KERNEL | __GFP_HIGHMEM

and commit 52fd24ca1db3a741f144bbc229beefe044202cac resulted in the same
flags being passed down to cache_alloc_refill, causing the BUG. The
following patch fixes it.

Note that when calling kmalloc_node, I am masking off __GFP_HIGHMEM with
GFP_LEVEL_MASK, whereas __vmalloc_area_node does the same with

~(__GFP_HIGHMEM | __GFP_ZERO).

IMHO, using GFP_LEVEL_MASK is preferable, but either should fix this
problem.

Signed-off-by: Giridhar Pemmasani ([email protected])
Cc: Martin J. Bligh <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6d381df..46606c1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -182,7 +182,7 @@ static struct vm_struct *__get_vm_area_n
 	addr = ALIGN(start, align);
 	size = PAGE_ALIGN(size);
 
-	area = kmalloc_node(sizeof(*area), gfp_mask, node);
+	area = kmalloc_node(sizeof(*area), gfp_mask & GFP_LEVEL_MASK, node);
 	if (unlikely(!area))
 		return NULL;

2006-10-30 09:54:24

by Andy Whitcroft

Subject: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)

Linus Torvalds wrote:
>
> On Sun, 29 Oct 2006, Martin J. Bligh wrote:
>> Seems like that doesn't fix it, I'm afraid.
>
> Does the one in the current -git tree? It's commit
> 5211e6e6c671f0d4b1e1a1023384d20227d8ee65, as below..
>
> Linus

Submitted that commit, results in a couple of hours.

-apw

>
> ---
> commit 5211e6e6c671f0d4b1e1a1023384d20227d8ee65
> Author: Giridhar Pemmasani <[email protected]>
> Date: Sun Oct 29 04:46:55 2006 -0800
>
> [PATCH] Fix GFP_HIGHMEM slab panic
>
> As reported by Martin J. Bligh <[email protected]>, we let through some
> non-slab bits to slab allocation through __get_vm_area_node when doing a
> vmalloc.
>
> I haven't been able to reproduce this, although I understand why it
> happens: vmalloc allocates memory with
>
> GFP_KERNEL | __GFP_HIGHMEM
>
> and commit 52fd24ca1db3a741f144bbc229beefe044202cac resulted in the same
> flags being passed down to cache_alloc_refill, causing the BUG. The
> following patch fixes it.
>
> Note that when calling kmalloc_node, I am masking off __GFP_HIGHMEM with
> GFP_LEVEL_MASK, whereas __vmalloc_area_node does the same with
>
> ~(__GFP_HIGHMEM | __GFP_ZERO).
>
> IMHO, using GFP_LEVEL_MASK is preferable, but either should fix this
> problem.
>
> Signed-off-by: Giridhar Pemmasani ([email protected])
> Cc: Martin J. Bligh <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 6d381df..46606c1 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -182,7 +182,7 @@ static struct vm_struct *__get_vm_area_n
> addr = ALIGN(start, align);
> size = PAGE_ALIGN(size);
>
> - area = kmalloc_node(sizeof(*area), gfp_mask, node);
> + area = kmalloc_node(sizeof(*area), gfp_mask & GFP_LEVEL_MASK, node);
> if (unlikely(!area))
> return NULL;
>

2006-10-30 15:36:27

by Andy Whitcroft

Subject: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)

Linus Torvalds wrote:
>
> On Sun, 29 Oct 2006, Martin J. Bligh wrote:
>> Seems like that doesn't fix it, I'm afraid.
>
> Does the one in the current -git tree? It's commit
> 5211e6e6c671f0d4b1e1a1023384d20227d8ee65, as below..
>
> Linus

Test results are back on the version of the slab panic fix which Linus
has committed in his tree. This change on top of 2.6.19-rc3-git5 is
good. 2.6.19-rc3-git6 is also showing good on this machine.

-apw

2006-10-30 15:47:20

by Pekka Enberg

Subject: Re: Re: Slab panic on 2.6.19-rc3-git5 (-git4 was OK)

On 10/30/06, Andy Whitcroft <[email protected]> wrote:
> Test results are back on the version of the slab panic fix which Linus
> has committed in his tree. This change on top of 2.6.19-rc3-git5 is
> good. 2.6.19-rc3-git6 is also showing good on this machine.

FWIW, the patch looks correct to me also.

Pekka