LinuxLists.cc - [linux-next-20130422] Bug in bootup code or debug code?

[permalink] [raw]

Subject: [linux-next-20130422] Bug in SLAB?

Tetsuo Handa wrote:
> Hello.
>
> linux-next-20130422 does not boot when built with CONFIG_DEBUG_SLAB=y &&
> CONFIG_DEBUG_SPINLOCK=y && CONFIG_DEBUG_PAGEALLOC=y .
>
> It hangs (with CPU#0 spinning) immediately after printing
>
> Decompressing Linux... Parsing ELF... done.
> Booting the kernel.
>
> lines.
>
> Config is at http://I-love.SAKURA.ne.jp/tmp/config-3.9-rc8-next-20130422 .
>
> Any clue before starting bisection?
>
> Regards.
>

Bisection (with a build fix from commit db845067 "slab: Fixup
CONFIG_PAGE_ALLOC/DEBUG_SLAB_LEAK sections") reached commit e3366016
"slab: Use common kmalloc_index/kmalloc_size functions".
Would you have a look at commit e3366016?

By the way, I have a fear regarding commit e3366016.

include/linux/slab_def.h says

extern struct kmem_cache *kmalloc_caches[PAGE_SHIFT + MAX_ORDER];
extern struct kmem_cache *kmalloc_dma_caches[PAGE_SHIFT + MAX_ORDER];

while mm/slab.c says

struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];

and include/linux/slab.h says

#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
(MAX_ORDER + PAGE_SHIFT - 1) : 25)

.

If (MAX_ORDER + PAGE_SHIFT - 1) <= 25 is true, mm/slab.c will allocate

struct kmem_cache *kmalloc_caches[MAX_ORDER + PAGE_SHIFT - 1 + 1];
struct kmem_cache *kmalloc_dma_caches[MAX_ORDER + PAGE_SHIFT - 1 + 1];

which matches the size declared in include/linux/slab_def.h .

If (MAX_ORDER + PAGE_SHIFT - 1) > 25 is true (I don't know whether such case
happens), mm/slab.c will allocate

struct kmem_cache *kmalloc_caches[25 + 1];
struct kmem_cache *kmalloc_dma_caches[25 + 1];

which is smaller than the size declared in include/linux/slab_def.h .

Well, this mismatch seems to be fixed by commit 9425c58e "slab: Common
definition for the array of kmalloc caches". But usage like

for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
struct kmem_cache_node *n;
struct kmem_cache *cache = kmalloc_caches[i];

is remaining.

Also, kmalloc_index() in include/linux/slab.h can return 0 to 26.

If (MAX_ORDER + PAGE_SHIFT - 1) > 25 is true and
kmalloc_index(64 * 1024 * 1024) is requested (I don't know whether such case
happens), kmalloc_caches[26] is beyond the array, for kmalloc_caches[26]
allows 0 to 25.

If (MAX_ORDER + PAGE_SHIFT - 1) <= 25 is true and
kmalloc_index(64 * 1024 * 1024) is requested (I don't know whether such case
happens), kmalloc_caches[26] is beyond the array, for
kmalloc_caches[MAX_ORDER + PAGE_SHIFT] allows 0 to MAX_ORDER + PAGE_SHIFT - 1.

Would you recheck that the array size is correct?

Regards.

2013-04-29 02:40:47

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Tetsuo Handa wrote:
> Also, kmalloc_index() in include/linux/slab.h can return 0 to 26.
>
> If (MAX_ORDER + PAGE_SHIFT - 1) > 25 is true and
> kmalloc_index(64 * 1024 * 1024) is requested (I don't know whether such case
> happens), kmalloc_caches[26] is beyond the array, for kmalloc_caches[26]
> allows 0 to 25.
>
> If (MAX_ORDER + PAGE_SHIFT - 1) <= 25 is true and
> kmalloc_index(64 * 1024 * 1024) is requested (I don't know whether such case
> happens), kmalloc_caches[26] is beyond the array, for
> kmalloc_caches[MAX_ORDER + PAGE_SHIFT] allows 0 to MAX_ORDER + PAGE_SHIFT - 1.
>
> Would you recheck that the array size is correct?
>

I confirmed (on x86_32) that

volatile unsigned int size = 8 * 1024 * 1024;
kmalloc(size, GFP_KERNEL);

causes no warning at compile time and returns NULL at runtime. But

unsigned int size = 8 * 1024 * 1024;
kmalloc(size, GFP_KERNEL);

causes compile time warning

include/linux/slab_def.h:136: warning: array subscript is above array bounds

and runtime bug.

BUG: unable to handle kernel NULL pointer dereference at 00000058
IP: [<c10b9d76>] kmem_cache_alloc+0x26/0xb0

I confirmed (on x86_32) that

kmalloc(64 * 1024 * 1024, GFP_KERNEL);

causes compile time warning

include/linux/slab_def.h:136: warning: array subscript is above array bounds

and runtime bug.

Kernel BUG at c10b9c5b [verbose debug info unavailable]
invalid opcode: 0000 [#1] SMP

Also,

volatile unsigned int size = 64 * 1024 * 1024;
kmalloc(size, GFP_KERNEL);

causes no warning at compile time but runtime bug.

Kernel BUG at c10b9c5b [verbose debug info unavailable]
invalid opcode: 0000 [#1] SMP

There are kernel modules which expect kmalloc() to return NULL rather than
oops when the requested size is too large.

2013-04-29 02:57:26

by Jianyu Zhan

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Thu, Apr 25, 2013 at 8:20 PM, Tetsuo Handa
<[email protected]> wrote:
> Bisection (with a build fix from commit db845067 "slab: Fixup
> CONFIG_PAGE_ALLOC/DEBUG_SLAB_LEAK sections") reached commit e3366016
> "slab: Use common kmalloc_index/kmalloc_size functions".
> Would you have a look at commit e3366016?

Cc: [email protected]

--

Regards,
Zhan Jianyu

2013-04-29 10:12:54

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Mon, Apr 29, 2013 at 5:40 AM, Tetsuo Handa
<[email protected]> wrote:
> Tetsuo Handa wrote:
>> Also, kmalloc_index() in include/linux/slab.h can return 0 to 26.
>>
>> If (MAX_ORDER + PAGE_SHIFT - 1) > 25 is true and
>> kmalloc_index(64 * 1024 * 1024) is requested (I don't know whether such case
>> happens), kmalloc_caches[26] is beyond the array, for kmalloc_caches[26]
>> allows 0 to 25.
>>
>> If (MAX_ORDER + PAGE_SHIFT - 1) <= 25 is true and
>> kmalloc_index(64 * 1024 * 1024) is requested (I don't know whether such case
>> happens), kmalloc_caches[26] is beyond the array, for
>> kmalloc_caches[MAX_ORDER + PAGE_SHIFT] allows 0 to MAX_ORDER + PAGE_SHIFT - 1.
>>
>> Would you recheck that the array size is correct?
>>
>
> I confirmed (on x86_32) that
>
> volatile unsigned int size = 8 * 1024 * 1024;
> kmalloc(size, GFP_KERNEL);
>
> causes no warning at compile time and returns NULL at runtime. But
>
> unsigned int size = 8 * 1024 * 1024;
> kmalloc(size, GFP_KERNEL);
>
> causes compile time warning
>
> include/linux/slab_def.h:136: warning: array subscript is above array bounds
>
> and runtime bug.
>
> BUG: unable to handle kernel NULL pointer dereference at 00000058
> IP: [<c10b9d76>] kmem_cache_alloc+0x26/0xb0
>
> I confirmed (on x86_32) that
>
> kmalloc(64 * 1024 * 1024, GFP_KERNEL);
>
> causes compile time warning
>
> include/linux/slab_def.h:136: warning: array subscript is above array bounds
>
> and runtime bug.
>
> Kernel BUG at c10b9c5b [verbose debug info unavailable]
> invalid opcode: 0000 [#1] SMP
>
> Also,
>
> volatile unsigned int size = 64 * 1024 * 1024;
> kmalloc(size, GFP_KERNEL);
>
> causes no warning at compile time but runtime bug.
>
> Kernel BUG at c10b9c5b [verbose debug info unavailable]
> invalid opcode: 0000 [#1] SMP
>
> There are kernel modules which expect kmalloc() to return NULL rather than
> oops when the requested size is too large.

Christoph, Glauber, it seems like commit e3366016 ("slab: Use common
kmalloc_index/kmalloc_size functions") is causing some problems here.
Can you please take a look?

Pekka

2013-04-29 14:43:48

by Glauber Costa

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On 04/29/2013 02:12 PM, Pekka Enberg wrote:
> On Mon, Apr 29, 2013 at 5:40 AM, Tetsuo Handa
> <[email protected]> wrote:
>> Tetsuo Handa wrote:
>>> Also, kmalloc_index() in include/linux/slab.h can return 0 to 26.
>>>
>>> If (MAX_ORDER + PAGE_SHIFT - 1) > 25 is true and
>>> kmalloc_index(64 * 1024 * 1024) is requested (I don't know whether such case
>>> happens), kmalloc_caches[26] is beyond the array, for kmalloc_caches[26]
>>> allows 0 to 25.
>>>
>>> If (MAX_ORDER + PAGE_SHIFT - 1) <= 25 is true and
>>> kmalloc_index(64 * 1024 * 1024) is requested (I don't know whether such case
>>> happens), kmalloc_caches[26] is beyond the array, for
>>> kmalloc_caches[MAX_ORDER + PAGE_SHIFT] allows 0 to MAX_ORDER + PAGE_SHIFT - 1.
>>>
>>> Would you recheck that the array size is correct?
>>>
>>
>> I confirmed (on x86_32) that
>>
>> volatile unsigned int size = 8 * 1024 * 1024;
>> kmalloc(size, GFP_KERNEL);
>>
>> causes no warning at compile time and returns NULL at runtime. But
>>
>> unsigned int size = 8 * 1024 * 1024;
>> kmalloc(size, GFP_KERNEL);
>>
>> causes compile time warning
>>
>> include/linux/slab_def.h:136: warning: array subscript is above array bounds
>>
>> and runtime bug.
>>
>> BUG: unable to handle kernel NULL pointer dereference at 00000058
>> IP: [<c10b9d76>] kmem_cache_alloc+0x26/0xb0
>>
>> I confirmed (on x86_32) that
>>
>> kmalloc(64 * 1024 * 1024, GFP_KERNEL);
>>
>> causes compile time warning
>>
>> include/linux/slab_def.h:136: warning: array subscript is above array bounds
>>
>> and runtime bug.
>>
>> Kernel BUG at c10b9c5b [verbose debug info unavailable]
>> invalid opcode: 0000 [#1] SMP
>>
>> Also,
>>
>> volatile unsigned int size = 64 * 1024 * 1024;
>> kmalloc(size, GFP_KERNEL);
>>
>> causes no warning at compile time but runtime bug.
>>
>> Kernel BUG at c10b9c5b [verbose debug info unavailable]
>> invalid opcode: 0000 [#1] SMP
>>
>> There are kernel modules which expect kmalloc() to return NULL rather than
>> oops when the requested size is too large.
>
> Christoph, Glauber, it seems like commit e3366016 ("slab: Use common
> kmalloc_index/kmalloc_size functions") is causing some problems here.
> Can you please take a look?
>
> Pekka
>
I believe this is because the code now always assume that the cache is
found when a constant is passed. Before this patch, we had a "found"
statement that was mistakenly removed.

If I am right, the following (untested) patch should solve the problem.

Attachments:

0001-slab-fix-kmalloc-regression-with-big-constant-alloca.patch (1.51 kB)

2013-04-29 15:12:00

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Mon, 29 Apr 2013, Glauber Costa wrote:

> >> causes no warning at compile time and returns NULL at runtime. But
> >>
> >> unsigned int size = 8 * 1024 * 1024;
> >> kmalloc(size, GFP_KERNEL);
> >>
> >> causes compile time warning
> >>
> >> include/linux/slab_def.h:136: warning: array subscript is above array bounds
> >>
> >> and runtime bug.

SLAB should have support up to 2 << 25 = 1 mb << 5 = 32M

> I believe this is because the code now always assume that the cache is
> found when a constant is passed. Before this patch, we had a "found"
> statement that was mistakenly removed.

The code in kmalloc_index() creates a BUG() and preferentially should
create a compile time failure when a number that is too big is passed to it.

What is MAX_ORDER on the architecture?

An allocation size of more than MAX_ORDER is not supported by the page
allocator or by slab. It is safe to return NULL in that case.

2013-04-29 15:15:57

by Glauber Costa

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On 04/29/2013 06:59 PM, Christoph Lameter wrote:
> The code in kmalloc_index() creates a BUG() and preferentially should
> create a compile time failure when a number that is too big is passed to it.
>
> What is MAX_ORDER on the architecture?

Returning NULL is fine, but kmalloc_index currently does not check for
MAX_ORDER at all.

2013-04-29 15:28:28

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Glauber Costa wrote:
> If I am right, the following (untested) patch should solve the problem.

This patch did not help;

kmalloc(8 * 1024 * 1024, GFP_KERNEL)

still causes both

include/linux/slab_def.h:136: warning: array subscript is above array bounds

and

BUG: unable to handle kernel NULL pointer dereference at 00000058
IP: [<c10b9d76>] kmem_cache_alloc+0x26/0xb0

.

Christoph Lameter wrote:
> What is MAX_ORDER on the architecture?

In my environment (x86_32), the constants are

MAX_ORDER=11 PAGE_SHIFT=12 KMALLOC_SHIFT_HIGH=22 KMALLOC_MAX_SIZE=4194304

2013-04-29 15:46:15

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Mon, 29 Apr 2013, Glauber Costa wrote:

> On 04/29/2013 06:59 PM, Christoph Lameter wrote:
> > The code in kmalloc_index() creates a BUG() and preferentially should
> > create a compile time failure when a number that is too big is passed to it.
> >
> > What is MAX_ORDER on the architecture?
>
> Returning NULL is fine, but kmalloc_index currently does not check for
> MAX_ORDER at all.

Could easily add that and BUG() on it?

2013-04-29 15:53:46

by Glauber Costa

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On 04/29/2013 07:46 PM, Christoph Lameter wrote:
> On Mon, 29 Apr 2013, Glauber Costa wrote:
>
>> On 04/29/2013 06:59 PM, Christoph Lameter wrote:
>>> The code in kmalloc_index() creates a BUG() and preferentially should
>>> create a compile time failure when a number that is too big is passed to it.
>>>
>>> What is MAX_ORDER on the architecture?
>>
>> Returning NULL is fine, but kmalloc_index currently does not check for
>> MAX_ORDER at all.
>
> Could easily add that and BUG() on it?
>
We could, but now I am intrigued by the fact that the patch I sent does
not fix Tetsuo's problem. 8M should be well within MAX_ORDER

2013-04-29 16:16:31

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Tetsuo Handa wrote:
> Glauber Costa wrote:
> > If I am right, the following (untested) patch should solve the problem.
>
> This patch did not help;
>
> kmalloc(8 * 1024 * 1024, GFP_KERNEL)
>
> still causes both
>
> include/linux/slab_def.h:136: warning: array subscript is above array bounds
>
> and
>
> BUG: unable to handle kernel NULL pointer dereference at 00000058
> IP: [<c10b9d76>] kmem_cache_alloc+0x26/0xb0
>
> .

I copied this patch (which modifies "static __always_inline void *kmalloc_node
(size_t size, gfp_t flags, int node)") to "static __always_inline void *kmalloc
(size_t size, gfp_t flags)", but it didn't help.

-------- test cases --------
volatile unsigned int size = 0;
void *ptr = kmalloc(size, GFP_KERNEL);
printk("kmalloc(0)=%p\n", ptr);
kfree(ptr);
for (size = 1; size <= 256 * 1024 * 1024; size *= 2) {
ptr = kmalloc(size, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", size, ptr);
kfree(ptr);
}
-------- test cases --------

kmalloc(0)=00000010
kmalloc(1)=de7eb840
kmalloc(2)=de7eb840
kmalloc(4)=de7eb840
kmalloc(8)=de7eb840
kmalloc(16)=de7eb840
kmalloc(32)=de7eb840
kmalloc(64)=de28ae40
kmalloc(128)=de5ba140
kmalloc(256)=de69e180
kmalloc(512)=dea14600
kmalloc(1024)=de522400
kmalloc(2048)=de1e4000
kmalloc(4096)=de9b3000
kmalloc(8192)=de24a000
kmalloc(16384)=de444000
kmalloc(32768)=de9b8000
kmalloc(65536)=dea20000
kmalloc(131072)=de980000
kmalloc(262144)=deb00000
kmalloc(524288)=dea80000
kmalloc(1048576)=de800000
kmalloc(2097152)=dde00000
kmalloc(4194304)=d5800000
kmalloc(8388608)= (null)
kmalloc(16777216)= (null)

Got BUG() at 32 * 1024 * 1024 bytes.

Kernel BUG at c10b9c9b [verbose debug info unavailable]
invalid opcode: 0000 [#1] SMP

There still seems to be a bug in the non-constant case.

> Christoph Lameter wrote:
> > What is MAX_ORDER on the architecture?
>
> In my environment (x86_32), the constants are
>
> MAX_ORDER=11 PAGE_SHIFT=12 KMALLOC_SHIFT_HIGH=22 KMALLOC_MAX_SIZE=4194304
>

I don't know if any, but on an architecture with PAGE_SHIFT + MAX_ORDER > 26,

static void init_node_lock_keys(int q)
{
int i;

if (slab_state < UP)
return;

for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
struct kmem_cache_node *n;
struct kmem_cache *cache = kmalloc_caches[i];

looks like out of bounds access due to

#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
(MAX_ORDER + PAGE_SHIFT - 1) : 25)

and

struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];

.

2013-04-29 17:48:45

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Tue, 30 Apr 2013, Tetsuo Handa wrote:

> Glauber Costa wrote:
> > If I am right, the following (untested) patch should solve the problem.
>
> This patch did not help;
>
> kmalloc(8 * 1024 * 1024, GFP_KERNEL)
>
> still causes both
>
> include/linux/slab_def.h:136: warning: array subscript is above array bounds
>
> and
>
> BUG: unable to handle kernel NULL pointer dereference at 00000058
> IP: [<c10b9d76>] kmem_cache_alloc+0x26/0xb0
>
> .
>
> Christoph Lameter wrote:
> > What is MAX_ORDER on the architecture?
>
> In my environment (x86_32), the constants are
>
> MAX_ORDER=11 PAGE_SHIFT=12 KMALLOC_SHIFT_HIGH=22 KMALLOC_MAX_SIZE=4194304
>

Ok so the maximum allocation is 11+12=23 which is 8M. KMALLOC_MAX_SIZE
amd KMALLOC_SHIFT_HIGH are wrong.

Take the -1 off the constants under #ifdef CONFIG_SLAB in

include/linux/slab.h
Index: linux/include/linux/slab.h
===================================================================
--- linux.orig/include/linux/slab.h 2013-04-29 12:44:42.339011800 -0500
+++ linux/include/linux/slab.h 2013-04-29 12:48:11.446435859 -0500
@@ -176,8 +176,8 @@ struct kmem_cache {
* to do various tricks to work around compiler limitations in order to
* ensure proper constant folding.
*/
-#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
- (MAX_ORDER + PAGE_SHIFT - 1) : 25)
+#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT) <= 26 ? \
+ (MAX_ORDER + PAGE_SHIFT) : 26)
#define KMALLOC_SHIFT_MAX KMALLOC_SHIFT_HIGH
#define KMALLOC_SHIFT_LOW 5
#else
@@ -206,9 +206,9 @@ struct kmem_cache {
#define KMALLOC_MIN_SIZE (1 << KMALLOC_SHIFT_LOW)
#endif

-extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
+extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH];
#ifdef CONFIG_ZONE_DMA
-extern struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];
+extern struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH];
#endif

/*

2013-04-29 21:46:16

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Christoph Lameter wrote:
> Ok so the maximum allocation is 11+12=23 which is 8M. KMALLOC_MAX_SIZE
> amd KMALLOC_SHIFT_HIGH are wrong.
>
> Take the -1 off the constants under #ifdef CONFIG_SLAB in

Current diff is:

---------- patch start ----------
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0c62175..889d6ef 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -189,8 +189,8 @@ struct kmem_cache {
* to do various tricks to work around compiler limitations in order to
* ensure proper constant folding.
*/
-#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
- (MAX_ORDER + PAGE_SHIFT - 1) : 25)
+#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT) <= 26 ? \
+ (MAX_ORDER + PAGE_SHIFT) : 26)
#define KMALLOC_SHIFT_MAX KMALLOC_SHIFT_HIGH
#ifndef KMALLOC_SHIFT_LOW
#define KMALLOC_SHIFT_LOW 5
@@ -221,9 +221,9 @@ struct kmem_cache {
#define KMALLOC_MIN_SIZE (1 << KMALLOC_SHIFT_LOW)
#endif

-extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
+extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH];
#ifdef CONFIG_ZONE_DMA
-extern struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];
+extern struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH];
#endif

/*
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 113ec08..be1446a 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -126,6 +126,9 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
if (!size)
return ZERO_SIZE_PTR;

+ if (size > KMALLOC_MAX_SIZE)
+ goto not_found;
+
i = kmalloc_index(size);

#ifdef CONFIG_ZONE_DMA
@@ -139,6 +142,7 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)

return ret;
}
+not_found:
return __kmalloc(size, flags);
}

@@ -172,8 +176,10 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
if (!size)
return ZERO_SIZE_PTR;

- i = kmalloc_index(size);
+ if (size > KMALLOC_MAX_SIZE)
+ goto not_found;

+ i = kmalloc_index(size);
#ifdef CONFIG_ZONE_DMA
if (flags & GFP_DMA)
cachep = kmalloc_dma_caches[i];
@@ -183,6 +189,7 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)

return kmem_cache_alloc_node_trace(cachep, flags, node, size);
}
+not_found:
return __kmalloc_node(size, flags, node);
}

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 2f0e7d5..083e7c7 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -319,11 +319,11 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
return s;
}

-struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
+struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH];
EXPORT_SYMBOL(kmalloc_caches);

#ifdef CONFIG_ZONE_DMA
-struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];
+struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH];
EXPORT_SYMBOL(kmalloc_dma_caches);
#endif

---------- patch end ----------

Current testcases are:

---------- testcases start ----------
volatile unsigned int size;
void *ptr;
ptr = kmalloc(0, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 0, ptr);
kfree(ptr);
ptr = kmalloc(1, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 1, ptr);
kfree(ptr);
ptr = kmalloc(2, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 2, ptr);
kfree(ptr);
ptr = kmalloc(4, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 4, ptr);
kfree(ptr);
ptr = kmalloc(8, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 8, ptr);
kfree(ptr);
ptr = kmalloc(16, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 16, ptr);
kfree(ptr);
ptr = kmalloc(32, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 32, ptr);
kfree(ptr);
ptr = kmalloc(64, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 64, ptr);
kfree(ptr);
ptr = kmalloc(128, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 128, ptr);
kfree(ptr);
ptr = kmalloc(256, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 256, ptr);
kfree(ptr);
ptr = kmalloc(512, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 512, ptr);
kfree(ptr);
ptr = kmalloc(1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 1024, ptr);
kfree(ptr);
ptr = kmalloc(2 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 2 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(4 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 4 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(8 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 8 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(16 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 16 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(32 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 32 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(64 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 64 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(128 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 128 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(256 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 256 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(512 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 512 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(1024 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 1024 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(2 * 1024 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 2 * 1024 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(4 * 1024 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 4 * 1024 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(8 * 1024 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 8 * 1024 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(16 * 1024 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 16 * 1024 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(32 * 1024 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 32 * 1024 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(64 * 1024 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 64 * 1024 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(128 * 1024 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 128 * 1024 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(256 * 1024 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 256 * 1024 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(512 * 1024 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 512 * 1024 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(1024 * 1024 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 1024 * 1024 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(2 * 1024 * 1024 * 1024, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 2 * 1024 * 1024 * 1024, ptr);
kfree(ptr);
ptr = kmalloc(2 * 1024 * 1024 * 1024 + 1, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", 2 * 1024 * 1024 * 1024 + 1, ptr);
kfree(ptr);
for (size = 0; size; size <<= 1) {
ptr = kmalloc(size, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", size, ptr);
kfree(ptr);
if (!size)
size = 1;
}
for (size = 1; size; size <<= 1) {
ptr = kmalloc(size + 1, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", size + 1, ptr);
kfree(ptr);
}
---------- testcases end ----------

The testcases still trigger BUG() at 32M:

---------- dmesg start ----------
(...snipped...)
kmalloc(2097152)=dde00000
kmalloc(4194304)=d9c00000
------------[ cut here ]------------
WARNING: at mm/page_alloc.c:2410 __alloc_pages_nodemask+0x179/0x650()
(...snipped...)
---[ end trace c08f36179e2d8ff2 ]---
SLAB: Unable to allocate memory on node 0 (gfp=0xd0)
cache: kmalloc-8388608, object size: 8388608, order: 11
node 0: slabs: 0/0, objs: 0/0, free: 0
kmalloc(8388608)= (null)
kmalloc(16777216)= (null)
------------[ cut here ]------------
Kernel BUG at c10b9c9b [verbose debug info unavailable]
invalid opcode: 0000 [#1] SMP
(...snipped...)
---------- dmesg end ----------

This means that redirecting to !__builtin_constant_p(size) case does not help.

2013-04-30 14:32:52

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Tue, 30 Apr 2013, Tetsuo Handa wrote:

> The testcases still trigger BUG() at 32M:

I thought we established that MAX_ORDER only allows a maximum of 8M sized
allocations? Why are you trying 32M?

The BUG() should trigger for these allocations.

2013-04-30 14:41:39

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Tue, 30 Apr 2013, Tetsuo Handa wrote:

> Current diff is:

[off by one stuff okay]

> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> index 113ec08..be1446a 100644
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -126,6 +126,9 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
> if (!size)
> return ZERO_SIZE_PTR;
>
> + if (size > KMALLOC_MAX_SIZE)
> + goto not_found;
> +
> i = kmalloc_index(size);

Why is this needed? kmalloc_index should BUG() for too large allocs.

2013-04-30 15:28:59

[permalink] [raw]

Subject: Fix off by one error in slab.h

Subject: Fix off by one error in slab.h

We ran into some strange issues as a result of an off by one isse in slab.h

The root of the issue is the treatment of KMALLOC_SHIFT_HIGH that is confusing.

Make KMALLOC_SHIFT_HIGH the first unsupported size instead of the last supported.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux/include/linux/slab.h
===================================================================
--- linux.orig/include/linux/slab.h 2013-04-30 09:54:23.636533564 -0500
+++ linux/include/linux/slab.h 2013-04-30 10:10:35.676932866 -0500
@@ -176,8 +176,8 @@ struct kmem_cache {
* to do various tricks to work around compiler limitations in order to
* ensure proper constant folding.
*/
-#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
- (MAX_ORDER + PAGE_SHIFT - 1) : 25)
+#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT) <= 26 ? \
+ (MAX_ORDER + PAGE_SHIFT) : 26)
#define KMALLOC_SHIFT_MAX KMALLOC_SHIFT_HIGH
#define KMALLOC_SHIFT_LOW 5
#else
@@ -185,7 +185,7 @@ struct kmem_cache {
* SLUB allocates up to order 2 pages directly and otherwise
* passes the request to the page allocator.
*/
-#define KMALLOC_SHIFT_HIGH (PAGE_SHIFT + 1)
+#define KMALLOC_SHIFT_HIGH (PAGE_SHIFT + 2)
#define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT)
#define KMALLOC_SHIFT_LOW 3
#endif
@@ -193,7 +193,7 @@ struct kmem_cache {
/* Maximum allocatable size */
#define KMALLOC_MAX_SIZE (1UL << KMALLOC_SHIFT_MAX)
/* Maximum size for which we actually use a slab cache */
-#define KMALLOC_MAX_CACHE_SIZE (1UL << KMALLOC_SHIFT_HIGH)
+#define KMALLOC_MAX_CACHE_SIZE ((1UL << (KMALLOC_SHIFT_HIGH -1)))
/* Maximum order allocatable via the slab allocagtor */
#define KMALLOC_MAX_ORDER (KMALLOC_SHIFT_MAX - PAGE_SHIFT)

@@ -206,9 +206,9 @@ struct kmem_cache {
#define KMALLOC_MIN_SIZE (1 << KMALLOC_SHIFT_LOW)
#endif

-extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
+extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH];
#ifdef CONFIG_ZONE_DMA
-extern struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];
+extern struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH];
#endif

/*
Index: linux/mm/slab_common.c
===================================================================
--- linux.orig/mm/slab_common.c 2013-04-30 09:54:23.636533564 -0500
+++ linux/mm/slab_common.c 2013-04-30 09:54:53.693039252 -0500
@@ -319,11 +319,11 @@ struct kmem_cache *__init create_kmalloc
return s;
}

-struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
+struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH];
EXPORT_SYMBOL(kmalloc_caches);

#ifdef CONFIG_ZONE_DMA
-struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH + 1];
+struct kmem_cache *kmalloc_dma_caches[KMALLOC_SHIFT_HIGH];
EXPORT_SYMBOL(kmalloc_dma_caches);
#endif

@@ -446,7 +446,7 @@ void __init create_kmalloc_caches(unsign
if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[2])
kmalloc_caches[2] = create_kmalloc_cache(NULL, 192, flags);

- for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++)
+ for (i = KMALLOC_SHIFT_LOW; i < KMALLOC_SHIFT_HIGH; i++)
if (!kmalloc_caches[i])
kmalloc_caches[i] = create_kmalloc_cache(NULL,
1 << i, flags);
@@ -454,7 +454,7 @@ void __init create_kmalloc_caches(unsign
/* Kmalloc array is now usable */
slab_state = UP;

- for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
+ for (i = 0; i < KMALLOC_SHIFT_HIGH; i++) {
struct kmem_cache *s = kmalloc_caches[i];
char *n;

@@ -467,7 +467,7 @@ void __init create_kmalloc_caches(unsign
}

#ifdef CONFIG_ZONE_DMA
- for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
+ for (i = 0; i < KMALLOC_SHIFT_HIGH; i++) {
struct kmem_cache *s = kmalloc_caches[i];

if (s) {

2013-04-30 16:02:14

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Christoph Lameter wrote:
> On Tue, 30 Apr 2013, Tetsuo Handa wrote:
>
> > The testcases still trigger BUG() at 32M:
>
> I thought we established that MAX_ORDER only allows a maximum of 8M sized
> allocations? Why are you trying 32M?

Only for regression testing. At least until Linux 3.9, requesting too large
size didn't trigger oops, did it? I'm not expecting kmalloc() to trigger oops
for Linux 3.10 and future kernels.

>
> The BUG() should trigger for these allocations.
>

"kmalloc() returning NULL for these allocations" is needed by "try kmalloc()
first, fallback to vmalloc()" allocation. There are kernel modules which expect
kmalloc() to return NULL rather than oops when the requested size is larger
than KMALLOC_MAX_SIZE bytes. If kmalloc() suddenly starts triggering oops, such
modules will break.

Anyway, there is a regression we want to fix : we won't be able to boot
Linux 3.10-rc1 for x86_32 built with CONFIG_DEBUG_SLAB=y &&
CONFIG_DEBUG_SPINLOCK=y && CONFIG_DEBUG_PAGEALLOC=y .
("Fix off by one error in slab.h" did not fix the regression.)

Regards.

2013-04-30 17:27:57

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Wed, 1 May 2013, Tetsuo Handa wrote:

> Christoph Lameter wrote:
> > On Tue, 30 Apr 2013, Tetsuo Handa wrote:
> >
> > > The testcases still trigger BUG() at 32M:
> >
> > I thought we established that MAX_ORDER only allows a maximum of 8M sized
> > allocations? Why are you trying 32M?
>
> Only for regression testing. At least until Linux 3.9, requesting too large
> size didn't trigger oops, did it? I'm not expecting kmalloc() to trigger oops
> for Linux 3.10 and future kernels.

It did for SLUB. SLAB returned NULL for some cases.

> "kmalloc() returning NULL for these allocations" is needed by "try kmalloc()
> first, fallback to vmalloc()" allocation. There are kernel modules which expect
> kmalloc() to return NULL rather than oops when the requested size is larger
> than KMALLOC_MAX_SIZE bytes. If kmalloc() suddenly starts triggering oops, such
> modules will break.

This behavior has been in there for years. Why try a kmalloc that
always fails since the size is too big?

> Anyway, there is a regression we want to fix : we won't be able to boot
> Linux 3.10-rc1 for x86_32 built with CONFIG_DEBUG_SLAB=y &&
> CONFIG_DEBUG_SPINLOCK=y && CONFIG_DEBUG_PAGEALLOC=y .
> ("Fix off by one error in slab.h" did not fix the regression.)

Hmm... Where does this fail? In slab?

2013-05-01 08:04:02

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On 4/30/13 5:34 PM, Christoph Lameter wrote:
> On Tue, 30 Apr 2013, Tetsuo Handa wrote:
>
>> Current diff is:
>
> [off by one stuff okay]
>
>> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
>> index 113ec08..be1446a 100644
>> --- a/include/linux/slab_def.h
>> +++ b/include/linux/slab_def.h
>> @@ -126,6 +126,9 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
>> if (!size)
>> return ZERO_SIZE_PTR;
>>
>> + if (size > KMALLOC_MAX_SIZE)
>> + goto not_found;
>> +
>> i = kmalloc_index(size);
>
> Why is this needed? kmalloc_index should BUG() for too large allocs.

Why is that? Historically it has returned NULL, hasn't it? We have had
cases where kernel code (naively) uses size directly from userspace and
we definitely don't want to BUG_ON on it.

Pekka

2013-05-01 08:05:34

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On 4/30/13 8:27 PM, Christoph Lameter wrote:
>> "kmalloc() returning NULL for these allocations" is needed by "try kmalloc()
>> first, fallback to vmalloc()" allocation. There are kernel modules which expect
>> kmalloc() to return NULL rather than oops when the requested size is larger
>> than KMALLOC_MAX_SIZE bytes. If kmalloc() suddenly starts triggering oops, such
>> modules will break.
>
> This behavior has been in there for years. Why try a kmalloc that
> always fails since the size is too big?

...because want the extra protection for cases where size is controlled
by userspace. This is consistent with kcalloc() that returns NULL on
integer overflow.

Pekka

2013-05-01 12:14:54

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Christoph Lameter wrote:
> > "kmalloc() returning NULL for these allocations" is needed by "try kmalloc()
> > first, fallback to vmalloc()" allocation. There are kernel modules which expect
> > kmalloc() to return NULL rather than oops when the requested size is larger
> > than KMALLOC_MAX_SIZE bytes. If kmalloc() suddenly starts triggering oops, such
> > modules will break.
>
> This behavior has been in there for years. Why try a kmalloc that
> always fails since the size is too big?
>

This is nothing but a testcase. Size argument is sometimes unknown and/or
user-controlled. We expect that not only kmalloc() etc. but also kstrdup(),
kmemdup(), krealloc() etc. do not trigger oops. I think that checking the size
in SLAB/SLUB is the only safe way.

> > Anyway, there is a regression we want to fix : we won't be able to boot
> > Linux 3.10-rc1 for x86_32 built with CONFIG_DEBUG_SLAB=y &&
> > CONFIG_DEBUG_SPINLOCK=y && CONFIG_DEBUG_PAGEALLOC=y .
> > ("Fix off by one error in slab.h" did not fix the regression.)
>
> Hmm... Where does this fail? In slab?
>
It hangs (with CPU#0 spinning) immediately after printing

Decompressing Linux... Parsing ELF... done.
Booting the kernel.

lines. Today I heard that gdb can be used if I use qemu, but I doubt that I can
manage time to understand and find the exact location within a few days.

The culprit location is possibly in SLAB because the kernel boots if built with
CONFIG_DEBUG_SLAB=n || CONFIG_DEBUG_SPINLOCK=n || CONFIG_DEBUG_PAGEALLOC=n.

2013-05-02 11:51:44

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Tetsuo Handa wrote:
> > Hmm... Where does this fail? In slab?
> >
> It hangs (with CPU#0 spinning) immediately after printing
>
> Decompressing Linux... Parsing ELF... done.
> Booting the kernel.
>
> lines. Today I heard that gdb can be used if I use qemu, but I doubt that I can
> manage time to understand and find the exact location within a few days.
>
> The culprit location is possibly in SLAB because the kernel boots if built with
> CONFIG_DEBUG_SLAB=n || CONFIG_DEBUG_SPINLOCK=n || CONFIG_DEBUG_PAGEALLOC=n.
>
It turned out that cachep->slabp_cache == NULL is the cause of boot failure.
cachep->slabp_cache == NULL is caused by kmalloc_caches[5] == NULL. Any clue?

int __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
{
(...snipped...)
if (flags & CFLGS_OFF_SLAB) {
cachep->slabp_cache = kmalloc_slab(slab_size, 0u);
/*
* This is a possibility for one of the malloc_sizes caches.
* But since we go off slab only for object size greater than
* PAGE_SIZE/8, and malloc_sizes gets created in ascending order,
* this should not happen at all.
* But leave a BUG_ON for some lucky dude.
*/
BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));
}
(...snipped...)
}

(gdb) b __find_general_cachep
Breakpoint 1 at 0xc10b6030: file mm/slab.c, line 679.
(gdb) c
Continuing.

Breakpoint 1, __find_general_cachep (size=32, gfpflags=0) at mm/slab.c:679
679 BUG_ON(kmalloc_caches[INDEX_AC] == NULL);
(gdb) s
671 {
(gdb)
679 BUG_ON(kmalloc_caches[INDEX_AC] == NULL);
(gdb)
681 if (!size)
(gdb)
684 i = kmalloc_index(size);
(gdb)
kmalloc_index (size=32, gfpflags=0) at include/linux/slab.h:208
208 if (size <= KMALLOC_MIN_SIZE)
(gdb)
__find_general_cachep (size=32, gfpflags=0) at mm/slab.c:692
692 if (unlikely(gfpflags & GFP_DMA))
(gdb)
695 return kmalloc_caches[i];
(gdb) print i
$1 = 5
(gdb) print kmalloc_caches[i]
$2 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[0]
$3 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[1]
$4 = (struct kmem_cache *) 0xf6800120
(gdb) print kmalloc_caches[2]
$5 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[3]
$6 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[4]
$7 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[5]
$8 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[6]
$9 = (struct kmem_cache *) 0xf6800080
(gdb) print kmalloc_caches[7]
$10 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[8]
$11 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[9]
$12 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[10]
$13 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[11]
$14 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[12]
$15 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[13]
$16 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[14]
$17 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[15]
$18 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[16]
$19 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[17]
$20 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[18]
$21 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[19]
$22 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[20]
$23 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[21]
$24 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[22]
$25 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[23]
$26 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[24]
$27 = (struct kmem_cache *) 0x0
(gdb) print kmalloc_caches[25]
$28 = (struct kmem_cache *) 0xf68001c0
(gdb) print kmalloc_caches[26]
$29 = (struct kmem_cache *) 0x0
(gdb) s
696 }
(gdb)
__kmem_cache_create (cachep=0xf6800260, flags=2147493888) at mm/slab.c:2493
2493 BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));
(gdb)
2485 cachep->slabp_cache = kmem_find_general_cachep(slab_size, 0u);
(gdb)
2493 BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));
(gdb)
^C
Program received signal SIGINT, Interrupt.
0xc1409afc in panic (fmt=0xc1511274 "Attempted to kill the idle task!") at kernel/panic.c:182
182 mdelay(PANIC_TIMER_STEP);
(gdb) bt
#0 0xc1409afc in panic (fmt=0xc1511274 "Attempted to kill the idle task!") at kernel/panic.c:182
#1 0xc1034a5f in do_exit (code=11) at kernel/exit.c:718
#2 0xc1005887 in oops_end (flags=70, regs=0xc158def0, signr=11) at arch/x86/kernel/dumpstack.c:249
#3 0xc10059af in die (str=0xc1502c45 "invalid opcode", regs=0xc158def0, err=0) at arch/x86/kernel/dumpstack.c:310
#4 0xc1002e25 in do_trap_no_signal (trapnr=6, signr=4, str=0xc1502c45 "invalid opcode", regs=0xc158def0, error_code=0, info=0xc158de60)
at arch/x86/kernel/traps.c:130
#5 do_trap (trapnr=6, signr=4, str=0xc1502c45 "invalid opcode", regs=0xc158def0, error_code=0, info=0xc158de60) at arch/x86/kernel/traps.c:145
#6 0xc10032a6 in do_invalid_op (regs=0xc158def0, error_code=0) at arch/x86/kernel/traps.c:213
#7 0xc140c742 in ?? () at arch/x86/kernel/entry_32.S:1318
#8 0xc10b91d5 in __kmem_cache_create (cachep=0xf6800260, flags=2147493888) at mm/slab.c:2493
#9 0xc15dfcde in create_boot_cache (s=0xf6800260, name=0xc15148be "kmalloc", size=192, flags=8192) at mm/slab_common.c:299
#10 0xc15dfd4b in create_kmalloc_cache (name=0xc15148be "kmalloc", size=192, flags=8192) at mm/slab_common.c:316
#11 0xc15e0aa2 in kmem_cache_init () at mm/slab.c:1652
#12 0xc15c972a in mm_init () at init/main.c:462
#13 start_kernel () at init/main.c:527
#14 0xc15c929f in i386_start_kernel () at arch/x86/kernel/head32.c:66
#15 0x00000000 in ?? ()

2013-05-02 15:12:23

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Wed, 1 May 2013, Pekka Enberg wrote:

> > This behavior has been in there for years. Why try a kmalloc that
> > always fails since the size is too big?
>
> ...because want the extra protection for cases where size is controlled by
> userspace. This is consistent with kcalloc() that returns NULL on integer
> overflow.

The slab allocator behavior since SLUB was introduced has been to BUG() on
allocations that can never be satisfied.

2013-05-02 15:13:57

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Wed, 1 May 2013, Pekka Enberg wrote:

> Why is that? Historically it has returned NULL, hasn't it? We have had cases
> where kernel code (naively) uses size directly from userspace and we
> definitely don't want to BUG_ON on it.

In that case the size is dynamic and then we return an error. In this case
the size is static and known at compile time. The allocation can never
succeed.

2013-05-02 20:54:00

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Wed, 1 May 2013, Tetsuo Handa wrote:

> The culprit location is possibly in SLAB because the kernel boots if built with
> CONFIG_DEBUG_SLAB=n || CONFIG_DEBUG_SPINLOCK=n || CONFIG_DEBUG_PAGEALLOC=n.

I have booted such a configuration just fine. Please have a look at the
kernel config that I send or send me yours.

Here is a patch that restores the old behavior for SLAB

Subject: slab: Return NULL for oversized allocations

The inline path seems to have changed the SLAB behavior for very large
kmalloc allocations. This patch restores the old behavior.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux/include/linux/slab_def.h
===================================================================
--- linux.orig/include/linux/slab_def.h 2013-05-02 15:02:45.864728115 -0500
+++ linux/include/linux/slab_def.h 2013-05-02 15:06:14.940474110 -0500
@@ -126,6 +126,9 @@ static __always_inline void *kmalloc(siz
if (!size)
return ZERO_SIZE_PTR;

+ if (size >= KMALLOC_MAX_SIZE)
+ return NULL;
+
i = kmalloc_index(size);

#ifdef CONFIG_ZONE_DMA
@@ -172,6 +175,9 @@ static __always_inline void *kmalloc_nod
if (!size)
return ZERO_SIZE_PTR;

+ if (size > KMALLOC_MAX_SIZE)
+ return NULL;
+
i = kmalloc_index(size);

#ifdef CONFIG_ZONE_DMA

2013-05-03 08:26:47

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Christoph Lameter wrote:
> I have booted such a configuration just fine. Please have a look at the
> kernel config that I send or send me yours.

OK. Here are complete steps to reproduce using QEMU on CentOS 6.4.

(1) Compile the kernel for x86_32 and run it on QEMU.

$ cd /tmp/
$ wget -O - https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/snapshot/next-20130426.tar.bz2 | tar -jxf -
$ cd next-20130426/
$ wget -O .config http://I-love.SAKURA.ne.jp/tmp/config-3.9-rc8-next-20130422
$ make -sj2 CONFIG_FRAME_POINTER=y CONFIG_DEBUG_INFO=y
$ /usr/libexec/qemu-kvm -no-kvm -cpu pentium3 -m 512 -nographic -kernel arch/x86/boot/bzImage -S -gdb tcp::1234

(2) Attach gdb to the QEMU process from another terminal and continue.

$ gdb /tmp/next-20130426/vmlinux
(gdb) target remote :1234
(gdb) c

Press Ctrl-C after waiting for ten seconds.
You will find that the kernel has panic()ed.

If one of CONFIG_DEBUG_SLAB/CONFIG_DEBUG_SPINLOCK/CONFIG_DEBUG_PAGEALLOC is
changed to n from the config above, the kernel boots fine. But the kernel
triggers oops for very large kmalloc allocations.

>
> Here is a patch that restores the old behavior for SLAB
>
>
>
> Subject: slab: Return NULL for oversized allocations
>
> The inline path seems to have changed the SLAB behavior for very large
> kmalloc allocations. This patch restores the old behavior.
>
> Signed-off-by: Christoph Lameter <[email protected]>
>

I don't think this patch is sufficient. There are functions (e.g. kstrdup())
which call variant functions (e.g. kmalloc_track_caller()). I think we need to
check size at functions which determine index from size (e.g. kmalloc_slab()).

>
> Index: linux/include/linux/slab_def.h
> ===================================================================
> --- linux.orig/include/linux/slab_def.h 2013-05-02 15:02:45.864728115 -0500
> +++ linux/include/linux/slab_def.h 2013-05-02 15:06:14.940474110 -0500
> @@ -126,6 +126,9 @@ static __always_inline void *kmalloc(siz
> if (!size)
> return ZERO_SIZE_PTR;
>
> + if (size >= KMALLOC_MAX_SIZE)
> + return NULL;
> +

Why not (size > KMALLOC_MAX_SIZE) ?

> i = kmalloc_index(size);
>
> #ifdef CONFIG_ZONE_DMA

2013-05-03 15:50:21

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Fri, 3 May 2013, Tetsuo Handa wrote:

> I don't think this patch is sufficient. There are functions (e.g. kstrdup())
> which call variant functions (e.g. kmalloc_track_caller()). I think we need to
> check size at functions which determine index from size (e.g. kmalloc_slab()).

Right.

> > Index: linux/include/linux/slab_def.h
> > ===================================================================
> > --- linux.orig/include/linux/slab_def.h 2013-05-02 15:02:45.864728115 -0500
> > +++ linux/include/linux/slab_def.h 2013-05-02 15:06:14.940474110 -0500
> > @@ -126,6 +126,9 @@ static __always_inline void *kmalloc(siz
> > if (!size)
> > return ZERO_SIZE_PTR;
> >
> > + if (size >= KMALLOC_MAX_SIZE)
> > + return NULL;
> > +
>
> Why not (size > KMALLOC_MAX_SIZE) ?

Correct.

We also would want some diagnostics as to who is doing these large allocs
so that these issues can be fixed. Updated patch:

Subject: slab: Return NULL for oversized allocations

The inline path seems to have changed the SLAB behavior for very large
kmalloc allocations. This patch restores the old behavior but also
adds diagnostics so that we can figure where in the code these
large allocations occur.

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux/include/linux/slab_def.h
===================================================================
--- linux.orig/include/linux/slab_def.h 2013-05-03 10:36:46.019564801 -0500
+++ linux/include/linux/slab_def.h 2013-05-03 10:37:28.860302188 -0500
@@ -126,6 +126,11 @@ static __always_inline void *kmalloc(siz
if (!size)
return ZERO_SIZE_PTR;

+ if (size > KMALLOC_MAX_SIZE) {
+ WARN_ON(1);
+ return NULL;
+ }
+
i = kmalloc_index(size);

#ifdef CONFIG_ZONE_DMA
@@ -172,6 +177,11 @@ static __always_inline void *kmalloc_nod
if (!size)
return ZERO_SIZE_PTR;

+ if (size > KMALLOC_MAX_SIZE) {
+ WARN_ON(1);
+ return NULL;
+ }
+
i = kmalloc_index(size);

#ifdef CONFIG_ZONE_DMA
Index: linux/mm/slab_common.c
===================================================================
--- linux.orig/mm/slab_common.c 2013-05-03 10:36:46.019564801 -0500
+++ linux/mm/slab_common.c 2013-05-03 10:38:29.045351837 -0500
@@ -373,6 +373,11 @@ struct kmem_cache *kmalloc_slab(size_t s
{
int index;

+ if (size > KMALLOC_MAX_SIZE) {
+ WARN_ON(1);
+ return NULL;
+ }
+
if (size <= 192) {
if (!size)
return ZERO_SIZE_PTR;

2013-05-03 18:04:22

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Enabling various debugging options increases the size of structures and
the subslab handling in SLABs kmem_cache_create will start to fail.

Here is a fix for that:

Subject: Fix bootstrap creation of kmalloc caches

For SLAB the kmalloc caches must be created in ascending sizes
in order for the OFF_SLAB sub-slab cache to work properly.

Create the non power of two caches immediately after the prior power
of two kmalloc cache. Do not create the non power of two caches
before all other caches.

Signed-off-by: Christoph Lamete <[email protected]>

Index: linux/mm/slab_common.c
===================================================================
--- linux.orig/mm/slab_common.c 2013-05-03 11:16:45.764382372 -0500
+++ linux/mm/slab_common.c 2013-05-03 12:59:40.431797020 -0500
@@ -444,18 +444,24 @@ void __init create_kmalloc_caches(unsign
for (i = 128 + 8; i <= 192; i += 8)
size_index[size_index_elem(i)] = 8;
}
- /* Caches that are not of the two-to-the-power-of size */
- if (KMALLOC_MIN_SIZE <= 32 && !kmalloc_caches[1])
- kmalloc_caches[1] = create_kmalloc_cache(NULL, 96, flags);
-
- if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[2])
- kmalloc_caches[2] = create_kmalloc_cache(NULL, 192, flags);
-
- for (i = KMALLOC_SHIFT_LOW; i < KMALLOC_SHIFT_HIGH; i++)
- if (!kmalloc_caches[i])
+ for (i = KMALLOC_SHIFT_LOW; i < KMALLOC_SHIFT_HIGH; i++) {
+ if (!kmalloc_caches[i]) {
kmalloc_caches[i] = create_kmalloc_cache(NULL,
1 << i, flags);

+ /*
+ * Caches that are not of the two-to-the-power-of size.
+ * These have to be created immediately after the
+ * earlier power of two caches
+ */
+ if (KMALLOC_MIN_SIZE <= 32 && !kmalloc_caches[1] && i == 6)
+ kmalloc_caches[1] = create_kmalloc_cache(NULL, 96, flags);
+
+ if (KMALLOC_MIN_SIZE <= 64 && !kmalloc_caches[2] && i == 7)
+ kmalloc_caches[2] = create_kmalloc_cache(NULL, 192, flags);
+ }
+ }
+
/* Kmalloc array is now usable */
slab_state = UP;

2013-05-03 18:49:18

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Applying

Subject: slab: Return NULL for oversized allocations
(Date: Fri, 3 May 2013 15:43:18 +0000)

and

Subject: Fix bootstrap creation of kmalloc caches
(Date: Fri, 3 May 2013 18:04:18 +0000)

on linux-next-20130426 made my kernel boots fine and passes the allocation
testcases. Thank you.

2013-05-03 19:28:16

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Sat, 4 May 2013, Tetsuo Handa wrote:

> Subject: slab: Return NULL for oversized allocations
> (Date: Fri, 3 May 2013 15:43:18 +0000)
>
> and
>
> Subject: Fix bootstrap creation of kmalloc caches
> (Date: Fri, 3 May 2013 18:04:18 +0000)
>
> on linux-next-20130426 made my kernel boots fine and passes the allocation
> testcases. Thank you.

Ok could I see the kernel logs with the warnings?

2013-05-04 00:16:05

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Christoph Lameter wrote:
> Ok could I see the kernel logs with the warnings?
Sure.

---------- testcase ----------
volatile unsigned int size;
for (size = 4 * 1024 * 1024; size <= 32 * 1024 * 1024; size <<= 1) {
void *ptr = kmalloc(size, GFP_KERNEL);
printk("kmalloc(%u)=%p\n", size, ptr);
kfree(ptr);
}
---------- testcase ----------

---------- before applying this patch ----------
kmalloc(4194304)=d9c00000
kmalloc(8388608)= (null)
kmalloc(16777216)= (null)
------------[ cut here ]------------
Kernel BUG at c10b9c5b [verbose debug info unavailable]
invalid opcode: 0000 [#1] SMP
Modules linked in: test(OF+) binfmt_misc
CPU: 0 PID: 3881 Comm: insmod Tainted: GF O 3.9.0-rc8-next-20130426 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 08/15/2008
task: df2a3b30 ti: de0a0000 task.ti: de0a0000
EIP: 0060:[<c10b9c5b>] EFLAGS: 00010202 CPU: 0
EIP is at cache_alloc_refill+0x56b/0x590
EAX: 00000040 EBX: df002680 ECX: 00000000 EDX: 00000010
ESI: df000ac0 EDI: 00000000 EBP: 00000001 ESP: de0a1e08
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: b769f3e0 CR3: 1f0fc000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Stack:
00000000 00000000 00000000 000000d0 00000000 00000010 00000040 00000246
000000d0 00000000 00000000 df000ac0 00000202 000000d0 c10b9d0e ffffffff
00000000 e0838040 00000001 00000000 e083a01c e0838024 01000000 00000000
Call Trace:
[<c10b9d0e>] ? __kmalloc+0x8e/0xd0
[<e083a01c>] ? test_init+0x1c/0x5c [test]
[<c1000102>] ? do_one_initcall+0x32/0x170
[<c10b3098>] ? __vunmap+0xa8/0xe0
[<e083a000>] ? 0xe0839fff
[<c106e066>] ? load_module+0x1196/0x1560
[<c106e7e0>] ? SyS_init_module+0xb0/0xd0
[<c143273a>] ? sysenter_do_call+0x12/0x22
Code: 85 c0 0f 84 05 ff ff ff 31 c9 89 ea 89 f0 e8 8d f9 ff ff e9 f5 fe ff ff 0f 0b eb fe 8b 6e 20 f7 c5 01 00 00 00 0f 84 95 fd ff ff <0f> 0b eb fe 0f 0b eb fe 0f b6 44 24 1f 89 da 8b 4c 24 20 89 04
EIP: [<c10b9c5b>] cache_alloc_refill+0x56b/0x590 SS:ESP 0068:de0a1e08
---[ end trace f0be5ebb0e2c1371 ]---
---------- before applying this patch ----------

---------- after applying this patch ----------
kmalloc(4194304)=d7000000
------------[ cut here ]------------
WARNING: at mm/slab_common.c:377 kmalloc_slab+0x57/0x70()
Modules linked in: test(OF+) binfmt_misc
CPU: 1 PID: 4337 Comm: insmod Tainted: GF W O 3.9.0-rc8-next-20130426-dirty #2
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 08/15/2008
c142f6a1 c102fbcb c1538e8c c153f938 00000179 c10a5197 c10a5197 d7000000
e1584040 00000001 000000d0 c102fc1b 00000009 00000000 c10a5197 d7000000
c10ba3da 00000286 d7000000 e1584040 00000001 00000000 e158601c e1584024
Call Trace:
[<c142f6a1>] ? dump_stack+0xa/0x19
[<c102fbcb>] ? warn_slowpath_common+0x6b/0xa0
[<c10a5197>] ? kmalloc_slab+0x57/0x70
[<c10a5197>] ? kmalloc_slab+0x57/0x70
[<c102fc1b>] ? warn_slowpath_null+0x1b/0x20
[<c10a5197>] ? kmalloc_slab+0x57/0x70
[<c10ba3da>] ? __kmalloc+0x1a/0xd0
[<e158601c>] ? test_init+0x1c/0x5c [test]
[<c1000102>] ? do_one_initcall+0x32/0x170
[<c10b30b8>] ? __vunmap+0xa8/0xe0
[<e1586000>] ? 0xe1585fff
[<c106e066>] ? load_module+0x1196/0x1560
[<c106e7e0>] ? SyS_init_module+0xb0/0xd0
[<c143273a>] ? sysenter_do_call+0x12/0x22
---[ end trace b6ab930b322ad480 ]---
kmalloc(8388608)= (null)
------------[ cut here ]------------
WARNING: at mm/slab_common.c:377 kmalloc_slab+0x57/0x70()
Modules linked in: test(OF+) binfmt_misc
CPU: 1 PID: 4337 Comm: insmod Tainted: GF W O 3.9.0-rc8-next-20130426-dirty #2
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 08/15/2008
c142f6a1 c102fbcb c1538e8c c153f938 00000179 c10a5197 c10a5197 00000000
e1584040 00000001 000000d0 c102fc1b 00000009 00000000 c10a5197 00000000
c10ba3da ffffffff 00000000 e1584040 00000001 00000000 e158601c e1584024
Call Trace:
[<c142f6a1>] ? dump_stack+0xa/0x19
[<c102fbcb>] ? warn_slowpath_common+0x6b/0xa0
[<c10a5197>] ? kmalloc_slab+0x57/0x70
[<c10a5197>] ? kmalloc_slab+0x57/0x70
[<c102fc1b>] ? warn_slowpath_null+0x1b/0x20
[<c10a5197>] ? kmalloc_slab+0x57/0x70
[<c10ba3da>] ? __kmalloc+0x1a/0xd0
[<e158601c>] ? test_init+0x1c/0x5c [test]
[<c1000102>] ? do_one_initcall+0x32/0x170
[<c10b30b8>] ? __vunmap+0xa8/0xe0
[<e1586000>] ? 0xe1585fff
[<c106e066>] ? load_module+0x1196/0x1560
[<c106e7e0>] ? SyS_init_module+0xb0/0xd0
[<c143273a>] ? sysenter_do_call+0x12/0x22
---[ end trace b6ab930b322ad481 ]---
kmalloc(16777216)= (null)
------------[ cut here ]------------
WARNING: at mm/slab_common.c:377 kmalloc_slab+0x57/0x70()
Modules linked in: test(OF+) binfmt_misc
CPU: 1 PID: 4337 Comm: insmod Tainted: GF W O 3.9.0-rc8-next-20130426-dirty #2
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 08/15/2008
c142f6a1 c102fbcb c1538e8c c153f938 00000179 c10a5197 c10a5197 00000000
e1584040 00000001 000000d0 c102fc1b 00000009 00000000 c10a5197 00000000
c10ba3da ffffffff 00000000 e1584040 00000001 00000000 e158601c e1584024
Call Trace:
[<c142f6a1>] ? dump_stack+0xa/0x19
[<c102fbcb>] ? warn_slowpath_common+0x6b/0xa0
[<c10a5197>] ? kmalloc_slab+0x57/0x70
[<c10a5197>] ? kmalloc_slab+0x57/0x70
[<c102fc1b>] ? warn_slowpath_null+0x1b/0x20
[<c10a5197>] ? kmalloc_slab+0x57/0x70
[<c10ba3da>] ? __kmalloc+0x1a/0xd0
[<e158601c>] ? test_init+0x1c/0x5c [test]
[<c1000102>] ? do_one_initcall+0x32/0x170
[<c10b30b8>] ? __vunmap+0xa8/0xe0
[<e1586000>] ? 0xe1585fff
[<c106e066>] ? load_module+0x1196/0x1560
[<c106e7e0>] ? SyS_init_module+0xb0/0xd0
[<c143273a>] ? sysenter_do_call+0x12/0x22
---[ end trace b6ab930b322ad482 ]---
kmalloc(33554432)= (null)
---------- after applying this patch ----------

2013-05-06 06:32:52

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Fri, May 3, 2013 at 9:04 PM, Christoph Lameter <[email protected]> wrote:
> Enabling various debugging options increases the size of structures and
> the subslab handling in SLABs kmem_cache_create will start to fail.
>
>
> Here is a fix for that:
>
> Subject: Fix bootstrap creation of kmalloc caches
>
> For SLAB the kmalloc caches must be created in ascending sizes
> in order for the OFF_SLAB sub-slab cache to work properly.
>
> Create the non power of two caches immediately after the prior power
> of two kmalloc cache. Do not create the non power of two caches
> before all other caches.
>
> Signed-off-by: Christoph Lamete <[email protected]>

This doesn't seem to apply against slab/next branch. What tree did you
use to generate the patch?

Pekka

2013-05-06 06:59:41

by Geert Uytterhoeven

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Fri, May 3, 2013 at 5:43 PM, Christoph Lameter <[email protected]> wrote:
> Subject: slab: Return NULL for oversized allocations
>
> The inline path seems to have changed the SLAB behavior for very large
> kmalloc allocations. This patch restores the old behavior but also
> adds diagnostics so that we can figure where in the code these
> large allocations occur.
>
> Signed-off-by: Christoph Lameter <[email protected]>
>
>
> Index: linux/include/linux/slab_def.h
> ===================================================================
> --- linux.orig/include/linux/slab_def.h 2013-05-03 10:36:46.019564801 -0500
> +++ linux/include/linux/slab_def.h 2013-05-03 10:37:28.860302188 -0500
> @@ -126,6 +126,11 @@ static __always_inline void *kmalloc(siz
> if (!size)
> return ZERO_SIZE_PTR;
>
> + if (size > KMALLOC_MAX_SIZE) {
> + WARN_ON(1);

As we were worried about this being triggered frm userspace, this needs
some rate limiting, to avoid flooding the kernel logs.

> + return NULL;
> + }
> +
> i = kmalloc_index(size);
>
> #ifdef CONFIG_ZONE_DMA
> @@ -172,6 +177,11 @@ static __always_inline void *kmalloc_nod
> if (!size)
> return ZERO_SIZE_PTR;
>
> + if (size > KMALLOC_MAX_SIZE) {
> + WARN_ON(1);

Idem ditto.

> + return NULL;
> + }
> +
> i = kmalloc_index(size);

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2013-05-06 07:27:49

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Mon, May 6, 2013 at 9:59 AM, Geert Uytterhoeven <[email protected]> wrote:
> On Fri, May 3, 2013 at 5:43 PM, Christoph Lameter <[email protected]> wrote:
>> Subject: slab: Return NULL for oversized allocations
>>
>> The inline path seems to have changed the SLAB behavior for very large
>> kmalloc allocations. This patch restores the old behavior but also
>> adds diagnostics so that we can figure where in the code these
>> large allocations occur.
>>
>> Signed-off-by: Christoph Lameter <[email protected]>
>>
>>
>> Index: linux/include/linux/slab_def.h
>> ===================================================================
>> --- linux.orig/include/linux/slab_def.h 2013-05-03 10:36:46.019564801 -0500
>> +++ linux/include/linux/slab_def.h 2013-05-03 10:37:28.860302188 -0500
>> @@ -126,6 +126,11 @@ static __always_inline void *kmalloc(siz
>> if (!size)
>> return ZERO_SIZE_PTR;
>>
>> + if (size > KMALLOC_MAX_SIZE) {
>> + WARN_ON(1);
>
> As we were worried about this being triggered frm userspace, this needs
> some rate limiting, to avoid flooding the kernel logs.

I changed it to WARN_ON_ONCE():

https://git.kernel.org/cgit/linux/kernel/git/penberg/linux.git/commit/?h=slab/next

2013-05-06 13:46:50

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Sat, 4 May 2013, Tetsuo Handa wrote:

> Christoph Lameter wrote:
> > Ok could I see the kernel logs with the warnings?
> Sure.

These are exclusively from the module load. So the kernel seems to be
clean of large kmalloc's ?

2013-05-06 13:47:56

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Mon, 6 May 2013, Pekka Enberg wrote:

> This doesn't seem to apply against slab/next branch. What tree did you
> use to generate the patch?

Slab/next from a couple of weeks ago.

2013-05-06 20:24:44

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Fri, May 3, 2013 at 9:04 PM, Christoph Lameter <[email protected]> wrote:
> - for (i = KMALLOC_SHIFT_LOW; i < KMALLOC_SHIFT_HIGH; i++)

This didn't match what I had in my tree. I fixed it by hand but please
verify the end result:

https://git.kernel.org/cgit/linux/kernel/git/penberg/linux.git/commit/?h=slab/next&id=8a965b3baa89ffedc73c0fbc750006c631012ced

2013-05-07 10:38:24

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Christoph Lameter wrote:
> On Sat, 4 May 2013, Tetsuo Handa wrote:
>
> > Christoph Lameter wrote:
> > > Ok could I see the kernel logs with the warnings?
> > Sure.
>
> These are exclusively from the module load. So the kernel seems to be
> clean of large kmalloc's ?
>
There are modules (e.g. TOMOYO) which do not check for KMALLOC_MAX_SIZE limit
and expect kmalloc() larger than KMALLOC_MAX_SIZE bytes to return NULL.

As far as I know, such modules sequentially double the buffer size. Therefore,
as long as request for KMALLOC_MAX_SIZE * 2 bytes returns NULL, they won't
trigger oops by requesting for KMALLOC_MAX_SIZE * 8 bytes.

The testcase I wrote is for module (I don't know if there is one) which
requests for KMALLOC_MAX_SIZE * 8 bytes without requesting for
KMALLOC_MAX_SIZE * 2 bytes and/or KMALLOC_MAX_SIZE * 4 bytes.

2013-05-07 14:28:53

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Tue, 7 May 2013, Tetsuo Handa wrote:

> > These are exclusively from the module load. So the kernel seems to be
> > clean of large kmalloc's ?
> >
> There are modules (e.g. TOMOYO) which do not check for KMALLOC_MAX_SIZE limit
> and expect kmalloc() larger than KMALLOC_MAX_SIZE bytes to return NULL.

Dont do that. Please fix these things. The slab allocators are for small
allocations not large ones like these. There are page allocator functions
that allow higher order allocations if contiguous memory is needed and
there is vmalloc that makes it safe. We recently added a CMA allocator
specifically for these large contiguous physical allocations.

> As far as I know, such modules sequentially double the buffer size. Therefore,
> as long as request for KMALLOC_MAX_SIZE * 2 bytes returns NULL, they won't
> trigger oops by requesting for KMALLOC_MAX_SIZE * 8 bytes.

Please use vmalloc for these large allocs.

2013-05-07 14:28:57

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Mon, 6 May 2013, Pekka Enberg wrote:

> On Fri, May 3, 2013 at 9:04 PM, Christoph Lameter <[email protected]> wrote:
> > - for (i = KMALLOC_SHIFT_LOW; i < KMALLOC_SHIFT_HIGH; i++)
>
> This didn't match what I had in my tree. I fixed it by hand but please
> verify the end result:
>
> https://git.kernel.org/cgit/linux/kernel/git/penberg/linux.git/commit/?h=slab/next&id=8a965b3baa89ffedc73c0fbc750006c631012ced
>
Well this is because you did not take the patch that changed the way
KMALLOC_SHIFT_HIGH is treated.

The patch above looks fine though.

2013-05-07 14:44:36

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Tue, May 7, 2013 at 5:23 PM, Christoph Lameter <[email protected]> wrote:
> Well this is because you did not take the patch that changed the way
> KMALLOC_SHIFT_HIGH is treated.

Is that still needed? I only took the ones Tetsuo said were needed to
fix the issue at hand.

Pekka

2013-05-07 15:47:52

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Tue, 7 May 2013, Pekka Enberg wrote:

> On Tue, May 7, 2013 at 5:23 PM, Christoph Lameter <[email protected]> wrote:
> > Well this is because you did not take the patch that changed the way
> > KMALLOC_SHIFT_HIGH is treated.
>
> Is that still needed? I only took the ones Tetsuo said were needed to
> fix the issue at hand.

If his tests work then I guess not. Queue for later.

2013-05-09 12:25:39

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Tetsuo Handa wrote:
> > Christoph Lameter wrote:
> > > What is MAX_ORDER on the architecture?
> >
> > In my environment (x86_32), the constants are
> >
> > MAX_ORDER=11 PAGE_SHIFT=12 KMALLOC_SHIFT_HIGH=22 KMALLOC_MAX_SIZE=4194304
> >
>
> I don't know if any, but on an architecture with PAGE_SHIFT + MAX_ORDER > 26,
>
> static void init_node_lock_keys(int q)
> {
> int i;
>
> if (slab_state < UP)
> return;
>
> for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
> struct kmem_cache_node *n;
> struct kmem_cache *cache = kmalloc_caches[i];
>
> looks like out of bounds access due to
>
> #define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
> (MAX_ORDER + PAGE_SHIFT - 1) : 25)
>
> and
>
> struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
>
> .
>

As of commit e0fd9aff on linux.git#master,
CONFIG_PPC_256K_PAGES=y (which makes PAGE_SHIFT 18) &&
CONFIG_FORCE_MAX_ZONEORDER=11 (which makes MAX_ORDER 11) &&
CONFIG_PROVE_LOCKING=y with below assertion

----------
diff --git a/mm/slab.c b/mm/slab.c
index 8ccd296..0401982 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -565,6 +565,7 @@ static void init_node_lock_keys(int q)
if (slab_state < UP)
return;

+ BUILD_BUG_ON(PAGE_SHIFT + MAX_ORDER != KMALLOC_SHIFT_HIGH + 1);
for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
struct kmem_cache_node *n;
struct kmem_cache *cache = kmalloc_caches[i];
----------

on powerpc triggers below error.

CC mm/slab.o
mm/slab.c: In function 'init_node_lock_keys':
mm/slab.c:568:2: error: call to '__compiletime_assert_568' declared with attribute error: BUILD_BUG_ON failed: PAGE_SHIFT + MAX_ORDER != KMALLOC_SHIFT_HIGH + 1
make[1]: *** [mm/slab.o] Error 1
make: *** [mm/slab.o] Error 2

This is an example of overrun at kmalloc_caches[26...PAGE_SHIFT+MAX_ORDER-1]
range which the compiler may not be able to detect.

2013-05-09 14:14:34

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Thu, 9 May 2013, Tetsuo Handa wrote:

> + BUILD_BUG_ON(PAGE_SHIFT + MAX_ORDER != KMALLOC_SHIFT_HIGH + 1);
> for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {

Yea looping to PAGE_SHIFT + MAX_ORDER is fundamentally wrong.

Subject: SLAB: Fix init_lock_keys()

init_lock_keys goes too far in initializing values in kmalloc_caches because
it assumed that the size of the kmalloc array goes up to MAX_ORDER. However, the size
of the kmalloc array for SLAB may be restricted due to increased page sizes or CONFIG_FORCE_MAX_ZONEORDER.

Reported-by: Tetsuo Handa <[email protected]>
Signed-off-by: Christoph Lameter <[email protected]>

Index: linux/mm/slab.c
===================================================================
--- linux.orig/mm/slab.c 2013-05-09 09:06:20.000000000 -0500
+++ linux/mm/slab.c 2013-05-09 09:08:08.338606055 -0500
@@ -565,7 +565,7 @@ static void init_node_lock_keys(int q)
if (slab_state < UP)
return;

- for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
+ for (i = 1; i =< KMALLOC_SHIFT_HIGH; i++) {
struct kmem_cache_node *n;
struct kmem_cache *cache = kmalloc_caches[i];

2013-05-09 21:54:16

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Christoph Lameter wrote:
> - for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
> + for (i = 1; i =< KMALLOC_SHIFT_HIGH; i++) {

mm/slab.c: In function 'init_node_lock_keys':
mm/slab.c:568:17: error: expected expression before '<' token

What I'm worrying is:

/*
* The largest kmalloc size supported by the SLAB allocators is
* 32 megabyte (2^25) or the maximum allocatable page order if that is
* less than 32 MB.
*
* WARNING: Its not easy to increase this value since the allocators have
* to do various tricks to work around compiler limitations in order to
* ensure proper constant folding.
*/
#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \
(MAX_ORDER + PAGE_SHIFT - 1) : 25)

extern struct kmem_cache *kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];

Can we manage with allocating only 26 elements when MAX_ORDER + PAGE_SHIFT > 26
(e.g. PAGE_SIZE == 256 * 1024) ?

Can kmalloc_index()/kmalloc_size()/kmalloc_slab() etc. work correctly when
MAX_ORDER + PAGE_SHIFT > 26 (e.g. PAGE_SIZE == 256 * 1024) ?

2013-05-10 12:38:59

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Tetsuo Handa wrote:
> Can we manage with allocating only 26 elements when MAX_ORDER + PAGE_SHIFT > 26
> (e.g. PAGE_SIZE == 256 * 1024) ?
>
> Can kmalloc_index()/kmalloc_size()/kmalloc_slab() etc. work correctly when
> MAX_ORDER + PAGE_SHIFT > 26 (e.g. PAGE_SIZE == 256 * 1024) ?
>
Today I compared SLAB/SLUB code. If I understood correctly, the line

if (size <= 64 * 1024 * 1024) return 26;

in kmalloc_index() is redundant (in fact, kmalloc_caches[26] is out of range)
and conflicts with what the comment

* The largest kmalloc size supported by the SLAB allocators is
* 32 megabyte (2^25) or the maximum allocatable page order if that is
* less than 32 MB.

says, and 0 <= kmalloc_index() <= 25 is always true for SLAB and
0 <= kmalloc_index() <= PAGE_SHIFT+1 is always true for SLUB.

Therefore, towards 3.10-rc1,

> > - for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
> > + for (i = 1; i =< KMALLOC_SHIFT_HIGH; i++) {
>
-+ for (i = 1; i =< KMALLOC_SHIFT_HIGH; i++) {
++ for (i = 1; i <= KMALLOC_SHIFT_HIGH; i++) {

would be the last fix for me. (I don't know why kmalloc_caches[0] is excluded.)

2013-05-10 15:57:33

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Fri, 10 May 2013, Tetsuo Handa wrote:

> Tetsuo Handa wrote:
> > Can we manage with allocating only 26 elements when MAX_ORDER + PAGE_SHIFT > 26
> > (e.g. PAGE_SIZE == 256 * 1024) ?
> >
> > Can kmalloc_index()/kmalloc_size()/kmalloc_slab() etc. work correctly when
> > MAX_ORDER + PAGE_SHIFT > 26 (e.g. PAGE_SIZE == 256 * 1024) ?
> >
> Today I compared SLAB/SLUB code. If I understood correctly, the line
>
> if (size <= 64 * 1024 * 1024) return 26;
>
> in kmalloc_index() is redundant (in fact, kmalloc_caches[26] is out of range)
> and conflicts with what the comment

True we could remove it but it does not hurt. There is a bounding of size
before any call to kmalloc_index.

> * The largest kmalloc size supported by the SLAB allocators is
> * 32 megabyte (2^25) or the maximum allocatable page order if that is
> * less than 32 MB.
>
> says, and 0 <= kmalloc_index() <= 25 is always true for SLAB and
> 0 <= kmalloc_index() <= PAGE_SHIFT+1 is always true for SLUB.
>
> Therefore, towards 3.10-rc1,
>
> > > - for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
> > > + for (i = 1; i =< KMALLOC_SHIFT_HIGH; i++) {
> >
> -+ for (i = 1; i =< KMALLOC_SHIFT_HIGH; i++) {
> ++ for (i = 1; i <= KMALLOC_SHIFT_HIGH; i++) {
>
> would be the last fix for me. (I don't know why kmalloc_caches[0] is excluded.)

Yep. kmalloc[0] is not used. The first cache to be used is 1 and 2 which
are the non power of two caches. 3 and higher are the power of two caches.

Subject: SLAB: Fix init_lock_keys

init_lock_keys goes too far in initializing values in kmalloc_caches because
it assumed that the size of the kmalloc array goes up to MAX_ORDER. However, the size
of the kmalloc array for SLAB may be restricted due to increased page sizes or CONFIG_FORCE_MAX_ZONEORDER.

Reported-by: Tetsuo Handa <[email protected]>
Signed-off-by: Christoph Lameter <[email protected]>

Index: linux/mm/slab.c
===================================================================
--- linux.orig/mm/slab.c 2013-05-09 09:06:20.000000000 -0500
+++ linux/mm/slab.c 2013-05-09 09:08:08.338606055 -0500
@@ -565,7 +565,7 @@ static void init_node_lock_keys(int q)
if (slab_state < UP)
return;

- for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
+ for (i = 1; i <= KMALLOC_SHIFT_HIGH; i++) {
struct kmem_cache_node *n;
struct kmem_cache *cache = kmalloc_caches[i];

2013-06-17 11:59:38

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Tetsuo Handa wrote:
> Christoph Lameter wrote:
> > Subject: SLAB: Fix init_lock_keys
> >
> > init_lock_keys goes too far in initializing values in kmalloc_caches because
> > it assumed that the size of the kmalloc array goes up to MAX_ORDER. However, the size
> > of the kmalloc array for SLAB may be restricted due to increased page sizes or CONFIG_FORCE_MAX_ZONEORDER.
> >
> > Reported-by: Tetsuo Handa <[email protected]>
> > Signed-off-by: Christoph Lameter <[email protected]>
> >
> > Index: linux/mm/slab.c
> > ===================================================================
> > --- linux.orig/mm/slab.c 2013-05-09 09:06:20.000000000 -0500
> > +++ linux/mm/slab.c 2013-05-09 09:08:08.338606055 -0500
> > @@ -565,7 +565,7 @@ static void init_node_lock_keys(int q)
> > if (slab_state < UP)
> > return;
> >
> > - for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
> > + for (i = 1; i <= KMALLOC_SHIFT_HIGH; i++) {
> > struct kmem_cache_node *n;
> > struct kmem_cache *cache = kmalloc_caches[i];
> >
> >
> Looks OK to me. Please send this one to 3.10-rcX.
>
It's already 3.10-rc6. Please be sure to send.

2013-07-01 20:09:08

by Andrew Morton

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Tue, 7 May 2013 14:28:49 +0000 Christoph Lameter <[email protected]> wrote:

> On Tue, 7 May 2013, Tetsuo Handa wrote:
>
> > > These are exclusively from the module load. So the kernel seems to be
> > > clean of large kmalloc's ?
> > >
> > There are modules (e.g. TOMOYO) which do not check for KMALLOC_MAX_SIZE limit
> > and expect kmalloc() larger than KMALLOC_MAX_SIZE bytes to return NULL.
>
> Dont do that. Please fix these things.

Slab should return NULL for a request greater than KMALLOC_MAX_SIZE.
For heaven's sake don't break that!

What's going on with this bug, btw? This:

--- a/mm/slab.c~slab-fix-init_lock_keys
+++ a/mm/slab.c
@@ -565,7 +565,7 @@ static void init_node_lock_keys(int q)
if (slab_state < UP)
return;

- for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
+ for (i = 1; i <= KMALLOC_SHIFT_HIGH; i++) {
struct kmem_cache_node *n;
struct kmem_cache *cache = kmalloc_caches[i];

still seems to be unapplied.

I've read through the thread trying to work out what the end-user
impact of that fix is, but it's all clear as mud. It's possible that
the end-user effect is `kernel locks up after printing "Booting the
kernel"'. Or maybe not.

And if the above patch does indeed fix something significant, we might
need a -stable backport.

Can we get some clarity here please?

2013-07-01 21:45:40

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Andrew Morton wrote:
> On Tue, 7 May 2013 14:28:49 +0000 Christoph Lameter <[email protected]> wrote:
>
> > On Tue, 7 May 2013, Tetsuo Handa wrote:
> >
> > > > These are exclusively from the module load. So the kernel seems to be
> > > > clean of large kmalloc's ?
> > > >
> > > There are modules (e.g. TOMOYO) which do not check for KMALLOC_MAX_SIZE limit
> > > and expect kmalloc() larger than KMALLOC_MAX_SIZE bytes to return NULL.
> >
> > Dont do that. Please fix these things.
>
> Slab should return NULL for a request greater than KMALLOC_MAX_SIZE.
> For heaven's sake don't break that!

The patch that fixes above things (commit 6286ae97) went to 3.10.

> What's going on with this bug, btw? This:
>
> --- a/mm/slab.c~slab-fix-init_lock_keys
> +++ a/mm/slab.c
> @@ -565,7 +565,7 @@ static void init_node_lock_keys(int q)
> if (slab_state < UP)
> return;
>
> - for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
> + for (i = 1; i <= KMALLOC_SHIFT_HIGH; i++) {
> struct kmem_cache_node *n;
> struct kmem_cache *cache = kmalloc_caches[i];
>
>
> still seems to be unapplied.
>
The patch that fixes oops and panic on early boot on architectures with
PAGE_SHIFT + MAX_ORDER > 26 missed 3.10.

> I've read through the thread trying to work out what the end-user
> impact of that fix is, but it's all clear as mud. It's possible that
> the end-user effect is `kernel locks up after printing "Booting the
> kernel"'. Or maybe not.
>
> And if the above patch does indeed fix something significant, we might
> need a -stable backport.
>

Somebody needs this patch when debugging with CONFIG_LOCKDEP=y on
architectures with PAGE_SHIFT + MAX_ORDER > 26 .

> Can we get some clarity here please?
>

Thank you for adding to -mm. But please delete

Tetsuo said:

: It hangs (with CPU#0 spinning) immediately after printing
:
: Decompressing Linux... Parsing ELF... done.
: Booting the kernel.
:
: lines.

lines from "+ slab-fix-init_lock_keys.patch added to -mm tree", for
these lines are fixed by commit 8a965b3b. Though the same symptom would
appear if hitting this PAGE_SHIFT + MAX_ORDER > 26 bug, I can't confirm
the symptom for environments which I don't have.

2013-07-01 21:53:59

by Andrew Morton

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Tue, 2 Jul 2013 06:45:27 +0900 Tetsuo Handa <[email protected]> wrote:

>
> > I've read through the thread trying to work out what the end-user
> > impact of that fix is, but it's all clear as mud. It's possible that
> > the end-user effect is `kernel locks up after printing "Booting the
> > kernel"'. Or maybe not.
> >
> > And if the above patch does indeed fix something significant, we might
> > need a -stable backport.
> >
>
> Somebody needs this patch when debugging with CONFIG_LOCKDEP=y on
> architectures with PAGE_SHIFT + MAX_ORDER > 26 .

Well *why* do they need it? What happens without the patch? How would
a person determine whether their kernel needs this patch?

When this patch crosses Greg's desk for -stable inclusion he's going to
wonder "why do users of -stable kernels need this", and you guys
haven't told him!

Grumble. Why is it so hard to get a simple and decent changelog for
this patch?

Look, I'll make this easier:

: Subject: slab: fix init_lock_keys
:
: In 3.10 kernels with CONFIG_LOCKDEP=y on architectures with
: PAGE_SHIFT + MAX_ORDER > 26 such as [architecture goes here], the kernel does
: [x] when the user does [y].
:
: init_lock_keys() goes too far in initializing values in kmalloc_caches
: because it assumed that the size of the kmalloc array goes up to
: MAX_ORDER. However, the size of the kmalloc array for SLAB may be
: restricted due to increased page sizes or CONFIG_FORCE_MAX_ZONEORDER.
:
: Fix this by [z].

Please fill in the text within [].

2013-07-02 12:49:43

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

Andrew Morton wrote:
> Look, I'll make this easier:
>
> : Subject: slab: fix init_lock_keys
> :
> : In 3.10 kernels with CONFIG_LOCKDEP=y on architectures with
> : PAGE_SHIFT + MAX_ORDER > 26 such as [architecture goes here], the kernel does
> : [x] when the user does [y].
> :
> : init_lock_keys() goes too far in initializing values in kmalloc_caches
> : because it assumed that the size of the kmalloc array goes up to
> : MAX_ORDER. However, the size of the kmalloc array for SLAB may be
> : restricted due to increased page sizes or CONFIG_FORCE_MAX_ZONEORDER.
> :
> : Fix this by [z].
>
>
> Please fill in the text within [].
>
OK. I made from http://marc.info/?l=linux-kernel&m=136810234704350&w=2 .
-----
Some architectures (e.g. powerpc built with CONFIG_PPC_256K_PAGES=y
CONFIG_FORCE_MAX_ZONEORDER=11) get PAGE_SHIFT + MAX_ORDER > 26.

In 3.10 kernels, CONFIG_LOCKDEP=y with PAGE_SHIFT + MAX_ORDER > 26 makes
init_lock_keys() dereference beyond kmalloc_caches[26].
This leads to an unbootable system (kernel panic at initializing SLAB)
if one of kmalloc_caches[26...PAGE_SHIFT+MAX_ORDER-1] is not NULL.

Fix this by making sure that init_lock_keys() does not dereference beyond
kmalloc_caches[26] arrays.
-----

2013-07-02 19:12:13

by Andrew Morton

[permalink] [raw]

Subject: Re: [linux-next-20130422] Bug in SLAB?

On Tue, 2 Jul 2013 21:49:26 +0900 Tetsuo Handa <[email protected]> wrote:

> Some architectures (e.g. powerpc built with CONFIG_PPC_256K_PAGES=y
> CONFIG_FORCE_MAX_ZONEORDER=11) get PAGE_SHIFT + MAX_ORDER > 26.
>
> In 3.10 kernels, CONFIG_LOCKDEP=y with PAGE_SHIFT + MAX_ORDER > 26 makes
> init_lock_keys() dereference beyond kmalloc_caches[26].
> This leads to an unbootable system (kernel panic at initializing SLAB)
> if one of kmalloc_caches[26...PAGE_SHIFT+MAX_ORDER-1] is not NULL.
>
> Fix this by making sure that init_lock_keys() does not dereference beyond
> kmalloc_caches[26] arrays.

Nice, thanks. Pekka, please grab.

From: Christoph Lameter <[email protected]>
Subject: slab: fix init_lock_keys

Some architectures (e.g. powerpc built with CONFIG_PPC_256K_PAGES=y
CONFIG_FORCE_MAX_ZONEORDER=11) get PAGE_SHIFT + MAX_ORDER > 26.

In 3.10 kernels, CONFIG_LOCKDEP=y with PAGE_SHIFT + MAX_ORDER > 26 makes
init_lock_keys() dereference beyond kmalloc_caches[26].
This leads to an unbootable system (kernel panic at initializing SLAB)
if one of kmalloc_caches[26...PAGE_SHIFT+MAX_ORDER-1] is not NULL.

Fix this by making sure that init_lock_keys() does not dereference beyond
kmalloc_caches[26] arrays.

Signed-off-by: Christoph Lameter <[email protected]>
Reported-by: Tetsuo Handa <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: <[email protected]> [3.10.x]
Signed-off-by: Andrew Morton <[email protected]>
---

mm/slab.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN mm/slab.c~slab-fix-init_lock_keys mm/slab.c
--- a/mm/slab.c~slab-fix-init_lock_keys
+++ a/mm/slab.c
@@ -565,7 +565,7 @@ static void init_node_lock_keys(int q)
if (slab_state < UP)
return;

- for (i = 1; i < PAGE_SHIFT + MAX_ORDER; i++) {
+ for (i = 1; i <= KMALLOC_SHIFT_HIGH; i++) {
struct kmem_cache_node *n;
struct kmem_cache *cache = kmalloc_caches[i];

_

2013-07-07 15:58:58