LinuxLists.cc - repeatable slab corruption with LTP msgctl08

2008-06-12 05:15:11

Subject: repeatable slab corruption with LTP msgctl08

Running current mainline on my old 2-way PIII. Distro is RH FC1. LTP
version is ltp-full-20070228 (lots of retro-computing there).

Config is at http://userweb.kernel.org/~akpm/config-vmm.txt

./testcases/bin/msgctl08 crashes after ten minutes or so:

slab: Internal list corruption detected in cache 'size-128'(26), slabp f2905000(20). Hexdump:

000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
080: 10 00 00 00
------------[ cut here ]------------
kernel BUG at mm/slab.c:2949!
invalid opcode: 0000 [#1] SMP
Modules linked in:

Pid: 3348, comm: msgctl08 Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c017a35b>] EFLAGS: 00010086 CPU: 0
EIP is at check_slabp+0xeb/0x100
EAX: 00000001 EBX: f2905083 ECX: 00000001 EDX: f20ee670
ESI: f2905000 EDI: 00000084 EBP: f4671e88 ESP: f4671e64
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process msgctl08 (pid: 3348, ti=f4670000 task=f20ee670 task.ti=f4670000)
Stack: c0472f2b 00000000 0000001a f2905000 00000014 f7c01500 ffffffff 0000000e
f2905000 f4671eec c017b69f 00000010 000000d0 f20ee670 f7c032ac 000000d0
f7c01500 0000000e f7c03288 f7c06df0 f29ec088 00000098 00000000 00000000
Call Trace:
[<c017b69f>] ? cache_alloc_refill+0xcf/0x6b0
[<c017bdd4>] ? __kmalloc+0x154/0x160
[<c0257663>] ? load_msg+0x33/0x150
[<c0257663>] ? load_msg+0x33/0x150
[<c0257dfb>] ? do_msgsnd+0x17b/0x2e0
[<c0257cc9>] ? do_msgsnd+0x49/0x2e0
[<c0126f1f>] ? __do_softirq+0x6f/0x100
[<c0126e58>] ? _local_bh_enable+0x48/0xa0
[<c0257f92>] ? sys_msgsnd+0x32/0x40
[<c0106e12>] ? sys_ipc+0xb2/0x240
[<c0102f58>] ? sysenter_past_esp+0xa5/0xb1
[<c0102f1d>] ? sysenter_past_esp+0x6a/0xb1
=======================
Code: 86 fa ff 8b 55 f0 8b 42 38 8d 04 85 1c 00 00 00 39 f8 76 0b 43 f7 c7 0f 00 00 00 75 d2 eb bd c7 04 24 2b 2f 47 c0 e8 85 86 fa ff <0f> 0b eb fe 83 c4 18 5b 5e 5f 5d c3 8b 56 10 e9 6b ff ff ff 90
EIP: [<c017a35b>] check_slabp+0xeb/0x100 SS:ESP 0068:f4671e64
---[ end trace d7a2cbbb5a3654be ]---

Pekka (or Christoph): could you please decrypt the slab bits for us?

Thanks.

2008-06-12 06:35:29

by Andrew Morton

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

On Wed, 11 Jun 2008 22:13:24 -0700 Andrew Morton <[email protected]> wrote:

> Running current mainline on my old 2-way PIII. Distro is RH FC1. LTP
> version is ltp-full-20070228 (lots of retro-computing there).
>
> Config is at http://userweb.kernel.org/~akpm/config-vmm.txt
>
>
> ./testcases/bin/msgctl08 crashes after ten minutes or so:

ah, it runs to completion in about ten seconds on 2.6.25, so it'll be
easy for someone to bisect it.

What's that? Sigh. OK. I wasn't doing anything much anyway.

2008-06-12 08:03:54

by Andrew Morton

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

On Wed, 11 Jun 2008 23:34:49 -0700 Andrew Morton <[email protected]> wrote:

> On Wed, 11 Jun 2008 22:13:24 -0700 Andrew Morton <[email protected]> wrote:
>
> > Running current mainline on my old 2-way PIII. Distro is RH FC1. LTP
> > version is ltp-full-20070228 (lots of retro-computing there).
> >
> > Config is at http://userweb.kernel.org/~akpm/config-vmm.txt
> >
> >
> > ./testcases/bin/msgctl08 crashes after ten minutes or so:
>
> ah, it runs to completion in about ten seconds on 2.6.25, so it'll be
> easy for someone to bisect it.
>
> What's that? Sigh. OK. I wasn't doing anything much anyway.

Oh drat. git-bisect tells me that this one-year-old msgctl08's
execution time vastly increased when we added

commit f7bf3df8be72d98afa84f5ff183e14c1ba1e560d
Author: Nadia Derbey <[email protected]>
Date: Tue Apr 29 01:00:39 2008 -0700

ipc: scale msgmni to the amount of lowmem

But we already knew that, and LTP got changed to fix it.

So I was wrong in assuming that the long-execution-time correlates with
the slab-corruption bug.

And the slab corruption bug takes half an hour to reproduce and an
unknown amount of time to not-reproduce. I don't think I'll be able to
complete this before I disappear for over a week.

2008-06-12 08:16:37

by Andrew Morton

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

On Thu, 12 Jun 2008 01:02:00 -0700 Andrew Morton <[email protected]> wrote:

> On Wed, 11 Jun 2008 23:34:49 -0700 Andrew Morton <[email protected]> wrote:
>
> > On Wed, 11 Jun 2008 22:13:24 -0700 Andrew Morton <[email protected]> wrote:
> >
> > > Running current mainline on my old 2-way PIII. Distro is RH FC1. LTP
> > > version is ltp-full-20070228 (lots of retro-computing there).
> > >
> > > Config is at http://userweb.kernel.org/~akpm/config-vmm.txt
> > >
> > >
> > > ./testcases/bin/msgctl08 crashes after ten minutes or so:
> >
> > ah, it runs to completion in about ten seconds on 2.6.25, so it'll be
> > easy for someone to bisect it.
> >
> > What's that? Sigh. OK. I wasn't doing anything much anyway.
>
> Oh drat. git-bisect tells me that this one-year-old msgctl08's
> execution time vastly increased when we added
>
> commit f7bf3df8be72d98afa84f5ff183e14c1ba1e560d
> Author: Nadia Derbey <[email protected]>
> Date: Tue Apr 29 01:00:39 2008 -0700
>
> ipc: scale msgmni to the amount of lowmem
>
>
> But we already knew that, and LTP got changed to fix it.
>
> So I was wrong in assuming that the long-execution-time correlates with
> the slab-corruption bug.
>
> And the slab corruption bug takes half an hour to reproduce and an
> unknown amount of time to not-reproduce. I don't think I'll be able to
> complete this before I disappear for over a week.
>

Doing

echo 16 > /proc/sys/kernel/msgmni

on the tree at the above bisection point
(f7bf3df8be72d98afa84f5ff183e14c1ba1e560d is most-recently-applied
patch), msgctl08 runs to completion in 15.7 seconds.

Doing the same thing on 2.6.26-rc5-mm3, msgctl08 also runs to
completion, in 20.9 seconds. So

- it got slower

- the crash requires the huge value of /proc/sys/kernel/msgmni (1746)
to reproduce.

2008-06-12 08:17:49

by Andrew Morton

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

On Thu, 12 Jun 2008 01:15:37 -0700 Andrew Morton <[email protected]> wrote:

> On Thu, 12 Jun 2008 01:02:00 -0700 Andrew Morton <[email protected]> wrote:
>
> > On Wed, 11 Jun 2008 23:34:49 -0700 Andrew Morton <[email protected]> wrote:
> >
> > > On Wed, 11 Jun 2008 22:13:24 -0700 Andrew Morton <[email protected]> wrote:
> > >
> > > > Running current mainline on my old 2-way PIII. Distro is RH FC1. LTP
> > > > version is ltp-full-20070228 (lots of retro-computing there).
> > > >
> > > > Config is at http://userweb.kernel.org/~akpm/config-vmm.txt
> > > >
> > > >
> > > > ./testcases/bin/msgctl08 crashes after ten minutes or so:
> > >
> > > ah, it runs to completion in about ten seconds on 2.6.25, so it'll be
> > > easy for someone to bisect it.
> > >
> > > What's that? Sigh. OK. I wasn't doing anything much anyway.
> >
> > Oh drat. git-bisect tells me that this one-year-old msgctl08's
> > execution time vastly increased when we added
> >
> > commit f7bf3df8be72d98afa84f5ff183e14c1ba1e560d
> > Author: Nadia Derbey <[email protected]>
> > Date: Tue Apr 29 01:00:39 2008 -0700
> >
> > ipc: scale msgmni to the amount of lowmem
> >
> >
> > But we already knew that, and LTP got changed to fix it.
> >
> > So I was wrong in assuming that the long-execution-time correlates with
> > the slab-corruption bug.
> >
> > And the slab corruption bug takes half an hour to reproduce and an
> > unknown amount of time to not-reproduce. I don't think I'll be able to
> > complete this before I disappear for over a week.
> >
>
> Doing
>
> echo 16 > /proc/sys/kernel/msgmni
>
> on the tree at the above bisection point
> (f7bf3df8be72d98afa84f5ff183e14c1ba1e560d is most-recently-applied
> patch), msgctl08 runs to completion in 15.7 seconds.
>
> Doing the same thing on 2.6.26-rc5-mm3, msgctl08 also runs to
> completion, in 20.9 seconds. So
>
> - it got slower
>
> - the crash requires the huge value of /proc/sys/kernel/msgmni (1746)

actually, it's 1741.

> to reproduce.
>

2008-06-12 08:36:14

by Andi Kleen

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

Andrew Morton <[email protected]> writes:
>
> Doing the same thing on 2.6.26-rc5-mm3, msgctl08 also runs to
> completion, in 20.9 seconds. So
>
> - it got slower

That is because it scales itself to the number of available msg queues.
So with Nadia's patch there are more and it runs slower.

In fact it seems to start one process per message queue, so perhaps it's
just running out of processes or something. Ok it should not segfault.

BTW a great way to debug slab corruptions with LTP faster is to run with
a slab thrasher stress module like http://firstfloor.org/~andi/crasher-26.diff

-Andi

2008-06-12 09:09:37

by Andrew Morton

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

On Thu, 12 Jun 2008 10:35:55 +0200 Andi Kleen <[email protected]> wrote:

> Andrew Morton <[email protected]> writes:
> >
> > Doing the same thing on 2.6.26-rc5-mm3, msgctl08 also runs to
> > completion, in 20.9 seconds. So
> >
> > - it got slower
>
> That is because it scales itself to the number of available msg queues.
> So with Nadia's patch there are more and it runs slower.
>
> In fact it seems to start one process per message queue, so perhaps it's
> just running out of processes or something. Ok it should not segfault.
>
> BTW a great way to debug slab corruptions with LTP faster is to run with
> a slab thrasher stress module like http://firstfloor.org/~andi/crasher-26.diff
>

Something like that might be needed. I ran it again and it took 49
minutes to crash.

2008-06-12 10:25:42

by Andrew Morton

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

On Thu, 12 Jun 2008 10:35:55 +0200 Andi Kleen <[email protected]> wrote:

> BTW a great way to debug slab corruptions with LTP faster is to run with
> a slab thrasher stress module like http://firstfloor.org/~andi/crasher-26.diff

Well I tried that. It didn't actually seem to do much (no CPU time
consumed) so I revved it up a bit:

--- a/drivers/char/crasher.c~crasher-26-speedup
+++ a/drivers/char/crasher.c
@@ -59,7 +59,7 @@ struct mem_buf {
static unsigned long crasher_random(void)
{
rand_seed = rand_seed*69069L+1;
- return rand_seed^jiffies;
+ return (rand_seed^jiffies) & 3;
}

void crasher_srandom(unsigned long entropy)
_

But it hasn't crashed after 57 minutes.

I don't think that is how we should fix this bug ;)

I'm pretty much out of time on this one.

2008-06-12 10:35:32

by Pekka Enberg

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

Hi Andrew,

On Wed, 11 Jun 2008, Andrew Morton wrote:
> version is ltp-full-20070228 (lots of retro-computing there).
>
> Config is at http://userweb.kernel.org/~akpm/config-vmm.txt
>
> ./testcases/bin/msgctl08 crashes after ten minutes or so:
>
> slab: Internal list corruption detected in cache 'size-128'(26), slabp f2905000(20). Hexdump:
>
> 000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
> 010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
> 020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
> 030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
> 040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
> 050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
> 060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
> 070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
> 080: 10 00 00 00

Looking at the above dump, slabp->free is 0x0f and the bufctl it points to
is 0xff ("BUFCTL_END") which marks the last element in the chain. This is
wrong as the total number of objects in the slab (cachep->num) is 26 but
the number of objects in use (slabp->inuse) is 20. So somehow you have
managed to lost 6 objects from the bufctl chain.

I really don't understand how your bufctl chains has so many BUFCTL_END
elements in the first place. It's doesn't look like the memory has been
stomped on (slab->s_mem, for example, is 0xf2906088), so I'd look for a
double kfree() of size 128 somewhere...

Pekka

2008-06-12 10:42:18

by Andi Kleen

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

>
> But it hasn't crashed after 57 minutes.
>
> I don't think that is how we should fix this bug ;)

This means it doesn't corrupt other slabs. Maybe only its own?

-Andi

2008-06-12 17:13:29

by Christoph Lameter

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

On Thu, 12 Jun 2008, Pekka J Enberg wrote:

> I really don't understand how your bufctl chains has so many BUFCTL_END
> elements in the first place. It's doesn't look like the memory has been
> stomped on (slab->s_mem, for example, is 0xf2906088), so I'd look for a
> double kfree() of size 128 somewhere...

Looks pretty strange. Could this be rerun with SLAB_DEBUG or with SLUB
with full debugging?

2008-06-12 18:33:41

by Manfred Spraul

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

Pekka J Enberg wrote:
> Hi Andrew,
>
> On Wed, 11 Jun 2008, Andrew Morton wrote:
>
>> version is ltp-full-20070228 (lots of retro-computing there).
>>
>> Config is at http://userweb.kernel.org/~akpm/config-vmm.txt
>>
>> ./testcases/bin/msgctl08 crashes after ten minutes or so:
>>
>> slab: Internal list corruption detected in cache 'size-128'(26), slabp f2905000(20). Hexdump:
>>
>> 000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
>> 010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
>> 020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
>> 030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
>> 040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
>> 050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
>> 060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
>> 070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
>> 080: 10 00 00 00
>>
>
> Looking at the above dump, slabp->free is 0x0f and the bufctl it points to
> is 0xff ("BUFCTL_END") which marks the last element in the chain. This is
> wrong as the total number of objects in the slab (cachep->num) is 26 but
> the number of objects in use (slabp->inuse) is 20. So somehow you have
> managed to lost 6 objects from the bufctl chain.
>
>
Hmm. double kfree() should be cached by the redzone code.
And I disagree with your link interpretation:

000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
010:
inuse: 14 00 00 00 (20 entries in use, 6 should be free)
free: 0f 00 00 00
nodeid: 00 00 00 00
bufctl[0x00] ff ff ff ff 020: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x4] fd ff ff ff 030: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x8] fd ff ff ff 040: fd ff ff ff fd ff ff ff 00 00 00 00
bufctl[0x0c] fd ff ff ff 050: fd ff ff ff fd ff ff ff 19 00 00 00
bufctl[0x10] 17 00 00 00 060: fd ff ff ff fd ff ff ff 0b 00 00 00
bufctl[0x14] fd ff ff ff 070: fd ff ff ff fd ff ff ff fd ff ff ff
bufctl[0x18] fd ff ff ff 080: 10 00 00 00

free: points to entry 0x0f.
bufctl[0x0f] is 0x19, i.e. it points to entry 0x19
0x19 points to 0x10
0x10 points to 0x17
0x17 is a BUFCTL_ACTIVE - that's a bug.
but: 0x13 is a valid link entry, is points to 0x0b
0x0b points to 0x00, which is BUFCTL_END.

IMHO the most probable bug is a single bit error:
bufctl[0x10] should be 0x13 instead of 0x17.

What about printing all redzone words? That would allow us to validate the bufctl chain.

Andrew: Could you post the new oops?

--
Manfred

2008-06-12 18:43:18

by Andrew Morton

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

On Thu, 12 Jun 2008 20:33:23 +0200 Manfred Spraul <[email protected]> wrote:

> Pekka J Enberg wrote:
> > Hi Andrew,
> >
> > On Wed, 11 Jun 2008, Andrew Morton wrote:
> >
> >> version is ltp-full-20070228 (lots of retro-computing there).
> >>
> >> Config is at http://userweb.kernel.org/~akpm/config-vmm.txt
> >>
> >> ./testcases/bin/msgctl08 crashes after ten minutes or so:
> >>
> >> slab: Internal list corruption detected in cache 'size-128'(26), slabp f2905000(20). Hexdump:
> >>
> >> 000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
> >> 010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
> >> 020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
> >> 030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
> >> 040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
> >> 050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
> >> 060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
> >> 070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
> >> 080: 10 00 00 00
> >>
> >
> > Looking at the above dump, slabp->free is 0x0f and the bufctl it points to
> > is 0xff ("BUFCTL_END") which marks the last element in the chain. This is
> > wrong as the total number of objects in the slab (cachep->num) is 26 but
> > the number of objects in use (slabp->inuse) is 20. So somehow you have
> > managed to lost 6 objects from the bufctl chain.
> >
> >
> Hmm. double kfree() should be cached by the redzone code.
> And I disagree with your link interpretation:
>
> 000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
> 010:
> inuse: 14 00 00 00 (20 entries in use, 6 should be free)
> free: 0f 00 00 00
> nodeid: 00 00 00 00
> bufctl[0x00] ff ff ff ff 020: fd ff ff ff fd ff ff ff fd ff ff ff
> bufctl[0x4] fd ff ff ff 030: fd ff ff ff fd ff ff ff fd ff ff ff
> bufctl[0x8] fd ff ff ff 040: fd ff ff ff fd ff ff ff 00 00 00 00
> bufctl[0x0c] fd ff ff ff 050: fd ff ff ff fd ff ff ff 19 00 00 00
> bufctl[0x10] 17 00 00 00 060: fd ff ff ff fd ff ff ff 0b 00 00 00
> bufctl[0x14] fd ff ff ff 070: fd ff ff ff fd ff ff ff fd ff ff ff
> bufctl[0x18] fd ff ff ff 080: 10 00 00 00
>
> free: points to entry 0x0f.
> bufctl[0x0f] is 0x19, i.e. it points to entry 0x19
> 0x19 points to 0x10
> 0x10 points to 0x17
> 0x17 is a BUFCTL_ACTIVE - that's a bug.
> but: 0x13 is a valid link entry, is points to 0x0b
> 0x0b points to 0x00, which is BUFCTL_END.
>
> IMHO the most probable bug is a single bit error:
> bufctl[0x10] should be 0x13 instead of 0x17.
>
> What about printing all redzone words? That would allow us to validate the bufctl chain.
>
> Andrew: Could you post the new oops?
>

umm, what new oops?

I have four saved away here:

slab: Internal list corruption detected in cache 'size-96'(32), slabp ea2a5040(28). Hexdump:

000: 20 90 b5 ec 88 54 80 f7 e0 00 00 00 e0 50 2a ea
010: 1c 00 00 00 17 00 00 00 00 00 00 00 fd ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
050: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
060: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
070: fd ff ff ff fd ff ff ff 18 00 00 00 1f 00 00 00
080: fd ff ff ff fd ff ff ff 1c 00 00 00 ff ff ff ff
090: fd ff ff ff fd ff ff ff fd ff ff ff
------------[ cut here ]------------
kernel BUG at mm/slab.c:2949!
invalid opcode: 0000 [#1] SMP
last sysfs file:
Modules linked in:

Pid: 5535, comm: msgctl08 Not tainted (2.6.26-rc5-mm3 #3)
EIP: 0060:[<c0184b5b>] EFLAGS: 00010082 CPU: 0
EIP is at check_slabp+0xeb/0x100
EAX: 00000001 EBX: ea2a50db ECX: 00000001 EDX: eea93230
ESI: ea2a5040 EDI: 0000009c EBP: eea95e84 ESP: eea95e60
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process msgctl08 (pid: 5535, ti=eea94000 task=eea93230 task.ti=eea94000)
Stack: c048897a 000000ff 00000020 ea2a5040 0000001c f7801140 f7822478 00000010
ea2a5040 eea95eec c0185eaf 00000000 00000010 000000d0 00000001 f78054ac
000000d0 f7801140 00000010 f7805488 f7808d38 c063a068 eea935e8 00000001
Call Trace:
[<c0185eaf>] ? cache_alloc_refill+0xcf/0x6c0
[<c01865f4>] ? __kmalloc+0x154/0x160
[<c0264113>] ? load_msg+0x33/0x150
[<c0264113>] ? load_msg+0x33/0x150
[<c02648ab>] ? do_msgsnd+0x17b/0x2e0
[<c0264779>] ? do_msgsnd+0x49/0x2e0
[<c02647d8>] ? do_msgsnd+0xa8/0x2e0
[<c0264a42>] ? sys_msgsnd+0x32/0x40
[<c0106eb2>] ? sys_ipc+0xb2/0x240
[<c02855e4>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c0102f35>] ? sysenter_past_esp+0x6a/0xa5
=======================
Code: 02 fa ff 8b 55 f0 8b 42 38 8d 04 85 1c 00 00 00 39 f8 76 0b 43 f7 c7 0f 00 00 00 75 d2 eb bd c7 04 24 7a 89 48 c0 e8 25 02 fa ff <0f> 0b eb fe 83 c4 18 5b 5e 5f 5d c3 8b 56 10 e9 6b ff ff ff 90
EIP: [<c0184b5b>] check_slabp+0xeb/0x100 SS:ESP 0068:eea95e60
---[ end trace ec15c778c59809d4 ]---

slab: Internal list corruption detected in cache 'size-128'(26), slabp f2905000(20). Hexdump:

000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
080: 10 00 00 00
------------[ cut here ]------------
kernel BUG at mm/slab.c:2949!
invalid opcode: 0000 [#1] SMP
Modules linked in:

Pid: 3348, comm: msgctl08 Not tainted (2.6.26-rc5 #1)
EIP: 0060:[<c017a35b>] EFLAGS: 00010086 CPU: 0
EIP is at check_slabp+0xeb/0x100
EAX: 00000001 EBX: f2905083 ECX: 00000001 EDX: f20ee670
ESI: f2905000 EDI: 00000084 EBP: f4671e88 ESP: f4671e64
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process msgctl08 (pid: 3348, ti=f4670000 task=f20ee670 task.ti=f4670000)
Stack: c0472f2b 00000000 0000001a f2905000 00000014 f7c01500 ffffffff 0000000e
f2905000 f4671eec c017b69f 00000010 000000d0 f20ee670 f7c032ac 000000d0
f7c01500 0000000e f7c03288 f7c06df0 f29ec088 00000098 00000000 00000000
Call Trace:
[<c017b69f>] ? cache_alloc_refill+0xcf/0x6b0
[<c017bdd4>] ? __kmalloc+0x154/0x160
[<c0257663>] ? load_msg+0x33/0x150
[<c0257663>] ? load_msg+0x33/0x150
[<c0257dfb>] ? do_msgsnd+0x17b/0x2e0
[<c0257cc9>] ? do_msgsnd+0x49/0x2e0
[<c0126f1f>] ? __do_softirq+0x6f/0x100
[<c0126e58>] ? _local_bh_enable+0x48/0xa0
[<c0257f92>] ? sys_msgsnd+0x32/0x40
[<c0106e12>] ? sys_ipc+0xb2/0x240
[<c0102f58>] ? sysenter_past_esp+0xa5/0xb1
[<c0102f1d>] ? sysenter_past_esp+0x6a/0xb1
=======================
Code: 86 fa ff 8b 55 f0 8b 42 38 8d 04 85 1c 00 00 00 39 f8 76 0b 43 f7 c7 0f 00 00 00 75 d2 eb bd c7 04 24 2b 2f 47 c0 e8 85 86 fa ff <0f> 0b eb fe 83 c4 18 5b 5e 5f 5d c3 8b 56 10 e9 6b ff ff ff 90
EIP: [<c017a35b>] check_slabp+0xeb/0x100 SS:ESP 0068:f4671e64
---[ end trace d7a2cbbb5a3654be ]---

slab: Internal list corruption detected in cache 'size-128'(26), slabp f7159000(18). Hexdump:

000: 00 f0 f8 f2 88 32 c0 f7 88 00 00 00 88 90 15 f7
010: 12 00 00 00 08 00 00 00 00 00 00 00 13 00 00 00
020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
030: 06 00 00 00 ff ff ff ff fd ff ff ff 18 00 00 00
040: fd ff ff ff fd ff ff ff 17 00 00 00 fd ff ff ff
050: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
060: fd ff ff ff fd ff ff ff 05 00 00 00 fd ff ff ff
070: fd ff ff ff fd ff ff ff 00 00 00 00 0f 00 00 00
080: fd ff ff ff
------------[ cut here ]------------
kernel BUG at mm/slab.c:2949!
invalid opcode: 0000 [#1] SMP
Modules linked in:

Pid: 3735, comm: msgctl08 Not tainted (2.6.26-rc5 #3)
EIP: 0060:[<c017a35b>] EFLAGS: 00010086 CPU: 1
EIP is at check_slabp+0xeb/0x100
EAX: 00000001 EBX: f7159083 ECX: 00000001 EDX: efa2f120
ESI: f7159000 EDI: 00000084 EBP: efa31e88 ESP: efa31e64
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process msgctl08 (pid: 3735, ti=efa30000 task=efa2f120 task.ti=efa30000)
Stack: c0472f2b 000000ff 0000001a f7159000 00000012 f7c01500 ffffffff 0000000e
f7159000 efa31eec c017b69f 00000010 000000d0 efa2f120 f7c032ac 000000d0
f7c01500 0000000e f7c03288 f7c1f4c8 f7184088 00000098 00000000 00000000
Call Trace:
[<c017b69f>] ? cache_alloc_refill+0xcf/0x6b0
[<c017bdd4>] ? __kmalloc+0x154/0x160
[<c0257663>] ? load_msg+0x33/0x150
[<c0257663>] ? load_msg+0x33/0x150
[<c0257dfb>] ? do_msgsnd+0x17b/0x2e0
[<c0257cc9>] ? do_msgsnd+0x49/0x2e0
[<c0257d28>] ? do_msgsnd+0xa8/0x2e0
[<c0257f92>] ? sys_msgsnd+0x32/0x40
[<c0106e12>] ? sys_ipc+0xb2/0x240
[<c0102f58>] ? sysenter_past_esp+0xa5/0xb1
[<c0102f1d>] ? sysenter_past_esp+0x6a/0xb1
=======================
Code: 86 fa ff 8b 55 f0 8b 42 38 8d 04 85 1c 00 00 00 39 f8 76 0b 43 f7 c7 0f 00 00 00 75 d2 eb bd c7 04 24 2b 2f 47 c0 e8 85 86 fa ff <0f> 0b eb fe 83 c4 18 5b 5e 5f 5d c3 8b 56 10 e9 6b ff ff ff 90
EIP: [<c017a35b>] check_slabp+0xeb/0x100 SS:ESP 0068:efa31e64
---[ end trace f87c199ef1dd2595 ]---

slab: Internal list corruption detected in cache 'size-128'(26), slabp ed9a9000(21). Hexdump:

000: 00 c0 3a f3 88 32 80 f7 88 00 00 00 88 90 9a ed
010: 15 00 00 00 12 00 00 00 00 00 00 00 fd ff ff ff
020: fd ff ff ff fd ff ff ff 07 00 00 00 fd ff ff ff
030: fd ff ff ff fd ff ff ff 08 00 00 00 0f 00 00 00
040: fd ff ff ff fd ff ff ff ff ff ff ff fd ff ff ff
050: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
060: fd ff ff ff 03 00 00 00 fd ff ff ff fd ff ff ff
070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
080: fd ff ff ff
------------[ cut here ]------------
kernel BUG at mm/slab.c:2949!
invalid opcode: 0000 [#1] SMP
last sysfs file:
Modules linked in:

Pid: 5787, comm: msgctl08 Not tainted (2.6.26-rc5-mm3 #4)
EIP: 0060:[<c0184b5b>] EFLAGS: 00010082 CPU: 0
EIP is at check_slabp+0xeb/0x100
EAX: 00000001 EBX: ed9a9083 ECX: 00000001 EDX: ef708390
ESI: ed9a9000 EDI: 00000084 EBP: eee09e84 ESP: eee09e60
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process msgctl08 (pid: 5787, ti=eee08000 task=ef708390 task.ti=eee08000)
Stack: c048897a 000000ff 0000001a ed9a9000 00000015 f7801500 ffffffff 00000003
ed9a9000 eee09eec c0185eaf 00000000 00000010 000000d0 00000001 f78032ac
000000d0 f7801500 00000003 f7803288 f7806df0 f2199088 00000098 00000001
Call Trace:
[<c0185eaf>] ? cache_alloc_refill+0xcf/0x6c0
[<c01865f4>] ? __kmalloc+0x154/0x160
[<c0264113>] ? load_msg+0x33/0x150
[<c0264113>] ? load_msg+0x33/0x150
[<c02648ab>] ? do_msgsnd+0x17b/0x2e0
[<c0264779>] ? do_msgsnd+0x49/0x2e0
[<c0142bab>] ? trace_hardirqs_on+0xb/0x10
[<c0128d78>] ? _local_bh_enable+0x48/0xa0
[<c0264a42>] ? sys_msgsnd+0x32/0x40
[<c0106eb2>] ? sys_ipc+0xb2/0x240
[<c02855e4>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c0102f35>] ? sysenter_past_esp+0x6a/0xa5
[<c03d007b>] ? check_tsc_sync_source+0xfb/0x100
=======================
Code: 02 fa ff 8b 55 f0 8b 42 38 8d 04 85 1c 00 00 00 39 f8 76 0b 43 f7 c7 0f 00 00 00 75 d2 eb bd c7 04 24 7a 89 48 c0 e8 25 02 fa ff <0f> 0b eb fe 83 c4 18 5b 5e 5f 5d c3 8b 56 10 e9 6b ff ff ff 90
EIP: [<c0184b5b>] check_slabp+0xeb/0x100 SS:ESP 0068:eee09e60
---[ end trace 0ba54d745d3f642c ]---

but they're all from under basically the same conditions.

2008-06-12 19:09:28

by Manfred Spraul

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

diff --git a/mm/slab.c b/mm/slab.c
index 06236e4..77c00a0 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -219,6 +219,7 @@ typedef unsigned int kmem_bufctl_t;
*/
struct slab {
struct list_head list;
+ unsigned long buh;
unsigned long colouroff;
void *s_mem; /* including colour offset */
unsigned int inuse; /* num of objs active in slab */
@@ -2943,7 +2944,18 @@ bad:
i++) {
if (i % 16 == 0)
printk("\n%03x:", i);
- printk(" %02x", ((unsigned char *)slabp)[i]);
+ printk(" %0x", ((unsigned char *)slabp)[i]);
+ }
+ printk("\n");
+ if (!OFF_SLAB(cachep) && (cachep->flags & SLAB_RED_ZONE)) {
+ printk("redzone codes:\n");
+ for (i = 0; i < cachep->num; i++) {
+ void *objp = index_to_obj(cachep, slabp, i);
+
+ if (i % 2 == 0)
+ printk("\n%03x:", i);
+ printk("%llx/%llx ", *dbg_redzone1(cachep, objp), *dbg_redzone2(cachep, objp));
+ }
}
printk("\n");
BUG();

Attachments:

andrew (918.00 B)

2008-06-12 19:20:58

by Andrew Morton

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

On Thu, 12 Jun 2008 21:09:10 +0200
Manfred Spraul <[email protected]> wrote:

> Either someone does a set_bit() or your cpu is breaking down.

Well. It is about ten years old. But this is the first sign of a
problem and it's always msgctl08.

> From looking at the the msgctl08 test: it shouldn't produce any races,
> it just does lots of bulk msgsnd()/msgrcv() operations. Always one
> thread sends, one thread receives on each queue. It's probably more a
> scheduler stresstest than anything else.
>
> Attached is a completely untested patch:
> - add 8 bytes to each slabp struct: This changes the alignment of the
> bufctl entries.
> - add a hexdump of the redzone bytes.

OK, I'll try that this evening (eight hours hence).

I'll also try increasing /proc/sys/kernel/msgmni under 2.6.25.

> Andrew: how do you log the oops?
> it might scroll of the screen.

netconsole-to-disk.

2008-06-12 19:30:21

by Pekka Enberg

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

Manfred Spraul wrote:
> Hmm. double kfree() should be cached by the redzone code.
> And I disagree with your link interpretation:
>
> 000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
> 010:
> inuse: 14 00 00 00 (20 entries in use, 6 should be free)
> free: 0f 00 00 00
> nodeid: 00 00 00 00
> bufctl[0x00] ff ff ff ff 020: fd ff ff ff fd ff ff ff fd ff ff ff
> bufctl[0x4] fd ff ff ff 030: fd ff ff ff fd ff ff ff fd ff ff ff
> bufctl[0x8] fd ff ff ff 040: fd ff ff ff fd ff ff ff 00 00 00 00
> bufctl[0x0c] fd ff ff ff 050: fd ff ff ff fd ff ff ff 19 00 00 00
> bufctl[0x10] 17 00 00 00 060: fd ff ff ff fd ff ff ff 0b 00 00 00
> bufctl[0x14] fd ff ff ff 070: fd ff ff ff fd ff ff ff fd ff ff ff
> bufctl[0x18] fd ff ff ff 080: 10 00 00 00
>
> free: points to entry 0x0f.
> bufctl[0x0f] is 0x19, i.e. it points to entry 0x19
> 0x19 points to 0x10
> 0x10 points to 0x17
> 0x17 is a BUFCTL_ACTIVE - that's a bug.
> but: 0x13 is a valid link entry, is points to 0x0b
> 0x0b points to 0x00, which is BUFCTL_END.
>
> IMHO the most probable bug is a single bit error:
> bufctl[0x10] should be 0x13 instead of 0x17.

Ah, you're, of course, right. For some reason I was looking at 8-bit
bufctls when they are, in fact, 32-bit.

Pekka

2008-06-13 07:03:10

by Nadia Derbey

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

Andrew Morton wrote:
> Running current mainline on my old 2-way PIII. Distro is RH FC1. LTP
> version is ltp-full-20070228 (lots of retro-computing there).
>
> Config is at http://userweb.kernel.org/~akpm/config-vmm.txt
>
>
> ./testcases/bin/msgctl08 crashes after ten minutes or so:
>
> slab: Internal list corruption detected in cache 'size-128'(26), slabp f2905000(20). Hexdump:
>
> 000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
> 010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
> 020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
> 030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
> 040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
> 050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
> 060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
> 070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
> 080: 10 00 00 00
> ------------[ cut here ]------------
> kernel BUG at mm/slab.c:2949!
> invalid opcode: 0000 [#1] SMP
> Modules linked in:
>
> Pid: 3348, comm: msgctl08 Not tainted (2.6.26-rc5 #1)
> EIP: 0060:[<c017a35b>] EFLAGS: 00010086 CPU: 0
> EIP is at check_slabp+0xeb/0x100
> EAX: 00000001 EBX: f2905083 ECX: 00000001 EDX: f20ee670
> ESI: f2905000 EDI: 00000084 EBP: f4671e88 ESP: f4671e64
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process msgctl08 (pid: 3348, ti=f4670000 task=f20ee670 task.ti=f4670000)
> Stack: c0472f2b 00000000 0000001a f2905000 00000014 f7c01500 ffffffff 0000000e
> f2905000 f4671eec c017b69f 00000010 000000d0 f20ee670 f7c032ac 000000d0
> f7c01500 0000000e f7c03288 f7c06df0 f29ec088 00000098 00000000 00000000
> Call Trace:
> [<c017b69f>] ? cache_alloc_refill+0xcf/0x6b0
> [<c017bdd4>] ? __kmalloc+0x154/0x160
> [<c0257663>] ? load_msg+0x33/0x150
> [<c0257663>] ? load_msg+0x33/0x150
> [<c0257dfb>] ? do_msgsnd+0x17b/0x2e0
> [<c0257cc9>] ? do_msgsnd+0x49/0x2e0
> [<c0126f1f>] ? __do_softirq+0x6f/0x100
> [<c0126e58>] ? _local_bh_enable+0x48/0xa0
> [<c0257f92>] ? sys_msgsnd+0x32/0x40
> [<c0106e12>] ? sys_ipc+0xb2/0x240
> [<c0102f58>] ? sysenter_past_esp+0xa5/0xb1
> [<c0102f1d>] ? sysenter_past_esp+0x6a/0xb1
> =======================
> Code: 86 fa ff 8b 55 f0 8b 42 38 8d 04 85 1c 00 00 00 39 f8 76 0b 43 f7 c7 0f 00 00 00 75 d2 eb bd c7 04 24 2b 2f 47 c0 e8 85 86 fa ff <0f> 0b eb fe 83 c4 18 5b 5e 5f 5d c3 8b 56 10 e9 6b ff ff ff 90
> EIP: [<c017a35b>] check_slabp+0xeb/0x100 SS:ESP 0068:f4671e64
> ---[ end trace d7a2cbbb5a3654be ]---
>
>
> Pekka (or Christoph): could you please decrypt the slab bits for us?
>

Andrew,

Sorry for answering so late (I only saw you e-mail yesterday afternoon).
I have just run msgctl08, setting msgmni to 1722 and turning on the SLAB
debug:

CONFIG_SLAB=y
CONFIG_SLABINFO=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y

kernel: linux-2.6.26-rc5-mm1
ltp: ltp-full-20080430

But I could not reproduce the bug.

Will try to investigate more.

Regards,
Nadia

2008-06-13 08:33:17

by Nadia Derbey

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

Nadia Derbey wrote:
> Andrew Morton wrote:
>
>> Running current mainline on my old 2-way PIII. Distro is RH FC1. LTP
>> version is ltp-full-20070228 (lots of retro-computing there).
>>
>> Config is at http://userweb.kernel.org/~akpm/config-vmm.txt
>>
>>
>> ./testcases/bin/msgctl08 crashes after ten minutes or so:
>>
>> slab: Internal list corruption detected in cache 'size-128'(26), slabp
>> f2905000(20). Hexdump:
>>
>> 000: 00 e0 12 f2 88 32 c0 f7 88 00 00 00 88 50 90 f2
>> 010: 14 00 00 00 0f 00 00 00 00 00 00 00 ff ff ff ff
>> 020: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
>> 030: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
>> 040: fd ff ff ff fd ff ff ff 00 00 00 00 fd ff ff ff
>> 050: fd ff ff ff fd ff ff ff 19 00 00 00 17 00 00 00
>> 060: fd ff ff ff fd ff ff ff 0b 00 00 00 fd ff ff ff
>> 070: fd ff ff ff fd ff ff ff fd ff ff ff fd ff ff ff
>> 080: 10 00 00 00
>> ------------[ cut here ]------------
>> kernel BUG at mm/slab.c:2949!
>> invalid opcode: 0000 [#1] SMP Modules linked in:
>>
>> Pid: 3348, comm: msgctl08 Not tainted (2.6.26-rc5 #1)
>> EIP: 0060:[<c017a35b>] EFLAGS: 00010086 CPU: 0
>> EIP is at check_slabp+0xeb/0x100
>> EAX: 00000001 EBX: f2905083 ECX: 00000001 EDX: f20ee670
>> ESI: f2905000 EDI: 00000084 EBP: f4671e88 ESP: f4671e64
>> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>> Process msgctl08 (pid: 3348, ti=f4670000 task=f20ee670 task.ti=f4670000)
>> Stack: c0472f2b 00000000 0000001a f2905000 00000014 f7c01500 ffffffff
>> 0000000e f2905000 f4671eec c017b69f 00000010 000000d0 f20ee670
>> f7c032ac 000000d0 f7c01500 0000000e f7c03288 f7c06df0 f29ec088
>> 00000098 00000000 00000000 Call Trace:
>> [<c017b69f>] ? cache_alloc_refill+0xcf/0x6b0
>> [<c017bdd4>] ? __kmalloc+0x154/0x160
>> [<c0257663>] ? load_msg+0x33/0x150
>> [<c0257663>] ? load_msg+0x33/0x150
>> [<c0257dfb>] ? do_msgsnd+0x17b/0x2e0
>> [<c0257cc9>] ? do_msgsnd+0x49/0x2e0
>> [<c0126f1f>] ? __do_softirq+0x6f/0x100
>> [<c0126e58>] ? _local_bh_enable+0x48/0xa0
>> [<c0257f92>] ? sys_msgsnd+0x32/0x40
>> [<c0106e12>] ? sys_ipc+0xb2/0x240
>> [<c0102f58>] ? sysenter_past_esp+0xa5/0xb1
>> [<c0102f1d>] ? sysenter_past_esp+0x6a/0xb1
>> =======================
>> Code: 86 fa ff 8b 55 f0 8b 42 38 8d 04 85 1c 00 00 00 39 f8 76 0b 43
>> f7 c7 0f 00 00 00 75 d2 eb bd c7 04 24 2b 2f 47 c0 e8 85 86 fa ff <0f>
>> 0b eb fe 83 c4 18 5b 5e 5f 5d c3 8b 56 10 e9 6b ff ff ff 90 EIP:
>> [<c017a35b>] check_slabp+0xeb/0x100 SS:ESP 0068:f4671e64
>> ---[ end trace d7a2cbbb5a3654be ]---
>>
>>
>> Pekka (or Christoph): could you please decrypt the slab bits for us?
>>
>
> Andrew,
>
> Sorry for answering so late (I only saw you e-mail yesterday afternoon).
> I have just run msgctl08, setting msgmni to 1722 and turning on the SLAB
> debug:
>
> CONFIG_SLAB=y
> CONFIG_SLABINFO=y
> CONFIG_DEBUG_SLAB=y
> CONFIG_DEBUG_SLAB_LEAK=y
>
> kernel: linux-2.6.26-rc5-mm1
> ltp: ltp-full-20080430
>
> But I could not reproduce the bug.
>
> Will try to investigate more.
>

Same result with 2.6.26-rc6.
I saw from your config file that you're running a 2.6.26-rc2. Will try
with that one.

Regards,
Nadia

2008-06-13 08:43:28

by Andrew Morton

[permalink] [raw]

Subject: Re: repeatable slab corruption with LTP msgctl08

On Fri, 13 Jun 2008 10:33:46 +0200 Nadia Derbey <[email protected]> wrote:

> > Sorry for answering so late (I only saw you e-mail yesterday afternoon).
> > I have just run msgctl08, setting msgmni to 1722 and turning on the SLAB
> > debug:
> >
> > CONFIG_SLAB=y
> > CONFIG_SLABINFO=y
> > CONFIG_DEBUG_SLAB=y
> > CONFIG_DEBUG_SLAB_LEAK=y
> >
> > kernel: linux-2.6.26-rc5-mm1
> > ltp: ltp-full-20080430
> >
> > But I could not reproduce the bug.
> >
> > Will try to investigate more.
> >
>
> Same result with 2.6.26-rc6.

erk. It'll make me a week to bisect this. I suppose I can plod away
at it in the background, but I won't be able to do that until the end
of this month.

> I saw from your config file that you're running a 2.6.26-rc2. Will try
> with that one.

No, that was just when I last checked in that machine's .config file.
It was yesterday's mainline, with that .config and `make oldconfig'.