2003-03-28 03:12:00

by Zwane Mwaikambo

Subject: task_struct slab cache use after free in 2.5.66

I'm having a few stability problems with 2.5.66 under test loads. I
can't quite parse the slab debugging stuff. Is this actually useful to
anyone?

Slab corruption: start=c1f23380, expend=c1f2399f, problemat=c1f23388
Data: ********6A
*******************************************************************************************************
Next: ********************************
slab error in check_poison_obj(): cache `task_struct': object was modified after freeing
Call Trace:
[<c0142953>] check_poison_obj+0x123/0x170
[<c0144337>] kmem_cache_alloc+0x117/0x160
[<c011fdde>] dup_task_struct+0x9e/0xc0
[<c011fdde>] dup_task_struct+0x9e/0xc0
[<c0120b82>] copy_process+0x82/0xe30
[<c0126d3b>] do_softirq+0xbb/0xc0
[<c012196f>] do_fork+0x3f/0x170
[<c01077b7>] sys_fork+0x17/0x30
[<c0109497>] syscall_call+0x7/0xb


--
function.linuxpower.ca


2003-03-28 03:21:23

by Zwane Mwaikambo

Subject: Re: task_struct slab cache use after free in 2.5.66

On Thu, 27 Mar 2003, Zwane Mwaikambo wrote:

> I'm having a few stability problems with 2.5.66 under test loads. I
> can't quite parse the slab debugging stuff. Is this actually useful to
> anyone?

P5/SMP, 192M, with all the slab debugging knobs on.

--
function.linuxpower.ca

2003-03-28 04:11:44

by Andrew Morton

Subject: Re: task_struct slab cache use after free in 2.5.66

Zwane Mwaikambo wrote:
>
> I'm having a few stability problems with 2.5.66 under test loads. I
> can't quite parse the slab debugging stuff. Is this actually useful to
> anyone?
>
> Slab corruption: start=c1f23380, expend=c1f2399f, problemat=c1f23388
> Data: ********6A
> *******************************************************************************************************
> Next: ********************************
> slab error in check_poison_obj(): cache `task_struct': object was modified after freeing
> Call Trace:
> [<c0142953>] check_poison_obj+0x123/0x170
> [<c0144337>] kmem_cache_alloc+0x117/0x160
> [<c011fdde>] dup_task_struct+0x9e/0xc0
> [<c011fdde>] dup_task_struct+0x9e/0xc0
> [<c0120b82>] copy_process+0x82/0xe30
> [<c0126d3b>] do_softirq+0xbb/0xc0
> [<c012196f>] do_fork+0x3f/0x170
> [<c01077b7>] sys_fork+0x17/0x30
> [<c0109497>] syscall_call+0x7/0xb
>

That's the second report of this. Someone did a put_task_struct against a
freed task_struct. I'll cook up a debug patch to trap it. Something like
this:


diff -puN include/linux/sched.h~put_task_struct-debug include/linux/sched.h
--- 25/include/linux/sched.h~put_task_struct-debug 2003-03-27 20:21:16.000000000 -0800
+++ 25-akpm/include/linux/sched.h 2003-03-27 20:22:49.000000000 -0800
@@ -443,12 +443,17 @@ struct task_struct {

 	unsigned long ptrace_message;
 	siginfo_t *last_siginfo; /* For ptrace use. */
+	long debug;
 };

 extern void __put_task_struct(struct task_struct *tsk);
 #define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
-#define put_task_struct(tsk) \
-do { if (atomic_dec_and_test(&(tsk)->usage)) __put_task_struct(tsk); } while(0)
+#define put_task_struct(tsk) \
+	do { \
+		BUG_ON((tsk)->debug == 0x6b6b6b6b); \
+		if (atomic_dec_and_test(&(tsk)->usage)) \
+			__put_task_struct(tsk); \
+	} while (0)

 /*
  * Per process flags

_

2003-03-28 04:18:18

by Zwane Mwaikambo

Subject: Re: task_struct slab cache use after free in 2.5.66

On Thu, 27 Mar 2003, Andrew Morton wrote:

> That's the second report of this. Someone did a put_task_struct against a
> freed task_struct. I'll cook up a debug patch to trap it. Something like
> this:

Well, there is also the detach_pid bug, which suddenly vanished... I'm not
aware of anyone sending a patch for that (but I can't be certain, as I
haven't been keeping up with lkml lately). I posted some debug information
for that one about a week ago:

Pine.LNX.4.50.0303221152460.18911-100000@montezuma.mastecende.com

Manfred?

--
function.linuxpower.ca

2003-03-28 06:14:16

by Manfred Spraul

Subject: Re: task_struct slab cache use after free in 2.5.66

Zwane wrote:

>On Thu, 27 Mar 2003, Andrew Morton wrote:
>
>> That's the second report of this. Someone did a put_task_struct against a
>> freed task_struct. I'll cook up a debug patch to trap it. Something like
>> this:
>
>Well, there is also the detach_pid bug, which suddenly vanished... I'm not
>aware of anyone sending a patch for that (but I can't be certain, as I
>haven't been keeping up with lkml lately). I posted some debug information
>for that one about a week ago:
>
>Pine.LNX.4.50.0303221152460.18911-100000@montezuma.mastecende.com
>
>Manfred?

I didn't send a patch. One thing I noticed from reading Zwane's bug
report is that the reference count of the task structure was off by at
least 3:

__detach_pid(p, PIDTYPE_PGID)

failed because the task structure was already filled with slab poison.

--
Manfred






2003-03-28 06:17:31

by Zwane Mwaikambo

Subject: Re: task_struct slab cache use after free in 2.5.66

On Fri, 28 Mar 2003, Manfred Spraul wrote:

> I didn't send a patch. One thing I noticed from reading zwane's bug
> report is that the reference count of the task structure was off by at
> least 3:
>
> __detach_pid(p, PIDTYPE_PGID)
>
> failed because the task structure was already filled with slab poison.

OK, I'm going to give Andrew's debug patch a go.

--
function.linuxpower.ca

2003-03-28 06:14:52

by Zwane Mwaikambo

Subject: Re: task_struct slab cache use after free in 2.5.66

On Thu, 27 Mar 2003, Zwane Mwaikambo wrote:

> On Thu, 27 Mar 2003, Andrew Morton wrote:
>
> > That's the second report of this. Someone did a put_task_struct against a
> > freed task_struct. I'll cook up a debug patch to trap it. Something like
> > this:
>
> Well, there is also the detach_pid bug, which suddenly vanished... I'm not
> aware of anyone sending a patch for that (but I can't be certain, as I
> haven't been keeping up with lkml lately). I posted some debug information
> for that one about a week ago:
>
> Pine.LNX.4.50.0303221152460.18911-100000@montezuma.mastecende.com

I thought I was home free after the smp_call_function_interrupt bug :(

detach_pid is back...

Unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip:
c0134b8c
*pde = 00000000
Oops: 0002 [#1]
CPU: 0
EIP: 0060:[<c0134b8c>] Not tainted
EFLAGS: 00010046
EIP is at detach_pid+0x1c/0xf0
eax: 6b6b6b6b ebx: 6b6b6b6b ecx: 6b6b6b6b edx: c39bc6b0
esi: 00000000 edi: bffff228 ebp: 00000000 esp: c245bf08
ds: 007b es: 007b ss: 0068
Process find (pid: 1892, threadinfo=c245a000 task=c5bd8100)
Stack: c39bc660 00000000 bffff228 00000000 c012391c c39bc660 c0123a0c c39bc660
c39bc660 c39bcc24 c39bc660 00005608 c01257fb c39bc660 bffff228 bffff228
c39bc704 c39bc660 c5bd8100 c5bd819c c0125cb1 c39bc660 bffff228 00000000
Call Trace:
[<c012391c>] __unhash_process+0x10c/0x170
[<c0123a0c>] release_task+0x8c/0x200
[<c01257fb>] wait_task_zombie+0x15b/0x1c0
[<c0125cb1>] sys_wait4+0x241/0x290
[<c011d110>] default_wake_function+0x0/0x20
[<c011d110>] default_wake_function+0x0/0x20
[<c0109497>] syscall_call+0x7/0xb

--
function.linuxpower.ca