2004-01-19 12:20:17

by Gerd Knorr

Subject: Re: Fw: Slab corruption and oops with 2.6.1-mm4

> heh, this is the same bug. Last time we were unlocking an unlocked page.
> Now we're freeing a free page.

Yes. Still no idea why that happens though ...

> CONFIG_PREEMPT=y

Bug reproducible with this one turned off?

> Slab corruption: start=c57c2000, len=4096
> 000: 6e 72 6d 71 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> bttv0: skipped frame. no signal? high irq latency? [main=b030000,o_vbi=b030018,o_field=5378000,rc=537801c]
> ------------[ cut here ]------------
> kernel BUG at include/linux/mm.h:275!

page_cache_release()

> EIP is at videobuf_dma_free+0xa9/0xc0 [video_buf]

The code calling page_cache_release looks like this ...

	if (dma->pages) {
		int i;
		for (i=0; i < dma->nr_pages; i++)
			page_cache_release(dma->pages[i]);
		kfree(dma->pages);
		dma->pages = NULL;
	}

... even with videobuf_dma_free() called twice by mistake that shouldn't
double-free the pages. Maybe videobuf_dma_free() is called from two
places at the same time because one of the call paths misses a lock, but
I can't find any on a quick review. Hmm.
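
To make the reasoning above concrete, here is a minimal user-space sketch. It is not the video-buf code: `fake_page`, `fake_dma`, and all four functions are hypothetical stand-ins, and the "race" is only simulated sequentially, assuming a second caller read dma->pages before the first caller cleared it.

```c
#include <stdlib.h>

/* Hypothetical user-space stand-ins for struct page and the dma buffer. */
struct fake_page { int refcount; };

struct fake_dma {
	struct fake_page **pages;
	int nr_pages;
};

/* Counts releases of an already-free page (what the kernel BUG()s on). */
static int bad_releases;

static void fake_page_release(struct fake_page *page)
{
	if (page->refcount == 0) {
		bad_releases++;
		return;
	}
	page->refcount--;
}

/* Same shape as the quoted code: release, free the array, clear the pointer. */
static void fake_dma_free(struct fake_dma *dma)
{
	if (dma->pages) {
		int i;

		for (i = 0; i < dma->nr_pages; i++)
			fake_page_release(dma->pages[i]);
		free(dma->pages);
		dma->pages = NULL;
	}
}

/* Sequential double call: the second call sees pages == NULL and does nothing. */
static int sequential_double_free(void)
{
	struct fake_page backing[4];
	struct fake_dma dma;
	int i;

	bad_releases = 0;
	dma.nr_pages = 4;
	dma.pages = malloc(dma.nr_pages * sizeof(*dma.pages));
	if (!dma.pages)
		return -1;
	for (i = 0; i < dma.nr_pages; i++) {
		backing[i].refcount = 1;
		dma.pages[i] = &backing[i];
	}
	fake_dma_free(&dma);
	fake_dma_free(&dma);
	return bad_releases;
}

/* Simulated race: both callers got past the NULL check, so both walk the list. */
static int interleaved_double_free(void)
{
	struct fake_page backing[4];
	struct fake_page *list[4];
	int i, pass;

	bad_releases = 0;
	for (i = 0; i < 4; i++) {
		backing[i].refcount = 1;
		list[i] = &backing[i];
	}
	for (pass = 0; pass < 2; pass++)
		for (i = 0; i < 4; i++)
			fake_page_release(list[i]);
	return bad_releases;
}
```

In this model sequential_double_free() reports no bad releases, while interleaved_double_free() reports one per page, which is why a missing lock is the suspect rather than a plain double call.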

Does transcode use threads? If so, does it call into bttv from
different threads?

> Call Trace:
> [<d08f3a70>] bttv_dma_free+0x60/0xa0 [bttv]
> [<d08ede63>] bttv_do_ioctl+0x403/0x16a0 [bttv]

Must be the VIDIOCSYNC ioctl.

> [<c0335498>] video_usercopy+0xe8/0x1e0
> [<d08ef13e>] bttv_ioctl+0x3e/0x70 [bttv]
> [<c0168ef3>] sys_ioctl+0xf3/0x280
> [<c042e9b7>] syscall_call+0x7/0xb

Gerd

--
"... und auch das ganze Wochenende oll" -- Wetterbericht auf RadioEins


2004-01-19 16:25:24

by Gerd Knorr

Subject: Re: Fw: Slab corruption and oops with 2.6.1-mm4

> > CONFIG_PREEMPT=y
>
> Bug reproducible with this one turned off?

Hmm, running -mm4 with CONFIG_PREEMPT now, box loaded with bttv capture
+ parallel kernel builds, no problems so far ...

> > Slab corruption: start=c57c2000, len=4096
                           ^^^^^^^^
> > 000: 6e 72 6d 71 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > bttv0: skipped frame. no signal? high irq latency? [main=b030000,o_vbi=b030018,o_field=5378000,rc=537801c]

Who is this? Is this allocated by bttv? Or is someone else corrupting
memory here?
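
As a reading aid for the dump quoted above: 0x6b is the byte that slab debugging fills freed objects with, so the four leading bytes that differ suggest something wrote to the block after it was freed. The sketch below is a plain user-space illustration, not the kernel's own checker; `FREE_POISON`, `count_poison_damage()`, and `damage_in_reported_line()` are hypothetical names, and the sample bytes are copied from the report.

```c
#include <stddef.h>

/* Fill byte for freed objects under slab debugging (name is hypothetical). */
#define FREE_POISON 0x6b

/*
 * Count bytes that no longer hold the poison value.
 * *first receives the offset of the first damaged byte, or len if none.
 */
static size_t count_poison_damage(const unsigned char *buf, size_t len,
				  size_t *first)
{
	size_t i, damaged = 0;

	*first = len;
	for (i = 0; i < len; i++) {
		if (buf[i] != FREE_POISON) {
			if (damaged == 0)
				*first = i;
			damaged++;
		}
	}
	return damaged;
}

/* The 16 bytes shown at offset 000 in the report. */
static size_t damage_in_reported_line(size_t *first)
{
	static const unsigned char line[16] = {
		0x6e, 0x72, 0x6d, 0x71, 0x6b, 0x6b, 0x6b, 0x6b,
		0x6b, 0x6b, 0x6b, 0x6b, 0x6b, 0x6b, 0x6b, 0x6b,
	};

	return count_poison_damage(line, sizeof(line), first);
}
```

For the reported line this finds four damaged bytes starting at offset 0. That only covers the 16 bytes shown, not the rest of the 4096-byte object.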

btcx-risc and video-buf have a "debug=1" insmod option, bttv has
"bttv_debug=1". Those make bttv verbose (*plenty* of log, so better
not to try all three at the same time ...) and also log the addresses of
(some) allocated memory blocks.

btcx-risc calls pci_alloc_consistent() and thus does PAGE_SIZE
allocations, so that one is likely a good candidate to start with.

Gerd

2004-01-20 00:31:29

by caszonyi

Subject: Re: Fw: Slab corruption and oops with 2.6.1-mm4

On Mon, 19 Jan 2004, Gerd Knorr wrote:

> > > CONFIG_PREEMPT=y
> >
> > Bug reproducible with this one turned off?
>
> Hmm, running -mm4 with CONFIG_PREEMPT now, box loaded with bttv capture
> + parallel kernel builds, no problems so far ...
>

Yes, the bug is reproducible with preempt turned off.

Transcode uses threads to capture and encode the movie. However, I don't
know how many threads are allocated for capturing.

I tried to run transcode with only one thread for encoding and one buffer
for capturing (option -u 1,1).

When writing this email (after the oops) I also got this:

MCE: The hardware reports a non fatal, correctable incident occurred on
CPU 0.
Bank 1: 9400000000000151
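
The bank status word can be split into its flag bits as a rough cross-check of the "non fatal, correctable" wording. The sketch below assumes the Intel-style machine-check status layout (VAL in bit 63 down to PCC in bit 57, MCA error code in bits 15:0). The report does not say which CPU this is, so treat the layout as an assumption. `struct mce_flags` and `decode_mce_status()` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical holder for the decoded fields. */
struct mce_flags {
	int valid;           /* bit 63: status register holds a valid error */
	int overflow;        /* bit 62: an earlier error was overwritten */
	int uncorrected;     /* bit 61: error was not corrected by hardware */
	int enabled;         /* bit 60: reporting of this error was enabled */
	int misc_valid;      /* bit 59: MISC register holds extra info */
	int addr_valid;      /* bit 58: ADDR register holds an address */
	int context_corrupt; /* bit 57: processor context may be corrupt */
	unsigned mca_code;   /* bits 15:0: MCA error code */
};

/* Assumes the Intel-style bit positions described in the lead-in. */
static struct mce_flags decode_mce_status(uint64_t status)
{
	struct mce_flags f;

	f.valid           = (int)((status >> 63) & 1);
	f.overflow        = (int)((status >> 62) & 1);
	f.uncorrected     = (int)((status >> 61) & 1);
	f.enabled         = (int)((status >> 60) & 1);
	f.misc_valid      = (int)((status >> 59) & 1);
	f.addr_valid      = (int)((status >> 58) & 1);
	f.context_corrupt = (int)((status >> 57) & 1);
	f.mca_code        = (unsigned)(status & 0xffff);
	return f;
}
```

Under that assumption, 0x9400000000000151 decodes to valid, enabled, address-valid, and not uncorrected, with an error code of 0x0151. The cleared uncorrected bit agrees with the message text.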

> > > Slab corruption: start=c57c2000, len=4096
>                            ^^^^^^^^
> > > 000: 6e 72 6d 71 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > > bttv0: skipped frame. no signal? high irq latency? [main=b030000,o_vbi=b030018,o_field=5378000,rc=537801c]
>
> Who is this? Is this allocated by bttv? Or is someone else corrupting
> memory here?
>
> btcx-risc and video-buf have a "debug=1" insmod option, bttv has
> "bttv_debug=1". Those make bttv verbose (*plenty* of log, so better
> not to try all three at the same time ...) and also log the addresses of
> (some) allocated memory blocks.
>
> btcx-risc calls pci_alloc_consistent() and thus does PAGE_SIZE
> allocations, so that one is likely a good candidate to start with.
>

See the attachment.
This is with btcx-risc and video-buf loaded with debug=1.

I also tried bttv_debug=1 when loading the bttv module but haven't
noticed anything strange.
I can send the debug messages from bttv if you want.

> Gerd
>

--
"A mouse is a device used to point at
the xterm you want to type in".
Kim Alm on a.s.r.


Attachments:
debug (15.19 kB)

2004-01-20 01:26:58

by Mike Fedyk

Subject: Re: Slab corruption and oops with 2.6.1-mm4

On Tue, Jan 20, 2004 at 02:27:35AM +0200, [email protected] wrote:
> When writing this email (after the oops) I also got this:
>
> MCE: The hardware reports a non fatal, correctable incident occurred on
> CPU 0.
> Bank 1: 9400000000000151

OK, run memtest86 on the machine for at least one pass through "all tests"
(that should take several hours, depending on memory size and bandwidth).

Check your power supply, and power source (power from the wall, etc.).

Mike