> heh, this is the same bug. Last time we were unlocking an unlocked page.
> Now we're freeing a free page.
Yes. Still no idea why that happens, though ...
> CONFIG_PREEMPT=y
Is the bug reproducible with this one turned off?
> Slab corruption: start=c57c2000, len=4096
> 000: 6e 72 6d 71 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> bttv0: skipped frame. no signal? high irq latency? [main=b030000,o_vbi=b030018,o_field=5378000,rc=537801c]
> ------------[ cut here ]------------
> kernel BUG at include/linux/mm.h:275!
page_cache_release()
> EIP is at videobuf_dma_free+0xa9/0xc0 [video_buf]
The code calling page_cache_release looks like this ...
	if (dma->pages) {
		int i;
		for (i=0; i < dma->nr_pages; i++)
			page_cache_release(dma->pages[i]);
		kfree(dma->pages);
		dma->pages = NULL;
	}
... even with videobuf_dma_free() called twice by mistake, that shouldn't
double-free the pages, because the first call sets dma->pages to NULL.
Maybe videobuf_dma_free() is called from two places at the same time
because one of the call paths misses a lock, but I can't find any such
path on a quick review. Hmm.
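
To illustrate the point about the NULL check, here is a minimal userspace
sketch, not the driver code. fake_page, fake_dma, fake_page_release() and
fake_dma_free() are hypothetical stand-ins for struct page, the videobuf
DMA descriptor, page_cache_release() and videobuf_dma_free(). It only
covers two calls made one after the other; two callers running through
the loop at the same time (the missing-lock theory) are not modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins, for illustration only. */
struct fake_page {
	int refcount;
};

struct fake_dma {
	struct fake_page **pages;
	int nr_pages;
};

/* Stand-in for page_cache_release(): dropping a reference on a page
 * that has none left is treated as a bug, here via assert(). */
static void fake_page_release(struct fake_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Same shape as the quoted fragment: release every page, free the
 * array, then clear the pointer so a later call does nothing. */
static void fake_dma_free(struct fake_dma *dma)
{
	if (dma->pages) {
		int i;
		for (i = 0; i < dma->nr_pages; i++)
			fake_page_release(dma->pages[i]);
		free(dma->pages);
		dma->pages = NULL;
	}
}

/* Frees the same descriptor twice in a row and returns the reference
 * count left on its single page (-1 if the allocation failed). */
int refcount_after_two_sequential_frees(void)
{
	struct fake_page page = { 1 };
	struct fake_dma dma;

	dma.nr_pages = 1;
	dma.pages = malloc(sizeof(*dma.pages));
	if (!dma.pages)
		return -1;
	dma.pages[0] = &page;

	fake_dma_free(&dma);	/* drops the reference, clears dma.pages */
	fake_dma_free(&dma);	/* sees NULL and returns without touching the page */
	return page.refcount;
}
```

Calling refcount_after_two_sequential_frees() does not trip the assert,
which is why a plain repeated call does not explain the oops by itself.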
Does transcode use threads? If so, does it call into bttv from
different threads?
> Call Trace:
> [<d08f3a70>] bttv_dma_free+0x60/0xa0 [bttv]
> [<d08ede63>] bttv_do_ioctl+0x403/0x16a0 [bttv]
This must be the VIDIOCSYNC ioctl.
> [<c0335498>] video_usercopy+0xe8/0x1e0
> [<d08ef13e>] bttv_ioctl+0x3e/0x70 [bttv]
> [<c0168ef3>] sys_ioctl+0xf3/0x280
> [<c042e9b7>] syscall_call+0x7/0xb
Gerd
--
"... und auch das ganze Wochenende oll" -- Wetterbericht auf RadioEins
> > CONFIG_PREEMPT=y
>
> Is the bug reproducible with this one turned off?
Hmm, running -mm4 with CONFIG_PREEMPT now, box loaded with bttv capture
+ parallel kernel builds, no problems so far ...
> > Slab corruption: start=c57c2000, len=4096
                           ^^^^^^^^
> > 000: 6e 72 6d 71 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > bttv0: skipped frame. no signal? high irq latency? [main=b030000,o_vbi=b030018,o_field=5378000,rc=537801c]
Who is this? Is this allocated by bttv? Or someone else corrupts
memory here?
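
For reference, 0x6b is the fill byte the slab debugging code writes into
freed objects, so the dump above says that the first four bytes of an
already-freed 4096-byte object were overwritten. A minimal userspace
sketch of that kind of check follows; first_corrupt_offset(),
count_corrupt_bytes() and dump_row are names made up for this example,
and the real kernel check differs in detail (for instance it uses a
separate end marker byte).

```c
#include <assert.h>
#include <stddef.h>

/* Fill byte written into freed objects by the slab debug code. */
#define POISON_FREE 0x6b

/* First 16 bytes of the object, copied from the "000:" line of the report. */
const unsigned char dump_row[16] = {
	0x6e, 0x72, 0x6d, 0x71, 0x6b, 0x6b, 0x6b, 0x6b,
	0x6b, 0x6b, 0x6b, 0x6b, 0x6b, 0x6b, 0x6b, 0x6b
};

/* Returns the offset of the first byte that no longer holds the poison
 * value, or len if the whole range is still intact. */
size_t first_corrupt_offset(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
		if (buf[i] != POISON_FREE)
			return i;
	return len;
}

/* Counts how many bytes in the range differ from the poison value. */
size_t count_corrupt_bytes(const unsigned char *buf, size_t len)
{
	size_t i, n = 0;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
		if (buf[i] != POISON_FREE)
			n++;
	return n;
}
```

Run over dump_row, the scan reports the damage starting at offset 0 and
covering four bytes; it says nothing about who wrote them.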
btcx-risc and video-buf have a "debug=1" insmod option, bttv has
"bttv_debug=1". Those make bttv verbose (*plenty* of log output, so
better not to try all three at the same time ...) and also log the
addresses of (some) allocated memory blocks.
btcx-risc calls pci_alloc_consistent() and thus does PAGE_SIZE
allocations, so that one is likely a good candidate to start with.
Gerd
On Mon, 19 Jan 2004, Gerd Knorr wrote:
> > > CONFIG_PREEMPT=y
> >
> > Is the bug reproducible with this one turned off?
>
> Hmm, running -mm4 with CONFIG_PREEMPT now, box loaded with bttv capture
> + parallel kernel builds, no problems so far ...
>
Yes, the bug is reproducible with preempt turned off.
Transcode uses threads to capture and encode the movie. However, I don't
know how many threads are allocated for capturing.
I tried to run transcode with only one thread for encoding and one buffer
for capturing (option -u 1,1).
While writing this email (after the oops) I also got this:
MCE: The hardware reports a non fatal, correctable incident occurred on
CPU 0.
Bank 1: 9400000000000151
> > > Slab corruption: start=c57c2000, len=4096
>                            ^^^^^^^^
> > > 000: 6e 72 6d 71 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > > bttv0: skipped frame. no signal? high irq latency? [main=b030000,o_vbi=b030018,o_field=5378000,rc=537801c]
>
> Who is this? Is this allocated by bttv? Or someone else corrupts
> memory here?
>
> btcx-risc and video-buf have a "debug=1" insmod option, bttv has
> "bttv_debug=1". Those make bttv verbose (*plenty* of log output, so
> better not to try all three at the same time ...) and also log addresses of
> (some) allocated memory blocks.
>
> btcx-risc calls pci_alloc_consistent() and thus does PAGE_SIZE
> allocations, that one likely is a good candidate to start with.
>
See the attachment.
This is with btcx-risc and video-buf loaded with debug=1.
I also tried bttv_debug=1 when loading the bttv module but haven't
noticed anything strange.
I can send the debug messages from bttv if you want.
> Gerd
>
--
"A mouse is a device used to point at
the xterm you want to type in".
Kim Alm on a.s.r.
On Tue, Jan 20, 2004 at 02:27:35AM +0200, [email protected] wrote:
> While writing this email (after the oops) I also got this:
>
> MCE: The hardware reports a non fatal, correctable incident occurred on
> CPU 0.
> Bank 1: 9400000000000151
OK, run memtest86 on the machine with at least one pass through "all tests"
(that should take several hours, depending on memory size and bandwidth).
Check your power supply and power source (power from the wall, etc.).
Mike
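
For anyone who wants to unpack the "Bank 1" value quoted above, here is a
minimal sketch in C. It assumes the usual x86 machine-check status
register layout (valid, overflow, uncorrected, enabled, MISC-valid,
ADDR-valid and context-corrupt flags in bits 63 down to 57, error code in
the low 16 bits). The CPU vendor is not stated in this thread, so the
meaning of the error code itself is left undecoded. The struct and
function names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
struct mce_status_bits {
	int valid;		/* bit 63: register holds a valid error */
	int overflow;		/* bit 62: an earlier error was overwritten */
	int uncorrected;	/* bit 61: error was not corrected by hardware */
	int enabled;		/* bit 60: reporting was enabled for this error */
	int miscv;		/* bit 59: MISC register holds extra information */
	int addrv;		/* bit 58: ADDR register holds an address */
	int pcc;		/* bit 57: processor context may be corrupt */
	unsigned int model_code;	/* bits 31..16: model-specific error code */
	unsigned int mca_code;		/* bits 15..0: machine-check error code */
};

/* Splits a 64-bit status value into the fields described above. */
struct mce_status_bits decode_mce_status(uint64_t status)
{
	struct mce_status_bits b;

	b.valid       = (int)((status >> 63) & 1);
	b.overflow    = (int)((status >> 62) & 1);
	b.uncorrected = (int)((status >> 61) & 1);
	b.enabled     = (int)((status >> 60) & 1);
	b.miscv       = (int)((status >> 59) & 1);
	b.addrv       = (int)((status >> 58) & 1);
	b.pcc         = (int)((status >> 57) & 1);
	b.model_code  = (unsigned int)((status >> 16) & 0xffff);
	b.mca_code    = (unsigned int)(status & 0xffff);
	return b;
}
```

Under that assumed layout, decode_mce_status(0x9400000000000151ULL) gives
valid=1, uncorrected=0 and addrv=1 with error code 0x0151, which matches
the "non fatal, correctable" wording of the kernel message.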