To: linux-kernel@vger.kernel.org
Path: not-for-mail
From: Gerd Knorr <kraxel@bytesex.org>
Newsgroups: lists.linux.kernel
Subject: Re: Fw: Slab coruption and oops with 2.6.1-mm4
Date: 20 Jan 2004 12:51:26 +0100
Organization: SuSE Labs, Berlin
Message-ID: <877jzmn8ht.fsf@bytesex.org>
References: <20040118220051.3f3d8420.akpm@osdl.org> <20040119121546.GD5498@bytesex.org> <20040119160512.GB8321@bytesex.org> <Pine.LNX.4.53.0401200219170.293@grinch.ro>
NNTP-Posting-Host: localhost
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2792
Lines: 78

caszonyi@rdslink.ro writes:

> yes
> bug is reproduceable with preempt turned off

Ok.  That makes a locking flaw less likely, as those tend to trigger
with preempt or SMP only.

> MCE: The hardware reports a non fatal, correctable incident occurred on
> CPU 0.
> Bank 1: 9400000000000151

That looks very much like a genuine hardware issue.

> > > > Slab corruption: start=c57c2000, len=4096
> >                            ^^^^^^^^

> > Who is this?  Is this allocated by bttv?  Or someone else corrupts
> > memory here?

> [ bttv load messages ]
> btcx: riscmem alloc size=2320 [2]

That isn't a freshly booted box, is it?  Please reboot the machine after
every oops and before continuing testing.  With known-corrupted memory
it can oops basically everywhere, and those oops reports don't help
much.

> btcx: skips line 0-9999:
> btcx: riscmem free [1]
> vbuf: init user [0x43267008+0x6c000 => 109 pages]
> btcx: riscmem alloc size=3184 [2]
> btcx: riscmem free [1]
> btcx: riscmem alloc size=2320 [2]
> btcx: skips line 0-9999:
> btcx: riscmem free [1]
> vbuf: init user [0x43267008+0x6c000 => 109 pages]
> btcx: riscmem alloc size=3184 [2]
> btcx: riscmem free [1]

That was xawtv, I guess?  And now transcode is starting?

> vbuf: mmap setup: 32 buffers, 2129920 bytes each
> vbuf: mmap c9cfc96c: 422fd000-463fd000 pgoff 00000000 bufs 0-31
> vbuf: init user [0x42505000+0x208000 => 520 pages]
> btcx: riscmem alloc size=7820 [2]
> btcx: riscmem alloc size=7820 [3]

Oh, it doesn't print the riscmem addresses.  The 7820-byte blocks are
two pages in size, though, so the one-page allocation slab complains
about above likely doesn't come from here.

> Unable to handle kernel paging request at virtual address 25262e29
                                                            ^^^^^^^^
Strange value for a kernel address, probably a corrupted pointer.

> EIP is at videobuf_dma_free+0x33/0xc0 [video_buf]
> eax: 00000000   ebx: c45a7000   ecx: 00000208   edx: 25262e29
> esi: 00000000   edi: c817cf54   ebp: d0a35720   esp: c4135c18

That pointer is in edx.  "objdump -Sd video-buf.o" should help find the
instruction and the corresponding source line, but I fear that wouldn't
help much, as this isn't the source of the problem, only the place
where it shows up.

> btcx: riscmem free [64]
> [ ... ]
> btcx: riscmem free [3]

cleanups due to transcode being killed ...

> Unable to handle kernel paging request at virtual address 25262e29

... and here it hits the very same corrupted pointer again.

  Gerd

-- 
"... and dreary all weekend, too" -- weather report on RadioEins