2005-10-01 03:00:34

by Petr Vandrovec

[permalink] [raw]
Subject: 2.6.13-rc2-almost-rc3, ext3 and memory corruption

Hello,
I was so happy that 2.6.13-rc3 was released that I tried to grab it,
and while I was doing that, box said:

Slab corruption: start=ffff8100005e5bf8, len=96
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<ffffffff801ec42d>](journal_remove_journal_head+0x6d/0xb0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 01 00 00 00 6b 6b 6b 6b
Prev obj: start=ffff8100005e5b80, len=96
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<ffffffff801ec42d>](journal_remove_journal_head+0x6d/0xb0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=ffff8100005e5c70, len=96
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<ffffffff801ec42d>](journal_remove_journal_head+0x6d/0xb0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

It looks like that journal head's b_jcount was set to 1 after
journal head was already released. It looks like that I must
have been extremelly unlucky, as all b_jcount operations I
can find are either jh->b_jcount++ or --jh->b_jcount, so
schedule() (and free()) must have happened after b_jcount was
read from memory, but before incremented value was written
back (or I have missed some direct assignment, but I see none).
On other side all these b_jcount manipulations should be under
jbd_lock_bh_journal_head, which both disables preemption and
acquires some bitlock. So I'm again missing something fundamental,
as this use-after-free should never happen, and if it could
happen I should get 6c 6b 6b 6b and not 01 00 00 00...

Box has two ext3 filesystems, and when these oopses occured I
was running one apt-get install in /, apt-get update in the /pure64
chroot, apt-get source apt in /usr/src/debian (but maybe this one
was already finished at that time) and cg-export /tmp/linux-2.6.13-rc3
on fourth console. git's kernel repository is on second disk, everything
else on first one. So disks and ext3 were quite busy when problem
occured.

I'll try to reproduce it with 2.6.13-rc3 (running kernel was from
wednesday git, as of commit 95001ee...), but I have no idea how
reproducible it is, and as I see no ext3 or jbd changes since wednesday
I believe it is not fixed yet.

Box in question has single Athlon64 processor, runs SMP PREEMPT kernel
with almost all debugging options set (but not CONFIG_JBD_DEBUG). Box
has 2GB of ECC RAM. Root filesystem is on pata disk connected to the
pata port on Promise's 20378 (using sata_promise). Disk with git repository
is plugged to pata port on Via's VT8237. Kernel's configuration is
at http://platan.vc.cvut.cz/vana_at_home_config.txt, dmesg at
http://platan.vc.cvut.cz/vana_at_home_dmesg.txt.

Thanks,
Petr Vandrovec


2005-10-08 16:30:11

by Petr Vandrovec

[permalink] [raw]
Subject: Re: 2.6.13-rc2-almost-rc3, ext3 and memory corruption

Petr Vandrovec wrote:
> Hello,
> I was so happy that 2.6.13-rc3 was released that I tried to grab it,
> and while I was doing that, box said:
>
> Slab corruption: start=ffff8100005e5bf8, len=96
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<ffffffff801ec42d>](journal_remove_journal_head+0x6d/0xb0)
> 000: 6b 6b 6b 6b 6b 6b 6b 6b 01 00 00 00 6b 6b 6b 6b
> Prev obj: start=ffff8100005e5b80, len=96
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<ffffffff801ec42d>](journal_remove_journal_head+0x6d/0xb0)
> 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Next obj: start=ffff8100005e5c70, len=96
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<ffffffff801ec42d>](journal_remove_journal_head+0x6d/0xb0)
> 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

Hello,
so I've enabled JBD-DEBUG, and unfortunately somebody corrupted inode instead
of journal head this time :-( But corruption looks same - somebody wrote 32bit
value 1 somewhere it should not... Does anybody have something like
CONFIG_DEBUG_PAGE_ALLOC for x86-64?

Slab corruption: start=ffff8100005e7c00, len=976
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<ffffffff801aa690>](destroy_inode+0x30/0x50)
080: 01 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Prev obj: start=ffff8100005e7818, len=976
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<ffffffff801aa690>](destroy_inode+0x30/0x50)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
ppc:~$ uname -a
Linux ppc 2.6.14-rc3-dd72 #1 SMP PREEMPT Tue Oct 4 21:12:16 CEST 2005 x86_64
GNU/Linux
ppc:~$

This time I was not doing anything special, just building new kernel & browsing web.
Thanks,
Petr Vandrovec

> Box in question has single Athlon64 processor, runs SMP PREEMPT kernel
> with almost all debugging options set (but not CONFIG_JBD_DEBUG). Box
> has 2GB of ECC RAM. Root filesystem is on pata disk connected to the
> pata port on Promise's 20378 (using sata_promise). Disk with git repository
> is plugged to pata port on Via's VT8237. Kernel's configuration is
> at http://platan.vc.cvut.cz/vana_at_home_config.txt, dmesg at
> http://platan.vc.cvut.cz/vana_at_home_dmesg.txt.


2005-10-20 20:44:50

by Petr Vandrovec

[permalink] [raw]
Subject: Re: 2.6.13-rc2-almost-rc3, ext3 and memory corruption

Petr Vandrovec wrote:
> Petr Vandrovec wrote:
>
>> Hello,
>> I was so happy that 2.6.13-rc3 was released that I tried to grab it,
>> and while I was doing that, box said:
>>
>> Slab corruption: start=ffff8100005e5bf8, len=96
>> Redzone: 0x5a2cf071/0x5a2cf071.

> Hello,
> so I've enabled JBD-DEBUG, and unfortunately somebody corrupted inode
> instead of journal head this time :-( But corruption looks same -
> somebody wrote 32bit value 1 somewhere it should not... Does anybody
> have something like CONFIG_DEBUG_PAGE_ALLOC for x86-64?
>
> Slab corruption: start=ffff8100005e7c00, len=976
> Redzone: 0x5a2cf071/0x5a2cf071.

Still no clue, but somebody still writes 0x00000001 here and there around my
inodes :-( This time when I was building 2.6.14-rc5 kernel... Apparently
whenever my old kernel finds that it is going to be replaced it attempts to
boycott build of new kernel.

Linux ppc 2.6.14-rc3-8298 #9 SMP PREEMPT Sun Oct 9 03:45:32 CEST 2005 x86_64
GNU/Linux

slab error in cache_free_debugcheck(): cache `size-64': double free, or memory
outside object was overwritten

Call Trace:<ffffffff8017230a>{cache_free_debugcheck+378}
<ffffffff8017376c>{kfree+220}
<ffffffff801aaafe>{clear_inode+174} <ffffffff801aaba8>{dispose_list+104}
<ffffffff801aaffa>{prune_icache+426}
<ffffffff801ab067>{shrink_icache_memory+23}
<ffffffff80176185>{shrink_slab+229} <ffffffff801776bd>{balance_pgdat+589}
<ffffffff80371474>{_spin_lock_irqsave+36} <ffffffff80177937>{kswapd+311}
<ffffffff80155070>{autoremove_wake_function+0}
<ffffffff80155070>{autoremove_wake_function+0}
<ffffffff801356dd>{schedule_tail+77} <ffffffff8010fd92>{child_rip+8}
<ffffffff80177800>{kswapd+0} <ffffffff8010fd8a>{child_rip+0}

ffff8100005e7c38: redzone 1: 0x170fc2a5, redzone 2: 0x1.

Petr

>> Box in question has single Athlon64 processor, runs SMP PREEMPT kernel
>> with almost all debugging options set (but not CONFIG_JBD_DEBUG). Box
>> has 2GB of ECC RAM. Root filesystem is on pata disk connected to the
>> pata port on Promise's 20378 (using sata_promise). Disk with git
>> repository is plugged to pata port on Via's VT8237. Kernel's
>> configuration is at http://platan.vc.cvut.cz/vana_at_home_config.txt,
>> dmesg at http://platan.vc.cvut.cz/vana_at_home_dmesg.txt.