2004-01-15 00:39:36

by Jun Sun

[permalink] [raw]
Subject: [BUG] 2.6.1/MIPS - missing cache flushing when user program returns pages to kernel


I have been chasing a nasty memory corruption bug on my MIPS box with
2.6.1 kernel. In the end it appears the following sequence has
happened:

1. userland gets a page and writes some stuff to it, which dirties
data cache. In my case, it is actually doing a sys_read() into
that page. See my kgdb trace attached in the end.

2. userland returns this page to kernel *without* any cache flushing,
i.e., the dcache is still dirty.

3. kernel calls kmalloc() to get a block from this page.

4. the dirty dcache is written back to physical memory some time later,
corrupting the kernel data.

It seems to me the problem is that we should do a cache flush
for all the pages returned to kernel during step 2.

I attached a hack which solves my problem but I am not sure if it is
most appropriate. It looks like the affected user region (start, end)
can span over multiple vma areas. If so, the fix will only flush the first
area.

Also, it is hard to find an appropriate place to do the flushing
The new 2.6 mm is a confusing maze to me. I hope someone more
knowledgable can come up with a more decent fix for this problem.

BTW, it appears in 2.4 we are doing this flushing in do_zap_page_range()
where we call a flush_cache_range(mm, start, end).

Jun

P.S., Here is the stack trace when dirty data is written to user page:

$8 = 0x2aac3000
(gdb) p/x kaddr
$9 = 0x811a3880
(gdb) bt
#0 both_aligned () at arch/mips/lib/memcpy.S:222
#1 0xffffffff8014351c in file_read_actor (desc=0x87fd1dd0, page=0x8102c178,
offset=0, size=2717) at mm/filemap.c:754
#2 0xffffffff80143088 in do_generic_mapping_read (mapping=0x803e3168,
ra=0x87fe8868, filp=0x87fe8820, ppos=0x87fe8840, desc=0x87fd1dd0,
actor=0x80143480 <file_read_actor>) at mm/filemap.c:658
#3 0xffffffff801437a0 in __generic_file_aio_read (iocb=0x80143480,
iov=0x811a3880, nr_segs=1, ppos=0x87fe8840) at fs.h:1334
#4 0xffffffff80143884 in generic_file_read (filp=0x61697265,
buf=0xcccccccd <Address 0xcccccccd out of bounds>, count=715927552,
ppos=0xa9d) at mm/filemap.c:877
#5 0xffffffff80162688 in vfs_read (file=0x87fe8820,
buf=0x2aac3000 "# /etc/inittab: init(8) configuration.\n# $Id: inittab,v 1.8 1998/05/10 10:37:50 miquels Exp $\n\n# The default runlevel.\nid:3:initdefault:\n\n# Boot-time system configuration/initialization script.\n# This"...,
count=4096, pos=0x87fe8840) at fs/read_write.c:213
#6 0xffffffff80162918 in sys_read (fd=715929728,
buf=0x2aac3000 "# /etc/inittab: init(8) configuration.\n# $Id: inittab,v 1.8 1998/05/10 10:37:50 miquels Exp $\n\n# The default runlevel.\nid:3:initdefault:\n\n# Boot-time system configuration/initialization script.\n# This"...,
count=4096) at fs/read_write.c:278

It appears I lost the stack trace when kernel finds data corruption. It is somewhere
inside d_splice_alias() where inode->i_dentry, happens to be at 0x811a3880, has
a wrong value instead of the expected 0.


Attachments:
(No filename) (2.88 kB)
missing-cache-flush-on-return-user-page.patch (372.00 B)
Download all attachments

2004-01-15 01:12:17

by Andrew Morton

[permalink] [raw]
Subject: Re: [BUG] 2.6.1/MIPS - missing cache flushing when user program returns pages to kernel

Jun Sun <[email protected]> wrote:
>
> I have been chasing a nasty memory corruption bug on my MIPS box with
> 2.6.1 kernel. In the end it appears the following sequence has
> happened:
>
> 1. userland gets a page and writes some stuff to it, which dirties
> data cache. In my case, it is actually doing a sys_read() into
> that page. See my kgdb trace attached in the end.
>
> 2. userland returns this page to kernel *without* any cache flushing,
> i.e., the dcache is still dirty.
>
> 3. kernel calls kmalloc() to get a block from this page.
>
> 4. the dirty dcache is written back to physical memory some time later,
> corrupting the kernel data.
>
> It seems to me the problem is that we should do a cache flush
> for all the pages returned to kernel during step 2.
>
> I attached a hack which solves my problem but I am not sure if it is
> most appropriate. It looks like the affected user region (start, end)
> can span over multiple vma areas. If so, the fix will only flush the first
> area.
>
> Also, it is hard to find an appropriate place to do the flushing
> The new 2.6 mm is a confusing maze to me. I hope someone more
> knowledgable can come up with a more decent fix for this problem.
>
> BTW, it appears in 2.4 we are doing this flushing in do_zap_page_range()
> where we call a flush_cache_range(mm, start, end).

That flush_cache_range was removed between 2.5.67 and 2.5.68. If you put
it back, does it fix the problem?

It seems from Russell's words here, MIPS should be flushing in
tlb_start_vma().

I think that's wrong, really. We've discussed this before and decided that
these flushing operations should be open-coded in the main .c file rather
than embedded in arch functions which happen to undocumentedly do other
stuff.


# --------------------------------------------
# 03/04/14 [email protected] 1.1017
# [PATCH] flush_cache_mm in zap_page_range
#
# unmap_vmas() eventually calls tlb_start_vma(), where most architectures
# flush caches as necessary. The flush here seems to make the
# flush_cache_range() in zap_page_range() redundant, and therefore can be
# removed.
# --------------------------------------------
#
diff -Nru a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c Wed Jan 14 17:09:07 2004
+++ b/mm/memory.c Wed Jan 14 17:09:07 2004
@@ -601,7 +601,6 @@

lru_add_drain();
spin_lock(&mm->page_table_lock);
- flush_cache_range(vma, address, end);
tlb = tlb_gather_mmu(mm, 0);
unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted);
tlb_finish_mmu(tlb, address, end);

2004-01-15 01:29:00

by Andrew Morton

[permalink] [raw]
Subject: Re: [BUG] 2.6.1/MIPS - missing cache flushing when user program returns pages to kernel

Andrew Morton <[email protected]> wrote:
>
> I think that's wrong, really. We've discussed this before and decided that
> these flushing operations should be open-coded in the main .c file rather
> than embedded in arch functions which happen to undocumentedly do other
> stuff.

err, OK, I give up. Lots of architectures do the cache flush in
tlb_start_vma(). I guess mips may as well do the same.

2004-01-15 01:45:37

by Jun Sun

[permalink] [raw]
Subject: Re: [BUG] 2.6.1/MIPS - missing cache flushing when user program returns pages to kernel

On Wed, Jan 14, 2004 at 05:29:46PM -0800, Andrew Morton wrote:
> Andrew Morton <[email protected]> wrote:
> >
> > I think that's wrong, really. We've discussed this before and decided that
> > these flushing operations should be open-coded in the main .c file rather
> > than embedded in arch functions which happen to undocumentedly do other
> > stuff.
>
> err, OK, I give up. Lots of architectures do the cache flush in
> tlb_start_vma(). I guess mips may as well do the same.
>

Looking at my tree (which is from linux-mips.org), it appears
arm, sparc, sparc64, and sh have tlb_start_vma() defined to call
cache flushing.

What exactly does tlb_start_vma()/tlb_end_vma() mean? There is
only one invocation instance, which is significant enough to infer
the meaning. :)

BTW, either my original hack or putting a cache flush in tlb_start_vma()
solves my problem. They are really doing the same thing, just at
different places.

Jun

2004-01-15 06:24:05

by David Miller

[permalink] [raw]
Subject: Re: [BUG] 2.6.1/MIPS - missing cache flushing when user program returns pages to kernel

On Wed, 14 Jan 2004 17:40:12 -0800
Jun Sun <[email protected]> wrote:

> Looking at my tree (which is from linux-mips.org), it appears
> arm, sparc, sparc64, and sh have tlb_start_vma() defined to call
> cache flushing.

Correct, in fact every platform where cache flushing matters
at all (ie. where flush_cache_*() routines actually need to
flush a cpu cache), they should have tlb_start_vma() do such
a flush.

> What exactly does tlb_start_vma()/tlb_end_vma() mean? There is
> only one invocation instance, which is significant enough to infer
> the meaning. :)

When the kernel unmaps a mmap region of a process (either for the
sake of munmap() or tearing down all mapping during exit()) tlb_start_vma()
is called, the page table mappings in the region are torn down one by
one, then a tlb_end_vma() call is made.

At the top level, ie. whoever invokes unmap_page_range(), there will
be a tlb_gather_mmu() call.

In order to properly optimize the cache flushes, most platforms do the
following:

1) The tlb->fullmm boolean keeps trap of whether this is just a munmap()
unmapping operation (if zero) or a full address space teardown
(if non-zero).

2) In the full address space teardown case, and thus tlb->fullmm is
non-zero, the top level will do the explict flush_cache_mm()
(see mm/mmap.c:exit_mmap()), therefore the tlb_start_vma()
implementation need not do the flush, otherwise it does.

This is why sparc64 and friends implement it like this:

#define tlb_start_vma(tlb, vma) \
do { if (!(tlb)->fullmm) \
flush_cache_range(vma, vma->vm_start, vma->vm_end); \
} while (0)

Hope this clears things up.

Someone should probably take what I just wrote, expand and organize it,
then add such content to Documentation/cachetlb.txt