2002-12-11 02:13:07

by Jun Sun

Subject: possible cache aliasing problem with O_DIRECT?


I am chasing a problem that might be a cache aliasing problem when a
disk file is opened with the O_DIRECT flag.

I have attached the source code of two programs. One generates a binary
file; the other opens that file with O_DIRECT, reads it, and checks the
contents as it reads.

I tested this on a MIPS board with an NEC vr5432 CPU, which has a
virtually indexed, two-way set-associative D-cache, and I can easily
reproduce the data corruption.

I have also attached a patch that appears to solve the problem.

I am not an expert in fs and mm, but my guess is:

1) the user process allocates a big buffer;
2) the user buffer is mapped into kernel virtual space for direct I/O
through map_user_kiobuf();
3) since the virtual address of the buffer in user space differs from
its address in kernel virtual space, the kernel should flush the cache
for those pages after the I/O completes. That is why my attached patch
makes it work.

Does this make sense?

However, I am still puzzled by a few things. For this to work
completely, another cache flush should be needed for the address range
of the buffer in user space. I thought this would be done somewhere
inside map_user_kiobuf(), but I could not find it anywhere. Did I miss
it, or does it just happen to work without it?

Another puzzling part is that I also tested the program on a couple of
other MIPS boards that *should* suffer from this problem, but I could
not reproduce it there.

Any thoughts?

Jun


2002-12-11 02:25:12

by Jun Sun

Subject: Re: possible cache aliasing problem with O_DIRECT?


... I forgot the attachments, possibly due to the same memory
corruption problem. :-)

Also, the problem was discovered in 2.4.18. I checked 2.4.19, and it
appears the problem should be there as well.

Jun



Attachments:
(No filename) (1.75 kB)
gen-file.c (565.00 B)
my-diotest.c (1.37 kB)
o_direct-cache-flush.patch (534.00 B)