In older kernels (< 2.5.46) used to be a call to 'flush_page_to_ram'
in the function 'filemap_nopage()' in 'mm/filemap.c'. This macro has
been obsoleted and has been replaced in other places by appropriate
'flush_dcache_page', 'flush_icache_page', 'copy_user_page', or
'clear_user_page' calls.
But in 'mm/filemap.c' it has been removed without replacement.
In fact a call to 'flush_dcache_page' should be in place there.
This missing macro call produces data errors when randomly reading an
'mmap'ed file (like it happens, when a program is executed).
I stumbled over this bug when I tried to execute a program from a
freshly mounted IDE CF card. The first call would segfault while
subsequent calls worked. This behaviour recurred when the disk was
unmounted between calls.
It also helped to copy the file (to /dev/null) before executing it
(from CF disk).
Digging into the problem I wrote a simple program that created a large
file (e.g. 1 MiB size), umounted and remounted the disk where it had
been created to flush all buffers and subsequently 'mmap'ed the file
and checked the contents from the file end to the start.
Upon error the program simply read the same word over and over again
until the correct data appeared after some time.
In kernels < 2.5.46 the deprecated macro call was still present in
filemap.c (though the macro was defined to do nothing), while in later
kernel versions that call was removed.
The accompanied patches are generated against kernel versions 2.5.30
and 2.5.68 but should also be applicable to other kernels (with a hunk
offset).
On Thu, May 22, 2003 at 02:34:56PM +0200, [email protected] wrote:
> in file 'mm/filemap.c' a call to 'flush_dcache_page' is missing as a
> replacement for the obsoleted 'flush_page_to_ram' call that was
> present there in older kernels.
Please refrain from cross-posting between member-only posting lists and
people not on those lists. If you need to send the message to such a
list, please forward it there instead.
Anyone replying to this message and who aren't on linux-arm-kernel,
please drop this list from the CC: line.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
On Thu, May 22, 2003 at 02:34:56PM +0200, [email protected] wrote:
> in file 'mm/filemap.c' a call to 'flush_dcache_page' is missing as a
> replacement for the obsoleted 'flush_page_to_ram' call that was
> present there in older kernels.
>
> This missing macro call produces data errors when randomly reading an
> 'mmap'ed file (e.g. leading to segfaults, when a program is executed).
>
> In kernels < 2.5.46 the deprecated macro call was still present
> (defined to do nothing), while in later kernels the call has been
> removed.
>
> Below are two patches generated against kernel versions 2.5.30 and
> 2.5.68 which should also be applicable to other kernels (with a hunk
> offset).
We seem to have flush_icache_page() in install_page() - I wonder whether
we should also have flush_dcache_page() in there as well.
I've always been confused about what flush_icache_page() is there for,
and its a no-op on ARM. Whether it should or shouldn't be is an
unanswered question, and will probably remain unanswered until I can
sit down and go through the whole of the VM layer, working out exactly
what it requires and where today.
Maybe someone more knowledgeable of the VM layer can comment.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
On Thu, 2003-05-22 at 07:11, Russell King wrote:
> We seem to have flush_icache_page() in install_page() - I wonder whether
> we should also have flush_dcache_page() in there as well.
...
> Maybe someone more knowledgeable of the VM layer can comment.
I am not sure of the exact environment install_page() is meant
to run in, does it always know that no mapping exists at that
address?
If not, something (either there or higher up) needs to be doing
a flush_cache_page(...) at a minimum.
The things that some platforms use flush_icache_page() for are
handled by other platforms using other mechanisms in clever ways (for
example, at update_mmu_cache() or instruction TLB miss time, older
sparc64's use special D/I cache flush block stores to handle the I-cache
coherency problem there).
--
David S. Miller <[email protected]>
Russell King writes:
> We seem to have flush_icache_page() in install_page() - I wonder whether
> we should also have flush_dcache_page() in there as well.
>
Maybe because install_page() isn't called in the situation I
was talking about. install_page() is called from filemap_populate()
which in turn is called from do_file_map() in handle_pte_fault(),
while I was talking about filemap_nopage() called by do_no_page() in
handle_pte_fault().
And maybe because *every* other call to flush_page_to_ram() has been
replaced by one of the new interface macros except that one in
filemap_nopage() in 'mm/filemap.c'.
Lothar Wassmann
"David S. Miller" <[email protected]> wrote:
>
> On Thu, 2003-05-22 at 07:11, Russell King wrote:
> > We seem to have flush_icache_page() in install_page() - I wonder whether
> > we should also have flush_dcache_page() in there as well.
> ...
> > Maybe someone more knowledgeable of the VM layer can comment.
>
> I am not sure of the exact environment install_page() is meant
> to run in, does it always know that no mapping exists at that
> address?
No, it does not - there could have been a different page mapped at that
virtual address.
> If not, something (either there or higher up) needs to be doing
> a flush_cache_page(...) at a minimum.
install_page() did a flush_cache_page() in zap_pte(), if it found there was
already a page mapped by the pte.
That flush_cache_page() was added 28 March 2003. 2.5.30 does not have it.
What's that flush_icache_page() doing in there? The fine document makes
one suspect it is wrong.
install_page() is prefaulting pages into pagetables, so perhaps it should
have an update_mmu_cache()?
diff -puN mm/fremap.c~install_page-flushing mm/fremap.c
--- 25/mm/fremap.c~install_page-flushing 2003-05-23 02:01:52.000000000 -0700
+++ 25-akpm/mm/fremap.c 2003-05-23 02:09:03.000000000 -0700
@@ -84,7 +84,7 @@ int install_page(struct mm_struct *mm, s
pte_unmap(pte);
if (flush)
flush_tlb_page(vma, addr);
-
+ update_mmu_cache(vma, addr, *pte);
spin_unlock(&mm->page_table_lock);
pte_chain_free(pte_chain);
return 0;
_
"Lothar Wassmann" <[email protected]> wrote:
>
> And maybe because *every* other call to flush_page_to_ram() has been
> replaced by one of the new interface macros except that one in
> filemap_nopage() in 'mm/filemap.c'.
>
flush_page_to_ram() has been deleted from the kernel.
filemap_nopage() is a pagecache function and shouldn't be fiddling with
cache and/or TLB operations. Unless it touches the page by hand, which it
does not.
Please, test a current kernel.
From: Andrew Morton <[email protected]>
Date: Fri, 23 May 2003 02:12:04 -0700
install_page() is prefaulting pages into pagetables, so perhaps it should
have an update_mmu_cache()?
I agree. Someone should take a close look at the do_file_page()
code paths to make sure that's still kosher after such a change.
"David S. Miller" <[email protected]> wrote:
>
> From: Andrew Morton <[email protected]>
> Date: Fri, 23 May 2003 02:12:04 -0700
>
> install_page() is prefaulting pages into pagetables, so perhaps it should
> have an update_mmu_cache()?
>
> I agree. Someone should take a close look at the do_file_page()
> code paths to make sure that's still kosher after such a change.
What would one be looking for? I don't know what the sideeffects of
update_mmu_cache() might be.
It looks to be the same as do_no_page() though: the update_mmu_cache() is
the last substantive thing which happens in the fault or the syscall.
Andrew Morton writes:
> "Lothar Wassmann" <[email protected]> wrote:
> >
> > And maybe because *every* other call to flush_page_to_ram() has been
> > replaced by one of the new interface macros except that one in
> > filemap_nopage() in 'mm/filemap.c'.
> >
>
> flush_page_to_ram() has been deleted from the kernel.
>
Yes, I know. But did you even read, what I have written?
The macro has still been there in 2.5.30, but was already defined as a
NOP. In every source file where it had been used (except filemap.c), it
was already accompanied by a call to one of 'flush_dcache_page',
'flush_icache_page', 'copy_user_page' or 'clear_user_page' which
obviously took over the function that flush_page_to_ram previously
had. Somewhere around 2.5.46 the obsolete macro was removed and the
piece of code in filemap.c is the only location where it has not been
replaced by one of the new macros. This looks very suspicious to me.
Unfortunately in the i386 Architecture (which I presume is the most
widely spread Linux arch) this macro has always been a NOP, so that
noone could notice it missing somewhere.
> filemap_nopage() is a pagecache function and shouldn't be fiddling with
> cache and/or TLB operations. Unless it touches the page by hand, which it
> does not.
>
Ok, but then some function which is called from filemap_nopage()
should have done the job beforehand.
> Please, test a current kernel.
>
Is 2.5.68 current enough? The problem was even better reproducible
with this kernel than with the old one. So I made all my tests with
2.5.68.
Lothar
From: Andrew Morton <[email protected]>
Date: Fri, 23 May 2003 03:04:09 -0700
"David S. Miller" <[email protected]> wrote:
> I agree. Someone should take a close look at the do_file_page()
> code paths to make sure that's still kosher after such a change.
What would one be looking for? I don't know what the sideeffects of
update_mmu_cache() might be.
It looks to be the same as do_no_page() though: the update_mmu_cache() is
the last substantive thing which happens in the fault or the syscall.
Yes, this is what I was talking about, making sure do_file_page()'s
return path isn't doing a update_mmu_cache() call already.
I don't believe this is an error (to do it twice for the same fault)
but it would be superfluous.
"Lothar Wassmann" <[email protected]> wrote:
>
> Andrew Morton writes:
> > "Lothar Wassmann" <[email protected]> wrote:
> > >
> > > And maybe because *every* other call to flush_page_to_ram() has been
> > > replaced by one of the new interface macros except that one in
> > > filemap_nopage() in 'mm/filemap.c'.
> > >
> >
> > flush_page_to_ram() has been deleted from the kernel.
> >
> Yes, I know. But did you even read, what I have written?
Vaguely. It never hurts to repeat things ;)
> Is 2.5.68 current enough?
yes.
filemap_nopage isn't the right place to be doing these things though.
Given that there was no page at the virtual address before filemap_nopage
was called I don't think any CPU cache writeback or invalidation need be
performed. Perhaps a writeback or invalidate is missing somewhere in the
unmap paths, or there is a problem in arch/arm somewhere.
We have a no-op flush_icache_page() in do_no_page(), but I don't know what
that thing ever did, not what it's doing in there. (What happens if you
replace it with a flush_cache_page(vma, address)?)
Someone who understands these things better than I is going to have to work
out where the bug really is, I'm afraid.
On Fri, May 23, 2003 at 12:04:13PM +0200, Lothar Wassmann wrote:
> Is 2.5.68 current enough? The problem was even better reproducible
> with this kernel than with the old one. So I made all my tests with
> 2.5.68.
Can you attach the test program which reproduces your problem, and
include a description of what you think is going wrong?
AFAICS, you have only pointed out that a call to flush_page_to_ram()
without further information.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
Andrew Morton writes:
> filemap_nopage isn't the right place to be doing these things though.
>
> Given that there was no page at the virtual address before filemap_nopage
> was called I don't think any CPU cache writeback or invalidation need be
> performed. Perhaps a writeback or invalidate is missing somewhere in the
> unmap paths, or there is a problem in arch/arm somewhere.
>
> We have a no-op flush_icache_page() in do_no_page(), but I don't know what
> that thing ever did, not what it's doing in there. (What happens if you
> replace it with a flush_cache_page(vma, address)?)
We used to use flush_icache_page() on ppc/ppc64 to ensure that the
i-cache was consistent with the d-cache and with memory. It was
needed in do_no_page for the case where the page had just been read in
and the i-cache could have stale data reflecting the previous contents
of the page (the i-cache doesn't snoop on ppc). But we now do that in
update_mmu_cache.
Paul.
ARCH = arm
CC = arm-linux-gcc
CFLAGS := -Wall -Wstrict-prototypes -Wno-trigraphs -O2 \
-fomit-frame-pointer -fno-strict-aliasing -fno-common
INSTALL_DIR=/tftpboot/rootfs/usr/local/arm/bin
SOURCES = disktest.c
OBJECTS= $(SOURCES:%.c=%.o)
BINARIES= $(SOURCES:%.c=%)
TARGETS = $(BINARIES:%=$(INSTALL_DIR)/%)
all: $(OBJECTS) $(BINARIES)
clean:
rm -f $(OBJECTS)
install: all $(TARGETS)
$(BINARIES:%=$(INSTALL_DIR)/%): $(INSTALL_DIR)/%: %
sudo cp -p $< $@
On Fri, 23 May 2003, Lothar Wassmann wrote:
> I'm using a custom PXA250/255 board (same problem on both processors)
> with either kernel 2.5.30-rmk1-pxa1 or 2.5.68-rmk1-pxa1. Both show the
> same malfunction when reading a file non-sequentially from an IDE CF
> card.
Which filesystem - jffs2 or some other?
Hugh
On Fri, May 23, 2003 at 03:45:51AM -0700, Andrew Morton wrote:
> Given that there was no page at the virtual address before filemap_nopage
> was called I don't think any CPU cache writeback or invalidation need be
> performed. Perhaps a writeback or invalidate is missing somewhere in the
> unmap paths, or there is a problem in arch/arm somewhere.
No, I think there is a flush missing somewhere in this path.
What I think is happening is that Lothar is using the PXA with the cache
in write allocate write back mode (Xscale is the first ARM-arch cpu to
have allocate on write caches.)
This means that when IDE copies the data into the buffer using insw or
whatever, it ends up in the VIVT cache rather than memory. Since we
don't seem to be calling flush_dcache_page(), we never write this data
back to memory for user space to access it via their mapping.
If this is the case, its something I can't test, because I don't have
access to such hardware (I'm currently being kept a hardware pauper
as far as new ARM technologies go.)
> We have a no-op flush_icache_page() in do_no_page(), but I don't know what
> that thing ever did, not what it's doing in there. (What happens if you
> replace it with a flush_cache_page(vma, address)?)
I don't think this'll help - its asking the wrong bit of cache to be
flushed. I think we want to replace that flush_icache_page() with
a flush_dcache_page(), but shrug, I don't really know this code well
enough.
> Someone who understands these things better than I is going to have
> to work out where the bug really is, I'm afraid.
I suspect that there's very few people who really understand this area -
you need to know what the block drivers are doing, and everything in
between there and user space.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
On Fri, 23 May 2003, Russell King wrote:
>
> No, I think there is a flush missing somewhere in this path.
>
> What I think is happening is that Lothar is using the PXA with the cache
> in write allocate write back mode (Xscale is the first ARM-arch cpu to
> have allocate on write caches.)
>
> This means that when IDE copies the data into the buffer using insw or
> whatever, it ends up in the VIVT cache rather than memory. Since we
> don't seem to be calling flush_dcache_page(), we never write this data
> back to memory for user space to access it via their mapping.
I believe (DaveM will speak with authority) that hitherto it has been
assumed that I/O (well, Input) brings data actually into memory: we use
flush_dcache_page if kernel memsets or memcpys data, not if it's read in.
If this mode+architecture departs from that, then we would need another
macro, which translates to flush_dcache_page (sufficient?) for that,
and is a nop for everything else.
And where would it be placed? I think not where the flush_page_to_ram
used to be in filemap_nopage, but after the ->readpage. Or... would
this tie in with Martin's s390 request for a SetPageUptodate hook?
Hugh
Hugh Dickins <[email protected]> wrote:
>
> On Fri, 23 May 2003, Russell King wrote:
> >
> > No, I think there is a flush missing somewhere in this path.
> >
> > What I think is happening is that Lothar is using the PXA with the cache
> > in write allocate write back mode (Xscale is the first ARM-arch cpu to
> > have allocate on write caches.)
> >
> > This means that when IDE copies the data into the buffer using insw or
> > whatever, it ends up in the VIVT cache rather than memory. Since we
> > don't seem to be calling flush_dcache_page(), we never write this data
> > back to memory for user space to access it via their mapping.
That sounds distinctly possible.
> I believe (DaveM will speak with authority) that hitherto it has been
> assumed that I/O (well, Input) brings data actually into memory: we use
> flush_dcache_page if kernel memsets or memcpys data, not if it's read in.
Vague statement of principle: The device driver layer takes care of these
issues for DMA transfers, and hence should also take care of them for PIO.
Is this sensible and/or possible?
> If this mode+architecture departs from that, then we would need another
> macro, which translates to flush_dcache_page (sufficient?) for that,
> and is a nop for everything else.
>
> And where would it be placed? I think not where the flush_page_to_ram
> used to be in filemap_nopage, but after the ->readpage. Or... would
> this tie in with Martin's s390 request for a SetPageUptodate hook?
>
IDE?
On Fri, May 23, 2003 at 11:29:26AM -0700, Andrew Morton wrote:
> Vague statement of principle: The device driver layer takes care of these
> issues for DMA transfers, and hence should also take care of them for PIO.
> Is this sensible and/or possible?
I'd err on the side of caution about extending this principle. The
device driver layer's issue for DMA transfers seems to cover the device <->
kernel cache consistency, not the device <-> user space or kernel <->
user space cache consistency.
The kernel <-> user space consistency issue seems to be one for the MM
layer to deal with. There are other situations where this view can go
out of sync - for instance, when you have a page of a file mmap'd, and
you use sys_write() to that page of file. This is the issue which
flush_dcache_page() seems to be addressing, and it's the same issue
with PIO from IDE.
So no, I don't think it is a device driver issue at all.
DaveM?
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
On Fri, 2003-05-23 at 16:42, Hugh Dickins wrote:
> On Fri, 23 May 2003, Lothar Wassmann wrote:
> > I'm using a custom PXA250/255 board (same problem on both processors)
> > with either kernel 2.5.30-rmk1-pxa1 or 2.5.68-rmk1-pxa1. Both show the
> > same malfunction when reading a file non-sequentially from an IDE CF
> > card.
>
> Which filesystem - jffs2 or some other?
Presumably the latter. JFFS2 works on flash, not on IDE -- unless you
load the 'blkmtd' driver which uses a block device as backing store for
a 'virtual' MTD device.
--
dwmw2
On Fri, 2003-05-23 at 11:34, Russell King wrote:
> So no, I don't think it is a device driver issue at all.
>
> DaveM?
Oh yes, this part is. If you don't ensure this, everything
breaks.
At the end of an I/O operation, say to a page cache page, that
data ought to be visible equally to a userspace vs. a kernel
space mapping to that page.
For example, this is why we use language about "cpu visibility" in the
DMA api documentation and not "kernel cpu visibility" :-) And because
PIO transfers are basically pseudo-DMA they need to make the same exact
guarentees.
If you've been living in a world where you didn't think this is
necessary, I certainly feel bad for you :-)
--
David S. Miller <[email protected]>
BTW, it's only by LUCK that I actually read this.
Whoever removed me from the CC: list and still wanted me
to follow up on the conversation needs a good kicking :-)
--
David S. Miller <[email protected]>
ah, ok. so there are cache issues even if if the user pte is not
established yet? Then it seems natural to couple flush_dcache_page with
pte establishing, not at the driver level.
--Mika
David S. Miller wrote:
> From: **UNKNOWN CHARSET** <[email protected]>
> Date: Mon, 26 May 2003 08:07:10 +0300
>
> I don't think the flush_dcache_page thing is done almost anywhere in the
> block/driver level right now.
>
>It isn't and it shouldn't :-)
>
> And we shouldn't be doing io reads to pagecache pages with user
> mappings anyway normally. direct-io is a different thing.
>
>We are talking about the case where we are bringing in the
>data for the first time, on the page cache lookup miss.
>
>
>
From: Mika Penttil? <[email protected]>
Date: Mon, 26 May 2003 08:36:34 +0300
ah, ok. so there are cache issues even if if the user pte is not
established yet? Then it seems natural to couple flush_dcache_page
with pte establishing, not at the driver level.
flush_dcache_page() or some architecture level equivalent belongs
whereever the kernel uses CPU store instructions to modify a page's
contents.
When IDE uses PIO to do a data transfer, the flush belongs there.
Right now this is occuring in the architecture defined IDE insw/outsw
macros. It very well might be more efficient to do this at a higher
level where the total extent of the I/O is known.
On Sun, May 25, 2003 at 08:19:32PM -0700, David S. Miller wrote:
> Oh yes, this part is. If you don't ensure this, everything
> breaks.
>
> At the end of an I/O operation, say to a page cache page, that
> data ought to be visible equally to a userspace vs. a kernel
> space mapping to that page.
>
> For example, this is why we use language about "cpu visibility" in the
> DMA api documentation and not "kernel cpu visibility" :-) And because
> PIO transfers are basically pseudo-DMA they need to make the same exact
> guarentees.
>
> If you've been living in a world where you didn't think this is
> necessary, I certainly feel bad for you :-)
Ok, so the flush_dcache_page() interface looses this; the original
placement of the flush_page_to_ram() ensured that data written by
device drivers was visible to user space.
Maybe the BIO layer can handle this - the same problem exists when
(and if) BIO uses a bounce buffer, so it would have to be handled
there. Jens?
Lothar - can you confirm that your problem vanishes when you turn off
write allocation on the caches please? (cachepolicy=writeback)
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
Hugh Dickins writes:
> On Fri, 23 May 2003, Lothar Wassmann wrote:
> > I'm using a custom PXA250/255 board (same problem on both processors)
> > with either kernel 2.5.30-rmk1-pxa1 or 2.5.68-rmk1-pxa1. Both show the
> > same malfunction when reading a file non-sequentially from an IDE CF
> > card.
>
> Which filesystem - jffs2 or some other?
>
I tried VFAT, MINIX and EXT2 with the same results.
Lothar
Russell King writes:
> Lothar - can you confirm that your problem vanishes when you turn off
> write allocation on the caches please? (cachepolicy=writeback)
>
No, its still there. But I seem to be unable to turn writealloc ON
anyway. I get:
|CPU: XScale-PXA250 [69052904] revision 4 (ARMv5TE)
|CPU: D undefined 5 cache
|CPU: I cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
|CPU: D cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
|Machine: KARO electronics PXA25x module
|Memory policy: ECC disabled, Data cache write back
no matter whether I specify 'cachepolicy=writeback', '...writealloc'
or no cachepolicy at all.
(This is because ARMv5TE is not recognized as an ARMv5 architecture and
PMD_SECT_WBWA is turned into PMD_SECT_WB in build_mem_type_table()
('arch/arm/mm/mm-armv.c' line 304).
Another bug?)
Shouldn't it be:
@@ -295,12 +295,13 @@
/*
* ARMv5 can use ECC memory.
*/
- if (cpu_arch == CPU_ARCH_ARMv5) {
+ if (cpu_arch == CPU_ARCH_ARMv5 || cpu_arch == CPU_ARCH_ARMv5T ||
+ cpu_arch == CPU_ARCH_ARMv5TE) {
mem_types[MT_VECTORS].prot_l1 |= ecc_mask;
mem_types[MT_MEMORY].prot_sect |= ecc_mask;
} else {
mem_types[MT_MINICLEAN].prot_sect &= ~PMD_SECT_TEX(1);
if (cachepolicy == PMD_SECT_WBWA)
cachepolicy = PMD_SECT_WB;
ecc_mask = 0;
}
Only 'cachepolicy=writethrough' makes the problem disappear.
Lothar
Hi,
On Sun, 25 May 2003, David S. Miller wrote:
> flush_dcache_page() or some architecture level equivalent belongs
> whereever the kernel uses CPU store instructions to modify a page's
> contents.
>
> When IDE uses PIO to do a data transfer, the flush belongs there.
> Right now this is occuring in the architecture defined IDE insw/outsw
> macros. It very well might be more efficient to do this at a higher
> level where the total extent of the I/O is known.
I'd prefer not to do this at driver level at all and rather let the user
of the data do it. The driver doesn't know how this page will be used, so
it has to assume the worst case.
E.g. for normal read calls there isn't a cache flush needed after a PIO
transfer and if the page is mapped nonexecutable, there is no page flush
needed either for physical caches.
It's also not just IO, a normal write call will dirty the page as well, so
before we map a page into userspace it's IMO the best time to synchronize
the caches. update_mmu_cache() might be a bad place for this as this is
called after set_pte().
bye, Roman
On Mon, May 26 2003, Russell King wrote:
> On Sun, May 25, 2003 at 08:19:32PM -0700, David S. Miller wrote:
> > Oh yes, this part is. If you don't ensure this, everything
> > breaks.
> >
> > At the end of an I/O operation, say to a page cache page, that
> > data ought to be visible equally to a userspace vs. a kernel
> > space mapping to that page.
> >
> > For example, this is why we use language about "cpu visibility" in the
> > DMA api documentation and not "kernel cpu visibility" :-) And because
> > PIO transfers are basically pseudo-DMA they need to make the same exact
> > guarentees.
> >
> > If you've been living in a world where you didn't think this is
> > necessary, I certainly feel bad for you :-)
>
> Ok, so the flush_dcache_page() interface looses this; the original
> placement of the flush_page_to_ram() ensured that data written by
> device drivers was visible to user space.
>
> Maybe the BIO layer can handle this - the same problem exists when
> (and if) BIO uses a bounce buffer, so it would have to be handled
> there. Jens?
For bouncing it's relatively trivial to add (and probably should, feel
free...). PIO etc is really a driver problem to handle, I don't see how
that could be handled generically.
--
Jens Axboe
From: Russell King <[email protected]>
Date: Mon, 26 May 2003 09:55:51 +0100
Ok, so the flush_dcache_page() interface looses this; the original
placement of the flush_page_to_ram() ensured that data written by
device drivers was visible to user space.
That's possible, but see my response to Roman's posting for how
this can be easily fixed using current interfaces yet retaining
identical or even potentially better performance.
On Mon, May 26, 2003 at 03:08:12PM +0200, Lothar Wassmann wrote:
> No, its still there. But I seem to be unable to turn writealloc ON
> anyway. I get:
Ok, this gets worrying since I haven't heard of the problem occuring
with our write back caches before, neither can I reproduce it here.
It also completely knocks out my reasoning for it occuring.
> |CPU: XScale-PXA250 [69052904] revision 4 (ARMv5TE)
> |CPU: D undefined 5 cache
> |CPU: I cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
> |CPU: D cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
> |Machine: KARO electronics PXA25x module
> |Memory policy: ECC disabled, Data cache write back
> no matter whether I specify 'cachepolicy=writeback', '...writealloc'
> or no cachepolicy at all.
> (This is because ARMv5TE is not recognized as an ARMv5 architecture and
> PMD_SECT_WBWA is turned into PMD_SECT_WB in build_mem_type_table()
> ('arch/arm/mm/mm-armv.c' line 304).
> Another bug?)
> Shouldn't it be:
> @@ -295,12 +295,13 @@
> /*
> * ARMv5 can use ECC memory.
> */
> - if (cpu_arch == CPU_ARCH_ARMv5) {
> + if (cpu_arch == CPU_ARCH_ARMv5 || cpu_arch == CPU_ARCH_ARMv5T ||
> + cpu_arch == CPU_ARCH_ARMv5TE) {
No - it should be cpu_arch >= CPU_ARCH_ARMv5. I seem to have been over-
zealous about removing some currently unreleasable stuff here.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
From: Roman Zippel <[email protected]>
Date: Mon, 26 May 2003 15:18:01 +0200 (CEST)
I'd prefer not to do this at driver level at all and rather let the
user of the data do it.
This is an easy thing to say, but you have to recognize that PIO
based data transfers must retain the EXACT behavior required of
real DMA transfers, which is that a subsequent user mapping of the
data must be able to see the data without an intervening
flush_dcache_page() or similar.
You can STILL optimize this the way you seem to want to. The
update_mmu_cache() routing exists as a point at which you can
do such deferred situation-based flushing optimizations.
F.e. at ide_insw() time you mark pages as "might_need_flush" with
some bit in page->flags, we even have bits allocated for arch specific
use and we can allocate 1 or 2 more if you need them. Then at
update_mmu_cache() time you check this bit and act accordingly.
This is exactly the kinds of things I expected platforms to do.
It took a while but even PPC moved all of their flush_icache_page()
stuff into update_mmu_cache() based delayed flush schemes.
Hi,
On Mon, 26 May 2003, David S. Miller wrote:
> I'd prefer not to do this at driver level at all and rather let the
> user of the data do it.
>
> This is an easy thing to say, but you have to recognize that PIO
> based data transfers must retain the EXACT behavior required of
> real DMA transfers, which is that a subsequent user mapping of the
> data must be able to see the data without an intervening
> flush_dcache_page() or similar.
>
> You can STILL optimize this the way you seem to want to. The
> update_mmu_cache() routing exists as a point at which you can
> do such deferred situation-based flushing optimizations.
>
> F.e. at ide_insw() time you mark pages as "might_need_flush" with
> some bit in page->flags, we even have bits allocated for arch specific
> use and we can allocate 1 or 2 more if you need them. Then at
> update_mmu_cache() time you check this bit and act accordingly.
I thought about this before, but I don't think there is much to optimize.
The PG_arch_1 bit is the only optimization which makes sense and setting
it by default to dirty, makes it a lot easier for PIO drivers. PIO
transfers are really the smallest problem as drivers write only into not
uptodate and so not mapped pages. We have to be more careful with other
writes, that they always call flush_dcache_page().
The point I don't like about update_mmu_cache() is that it's called
_after_ set_pte(). Practically it's maybe not a problem right now, but
the cache synchronization should happen before set_pte().
bye, Roman
From: Roman Zippel <[email protected]>
Date: Tue, 27 May 2003 12:53:12 +0200 (CEST)
The point I don't like about update_mmu_cache() is that it's called
_after_ set_pte(). Practically it's maybe not a problem right now, but
the cache synchronization should happen before set_pte().
update_mmu_cache() is specifically supposed to always occur before
anyone could try to use the mapping created.
If this is ever violated, it will be fixed because it is a BUG().
So I don't see what you're worried about.
If the above were not true, sparc64 wouldn't be able to compile
a kernel successfully. :-)
Hi,
On Tue, 27 May 2003, David S. Miller wrote:
> The point I don't like about update_mmu_cache() is that it's called
> _after_ set_pte(). Practically it's maybe not a problem right now, but
> the cache synchronization should happen before set_pte().
>
> update_mmu_cache() is specifically supposed to always occur before
> anyone could try to use the mapping created.
>
> If this is ever violated, it will be fixed because it is a BUG().
set_pte() establishes the mapping, this means another cpu can get the pte
and start reading the data e.g. into the instruction cache, but if that
cpu still has dirty data in the data cache (e.g. a write() or a disk
read), the following update_mmu_cache() might be too late.
bye, Roman
From: Roman Zippel <[email protected]>
Date: Wed, 28 May 2003 18:35:15 +0200 (CEST)
set_pte() establishes the mapping, this means another cpu can get the pte
and start reading the data e.g. into the instruction cache, but if that
cpu still has dirty data in the data cache (e.g. a write() or a disk
read), the following update_mmu_cache() might be too late.
If that really matters for you, your set_pte() could add this
operation to a list of pending set_pte()'s in the mm_struct arch
specific context area. Then at update_mmu_cache() it runs this
list of todo items.
I don't see the problem.
Hi,
On Wed, 28 May 2003, David S. Miller wrote:
> set_pte() establishes the mapping, this means another cpu can get the pte
> and start reading the data e.g. into the instruction cache, but if that
> cpu still has dirty data in the data cache (e.g. a write() or a disk
> read), the following update_mmu_cache() might be too late.
>
> If that really matters for you, your set_pte() could add this
> operation to a list of pending set_pte()'s in the mm_struct arch
> specific context area. Then at update_mmu_cache() it runs this
> list of todo items.
>
> I don't see the problem.
This would require quite some magic in set_pte(), as update_mmu_cache() is
not always called after set_pte(). I'd prefer to keep flush_icache_page()
for this, there are only a few places, which require this and it doesn't
hurt to keep this explicit.
BTW it's a bit unfortunate that flush_dcache_page() is called for reads
and writes. This might be needed for virtual indexed caches, but for
icache/dcache synchronization I'm only interested in writes (so that
write()/exec() works, but not every read() causes a cache flush). Could we
split that interface?
bye, Roman
From: Roman Zippel <[email protected]>
Date: Thu, 29 May 2003 02:12:47 +0200 (CEST)
BTW it's a bit unfortunate that flush_dcache_page() is called for reads
and writes.
Please don't say it this way, this is an inaccurate description.
DMA-mapping.txt defines very precisely when flush_dcache_page() is
invoked, and that is it's only definition. I purposely DO NOT say
that "this is for reads" or "this is for handling virtual aliasing
in L1 caches", I simply define where this macro is invoked and
that is it.
Specifically, flush_dcache_page() is called any time the kernel makes
cpu stores into a page cache page that might be mapped into a user's
address space.
On Wed, May 28, 2003 at 06:37:00PM -0700, David S. Miller wrote:
> Specifically, flush_dcache_page() is called any time the kernel makes
> cpu stores into a page cache page that might be mapped into a user's
> address space.
Presumably then the two in drivers/block/rd.c (ramdisk_readpage and
ramdisk_prepare_write) are out of spec then?
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
From: Russell King <[email protected]>
Date: Thu, 29 May 2003 08:13:15 +0100
Presumably then the two in drivers/block/rd.c (ramdisk_readpage and
ramdisk_prepare_write) are out of spec then?
They are doing the page cache store, they are correct.
This also reminds me that we are missing calls in the
loop driver.
Hi,
On Wed, 28 May 2003, David S. Miller wrote:
> DMA-mapping.txt defines very precisely when flush_dcache_page() is
> invoked, and that is it's only definition. I purposely DO NOT say
> that "this is for reads" or "this is for handling virtual aliasing
> in L1 caches", I simply define where this macro is invoked and
> that is it.
>
> Specifically, flush_dcache_page() is called any time the kernel makes
> cpu stores into a page cache page that might be mapped into a user's
> address space.
cachetlb.txt also says "_OR_ the kernel is about to read from a page cache
page..." and it's used like this in mm/filemap.c. I think it would be
better to move this into a separate function, e.g. like this:
flush_user_dcache_page(): If the page is mapped writable into user space,
flush the dirty data from the user D-cache, so it becomes visible from the
kernel.
flush_kernel_dcache_page(): The page was written from kernel space and the
data has to become visible in user space, so flush the data into memory
and possibly invalidate data in the user D/I-cache. The flush can be
delayed if this page is currently not mapped.
bye, Roman
From: Roman Zippel <[email protected]>
Date: Thu, 29 May 2003 19:49:15 +0200 (CEST)
cachetlb.txt also says "_OR_ the kernel is about to read from a page cache
page..." and it's used like this in mm/filemap.c.
That's correct.
I think it would be better to move this into a separate function,
How about just a flag, saying which event is occuring/about-to-occur?
Franks a lot,
David S. Miller
[email protected]