This test essentially opens a file (via open(2)), writes something,
opens it via a mmaped file object *read only* (via fopen(...,"rm)) reads
what was writtent, writes some more and reads it via the mmaped file
object.
This last read fails to get the data on parisc. The problem is that our
CPU cache is virtually indexed, and the page the write is storing the
data to (in the buffer cache) and the page it is mmapped to have the
same physical, but different virtual addresses. We need the write() to
trigger a cache update via flush_dcache_page to get the virtually
indexed cache in sync.
The reason this doesn't happen is because the mapping is not on the
mmap_shared list that flush_dcache_page() updates.
And the reason it's not on the correct list is because there's a check
in mm/mmap.c:do_mmap_pgoff() that drops the VM_SHARED flag on the
mapping if the file wasn't opened for writing (about line 541).
Semantically, it seems that whether the mmaping sees a write or not on a
different descriptor shouldn't depend on whether the underlying file was
opened read only or read write, so I think the glibc test is correct,
and we should keep the VM_SHARED flag even if the underlying file was
opened read only.
The patch is attached (and makes the test pass on parisc).
Comments?
James
On Fri, Aug 22, 2003 at 09:14:47AM -0700, David S. Miller wrote:
> On 22 Aug 2003 09:40:37 -0500
> James Bottomley <[email protected]> wrote:
> > The reason this doesn't happen is because the mapping is not on the
> > mmap_shared list that flush_dcache_page() updates.
>
> flush_dcache_page() checks both the shared and non-shared mmap lists,
> so if it is on _either_ list it is flushed. It does not check only
> the shared list.
Gah, that's going to get really inefficient. I still think we want to
split flush_dcache_page() into two operations -- flush_dcache_user() and
flush_dcache_kernel(). flush_dcache_user() would flush this specific
user mapping back to ram and flush_dcache_kernel() would flush the
kernel mapping. Obviously we'd still want to have flush_dcache_page()
as there are instances when you want to flush all user mappings and the
kernel mapping back to ram.
> The VM_SHARED change you are proposing is definitely wrong.
Why is it wrong? Why should whether-or-not a mapping is read-only affect
whether it's mapped shared? I can't see anything in SuS v3 that suggests
we should do this.
--
"It's not Hollywood. War is real, war is primarily not about defeat or
victory, it is about death. I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk
On 22 Aug 2003 09:40:37 -0500
James Bottomley <[email protected]> wrote:
> This test essentially opens a file (via open(2)), writes something,
> opens it via a mmaped file object *read only* (via fopen(...,"rm)) reads
> what was writtent, writes some more and reads it via the mmaped file
> object.
>
> This last read fails to get the data on parisc. The problem is that our
> CPU cache is virtually indexed, and the page the write is storing the
> data to (in the buffer cache) and the page it is mmapped to have the
> same physical, but different virtual addresses. We need the write() to
> trigger a cache update via flush_dcache_page to get the virtually
> indexed cache in sync.
>
> The reason this doesn't happen is because the mapping is not on the
> mmap_shared list that flush_dcache_page() updates.
flush_dcache_page() checks both the shared and non-shared mmap lists,
so if it is on _either_ list it is flushed. It does not check only
the shared list.
The VM_SHARED change you are proposing is definitely wrong.
On Fri, Aug 22, 2003 at 05:34:29PM +0100, Matthew Wilcox wrote:
> On Fri, Aug 22, 2003 at 09:14:47AM -0700, David S. Miller wrote:
> > On 22 Aug 2003 09:40:37 -0500
> > James Bottomley <[email protected]> wrote:
> > > The reason this doesn't happen is because the mapping is not on the
> > > mmap_shared list that flush_dcache_page() updates.
> >
> > flush_dcache_page() checks both the shared and non-shared mmap lists,
> > so if it is on _either_ list it is flushed. It does not check only
> > the shared list.
Eww.
> Gah, that's going to get really inefficient. I still think we want to
> split flush_dcache_page() into two operations -- flush_dcache_user() and
> flush_dcache_kernel(). flush_dcache_user() would flush this specific
> user mapping back to ram and flush_dcache_kernel() would flush the
> kernel mapping.
Where are you proposing calling only _user() and _kernel() from ?
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
On Fri, 22 Aug 2003 17:34:29 +0100
Matthew Wilcox <[email protected]> wrote:
> On Fri, Aug 22, 2003 at 09:14:47AM -0700, David S. Miller wrote:
> > On 22 Aug 2003 09:40:37 -0500
> > flush_dcache_page() checks both the shared and non-shared mmap lists,
> > so if it is on _either_ list it is flushed. It does not check only
> > the shared list.
>
> Gah, that's going to get really inefficient. I still think we want to
> split flush_dcache_page() into two operations -- flush_dcache_user() and
> flush_dcache_kernel(). flush_dcache_user() would flush this specific
> user mapping back to ram and flush_dcache_kernel() would flush the
> kernel mapping. Obviously we'd still want to have flush_dcache_page()
> as there are instances when you want to flush all user mappings and the
> kernel mapping back to ram.
flush_dcache_page() works only on kernel pages.
It is defined to execute when the kernel executes store instructions
into a page.
Therefore splitting it into a "user" part makes absolutely no
sense.
> > The VM_SHARED change you are proposing is definitely wrong.
>
> Why is it wrong? Why should whether-or-not a mapping is read-only affect
> whether it's mapped shared? I can't see anything in SuS v3 that suggests
> we should do this.
MAP_SHARED has no meaning if the mapping isn't writable.
On Fri, 22 Aug 2003 17:42:03 +0100
Russell King <[email protected]> wrote:
> > Gah, that's going to get really inefficient. I still think we want to
> > split flush_dcache_page() into two operations -- flush_dcache_user() and
> > flush_dcache_kernel(). flush_dcache_user() would flush this specific
> > user mapping back to ram and flush_dcache_kernel() would flush the
> > kernel mapping.
>
> Where are you proposing calling only _user() and _kernel() from ?
The is not acceptable answer.
Purely, flush_dcache_page() is defined to execute when the
kernel stores into a page cache page, and that is it's only
valid definition.
Splitting into a "user" part makes absolutely no sense.
On Fri, Aug 22, 2003 at 09:39:00AM -0700, David S. Miller wrote:
> flush_dcache_page() works only on kernel pages.
>
> It is defined to execute when the kernel executes store instructions
> into a page.
>
> Therefore splitting it into a "user" part makes absolutely no
> sense.
Uhm. So what happens when the user has stored into the page and now
the kernel wants to read from it? There's still data in the cache for
the user mapping that's non-coherent with the kernel mapping.
> > > The VM_SHARED change you are proposing is definitely wrong.
> >
> > Why is it wrong? Why should whether-or-not a mapping is read-only affect
> > whether it's mapped shared? I can't see anything in SuS v3 that suggests
> > we should do this.
>
> MAP_SHARED has no meaning if the mapping isn't writable.
Sure it does. It affects whether other writes to that page show up in
the shared read-only mapping.
--
"It's not Hollywood. War is real, war is primarily not about defeat or
victory, it is about death. I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk
On Fri, 22 Aug 2003 18:41:03 +0100
Matthew Wilcox <[email protected]> wrote:
> Uhm. So what happens when the user has stored into the page and now
> the kernel wants to read from it? There's still data in the cache for
> the user mapping that's non-coherent with the kernel mapping.
I see. This causes the page cache read flush_dcache_page() call
not to trigger.
I was very confused by the fact that this bug was explained by
saying that "the shared mmap list that flush_dcache_page() checks".
So the idea is that VM_SHARED should be set based upon whether
we mmap() the thing writable _not_ whether the open() was done
with write permission enabled.
Yes, I agree with that.
On Fri, 22 Aug 2003 10:36:34 -0700
"David S. Miller" <[email protected]> wrote:
> On Fri, 22 Aug 2003 18:41:03 +0100
> Matthew Wilcox <[email protected]> wrote:
>
> > Uhm. So what happens when the user has stored into the page and now
> > the kernel wants to read from it? There's still data in the cache for
> > the user mapping that's non-coherent with the kernel mapping.
>
> I see. This causes the page cache read flush_dcache_page() call
> not to trigger.
Wait, I'm confused again.
How can the user "write" to the mmap()'d side if PROT_WRITE
was not specified? That is the only case in which the proposed
patch could make a difference, we check this:
switch (flags & MAP_TYPE) {
case MAP_SHARED:
if ((prot&PROT_WRITE) && !(file->f_mode&FMODE_WRITE))
return -EACCES;
Therefore if the user can write to the page, file->f_mode will
have the write bit set too.
So the proposed patch looks bogus to me.
On Fri, 2003-08-22 at 13:34, Hugh Dickins wrote:
> Might the problem be in parisc's __flush_dcache_page,
> which only examines i_mmap_shared?
This is the issue: we do treat them differently.
Semantics differ between privately mapped data (where there's no
coherency guarantee) and shared data (where there is). Flushing the
virtual cache is expensive on pa, so we only do it for the i_mmap_shared
list.
The difficulty is that a mmap of a read only file with MAP_SHARED is
expecting the shared cache semantics, but gets added to the non shared
list.
Since flushing the caches is a performance hog, we'd like do be able to
distinguish the cases where we have to do the flush MAP_SHARED mappings
from those we don't (MAP_PRIVATE).
James
On Fri, 22 Aug 2003, David S. Miller wrote:
>
> So the proposed patch looks bogus to me.
And to me. If VM_SHARED is set, then __vma_link_file puts the vma on
on i_mmap_shared. If VM_SHARED is not set, it puts the vma on i_mmap.
flush_dcache_page treats i_mmap_shared and i_mmap lists equally.
Might the problem be in parisc's __flush_dcache_page,
which only examines i_mmap_shared?
Though it's odd, I'm not keen on changing do_mmap_pgoff's usage of
VM_SHARED in a hurry: it's worked like that for years, and other
things (I forget) depend on it.
Hugh
On Fri, 22 Aug 2003 19:34:41 +0100 (BST)
Hugh Dickins <[email protected]> wrote:
> And to me. If VM_SHARED is set, then __vma_link_file puts the vma on
> on i_mmap_shared. If VM_SHARED is not set, it puts the vma on i_mmap.
> flush_dcache_page treats i_mmap_shared and i_mmap lists equally.
But file system page cache writes only call flush_dache_page()
if the page has a non-empty i_mmap_shared list.
> Might the problem be in parisc's __flush_dcache_page,
> which only examines i_mmap_shared?
No, it examines both lists, the problem is not there.
The issue seems to be some confusion about whether the
test program in question is actually mmap()'ing the area
with PROT_WRITE set, and if so why the test case isn't
passing because in such a case the page will have a non-empty
i_mmap_shared list.
In reference to a message from James Bottomley, dated Aug 22:
> On Fri, 2003-08-22 at 13:34, Hugh Dickins wrote:
> > Might the problem be in parisc's __flush_dcache_page,
> > which only examines i_mmap_shared?
>
> This is the issue: we do treat them differently.
as does some other archs, like ARM.
are we saying that MAP_SHARED != VM_SHARED? the mmap code allows
architectures to map pages differently if MAP_SHARED is specified, but
it puts it on i_mmap vs i_mmap_shared using VM_SHARED, and for read-only
files we silently drop VM_SHARED... so the page is mapped using
MAP_SHARED semantics but placed on i_mmap....
confused,
randolph
--
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/
On Fri, 2003-08-22 at 13:31, David S. Miller wrote:
> On Fri, 22 Aug 2003 19:34:41 +0100 (BST)
> Hugh Dickins <[email protected]> wrote:
>
> > And to me. If VM_SHARED is set, then __vma_link_file puts the vma on
> > on i_mmap_shared. If VM_SHARED is not set, it puts the vma on i_mmap.
> > flush_dcache_page treats i_mmap_shared and i_mmap lists equally.
>
> But file system page cache writes only call flush_dache_page()
> if the page has a non-empty i_mmap_shared list.
Hmm, but if it does that then the glibc bug test should show up on sparc
because the i_mmap_shared list is empty if we only do MAP_SHARED of read
only files.
James
On 22 Aug 2003, James Bottomley wrote:
> On Fri, 2003-08-22 at 13:34, Hugh Dickins wrote:
> > Might the problem be in parisc's __flush_dcache_page,
> > which only examines i_mmap_shared?
>
> This is the issue: we do treat them differently.
>
> Semantics differ between privately mapped data (where there's no
> coherency guarantee) and shared data (where there is). Flushing the
> virtual cache is expensive on pa, so we only do it for the i_mmap_shared
> list.
>
> The difficulty is that a mmap of a read only file with MAP_SHARED is
> expecting the shared cache semantics, but gets added to the non shared
> list.
>
> Since flushing the caches is a performance hog, we'd like do be able to
> distinguish the cases where we have to do the flush MAP_SHARED mappings
> from those we don't (MAP_PRIVATE).
The naming "i_mmap_shared" does suggest that once upon a time those
lists were as you'd like; and that at some point it was changed.
Perhaps some arches prefer the coherency guarantee I'm familiar with
in MAP_PRIVATE (yes, when you modify a page yourself, it cows off and
becomes private; but in i386 it's shared up until then), and other
arches (like yours) would prefer to avoid the overhead.
Hugh
On 22 Aug 2003 13:56:06 -0500
James Bottomley <[email protected]> wrote:
> On Fri, 2003-08-22 at 13:31, David S. Miller wrote:
> > On Fri, 22 Aug 2003 19:34:41 +0100 (BST)
> > Hugh Dickins <[email protected]> wrote:
> >
> > > And to me. If VM_SHARED is set, then __vma_link_file puts the vma on
> > > on i_mmap_shared. If VM_SHARED is not set, it puts the vma on i_mmap.
> > > flush_dcache_page treats i_mmap_shared and i_mmap lists equally.
> >
> > But file system page cache writes only call flush_dache_page()
> > if the page has a non-empty i_mmap_shared list.
>
> Hmm, but if it does that then the glibc bug test should show up on sparc
> because the i_mmap_shared list is empty if we only do MAP_SHARED of read
> only files.
Sparc64's alias'able caches are 1) write-through and 2) quite small.
I think I begin to see the issue clearly now.
But you cannot do the VM_SHARED change without an audit first.
Lots of code thinks that VM_SHARED means someone maybe wrote
to the page through a mmap(). For example look at how filemap
sync interprets this flag bit.
On Fri, 2003-08-22 at 14:19, David S. Miller wrote:
> Sparc64's alias'able caches are 1) write-through and 2) quite small.
>
> I think I begin to see the issue clearly now.
>
> But you cannot do the VM_SHARED change without an audit first.
> Lots of code thinks that VM_SHARED means someone maybe wrote
> to the page through a mmap(). For example look at how filemap
> sync interprets this flag bit.
Yes, the issue seems to be that the flush_dcache_page() was implemented
with the thought that the caches of the shared mappings may contain
modified data that needs to be flushed to the aliased page.
The opposite property: that the caches of the aliased page need to be
invalidated because someone else changed data in the aliased page seems
to work as a byproduct of the above implementation.
But some of the checks for !list_empty(&mapping->i_shared) are going to
prevent the necessary invalidations on read only shared mappings...which
was the initial problem.
The only issue I can see with not dropping VM_SHARED for read only
shared mappings is that we do spurious (but harmless)
flushe_dcache_page() on reads.
There also appears to be a lurking prob lem in do_mremap, where it keys
off the VM_SHARED flag to set the MAP_SHARED flag for
get_unmapped_area. That's going to cause us a problem on parisc because
SHARED pages need to obey slightly stricter alignment constraints
All in all, I think not dropping VM_SHARED on read only shared mappings
is the right thing to do.
Do you need a more detailed audit?
James
On 22 Aug 2003 17:27:33 -0500
James Bottomley <[email protected]> wrote:
> Yes, the issue seems to be that the flush_dcache_page() was implemented
> with the thought that the caches of the shared mappings may contain
> modified data that needs to be flushed to the aliased page.
>
> The opposite property: that the caches of the aliased page need to be
> invalidated because someone else changed data in the aliased page seems
> to work as a byproduct of the above implementation.
>
> But some of the checks for !list_empty(&mapping->i_shared) are going to
> prevent the necessary invalidations on read only shared mappings...which
> was the initial problem.
The theory of operation is that there are two "classes" of mappings
for a page, the implicit kernel mapping and all user mappings. The
goal is to flush out one class from the cache when the other one writes
to such a page.
When a write() system call occurs, the kernel "class" is writing to
the page so all user mappings (shared or not!) need to be flushed
out.
When a read() system call occurs, and shared+writable mmap()'s of
the page exist, we must flush the user mapping "class" before the
kernel "class" tries to read from that page.
You cannot avoid doing things exactly as I've just described without
allowing bad aliases to be in the cache and corrupt data.
Your test case is essentially, annotated with what the kernel
should do at each step:
static const char *test1 = "Line the first\n";
static const char *test2 = "Line the second\n";
temp_fd = open(O_RDWR);
write(temp_fd, test1, sizeof test1 - 1);
No mmaps of this page, so no flush_dcache_page() call.
...
fd = open(O_RDONLY);
p = mmap(fd, ... PROT_READ ...);
Not writable, not added to i_mmap_shared list, instead it is
added to normal i_mmap list.
memcpy(tmp_buf, p, sizeof test1 - 1);
p += sizeof test1 - 1;
if (strcmp(tmp_buf, test1))
BUG();
...
write(temp_fd, test2, sizeof test2 - 1);
Mmaps of this page exist, flush_dcache_page() call is made.
memcpy(tmp_buf, p, sizeof test2 - 1);
if (strcmp(tmp_buf, test2))
BUG();
And thus memcpy() sees correct data.
I think on parisc you are trying to avoid the write() case
of the cache flush for non-shared mmap()s, and sorry you
really can't do this, again this is:
When a write() system call occurs, the kernel "class" is writing to
the page so all user mappings (shared or not!) need to be flushed
out.
If your flush_dcache_page() is not doing this, it's no wonder
the test case fails for you.
On Fri, 2003-08-22 at 17:41, David S. Miller wrote:
> I think on parisc you are trying to avoid the write() case
> of the cache flush for non-shared mmap()s, and sorry you
> really can't do this, again this is:
>
> When a write() system call occurs, the kernel "class" is writing to
> the page so all user mappings (shared or not!) need to be flushed
> out.
>
> If your flush_dcache_page() is not doing this, it's no wonder
> the test case fails for you.
Yes, that's precisely what we're trying to do. Our problem is that we
have to issue the flush to all the aliased addresses (one cache line at
a time) which is phenomenally expensive.
What we were hoping is that we could rely on this little property of
mmap:
MAP_PRIVATE
Create a private copy-on-write mapping. Stores
to the region do not affect the original file.
It is unspecified whether changes made to the
file after the mmap call are visible in the
mapped region.
To avoid having to flush the non-shared mappings (basically on parisc if
you write to a file backing a MAP_PRIVATE mapping then we don't
guarantee you see the update).
I suppose if we had a way of telling if any of the i_mmap list members
were really MAP_SHARED semantics mappings, then we could alter our
flush_dcache_page() implementation to work.
James
On 22 Aug 2003, James Bottomley wrote:
>
> I suppose if we had a way of telling if any of the i_mmap list members
> were really MAP_SHARED semantics mappings, then we could alter our
> flush_dcache_page() implementation to work.
Good idea. It's VM_MAYSHARE you need to check for.
Hugh
===== mremap.c 1.32 vs edited =====
--- 1.32/mm/mremap.c Thu Aug 7 12:29:10 2003
+++ edited/mremap.c Sat Aug 23 10:54:21 2003
@@ -420,7 +420,7 @@
if (flags & MREMAP_MAYMOVE) {
if (!(flags & MREMAP_FIXED)) {
unsigned long map_flags = 0;
- if (vma->vm_flags & VM_SHARED)
+ if (vma->vm_flags & VM_MAYSHARE)
map_flags |= MAP_SHARED;
new_addr = get_unmapped_area(vma->vm_file, 0, new_len,
On 22 Aug 2003 20:09:30 -0500
James Bottomley <[email protected]> wrote:
> What we were hoping is that we could rely on this little property of
> mmap:
>
> MAP_PRIVATE
> Create a private copy-on-write mapping. Stores
> to the region do not affect the original file.
> It is unspecified whether changes made to the
> file after the mmap call are visible in the
> mapped region.
>
> To avoid having to flush the non-shared mappings (basically on parisc if
> you write to a file backing a MAP_PRIVATE mapping then we don't
> guarantee you see the update).
>
> I suppose if we had a way of telling if any of the i_mmap list members
> were really MAP_SHARED semantics mappings, then we could alter our
> flush_dcache_page() implementation to work.
I thought about this very deeply last night and this morning.
And what you're trying to optimize won't work. Here is why.
If the first access to a MAP_PRIVATE mapping of a page is a read,
we'll use the page-cache page. This means that, with your
optimization, during this time if another cpu write()`s into the
page we'll lose the data update.
Sorry :(
On Sat, 23 Aug 2003 08:22:19 +0100 (BST)
Hugh Dickins <[email protected]> wrote:
> On 22 Aug 2003, James Bottomley wrote:
> >
> > I suppose if we had a way of telling if any of the i_mmap list members
> > were really MAP_SHARED semantics mappings, then we could alter our
> > flush_dcache_page() implementation to work.
>
> Good idea. It's VM_MAYSHARE you need to check for.
Nope, please see my other email for why all of these ideas
simply will not work. If the first fault-in of a MAP_PRIVATE
page is a read, it's just like a MAP_SHARED read-only page until
the first write occurs.
On Sat, 2003-08-23 at 16:43, David S. Miller wrote:
> On 22 Aug 2003 20:09:30 -0500
> James Bottomley <[email protected]> wrote:
>
> > What we were hoping is that we could rely on this little property of
> > mmap:
> >
> > MAP_PRIVATE
> > Create a private copy-on-write mapping. Stores
> > to the region do not affect the original file.
> > It is unspecified whether changes made to the
> > file after the mmap call are visible in the
> > mapped region.
> >
> > To avoid having to flush the non-shared mappings (basically on parisc if
> > you write to a file backing a MAP_PRIVATE mapping then we don't
> > guarantee you see the update).
> >
> > I suppose if we had a way of telling if any of the i_mmap list members
> > were really MAP_SHARED semantics mappings, then we could alter our
> > flush_dcache_page() implementation to work.
>
> I thought about this very deeply last night and this morning.
> And what you're trying to optimize won't work. Here is why.
>
> If the first access to a MAP_PRIVATE mapping of a page is a read,
> we'll use the page-cache page. This means that, with your
> optimization, during this time if another cpu write()`s into the
> page we'll lose the data update.
>
> Sorry :(
Could you elaborate some more? I agree that the MAP_PRIVATE mapping may
not see cpu1's write because of cache incoherencies (but that's what I
believe is covered by the `unspecified' bit of the MAP_PRIVATE
definition above). MAP_PRIVATE is COW (i.e. the page is marked read
only while it's shared), so there can be no data to flush in the cache
for the page of the MAP_PRIVATE process...The only scenario I see where
we can get cache based data destruction is if two cache aliases both
contain dirty caches for the page (which can only happen for MAP_SHARED
RW, which we already have the correct semantics for...they're racing to
do a msync and the last one in wins).
James
On 23 Aug 2003 17:21:21 -0500
James Bottomley <[email protected]> wrote:
> On Sat, 2003-08-23 at 16:43, David S. Miller wrote:
> > On 22 Aug 2003 20:09:30 -0500
> > James Bottomley <[email protected]> wrote:
> >
> > > To avoid having to flush the non-shared mappings (basically on parisc if
> > > you write to a file backing a MAP_PRIVATE mapping then we don't
> > > guarantee you see the update).
BTW, what gains to you really get from this optimization?
How often do writes happen to files while private mappings
to it exist? :-) This is one of the reasons I think this
discussion is a bit silly.
What specific cases does your optimization help, and how common is it?
Show us some numbers.
On Sat, 2003-08-23 at 17:51, David S. Miller wrote:
> Ok. Let me think about this a bit more.
>
> The safest solution for parisc, meanwhile, would be to walk the
> non-shared mmap list checking for any instance of the VM_MAYSHARE bit
> being set.
Right, that's how I plan to fix this problem in parisc.
We also need the VM_MAYSHARE flag to propagate across remappings, which
was the general kernel fix I sent some emails back.
James
On 23 Aug 2003 17:21:21 -0500
James Bottomley <[email protected]> wrote:
> On Sat, 2003-08-23 at 16:43, David S. Miller wrote:
> > On 22 Aug 2003 20:09:30 -0500
> > James Bottomley <[email protected]> wrote:
> >
> > > MAP_PRIVATE
> > > Create a private copy-on-write mapping. Stores
> > > to the region do not affect the original file.
> > > It is unspecified whether changes made to the
> > > file after the mmap call are visible in the
> > > mapped region.
...
> Could you elaborate some more? I agree that the MAP_PRIVATE mapping may
> not see cpu1's write because of cache incoherencies (but that's what I
> believe is covered by the `unspecified' bit of the MAP_PRIVATE
> definition above).
Ok. Let me think about this a bit more.
The safest solution for parisc, meanwhile, would be to walk the
non-shared mmap list checking for any instance of the VM_MAYSHARE bit
being set.
On Sat, 2003-08-23 at 17:53, David S. Miller wrote:
> BTW, what gains to you really get from this optimization?
>
> How often do writes happen to files while private mappings
> to it exist? :-) This is one of the reasons I think this
> discussion is a bit silly.
>
> What specific cases does your optimization help, and how common is it?
> Show us some numbers.
Not having to flush the private mappings is a huge optimisation. Our
current flush_dcache_page implementation allows us only to flush a
single page and get all the aliased caches updated because we carefully
align MAP_SHARED areas (by supplying our own arch_get_unmapped_area()).
However, the alignment constraint is 4MB to get this property of the
virtually aliased caches, so we can't afford to align all mappings like
this (for our 32 bit userspace, anyway).
If we were to have to flush the private i_mmap list, we'd have to do a
page flush for *every* entry in the list (that's 256 instructions per
page at a cache width of 16 bytes). This would be a horrific overhead.
using the VM_MAYSHARE to carry the read only shared mapping semantics
indication still allows us to align correctly, but the only additional
overhead we incur is to walk the i_mmap list to find VM_MAYSHARE
mappings as well as i_mmap_shared. Since we can key of VM_MAYSHARE to
do the alignment, we still only need flush the first one we come to.
This all works as long as we can agree there are no pathological mmap
cases that force us to flush *all* the i_mmap mappings...which is what I
think the discussion has come down to.
James
On 23 Aug 2003 18:11:16 -0500
James Bottomley <[email protected]> wrote:
> On Sat, 2003-08-23 at 17:53, David S. Miller wrote:
> > How often do writes happen to files while private mappings
> > to it exist? :-) This is one of the reasons I think this
> > discussion is a bit silly.
...
> Not having to flush the private mappings is a huge optimisation.
You're not answering my question :(
I know that when the case _DOES_ happen, your optimization is
worthwhile, my question is not about this. My question is about
how often does this case happen.