2003-09-04 14:49:30

by James Bottomley

[permalink] [raw]
Subject: [PATCH] fix remap of shared read only mappings

When mmap MAP_SHARED is done on a file, it gets marked with VM_MAYSHARE
and, if it's read/write, VM_SHARED. However, if it is remapped with
mremap(), the MAP_SHARED is only passed into the new mapping based on
VM_SHARED. This means that remapped read only MAP_SHARED mappings lose
VM_MAYSHARE. This is causing us a problem on parisc because we have to
align all shared mappings carefully to mitigate cache aliasing problems.

The fix is to key passing the MAP_SHARED flag back into the remapped are
off VM_MAYSHARE not VM_SHARED.

James

===== mremap.c 1.32 vs 1.33 =====
--- 1.32/mm/mremap.c Thu Aug 7 12:29:10 2003
+++ 1.33/mm/mremap.c Sun Aug 24 06:50:10 2003
@@ -420,7 +420,7 @@
if (flags & MREMAP_MAYMOVE) {
if (!(flags & MREMAP_FIXED)) {
unsigned long map_flags = 0;
- if (vma->vm_flags & VM_SHARED)
+ if (vma->vm_flags & VM_MAYSHARE)
map_flags |= MAP_SHARED;

new_addr = get_unmapped_area(vma->vm_file, 0, new_len,


2003-09-04 21:48:26

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] fix remap of shared read only mappings

James Bottomley wrote:
> When mmap MAP_SHARED is done on a file, it gets marked with VM_MAYSHARE
> and, if it's read/write, VM_SHARED. However, if it is remapped with
> mremap(), the MAP_SHARED is only passed into the new mapping based on
> VM_SHARED. This means that remapped read only MAP_SHARED mappings lose
> VM_MAYSHARE. This is causing us a problem on parisc because we have to
> align all shared mappings carefully to mitigate cache aliasing problems.
>
> The fix is to key passing the MAP_SHARED flag back into the remapped are
> off VM_MAYSHARE not VM_SHARED.

At first I disagreed with your patch. I was thinking that special
alignment is only really needed to avoid aliasing problems for
_writable_ shared mappings, and VM_SHARED is right for that. (Which
would indicate that mmap is faulty, not mremap).

But after some thought, I agree with you.

A read-only mapping of a shared segment must be coherent with other,
possibly writable mappings, so far as the architecture can guarantee
that.

That coherence should not disappear if the file handle used for the
mapping was opened O_RDONLY.

One last thought: this is what PROT_SEM is for. Linux doesn't use
this in any useful way. But, technically, mmap with MAP_SHARED ad
PROT_SEM should enable cache coherence, and that might include
aligning the address. Without PROT_SEM an application should not rely
on cache coherence.

-- Jamie

2003-09-04 22:34:30

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] fix remap of shared read only mappings

On Thu, 2003-09-04 at 17:48, Jamie Lokier wrote:
> One last thought: this is what PROT_SEM is for. Linux doesn't use
> this in any useful way. But, technically, mmap with MAP_SHARED ad
> PROT_SEM should enable cache coherence, and that might include
> aligning the address. Without PROT_SEM an application should not rely
> on cache coherence.

I'm not familiar with the PROT_SEM flag. I don't believe it to be
defined in POSIX.

However, POSIX does imply levels of cache coherence for both MAP_SHARED
and MAP_PRIVATE:

With MAP_SHARED, any change to the underlying object after the mapping
must become visible to the mapper (although the change may be delayed by
local caching of the changer's implementation until it is explicitly
flushed).

With MAP_PRIVATE it is undefined what happens if the underlying object
is changed after mapping.

So regardless of PROT_SEM we have no choice but to worry about cache
coherence issues on MAP_SHARED mappings.

James


2003-09-04 22:56:50

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] fix remap of shared read only mappings

James Bottomley wrote:
> However, POSIX does imply levels of cache coherence for both MAP_SHARED
> and MAP_PRIVATE:
>
> With MAP_SHARED, any change to the underlying object after the mapping
> must become visible to the mapper (although the change may be delayed by
> local caching of the changer's implementation until it is explicitly
> flushed).
...
> So regardless of PROT_SEM we have no choice but to worry about cache
> coherence issues on MAP_SHARED mappings.

You have just argued _against_ worrying about cache coherence by
aligning mapping addresses.

Basically, POSIX says shared mappings aren't guanteed to be coherent
until you call msync(). Then you can just do whatever is needed to
make different views coherent. That's easier now that we have rmap.

-- Jamie

2003-09-04 23:23:59

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] fix remap of shared read only mappings

On Thu, 2003-09-04 at 18:56, Jamie Lokier wrote:
> You have just argued _against_ worrying about cache coherence by
> aligning mapping addresses.
>
> Basically, POSIX says shared mappings aren't guanteed to be coherent
> until you call msync(). Then you can just do whatever is needed to
> make different views coherent. That's easier now that we have rmap.

I think you may misunderstand what I mean by coherence: Our problem is
the VIVT processor caches. Once one mapper does an msync, that data
must be visible to all the other mappers, so at that point we have to
flush the cache lines of all the other mappers. On PA, we only need to
flush one correctly aligned address to get the VIVT cache to flush all
the others. However, the kernel page cache usually holds an unaligned
reference so we need to do the extra aligned flush when this data
changes. If we didn't do the alignment, we'd need to flush every
virtual address in the current CPU translation for that page.

If you mean PROT_SEM requires immediate coherence without an msync, then
those semantics would be very tricky to achieve on parisc since we'd
need the kernel virtual address of the page in the page cache correctly
aligned as well.

rmap isn't really necessary for this, that's what the
page->mapping->i_mmap and i_mmap_shared lists are for.

James


2003-09-04 23:40:30

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] fix remap of shared read only mappings

James Bottomley wrote:
> I think you may misunderstand what I mean by coherence: Our problem is
> the VIVT processor caches. Once one mapper does an msync, that data
> must be visible to all the other mappers, so at that point we have to
> flush the cache lines of all the other mappers. On PA, we only need to
> flush one correctly aligned address to get the VIVT cache to flush all
> the others. However, the kernel page cache usually holds an unaligned
> reference so we need to do the extra aligned flush when this data
> changes. If we didn't do the alignment, we'd need to flush every
> virtual address in the current CPU translation for that page.

Ok, I understand why you want matching alignments now. :)
(So that MS_INVALIDATE doesn't have to do anything).

> If you mean PROT_SEM requires immediate coherence without an msync, then
> those semantics would be very tricky to achieve on parisc since we'd
> need the kernel virtual address of the page in the page cache correctly
> aligned as well.

Linux hasn't ever done anything useful with PROT_SEM. As far as I
know, on some other systems PROT_SEM has meant that you mark pages
uncacheable or similar, but only on systems where that is needed to
implement IPC through the shared memory. For some definition IPC.

-- JAmie

2003-09-05 00:45:42

by Daniel Phillips

[permalink] [raw]
Subject: Re: [PATCH] fix remap of shared read only mappings

On Friday 05 September 2003 00:33, James Bottomley wrote:
> On Thu, 2003-09-04 at 17:48, Jamie Lokier wrote:
> However, POSIX does imply levels of cache coherence for both MAP_SHARED
> and MAP_PRIVATE:
>
> With MAP_SHARED, any change to the underlying object after the mapping
> must become visible to the mapper (although the change may be delayed by
> local caching of the changer's implementation until it is explicitly
> flushed).

This an interesting tidbit, as I'm busy working on a DFS mmap for OpenGFS, and
I want to be sure I'm implementing true-blue Posix semantics. But trawling
through the Posix/SUS specification at:

http://www.unix-systems.org/version3/online.html

all it says is that for MAP_SHARED "write references shall change the
underlying object." I don't see anything about when those changes become
visible to other mappers, much less any discussion of local caching. Am I
looking at the wrong document?

Regards,

Daniel


2003-09-05 00:52:50

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] fix remap of shared read only mappings

On Thu, 2003-09-04 at 20:49, Daniel Phillips wrote:
> This an interesting tidbit, as I'm busy working on a DFS mmap for OpenGFS, and
> I want to be sure I'm implementing true-blue Posix semantics. But trawling
> through the Posix/SUS specification at:
>
> http://www.unix-systems.org/version3/online.html
>
> all it says is that for MAP_SHARED "write references shall change the
> underlying object." I don't see anything about when those changes become
> visible to other mappers, much less any discussion of local caching. Am I
> looking at the wrong document?

Not sure which is "correct", but the one I'm looking at is the POSIX
update from the open group:

http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html

And that's where I was quoting from.

James


2003-09-05 00:50:07

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] fix remap of shared read only mappings


On Fri, 5 Sep 2003, Daniel Phillips wrote:
>
> This an interesting tidbit, as I'm busy working on a DFS mmap for OpenGFS, and
> I want to be sure I'm implementing true-blue Posix semantics.

Please don't.

POSIX semantics are weak enough not to be interesting. Exactly because a
number of old hardware platforms simply _cannot_ give you good coherency.
And a number of old UNIXes couldn't either, for that matter.

What really matters is that mmap() under Linux is 100% coherent, as far as
the hardware just allows. We haven't taken the easy way out. We shouldn't
start now.

Linus

2003-09-05 01:18:16

by Daniel Phillips

[permalink] [raw]
Subject: Re: [PATCH] fix remap of shared read only mappings

On Friday 05 September 2003 02:52, James Bottomley wrote:
> On Thu, 2003-09-04 at 20:49, Daniel Phillips wrote:
> > This an interesting tidbit, as I'm busy working on a DFS mmap for
> > OpenGFS, and I want to be sure I'm implementing true-blue Posix
> > semantics. But trawling through the Posix/SUS specification at:
> >
> > http://www.unix-systems.org/version3/online.html
> >
> > all it says is that for MAP_SHARED "write references shall change the
> > underlying object." I don't see anything about when those changes become
> > visible to other mappers, much less any discussion of local caching. Am
> > I looking at the wrong document?
>
> Not sure which is "correct", but the one I'm looking at is the POSIX
> update from the open group:
>
> http://www.opengroup.org/onlinepubs/007904975/functions/mmap.html
>
> And that's where I was quoting from.

It appears to be the same text. Either I'm blind, or the part about the
caching was your editorial commentaryk. Anyway, I'm going with Linus' ruling
on semantics, I agree totally.

Regards,

Daniel

2003-09-05 01:08:18

by Alan

[permalink] [raw]
Subject: Re: [PATCH] fix remap of shared read only mappings

On Gwe, 2003-09-05 at 01:49, Linus Torvalds wrote:
> What really matters is that mmap() under Linux is 100% coherent, as far as
> the hardware just allows. We haven't taken the easy way out. We shouldn't
> start now.

NFS ?

The problem with OpenGFS is that it is a network file system so
implementing "perfect" shared mmap semantics might actually reduce it
from handy to useless. Right now the worst we have to do is mark pages
uncached in some weird shared map cases, with pages being bounced across
firewire its a bit different.

2003-09-05 01:28:01

by Daniel Phillips

[permalink] [raw]
Subject: Re: [PATCH] fix remap of shared read only mappings

On Friday 05 September 2003 03:07, Alan Cox wrote:
> On Gwe, 2003-09-05 at 01:49, Linus Torvalds wrote:
> > What really matters is that mmap() under Linux is 100% coherent, as far
> > as the hardware just allows. We haven't taken the easy way out. We
> > shouldn't start now.
>
> NFS ?

NFS doesn't attempt to implement local Posix semantics, but a DFS should.
Anyway, Linus has ruled we're being held to the higher standard of "Linux
semantics".

> The problem with OpenGFS is that it is a network file system so
> implementing "perfect" shared mmap semantics might actually reduce it
> from handy to useless. Right now the worst we have to do is mark pages
> uncached in some weird shared map cases, with pages being bounced across
> firewire its a bit different.

Sistina has been doing it the "perfect" way for some time now, and it's worked
out OK. This relies on writes being rare.

Things will definitely slow to a crawl if several nodes keep writing little
bits into the same mmap, but that's a case of "doctor it hurts" I think. If
there's a common application that does this, I'd appreciate knowing about it.

Regards,

Daniel

2003-09-05 04:35:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] fix remap of shared read only mappings


On Fri, 5 Sep 2003, Alan Cox wrote:
>
> NFS ?

On a local machine, yes.

Clearly NFS will _not_ be cache-coherent over a network. But even NFS is
supposed to be cache-coherent on a per-client basis.

Networking is _not_ an excuse for not doing that right.

Linus