2002-07-23 16:00:03

by Gregory Giguashvili

Subject: Problem with msync system call

Hello,

RH 7.2 (kernel 2.4.7-10) and RH 7.3 (kernel 2.4.18-3) (I haven't checked the
others).

I am trying to read and write a memory-mapped file from two Linux machines.
The file resides on an NFS-mounted drive. It gets corrupted because changes
made on one machine aren't immediately visible on the other. A sample program
is attached to this e-mail. The API calls involved are the mmap, munmap
and msync system calls. It seems that MS_INVALIDATE has no effect...
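
For readers without the attachment, the pattern described above looks roughly
like the following. This is only a minimal sketch and not the attached mmap.cc:
the helper name write_via_mapping and the decision to resize the file to the
payload length are assumptions made for illustration.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not taken from mmap.cc): copies `len` bytes of `text`
// to the start of the file through a shared mapping, then calls msync().
// Returns 0 on success, -1 on any failure.
int write_via_mapping(const char* path, const char* text, std::size_t len) {
    if (len == 0) return -1;  // mmap() rejects zero-length mappings
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;
    // Assumption for the sketch: the file is resized to exactly `len` bytes.
    if (ftruncate(fd, static_cast<off_t>(len)) != 0) { close(fd); return -1; }
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }
    std::memcpy(p, text, len);
    // MS_SYNC waits for the write-back to finish; MS_INVALIDATE is the flag
    // that the post reports as having no visible effect on the other client.
    int rc = msync(p, len, MS_SYNC | MS_INVALIDATE);
    munmap(p, len);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

On a local file system this round-trips as expected; the trouble described
here only shows up when a second NFS client reads the same file.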

The original code uses NFS locking to ensure file consistency, but the
example omits this part for simplicity (locking is simulated by the user).
The same code works on a variety of other operating systems, but fails to
work between two Linux or Linux/Other OS machines.

I decided to give up on the performance issue and even tried to remap the
whole file before every attempt to read/write the mapped file. Surprisingly,
even this extreme measure didn't help (the code is commented out using
preprocessor directives in the sample program).
I couldn't find any patch that specifically fixes this problem, though I
have seen some msync-related patches, which I don't think are relevant
(am I wrong?).

I'm sure that someone has come across this problem and I sure hope there is
some workaround/patch available.
Any help will be greatly appreciated.

Thanks in advance.
Giga

<<mmap.cc>>


Attachments:
mmap.cc (1.34 kB)

2002-07-23 16:28:12

by Dr. Michael Weller

Subject: Re: Problem with msync system call

On Tue, 23 Jul 2002, Gregory Giguashvili wrote:

> Hello,
>
> RH 7.2 (kernel 2.4.7-10) and RH 7.3 (kernel 2.4.18-3) (I haven't checked the
> others).
>
> I am trying to read and write a memory-mapped file from two Linux machines.
> The file resides on an NFS-mounted drive. It gets corrupted because changes
> made on one machine aren't immediately visible on the other. A sample program
> is attached to this e-mail. The API calls involved are the mmap, munmap
> and msync system calls. It seems that MS_INVALIDATE has no effect...
[... rest deleted ...]

I'm no NFS/mmap author or expert, only an experienced admin/user regarding
this issue, still:

I must say I have a very uneasy feeling about such a usage and don't know
how it is covered by standards (although you claim it works for non
linux). Experience shows that such a construct is very fragile. Note also
that NFS file locking is not mandatory, only advisory (read: user level)
and it is unclear how that will interact with mmap.

That said, I'd expect at least the munmap using variant to work as
expected. I don't know if msync in this context is more than a placebo
function. Again experience shows that linux is very happy in caching
NFS files and contents locally, though. For inter-machine file exchange
(write on one machine, read immediately on the other) synchronized clocks
between server and clients within millisecond range are mandatory.

Forget about clocks set manually, use xntpd or timed or similar. Without
such synchronized clocks (which synchronize caches), forget about NFS.
Note that this holds in principle even w/o the caching issue (like having
synchronized user databases). It just seems that, because of this caching,
Linux NFS is VERY sensitive to out-of-sync clocks compared to
other NFS implementations.
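
A rough way to check for the clock problem described above is to create a file
on the NFS mount and compare its modification time with the local clock. The
sketch below assumes that the mtime of a freshly written file is stamped by the
server; the function name estimate_clock_skew and the probe file name are made
up for illustration. st_mtime has one-second resolution here, so this can only
reveal gross skew, not millisecond-level drift.

```cpp
#include <sys/stat.h>
#include <unistd.h>
#include <cstdlib>
#include <ctime>
#include <string>

// Hypothetical diagnostic: on success returns true and stores
// (file mtime - local time) in seconds in *skew_seconds.
bool estimate_clock_skew(const std::string& dir, double* skew_seconds) {
    std::string tmpl = dir + "/skew_probe_XXXXXX";  // probe name is arbitrary
    int fd = mkstemp(&tmpl[0]);                     // replaces the XXXXXX part
    if (fd < 0) return false;
    bool ok = false;
    struct stat st {};
    // Write one byte and force it out so the file gets a fresh mtime.
    if (write(fd, "x", 1) == 1 && fsync(fd) == 0 && fstat(fd, &st) == 0) {
        *skew_seconds = std::difftime(st.st_mtime, std::time(nullptr));
        ok = true;
    }
    close(fd);
    unlink(tmpl.c_str());  // remove the probe file again
    return ok;
}
```

A result of several seconds or more on an NFS directory would point at
unsynchronized clocks; a result near zero does not prove they are in sync to
the precision mentioned above.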

So, my guess would be your clocks are out of sync, hence the copies of the
network shared file are. (you know, like: server clock is some
hours/minutes behind the clients so each client thinks IT has the most
actual copy of the file)

Michael.

--

Michael Weller: [email protected], [email protected],
or even [email protected]. If you encounter an eowmob account on
any machine in the net, it's very likely it's me.

2002-07-23 17:00:17

by Gregory Giguashvili

Subject: RE: Problem with msync system call

Thanks a lot for your comments.

>I must say I have a very uneasy feeling about such a usage and
>don't know how it is covered by standards (although you claim it works for
>non linux). Experience shows that such a construct is very
>fragile. Note also that NFS file locking is not mandatory, only advisory
>(read: user level) and it is unclear how that will interact with mmap.

I agree that the construction is very fragile, but...

- It's been working on a variety of OSes for years... There is no reason for
Linux not to support it as a mature operating system.
- This works for read/write system calls if the file is opened with the O_SYNC
flag and NFS is mounted using the "sync" option.
- There has to be something in the OS that users can do to unconditionally
reread mapped files (no matter if this is NFS or not)
- Even mandatory locking should be sufficient for mmap interaction if one
cares to flush information to the disk before the file is unlocked.
Surprisingly, locking is not the problem here :)

Best,
Giga

2002-07-23 17:06:22

by Andi Kleen

Subject: Re: Problem with msync system call

Gregory Giguashvili <[email protected]> writes:

> I'm sure that someone has come across this problem and I sure hope there is
> some workaround/patch available.

Do a F_SETFL lock/unlock on the file That should act as a full NFS write
barrier and flush all buffers. Best is if you synchronize between the various
writers with the full lock.

-Andi

2002-07-23 17:44:28

by Gregory Giguashvili

Subject: RE: Problem with msync system call

>Do a F_SETFL lock/unlock on the file That should act as a
>full NFS write barrier and flush all buffers. Best is if you synchronize
>between the various writers with the full lock.

Do you mean F_SETLK? If so, this didn't help (the source is attached).
If you meant something else, could you be more specific, please?

Thanks in advance.
Giga


Attachments:
mmap.cc (1.91 kB)

2002-07-23 17:58:06

by Trond Myklebust

Subject: Re: Problem with msync system call

>>>>> " " == Andi Kleen <[email protected]> writes:

> Do a F_SETFL lock/unlock on the file That should act as a full
> NFS write barrier and flush all buffers. Best is if you
> synchronize between the various writers with the full lock.

Note: This will not work for files that are in the process of being
mmap()ed. In order to make it all work, you have to munmap() first,
then lock, then mmap().

This is due to limitations in the VM which won't allow anyone to
invalidate a mapping that is in use.
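
The ordering described above can be sketched like this: no mapping exists when
the lock is taken, the file is mapped only while the lock is held, and it is
unmapped again before the lock is dropped. The helper name
with_locked_mapping and its callback interface are assumptions for
illustration, and the blocking F_SETLKW variant is used here instead of
F_SETLK so that the call waits for the lock.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical helper: runs body(mem, len) on a mapping that exists only
// while a whole-file fcntl() lock is held. For writer == true the descriptor
// must be open O_RDWR. Returns 0 on success, -1 on failure.
template <typename Body>
int with_locked_mapping(int fd, std::size_t len, bool writer, Body body) {
    struct flock fl {};
    fl.l_type = writer ? F_WRLCK : F_RDLCK;
    fl.l_whence = SEEK_SET;  // l_start = 0 and l_len = 0 cover the whole file
    if (fcntl(fd, F_SETLKW, &fl) != 0) return -1;  // 1. lock, nothing mapped yet

    int rc = -1;
    int prot = writer ? (PROT_READ | PROT_WRITE) : PROT_READ;
    void* p = mmap(nullptr, len, prot, MAP_SHARED, fd, 0);  // 2. map under the lock
    if (p != MAP_FAILED) {
        body(static_cast<char*>(p), len);
        rc = 0;
        if (writer && msync(p, len, MS_SYNC) != 0) rc = -1;  // 3. flush dirty pages
        if (munmap(p, len) != 0) rc = -1;                    // 4. drop the mapping
    }
    fl.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &fl) != 0) rc = -1;               // 5. unlock last
    return rc;
}
```

Every access pays for a fresh mmap()/munmap() pair, which is the performance
cost accepted in exchange for never holding a mapping across a lock boundary.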

Cheers,
Trond

2002-07-23 18:03:22

by Gregory Giguashvili

Subject: RE: Problem with msync system call

>So, my guess would be your clocks are out of sync, hence the
>copies of the network shared file are. (you know, like: server clock is
>some hours/minutes behind the clients so each client thinks IT has the most
>actual copy of the file)
The clocks were out of sync, indeed. I tried to sync them and still the same
problem persists...

Thanks a lot for your help.
Giga

2002-07-23 18:03:07

by Andi Kleen

Subject: Re: Problem with msync system call

On Tue, Jul 23, 2002 at 08:45:07PM +0200, Gregory Giguashvili wrote:
> >Do a F_SETFL lock/unlock on the file That should act as a
> >full NFS write barrier and flush all buffers. Best is if you synchronize
> >between the various writers with the full lock.
>
> Do you mean F_SETLK? If so, this didn't help (the source is attached).

F_SETLK sorry.

You need to do it on both reader and writer. On the writer it acts
like a fsync(), on the reader it should clear the cache.
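
A small scope guard is one way to make sure both the reader and the writer
bracket their file access with the lock. The class name ScopedFileLock is made
up for this sketch, and it uses the blocking F_SETLKW variant to acquire the
lock; whether taking or releasing it really flushes and invalidates depends on
the NFS client, as discussed in this thread.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical RAII wrapper around a whole-file fcntl() record lock.
// `type` is F_RDLCK for readers or F_WRLCK for writers.
class ScopedFileLock {
public:
    ScopedFileLock(int fd, short type) : fd_(fd), held_(false) {
        struct flock fl {};
        fl.l_type = type;
        fl.l_whence = SEEK_SET;  // l_start = 0, l_len = 0: lock the whole file
        held_ = (fcntl(fd_, F_SETLKW, &fl) == 0);  // blocks until granted
    }
    ~ScopedFileLock() {
        if (!held_) return;
        struct flock fl {};
        fl.l_type = F_UNLCK;
        fl.l_whence = SEEK_SET;
        fcntl(fd_, F_SETLK, &fl);  // release; errors are ignored in this sketch
    }
    ScopedFileLock(const ScopedFileLock&) = delete;
    ScopedFileLock& operator=(const ScopedFileLock&) = delete;
    bool held() const { return held_; }  // false if the lock was not obtained

private:
    int fd_;
    bool held_;
};
```

A writer would create the guard with F_WRLCK around its write() calls and a
reader with F_RDLCK around its read() calls, checking held() before touching
the file.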

I think the problem in your case is that you have the pages mmaped.
NFS uses invalidate_inode_pages() to throw away the cache, but that
doesn't work when the pages are mapped. It may work to munmap/mmap
around the locking.

In theory with rmap (=2.5) the kernel could do that unmap/remap for you,
but it will probably be non-trivial to implement.

-Andi

2002-07-23 18:35:48

by Gregory Giguashvili

Subject: RE: Problem with msync system call

It seems to finally work both between two Linux machines and
between Linux and other OSes! Thank you all for your help!

I attach the working source for whoever cares...

>I think the problem in your case is that you have the pages mmaped.
>NFS uses invalidate_inode_pages() to throw away the cache, but that
>doesn't work when the pages are mapped. It may work to munmap/mmap
>around the locking.
Now, I think I understand what the problem is.

Can we make msync call with MS_INVALIDATE flag temporarily unmap the
file, invalidate the cache and remap the file again? It sounds like
a hell of an overhead, but users don't expect msync call to be a
light one.

Anyway, this would be better than the current behavior, which in fact does
nothing for the mapped files. Also, the documentation for msync call is
extremely vague, which only adds to the confusion.

Best,
Giga


Attachments:
mmap.cc (1.90 kB)