2004-11-12 19:06:21

by Miika Pekkarinen

Subject: Using mmap result in data corruption

Hi

(I am sending this message again to this list because my previous post
about a week ago didn't appear on this list. Now I have subscribed and I
hope for better luck. :)

I have found a reproducible issue with NFS that will lead to file
corruption on the server. When a file is created with O_TRUNC and data is
copied using mmap, the new file on the server will be filled with null bytes.

NFS was exported using the following flags:
/home *.ihme.org(rw,async,wdelay,root_squash,no_subtree_check)

This problem was reproduced with kernel versions 2.4.27 (server), 2.6.9
(server/client), 2.6.10-rc1 (client).

The included sample C-code should reproduce the problem:
client$ gcc -o nfsdebug nfsdebug.c
client$ ./nfsdebug
client$ ls -la nfsdebug.txt
-rw-r--r-- 1 miipekk users 1393 Nov 5 23:25 nfsdebug.txt
client$ cmp nfsdebug.txt nfsdebug.c
client$

server$ ls -la nfsdebug.txt
-rw-r--r-- 1 miipekk users 1393 Nov 5 23:25 nfsdebug.txt
server$ cmp nfsdebug.txt /dev/zero
cmp: EOF on nfsdebug.txt


--
Miika Pekkarinen <miipekk at ihme.org>


--- nfsdebug.c
/* This file should be named "nfsdebug.c".
 * After being executed, a file named nfsdebug.txt will be created.
 *
 * Author: 2004 Miika Pekkarinen <[email protected]>
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* memcpy */
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>

int main(void)
{
    int fdin, fdout;
    struct stat statbuf;
    char *s, *d;

    fdin = open("nfsdebug.c", O_RDONLY);
    if (fdin < 0) {
        perror("open fail");
        return 1;
    }

    if (fstat(fdin, &statbuf) < 0) {
        perror("fstat fail");
        return 2;
    }

    fdout = open("nfsdebug.txt", O_RDWR | O_CREAT | O_TRUNC,
                 statbuf.st_mode);

    if (fdout < 0) {
        perror("open output fail");
        return 3;
    }

    // If the following line is enabled, the problem won't appear!
    // write(fdout, "", 1);

    /* Extend the output file to its final size by writing one byte
     * at the last offset. */
    if (lseek(fdout, statbuf.st_size - 1, SEEK_SET) == -1) {
        perror("Seek failed");
        return 4;
    }

    if (write(fdout, "", 1) != 1) {
        perror("Write failed");
        return 5;
    }

    s = mmap(0, statbuf.st_size, PROT_READ, MAP_SHARED, fdin, 0);
    if (s == MAP_FAILED) {
        perror("mmap source fail");
        return 6;
    }

    d = mmap(0, statbuf.st_size, PROT_READ | PROT_WRITE,
             MAP_SHARED, fdout, 0);
    if (d == MAP_FAILED) {
        perror("mmap destination fail");
        return 7;
    }

    memcpy(d, s, statbuf.st_size);

    /* Unmap and close without calling msync(). */
    munmap(s, statbuf.st_size);
    munmap(d, statbuf.st_size);

    close(fdin);
    close(fdout);

    return 0;
}





-------------------------------------------------------
This SF.Net email is sponsored by:
Sybase ASE Linux Express Edition - download now for FREE
LinuxWorld Reader's Choice Award Winner for best database on Linux.
http://ads.osdn.com/?ad_id=5588&alloc_id=12065&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-11-12 19:35:11

by Trond Myklebust

Subject: Re: Using mmap result in data corruption

On Fri, 12 Nov 2004 at 21:06 (+0200), Miika Pekkarinen wrote:

> I have found a reproducible issue with NFS that will lead to file
> corruption on the server. When a file is created with O_TRUNC and data is
> copied using mmap, the new file on the server will be filled with null bytes.

mmap() offers absolutely NO guarantees that the file will be synced to
disk on close. Use msync(MS_SYNC) if you want such a guarantee.

Cheers,
Trond

--
Trond Myklebust <[email protected]>
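
A minimal sketch of the msync(MS_SYNC) approach suggested above, assuming a
POSIX system. The helper name copy_with_msync and the use of ftruncate() to
size the destination (instead of the lseek()/write() pair in nfsdebug.c) are
illustrative choices, not something taken from the thread:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy src to dst through shared mappings and flush
 * the destination with msync(MS_SYNC) before unmapping it.
 * Returns 0 on success, -1 on failure. */
int copy_with_msync(const char *src, const char *dst)
{
    struct stat st;
    size_t len = 0;
    void *s = MAP_FAILED, *d = MAP_FAILED;
    int fdin = -1, fdout = -1, rc = -1;

    fdin = open(src, O_RDONLY);
    if (fdin < 0 || fstat(fdin, &st) < 0)
        goto out;
    len = (size_t)st.st_size;

    fdout = open(dst, O_RDWR | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (fdout < 0)
        goto out;

    if (len == 0) {             /* nothing to map; the empty copy is done */
        rc = 0;
        goto out;
    }

    /* Give the new file its final size so the whole mapping is backed. */
    if (ftruncate(fdout, st.st_size) < 0)
        goto out;

    s = mmap(NULL, len, PROT_READ, MAP_SHARED, fdin, 0);
    d = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fdout, 0);
    if (s == MAP_FAILED || d == MAP_FAILED)
        goto out;

    memcpy(d, s, len);

    /* Do not return until the modified pages have been written back. */
    if (msync(d, len, MS_SYNC) == 0)
        rc = 0;

out:
    if (s != MAP_FAILED)
        munmap(s, len);
    if (d != MAP_FAILED)
        munmap(d, len);
    if (fdin >= 0)
        close(fdin);
    if (fdout >= 0)
        close(fdout);
    return rc;
}
```

Calling copy_with_msync("nfsdebug.c", "nfsdebug.txt") would stand in for the
body of main() in the test program. Whether the synchronous flush is
affordable depends on the file size and on how often mappings are created.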




2004-11-16 10:07:32

by Miika Pekkarinen

Subject: Re: Using mmap result in data corruption

On Fri, 12 Nov 2004, Trond Myklebust wrote:

> > I have found a reproducible issue with NFS that will lead to file
> > corruption on the server. When a file is created with O_TRUNC and data is
> > copied using mmap, the new file on the server will be filled with null bytes.
>
> mmap() offers absolutely NO guarantees that the file will be synced to
> disk on close. Use msync(MS_SYNC) if you want such a guarantee.

Hmm, with msync (right before munmap) the file gets synced to disk
correctly. However, I am still wondering what is wrong with munmap. I
checked the manual page of msync and it said: "msync flushes changes made
to the in-core copy of a file that was mapped into memory using mmap(2)
back to disk. Without use of this call there is no guarantee that changes
are written back before munmap(2) is called." So I understood that munmap
should do the same thing as msync, but it doesn't sync the file. Probably
I am missing some point, or there is a bug somewhere else (either in the
manual pages or in some library)?

--
Miika Pekkarinen <miika at ihme.org>






2004-11-16 15:21:58

by Ara.T.Howard

Subject: Re: Using mmap result in data corruption

On Tue, 16 Nov 2004, Miika Pekkarinen wrote:

> On Fri, 12 Nov 2004, Trond Myklebust wrote:
>
>>> I have found a reproducible issue with NFS that will lead to file
>>> corruption on the server. When a file is created with O_TRUNC and data is
>>> copied using mmap, the new file on the server will be filled with null bytes.
>>
>> mmap() offers absolutely NO guarantees that the file will be synced to
>> disk on close. Use msync(MS_SYNC) if you want such a guarantee.
>
> Hmm, with msync (right before munmap) the file gets synced to disk
> correctly. However, I am still wondering what is wrong with munmap. I
> checked the manual page of msync and it said: "msync flushes changes made
> to the in-core copy of a file that was mapped into memory using mmap(2)
> back to disk. Without use of this call there is no guarantee that changes
> are written back before munmap(2) is called." So I understood that munmap
> should do the same thing as msync, but it doesn't sync the file. Probably
> I am missing some point, or there is a bug somewhere else (either in the
> manual pages or in some library)?

i think the man page is badly written. it has confused me too. i think it
should probably say

"msync flushes changes made to the in-core copy of a file that was
mapped into memory using mmap(2) back to disk. Without use of this call
there is no guarantee that changes have been written WHEN munmap(2) is
called and the map is released."

i think it is saying that the spec does not state that munmap must msync the
file, but that some implementation might.

regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
===============================================================================



2004-11-16 16:20:37

by Trond Myklebust

Subject: Re: Using mmap result in data corruption

On Tue, 16 Nov 2004 at 08:21 (-0700), Ara.T.Howard wrote:

> i think the man page is badly written. it has confused me too. i think it
> should probably say
>
> "msync flushes changes made to the in-core copy of a file that was
> mapped into memory using mmap(2) back to disk. Without use of this call
> there is no guarantee that changes have been written WHEN munmap(2) is
> called and the map is released."

Yep. That would indeed be more consistent with the actual behaviour of
munmap(). There have been no guarantees that munmap() would cause any
data to be flushed to disk either in Linux 2.4.x or with the 2.6.x
kernels.

This is BTW true of *all* filesystems, including local block based
filesystems. It is not behaviour that is specific to NFS only.

Cheers,
Trond

--
Trond Myklebust <[email protected]>




2005-03-08 14:42:12

by Peter W. Draper

Subject: Re: Using mmap result in data corruption

> > On Fri, 12 Nov 2004 at 21:06 (+0200), Miika Pekkarinen wrote:
> >
> > I have found a reproducible issue with NFS that will lead to file
> > corruption on the server. When a file is created with O_TRUNC and data is
> > copied using mmap, the new file on the server will be filled with null
> > bytes.
>
> mmap() offers absolutely NO guarantees that the file will be synced to
> disk on close. Use msync(MS_SYNC) if you want such a guarantee.
>
> Cheers,
> Trond

Hi,

like Miika I have noticed that mapped files updated over NFS are now
corrupted on the server. For me this problem looks like it arrived with
2.6.9 and is still causing problems with 2.6.11 (I'm running various
Fedora Core 2/3 clients together with various Linux and non-Linux NFS
servers, all using NFS v3).

Following Trond's advice I've changed to using an msync() call before
munmap() and that fixes the problem. However, the files I access can be
mmap'd many times and can be quite large, so I'd really like to avoid the
potential performance impact of msync(MS_SYNC) by calling msync(MS_ASYNC)
instead. When I switch to msync(MS_ASYNC), though, the data corruption
returns (I am only interested in the correctness of the data when my
applications exit).

Surely that is incorrect behaviour? Otherwise what is the point of
msync(MS_ASYNC)?

Thanks for any help/advice on this,

Peter.

--
Dr. Peter W. Draper, Starlink Programmer, http://star-www.dur.ac.uk/~pdraper




2005-03-08 15:01:31

by Trond Myklebust

Subject: Re: Re: Using mmap result in data corruption

On Tue, 8 Mar 2005 at 14:41 (+0000), Peter W. Draper wrote:

> Following Trond's advice I've changed to using an msync() call before
> munmap() and that fixes the problem. However, the files I access can be
> mmap'd many times and can be quite large, so I'd really like to avoid the
> potential performance impact of msync(MS_SYNC) by calling msync(MS_ASYNC)
> instead. When I switch to msync(MS_ASYNC), though, the data corruption
> returns (I am only interested in the correctness of the data when my
> applications exit).
>
> Surely that is incorrect behaviour? Otherwise what is the point of
> msync(MS_ASYNC)?

The Single UNIX Spec v3 says that

MS_ASYNC Perform asynchronous writes.
MS_SYNC Perform synchronous writes.

When MS_ASYNC is specified, msync() shall return immediately
once all the write operations are initiated or queued for
servicing; when MS_SYNC is specified, msync() shall not return
until all write operations are completed as defined for
synchronized I/O data integrity completion. Either MS_ASYNC or
MS_SYNC is specified, but not both.

So on Linux, msync(MS_ASYNC) just marks the pages as dirty (see the
comment in mm/msync.c). It is then up to the application to call fsync()
if it wants to flush those asynchronous writes to disk at some specific
time.

Cheers,
Trond
--
Trond Myklebust <[email protected]>
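
The combination described above, msync(MS_ASYNC) followed by fsync(), might
look like the following sketch. The function name store_via_mmap and its
parameters are hypothetical, and sizing the file with ftruncate() is an
assumption made to keep the example short:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write n bytes of data to path through a shared
 * mapping, queue the write-back with msync(MS_ASYNC), then wait for it
 * with fsync(). Returns 0 on success, -1 on failure. */
int store_via_mmap(const char *path, const void *data, size_t n)
{
    int rc = -1;
    void *p;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    /* Set the file size to n so the mapping is fully backed. */
    if (n == 0 || ftruncate(fd, (off_t)n) < 0)
        goto out;

    p = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    memcpy(p, data, n);

    /* Returns without waiting; per the thread, on Linux this only marks
     * the pages dirty. */
    rc = msync(p, n, MS_ASYNC);

    if (munmap(p, n) < 0)
        rc = -1;

    /* fsync() is the call that waits for the data to be written out. */
    if (fsync(fd) < 0)
        rc = -1;

out:
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

The fsync() call can be deferred to whatever point the application needs the
data to be stable, for example just before exit.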




2005-03-08 16:18:11

by Peter W. Draper

Subject: Re: Re: Using mmap result in data corruption

On Tue, 8 Mar 2005, Trond Myklebust wrote:

> On Tue, 8 Mar 2005 at 14:41 (+0000), Peter W. Draper wrote:
>
> > Following Trond's advice I've changed to using an msync() call before
> > munmap() and that fixes the problem. However, the files I access can be
> > mmap'd many times and can be quite large, so I'd really like to avoid the
> > potential performance impact of msync(MS_SYNC) by calling msync(MS_ASYNC)
> > instead. When I switch to msync(MS_ASYNC), though, the data corruption
> > returns (I am only interested in the correctness of the data when my
> > applications exit).
> >
> > Surely that is incorrect behaviour? Otherwise what is the point of
> > msync(MS_ASYNC)?
>
> The Single UNIX Spec v3 says that
>
> MS_ASYNC Perform asynchronous writes.
> MS_SYNC Perform synchronous writes.
>
> When MS_ASYNC is specified, msync() shall return immediately
> once all the write operations are initiated or queued for
> servicing; when MS_SYNC is specified, msync() shall not return
> until all write operations are completed as defined for
> synchronized I/O data integrity completion. Either MS_ASYNC or
> MS_SYNC is specified, but not both.
>
> So on Linux, msync(MS_ASYNC) just marks the pages as dirty (see the
> comment in mm/msync.c). It is then up to the application to call fsync()
> if it wants to flush those asynchronous writes to disk at some specific
> time.

Hi Trond,

thanks for the explanation, calling fsync() solves my problem. One fact
you probably don't want to know is that just calling fsync() works
regardless of whether I have called msync(MS_ASYNC) or not.

Cheers,

Peter.
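
For the case described above (a file mapped many times, where only the final
contents matter), one possible pattern is to skip msync() for each mapping
and call fsync() once before closing. This is a sketch only: the name
fill_in_chunks and the four-page chunk size are made up for illustration,
and it relies on the behaviour reported in this thread; it has not been
checked against what POSIX requires.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a file of `total` bytes, fill it with
 * `value` one mapping at a time, and flush once with fsync() at the end.
 * Returns 0 on success, -1 on failure. */
int fill_in_chunks(const char *path, size_t total, unsigned char value)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t chunk, off;
    int fd, rc = -1;

    if (page <= 0)
        return -1;
    chunk = (size_t)page * 4;   /* mmap() offsets must be page-aligned */

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    if (ftruncate(fd, (off_t)total) < 0)
        goto out;

    for (off = 0; off < total; off += chunk) {
        size_t len = (total - off < chunk) ? total - off : chunk;
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd, (off_t)off);
        if (p == MAP_FAILED)
            goto out;
        memset(p, value, len);
        if (munmap(p, len) < 0)     /* no msync() per mapping */
            goto out;
    }

    /* One flush for everything modified through the mappings above. */
    if (fsync(fd) == 0)
        rc = 0;

out:
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

The single fsync() replaces a synchronous flush per mapping, which is the
cost the msync(MS_SYNC) variant would pay each time.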



