LinuxLists.cc - nfs/mmap/rename file corruption

2003-08-28 01:03:21

Subject: nfs/mmap/rename file corruption

There is a fairly easily reproducible bug in NFS in 2.4.22 that can
cause files to read back as full of nulls. I have a tcpdump that
shows what is going wrong.

Gavrie Philipson reported corruption happening when distcc and ccache
are used together with the cache on NFS.

http://lists.samba.org/pipermail/distcc/2003q3/001556.html

To reproduce the bug you just need to install ccache 2.2 and distcc
2.10.1. Set CCACHE_DIR to an empty directory on an NFS filesystem
mounted with default/rw options. Build a file with a command like
this:

    ccache distcc -c ./hello.c

The first (only the first) time that you run this, the output file
(hello.o) will be the correct size, but contain only \0 bytes.

What is basically happening here is:

- ccache runs distcc with output to a temporary file
- distcc opens, mmaps, writes to, munmaps, and closes the temporary
  file
- distcc exits
- ccache renames the temporary file to its proper location in the
  ccache
- ccache opens the file read only, and reads from it

ccache ought to see the proper contents as written by mmap, but when
the cache is on NFS it just sees \0s. It works correctly and reliably
on reiserfs and ext3. However, if you look a second later at the file
ccache was trying to read, it seems to have the right contents.

I tried writing a standalone test case but I couldn't reproduce it,
perhaps because of some timing issue. It is quite reproducible both
on my machine and Gavrie's.
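
For illustration, the sequence of system calls described above might be
sketched in C roughly as follows. This is only a sketch under stated
assumptions, not the test case mentioned above and not code from ccache
or distcc: the function names write_via_mmap and
rename_and_count_nonzero are hypothetical, and both halves run in one
process here, whereas in the real case the writer (distcc) and the
reader (ccache) are separate processes and the rename crosses
directories.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Writer side (the role distcc plays): create tmp_path, size it, fill it
 * through a shared mapping, then unmap and close.
 * Returns 0 on success, -1 on error. */
int write_via_mmap(const char *tmp_path, const void *data, size_t len)
{
    assert(len > 0);                 /* mmap() rejects zero-length maps */

    int fd = open(tmp_path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(map, data, len);          /* dirties the mapped pages */
    if (munmap(map, len) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Reader side (the role ccache plays): rename tmp_path to final_path,
 * reopen it read-only and count the non-NUL bytes in the first len bytes.
 * Returns that count, or -1 on error. */
long rename_and_count_nonzero(const char *tmp_path, const char *final_path,
                              size_t len)
{
    if (rename(tmp_path, final_path) != 0)
        return -1;
    int fd = open(final_path, O_RDONLY);
    if (fd < 0)
        return -1;

    char buf[4096];
    long nonzero = 0;
    size_t done = 0;
    while (done < len) {
        size_t want = len - done < sizeof buf ? len - done : sizeof buf;
        ssize_t n = read(fd, buf, want);
        if (n <= 0)
            break;                   /* EOF or error: stop counting */
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != '\0')
                nonzero++;
        done += (size_t)n;
    }
    close(fd);
    return nonzero;
}
```

Calling write_via_mmap() and then rename_and_count_nonzero() on the same
temporary path exercises the open/mmap/munmap/close/rename/read order.
A count of 0 for input that contains no NUL bytes would correspond to
the symptom described here; on a local filesystem the count should
equal the number of non-NUL bytes written.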

If distcc is configured to not use mmap for writing, the problem is
hidden.

A tcpdump of the problem is available here:

http://distcc.samba.org/ftp/distcc/misc/mmap-bug/nfs-20030827T1351.pcap.gz

Here are the significant bits:

frame 79

    renames tmp.hash.vexed.7897.o to the final object filename,
    cbfc5ca42b1a693a5bca9bb8b23c5b-17387

frame 105, also frame 107

    look up a filehandle for the final object filename, and get the
    hash 0xed8222404

frame 115

    reads back from the final object file, 0xed8222404

frame 116

    is the reply to the read, and it is full of nulls

frame 127

    writes the ELF output into the temporary object file,
    tmp.hash.vexed.7897.o, which has file hash 0xf27c2204.

The problem is that the NFS client tries to read from the destination
file before it has written to the temporary file! Frame 127 is far
too late.

It seems to me like there are two possible solutions: either flush out
all cached data for a file before it's renamed, or make the rename
smart enough to 'take over' any data cached under an old name. To me
the first seems more robust if a little slower.
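
A user-space analogue of the first option would be for the writer to
force its dirty mapped pages out before the file is closed and renamed.
The sketch below assumes that msync() with MS_SYNC followed by fsync()
is enough to push the data to the server; nothing in this thread
confirms that it avoids the problem, and the function name
write_via_mmap_synced is hypothetical. It is not the kernel-side change
proposed above.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes of data to path through a shared mapping, and ask the
 * kernel to write the dirty pages back before unmapping and closing.
 * Returns 0 on success, -1 on error. */
int write_via_mmap_synced(const char *path, const void *data, size_t len)
{
    assert(data != NULL);
    if (len == 0)
        return -1;                   /* mmap() rejects zero-length maps */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(map, data, len);

    int rc = 0;
    /* Assumption: a synchronous msync() flushes the mapped pages. */
    if (msync(map, len, MS_SYNC) != 0)
        rc = -1;
    if (munmap(map, len) != 0)
        rc = -1;
    /* Belt and braces: also flush anything still cached for the file. */
    if (fsync(fd) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The cost is an extra synchronous round trip per output file, which fits
the "more robust if a little slower" trade-off described above.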

You can see something similar going on in this NFS log:

http://distcc.samba.org/ftp/distcc/misc/mmap-bug/nfsdebug-20030827T1609.log.gz

The flush(b/49777) call comes long after the rename and the attempt to
read from the new file.

I'll try to draft a patch for this.

--
Martin

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2003-08-28 01:38:01

by Trond Myklebust

Subject: Re: nfs/mmap/rename file corruption

>>>>> " " == Martin Pool <[email protected]> writes:

> - ccache runs distcc with output to a temporary file
> - distcc opens, mmaps, writes to, munmaps, and closes the temporary
> file
> - distcc exits
> - ccache renames the temporary file to its proper location in the
> ccache
> - ccache opens the file read only, and reads from it

Is this a rename from one directory to the other? If so, are you using
the 'no_subtree_check' option on the server? Without the latter option
enabled, I would indeed expect the behaviour that you describe.

Cheers,
Trond


2003-08-28 02:14:23

by Martin Pool

Subject: Re: nfs/mmap/rename file corruption

On 27 Aug 2003 21:37:38 -0400
Trond Myklebust <[email protected]> wrote:

Thanks for responding so quickly!

> >>>>> " " == Martin Pool <[email protected]> writes:
>
> > - ccache runs distcc with output to a temporary file
> > - distcc opens, mmaps, writes to, munmaps, and closes the
> >   temporary file
> > - distcc exits
> > - ccache renames the temporary file to its proper location in
> >   the ccache
> > - ccache opens the file read only, and reads from it
>
> Is this a rename from one directory to the other?

Yes.

> If so, are you using
> the 'no_subtree_check' option on the server?

No, I was not. It happens that the filesystem is exported at its
root.

The manpage from Debian's nfs-kernel-server 1:1.0.3-2 says

    In order to perform this check, the server must include some
    information about the location of the file in the "filehandle"
    that is given to the client. This can cause problems with
    accessing files that are renamed while a client has them open
    (though in many simple cases it will still work).

In this case, the file is not still open at the time it is renamed.
It just still has some dirty pages in the client's memory.

When I set this option on the server then the renamed file gets the
same filehandle and things work properly. I'll suggest to the user
that they should set it.
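
For reference, an export entry with the option set might look roughly
like the line below. The exported path and the client name are
hypothetical placeholders, not the ones used on the machines discussed
here; only the no_subtree_check option itself comes from this thread.

```
# /etc/exports on the NFS server (path and client name are examples only)
/export/ccache    buildhost.example.com(rw,no_subtree_check)
```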

> Without the latter option enabled, I would indeed expect the
> behaviour that you describe.

It seems a bit unfortunate that we can get corruption unless a special
option is set. Will it work on non-Linux nfs servers?

Wouldn't it still be possible to get the client to flush data out
before renaming it? I tried naively calling nfs_flush_file before
renaming but that didn't seem to do it.

--
Martin


2003-08-28 14:04:52

by Trond Myklebust

Subject: Re: nfs/mmap/rename file corruption

>>>>> " " == Martin Pool <[email protected]> writes:

> In order to perform this check, the server must include
> some information about the location of the file in the
> "filehandle" that is given to the client. This can cause
> problems with accessing files that are renamed while a
> client has them open (though in many simple cases it will
> still work).

> In this case, the file is not still open at the time it is
> renamed. It just still has some dirty pages in the client's
> memory.

That is the same as being 'open' for the purposes of the above
paragraph.

>> Without the latter option enabled, I would indeed expect the
>> behaviour that you describe.

> It seems a bit unfortunate that we can get corruption unless a
> special option is set. Will it work on non-Linux nfs servers?

Yes. 'subtree checking' is a Linux-only concept. Most servers just
open their files by inode number and leave it at that.

> Wouldn't it still be possible to get the client to flush data
> out before renaming it? I tried naively calling nfs_flush_file
> before renaming but that didn't seem to do it.

mmap() is 'special': there are all sorts of silly races possible, and
many of them appear to lie deep in the mm layer. If you want to
trigger some really nasty ones, try playing with mmap() +
truncate()....

In principle, you could get what you want by calling the combination

    filemap_fdatasync(inode->i_mapping);
    nfs_wb_all(inode);
    filemap_fdatawait(inode->i_mapping);

like we do in the file locking code. However that too appears to be
race prone due to races with the swap code.

In any case, doctoring the client in order to get around a bug in the
server is not usually my first choice...

Cheers,
Trond


2003-08-29 00:10:06

by Bernd Schubert

Subject: no_subtree_check questions

Hello,

As I understand it, the subtree_check is only necessary if the filesystem is
not exported at its root. Wouldn't it make sense to let the nfs-utils
automatically recognise that and then automatically set the option
no_subtree_check? I mean this could be implemented easily...

Also, I would like to know what happens if one sets the
no_subtree_check option and re-exports /etc/exports on the server or
restarts the nfsd while the clients still have the directory mounted.
I'm a bit scared since the man page says that the filehandle is modified for
this check.
Though we never had problems like Martin's (at least we didn't notice), I don't
think this check makes sense in our environment.

Thanks,
Bernd


2003-08-29 05:13:53

by Martin Pool

Subject: Re: no_subtree_check questions

On 29 Aug 2003 Bernd Schubert <[email protected]> wrote:

> Hello,
>
> As I understand it, the subtree_check is only necessary if the
> filesystem is not exported at its root. Wouldn't it make sense to let
> the nfs-utils automatically recognise that and then automatically set
> the option no_subtree_check? I mean this could be implemented
> easily...

Of course that would just make the behaviour inconsistent and hard to
debug. (I might not have been able to work out what was wrong in the
original bug report, for example.) I'm not really a fan of fixes that
only fix some circumstances.

> Also, I would like to know what happens if one sets the
> no_subtree_check option and re-exports /etc/exports on the server
> or restarts the nfsd while the clients still have the directory
> mounted. I'm a bit scared since the man page says that the
> filehandle is modified for this check.

I think the clients need to remount the filesystem, because all the
filehandles become invalid. Even the root of the filesystem can't be
found. Perhaps I just did something wrong though.

--
Martin
