2008-09-11 17:50:07

by Chuck Lever III

Subject: Re: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

On Sep 11, 2008, at 1:19 PM, Aaron Straus wrote:
> Hi,
>
> On Sep 11 12:55 PM, Chuck Lever wrote:
>> A more thorough review of the NFS write and flush logic that exists
>> in
>> 2.6.27 is needed if we choose to recognize this issue as a real
>> problem.
>
> Yep.
>
> Sorry. I didn't mean we should revert the hunk. I was just trying to
> help identify the cause of the new behavior.
>
> I think this is a real problem, albeit not a "serious" one. Network
> file systems usually try to avoid letting readers see blocks of zeros
> in files, especially in this simple writer/reader case.
>
> It wouldn't be bad if the file were written out of order
> occasionally, but we see this constantly now.
>
> We can no longer reliably write and read log files on NFS mounts.
> That seems like a valid use case that no longer works?

Were you able to modify your writer to do real fsync system calls? If
so, did it help? That would be a useful data point.

NFS uses close-to-open cache coherency. Thus we expect the file to be
consistent and up to date after it is opened by a client, but not
necessarily if some other client writes to it after it was opened. We
usually recommend file locking and direct I/O to minimize these
problems.
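
For example, here is a minimal Python sketch of the locking approach
(the mount point, file name, and record format are invented for
illustration):

    import fcntl
    import os

    # Writer: take an exclusive POSIX lock, append one complete record,
    # flush it all the way to the server, then unlock. On the Linux NFS
    # client, acquiring an fcntl lock revalidates cached data and
    # releasing it flushes dirty data, which is what restores coherency.
    with open("/mnt/nfs/app.log", "a") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write("one complete log record\n")
            f.flush()                 # stdio buffer -> kernel page cache
            os.fsync(f.fileno())      # dirty pages -> NFS server
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)

A reader on another client would take a shared lock (fcntl.LOCK_SH)
around its reads to get the same guarantee.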

Practically speaking, this is often not enough for typical
applications, so NFS client implementations go to further (non-
standard) efforts to behave like a local file system. This is simply
a question of whether we can address this while not creating
performance or correctness issues for other common use cases.

Anyway, I'm not the NFS client maintainer, so this decision is not up
to me.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


2008-09-11 18:49:53

by Aaron Straus

Subject: Re: [NFS] blocks of zeros (NULLs) in NFS files in kernels >= 2.6.20

Hi,

On Sep 11 01:48 PM, Chuck Lever wrote:
> Were you able to modify your writer to do real fsync system calls? If
> so, did it help? That would be a useful data point.

Yes/Yes. Please see the attached tarball sync-test.tar.bz2.

Inside you'll find the modified writer: writer_sync.py

It calls fsync on the file descriptor after each write.
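
In spirit it does something like this (a simplified sketch, not the
attached script verbatim; the path, payload, and loop count are made
up):

    import os
    import time

    # Append records one at a time, forcing each write back to the
    # server before issuing the next one, so a reader should never see
    # a hole of zeros in the middle of the file.
    with open("/mnt/nfs/test.out", "w") as f:
        for i in range(1000):
            f.write("%06d some payload\n" % i)
            f.flush()                # stdio buffer -> kernel page cache
            os.fsync(f.fileno())     # WRITE + COMMIT to the NFS server
            time.sleep(0.01)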

I also added the strace and Wireshark data for the sync and no-sync
cases.

Note these were all tested with the latest Linus git:

d1c6d2e547148c5aa0c0a4ff6aac82f7c6da1d8b

> Practically speaking, this is often not enough for typical
> applications, so NFS client implementations go to further (non-
> standard) efforts to behave like a local file system. This is simply
> a question of whether we can address this while not creating
> performance or correctness issues for other common use cases.

Yep, I agree. I'm not saying what we do now is "wrong" per the RFC
(writing the file out of order). It's just different from what we've
done in the past (and somewhat unexpected).

I'm still hoping there is a simple fix... but maybe not... :(

Thanks!

=a=

--
===================
Aaron Straus
aaron-bYFJunmd+ZV8UrSeD/[email protected]


Attachments:
sync-test.tar.bz2 (14.84 kB)