2004-06-02 06:55:31

by Trond Myklebust

Subject: Re: NFS client behavior on close

On Mon, 31/05/2004 at 14:38, Simon Kirby wrote:

> Is the NFS client required to write all data on close?

Yes. That is the basis of the NFSv2/v3 caching model...

Cheers,
Trond


2004-06-02 15:44:21

by Simon Kirby

Subject: Re: NFS client behavior on close

Hi Trond,

On Tue, Jun 01, 2004 at 11:55:27PM -0700, Trond Myklebust wrote:

> On Mon, 31/05/2004 at 14:38, Simon Kirby wrote:
>
> > Is the NFS client required to write all data on close?
>
> Yes. That is the basis of the NFSv2/v3 caching model...

In that case, is there any reason why we would ever want to wait
before sending data to the server, except for a minimal time to allow
merging into wsize blocks? With no delay, avoiding the write to disk
for temporary files can still happen on the server side (async).
Mass file writes from a single thread should be faster if the client
write buffering is minimized.
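
As a rough sketch of the merging arithmetic, assuming the writes are contiguous and each is no larger than wsize: the number of WRITE requests drops from one per application write to the total byte count divided by wsize, rounded up. The function names and the sizes in the usage note are illustrative assumptions, not taken from any NFS implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Requests sent if every application write goes out on its own. */
static size_t rpcs_unmerged(size_t nwrites)
{
    return nwrites;
}

/* Requests sent if contiguous writes are first merged into
 * wsize-sized blocks: ceil(total_bytes / wsize). */
static size_t rpcs_merged(size_t nwrites, size_t write_size, size_t wsize)
{
    assert(wsize > 0);
    size_t total = nwrites * write_size;
    return (total + wsize - 1) / wsize;
}
```

For example, 100 contiguous writes of 512 bytes with a hypothetical wsize of 8192 would go out as 100 requests unmerged, or 7 once merged.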

Perhaps there is no way to easily separate the NFS client case from
the normal page cache behavior?

Simon-

2004-06-02 16:38:36

by Trond Myklebust

Subject: Re: NFS client behavior on close

On Wed, 02/06/2004 at 08:41, Simon Kirby wrote:

> In that case, is there any reason why we would ever want to wait
> before sending data to the server, except for a minimal time to allow
> merging into wsize blocks? With no delay, avoiding the write to disk
> for temporary files can still happen on the server side (async).

NO! async is a stupidity that was introduced in order to get round the
fact that NFSv2 had no server-side equivalent of the "fsync()" command.
Async breaks O_SYNC writes, fsync(), sync(), ... Most importantly, it
removes all the normal guarantees that clients can recover safely if the
server reboots or crashes.
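
A minimal sketch of the guarantee at stake, assuming a POSIX client: an application that must not acknowledge data until it is stable calls fsync() and checks the result of both fsync() and close(). The helper name save_durably is hypothetical, and EINTR handling is left out for brevity. If the server short-circuits the flush, the successful return below no longer means the data is on disk.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 only if the data was written,
 * fsync()ed and closed without error; returns -1 otherwise. */
int save_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {           /* error, or no progress */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Ask for the data to be made stable before reporting success. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }

    /* Deferred write errors can also surface at close(). */
    return close(fd) < 0 ? -1 : 0;
}
```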

<RANT>I find it hard to understand how people, who would normally scream
if you told them that "fsync()" on their desktop PC was broken and
didn't actually flush data to disk, can find it quite acceptable as long
as it's "only" their central storage units that are broken in the same
way.</RANT>

In any case, the performance benefit of using "async" should be very
small these days.

> Mass file writes from a single thread should be faster if the client
> write buffering is minimized.

Not necessarily. Consider the case of a random workload in which you
touch the same page more than once. Why then flush those same pages out
to disk more than once?
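
The point can be sketched with a toy model, assuming a file of NPAGES pages and a list of page indices touched by the workload (both hypothetical): sending on every touch costs one write per touch, while letting pages sit dirty until a flush costs one write per distinct page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8   /* hypothetical file size, in pages */

/* Write-through: every page touch is sent to the server at once. */
static size_t writes_write_through(const int *touches, size_t n)
{
    (void)touches;
    return n;
}

/* Write-back: touches only mark pages dirty; each dirty page is
 * sent once when the file is flushed (for example at close()). */
static size_t writes_write_back(const int *touches, size_t n)
{
    bool dirty[NPAGES] = { false };
    size_t sent = 0;

    for (size_t i = 0; i < n; i++) {
        assert(touches[i] >= 0 && touches[i] < NPAGES);
        dirty[touches[i]] = true;
    }
    for (size_t p = 0; p < NPAGES; p++)
        if (dirty[p])
            sent++;
    return sent;
}
```

With the touch sequence 3, 1, 3, 3, 7, 1 the write-through count is 6 and the write-back count is 3.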

Cheers,
Trond

2004-06-02 19:16:08

by Simon Kirby

Subject: Re: NFS client behavior on close

On Wed, Jun 02, 2004 at 09:38:28AM -0700, Trond Myklebust wrote:

> On Wed, 02/06/2004 at 08:41, Simon Kirby wrote:
>
> > In that case, is there any reason why we would ever want to wait
> > before sending data to the server, except for a minimal time to allow
> > merging into wsize blocks? With no delay, avoiding the write to disk
> > for temporary files can still happen on the server side (async).
>
> NO! async is a stupidity that was introduced in order to get round the
> fact that NFSv2 had no server-side equivalent of the "fsync()" command.
> Async breaks O_SYNC writes, fsync(), sync(), ... Most importantly, it
> removes all the normal guarantees that clients can recover safely if the
> server reboots or crashes.

Ok, that makes sense -- if NFSv2 has no fsync(), then using "async" mode
definitely sounds broken. But is this the same with NFSv3?

> <RANT>I find it hard to understand how people, who would normally scream
> if you told them that "fsync()" on their desktop PC was broken and
> didn't actually flush data to disk, can find it quite acceptable as long
> as it's "only" their central storage units that are broken in the same
> way.</RANT>

I'm of the (probably small) school of thought where I'd rather have my
data disappear than have to wait for all of the stupid uses of sync() and
fsync() in applications everywhere these days. In fact, I've even
considered writing an SMTP gateway which attempts delivery to the remote
host between the end-of-message marker and the response in order to avoid
having to fsync() to a queue (and still RFC compliant :) ).

Instead, I think applications should be woken up so that they can exit or
reply "OK" once the dirty data has been flushed, overwritten, or toasted
(rather than the application requesting it and blocking). The same sort
of idea, but the other way around. Maybe fsync() could just change more
to an "I'd like to participate in the next round of writes" kind of call.

> Not necessarily. Consider the case of a random workload in which you
> touch the same page more than once. Why then flush those same pages out
> to disk more than once?

Well, if the client sends immediately _and_ the server writes it
instantly to disk, then, yes, that would not be optimal.

NFS should just extend fsync() back to the server -- with minimal caching
on the client, normal write-back caching on the server, and where fsync()
on the client forces the server to write before returning on the client.
Forcing this to happen on close() doesn't even line up with local file
systems.

Simon-

2004-06-02 19:45:25

by Trond Myklebust

Subject: Re: NFS client behavior on close

On Wed, 02/06/2004 at 12:16, Simon Kirby wrote:

> Ok, that makes sense -- if NFSv2 has no fsync(), then using "async" mode
> definitely sounds broken. But is this the same with NFSv3?

The problem is that Linux's "async" implementation short-circuits the
NFSv3 fsync() equivalent. Not good!


> NFS should just extend fsync() back to the server -- with minimal caching
> on the client, normal write-back caching on the server, and where fsync()
> on the client forces the server to write before returning on the client.
> Forcing this to happen on close() doesn't even line up with local file
> systems.

That still leaves room for races with other clients trying to open the
file after the server comes up after a crash, then finding stale data.
(Free|Net|Open)BSD choose to ignore that race, and do the above. I'm not
aware of anybody else doing so, though...

Performance is good, but it should always take second place to data
integrity. There are more than enough people out there who are
entrusting research projects, banking data, ... to their NFS server.

Cheers,
Trond