Trond,
Paraphrased from one of my in-house customers: "The timestamp of an
NFS-mounted file does not change when written to, when the below test is
run on a 2.6.6-rc1 to 2.6.7-rc2 kernel. The timestamp is appropriately
updated when the test is run on a 2.6.5 kernel. This is with NFSv3.
The type of system serving up the files does not seem to be a factor."
I was not able to narrow the problem/featureset change down to a cset.
Attached is the test program my customer used.
Regards,
Joe
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>     /* unlink(), write() */
#include <sys/stat.h>
#include <errno.h>

#define PATH "./my_file"

/* Write to PATH in a loop until fstat() reports a changed mtime. */
int
main(void)
{
    int status;
    int count = 0;
    int fd;
    struct stat wuz;    /* attributes before any write */
    struct stat iz;     /* attributes after each write */

    (void) unlink(PATH);
    fd = open (PATH, O_RDWR|O_CREAT|O_SYNC, 0666);
    if (fd < 0) {
        fprintf (stderr, "open(%s) fails, errno = %d\n", PATH, errno);
        return 1;
    }
    status = fstat (fd, &wuz);
    if (status) {
        fprintf (stderr, "fstat(%s) fails, errno = %d\n", PATH, errno);
        return 1;
    }
    for (;;) {
        status = write (fd, &status, sizeof(status));
        if (status != sizeof(status)) {
            fprintf (stderr, "write(%s) fails, errno = %d\n", PATH, errno);
            return 1;
        }
        status = fstat (fd, &iz);
        if (status) {
            fprintf (stderr, "fstat(%s) fails, errno = %d\n", PATH, errno);
            return 1;
        }
        /* Stop as soon as the modification time differs from the initial one. */
        if (iz.st_mtime != wuz.st_mtime) break;
        count++;
        if (count % 1000 == 0) {
            printf ("count = %d\n", count);
        }
    }
    printf ("File modification time changed after %d iterations\n", count);
    return 0;
}
On Thu, 03/06/2004 at 13:28, Joe Korty wrote:
> Trond,
> Paraphrased from one of my in-house customers: "The timestamp of an
> NFS-mounted file does not change when written to, when the below test is
> run on a 2.6.6-rc1 to 2.6.7-rc2 kernel. The timestamp is appropriately
> updated when the test is run on a 2.6.5 kernel. This is with NFSv3.
> The type of system serving up the files does not seem to be a factor."
NFS is only guaranteed to flush the file to disk when you do the
close(). Your program will just result in a lot of cached writes right
up until the moment it exits...
...and no - we do not update timestamps on the client side when we cache
the write, 'cos NFS does not provide any device for ensuring that clocks
on client and server are synchronized.
Cheers,
Trond
On Thu, Jun 03, 2004 at 05:11:52PM -0400, Trond Myklebust wrote:
> On Thu, 03/06/2004 at 13:28, Joe Korty wrote:
> > Paraphrased from one of my in-house customers: "The timestamp of an
> > NFS-mounted file does not change when written to, when the below test is
> > run on a 2.6.6-rc1 to 2.6.7-rc2 kernel. The timestamp is appropriately
> > updated when the test is run on a 2.6.5 kernel. This is with NFSv3.
> > The type of system serving up the files does not seem to be a factor."
>
> NFS is only guaranteed to flush the file to disk when you do the
> close(). Your program will just result in a lot of cached writes right
> up until the moment it exits...
>
> ...and no - we do not update timestamps on the client side when we cache
> the write, 'cos NFS does not provide any device for ensuring that clocks
> on client and server are synchronized.
Hi Trond,
Thanks for the explanation. What did 2.6.5 do differently that made it
appear to work?
Regards,
Joe
On Fri, 04/06/2004 at 09:23, Joe Korty wrote:
> Hi Trond,
> Thanks for the explanation. What did 2.6.5 do differently that made it
> appear to work?
Nothing in the NFS client...
The only difference might be if the VM decided to flush writes out
earlier in order to reclaim memory.
Cheers,
Trond
On Fri, 04 Jun 2004 19:08:15 -0400
Trond Myklebust wrote:
> On Fri, 04/06/2004 at 09:23, Joe Korty wrote:
>
> > Hi Trond,
> > Thanks for the explanation. What did 2.6.5 do differently that made it
> > appear to work?
>
> Nothing in the NFS client...
>
> The only difference might be if the VM decided to flush writes out
> earlier in order to reclaim memory.
What about fsync()?
--
Stephen Hemminger
Open Source Development Lab http://developer.osdl.org/shemminger
On Fri, 04/06/2004 at 19:24, Stephen Hemminger wrote:
> What about fsync()?
There were no changes to fsync() between 2.6.5 and 2.6.6. In any case,
Joe's test program did not use fsync() (that was very much the problem).
Any use of either close() or fsync() on the client will cause the
mtime/ctime to be updated on the server, and that change should
immediately propagate back to the client.
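A minimal sketch of that sequence, assuming a POSIX client. The helper name write_and_sync_mtime and its parameters are hypothetical and are not part of Joe's test program. It appends a buffer, calls fsync() to push the cached write out, and only then reads the mtime with fstat():

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: append len bytes to path, flush them with fsync(),
 * then report the file's mtime as seen after the flush.
 * Returns 0 on success, -1 on any failure.
 */
int write_and_sync_mtime(const char *path, const void *buf, size_t len,
                         time_t *mtime_out)
{
    struct stat st;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);

    if (fd < 0)
        return -1;

    /* On NFS this write may only reach the client's cache. */
    if (write(fd, buf, len) != (ssize_t)len)
        goto fail;

    /* fsync() forces the cached data out to the server. */
    if (fsync(fd) != 0)
        goto fail;

    /* Attributes read after the flush should carry the server's timestamps. */
    if (fstat(fd, &st) != 0)
        goto fail;

    *mtime_out = st.st_mtime;
    return close(fd);

fail:
    close(fd);
    return -1;
}
```

On a local filesystem the mtime changes at write() time anyway, so the difference would only be visible on an NFS mount.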
Cheers,
Trond
Hi there,
On Thursday 03 June 2004 23:11, Trond Myklebust wrote:
> ...and no - we do not update timestamps on the client side when we cache
> the write, 'cos NFS does not provide any device for ensuring that clocks
> on client and server are synchronized.
Could you make this an option? The device ensuring this is an admin
with a clue, who configures NTP or similar in his network.
If unsure you could at least disable it by default.
Regards
Ingo Oeser
On Thu, 03/06/2004 at 23:25, Ingo Oeser wrote:
> On Thursday 03 June 2004 23:11, Trond Myklebust wrote:
> > ...and no - we do not update timestamps on the client side when we cache
> > the write, 'cos NFS does not provide any device for ensuring that clocks
> > on client and server are synchronized.
>
> Could you make this an option? The device ensuring this is an admin
> with a clue, who configures NTP or similar in his network.
>
> If unsure you could at least disable it by default.
Why? It still won't be set to the same value as on the server.
Cheers,
Trond
On Thu, Jun 03, 2004 at 05:11:52PM -0400, Trond Myklebust wrote:
> On Thu, 03/06/2004 at 13:28, Joe Korty wrote:
>> Paraphrased from one of my in-house customers: "The timestamp of an
>> NFS-mounted file does not change when written to, when the below test is
>> run on a 2.6.6-rc1 to 2.6.7-rc2 kernel. The timestamp is appropriately
>> updated when the test is run on a 2.6.5 kernel. This is with NFSv3.
>> The type of system serving up the files does not seem to be a factor."
>
> NFS is only guaranteed to flush the file to disk when you do the
> close(). Your program will just result in a lot of cached writes right
> up until the moment it exits...
>
> ...and no - we do not update timestamps on the client side when we cache
> the write, 'cos NFS does not provide any device for ensuring that clocks
> on client and server are synchronized.
Hi Trond,
For those interested, this patch reverts NFS to the old behavior of
a timestamp-for-each-write.
I see no harm in it. After all, timestamps have to be updated on file
creation and close, which are also initiated from the client, just as
writes are. So allowing timestamp update on create/close but not writes
does not make much sense to me.
Unless the real reason is reducing ethernet traffic. In which case we
could defer a timestamp-on-write only when it is still in the same second
as the previous write, but don't defer when a new second rolls around
on the client. That would reduce timestamp updates to at most one per
second per inode per client, while preserving old NFS behavior.
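The once-per-second rule could be sketched roughly as follows. This is only an illustration of the decision logic in user-space C; struct ts_state and ts_update_due are hypothetical names and do not correspond to anything in fs/nfs/write.c:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-inode state; not an actual kernel structure. */
struct ts_state {
    time_t last_update;   /* client-clock second of the last immediate update */
    bool   have_update;   /* false until the first update has been issued */
};

/*
 * Decide whether a write completing at client time `now` should trigger
 * an immediate timestamp update (true) or be deferred (false).
 */
bool ts_update_due(struct ts_state *st, time_t now)
{
    if (st->have_update && st->last_update == now)
        return false;   /* same second as the previous update: defer */

    st->last_update = now;
    st->have_update = true;
    return true;        /* first write, or a new second: update now */
}
```

With this rule, two writes at second 100 followed by one at second 101 would yield an update, a deferral, and another update.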
Regards,
Joe
--- base/fs/nfs/write.c 2004-06-07 10:25:33.861224586 -0400
+++ new/fs/nfs/write.c 2004-06-07 11:06:22.044853102 -0400
@@ -417,7 +417,7 @@
 	nfsi->npages--;
 	if (!nfsi->npages) {
 		spin_unlock(&nfs_wreq_lock);
-		nfs_end_data_update_defer(inode);
+		nfs_end_data_update(inode);
 		iput(inode);
 	} else
 		spin_unlock(&nfs_wreq_lock);
On Mon, 07/06/2004 at 11:21, Joe Korty wrote:
> Unless the real reason is reducing ethernet traffic.
That is, after all, why we cache data. Look at the GETATTR traffic using
nfsstat.
> In which case we
> could defer a timestamp-on-write only when it is still in the same second
> as the previous write, but don't defer when a new second rolls around
> on the client. That would reduce timestamp updates to at most one per
> second per inode per client, while preserving old NFS behavior.
Exactly why should we go to all this trouble?
Cheers,
Trond
On Mon, Jun 07, 2004 at 11:51:49AM -0400, Trond Myklebust wrote:
> On Mon, 07/06/2004 at 11:21, Joe Korty wrote:
> > Unless the real reason is reducing ethernet traffic.
>
> That is, after all, why we cache data. Look at the GETATTR traffic using
> nfsstat.
>
> > In which case we
> > could defer a timestamp-on-write only when it is still in the same second
> > as the previous write, but don't defer when a new second rolls around
> > on the client. That would reduce timestamp updates to at most one per
> > second per inode per client, while preserving old NFS behavior.
>
> Exactly why should we go to all this trouble?
For compatibility?
On Mon, 07/06/2004 at 12:13, Joe Korty wrote:
> >
> > > In which case we
> > > could defer a timestamp-on-write only when it is still in the same second
> > > as the previous write, but don't defer when a new second rolls around
> > > on the client. That would reduce timestamp updates to at most one per
> > > second per inode per client, while preserving old NFS behavior.
> >
> > Exactly why should we go to all this trouble?
>
> For compatibility?
With what? There has never been a standard other than the close-to-open.
Cheers,
Trond
On Mon, Jun 07, 2004 at 12:20:11PM -0400, Trond Myklebust wrote:
> On Mon, 07/06/2004 at 12:13, Joe Korty wrote:
> > >
> > > > In which case we
> > > > could defer a timestamp-on-write only when it is still in the same second
> > > > as the previous write, but don't defer when a new second rolls around
> > > > on the client. That would reduce timestamp updates to at most one per
> > > > second per inode per client, while preserving old NFS behavior.
> > >
> > > Exactly why should we go to all this trouble?
> >
> > For compatibility?
>
> With what? There has never been a standard other than the close-to-open.
Compatibility with existing behavior. It's called a de-facto standard.
On Mon, 07/06/2004 at 12:39, Joe Korty wrote:
> >
> > With what? There has never been a standard other than the close-to-open.
>
> Compatibility with existing behavior. It's called a de-facto standard.
The "de-facto standard" you describe has never existed other than for
large files. It was never true of small files that did not trigger
immediate writeout.
On Mon, 07/06/2004 at 12:53, Trond Myklebust wrote:
> >
> > Compatibility with existing behavior. It's called a de-facto standard.
>
> The "de-facto standard" you describe has never existed other than for
> large files. It was never true of small files that did not trigger
> immediate writeout.
...in fact even for files that trigger immediate writeout, the behaviour
was erratic, since writes could still be cached after the
memory-triggered flush was completed.
So I repeat: There has *never* been a standard other than the
close-to-open.
There has *never* existed any reliable mtime/ctime while the client was
caching writes.
If you want that sort of behaviour, the options are O_SYNC, fsync(),
close(), or "mount -osync". There is no call for it in async writes.
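As a rough illustration of the close-to-open pattern, assuming POSIX calls only: the writer treats close() as its flush point, and a reader that opens the file afterwards fetches fresh attributes. The function names writer_append and reader_stat are hypothetical:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical writer: the data is only guaranteed flushed once close() returns. */
int writer_append(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);

    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close(fd);   /* flush point under close-to-open semantics */
}

/* Hypothetical reader: opens the file after the writer's close(), then stats it. */
int reader_stat(const char *path, struct stat *st)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fstat(fd, st);
    close(fd);
    return rc;
}
```

A reader that calls reader_stat() only after writer_append() has returned should see the new size and mtime; timestamps sampled while the writer still holds cached writes carry no such guarantee.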