2002-03-19 18:18:29

by Tavis Barr

Subject: [Fwd: Re: NFS-HOWTO]


Andrew Ryan had a good question for me below that I don't know the
answer to. When a file gets modified and left the same size twice
within one second, its mtime stays the same and all other attributes
stay the same, so the NFS server does not see that it has been altered.
At least this is my understanding of the bug. Some things that I don't
know because I'm not familiar enough with the NFS internals:

* What data structure reflects that the file has been altered? Is it the
  inode number, or some field within the inode?
* This was supposedly a 2.5 fix item; the issue is that mtime does not
  have a granularity finer than one second. What subsystem does the fix
  go into? The VFS layer? Has there been any work done on it?


Thanks,
Tavis


Attachments:
(No filename) (2.25 kB)
Forwarded message - Re: [NFS] NFS-HOWTO

2002-03-19 18:47:46

by Trond Myklebust

Subject: Re: [Fwd: Re: NFS-HOWTO]

>>>>> " " == Tavis Barr <[email protected]> writes:

> Andrew Ryan had a good question for me below that I don't know
> the answer to. When a file gets modified and left the same
> size twice within one second, its mtime stays the same and all
> other attributes stay the same, so the NFS server does not see
> that it has been altered. At least this is my understanding of
> the bug. Some things that I don't know because I'm not
> familiar enough with the NFS internals:

> *What data structure reflects that the file has been altered?

inode->i_size + inode->i_mtime ;-)

NFSv4 has support for a new 64-bit opaque value that can be used to
tell if the file has changed (that doesn't have to be i_mtime).
For NFSv2/v3 though, file size and mtime are all we have available to
tell whether or not the file has changed.
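
To make the size + mtime comparison concrete, here is a minimal Python sketch. It is not kernel or NFS code: the names `Attrs`, `attrs_seen_by_client` and `looks_changed` are invented for this illustration, and the granularity parameter simply models a server that can only report whole-second timestamps.

```python
from typing import NamedTuple

NSEC_PER_SEC = 1_000_000_000


class Attrs(NamedTuple):
    """Hypothetical stand-in for the two attributes compared under NFSv2/v3."""
    size: int   # file size in bytes
    mtime: int  # modification time in nanoseconds, already rounded down


def attrs_seen_by_client(size: int, mtime_ns: int, granularity_ns: int) -> Attrs:
    """Round mtime down to the timestamp granularity the server can report."""
    return Attrs(size, mtime_ns - mtime_ns % granularity_ns)


def looks_changed(cached: Attrs, fresh: Attrs) -> bool:
    """The file is considered changed only if size or mtime differs."""
    return cached != fresh


# Two same-size rewrites 0.5 s apart, inside the same wall-clock second.
first_write_ns = 100 * NSEC_PER_SEC + 200_000_000
second_write_ns = 100 * NSEC_PER_SEC + 700_000_000

coarse_before = attrs_seen_by_client(4096, first_write_ns, NSEC_PER_SEC)
coarse_after = attrs_seen_by_client(4096, second_write_ns, NSEC_PER_SEC)
fine_before = attrs_seen_by_client(4096, first_write_ns, 1)
fine_after = attrs_seen_by_client(4096, second_write_ns, 1)

# With one-second granularity the second rewrite goes unnoticed;
# with nanosecond granularity the same comparison detects it.
print(looks_changed(coarse_before, coarse_after))
print(looks_changed(fine_before, fine_after))
```

The comparison itself is identical in both cases; only the resolution of the stored timestamp decides whether the second write is visible.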

> Is it the inode number, or some field within the inode? *This
> was supposedly a 2.5 fix item; the issue is that mtime does not
> have a granularity finer than one second. What subsystem does
> the fix go into? The VFS layer? Has there been any work done
> on it?

Neil was talking about fixing this in 2.5.x (it is after all a server
issue). The problem is that several filesystems (most notably
ext2/ext3) don't have the space in their on-disk inodes for <1s time
resolution.
There are some ideas floating around on how to get around this, but I
do not believe that consensus has yet been achieved...

Cheers,
Trond

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-03-19 19:21:44

by Trond Myklebust

Subject: Re: [Fwd: Re: NFS-HOWTO]

>>>>> " " == Trond Myklebust <[email protected]> writes:

>> Is it the inode number, or some field within the inode? *This
>> was supposedly a 2.5 fix item; the issue is that mtime does not
>> have a granularity finer than one second. What subsystem does
>> the fix go into? The VFS layer? Has there been any work done
>> on it?

> Neil was talking about fixing this in 2.5.x (it is after all a
> server issue). The problem is that several filesystems
> (most notably ext2/ext3) don't have the space in their
> on-disk inodes for <1s time resolution. There are some ideas
> floating around on how to get around this, but I do not believe
> that consensus has yet been achieved...

Perhaps I should expand a little on this. The changes are twofold:

- Change the VFS structures to support 64-bit (a|c|m)time values.
This is not really a big deal, and nothing is stopping us from
doing it today...

- Changes to the individual filesystems so that they can save and
retrieve the extra 96 bits (== 32 bits * (mtime + atime + ctime))
as part of the on-disk metadata.
This is non-trivial, since a lot of these filesystems have not
got much padding left in their inodes (particularly once acls
etc. have grabbed their share of real-estate). Even finding 32
free bits is a real problem for ext[23]...

One solution might be to only keep the full 64-bit data in the VFS
inode cache, and to zero the low 32-bits whenever we have to reload
the metadata from the disk.
That means that each time the file falls out of cache, then the mtime
would appear to change on the client (which might then proceed to
invalidate its data cache). Not entirely satisfactory, but probably
better than nothing...
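
The cache-only idea above can be sketched in a few lines of Python. This is an illustration only, under the assumption that the 64-bit value holds whole seconds in the high 32 bits and a sub-second fraction in the low 32 bits; the helper names (`pack_time`, `write_to_disk`, `reload_from_disk`) are hypothetical and do not correspond to any VFS function.

```python
MASK32 = 0xFFFFFFFF


def pack_time(seconds: int, fraction: int) -> int:
    """Build a 64-bit in-cache timestamp (assumed layout: seconds high, fraction low)."""
    return ((seconds & MASK32) << 32) | (fraction & MASK32)


def write_to_disk(time64: int) -> int:
    """The on-disk inode only has room for the 32-bit seconds field."""
    return time64 >> 32


def reload_from_disk(seconds32: int) -> int:
    """Rebuild the in-cache value with the low 32 bits zeroed."""
    return seconds32 << 32


# mtime as held in the inode cache, with a non-zero sub-second part.
cached_mtime = pack_time(1016561911, 123456)

# The inode falls out of cache and is later read back from disk.
reloaded_mtime = reload_from_disk(write_to_disk(cached_mtime))

print(hex(cached_mtime))
print(hex(reloaded_mtime))
# A client comparing the two values would conclude the file has changed.
print(cached_mtime != reloaded_mtime)
```

The seconds field survives the round trip, but the reloaded value differs from the cached one whenever the sub-second part was non-zero, which is the spurious mtime change described above.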

Cheers,
Trond


2002-03-19 23:07:11

by Ragnar Kjørstad

Subject: Re: [Fwd: Re: NFS-HOWTO]

On Tue, Mar 19, 2002 at 08:20:32PM +0100, Trond Myklebust wrote:
> - Changes to the individual filesystems so that they can save and
> retrieve the extra 96 bits (== 32 bits * (mtime + atime + ctime))
> as part of the on-disk metadata.
> This is non-trivial, since a lot of these filesystems have not
> got much padding left in their inodes (particularly once acls
> etc. have grabbed their share of real-estate). Even finding 32
> free bits is a real problem for ext[23]...

I think Ted was talking about doing a disk-format change for ext[23]
soon (to improve resizing support and to extend fields like timestamps
and link-counters).

For reiserfs adding more data in the on-disk metadata is perhaps less of
a problem than for other filesystems, because reiserfs can handle
multiple inode-types on the same filesystem. (Of course old kernels
would not work with the new format, so it's still not trivial).

I haven't checked xfs, jfs or any of the other filesystems.

> One solution might be to only keep the full 64-bit data in the VFS
> inode cache, and to zero the low 32-bits whenever we have to reload
> the metadata from the disk.
> That means that each time the file falls out of cache, then the mtime
> would appear to change on the client (which might then proceed to
> invalidate its data cache). Not entirely satisfactory, but probably
> better than nothing...

Filesystems that _do_ have support for 64-bit data on-disk could still
take advantage, right?


-- 
Ragnar Kjørstad
Big Storage
