2007-11-22 00:52:14

by Timo Sirainen

[permalink] [raw]
Subject: Re: [NFS] Cache flushing

On Wed, 2007-11-21 at 16:15 -0500, Trond Myklebust wrote:
> On Wed, 2007-11-21 at 22:56 +0200, Timo Sirainen wrote:
> > On Wed, 2007-11-21 at 15:39 -0500, Trond Myklebust wrote:
> > > I thought you said this was for flushing the attribute cache?
> >
> > That was it originally, but this one's more important. :) Also I
> > originally thought that directory's attribute cache flushing also
> > flushed its file handle cache. I'm not actually sure what file handle
> > cache should even really be called. It looks like Linux handles it with
> > page cache the same way as for files?
> >
> > So I guess I'm asking for chown(-1,-1) call to invalidate the file's
> > attribute and page cache. Although it's interesting why chown(0,-1)
> > flushes a directory's file handle cache (==page cache?), but the same
> > call for a file doesn't flush its page cache. Maybe I'm still
> > misunderstanding something how all of this works.
>
> No. I'm not at all comfortable with the idea of overloading chown() to
> invalidate data caches.
>
> It is one thing to have it revalidate the attribute cache (you might be
> able to argue that this is consistent with the mission of chown(),
> although I'm not yet convinced of that) but the only effect that chown()
> has on data caching is that we force a sync to disk in order to avoid
> trouble with cached writes that are no longer accepted by the server. It
> has never forced a cache invalidation.

I finally set up my own NFS test setup and figured out how this really
works. As long as nfs_setattr() has something to do, it calls
nfs_end_data_update() at the end, which in turn unconditionally sets
"nfsi->cache_change_attribute = jiffies", which causes the file handle
cache to be flushed on next access.

So the only special case here is if nfs_setattr() has nothing to do.
Just moving that check a bit later would be enough:

--- inode.c.old 2007-11-16 22:18:46.000000000 +0200
+++ inode.c 2007-11-22 02:50:04.000000000 +0200
@@ -323,7 +323,7 @@
{
struct inode *inode = dentry->d_inode;
struct nfs_fattr fattr;
- int error;
+ int error = 0;

nfs_inc_stats(inode, NFSIOS_VFSSETATTR);

@@ -332,11 +332,6 @@
attr->ia_valid &= ~ATTR_SIZE;
}

- /* Optimization: if the end result is no change, don't RPC */
- attr->ia_valid &= NFS_VALID_ATTRS;
- if (attr->ia_valid == 0)
- return 0;
-
lock_kernel();
nfs_begin_data_update(inode);
/* Write all dirty data */
@@ -349,9 +344,14 @@
*/
if ((attr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0)
nfs_inode_return_delegation(inode);
- error = NFS_PROTO(inode)->setattr(dentry, &fattr, attr);
- if (error == 0)
- nfs_refresh_inode(inode, &fattr);
+
+ /* Optimization: if the end result is no change, don't RPC */
+ attr->ia_valid &= NFS_VALID_ATTRS;
+ if (attr->ia_valid != 0) {
+ error = NFS_PROTO(inode)->setattr(dentry, &fattr, attr);
+ if (error == 0)
+ nfs_refresh_inode(inode, &fattr);
+ }
nfs_end_data_update(inode);
unlock_kernel();
return error;


> IMO, the right interface for manipulating the page cache is rather
> posix_fadvise(). I can certainly see an argument for adding a
> "POSIX_FADV_UNCACHE" option in order to force a cache invalidation for
> those applications that need stronger cache consistency.

Yes, that would be nice. Perhaps I'll try to implement it some day.

> Note, however, that even if such a thing is accepted into the mainstream
> kernel, you still have to convince all the distros to backport this
> interface to their older kernels. If not, you will still have a problem
> with legacy clients...

I don't mind as long as I can say a newer kernel will solve the problem.
It's still a lot easier to upgrade the kernel than requiring to change
the whole NFS server/filesystem.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part
(No filename) (228.00 B)
(No filename) (362.00 B)
Download all attachments