2005-04-05 05:40:59

by NeilBrown

[permalink] [raw]
Subject: Re: Re: corruption over NFS with 2.6 client, locking, truncating and appending...

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday April 5, [email protected] wrote:
> > I poked around in the related code and found that nfs_update_inode,
> > which is called by nfs_revalidate_inode, won't accept a decrease in
> > file size if it thinks the file is 'unstable'. So I put in a printk
> > to tell me when this happened:
> >
> > if (S_ISREG(inode->i_mode) && data_unstable) {
> > if (new_isize > cur_isize) {
> > inode->i_size = new_isize;
> > invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
> > }
> > else printk("may have missed trunc %ld->%ld\n",
> > (long)cur_isize, (long)new_isize); /* neilb */
> > } else {
> > inode->i_size = new_isize;
> > invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
> > }
> >
> > and got some output:
> >
> > Apr 5 13:40:41 cage kernel: may have missed trunc 2050->2030
> > Apr 5 13:40:41 cage kernel: may have missed trunc 2060->2020
> > Apr 5 13:40:41 cage kernel: may have missed trunc 2090->2070
> > Apr 5 13:40:41 cage kernel: may have missed trunc 2130->2110
>
> That is indeed a problem. The current code basically assumes
> close-to-open semantics, and that of course breaks down in the case of
> locking.
> OK. I'm going to have to give that some thought...

The thing is that it seems to break down even without locking, though
I don't fully understand why yet.
When I use .lock file locking I get the same problem.
However I have found that if I open the file for read, and then close
it, immediately before opening for write+append, then the problem goes
away. This is for the .lock file locking, and the .lock file is held
over both opens.

However, let's fix one thing at a time. I look forward to the result
of your thoughts.

>
> BTW: The 2.4 code should be very different in this respect: it has no
> equivalent to the "unstable" test. Are you seeing the same problem
> there?

When using lockf locking, the only problem I see can be explained by the
llseek problem you sent a patch for (I haven't tested it in 2.4 yet).
With .lock file locking I see a similar problem in 2.4, but that could
be because 2.4 doesn't support full cto semantics - I'm not sure.

I have a question though.
If I understand cto properly, then if I do
open(O_READ)
read
close
open(O_READ)
read
close

then the client should atleast refresh the attributes somewhere
between the first close and the second read. tcpdump shows that it
does do this, however I cannot find in the code where either the close
or the open invalidates the cached attributes.
Could you give me a pointer?

Thanks
NeilBrown
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFCUiTjG5fc6gV+Wb0RAkoXAKC+lTD+VmFDQCpHM7hbKy3cOb3FwQCgtNYO
wWNl7iobnxLV9xXXmqY/Osc=
=YmF9
-----END PGP SIGNATURE-----


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-04-05 20:24:39

by Trond Myklebust

[permalink] [raw]
Subject: Re: Re: corruption over NFS with 2.6 client, locking, truncating and appending...

NFS: Fix the file size revalidation

Instead of looking at whether or not the file is open for writes before
we accept to update the length using the server value, we should rather
be looking at whether or not we are currently caching any writes.

Failure to do so means in particular that we're not updating the file
length correctly after obtaining a POSIX or BSD lock.

Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/direct.c | 2 -
fs/nfs/inode.c | 69 ++++++++++++-------------------------------------
fs/nfs/write.c | 4 +-
include/linux/nfs_fs.h | 1
4 files changed, 21 insertions(+), 55 deletions(-)

Index: linux-2.6.12-rc1/fs/nfs/inode.c
===================================================================
--- linux-2.6.12-rc1.orig/fs/nfs/inode.c
+++ linux-2.6.12-rc1/fs/nfs/inode.c
@@ -1148,27 +1148,6 @@ void nfs_end_data_update(struct inode *i
}

/**
- * nfs_end_data_update_defer
- * @inode - pointer to inode
- * Declare end of the operations that will update file data
- * This will defer marking the inode as needing revalidation
- * unless there are no other pending updates.
- */
-void nfs_end_data_update_defer(struct inode *inode)
-{
- struct nfs_inode *nfsi = NFS_I(inode);
-
- if (atomic_dec_and_test(&nfsi->data_updates)) {
- /* Mark the attribute cache for revalidation */
- nfsi->flags |= NFS_INO_INVALID_ATTR;
- /* Directories and symlinks: invalidate page cache too */
- if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
- nfsi->flags |= NFS_INO_INVALID_DATA;
- nfsi->cache_change_attribute ++;
- }
-}
-
-/**
* nfs_refresh_inode - verify consistency of the inode attribute cache
* @inode - pointer to inode
* @fattr - updated attributes
@@ -1222,8 +1201,8 @@ int nfs_refresh_inode(struct inode *inod
if (!timespec_equal(&inode->i_mtime, &fattr->mtime)
|| cur_size != new_isize)
nfsi->flags |= NFS_INO_INVALID_ATTR;
- } else if (S_ISREG(inode->i_mode) && new_isize > cur_size)
- nfsi->flags |= NFS_INO_INVALID_ATTR;
+ } else if (new_isize != cur_size && nfsi->npages == 0)
+ nfsi->flags |= NFS_INO_INVALID_ATTR;

/* Have any file permissions changed? */
if ((inode->i_mode & S_IALLUGO) != (fattr->mode & S_IALLUGO)
@@ -1257,10 +1236,8 @@ int nfs_refresh_inode(struct inode *inod
static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr, unsigned long verifier)
{
struct nfs_inode *nfsi = NFS_I(inode);
- __u64 new_size;
- loff_t new_isize;
+ loff_t cur_isize, new_isize;
unsigned int invalid = 0;
- loff_t cur_isize;
int data_unstable;

dfprintk(VFS, "NFS: %s(%s/%ld ct=%d info=0x%x)\n",
@@ -1293,49 +1270,39 @@ static int nfs_update_inode(struct inode
/* Are we racing with known updates of the metadata on the server? */
data_unstable = ! nfs_verify_change_attribute(inode, verifier);

- /* Check if the file size agrees */
- new_size = fattr->size;
+ /* Check if our cached file size is stale */
new_isize = nfs_size_to_loff_t(fattr->size);
cur_isize = i_size_read(inode);
- if (cur_isize != new_size) {
-#ifdef NFS_DEBUG_VERBOSE
- printk(KERN_DEBUG "NFS: isize change on %s/%ld\n", inode->i_sb->s_id, inode->i_ino);
-#endif
- /*
- * If we have pending writebacks, things can get
- * messy.
- */
- if (S_ISREG(inode->i_mode) && data_unstable) {
- if (new_isize > cur_isize) {
+ if (new_isize != cur_isize) {
+ /* Do we perhaps have any outstanding writes? */
+ if (nfsi->npages == 0) {
+ /* No, but did we race with nfs_end_data_update()? */
+ if (verifier == nfsi->cache_change_attribute) {
inode->i_size = new_isize;
- invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
+ invalid |= NFS_INO_INVALID_DATA;
}
- } else {
+ invalid |= NFS_INO_INVALID_ATTR;
+ } else if (new_isize > cur_isize) {
inode->i_size = new_isize;
invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
}
+ dprintk("NFS: isize change on server for file %s/%ld\n",
+ inode->i_sb->s_id, inode->i_ino);
}

- /*
- * Note: we don't check inode->i_mtime since pipes etc.
- * can change this value in VFS without requiring a
- * cache revalidation.
- */
+ /* Check if the mtime agrees */
if (!timespec_equal(&inode->i_mtime, &fattr->mtime)) {
memcpy(&inode->i_mtime, &fattr->mtime, sizeof(inode->i_mtime));
-#ifdef NFS_DEBUG_VERBOSE
- printk(KERN_DEBUG "NFS: mtime change on %s/%ld\n", inode->i_sb->s_id, inode->i_ino);
-#endif
+ dprintk("NFS: mtime change on server for file %s/%ld\n",
+ inode->i_sb->s_id, inode->i_ino);
if (!data_unstable)
invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
}

if ((fattr->valid & NFS_ATTR_FATTR_V4)
&& nfsi->change_attr != fattr->change_attr) {
-#ifdef NFS_DEBUG_VERBOSE
- printk(KERN_DEBUG "NFS: change_attr change on %s/%ld\n",
+ dprintk("NFS: change_attr change on server for file %s/%ld\n",
inode->i_sb->s_id, inode->i_ino);
-#endif
nfsi->change_attr = fattr->change_attr;
if (!data_unstable)
invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL;
Index: linux-2.6.12-rc1/fs/nfs/write.c
===================================================================
--- linux-2.6.12-rc1.orig/fs/nfs/write.c
+++ linux-2.6.12-rc1/fs/nfs/write.c
@@ -220,7 +220,7 @@ static int nfs_writepage_sync(struct nfs
ClearPageError(page);

io_error:
- nfs_end_data_update_defer(inode);
+ nfs_end_data_update(inode);
nfs_writedata_free(wdata);
return written ? written : result;
}
@@ -401,7 +401,7 @@ static void nfs_inode_remove_request(str
nfsi->npages--;
if (!nfsi->npages) {
spin_unlock(&nfsi->req_lock);
- nfs_end_data_update_defer(inode);
+ nfs_end_data_update(inode);
iput(inode);
} else
spin_unlock(&nfsi->req_lock);
Index: linux-2.6.12-rc1/fs/nfs/direct.c
===================================================================
--- linux-2.6.12-rc1.orig/fs/nfs/direct.c
+++ linux-2.6.12-rc1/fs/nfs/direct.c
@@ -517,7 +517,7 @@ retry:
result = tot_bytes;

out:
- nfs_end_data_update_defer(inode);
+ nfs_end_data_update(inode);
nfs_writedata_free(wdata);
return result;

Index: linux-2.6.12-rc1/include/linux/nfs_fs.h
===================================================================
--- linux-2.6.12-rc1.orig/include/linux/nfs_fs.h
+++ linux-2.6.12-rc1/include/linux/nfs_fs.h
@@ -294,7 +294,6 @@ extern void nfs_begin_attr_update(struct
extern void nfs_end_attr_update(struct inode *);
extern void nfs_begin_data_update(struct inode *);
extern void nfs_end_data_update(struct inode *);
-extern void nfs_end_data_update_defer(struct inode *);
extern struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry, struct rpc_cred *cred);
extern struct nfs_open_context *get_nfs_open_context(struct nfs_open_context *ctx);
extern void put_nfs_open_context(struct nfs_open_context *ctx);


Attachments:
linux-2.6.12-38-fix_writes.dif (6.70 kB)