Hi Trond,
We're still having some problems with data corruption when multiple clients are
appending to a file and those clients are being granted write delegations on
open.
To reproduce:
Client A:
vi /mnt/`hostname -s`
while :; do echo "XXXXXXXXXXXXXXX" >>/mnt/file; sleep $(( $RANDOM % 5 )); done
Client B:
vi /mnt/`hostname -s`
while :; do echo "YYYYYYYYYYYYYYY" >>/mnt/file; sleep $(( $RANDOM % 5 )); done
The resulting data looks something like this:
XXXXXXXXXXXXXXX
XXXXXXXXXXXXXXX
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@XXXXXXXXXXXXXXX
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@XXXXXXXXXXXXXXX
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@XXXXXXXXXXXXXXX
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@XXXXXXXXXXXXXXX
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@XXXXXXXXXXXXXXX
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@XXXXXXXXXXXXXXX
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@XXXXXXXXXXXXXXX
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@XXXXXXXXXXXXXXX
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@XXXXXXXXXXXXXXX
YYYYYYYYYYYYYYY
YYYYYYYYYYYYYYY
What's happening is that in nfs_update_inode() we're recognizing that the file
size has changed and we're setting NFS_INO_INVALID_DATA accordingly, but then we
ignore the cache_validity flags in nfs_write_pageuptodate() because we have a
delegation. As a result, in nfs_updatepage() we're extending the write to cover
the full page even though we've not read in the data to begin with.
The attached patch fixes the issue by calling nfs_revalidate_mapping() whenever
we attach a delegation to an nfs_inode. However, I'm not entirely sure that
calling nfs_revalidate_inode() in those places won't cause some other issues. I
tested the change with all of the stock workloads in nfsometer except for
bonnie++.
An alternative would be to just revert c7559663 (NFS: Allow nfs_updatepage to
extend a write under additional circumstances), but then we'd lose the
performance benefits of that patch... which I'd like to avoid if possible.
-Scott
Scott Mayhew (1):
nfs: ensure cached data is correct before using delegation
fs/nfs/delegation.c | 1 +
fs/nfs/inode.c | 1 +
fs/nfs/nfs4proc.c | 5 +++--
3 files changed, 5 insertions(+), 2 deletions(-)
--
1.9.3
On Tue, Jun 17, 2014 at 5:04 PM, Scott Mayhew <[email protected]> wrote:
>
> No, I don't think it's right to ignore NFS_INO_INVALID_DATA, and
> originally I was testing a fix similar to this:
>
> ---8<---
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 3ee5af4..98ff061 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -934,12 +934,14 @@ static bool nfs_write_pageuptodate(struct page *page, struct inode *inode)
>
> if (nfs_have_delegated_attributes(inode))
> goto out;
> - if (nfsi->cache_validity & (NFS_INO_INVALID_DATA|NFS_INO_REVAL_PAGECACHE))
> + if (nfsi->cache_validity & NFS_INO_REVAL_PAGECACHE)
> return false;
> smp_rmb();
> if (test_bit(NFS_INO_INVALIDATING, &nfsi->flags))
> return false;
> out:
> + if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
> + return false;
> return PageUptodate(page) != 0;
> }
> ---8<---
>
>
> However,
>
> 1) it wasn't really keeping with the spirit of commit 8d197a56 (NFS:
> Always trust the PageUptodate flag when we have a delegation), and
>
> 2) one of my test programs (used to test commit c7559663 (NFS: Allow
> nfs_updatepage to extend a write under additional circumstances)))
> started performing poorly again, doing tons of sub page-sized writes
> intead of a handful of wsize'd writes.
>
> I did some more digging and I think I see 2 areas that could be
> improved.
>
> The first would be to clear NFS_INO_INVALID_DATA if we've just
> truncated the inode to 0 bytes -- after all, if we've just unmapped
> all the pages from the inode's address space then isn't our data
> consisitent?:
Hi Scott,
What say we rather just don't set/clear NFS_INO_INVALID_DATA if
mapping->nrpages == 0? That should catch the above case, plus a couple
of other potential false positives.
> The second thing I noticed is that we're constantly invalidating our
> cache due to the change attribute changing on the server. But if we
> have a write delegation then the change attribute changing must be the
> result of *our* changes, in which case we should be able to just silently
> update the change attribute on our side without invalidating our caches:
Ack. That's a bug in the current code.
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
Now that we have functions such as nfs_write_pageuptodate() that use
the cache_validity flags to check if the data cache is valid or not,
it is a little more important to keep the flags in sync with the
state of the data cache.
In particular, we'd like to ensure that if the data cache is empty, we
don't start marking it as needing revalidation.
Reported-by: Scott Mayhew <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/inode.c | 75 +++++++++++++++++++++++++++++++---------------------------
1 file changed, 40 insertions(+), 35 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 51dda21ebc4e..9927913c97c2 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -147,6 +147,17 @@ int nfs_sync_mapping(struct address_space *mapping)
return ret;
}
+static void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
+{
+ struct nfs_inode *nfsi = NFS_I(inode);
+
+ if (inode->i_mapping->nrpages == 0)
+ flags &= ~NFS_INO_INVALID_DATA;
+ nfsi->cache_validity |= flags;
+ if (flags & NFS_INO_INVALID_DATA)
+ nfs_fscache_invalidate(inode);
+}
+
/*
* Invalidate the local caches
*/
@@ -162,17 +173,16 @@ static void nfs_zap_caches_locked(struct inode *inode)
memset(NFS_I(inode)->cookieverf, 0, sizeof(NFS_I(inode)->cookieverf));
if (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)) {
- nfs_fscache_invalidate(inode);
- nfsi->cache_validity |= NFS_INO_INVALID_ATTR
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
| NFS_INO_INVALID_DATA
| NFS_INO_INVALID_ACCESS
| NFS_INO_INVALID_ACL
- | NFS_INO_REVAL_PAGECACHE;
+ | NFS_INO_REVAL_PAGECACHE);
} else
- nfsi->cache_validity |= NFS_INO_INVALID_ATTR
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
| NFS_INO_INVALID_ACCESS
| NFS_INO_INVALID_ACL
- | NFS_INO_REVAL_PAGECACHE;
+ | NFS_INO_REVAL_PAGECACHE);
nfs_zap_label_cache_locked(nfsi);
}
@@ -187,8 +197,7 @@ void nfs_zap_mapping(struct inode *inode, struct address_space *mapping)
{
if (mapping->nrpages != 0) {
spin_lock(&inode->i_lock);
- NFS_I(inode)->cache_validity |= NFS_INO_INVALID_DATA;
- nfs_fscache_invalidate(inode);
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_DATA);
spin_unlock(&inode->i_lock);
}
}
@@ -209,7 +218,7 @@ EXPORT_SYMBOL_GPL(nfs_zap_acl_cache);
void nfs_invalidate_atime(struct inode *inode)
{
spin_lock(&inode->i_lock);
- NFS_I(inode)->cache_validity |= NFS_INO_INVALID_ATIME;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATIME);
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL_GPL(nfs_invalidate_atime);
@@ -369,7 +378,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr, st
inode->i_mode = fattr->mode;
if ((fattr->valid & NFS_ATTR_FATTR_MODE) == 0
&& nfs_server_capable(inode, NFS_CAP_MODE))
- nfsi->cache_validity |= NFS_INO_INVALID_ATTR;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
/* Why so? Because we want revalidate for devices/FIFOs, and
* that's precisely what we have in nfs_file_inode_operations.
*/
@@ -415,36 +424,36 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr, st
if (fattr->valid & NFS_ATTR_FATTR_ATIME)
inode->i_atime = fattr->atime;
else if (nfs_server_capable(inode, NFS_CAP_ATIME))
- nfsi->cache_validity |= NFS_INO_INVALID_ATTR;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
if (fattr->valid & NFS_ATTR_FATTR_MTIME)
inode->i_mtime = fattr->mtime;
else if (nfs_server_capable(inode, NFS_CAP_MTIME))
- nfsi->cache_validity |= NFS_INO_INVALID_ATTR;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
if (fattr->valid & NFS_ATTR_FATTR_CTIME)
inode->i_ctime = fattr->ctime;
else if (nfs_server_capable(inode, NFS_CAP_CTIME))
- nfsi->cache_validity |= NFS_INO_INVALID_ATTR;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
if (fattr->valid & NFS_ATTR_FATTR_CHANGE)
inode->i_version = fattr->change_attr;
else if (nfs_server_capable(inode, NFS_CAP_CHANGE_ATTR))
- nfsi->cache_validity |= NFS_INO_INVALID_ATTR;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
if (fattr->valid & NFS_ATTR_FATTR_SIZE)
inode->i_size = nfs_size_to_loff_t(fattr->size);
else
- nfsi->cache_validity |= NFS_INO_INVALID_ATTR
- | NFS_INO_REVAL_PAGECACHE;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
+ | NFS_INO_REVAL_PAGECACHE);
if (fattr->valid & NFS_ATTR_FATTR_NLINK)
set_nlink(inode, fattr->nlink);
else if (nfs_server_capable(inode, NFS_CAP_NLINK))
- nfsi->cache_validity |= NFS_INO_INVALID_ATTR;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
if (fattr->valid & NFS_ATTR_FATTR_OWNER)
inode->i_uid = fattr->uid;
else if (nfs_server_capable(inode, NFS_CAP_OWNER))
- nfsi->cache_validity |= NFS_INO_INVALID_ATTR;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
if (fattr->valid & NFS_ATTR_FATTR_GROUP)
inode->i_gid = fattr->gid;
else if (nfs_server_capable(inode, NFS_CAP_OWNER_GROUP))
- nfsi->cache_validity |= NFS_INO_INVALID_ATTR;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
if (fattr->valid & NFS_ATTR_FATTR_BLOCKS_USED)
inode->i_blocks = fattr->du.nfs2.blocks;
if (fattr->valid & NFS_ATTR_FATTR_SPACE_USED) {
@@ -550,6 +559,9 @@ static int nfs_vmtruncate(struct inode * inode, loff_t offset)
spin_lock(&inode->i_lock);
i_size_write(inode, offset);
+ /* Optimisation */
+ if (offset == 0)
+ NFS_I(inode)->cache_validity &= ~NFS_INO_INVALID_DATA;
spin_unlock(&inode->i_lock);
truncate_pagecache(inode, offset);
@@ -578,7 +590,8 @@ void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr)
inode->i_uid = attr->ia_uid;
if ((attr->ia_valid & ATTR_GID) != 0)
inode->i_gid = attr->ia_gid;
- NFS_I(inode)->cache_validity |= NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_ACCESS
+ | NFS_INO_INVALID_ACL);
spin_unlock(&inode->i_lock);
}
if ((attr->ia_valid & ATTR_SIZE) != 0) {
@@ -1101,7 +1114,7 @@ static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr
&& inode->i_version == fattr->pre_change_attr) {
inode->i_version = fattr->change_attr;
if (S_ISDIR(inode->i_mode))
- nfsi->cache_validity |= NFS_INO_INVALID_DATA;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_DATA);
ret |= NFS_INO_INVALID_ATTR;
}
/* If we have atomic WCC data, we may update some attributes */
@@ -1117,7 +1130,7 @@ static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr
&& timespec_equal(&inode->i_mtime, &fattr->pre_mtime)) {
memcpy(&inode->i_mtime, &fattr->mtime, sizeof(inode->i_mtime));
if (S_ISDIR(inode->i_mode))
- nfsi->cache_validity |= NFS_INO_INVALID_DATA;
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_DATA);
ret |= NFS_INO_INVALID_ATTR;
}
if ((fattr->valid & NFS_ATTR_FATTR_PRESIZE)
@@ -1128,9 +1141,6 @@ static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr
ret |= NFS_INO_INVALID_ATTR;
}
- if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
- nfs_fscache_invalidate(inode);
-
return ret;
}
@@ -1189,7 +1199,7 @@ static int nfs_check_inode_attributes(struct inode *inode, struct nfs_fattr *fat
invalid |= NFS_INO_INVALID_ATIME;
if (invalid != 0)
- nfsi->cache_validity |= invalid;
+ nfs_set_cache_invalid(inode, invalid);
nfsi->read_cache_jiffies = fattr->time_start;
return 0;
@@ -1402,13 +1412,11 @@ EXPORT_SYMBOL_GPL(nfs_refresh_inode);
static int nfs_post_op_update_inode_locked(struct inode *inode, struct nfs_fattr *fattr)
{
- struct nfs_inode *nfsi = NFS_I(inode);
+ unsigned long invalid = NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE;
- nfsi->cache_validity |= NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE;
- if (S_ISDIR(inode->i_mode)) {
- nfsi->cache_validity |= NFS_INO_INVALID_DATA;
- nfs_fscache_invalidate(inode);
- }
+ if (S_ISDIR(inode->i_mode))
+ invalid |= NFS_INO_INVALID_DATA;
+ nfs_set_cache_invalid(inode, invalid);
if ((fattr->valid & NFS_ATTR_FATTR) == 0)
return 0;
return nfs_refresh_inode_locked(inode, fattr);
@@ -1703,10 +1711,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
invalid &= ~NFS_INO_INVALID_DATA;
if (!NFS_PROTO(inode)->have_delegation(inode, FMODE_READ) ||
(save_cache_validity & NFS_INO_REVAL_FORCED))
- nfsi->cache_validity |= invalid;
-
- if (invalid & NFS_INO_INVALID_DATA)
- nfs_fscache_invalidate(inode);
+ nfs_set_cache_invalid(inode, invalid);
return 0;
out_err:
--
1.9.3
nfs_write_pageuptodate() bypasses the cache_validity flags whenever we
have a delegation... but in order to do that we need to be sure our
cached data is correct to begin with.
---
fs/nfs/delegation.c | 1 +
fs/nfs/inode.c | 1 +
fs/nfs/nfs4proc.c | 5 +++--
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 5d8ccec..12f3eca 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -167,6 +167,7 @@ void nfs_inode_reclaim_delegation(struct inode *inode, struct rpc_cred *cred,
spin_unlock(&delegation->lock);
rcu_read_unlock();
nfs_inode_set_delegation(inode, cred, res);
+ nfs_revalidate_mapping(inode, inode->i_mapping);
}
} else {
rcu_read_unlock();
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index c496f8a..95a9d21 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1090,6 +1090,7 @@ int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping)
out:
return ret;
}
+EXPORT_SYMBOL_GPL(nfs_revalidate_mapping);
static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
{
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 285ad53..a538aac 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1361,11 +1361,12 @@ nfs4_opendata_check_deleg(struct nfs4_opendata *data, struct nfs4_state *state)
"returning a delegation for "
"OPEN(CLAIM_DELEGATE_CUR)\n",
clp->cl_hostname);
- } else if ((delegation_flags & 1UL<<NFS_DELEGATION_NEED_RECLAIM) == 0)
+ } else if ((delegation_flags & 1UL<<NFS_DELEGATION_NEED_RECLAIM) == 0) {
nfs_inode_set_delegation(state->inode,
data->owner->so_cred,
&data->o_res);
- else
+ nfs_revalidate_mapping(state->inode, state->inode->i_mapping);
+ } else
nfs_inode_reclaim_delegation(state->inode,
data->owner->so_cred,
&data->o_res);
--
1.9.3
On Fri, Jun 13, 2014 at 2:18 PM, Scott Mayhew <[email protected]> wrote:
> nfs_write_pageuptodate() bypasses the cache_validity flags whenever we
> have a delegation... but in order to do that we need to be sure our
> cached data is correct to begin with.
> ---
> fs/nfs/delegation.c | 1 +
> fs/nfs/inode.c | 1 +
> fs/nfs/nfs4proc.c | 5 +++--
> 3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
> index 5d8ccec..12f3eca 100644
> --- a/fs/nfs/delegation.c
> +++ b/fs/nfs/delegation.c
> @@ -167,6 +167,7 @@ void nfs_inode_reclaim_delegation(struct inode *inode, struct rpc_cred *cred,
> spin_unlock(&delegation->lock);
> rcu_read_unlock();
> nfs_inode_set_delegation(inode, cred, res);
> + nfs_revalidate_mapping(inode, inode->i_mapping);
If you are reclaiming a delegation after a server reboot, then nobody
is supposed to have changed the file.
> }
> } else {
> rcu_read_unlock();
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index c496f8a..95a9d21 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -1090,6 +1090,7 @@ int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping)
> out:
> return ret;
> }
> +EXPORT_SYMBOL_GPL(nfs_revalidate_mapping);
>
> static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> {
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 285ad53..a538aac 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -1361,11 +1361,12 @@ nfs4_opendata_check_deleg(struct nfs4_opendata *data, struct nfs4_state *state)
> "returning a delegation for "
> "OPEN(CLAIM_DELEGATE_CUR)\n",
> clp->cl_hostname);
> - } else if ((delegation_flags & 1UL<<NFS_DELEGATION_NEED_RECLAIM) == 0)
> + } else if ((delegation_flags & 1UL<<NFS_DELEGATION_NEED_RECLAIM) == 0) {
> nfs_inode_set_delegation(state->inode,
> data->owner->so_cred,
> &data->o_res);
> - else
> + nfs_revalidate_mapping(state->inode, state->inode->i_mapping);
> + } else
> nfs_inode_reclaim_delegation(state->inode,
> data->owner->so_cred,
> &data->o_res);
I'd really prefer to fix this in the part of the code that is actually broken.
I agree that we should ignore the NFS_INO_REVAL_PAGECACHE flag if we
have a delegation and the NFS_INO_REVAL_FORCED is unset. However is it
right to ignore NFS_INO_INVALID_DATA?
Cheers
Trond
On Fri, 20 Jun 2014, Trond Myklebust wrote:
> Hi Scott,
>
> How about the following 2 extra patches on top of yours? Do they help
> with the performance regression?
Yes, they do. Thanks!
-Scott
>
> Cheers
> Trond
>
> Scott Mayhew (1):
> nfs: Fix cache_validity check in nfs_write_pageuptodate()
>
> Trond Myklebust (2):
> NFS: Clear NFS_INO_REVAL_PAGECACHE when we update the file size
> NFS: Don't mark the data cache as invalid if it has been flushed
>
> fs/nfs/inode.c | 76 +++++++++++++++++++++++++++++++---------------------------
> fs/nfs/write.c | 4 +++-
> 2 files changed, 44 insertions(+), 36 deletions(-)
>
> --
> 1.9.3
>
From: Scott Mayhew <[email protected]>
NFS_INO_INVALID_DATA cannot be ignored, even if we have a delegation.
We're still having some problems with data corruption when multiple
clients are appending to a file and those clients are being granted
write delegations on open.
To reproduce:
Client A:
vi /mnt/`hostname -s`
while :; do echo "XXXXXXXXXXXXXXX" >>/mnt/file; sleep $(( $RANDOM % 5 )); done
Client B:
vi /mnt/`hostname -s`
while :; do echo "YYYYYYYYYYYYYYY" >>/mnt/file; sleep $(( $RANDOM % 5 )); done
What's happening is that in nfs_update_inode() we're recognizing that
the file size has changed and we're setting NFS_INO_INVALID_DATA
accordingly, but then we ignore the cache_validity flags in
nfs_write_pageuptodate() because we have a delegation. As a result,
in nfs_updatepage() we're extending the write to cover the full page
even though we've not read in the data to begin with.
Signed-off-by: Scott Mayhew <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/write.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 48cca1a65a4b..8534ee5c207d 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -928,12 +928,14 @@ static bool nfs_write_pageuptodate(struct page *page, struct inode *inode)
if (nfs_have_delegated_attributes(inode))
goto out;
- if (nfsi->cache_validity & (NFS_INO_INVALID_DATA|NFS_INO_REVAL_PAGECACHE))
+ if (nfsi->cache_validity & NFS_INO_REVAL_PAGECACHE)
return false;
smp_rmb();
if (test_bit(NFS_INO_INVALIDATING, &nfsi->flags))
return false;
out:
+ if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
+ return false;
return PageUptodate(page) != 0;
}
--
1.9.3
In nfs_update_inode(), if the change attribute is seen to change on
the server, then we set NFS_INO_REVAL_PAGECACHE in order to make
sure that we check the file size.
However, if we also update the file size in the same function, we
don't need to check it again. So make sure that we clear the
NFS_INO_REVAL_PAGECACHE that was set earlier.
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/inode.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index c496f8a74639..51dda21ebc4e 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1601,6 +1601,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
if ((nfsi->npages == 0) || new_isize > cur_isize) {
i_size_write(inode, new_isize);
invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA;
+ invalid &= ~NFS_INO_REVAL_PAGECACHE;
}
dprintk("NFS: isize change on server for file %s/%ld "
"(%Ld to %Ld)\n",
--
1.9.3
On Tue, 17 Jun 2014, Trond Myklebust wrote:
> On Tue, Jun 17, 2014 at 5:04 PM, Scott Mayhew <[email protected]> wrote:
> >
> > I did some more digging and I think I see 2 areas that could be
> > improved.
> >
> > The first would be to clear NFS_INO_INVALID_DATA if we've just
> > truncated the inode to 0 bytes -- after all, if we've just unmapped
> > all the pages from the inode's address space then isn't our data
> > consisitent?:
>
> What say we rather just don't set/clear NFS_INO_INVALID_DATA if
> mapping->nrpages == 0? That should catch the above case, plus a couple
> of other potential false positives.
>
That'd be fine too, but I'm not clear on whether I should do this check
for all files or just for regular files (I'm not sure how it would make
that big a difference for directory inodes for instance).
-Scott
Hi Trond,
On Fri, 13 Jun 2014, Trond Myklebust wrote:
> On Fri, Jun 13, 2014 at 2:18 PM, Scott Mayhew <[email protected]> wrote:
> > nfs_write_pageuptodate() bypasses the cache_validity flags whenever we
> > have a delegation... but in order to do that we need to be sure our
> > cached data is correct to begin with.
> > ---
> > fs/nfs/delegation.c | 1 +
> > fs/nfs/inode.c | 1 +
> > fs/nfs/nfs4proc.c | 5 +++--
> > 3 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
> > index 5d8ccec..12f3eca 100644
> > --- a/fs/nfs/delegation.c
> > +++ b/fs/nfs/delegation.c
> > @@ -167,6 +167,7 @@ void nfs_inode_reclaim_delegation(struct inode *inode, struct rpc_cred *cred,
> > spin_unlock(&delegation->lock);
> > rcu_read_unlock();
> > nfs_inode_set_delegation(inode, cred, res);
> > + nfs_revalidate_mapping(inode, inode->i_mapping);
>
> If you are reclaiming a delegation after a server reboot, then nobody
> is supposed to have changed the file.
Agreed.
>
> > }
> > } else {
> > rcu_read_unlock();
> > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > index c496f8a..95a9d21 100644
> > --- a/fs/nfs/inode.c
> > +++ b/fs/nfs/inode.c
> > @@ -1090,6 +1090,7 @@ int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping)
> > out:
> > return ret;
> > }
> > +EXPORT_SYMBOL_GPL(nfs_revalidate_mapping);
> >
> > static unsigned long nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> > {
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 285ad53..a538aac 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -1361,11 +1361,12 @@ nfs4_opendata_check_deleg(struct nfs4_opendata *data, struct nfs4_state *state)
> > "returning a delegation for "
> > "OPEN(CLAIM_DELEGATE_CUR)\n",
> > clp->cl_hostname);
> > - } else if ((delegation_flags & 1UL<<NFS_DELEGATION_NEED_RECLAIM) == 0)
> > + } else if ((delegation_flags & 1UL<<NFS_DELEGATION_NEED_RECLAIM) == 0) {
> > nfs_inode_set_delegation(state->inode,
> > data->owner->so_cred,
> > &data->o_res);
> > - else
> > + nfs_revalidate_mapping(state->inode, state->inode->i_mapping);
> > + } else
> > nfs_inode_reclaim_delegation(state->inode,
> > data->owner->so_cred,
> > &data->o_res);
>
> I'd really prefer to fix this in the part of the code that is actually broken.
>
> I agree that we should ignore the NFS_INO_REVAL_PAGECACHE flag if we
> have a delegation and the NFS_INO_REVAL_FORCED is unset. However is it
> right to ignore NFS_INO_INVALID_DATA?
>
No, I don't think it's right to ignore NFS_INO_INVALID_DATA, and
originally I was testing a fix similar to this:
---8<---
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 3ee5af4..98ff061 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -934,12 +934,14 @@ static bool nfs_write_pageuptodate(struct page *page, struct inode *inode)
if (nfs_have_delegated_attributes(inode))
goto out;
- if (nfsi->cache_validity & (NFS_INO_INVALID_DATA|NFS_INO_REVAL_PAGECACHE))
+ if (nfsi->cache_validity & NFS_INO_REVAL_PAGECACHE)
return false;
smp_rmb();
if (test_bit(NFS_INO_INVALIDATING, &nfsi->flags))
return false;
out:
+ if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
+ return false;
return PageUptodate(page) != 0;
}
---8<---
However,
1) it wasn't really keeping with the spirit of commit 8d197a56 (NFS:
Always trust the PageUptodate flag when we have a delegation), and
2) one of my test programs (used to test commit c7559663 (NFS: Allow
nfs_updatepage to extend a write under additional circumstances)))
started performing poorly again, doing tons of sub page-sized writes
intead of a handful of wsize'd writes.
I did some more digging and I think I see 2 areas that could be
improved.
The first would be to clear NFS_INO_INVALID_DATA if we've just
truncated the inode to 0 bytes -- after all, if we've just unmapped
all the pages from the inode's address space then isn't our data
consisitent?:
---8<---
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index c496f8a..1078d06 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -584,6 +584,11 @@ void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr)
if ((attr->ia_valid & ATTR_SIZE) != 0) {
nfs_inc_stats(inode, NFSIOS_SETATTRTRUNC);
nfs_vmtruncate(inode, attr->ia_size);
+ if (attr->ia_size == 0) {
+ spin_lock(&inode->i_lock);
+ NFS_I(inode)->cache_validity &= ~NFS_INO_INVALID_DATA;
+ spin_unlock(&inode->i_lock);
+ }
}
}
EXPORT_SYMBOL_GPL(nfs_setattr_update_inode);
---8<---
The second thing I noticed is that we're constantly invalidating our
cache due to the change attribute changing on the server. But if we
have a write delegation then the change attribute changing must be the
result of *our* changes, in which case we should be able to just silently
update the change attribute on our side without invalidating our caches:
---8<---
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 1078d06..932c999 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1568,15 +1568,17 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
/* More cache consistency checks */
if (fattr->valid & NFS_ATTR_FATTR_CHANGE) {
if (inode->i_version != fattr->change_attr) {
- dprintk("NFS: change_attr change on server for file %s/%ld\n",
+ if (!NFS_PROTO(inode)->have_delegation(inode, FMODE_WRITE)) {
+ dprintk("NFS: change_attr change on server for file %s/%ld\n",
inode->i_sb->s_id, inode->i_ino);
- invalid |= NFS_INO_INVALID_ATTR
- | NFS_INO_INVALID_DATA
- | NFS_INO_INVALID_ACCESS
- | NFS_INO_INVALID_ACL
- | NFS_INO_REVAL_PAGECACHE;
- if (S_ISDIR(inode->i_mode))
- nfs_force_lookup_revalidate(inode);
+ invalid |= NFS_INO_INVALID_ATTR
+ | NFS_INO_INVALID_DATA
+ | NFS_INO_INVALID_ACCESS
+ | NFS_INO_INVALID_ACL
+ | NFS_INO_REVAL_PAGECACHE;
+ if (S_ISDIR(inode->i_mode))
+ nfs_force_lookup_revalidate(inode);
+ }
inode->i_version = fattr->change_attr;
}
} else if (server->caps & NFS_CAP_CHANGE_ATTR)
---8<---
If you think these 3 changes look alright then I'll do some more testing
and then send the patches (but I'd rather not spend too much time
testing if you see an issue with the changes in the first place).
Thanks,
Scott
Hi Scott,
How about the following 2 extra patches on top of yours? Do they help
with the performance regression?
Cheers
Trond
Scott Mayhew (1):
nfs: Fix cache_validity check in nfs_write_pageuptodate()
Trond Myklebust (2):
NFS: Clear NFS_INO_REVAL_PAGECACHE when we update the file size
NFS: Don't mark the data cache as invalid if it has been flushed
fs/nfs/inode.c | 76 +++++++++++++++++++++++++++++++---------------------------
fs/nfs/write.c | 4 +++-
2 files changed, 44 insertions(+), 36 deletions(-)
--
1.9.3