Currently when we're reading a directory that is undergoing concurrent
modifications, we start re-reading from the beginning of the directory as soon
as we detect that a change has occurred. On a LAN connection or in a directory
with a small number of entries, the impact isn't too noticeable, but reading
a directory with a large number of entries over a WAN connection becomes very
slow.

For NFS v3, what happens is that after each on-the-wire READDIR we call
nfs_refresh_inode() and from there we get to nfs_update_inode(), where we wind
up setting NFS_INO_INVALID_DATA in the directory's cache_validity flags. Then
on a subsequent call to nfs_readdir() we call nfs_revalidate_mapping(), and
seeing that NFS_INO_INVALID_DATA is set we call nfs_invalidate_mapping(),
flushing all our cached data for the directory.

So for each nfs_readdir() call, we wind up redoing all of the on-the-wire
readdir operations just to get back where we were, and then we're able to get
just one more operation's worth of entries on top of that. If the directory on
the NFS server is constantly being modified then this winds up being a lot of
extra READDIR ops.
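
As a rough illustration of why this gets expensive, here is a toy cost model in
C. It is only a sketch under the assumptions described above: the cache is
flushed before every nfs_readdir() call, and each call makes exactly one
READDIR's worth of forward progress. The function names are made up for this
example and are not kernel code.

```c
#include <assert.h>

/*
 * Toy model, not kernel code.  "pages" is the number of on-the-wire
 * READDIR ops an undisturbed listing would need.
 */

/* READDIR ops if the cache is flushed before every nfs_readdir() call. */
unsigned long readdir_ops_restarting(unsigned long pages)
{
	unsigned long ops = 0;
	unsigned long k;

	assert(pages > 0);
	for (k = 1; k <= pages; k++)
		ops += k;	/* refetch k - 1 old pages, then fetch one new page */
	return ops;
}

/* READDIR ops if previously fetched pages stay cached between calls. */
unsigned long readdir_ops_cached(unsigned long pages)
{
	assert(pages > 0);
	return pages;
}
```

Under these assumptions a listing that needs 981 READDIR ops when undisturbed
would need 481,671 in the restarting case. That is an upper bound for this
model rather than a prediction; the count measured in the test below is lower.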
These patches change that behavior by only revalidating if we're at the
beginning of the directory or if the cached attributes for the directory have
expired.
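
The new condition can be sketched in isolation as a small predicate. This is a
userspace illustration only: struct dir_state and should_revalidate() are
hypothetical names, and the attr_cache_expired field stands in for the result
of nfs_attribute_cache_expired(inode).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state nfs_readdir() looks at. */
struct dir_state {
	long long pos;			/* corresponds to ctx->pos */
	bool attr_cache_expired;	/* stand-in for nfs_attribute_cache_expired(inode) */
};

/*
 * Revalidate only at the start of the directory or once the cached
 * attributes have expired; otherwise keep using what is cached.
 */
bool should_revalidate(const struct dir_state *d)
{
	assert(d != NULL);
	return d->pos == 0 || d->attr_cache_expired;
}
```

When the predicate is false, nfs_readdir() skips nfs_revalidate_mapping() and
carries on with the directory pages it already has cached.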

For example, in a test environment of two VMs, I have a directory of 100,000
entries that takes 981 READDIR operations to read if no modifications are being
made to the directory at the same time. If I add a 35ms delay between the client
and the server and start a script on the server that repeatedly creates and
removes a file in the directory being listed, I get the following results:

[root@localhost ~]# mount -t nfs -o nfsvers=3,nordirplus server:/export /mnt
[root@localhost ~]# time /bin/ls /mnt/bigdir >/dev/null
real 29m52.594s
user 0m0.376s
sys 0m2.191s
[root@localhost ~]# mountstats --rpc /mnt | grep -A3 READDIR
READDIR:
49729 ops (99%) 0 retrans (0%) 0 major timeouts
avg bytes sent per op: 144 avg bytes received per op: 4196
backlog wait: 0.003620 RTT: 35.889501 total execute time: 35.925858 (milliseconds)
[root@localhost ~]#

With the patched kernel, that same test yields these results:

[root@localhost ~]# time /bin/ls /mnt/bigdir >/dev/null
real 0m35.952s
user 0m0.460s
sys 0m0.100s
[root@localhost ~]# mountstats --rpc /mnt | grep -A3 READDIR
READDIR:
981 ops (98%) 0 retrans (0%) 0 major timeouts
avg bytes sent per op: 144 avg bytes received per op: 4194
backlog wait: 0.004077 RTT: 35.887870 total execute time: 35.926606 (milliseconds)
[root@localhost ~]#
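
As a sanity check on these numbers, the wall-clock time should be close to the
number of READDIR ops multiplied by the per-op execute time, assuming the ops
are issued one at a time with nothing else significant on the critical path.
readdir_seconds() below is a hypothetical helper written for this check, not
part of any tool.

```c
#include <assert.h>

/*
 * Estimated seconds spent in serialized READDIR RPCs, given the op count
 * and the "total execute time" per op (in milliseconds) from mountstats.
 */
double readdir_seconds(unsigned long ops, double execute_ms_per_op)
{
	assert(execute_ms_per_op >= 0.0);
	return (double)ops * execute_ms_per_op / 1000.0;
}
```

With the figures above this gives about 1786.6 seconds (roughly 29m47s) for the
unpatched run against a measured 29m52.594s, and about 35.2 seconds for the
patched run against a measured 0m35.952s.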
-Scott
Scott Mayhew (2):
NFS: Make nfs_attribute_cache_expired() non-static
NFS: Make nfs_readdir revalidate less often
fs/nfs/dir.c | 5 +++--
fs/nfs/inode.c | 2 +-
include/linux/nfs_fs.h | 1 +
3 files changed, 5 insertions(+), 3 deletions(-)
--
1.7.11.7
Make nfs_readdir revalidate only when we're at the beginning of the directory or
if the cached attributes have expired.
Signed-off-by: Scott Mayhew <[email protected]>
---
fs/nfs/dir.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d7ed697..e57703c 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -806,7 +806,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	nfs_readdir_descriptor_t my_desc,
 			*desc = &my_desc;
 	struct nfs_open_dir_context *dir_ctx = file->private_data;
-	int res;
+	int res = 0;
 	dfprintk(FILE, "NFS: readdir(%s/%s) starting at cookie %llu\n",
 			dentry->d_parent->d_name.name, dentry->d_name.name,
@@ -828,7 +828,8 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	desc->plus = nfs_use_readdirplus(inode, ctx) ? 1 : 0;
 	nfs_block_sillyrename(dentry);
-	res = nfs_revalidate_mapping(inode, file->f_mapping);
+	if (ctx->pos == 0 || nfs_attribute_cache_expired(inode))
+		res = nfs_revalidate_mapping(inode, file->f_mapping);
 	if (res < 0)
 		goto out;
--
1.7.11.7
NFS: Make nfs_attribute_cache_expired() non-static so we can call it from
nfs_readdir().
Signed-off-by: Scott Mayhew <[email protected]>
---
fs/nfs/inode.c | 2 +-
include/linux/nfs_fs.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index ce72704..b4e3432 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -847,7 +847,7 @@ int nfs_attribute_timeout(struct inode *inode)
 	return !time_in_range_open(jiffies, nfsi->read_cache_jiffies, nfsi->read_cache_jiffies + nfsi->attrtimeo);
 }
-static int nfs_attribute_cache_expired(struct inode *inode)
+int nfs_attribute_cache_expired(struct inode *inode)
 {
 	if (nfs_have_delegated_attributes(inode))
 		return 0;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index fc01d5c..990d6e3 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -347,6 +347,7 @@ extern int nfs_permission(struct inode *, int);
extern int nfs_open(struct inode *, struct file *);
extern int nfs_release(struct inode *, struct file *);
extern int nfs_attribute_timeout(struct inode *inode);
+extern int nfs_attribute_cache_expired(struct inode *inode);
extern int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode);
extern int __nfs_revalidate_inode(struct nfs_server *, struct inode *);
extern int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping);
--
1.7.11.7