2015-07-30 13:52:30

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 0/9] nfsd: open file caching for v2/3

Hi Bruce!

This patchset adds a new open file cache for knfsd. As you well know,
nfsd basically does an open() - read/write() - close() cycle for every
nfsv3 READ or WRITE. It's also common for clients to "spray" several
read and write requests in parallel or in quick succession, so we could
skip a lot of that by simply caching these open filps.

The idea here is to cache them in a hashtable for a little while (1s by
default) in the expectation that clients may try to issue more reads or
writes in quick succession. When there are any entries in the hashtable,
there is a recurring workqueue job that will clean the cache.

I've also added some hooks into sunrpc cache code that should allow us
to purge the cache on an unexport event, so this shouldn't cause any
problems with unmounting once you've unexported the fs.

I did a little testing with it, but my test rig is pretty slow, and I
couldn't measure much of a performance difference on a bog standard
local fs.

We do have some patches that allow the reexporting of NFSv4.1 via knfsd.
Since NFS has a relatively slow open routine, this provides a rather
large speedup.

Without these patches:
$ dd if=/dev/urandom of=/mnt/dp01/ddfile bs=4k count=256 oflag=direct
256+0 records in
256+0 records out
1048576 bytes (1.0 MB) copied, 54.3109 s, 19.3 kB/s

With these patches:
$ dd if=/dev/urandom of=/mnt/dp01/ddfile bs=4k count=256 oflag=direct
256+0 records in
256+0 records out
1048576 bytes (1.0 MB) copied, 1.05437 s, 995 kB/s

It should also be possible to hook this code up to the nfs4_file too,
but I haven't done that in this set. I'd like to get this in and settled
before we start looking at that, since it'll mean a bit of reengineering
of the NFSv4 code not to pass around struct file pointers.

I'd like to have these considered for the v4.3 merge window if they look
reasonable.

Jeff Layton (9):
nfsd: include linux/nfs4.h in export.h
nfsd: move some file caching helpers to a common header
nfsd: convert laundry_wq to something less nfsd4 specific
nfsd: add a new struct file caching facility to nfsd
nfsd: hook up nfsd_write to the new nfsd_file cache
nfsd: hook up nfsd_read to the nfsd_file cache
sunrpc: add a new cache_detail operation for when a cache is flushed
nfsd: add a ->flush routine to svc_export_cache
nfsd: allow the file cache expire time to be tunable

fs/nfsd/Makefile | 3 +-
fs/nfsd/export.c | 14 ++
fs/nfsd/export.h | 1 +
fs/nfsd/filecache.c | 348 +++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/filecache.h | 47 ++++++
fs/nfsd/nfs3proc.c | 2 +-
fs/nfsd/nfs4state.c | 40 +----
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsproc.c | 2 +-
fs/nfsd/nfssvc.c | 22 ++-
fs/nfsd/vfs.c | 56 +++----
fs/nfsd/vfs.h | 2 +-
include/linux/sunrpc/cache.h | 1 +
net/sunrpc/cache.c | 3 +
14 files changed, 467 insertions(+), 75 deletions(-)
create mode 100644 fs/nfsd/filecache.c
create mode 100644 fs/nfsd/filecache.h

--
2.4.3



2015-07-30 13:52:32

by Jeff Layton

[permalink] [raw]
Subject: [PATCH] nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid

Currently, preprocess_stateid_op calls nfs4_check_olstateid which
verifies that the open stateid corresponds to the current_fh in the
call by calling nfs4_check_fh.

If the stateid is a NFS4_DELEG_STID however, then no such check is
done. Move the call to nfs4_check_fh into nfs4_check_file instead
so that it can be done for all stateid types.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index cd8c33186e26..75f617a052cf 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4406,9 +4406,9 @@ laundromat_main(struct work_struct *laundry)
queue_delayed_work(laundry_wq, &nn->laundromat_work, t*HZ);
}

-static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_ol_stateid *stp)
+static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)
{
- if (!fh_match(&fhp->fh_handle, &stp->st_stid.sc_file->fi_fhandle))
+ if (!fh_match(&fhp->fh_handle, &stp->sc_file->fi_fhandle))
return nfserr_bad_stateid;
return nfs_ok;
}
@@ -4611,9 +4611,6 @@ nfs4_check_olstateid(struct svc_fh *fhp, struct nfs4_ol_stateid *ols, int flags)
{
__be32 status;

- status = nfs4_check_fh(fhp, ols);
- if (status)
- return status;
status = nfsd4_check_openowner_confirmed(ols);
if (status)
return status;
@@ -4628,6 +4625,10 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
struct file *file;
__be32 status;

+ status = nfs4_check_fh(fhp, s);
+ if (status)
+ return status;
+
file = nfs4_find_file(s, flags);
if (file) {
status = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
@@ -4808,7 +4809,7 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_
status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate));
if (status)
return status;
- return nfs4_check_fh(current_fh, stp);
+ return nfs4_check_fh(current_fh, &stp->st_stid);
}

/*
--
2.4.3


2015-07-30 13:52:33

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 1/9] nfsd: include linux/nfs4.h in export.h

export.h refers to the pnfs_layouttype enum, which is defined there.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/export.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
index 1f52bfcc436f..2e315072bf3f 100644
--- a/fs/nfsd/export.h
+++ b/fs/nfsd/export.h
@@ -6,6 +6,7 @@

#include <linux/sunrpc/cache.h>
#include <uapi/linux/nfsd/export.h>
+#include <linux/nfs4.h>

struct knfsd_fh;
struct svc_fh;
--
2.4.3


2015-07-30 13:52:39

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 7/9] sunrpc: add a new cache_detail operation for when a cache is flushed

When the exports table is changed, exportfs will usually write a new
time to the "flush" file in the nfsd.export cache procfile. This tells
the kernel to flush any entries that are older than that value.

This gives us a mechanism to tell whether an unexport might have
occurred. Add a new ->flush cache_detail operation that is called after
flushing the cache whenever someone writes to a "flush" file.

Signed-off-by: Jeff Layton <[email protected]>
---
include/linux/sunrpc/cache.h | 1 +
net/sunrpc/cache.c | 3 +++
2 files changed, 4 insertions(+)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 437ddb6c4aef..29033421aa0c 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -98,6 +98,7 @@ struct cache_detail {
int has_died);

struct cache_head * (*alloc)(void);
+ void (*flush)(void);
int (*match)(struct cache_head *orig, struct cache_head *new);
void (*init)(struct cache_head *orig, struct cache_head *new);
void (*update)(struct cache_head *orig, struct cache_head *new);
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 2928afffbb81..19dc71ccc706 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1452,6 +1452,9 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
cd->nextcheck = seconds_since_boot();
cache_flush();

+ if (cd->flush)
+ cd->flush();
+
*ppos += count;
return count;
}
--
2.4.3


2015-07-30 13:52:40

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 8/9] nfsd: add a ->flush routine to svc_export_cache

...to purge the nfsd_file table. This should help ensure that
filesystems are unmountable very soon after unexporting them.

Note that we take the nfsd_mutex in this code to ensure that we can't
race with a concurrent shutdown of nfsd that might destroy the cache.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/export.c | 14 ++++++++++++++
fs/nfsd/filecache.c | 39 +++++++++++++++++++++++++--------------
fs/nfsd/filecache.h | 1 +
3 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index f79521a59747..be80ff370072 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -21,6 +21,7 @@
#include "nfsfh.h"
#include "netns.h"
#include "pnfs.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_EXPORT

@@ -231,6 +232,18 @@ static struct cache_head *expkey_alloc(void)
return NULL;
}

+static void
+expkey_flush(void)
+{
+ /*
+ * Take the nfsd_mutex here to ensure that the file cache is not
+ * destroyed while we're in the middle of flushing.
+ */
+ mutex_lock(&nfsd_mutex);
+ nfsd_file_cache_purge();
+ mutex_unlock(&nfsd_mutex);
+}
+
static struct cache_detail svc_expkey_cache_template = {
.owner = THIS_MODULE,
.hash_size = EXPKEY_HASHMAX,
@@ -243,6 +256,7 @@ static struct cache_detail svc_expkey_cache_template = {
.init = expkey_init,
.update = expkey_update,
.alloc = expkey_alloc,
+ .flush = expkey_flush,
};

static int
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 564b4eec0350..55930995a1ec 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -124,6 +124,30 @@ nfsd_file_dispose_list(struct list_head *dispose)
}
}

+void
+nfsd_file_cache_purge(void)
+{
+ unsigned int i;
+ struct nfsd_file *nf;
+ LIST_HEAD(dispose);
+
+ if (!atomic_read(&nfsd_file_count))
+ return;
+
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ spin_lock(&nfsd_file_hashtbl[i].nfb_lock);
+ while(!hlist_empty(&nfsd_file_hashtbl[i].nfb_head)) {
+ nf = hlist_entry(nfsd_file_hashtbl[i].nfb_head.first,
+ struct nfsd_file, nf_node);
+ nfsd_file_unhash(nf);
+ /* put the hash reference */
+ nfsd_file_put_locked(nf, &dispose);
+ }
+ spin_unlock(&nfsd_file_hashtbl[i].nfb_lock);
+ nfsd_file_dispose_list(&dispose);
+ }
+}
+
static void
nfsd_file_cache_prune(void)
{
@@ -200,23 +224,10 @@ out_nomem:
void
nfsd_file_cache_shutdown(void)
{
- unsigned int i;
- struct nfsd_file *nf;
LIST_HEAD(dispose);

cancel_delayed_work_sync(&nfsd_file_cache_clean_work);
- for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
- spin_lock(&nfsd_file_hashtbl[i].nfb_lock);
- while(!hlist_empty(&nfsd_file_hashtbl[i].nfb_head)) {
- nf = hlist_entry(nfsd_file_hashtbl[i].nfb_head.first,
- struct nfsd_file, nf_node);
- nfsd_file_unhash(nf);
- /* put the hash reference */
- nfsd_file_put_locked(nf, &dispose);
- }
- spin_unlock(&nfsd_file_hashtbl[i].nfb_lock);
- nfsd_file_dispose_list(&dispose);
- }
+ nfsd_file_cache_purge();
kfree(nfsd_file_hashtbl);
nfsd_file_hashtbl = NULL;
}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index adf7e78b8e43..98976f71caa8 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -39,6 +39,7 @@ struct nfsd_file {
};

int nfsd_file_cache_init(void);
+void nfsd_file_cache_purge(void);
void nfsd_file_cache_shutdown(void);
void nfsd_file_put(struct nfsd_file *nf);
__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
--
2.4.3


2015-07-30 13:52:34

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 2/9] nfsd: move some file caching helpers to a common header

We'll want to reuse some of this for common open file caching
infrastructure.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/filecache.h | 25 +++++++++++++++++++++++++
fs/nfsd/nfs4state.c | 17 ++---------------
2 files changed, 27 insertions(+), 15 deletions(-)
create mode 100644 fs/nfsd/filecache.h

diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
new file mode 100644
index 000000000000..9051ee54faa3
--- /dev/null
+++ b/fs/nfsd/filecache.h
@@ -0,0 +1,25 @@
+#ifndef _FS_NFSD_FILECACHE_H
+#define _FS_NFSD_FILECACHE_H
+
+#include <linux/jhash.h>
+#include <linux/sunrpc/xdr.h>
+
+#include "export.h"
+
+/* hash table for nfs4_file */
+#define NFSD_FILE_HASH_BITS 8
+#define NFSD_FILE_HASH_SIZE (1 << NFSD_FILE_HASH_BITS)
+
+static inline unsigned int
+nfsd_fh_hashval(struct knfsd_fh *fh)
+{
+ return jhash2(fh->fh_base.fh_pad, XDR_QUADLEN(fh->fh_size), 0);
+}
+
+static inline unsigned int
+file_hashval(struct knfsd_fh *fh)
+{
+ return nfsd_fh_hashval(fh) & (NFSD_FILE_HASH_SIZE - 1);
+}
+
+#endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 75f617a052cf..79795c898dd1 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -49,6 +49,7 @@

#include "netns.h"
#include "pnfs.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_PROC

@@ -381,21 +382,7 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername)
return ret & OWNER_HASH_MASK;
}

-/* hash table for nfs4_file */
-#define FILE_HASH_BITS 8
-#define FILE_HASH_SIZE (1 << FILE_HASH_BITS)
-
-static unsigned int nfsd_fh_hashval(struct knfsd_fh *fh)
-{
- return jhash2(fh->fh_base.fh_pad, XDR_QUADLEN(fh->fh_size), 0);
-}
-
-static unsigned int file_hashval(struct knfsd_fh *fh)
-{
- return nfsd_fh_hashval(fh) & (FILE_HASH_SIZE - 1);
-}
-
-static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
+static struct hlist_head file_hashtbl[NFSD_FILE_HASH_SIZE];

static void
__nfs4_file_get_access(struct nfs4_file *fp, u32 access)
--
2.4.3


2015-07-30 13:52:41

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 9/9] nfsd: allow the file cache expire time to be tunable

This is primarily for testing purposes, but this may also be useful
until we have a better idea for a sensible default here.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/filecache.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 55930995a1ec..0f0272cc3683 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -8,6 +8,7 @@
#include <linux/hash.h>
#include <linux/file.h>
#include <linux/sched.h>
+#include <linux/module.h>

#include "vfs.h"
#include "nfsd.h"
@@ -16,8 +17,10 @@

#define NFSDDBG_FACILITY NFSDDBG_VFS

-/* Min time we should keep around a file cache entry */
-#define NFSD_FILE_EXPIRE (HZ)
+/* Min time we should keep around a file cache entry (in jiffies) */
+static unsigned int nfsd_file_cache_expiry = HZ;
+module_param(nfsd_file_cache_expiry, uint, 0644);
+MODULE_PARM_DESC(nfsd_file_cache_expiry, "Expire time for open file cache (in jiffies)");

/* We only care about NFSD_MAY_READ/WRITE for this cache */
#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
@@ -40,7 +43,7 @@ nfsd_file_count_inc(void)
{
if (atomic_inc_return(&nfsd_file_count) == 1)
queue_delayed_work(nfsd_laundry_wq, &nfsd_file_cache_clean_work,
- NFSD_FILE_EXPIRE);
+ nfsd_file_cache_expiry);
}

static void
@@ -169,7 +172,8 @@ nfsd_file_cache_prune(void)
continue;

/* Was this file touched recently? */
- if (time_before(nf->nf_time + NFSD_FILE_EXPIRE, jiffies))
+ if (time_before(nf->nf_time + nfsd_file_cache_expiry,
+ jiffies))
continue;

/* Ok, it's expired...unhash it */
@@ -193,7 +197,7 @@ nfsd_file_cache_cleaner(struct work_struct *work)

if (atomic_read(&nfsd_file_count))
queue_delayed_work(nfsd_laundry_wq, &nfsd_file_cache_clean_work,
- NFSD_FILE_EXPIRE);
+ nfsd_file_cache_expiry);
}

int
--
2.4.3


2015-07-30 13:52:35

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 3/9] nfsd: convert laundry_wq to something less nfsd4 specific

Currently, nfsd uses a singlethread workqueue for this, but that's
probably not necessary. If we have multiple namespaces, then there's no
need to serialize the laundromat runs. They can run in parallel.

Also, create_singlethread_workqueue adds the WQ_MEM_RECLAIM flag, which
doesn't really seem to be necessary. The laundromat jobs are always
kicked off via a timer, and not from memory reclaim paths. There's no
need for a rescuer thread.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 23 +++++------------------
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfssvc.c | 14 +++++++++++++-
3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 79795c898dd1..977a8aee9122 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4376,7 +4376,6 @@ nfs4_laundromat(struct nfsd_net *nn)
return new_timeo;
}

-static struct workqueue_struct *laundry_wq;
static void laundromat_main(struct work_struct *);

static void
@@ -4390,7 +4389,7 @@ laundromat_main(struct work_struct *laundry)

t = nfs4_laundromat(nn);
dprintk("NFSD: laundromat_main - sleeping for %ld seconds\n", t);
- queue_delayed_work(laundry_wq, &nn->laundromat_work, t*HZ);
+ queue_delayed_work(nfsd_laundry_wq, &nn->laundromat_work, t*HZ);
}

static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)
@@ -6587,7 +6586,8 @@ nfs4_state_start_net(struct net *net)
nfsd4_client_tracking_init(net);
printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
nn->nfsd4_grace, net);
- queue_delayed_work(laundry_wq, &nn->laundromat_work, nn->nfsd4_grace * HZ);
+ queue_delayed_work(nfsd_laundry_wq, &nn->laundromat_work,
+ nn->nfsd4_grace * HZ);
return 0;
}

@@ -6601,22 +6601,10 @@ nfs4_state_start(void)
ret = set_callback_cred();
if (ret)
return -ENOMEM;
- laundry_wq = create_singlethread_workqueue("nfsd4");
- if (laundry_wq == NULL) {
- ret = -ENOMEM;
- goto out_recovery;
- }
ret = nfsd4_create_callback_queue();
- if (ret)
- goto out_free_laundry;
-
- set_max_delegations();
-
- return 0;
+ if (!ret)
+ set_max_delegations();

-out_free_laundry:
- destroy_workqueue(laundry_wq);
-out_recovery:
return ret;
}

@@ -6653,7 +6641,6 @@ nfs4_state_shutdown_net(struct net *net)
void
nfs4_state_shutdown(void)
{
- destroy_workqueue(laundry_wq);
nfsd4_destroy_callback_queue();
}

diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index cf980523898b..0199415344ff 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -62,6 +62,7 @@ struct readdir_cd {
extern struct svc_program nfsd_program;
extern struct svc_version nfsd_version2, nfsd_version3,
nfsd_version4;
+extern struct workqueue_struct *nfsd_laundry_wq;
extern struct mutex nfsd_mutex;
extern spinlock_t nfsd_drc_lock;
extern unsigned long nfsd_drc_max_mem;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index ad4e2377dd63..ced9944201a0 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -17,6 +17,7 @@
#include <linux/lockd/bind.h>
#include <linux/nfsacl.h>
#include <linux/seq_file.h>
+#include <linux/workqueue.h>
#include <net/net_namespace.h>
#include "nfsd.h"
#include "cache.h"
@@ -28,6 +29,9 @@
extern struct svc_program nfsd_program;
static int nfsd(void *vrqstp);

+/* A workqueue for nfsd-related cleanup tasks */
+struct workqueue_struct *nfsd_laundry_wq;
+
/*
* nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members
* of the svc_serv struct. In particular, ->sv_nrthreads but also to some
@@ -224,11 +228,19 @@ static int nfsd_startup_generic(int nrservs)
if (ret)
goto dec_users;

+ ret = -ENOMEM;
+ nfsd_laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd-laundry");
+ if (!nfsd_laundry_wq)
+ goto out_racache;
+
ret = nfs4_state_start();
if (ret)
- goto out_racache;
+ goto out_wq;
return 0;

+out_wq:
+ destroy_workqueue(nfsd_laundry_wq);
+ nfsd_laundry_wq = NULL;
out_racache:
nfsd_racache_shutdown();
dec_users:
--
2.4.3


2015-07-30 13:52:36

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 4/9] nfsd: add a new struct file caching facility to nfsd

Currently, NFSv2/3 reads and writes have to open a file, do the read or
write and then close it again for each RPC. This is highly inefficient,
especially when the underlying filesystem has a relatively slow open
routine.

This patch adds a new open file cache to knfsd. Rather than doing an
open for each RPC, the read/write handlers can call into this cache to
see if there is one already there for the correct filehandle and
NFS_MAY_READ/WRITE flags.

If there isn't an entry, then we create a new one and attempt to
perform the open. If there is, then we wait until the entry is fully
instantiated and return it if it is at the end of the wait. If it's
not, then we attempt to take over construction.

Since the main goal is to speed up NFSv2/3 I/O, we don't want to
close these files on last put of these objects. We need to keep them
around for a little while since we never know when the next READ/WRITE
will come in.

Cache entries have a hardcoded 1s timeout, and we have a recurring
workqueue job that walks the cache and purges any entries that have
expired.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/Makefile | 3 +-
fs/nfsd/filecache.c | 333 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/filecache.h | 21 ++++
fs/nfsd/nfssvc.c | 10 +-
4 files changed, 365 insertions(+), 2 deletions(-)
create mode 100644 fs/nfsd/filecache.c

diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index 9a6028e120c6..8908bb467727 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -10,7 +10,8 @@ obj-$(CONFIG_NFSD) += nfsd.o
nfsd-y += trace.o

nfsd-y += nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \
- export.o auth.o lockd.o nfscache.o nfsxdr.o stats.o
+ export.o auth.o lockd.o nfscache.o nfsxdr.o \
+ stats.o filecache.o
nfsd-$(CONFIG_NFSD_FAULT_INJECTION) += fault_inject.o
nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
nfsd-$(CONFIG_NFSD_V3) += nfs3proc.o nfs3xdr.o
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
new file mode 100644
index 000000000000..564b4eec0350
--- /dev/null
+++ b/fs/nfsd/filecache.c
@@ -0,0 +1,333 @@
+/*
+ * Open file cache.
+ *
+ * (c) 2015 - Jeff Layton <[email protected]>
+ */
+
+#include <linux/slab.h>
+#include <linux/hash.h>
+#include <linux/file.h>
+#include <linux/sched.h>
+
+#include "vfs.h"
+#include "nfsd.h"
+#include "nfsfh.h"
+#include "filecache.h"
+
+#define NFSDDBG_FACILITY NFSDDBG_VFS
+
+/* Min time we should keep around a file cache entry */
+#define NFSD_FILE_EXPIRE (HZ)
+
+/* We only care about NFSD_MAY_READ/WRITE for this cache */
+#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
+
+struct nfsd_fcache_bucket {
+ struct hlist_head nfb_head;
+ spinlock_t nfb_lock;
+};
+
+static struct nfsd_fcache_bucket *nfsd_file_hashtbl;
+
+/* Count of hashed nfsd_file objects */
+static atomic_t nfsd_file_count;
+
+/* Periodic job for cleaning nfsd_file cache */
+static struct delayed_work nfsd_file_cache_clean_work;
+
+static void
+nfsd_file_count_inc(void)
+{
+ if (atomic_inc_return(&nfsd_file_count) == 1)
+ queue_delayed_work(nfsd_laundry_wq, &nfsd_file_cache_clean_work,
+ NFSD_FILE_EXPIRE);
+}
+
+static void
+nfsd_file_count_dec(void)
+{
+ if (atomic_dec_and_test(&nfsd_file_count))
+ cancel_delayed_work(&nfsd_file_cache_clean_work);
+}
+
+static struct nfsd_file *
+nfsd_file_alloc(struct knfsd_fh *fh, unsigned int may, unsigned int hashval)
+{
+ struct nfsd_file *nf;
+
+ /* FIXME: create a new slabcache for these? */
+ nf = kzalloc(sizeof(*nf), GFP_KERNEL);
+ if (nf) {
+ INIT_HLIST_NODE(&nf->nf_node);
+ INIT_LIST_HEAD(&nf->nf_dispose);
+ nf->nf_time = jiffies;
+ fh_copy_shallow(&nf->nf_handle, fh);
+ nf->nf_hashval = hashval;
+ atomic_set(&nf->nf_ref, 1);
+ nf->nf_may = NFSD_FILE_MAY_MASK & may;
+ }
+ return nf;
+}
+
+static void
+nfsd_file_put_final(struct nfsd_file *nf)
+{
+ if (nf->nf_file)
+ fput(nf->nf_file);
+ kfree_rcu(nf, nf_rcu);
+}
+
+static void
+nfsd_file_unhash(struct nfsd_file *nf)
+{
+ if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+ hlist_del_rcu(&nf->nf_node);
+ nfsd_file_count_dec();
+ }
+}
+
+static void
+nfsd_file_put_locked(struct nfsd_file *nf, struct list_head *dispose)
+{
+ if (!atomic_dec_and_test(&nf->nf_ref)) {
+ nf->nf_time = jiffies;
+ return;
+ }
+
+ nfsd_file_unhash(nf);
+ list_add(&nf->nf_dispose, dispose);
+}
+
+void
+nfsd_file_put(struct nfsd_file *nf)
+{
+ if (!atomic_dec_and_lock(&nf->nf_ref,
+ &nfsd_file_hashtbl[nf->nf_hashval].nfb_lock)) {
+ nf->nf_time = jiffies;
+ return;
+ }
+
+ nfsd_file_unhash(nf);
+ spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+ nfsd_file_put_final(nf);
+}
+
+static void
+nfsd_file_dispose_list(struct list_head *dispose)
+{
+ struct nfsd_file *nf;
+
+ while(!list_empty(dispose)) {
+ nf = list_first_entry(dispose, struct nfsd_file, nf_dispose);
+ list_del(&nf->nf_dispose);
+ nfsd_file_put_final(nf);
+ }
+}
+
+static void
+nfsd_file_cache_prune(void)
+{
+ unsigned int i;
+ struct nfsd_file *nf;
+ struct hlist_node *tmp;
+ LIST_HEAD(dispose);
+
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ if (hlist_empty(&nfsd_file_hashtbl[i].nfb_head))
+ continue;
+
+ spin_lock(&nfsd_file_hashtbl[i].nfb_lock);
+ hlist_for_each_entry_safe(nf, tmp,
+ &nfsd_file_hashtbl[i].nfb_head, nf_node) {
+
+ /* does someone else have a reference? */
+ if (atomic_read(&nf->nf_ref) > 1)
+ continue;
+
+ /* Was this file touched recently? */
+ if (time_before(nf->nf_time + NFSD_FILE_EXPIRE, jiffies))
+ continue;
+
+ /* Ok, it's expired...unhash it */
+ nfsd_file_unhash(nf);
+
+ /* ...and put the hash reference */
+ nfsd_file_put_locked(nf, &dispose);
+ }
+ spin_unlock(&nfsd_file_hashtbl[i].nfb_lock);
+ nfsd_file_dispose_list(&dispose);
+ }
+}
+
+static void
+nfsd_file_cache_cleaner(struct work_struct *work)
+{
+ if (!atomic_read(&nfsd_file_count))
+ return;
+
+ nfsd_file_cache_prune();
+
+ if (atomic_read(&nfsd_file_count))
+ queue_delayed_work(nfsd_laundry_wq, &nfsd_file_cache_clean_work,
+ NFSD_FILE_EXPIRE);
+}
+
+int
+nfsd_file_cache_init(void)
+{
+ unsigned int i;
+
+ if (nfsd_file_hashtbl)
+ return 0;
+
+ nfsd_file_hashtbl = kcalloc(NFSD_FILE_HASH_SIZE,
+ sizeof(*nfsd_file_hashtbl), GFP_KERNEL);
+ if (!nfsd_file_hashtbl)
+ goto out_nomem;
+
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ INIT_HLIST_HEAD(&nfsd_file_hashtbl[i].nfb_head);
+ spin_lock_init(&nfsd_file_hashtbl[i].nfb_lock);
+ }
+
+ INIT_DELAYED_WORK(&nfsd_file_cache_clean_work, nfsd_file_cache_cleaner);
+ return 0;
+out_nomem:
+ printk(KERN_ERR "nfsd: failed to init nfsd file cache\n");
+ return -ENOMEM;
+}
+
+void
+nfsd_file_cache_shutdown(void)
+{
+ unsigned int i;
+ struct nfsd_file *nf;
+ LIST_HEAD(dispose);
+
+ cancel_delayed_work_sync(&nfsd_file_cache_clean_work);
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ spin_lock(&nfsd_file_hashtbl[i].nfb_lock);
+ while(!hlist_empty(&nfsd_file_hashtbl[i].nfb_head)) {
+ nf = hlist_entry(nfsd_file_hashtbl[i].nfb_head.first,
+ struct nfsd_file, nf_node);
+ nfsd_file_unhash(nf);
+ /* put the hash reference */
+ nfsd_file_put_locked(nf, &dispose);
+ }
+ spin_unlock(&nfsd_file_hashtbl[i].nfb_lock);
+ nfsd_file_dispose_list(&dispose);
+ }
+ kfree(nfsd_file_hashtbl);
+ nfsd_file_hashtbl = NULL;
+}
+
+/*
+ * Search nfsd_file_hashtbl[] for file. We hash on the filehandle and also on
+ * the NFSD_MAY_READ/WRITE flags. If the file is open for r/w, then it's usable
+ * for either.
+ */
+static struct nfsd_file *
+nfsd_file_find_locked(struct knfsd_fh *fh, unsigned int may_flags,
+ unsigned int hashval)
+{
+ struct nfsd_file *nf;
+ unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
+
+ hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head,
+ nf_node) {
+ if ((need & nf->nf_may) != need)
+ continue;
+ if (fh_match(&nf->nf_handle, fh)) {
+ if (atomic_inc_not_zero(&nf->nf_ref))
+ return nf;
+ }
+ }
+ return NULL;
+}
+
+__be32
+nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ unsigned int may_flags, struct nfsd_file **pnf)
+{
+ __be32 status = nfs_ok;
+ struct nfsd_file *nf, *new = NULL;
+ struct knfsd_fh *fh = &fhp->fh_handle;
+ unsigned int hashval = file_hashval(fh);
+
+ /* Mask off any extraneous bits */
+ may_flags &= NFSD_FILE_MAY_MASK;
+retry:
+ rcu_read_lock();
+ nf = nfsd_file_find_locked(fh, may_flags, hashval);
+ rcu_read_unlock();
+ if (nf)
+ goto wait_for_construction;
+
+ if (!new) {
+ new = nfsd_file_alloc(&fhp->fh_handle, may_flags, hashval);
+ if (!new)
+ return nfserr_jukebox;
+ }
+
+ spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ nf = nfsd_file_find_locked(fh, may_flags, hashval);
+ if (likely(nf == NULL)) {
+ /* Take reference for the hashtable */
+ atomic_inc(&new->nf_ref);
+ __set_bit(NFSD_FILE_HASHED, &new->nf_flags);
+ __set_bit(NFSD_FILE_PENDING, &new->nf_flags);
+ hlist_add_head_rcu(&new->nf_node,
+ &nfsd_file_hashtbl[hashval].nfb_head);
+ spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ nfsd_file_count_inc();
+ nf = new;
+ new = NULL;
+ goto open_file;
+ }
+ spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+
+wait_for_construction:
+ wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE);
+
+ /* Did construction of this file fail? */
+ if (!nf->nf_file) {
+ /*
+ * We can only take over construction for this nfsd_file if the
+ * MAY flags are equal. Otherwise, we put the reference and try
+ * again.
+ */
+ if (may_flags != nf->nf_may) {
+ nfsd_file_put(nf);
+ goto retry;
+ }
+
+ /* try to take over construction for this file */
+ if (test_and_set_bit(NFSD_FILE_PENDING, &nf->nf_flags))
+ goto wait_for_construction;
+ goto open_file;
+ }
+
+ /*
+ * We have a file that was opened in the context of another rqst. We
+ * must check permissions. Since we're dealing with open files here,
+ * we always want to set the OWNER_OVERRIDE bit.
+ */
+ status = fh_verify(rqstp, fhp, S_IFREG, may_flags);
+ if (status == nfs_ok)
+ status = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
+ may_flags|NFSD_MAY_OWNER_OVERRIDE);
+out:
+ if (status == nfs_ok)
+ *pnf = nf;
+ else
+ nfsd_file_put(nf);
+
+ if (new)
+ nfsd_file_put(new);
+ return status;
+open_file:
+ status = nfsd_open(rqstp, fhp, S_IFREG, may_flags, &nf->nf_file);
+ clear_bit(NFSD_FILE_PENDING, &nf->nf_flags);
+ wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
+ goto out;
+}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 9051ee54faa3..adf7e78b8e43 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -4,6 +4,7 @@
#include <linux/jhash.h>
#include <linux/sunrpc/xdr.h>

+#include "nfsfh.h"
#include "export.h"

/* hash table for nfs4_file */
@@ -22,4 +23,24 @@ file_hashval(struct knfsd_fh *fh)
return nfsd_fh_hashval(fh) & (NFSD_FILE_HASH_SIZE - 1);
}

+struct nfsd_file {
+ struct hlist_node nf_node;
+ struct list_head nf_dispose;
+ struct rcu_head nf_rcu;
+ struct file *nf_file;
+ unsigned long nf_time;
+#define NFSD_FILE_HASHED (0)
+#define NFSD_FILE_PENDING (1)
+ unsigned long nf_flags;
+ struct knfsd_fh nf_handle;
+ unsigned int nf_hashval;
+ atomic_t nf_ref;
+ unsigned char nf_may;
+};
+
+int nfsd_file_cache_init(void);
+void nfsd_file_cache_shutdown(void);
+void nfsd_file_put(struct nfsd_file *nf);
+__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ unsigned int may_flags, struct nfsd_file **nfp);
#endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index ced9944201a0..0572441e23ec 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -23,6 +23,7 @@
#include "cache.h"
#include "vfs.h"
#include "netns.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_SVC

@@ -233,11 +234,17 @@ static int nfsd_startup_generic(int nrservs)
if (!nfsd_laundry_wq)
goto out_racache;

- ret = nfs4_state_start();
+ ret = nfsd_file_cache_init();
if (ret)
goto out_wq;
+
+ ret = nfs4_state_start();
+ if (ret)
+ goto out_nfsd_file;
return 0;

+out_nfsd_file:
+ nfsd_file_cache_shutdown();
out_wq:
destroy_workqueue(nfsd_laundry_wq);
nfsd_laundry_wq = NULL;
@@ -254,6 +261,7 @@ static void nfsd_shutdown_generic(void)
return;

nfs4_state_shutdown();
+ nfsd_file_cache_shutdown();
nfsd_racache_shutdown();
}

--
2.4.3


2015-07-30 13:52:38

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 6/9] nfsd: hook up nfsd_read to the nfsd_file cache

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/vfs.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 59234d1d8d8e..fd688c86af66 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -980,20 +980,14 @@ out_nfserr:
__be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
{
- struct file *file;
- struct raparms *ra;
- __be32 err;
-
- err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
- if (err)
- return err;
-
- ra = nfsd_init_raparms(file);
- err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
- if (ra)
- nfsd_put_raparams(file, ra);
- fput(file);
+ __be32 err;
+ struct nfsd_file *nf;

+ err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
+ if (err == nfs_ok)
+ err = nfsd_vfs_read(rqstp, nf->nf_file, offset, vec, vlen,
+ count);
+ nfsd_file_put(nf);
return err;
}

--
2.4.3


2015-07-30 13:52:37

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 5/9] nfsd: hook up nfsd_write to the new nfsd_file cache

Note that all callers currently pass in NULL for "file" anyway, so
there was already some dead code in here. Just eliminate that parm
and have it use the file cache instead of dealing directly with a
filp.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs3proc.c | 2 +-
fs/nfsd/nfsproc.c | 2 +-
fs/nfsd/vfs.c | 34 +++++++++++-----------------------
fs/nfsd/vfs.h | 2 +-
4 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 7b755b7f785c..4e46ac511479 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -192,7 +192,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp, struct nfsd3_writeargs *argp,

fh_copy(&resp->fh, &argp->fh);
resp->committed = argp->stable;
- nfserr = nfsd_write(rqstp, &resp->fh, NULL,
+ nfserr = nfsd_write(rqstp, &resp->fh,
argp->offset,
rqstp->rq_vec, argp->vlen,
&cnt,
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 4cd78ef4c95c..9893095cbee1 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -213,7 +213,7 @@ nfsd_proc_write(struct svc_rqst *rqstp, struct nfsd_writeargs *argp,
SVCFH_fmt(&argp->fh),
argp->len, argp->offset);

- nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh), NULL,
+ nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh),
argp->offset,
rqstp->rq_vec, argp->vlen,
&cnt,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index b5e077a6e7d4..59234d1d8d8e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -42,6 +42,7 @@

#include "nfsd.h"
#include "vfs.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_FILEOP

@@ -1002,30 +1003,17 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
* N.B. After this call fhp needs an fh_put
*/
__be32
-nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
- loff_t offset, struct kvec *vec, int vlen, unsigned long *cnt,
- int *stablep)
+nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
+ struct kvec *vec, int vlen, unsigned long *cnt, int *stablep)
{
- __be32 err = 0;
-
- if (file) {
- err = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
- NFSD_MAY_WRITE|NFSD_MAY_OWNER_OVERRIDE);
- if (err)
- goto out;
- err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen, cnt,
- stablep);
- } else {
- err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
- if (err)
- goto out;
-
- if (cnt)
- err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen,
- cnt, stablep);
- fput(file);
- }
-out:
+ __be32 err;
+ struct nfsd_file *nf;
+
+ err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf);
+ if (err == nfs_ok)
+ err = nfsd_vfs_write(rqstp, fhp, nf->nf_file, offset, vec,
+ vlen, cnt, stablep);
+ nfsd_file_put(nf);
return err;
}

diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 5be875e3e638..78b5527cba93 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -78,7 +78,7 @@ __be32 nfsd_readv(struct file *, loff_t, struct kvec *, int,
unsigned long *);
__be32 nfsd_read(struct svc_rqst *, struct svc_fh *,
loff_t, struct kvec *, int, unsigned long *);
-__be32 nfsd_write(struct svc_rqst *, struct svc_fh *,struct file *,
+__be32 nfsd_write(struct svc_rqst *, struct svc_fh *,
loff_t, struct kvec *,int, unsigned long *, int *);
__be32 nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct file *file, loff_t offset,
--
2.4.3


2015-07-30 13:53:55

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid

On Thu, 30 Jul 2015 09:52:12 -0400
Jeff Layton <[email protected]> wrote:

> Currently, preprocess_stateid_op calls nfs4_check_olstateid which
> verifies that the open stateid corresponds to the current_fh in the
> call by calling nfs4_check_fh.
>
> If the stateid is a NFS4_DELEG_STID however, then no such check is
> done. Move the call to nfs4_check_fh into nfs4_check_file instead
> so that it can be done for all stateid types.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index cd8c33186e26..75f617a052cf 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4406,9 +4406,9 @@ laundromat_main(struct work_struct *laundry)
> queue_delayed_work(laundry_wq, &nn->laundromat_work, t*HZ);
> }
>
> -static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_ol_stateid *stp)
> +static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)
> {
> - if (!fh_match(&fhp->fh_handle, &stp->st_stid.sc_file->fi_fhandle))
> + if (!fh_match(&fhp->fh_handle, &stp->sc_file->fi_fhandle))
> return nfserr_bad_stateid;
> return nfs_ok;
> }
> @@ -4611,9 +4611,6 @@ nfs4_check_olstateid(struct svc_fh *fhp, struct nfs4_ol_stateid *ols, int flags)
> {
> __be32 status;
>
> - status = nfs4_check_fh(fhp, ols);
> - if (status)
> - return status;
> status = nfsd4_check_openowner_confirmed(ols);
> if (status)
> return status;
> @@ -4628,6 +4625,10 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
> struct file *file;
> __be32 status;
>
> + status = nfs4_check_fh(fhp, s);
> + if (status)
> + return status;
> +
> file = nfs4_find_file(s, flags);
> if (file) {
> status = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
> @@ -4808,7 +4809,7 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_
> status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate));
> if (status)
> return status;
> - return nfs4_check_fh(current_fh, stp);
> + return nfs4_check_fh(current_fh, &stp->st_stid);
> }
>
> /*

Doh! Sorry -- I had already sent this separately but forgot to clean
out the directory before running git-send-email. It's identical though
to the one that I sent earlier today.

--
Jeff Layton <[email protected]>

2015-07-30 15:51:37

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid

> return status;
> @@ -4628,6 +4625,10 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
> struct file *file;
> __be32 status;
>
> + status = nfs4_check_fh(fhp, s);
> + if (status)
> + return status;
> +

This means we check the file handle for all stateids now, not just
open and lock stateids. That seems reasonable to me but should be
mentioned in the changelog.

2015-07-30 16:20:54

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid

On Thu, 30 Jul 2015 08:51:35 -0700
Christoph Hellwig <[email protected]> wrote:

> > return status;
> > @@ -4628,6 +4625,10 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
> > struct file *file;
> > __be32 status;
> >
> > + status = nfs4_check_fh(fhp, s);
> > + if (status)
> > + return status;
> > +
>
> This means we check the file handle for all stateids now, not just
> open and lock stateids. That seems reasonable to me but should be
> mentioned in the changelog.

This code is only called from nfs4_preprocess_stateid_op (which I
typoed in the changelog -- maybe Bruce can fix that). Anything other
than an open, lock or delegation stateid is explicitly rejected before
this point.

So, this just adds this check to delegation stateids (which is
necessary I think). That is mentioned in the changelog though. Do you
think it needs more elaboration or is that sufficient?

--
Jeff Layton <[email protected]>

2015-07-30 16:34:42

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid

On Thu, Jul 30, 2015 at 12:20:48PM -0400, Jeff Layton wrote:
> So, this just adds this check to delegation stateids (which is
> necessary I think). That is mentioned in the changelog though. Do you
> think it needs more elaboration or is that sufficient?

No, I'm just thick today. The patch looks fine as-is!

2015-07-31 21:32:51

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 3/9] nfsd: convert laundry_wq to something less nfsd4 specific

On Thu, Jul 30, 2015 at 09:52:15AM -0400, Jeff Layton wrote:
> Currently, nfsd uses a singlethread workqueue for this, but that's
> probably not necessary. If we have multiple namespaces, then there's no
> need to serialize the laundromat runs. They can run in parallel.

Running a singled-threaded workqueue for each namespace is one thing,
but if we're allowing multiple work items for one namespace to run
simultaneously then this would need more of an argument.

However... I guess we only requeue this from the end of the work
itself, so there's no risk of concurrent execution here, OK, fine.

--b.

>
> Also, create_singlethread_workqueue adds the WQ_MEM_RECLAIM flag, which
> doesn't really seem to be necessary. The laundromat jobs are always
> kicked off via a timer, and not from memory reclaim paths. There's no
> need for a rescuer thread.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 23 +++++------------------
> fs/nfsd/nfsd.h | 1 +
> fs/nfsd/nfssvc.c | 14 +++++++++++++-
> 3 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 79795c898dd1..977a8aee9122 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4376,7 +4376,6 @@ nfs4_laundromat(struct nfsd_net *nn)
> return new_timeo;
> }
>
> -static struct workqueue_struct *laundry_wq;
> static void laundromat_main(struct work_struct *);
>
> static void
> @@ -4390,7 +4389,7 @@ laundromat_main(struct work_struct *laundry)
>
> t = nfs4_laundromat(nn);
> dprintk("NFSD: laundromat_main - sleeping for %ld seconds\n", t);
> - queue_delayed_work(laundry_wq, &nn->laundromat_work, t*HZ);
> + queue_delayed_work(nfsd_laundry_wq, &nn->laundromat_work, t*HZ);
> }
>
> static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)
> @@ -6587,7 +6586,8 @@ nfs4_state_start_net(struct net *net)
> nfsd4_client_tracking_init(net);
> printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
> nn->nfsd4_grace, net);
> - queue_delayed_work(laundry_wq, &nn->laundromat_work, nn->nfsd4_grace * HZ);
> + queue_delayed_work(nfsd_laundry_wq, &nn->laundromat_work,
> + nn->nfsd4_grace * HZ);
> return 0;
> }
>
> @@ -6601,22 +6601,10 @@ nfs4_state_start(void)
> ret = set_callback_cred();
> if (ret)
> return -ENOMEM;
> - laundry_wq = create_singlethread_workqueue("nfsd4");
> - if (laundry_wq == NULL) {
> - ret = -ENOMEM;
> - goto out_recovery;
> - }
> ret = nfsd4_create_callback_queue();
> - if (ret)
> - goto out_free_laundry;
> -
> - set_max_delegations();
> -
> - return 0;
> + if (!ret)
> + set_max_delegations();
>
> -out_free_laundry:
> - destroy_workqueue(laundry_wq);
> -out_recovery:
> return ret;
> }
>
> @@ -6653,7 +6641,6 @@ nfs4_state_shutdown_net(struct net *net)
> void
> nfs4_state_shutdown(void)
> {
> - destroy_workqueue(laundry_wq);
> nfsd4_destroy_callback_queue();
> }
>
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index cf980523898b..0199415344ff 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -62,6 +62,7 @@ struct readdir_cd {
> extern struct svc_program nfsd_program;
> extern struct svc_version nfsd_version2, nfsd_version3,
> nfsd_version4;
> +extern struct workqueue_struct *nfsd_laundry_wq;
> extern struct mutex nfsd_mutex;
> extern spinlock_t nfsd_drc_lock;
> extern unsigned long nfsd_drc_max_mem;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index ad4e2377dd63..ced9944201a0 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -17,6 +17,7 @@
> #include <linux/lockd/bind.h>
> #include <linux/nfsacl.h>
> #include <linux/seq_file.h>
> +#include <linux/workqueue.h>
> #include <net/net_namespace.h>
> #include "nfsd.h"
> #include "cache.h"
> @@ -28,6 +29,9 @@
> extern struct svc_program nfsd_program;
> static int nfsd(void *vrqstp);
>
> +/* A workqueue for nfsd-related cleanup tasks */
> +struct workqueue_struct *nfsd_laundry_wq;
> +
> /*
> * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members
> * of the svc_serv struct. In particular, ->sv_nrthreads but also to some
> @@ -224,11 +228,19 @@ static int nfsd_startup_generic(int nrservs)
> if (ret)
> goto dec_users;
>
> + ret = -ENOMEM;
> + nfsd_laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd-laundry");
> + if (!nfsd_laundry_wq)
> + goto out_racache;
> +
> ret = nfs4_state_start();
> if (ret)
> - goto out_racache;
> + goto out_wq;
> return 0;
>
> +out_wq:
> + destroy_workqueue(nfsd_laundry_wq);
> + nfsd_laundry_wq = NULL;
> out_racache:
> nfsd_racache_shutdown();
> dec_users:
> --
> 2.4.3

2015-07-31 22:27:58

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 3/9] nfsd: convert laundry_wq to something less nfsd4 specific

On Fri, 31 Jul 2015 17:32:50 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Thu, Jul 30, 2015 at 09:52:15AM -0400, Jeff Layton wrote:
> > Currently, nfsd uses a singlethread workqueue for this, but that's
> > probably not necessary. If we have multiple namespaces, then there's no
> > need to serialize the laundromat runs. They can run in parallel.
>
> Running a singled-threaded workqueue for each namespace is one thing,
> but if we're allowing multiple work items for one namespace to run
> simultaneously then this would need more of an argument.
>
> However... I guess we only requeue this from the end of the work
> itself, so there's no risk of concurrent execution here, OK, fine.
>
> --b.
>

Yes, it should be safe, given that. I should have made it clear in the
changelog. I'll update that for the next respin.

> >
> > Also, create_singlethread_workqueue adds the WQ_MEM_RECLAIM flag, which
> > doesn't really seem to be necessary. The laundromat jobs are always
> > kicked off via a timer, and not from memory reclaim paths. There's no
> > need for a rescuer thread.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfsd/nfs4state.c | 23 +++++------------------
> > fs/nfsd/nfsd.h | 1 +
> > fs/nfsd/nfssvc.c | 14 +++++++++++++-
> > 3 files changed, 19 insertions(+), 19 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 79795c898dd1..977a8aee9122 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -4376,7 +4376,6 @@ nfs4_laundromat(struct nfsd_net *nn)
> > return new_timeo;
> > }
> >
> > -static struct workqueue_struct *laundry_wq;
> > static void laundromat_main(struct work_struct *);
> >
> > static void
> > @@ -4390,7 +4389,7 @@ laundromat_main(struct work_struct *laundry)
> >
> > t = nfs4_laundromat(nn);
> > dprintk("NFSD: laundromat_main - sleeping for %ld seconds\n", t);
> > - queue_delayed_work(laundry_wq, &nn->laundromat_work, t*HZ);
> > + queue_delayed_work(nfsd_laundry_wq, &nn->laundromat_work, t*HZ);
> > }
> >
> > static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)
> > @@ -6587,7 +6586,8 @@ nfs4_state_start_net(struct net *net)
> > nfsd4_client_tracking_init(net);
> > printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
> > nn->nfsd4_grace, net);
> > - queue_delayed_work(laundry_wq, &nn->laundromat_work, nn->nfsd4_grace * HZ);
> > + queue_delayed_work(nfsd_laundry_wq, &nn->laundromat_work,
> > + nn->nfsd4_grace * HZ);
> > return 0;
> > }
> >
> > @@ -6601,22 +6601,10 @@ nfs4_state_start(void)
> > ret = set_callback_cred();
> > if (ret)
> > return -ENOMEM;
> > - laundry_wq = create_singlethread_workqueue("nfsd4");
> > - if (laundry_wq == NULL) {
> > - ret = -ENOMEM;
> > - goto out_recovery;
> > - }
> > ret = nfsd4_create_callback_queue();
> > - if (ret)
> > - goto out_free_laundry;
> > -
> > - set_max_delegations();
> > -
> > - return 0;
> > + if (!ret)
> > + set_max_delegations();
> >
> > -out_free_laundry:
> > - destroy_workqueue(laundry_wq);
> > -out_recovery:
> > return ret;
> > }
> >
> > @@ -6653,7 +6641,6 @@ nfs4_state_shutdown_net(struct net *net)
> > void
> > nfs4_state_shutdown(void)
> > {
> > - destroy_workqueue(laundry_wq);
> > nfsd4_destroy_callback_queue();
> > }
> >
> > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > index cf980523898b..0199415344ff 100644
> > --- a/fs/nfsd/nfsd.h
> > +++ b/fs/nfsd/nfsd.h
> > @@ -62,6 +62,7 @@ struct readdir_cd {
> > extern struct svc_program nfsd_program;
> > extern struct svc_version nfsd_version2, nfsd_version3,
> > nfsd_version4;
> > +extern struct workqueue_struct *nfsd_laundry_wq;
> > extern struct mutex nfsd_mutex;
> > extern spinlock_t nfsd_drc_lock;
> > extern unsigned long nfsd_drc_max_mem;
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index ad4e2377dd63..ced9944201a0 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -17,6 +17,7 @@
> > #include <linux/lockd/bind.h>
> > #include <linux/nfsacl.h>
> > #include <linux/seq_file.h>
> > +#include <linux/workqueue.h>
> > #include <net/net_namespace.h>
> > #include "nfsd.h"
> > #include "cache.h"
> > @@ -28,6 +29,9 @@
> > extern struct svc_program nfsd_program;
> > static int nfsd(void *vrqstp);
> >
> > +/* A workqueue for nfsd-related cleanup tasks */
> > +struct workqueue_struct *nfsd_laundry_wq;
> > +
> > /*
> > * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members
> > * of the svc_serv struct. In particular, ->sv_nrthreads but also to some
> > @@ -224,11 +228,19 @@ static int nfsd_startup_generic(int nrservs)
> > if (ret)
> > goto dec_users;
> >
> > + ret = -ENOMEM;
> > + nfsd_laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd-laundry");
> > + if (!nfsd_laundry_wq)
> > + goto out_racache;
> > +
> > ret = nfs4_state_start();
> > if (ret)
> > - goto out_racache;
> > + goto out_wq;
> > return 0;
> >
> > +out_wq:
> > + destroy_workqueue(nfsd_laundry_wq);
> > + nfsd_laundry_wq = NULL;
> > out_racache:
> > nfsd_racache_shutdown();
> > dec_users:
> > --
> > 2.4.3


--
Jeff Layton <[email protected]>

2015-08-05 21:13:43

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 00/18] nfsd: open file caching for v2/3

v2:
- changelog cleanups and clarifications
- allow COMMIT to use cached open files
- tracepoints for nfsd_file cache
- proactively close open files prior to REMOVE, or a RENAME over a
positive dentry

This is the v2 set of this series. It's a little larger than the v1
series, but adds a bit more functionality.

The last several patches you may be less-inclined to take. Since one of
our use-cases is a reexporter for NFS, we want to avoid triggering
sillyrenames when possible. Closing files proactively prior to a
REMOVE or RENAME is something we want to do. I'm less sure however
whether it's something you'll want to bother with for more general uses.
Your call there.

I'd still like to see this merged for v4.3 if possible.

Original cover letter follows:

---------------------[snip]------------------------

Hi Bruce!

This patchset adds a new open file cache for knfsd. As you well know,
nfsd basically does an open() - read/write() - close() cycle for every
nfsv3 READ or WRITE. It's also common for clients to "spray" several
read and write requests in parallel or in quick succession, so we could
skip a lot of that by simply caching these open filps.

The idea here is to cache them in a hashtable for a little while (1s by
default) in the expectation that clients may try to issue more reads or
writes in quick succession. When there are any entries in the hashtable,
there is a recurring workqueue job that will clean the cache.

I've also added some hooks into sunrpc cache code that should allow us
to purge the cache on an unexport event, so this shouldn't cause any
problems with unmounting once you've unexported the fs.

I did a little testing with it, but my test rig is pretty slow, and I
couldn't measure much of a performance difference on a bog standard
local fs.

We do have some patches that allow the reexporting of NFSv4.1 via knfsd.
Since NFS has a relatively slow open routine, this provides a rather
large speedup.

Without these patches:
$ dd if=/dev/urandom of=/mnt/dp01/ddfile bs=4k count=256 oflag=direct
256+0 records in
256+0 records out
1048576 bytes (1.0 MB) copied, 54.3109 s, 19.3 kB/s

With these patches:
$ dd if=/dev/urandom of=/mnt/dp01/ddfile bs=4k count=256 oflag=direct
256+0 records in
256+0 records out
1048576 bytes (1.0 MB) copied, 1.05437 s, 995 kB/s

It should also be possible to hook this code up to the nfs4_file too,
but I haven't done that in this set. I'd like to get this in and settled
before we start looking at that, since it'll mean a bit of reengineering
of the NFSv4 code not to pass around struct file pointers.

I'd like to have these considered for the v4.3 merge window if they look
reasonable.

Jeff Layton (18):
nfsd: include linux/nfs4.h in export.h
nfsd: move some file caching helpers to a common header
nfsd: convert laundry_wq to something less nfsd4 specific
nfsd: add a new struct file caching facility to nfsd
nfsd: hook up nfsd_write to the new nfsd_file cache
nfsd: hook up nfsd_read to the nfsd_file cache
sunrpc: add a new cache_detail operation for when a cache is flushed
nfsd: add a ->flush routine to svc_export_cache
nfsd: allow the file cache expire time to be tunable
nfsd: handle NFSD_MAY_NOT_BREAK_LEASE in open file cache
nfsd: hook nfsd_commit up to the nfsd_file cache
nfsd: move include of state.h from trace.c to trace.h
nfsd: add new tracepoints for nfsd_file cache
nfsd: have _fh_update take a knfsd_fh instead of a svc_fh
nfsd: have set_version_and_fsid_type take a knfsd_fh instead of svc_fh
nfsd: add a fh_compose_shallow
nfsd: close cached files prior to a REMOVE or RENAME that would
replace target
nfsd: call flush_delayed_fput from nfsd_file_close_fh

fs/file_table.c | 1 +
fs/nfsd/Makefile | 3 +-
fs/nfsd/export.c | 14 ++
fs/nfsd/export.h | 1 +
fs/nfsd/filecache.c | 451 +++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/filecache.h | 51 +++++
fs/nfsd/nfs3proc.c | 2 +-
fs/nfsd/nfs4state.c | 40 +---
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 139 ++++++-------
fs/nfsd/nfsfh.h | 2 +
fs/nfsd/nfsproc.c | 2 +-
fs/nfsd/nfssvc.c | 22 ++-
fs/nfsd/trace.c | 2 -
fs/nfsd/trace.h | 126 ++++++++++++
fs/nfsd/vfs.c | 146 ++++++++------
fs/nfsd/vfs.h | 3 +-
include/linux/sunrpc/cache.h | 1 +
net/sunrpc/cache.c | 3 +
19 files changed, 851 insertions(+), 159 deletions(-)
create mode 100644 fs/nfsd/filecache.c
create mode 100644 fs/nfsd/filecache.h

--
2.4.3


2015-08-05 21:13:44

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 01/18] nfsd: include linux/nfs4.h in export.h

export.h refers to the pnfs_layouttype enum, which is defined there.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/export.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
index 1f52bfcc436f..2e315072bf3f 100644
--- a/fs/nfsd/export.h
+++ b/fs/nfsd/export.h
@@ -6,6 +6,7 @@

#include <linux/sunrpc/cache.h>
#include <uapi/linux/nfsd/export.h>
+#include <linux/nfs4.h>

struct knfsd_fh;
struct svc_fh;
--
2.4.3


2015-08-05 21:13:44

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 02/18] nfsd: move some file caching helpers to a common header

We'll want to reuse some of this for common open file caching
infrastructure.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/filecache.h | 25 +++++++++++++++++++++++++
fs/nfsd/nfs4state.c | 17 ++---------------
2 files changed, 27 insertions(+), 15 deletions(-)
create mode 100644 fs/nfsd/filecache.h

diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
new file mode 100644
index 000000000000..9051ee54faa3
--- /dev/null
+++ b/fs/nfsd/filecache.h
@@ -0,0 +1,25 @@
+#ifndef _FS_NFSD_FILECACHE_H
+#define _FS_NFSD_FILECACHE_H
+
+#include <linux/jhash.h>
+#include <linux/sunrpc/xdr.h>
+
+#include "export.h"
+
+/* hash table for nfs4_file */
+#define NFSD_FILE_HASH_BITS 8
+#define NFSD_FILE_HASH_SIZE (1 << NFSD_FILE_HASH_BITS)
+
+static inline unsigned int
+nfsd_fh_hashval(struct knfsd_fh *fh)
+{
+ return jhash2(fh->fh_base.fh_pad, XDR_QUADLEN(fh->fh_size), 0);
+}
+
+static inline unsigned int
+file_hashval(struct knfsd_fh *fh)
+{
+ return nfsd_fh_hashval(fh) & (NFSD_FILE_HASH_SIZE - 1);
+}
+
+#endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 43903abd6189..b3b306bd1830 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -49,6 +49,7 @@

#include "netns.h"
#include "pnfs.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_PROC

@@ -381,21 +382,7 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername)
return ret & OWNER_HASH_MASK;
}

-/* hash table for nfs4_file */
-#define FILE_HASH_BITS 8
-#define FILE_HASH_SIZE (1 << FILE_HASH_BITS)
-
-static unsigned int nfsd_fh_hashval(struct knfsd_fh *fh)
-{
- return jhash2(fh->fh_base.fh_pad, XDR_QUADLEN(fh->fh_size), 0);
-}
-
-static unsigned int file_hashval(struct knfsd_fh *fh)
-{
- return nfsd_fh_hashval(fh) & (FILE_HASH_SIZE - 1);
-}
-
-static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
+static struct hlist_head file_hashtbl[NFSD_FILE_HASH_SIZE];

static void
__nfs4_file_get_access(struct nfs4_file *fp, u32 access)
--
2.4.3


2015-08-05 21:13:45

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 03/18] nfsd: convert laundry_wq to something less nfsd4 specific

Currently, nfsd uses a singlethread workqueue for this, but that's not
necessary. If we have multiple namespaces, then there's no need to
serialize the laundromat runs since they are only requeued at the end of
the work itself.

Also, create_singlethread_workqueue adds the WQ_MEM_RECLAIM flag, which
doesn't really seem to be necessary. The laundromat jobs are always
kicked off via a timer, and not from memory reclaim paths. There's no
need for a rescuer thread.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 23 +++++------------------
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfssvc.c | 14 +++++++++++++-
3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b3b306bd1830..c94859122e6f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4366,7 +4366,6 @@ nfs4_laundromat(struct nfsd_net *nn)
return new_timeo;
}

-static struct workqueue_struct *laundry_wq;
static void laundromat_main(struct work_struct *);

static void
@@ -4380,7 +4379,7 @@ laundromat_main(struct work_struct *laundry)

t = nfs4_laundromat(nn);
dprintk("NFSD: laundromat_main - sleeping for %ld seconds\n", t);
- queue_delayed_work(laundry_wq, &nn->laundromat_work, t*HZ);
+ queue_delayed_work(nfsd_laundry_wq, &nn->laundromat_work, t*HZ);
}

static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)
@@ -6569,7 +6568,8 @@ nfs4_state_start_net(struct net *net)
nfsd4_client_tracking_init(net);
printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
nn->nfsd4_grace, net);
- queue_delayed_work(laundry_wq, &nn->laundromat_work, nn->nfsd4_grace * HZ);
+ queue_delayed_work(nfsd_laundry_wq, &nn->laundromat_work,
+ nn->nfsd4_grace * HZ);
return 0;
}

@@ -6583,22 +6583,10 @@ nfs4_state_start(void)
ret = set_callback_cred();
if (ret)
return -ENOMEM;
- laundry_wq = create_singlethread_workqueue("nfsd4");
- if (laundry_wq == NULL) {
- ret = -ENOMEM;
- goto out_recovery;
- }
ret = nfsd4_create_callback_queue();
- if (ret)
- goto out_free_laundry;
-
- set_max_delegations();
-
- return 0;
+ if (!ret)
+ set_max_delegations();

-out_free_laundry:
- destroy_workqueue(laundry_wq);
-out_recovery:
return ret;
}

@@ -6635,7 +6623,6 @@ nfs4_state_shutdown_net(struct net *net)
void
nfs4_state_shutdown(void)
{
- destroy_workqueue(laundry_wq);
nfsd4_destroy_callback_queue();
}

diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index cf980523898b..0199415344ff 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -62,6 +62,7 @@ struct readdir_cd {
extern struct svc_program nfsd_program;
extern struct svc_version nfsd_version2, nfsd_version3,
nfsd_version4;
+extern struct workqueue_struct *nfsd_laundry_wq;
extern struct mutex nfsd_mutex;
extern spinlock_t nfsd_drc_lock;
extern unsigned long nfsd_drc_max_mem;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index ad4e2377dd63..ced9944201a0 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -17,6 +17,7 @@
#include <linux/lockd/bind.h>
#include <linux/nfsacl.h>
#include <linux/seq_file.h>
+#include <linux/workqueue.h>
#include <net/net_namespace.h>
#include "nfsd.h"
#include "cache.h"
@@ -28,6 +29,9 @@
extern struct svc_program nfsd_program;
static int nfsd(void *vrqstp);

+/* A workqueue for nfsd-related cleanup tasks */
+struct workqueue_struct *nfsd_laundry_wq;
+
/*
* nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members
* of the svc_serv struct. In particular, ->sv_nrthreads but also to some
@@ -224,11 +228,19 @@ static int nfsd_startup_generic(int nrservs)
if (ret)
goto dec_users;

+ ret = -ENOMEM;
+ nfsd_laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd-laundry");
+ if (!nfsd_laundry_wq)
+ goto out_racache;
+
ret = nfs4_state_start();
if (ret)
- goto out_racache;
+ goto out_wq;
return 0;

+out_wq:
+ destroy_workqueue(nfsd_laundry_wq);
+ nfsd_laundry_wq = NULL;
out_racache:
nfsd_racache_shutdown();
dec_users:
--
2.4.3


2015-08-05 21:13:46

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 04/18] nfsd: add a new struct file caching facility to nfsd

Currently, NFSv2/3 reads and writes have to open a file, do the read or
write and then close it again for each RPC. This is highly inefficient,
especially when the underlying filesystem has a relatively slow open
routine.

This patch adds a new open file cache to knfsd. Rather than doing an
open for each RPC, the read/write handlers can call into this cache to
see if there is one already there for the correct filehandle and
NFS_MAY_READ/WRITE flags.

If there isn't an entry, then we create a new one and attempt to
perform the open. If there is, then we wait until the entry is fully
instantiated and return it if it is at the end of the wait. If it's
not, then we attempt to take over construction.

Since the main goal is to speed up NFSv2/3 I/O, we don't want to
close these files on last put of these objects. We need to keep them
around for a little while since we never know when the next READ/WRITE
will come in.

Cache entries have a hardcoded 1s timeout, and we have a recurring
workqueue job that walks the cache and purges any entries that have
expired.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/Makefile | 3 +-
fs/nfsd/filecache.c | 333 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/filecache.h | 21 ++++
fs/nfsd/nfssvc.c | 10 +-
4 files changed, 365 insertions(+), 2 deletions(-)
create mode 100644 fs/nfsd/filecache.c

diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index 9a6028e120c6..8908bb467727 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -10,7 +10,8 @@ obj-$(CONFIG_NFSD) += nfsd.o
nfsd-y += trace.o

nfsd-y += nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \
- export.o auth.o lockd.o nfscache.o nfsxdr.o stats.o
+ export.o auth.o lockd.o nfscache.o nfsxdr.o \
+ stats.o filecache.o
nfsd-$(CONFIG_NFSD_FAULT_INJECTION) += fault_inject.o
nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
nfsd-$(CONFIG_NFSD_V3) += nfs3proc.o nfs3xdr.o
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
new file mode 100644
index 000000000000..5278b8d9e79a
--- /dev/null
+++ b/fs/nfsd/filecache.c
@@ -0,0 +1,333 @@
+/*
+ * Open file cache.
+ *
+ * (c) 2015 - Jeff Layton <[email protected]>
+ */
+
+#include <linux/slab.h>
+#include <linux/hash.h>
+#include <linux/file.h>
+#include <linux/sched.h>
+
+#include "vfs.h"
+#include "nfsd.h"
+#include "nfsfh.h"
+#include "filecache.h"
+
+#define NFSDDBG_FACILITY NFSDDBG_FH
+
+/* Min time we should keep around a file cache entry */
+#define NFSD_FILE_EXPIRE (HZ)
+
+/* We only care about NFSD_MAY_READ/WRITE for this cache */
+#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
+
+struct nfsd_fcache_bucket {
+ struct hlist_head nfb_head;
+ spinlock_t nfb_lock;
+};
+
+static struct nfsd_fcache_bucket *nfsd_file_hashtbl;
+
+/* Count of hashed nfsd_file objects */
+static atomic_t nfsd_file_count;
+
+/* Periodic job for cleaning nfsd_file cache */
+static struct delayed_work nfsd_file_cache_clean_work;
+
+static void
+nfsd_file_count_inc(void)
+{
+ if (atomic_inc_return(&nfsd_file_count) == 1)
+ queue_delayed_work(nfsd_laundry_wq, &nfsd_file_cache_clean_work,
+ NFSD_FILE_EXPIRE);
+}
+
+static void
+nfsd_file_count_dec(void)
+{
+ if (atomic_dec_and_test(&nfsd_file_count))
+ cancel_delayed_work(&nfsd_file_cache_clean_work);
+}
+
+static struct nfsd_file *
+nfsd_file_alloc(struct knfsd_fh *fh, unsigned int may, unsigned int hashval)
+{
+ struct nfsd_file *nf;
+
+ /* FIXME: create a new slabcache for these? */
+ nf = kzalloc(sizeof(*nf), GFP_KERNEL);
+ if (nf) {
+ INIT_HLIST_NODE(&nf->nf_node);
+ INIT_LIST_HEAD(&nf->nf_dispose);
+ nf->nf_time = jiffies;
+ fh_copy_shallow(&nf->nf_handle, fh);
+ nf->nf_hashval = hashval;
+ atomic_set(&nf->nf_ref, 1);
+ nf->nf_may = NFSD_FILE_MAY_MASK & may;
+ }
+ return nf;
+}
+
+static void
+nfsd_file_put_final(struct nfsd_file *nf)
+{
+ if (nf->nf_file)
+ fput(nf->nf_file);
+ kfree_rcu(nf, nf_rcu);
+}
+
+static void
+nfsd_file_unhash(struct nfsd_file *nf)
+{
+ if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+ hlist_del_rcu(&nf->nf_node);
+ nfsd_file_count_dec();
+ }
+}
+
+static void
+nfsd_file_put_locked(struct nfsd_file *nf, struct list_head *dispose)
+{
+ if (!atomic_dec_and_test(&nf->nf_ref)) {
+ nf->nf_time = jiffies;
+ return;
+ }
+
+ nfsd_file_unhash(nf);
+ list_add(&nf->nf_dispose, dispose);
+}
+
+void
+nfsd_file_put(struct nfsd_file *nf)
+{
+ if (!atomic_dec_and_lock(&nf->nf_ref,
+ &nfsd_file_hashtbl[nf->nf_hashval].nfb_lock)) {
+ nf->nf_time = jiffies;
+ return;
+ }
+
+ nfsd_file_unhash(nf);
+ spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+ nfsd_file_put_final(nf);
+}
+
+static void
+nfsd_file_dispose_list(struct list_head *dispose)
+{
+ struct nfsd_file *nf;
+
+ while(!list_empty(dispose)) {
+ nf = list_first_entry(dispose, struct nfsd_file, nf_dispose);
+ list_del(&nf->nf_dispose);
+ nfsd_file_put_final(nf);
+ }
+}
+
+static void
+nfsd_file_cache_prune(void)
+{
+ unsigned int i;
+ struct nfsd_file *nf;
+ struct hlist_node *tmp;
+ LIST_HEAD(dispose);
+
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ if (hlist_empty(&nfsd_file_hashtbl[i].nfb_head))
+ continue;
+
+ spin_lock(&nfsd_file_hashtbl[i].nfb_lock);
+ hlist_for_each_entry_safe(nf, tmp,
+ &nfsd_file_hashtbl[i].nfb_head, nf_node) {
+
+ /* does someone else have a reference? */
+ if (atomic_read(&nf->nf_ref) > 1)
+ continue;
+
+ /* Was this file touched recently? */
+ if (time_before(nf->nf_time + NFSD_FILE_EXPIRE, jiffies))
+ continue;
+
+ /* Ok, it's expired...unhash it */
+ nfsd_file_unhash(nf);
+
+ /* ...and put the hash reference */
+ nfsd_file_put_locked(nf, &dispose);
+ }
+ spin_unlock(&nfsd_file_hashtbl[i].nfb_lock);
+ nfsd_file_dispose_list(&dispose);
+ }
+}
+
+static void
+nfsd_file_cache_cleaner(struct work_struct *work)
+{
+ if (!atomic_read(&nfsd_file_count))
+ return;
+
+ nfsd_file_cache_prune();
+
+ if (atomic_read(&nfsd_file_count))
+ queue_delayed_work(nfsd_laundry_wq, &nfsd_file_cache_clean_work,
+ NFSD_FILE_EXPIRE);
+}
+
+int
+nfsd_file_cache_init(void)
+{
+ unsigned int i;
+
+ if (nfsd_file_hashtbl)
+ return 0;
+
+ nfsd_file_hashtbl = kcalloc(NFSD_FILE_HASH_SIZE,
+ sizeof(*nfsd_file_hashtbl), GFP_KERNEL);
+ if (!nfsd_file_hashtbl)
+ goto out_nomem;
+
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ INIT_HLIST_HEAD(&nfsd_file_hashtbl[i].nfb_head);
+ spin_lock_init(&nfsd_file_hashtbl[i].nfb_lock);
+ }
+
+ INIT_DELAYED_WORK(&nfsd_file_cache_clean_work, nfsd_file_cache_cleaner);
+ return 0;
+out_nomem:
+ printk(KERN_ERR "nfsd: failed to init nfsd file cache\n");
+ return -ENOMEM;
+}
+
+void
+nfsd_file_cache_shutdown(void)
+{
+ unsigned int i;
+ struct nfsd_file *nf;
+ LIST_HEAD(dispose);
+
+ cancel_delayed_work_sync(&nfsd_file_cache_clean_work);
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ spin_lock(&nfsd_file_hashtbl[i].nfb_lock);
+ while(!hlist_empty(&nfsd_file_hashtbl[i].nfb_head)) {
+ nf = hlist_entry(nfsd_file_hashtbl[i].nfb_head.first,
+ struct nfsd_file, nf_node);
+ nfsd_file_unhash(nf);
+ /* put the hash reference */
+ nfsd_file_put_locked(nf, &dispose);
+ }
+ spin_unlock(&nfsd_file_hashtbl[i].nfb_lock);
+ nfsd_file_dispose_list(&dispose);
+ }
+ kfree(nfsd_file_hashtbl);
+ nfsd_file_hashtbl = NULL;
+}
+
+/*
+ * Search nfsd_file_hashtbl[] for file. We hash on the filehandle and also on
+ * the NFSD_MAY_READ/WRITE flags. If the file is open for r/w, then it's usable
+ * for either.
+ */
+static struct nfsd_file *
+nfsd_file_find_locked(struct knfsd_fh *fh, unsigned int may_flags,
+ unsigned int hashval)
+{
+ struct nfsd_file *nf;
+ unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
+
+ hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head,
+ nf_node) {
+ if ((need & nf->nf_may) != need)
+ continue;
+ if (fh_match(&nf->nf_handle, fh)) {
+ if (atomic_inc_not_zero(&nf->nf_ref))
+ return nf;
+ }
+ }
+ return NULL;
+}
+
+__be32
+nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ unsigned int may_flags, struct nfsd_file **pnf)
+{
+ __be32 status = nfs_ok;
+ struct nfsd_file *nf, *new = NULL;
+ struct knfsd_fh *fh = &fhp->fh_handle;
+ unsigned int hashval = file_hashval(fh);
+
+ /* Mask off any extraneous bits */
+ may_flags &= NFSD_FILE_MAY_MASK;
+retry:
+ rcu_read_lock();
+ nf = nfsd_file_find_locked(fh, may_flags, hashval);
+ rcu_read_unlock();
+ if (nf)
+ goto wait_for_construction;
+
+ if (!new) {
+ new = nfsd_file_alloc(&fhp->fh_handle, may_flags, hashval);
+ if (!new)
+ return nfserr_jukebox;
+ }
+
+ spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ nf = nfsd_file_find_locked(fh, may_flags, hashval);
+ if (likely(nf == NULL)) {
+ /* Take reference for the hashtable */
+ atomic_inc(&new->nf_ref);
+ __set_bit(NFSD_FILE_HASHED, &new->nf_flags);
+ __set_bit(NFSD_FILE_PENDING, &new->nf_flags);
+ hlist_add_head_rcu(&new->nf_node,
+ &nfsd_file_hashtbl[hashval].nfb_head);
+ spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ nfsd_file_count_inc();
+ nf = new;
+ new = NULL;
+ goto open_file;
+ }
+ spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+
+wait_for_construction:
+ wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE);
+
+ /* Did construction of this file fail? */
+ if (!nf->nf_file) {
+ /*
+ * We can only take over construction for this nfsd_file if the
+ * MAY flags are equal. Otherwise, we put the reference and try
+ * again.
+ */
+ if (may_flags != nf->nf_may) {
+ nfsd_file_put(nf);
+ goto retry;
+ }
+
+ /* try to take over construction for this file */
+ if (test_and_set_bit(NFSD_FILE_PENDING, &nf->nf_flags))
+ goto wait_for_construction;
+ goto open_file;
+ }
+
+ /*
+ * We have a file that was opened in the context of another rqst. We
+ * must check permissions. Since we're dealing with open files here,
+ * we always want to set the OWNER_OVERRIDE bit.
+ */
+ status = fh_verify(rqstp, fhp, S_IFREG, may_flags);
+ if (status == nfs_ok)
+ status = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
+ may_flags|NFSD_MAY_OWNER_OVERRIDE);
+out:
+ if (status == nfs_ok)
+ *pnf = nf;
+ else
+ nfsd_file_put(nf);
+
+ if (new)
+ nfsd_file_put(new);
+ return status;
+open_file:
+ status = nfsd_open(rqstp, fhp, S_IFREG, may_flags, &nf->nf_file);
+ clear_bit(NFSD_FILE_PENDING, &nf->nf_flags);
+ wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
+ goto out;
+}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 9051ee54faa3..adf7e78b8e43 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -4,6 +4,7 @@
#include <linux/jhash.h>
#include <linux/sunrpc/xdr.h>

+#include "nfsfh.h"
#include "export.h"

/* hash table for nfs4_file */
@@ -22,4 +23,24 @@ file_hashval(struct knfsd_fh *fh)
return nfsd_fh_hashval(fh) & (NFSD_FILE_HASH_SIZE - 1);
}

+struct nfsd_file {
+ struct hlist_node nf_node;
+ struct list_head nf_dispose;
+ struct rcu_head nf_rcu;
+ struct file *nf_file;
+ unsigned long nf_time;
+#define NFSD_FILE_HASHED (0)
+#define NFSD_FILE_PENDING (1)
+ unsigned long nf_flags;
+ struct knfsd_fh nf_handle;
+ unsigned int nf_hashval;
+ atomic_t nf_ref;
+ unsigned char nf_may;
+};
+
+int nfsd_file_cache_init(void);
+void nfsd_file_cache_shutdown(void);
+void nfsd_file_put(struct nfsd_file *nf);
+__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ unsigned int may_flags, struct nfsd_file **nfp);
#endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index ced9944201a0..0572441e23ec 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -23,6 +23,7 @@
#include "cache.h"
#include "vfs.h"
#include "netns.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_SVC

@@ -233,11 +234,17 @@ static int nfsd_startup_generic(int nrservs)
if (!nfsd_laundry_wq)
goto out_racache;

- ret = nfs4_state_start();
+ ret = nfsd_file_cache_init();
if (ret)
goto out_wq;
+
+ ret = nfs4_state_start();
+ if (ret)
+ goto out_nfsd_file;
return 0;

+out_nfsd_file:
+ nfsd_file_cache_shutdown();
out_wq:
destroy_workqueue(nfsd_laundry_wq);
nfsd_laundry_wq = NULL;
@@ -254,6 +261,7 @@ static void nfsd_shutdown_generic(void)
return;

nfs4_state_shutdown();
+ nfsd_file_cache_shutdown();
nfsd_racache_shutdown();
}

--
2.4.3


2015-08-05 21:13:47

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 06/18] nfsd: hook up nfsd_read to the nfsd_file cache

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/vfs.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 59234d1d8d8e..fd688c86af66 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -980,20 +980,14 @@ out_nfserr:
__be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
{
- struct file *file;
- struct raparms *ra;
- __be32 err;
-
- err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
- if (err)
- return err;
-
- ra = nfsd_init_raparms(file);
- err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
- if (ra)
- nfsd_put_raparams(file, ra);
- fput(file);
+ __be32 err;
+ struct nfsd_file *nf;

+ err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
+ if (err == nfs_ok)
+ err = nfsd_vfs_read(rqstp, nf->nf_file, offset, vec, vlen,
+ count);
+ nfsd_file_put(nf);
return err;
}

--
2.4.3


2015-08-05 21:13:47

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 05/18] nfsd: hook up nfsd_write to the new nfsd_file cache

Note that all callers currently pass in NULL for "file" anyway, so
there was already some dead code in here. Just eliminate that parm
and have it use the file cache instead of dealing directly with a
filp.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs3proc.c | 2 +-
fs/nfsd/nfsproc.c | 2 +-
fs/nfsd/vfs.c | 34 +++++++++++-----------------------
fs/nfsd/vfs.h | 2 +-
4 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 7b755b7f785c..4e46ac511479 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -192,7 +192,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp, struct nfsd3_writeargs *argp,

fh_copy(&resp->fh, &argp->fh);
resp->committed = argp->stable;
- nfserr = nfsd_write(rqstp, &resp->fh, NULL,
+ nfserr = nfsd_write(rqstp, &resp->fh,
argp->offset,
rqstp->rq_vec, argp->vlen,
&cnt,
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 4cd78ef4c95c..9893095cbee1 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -213,7 +213,7 @@ nfsd_proc_write(struct svc_rqst *rqstp, struct nfsd_writeargs *argp,
SVCFH_fmt(&argp->fh),
argp->len, argp->offset);

- nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh), NULL,
+ nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh),
argp->offset,
rqstp->rq_vec, argp->vlen,
&cnt,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index b5e077a6e7d4..59234d1d8d8e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -42,6 +42,7 @@

#include "nfsd.h"
#include "vfs.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_FILEOP

@@ -1002,30 +1003,17 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
* N.B. After this call fhp needs an fh_put
*/
__be32
-nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
- loff_t offset, struct kvec *vec, int vlen, unsigned long *cnt,
- int *stablep)
+nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
+ struct kvec *vec, int vlen, unsigned long *cnt, int *stablep)
{
- __be32 err = 0;
-
- if (file) {
- err = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
- NFSD_MAY_WRITE|NFSD_MAY_OWNER_OVERRIDE);
- if (err)
- goto out;
- err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen, cnt,
- stablep);
- } else {
- err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
- if (err)
- goto out;
-
- if (cnt)
- err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen,
- cnt, stablep);
- fput(file);
- }
-out:
+ __be32 err;
+ struct nfsd_file *nf;
+
+ err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf);
+ if (err == nfs_ok)
+ err = nfsd_vfs_write(rqstp, fhp, nf->nf_file, offset, vec,
+ vlen, cnt, stablep);
+ nfsd_file_put(nf);
return err;
}

diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 5be875e3e638..78b5527cba93 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -78,7 +78,7 @@ __be32 nfsd_readv(struct file *, loff_t, struct kvec *, int,
unsigned long *);
__be32 nfsd_read(struct svc_rqst *, struct svc_fh *,
loff_t, struct kvec *, int, unsigned long *);
-__be32 nfsd_write(struct svc_rqst *, struct svc_fh *,struct file *,
+__be32 nfsd_write(struct svc_rqst *, struct svc_fh *,
loff_t, struct kvec *,int, unsigned long *, int *);
__be32 nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct file *file, loff_t offset,
--
2.4.3


2015-08-05 21:13:48

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 07/18] sunrpc: add a new cache_detail operation for when a cache is flushed

When the exports table is changed, exportfs will usually write a new
time to the "flush" file in the nfsd.export cache procfile. This tells
the kernel to flush any entries that are older than that value.

This gives us a mechanism to tell whether an unexport might have
occurred. Add a new ->flush cache_detail operation that is called after
flushing the cache whenever someone writes to a "flush" file.

Signed-off-by: Jeff Layton <[email protected]>
---
include/linux/sunrpc/cache.h | 1 +
net/sunrpc/cache.c | 3 +++
2 files changed, 4 insertions(+)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 437ddb6c4aef..29033421aa0c 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -98,6 +98,7 @@ struct cache_detail {
int has_died);

struct cache_head * (*alloc)(void);
+ void (*flush)(void);
int (*match)(struct cache_head *orig, struct cache_head *new);
void (*init)(struct cache_head *orig, struct cache_head *new);
void (*update)(struct cache_head *orig, struct cache_head *new);
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 2928afffbb81..19dc71ccc706 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1452,6 +1452,9 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
cd->nextcheck = seconds_since_boot();
cache_flush();

+ if (cd->flush)
+ cd->flush();
+
*ppos += count;
return count;
}
--
2.4.3


2015-08-05 21:13:50

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 09/18] nfsd: allow the file cache expire time to be tunable

This is primarily for testing purposes, but this may also be useful
until we have a better idea for a sensible default here.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/filecache.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index a3782cf7b352..084e56726318 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -8,6 +8,7 @@
#include <linux/hash.h>
#include <linux/file.h>
#include <linux/sched.h>
+#include <linux/module.h>

#include "vfs.h"
#include "nfsd.h"
@@ -16,8 +17,10 @@

#define NFSDDBG_FACILITY NFSDDBG_FH

-/* Min time we should keep around a file cache entry */
-#define NFSD_FILE_EXPIRE (HZ)
+/* Min time we should keep around a file cache entry (in jiffies) */
+static unsigned int nfsd_file_cache_expiry = HZ;
+module_param(nfsd_file_cache_expiry, uint, 0644);
+MODULE_PARM_DESC(nfsd_file_cache_expiry, "Expire time for open file cache (in jiffies)");

/* We only care about NFSD_MAY_READ/WRITE for this cache */
#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
@@ -40,7 +43,7 @@ nfsd_file_count_inc(void)
{
if (atomic_inc_return(&nfsd_file_count) == 1)
queue_delayed_work(nfsd_laundry_wq, &nfsd_file_cache_clean_work,
- NFSD_FILE_EXPIRE);
+ nfsd_file_cache_expiry);
}

static void
@@ -169,7 +172,8 @@ nfsd_file_cache_prune(void)
continue;

/* Was this file touched recently? */
- if (time_before(nf->nf_time + NFSD_FILE_EXPIRE, jiffies))
+ if (time_before(nf->nf_time + nfsd_file_cache_expiry,
+ jiffies))
continue;

/* Ok, it's expired...unhash it */
@@ -193,7 +197,7 @@ nfsd_file_cache_cleaner(struct work_struct *work)

if (atomic_read(&nfsd_file_count))
queue_delayed_work(nfsd_laundry_wq, &nfsd_file_cache_clean_work,
- NFSD_FILE_EXPIRE);
+ nfsd_file_cache_expiry);
}

int
--
2.4.3


2015-08-05 21:13:51

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 11/18] nfsd: hook nfsd_commit up to the nfsd_file cache

Use cached filps if possible instead of opening a new one every time.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/vfs.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 79a881589ca7..8308d786c1e2 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1026,9 +1026,9 @@ __be32
nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
loff_t offset, unsigned long count)
{
- struct file *file;
- loff_t end = LLONG_MAX;
- __be32 err = nfserr_inval;
+ struct nfsd_file *nf;
+ loff_t end = LLONG_MAX;
+ __be32 err = nfserr_inval;

if (offset < 0)
goto out;
@@ -1038,12 +1038,12 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
goto out;
}

- err = nfsd_open(rqstp, fhp, S_IFREG,
- NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &file);
+ err = nfsd_file_acquire(rqstp, fhp,
+ NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf);
if (err)
goto out;
if (EX_ISSYNC(fhp->fh_export)) {
- int err2 = vfs_fsync_range(file, offset, end, 0);
+ int err2 = vfs_fsync_range(nf->nf_file, offset, end, 0);

if (err2 != -EINVAL)
err = nfserrno(err2);
@@ -1051,7 +1051,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
err = nfserr_notsupp;
}

- fput(file);
+ nfsd_file_put(nf);
out:
return err;
}
--
2.4.3


2015-08-05 21:13:51

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 10/18] nfsd: handle NFSD_MAY_NOT_BREAK_LEASE in open file cache

The NFSD_MAY_NOT_BREAK_LEASE flag needs special handling. If we open a
file in order to do (e.g.) a COMMIT then we don't want to break any
leases, but subsequent READ/WRITE operations must break the leases.

If we construct a new cache entry with a set of may flags that have
NFSD_MAY_NOT_BREAK_LEASE set, then set flags in the cache entry that
indicate that subsequent users of this file must break leases before
using it if they do not have NFSD_MAY_NOT_BREAK_LEASE set.

Note that because NFSD_MAY_READ opens do not break read leases, we
must track what sort of lease breaks have been done. If we're breaking
leases for read, then we still need to do a lease break for write if
it's a R/W open and a writer comes along. Lease breaks for write
however imply a read lease break so we can clear both flags in that
event.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/filecache.c | 40 ++++++++++++++++++++++++++++++++++++----
fs/nfsd/filecache.h | 2 ++
fs/nfsd/vfs.c | 3 ++-
fs/nfsd/vfs.h | 1 +
4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 084e56726318..0adcc674441d 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -68,6 +68,12 @@ nfsd_file_alloc(struct knfsd_fh *fh, unsigned int may, unsigned int hashval)
nf->nf_hashval = hashval;
atomic_set(&nf->nf_ref, 1);
nf->nf_may = NFSD_FILE_MAY_MASK & may;
+ if (may & NFSD_MAY_NOT_BREAK_LEASE) {
+ if (may & NFSD_MAY_WRITE)
+ __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+ if (may & NFSD_MAY_READ)
+ __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+ }
}
return nf;
}
@@ -269,8 +275,6 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct knfsd_fh *fh = &fhp->fh_handle;
unsigned int hashval = file_hashval(fh);

- /* Mask off any extraneous bits */
- may_flags &= NFSD_FILE_MAY_MASK;
retry:
rcu_read_lock();
nf = nfsd_file_find_locked(fh, may_flags, hashval);
@@ -311,7 +315,7 @@ wait_for_construction:
* MAY flags are equal. Otherwise, we put the reference and try
* again.
*/
- if (may_flags != nf->nf_may) {
+ if ((may_flags & NFSD_FILE_MAY_MASK) != nf->nf_may) {
nfsd_file_put(nf);
goto retry;
}
@@ -319,6 +323,18 @@ wait_for_construction:
/* try to take over construction for this file */
if (test_and_set_bit(NFSD_FILE_PENDING, &nf->nf_flags))
goto wait_for_construction;
+
+ /* sync up the BREAK_* flags with our may_flags */
+ if (may_flags & NFSD_MAY_NOT_BREAK_LEASE) {
+ if (may_flags & NFSD_MAY_WRITE)
+ set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+ if (may_flags & NFSD_MAY_READ)
+ set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+ } else {
+ clear_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+ clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+ }
+
goto open_file;
}

@@ -330,7 +346,23 @@ wait_for_construction:
status = fh_verify(rqstp, fhp, S_IFREG, may_flags);
if (status == nfs_ok)
status = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
- may_flags|NFSD_MAY_OWNER_OVERRIDE);
+ may_flags|NFSD_MAY_OWNER_OVERRIDE);
+
+ if (status == nfs_ok && !(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) {
+ bool write = (may_flags & NFSD_MAY_WRITE);
+
+ if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) ||
+ (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) {
+ status = nfserrno(nfsd_open_break_lease(
+ file_inode(nf->nf_file), may_flags));
+ if (status == nfs_ok) {
+ clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+ if (write)
+ clear_bit(NFSD_FILE_BREAK_WRITE,
+ &nf->nf_flags);
+ }
+ }
+ }
out:
if (status == nfs_ok)
*pnf = nf;
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 98976f71caa8..b8a59d615729 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -31,6 +31,8 @@ struct nfsd_file {
unsigned long nf_time;
#define NFSD_FILE_HASHED (0)
#define NFSD_FILE_PENDING (1)
+#define NFSD_FILE_BREAK_READ (2)
+#define NFSD_FILE_BREAK_WRITE (3)
unsigned long nf_flags;
struct knfsd_fh nf_handle;
unsigned int nf_hashval;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index fd688c86af66..79a881589ca7 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -618,7 +618,8 @@ nfsd_access(struct svc_rqst *rqstp, struct svc_fh *fhp, u32 *access, u32 *suppor
}
#endif /* CONFIG_NFSD_V3 */

-static int nfsd_open_break_lease(struct inode *inode, int access)
+int
+nfsd_open_break_lease(struct inode *inode, int access)
{
unsigned int mode;

diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 78b5527cba93..a3ec59830297 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -69,6 +69,7 @@ __be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *,
__be32 nfsd_commit(struct svc_rqst *, struct svc_fh *,
loff_t, unsigned long);
#endif /* CONFIG_NFSD_V3 */
+int nfsd_open_break_lease(struct inode *, int);
__be32 nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
int, struct file **);
struct raparms;
--
2.4.3


2015-08-05 21:13:49

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 08/18] nfsd: add a ->flush routine to svc_export_cache

...to purge the nfsd_file table. This should help ensure that
filesystems are unmountable very soon after unexporting them.

Note that we take the nfsd_mutex in this code to ensure that we can't
race with a concurrent shutdown of nfsd that might destroy the cache.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/export.c | 14 ++++++++++++++
fs/nfsd/filecache.c | 39 +++++++++++++++++++++++++--------------
fs/nfsd/filecache.h | 1 +
3 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index f79521a59747..be80ff370072 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -21,6 +21,7 @@
#include "nfsfh.h"
#include "netns.h"
#include "pnfs.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_EXPORT

@@ -231,6 +232,18 @@ static struct cache_head *expkey_alloc(void)
return NULL;
}

+static void
+expkey_flush(void)
+{
+ /*
+ * Take the nfsd_mutex here to ensure that the file cache is not
+ * destroyed while we're in the middle of flushing.
+ */
+ mutex_lock(&nfsd_mutex);
+ nfsd_file_cache_purge();
+ mutex_unlock(&nfsd_mutex);
+}
+
static struct cache_detail svc_expkey_cache_template = {
.owner = THIS_MODULE,
.hash_size = EXPKEY_HASHMAX,
@@ -243,6 +256,7 @@ static struct cache_detail svc_expkey_cache_template = {
.init = expkey_init,
.update = expkey_update,
.alloc = expkey_alloc,
+ .flush = expkey_flush,
};

static int
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 5278b8d9e79a..a3782cf7b352 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -124,6 +124,30 @@ nfsd_file_dispose_list(struct list_head *dispose)
}
}

+void
+nfsd_file_cache_purge(void)
+{
+ unsigned int i;
+ struct nfsd_file *nf;
+ LIST_HEAD(dispose);
+
+ if (!atomic_read(&nfsd_file_count))
+ return;
+
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ spin_lock(&nfsd_file_hashtbl[i].nfb_lock);
+ while(!hlist_empty(&nfsd_file_hashtbl[i].nfb_head)) {
+ nf = hlist_entry(nfsd_file_hashtbl[i].nfb_head.first,
+ struct nfsd_file, nf_node);
+ nfsd_file_unhash(nf);
+ /* put the hash reference */
+ nfsd_file_put_locked(nf, &dispose);
+ }
+ spin_unlock(&nfsd_file_hashtbl[i].nfb_lock);
+ nfsd_file_dispose_list(&dispose);
+ }
+}
+
static void
nfsd_file_cache_prune(void)
{
@@ -200,23 +224,10 @@ out_nomem:
void
nfsd_file_cache_shutdown(void)
{
- unsigned int i;
- struct nfsd_file *nf;
LIST_HEAD(dispose);

cancel_delayed_work_sync(&nfsd_file_cache_clean_work);
- for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
- spin_lock(&nfsd_file_hashtbl[i].nfb_lock);
- while(!hlist_empty(&nfsd_file_hashtbl[i].nfb_head)) {
- nf = hlist_entry(nfsd_file_hashtbl[i].nfb_head.first,
- struct nfsd_file, nf_node);
- nfsd_file_unhash(nf);
- /* put the hash reference */
- nfsd_file_put_locked(nf, &dispose);
- }
- spin_unlock(&nfsd_file_hashtbl[i].nfb_lock);
- nfsd_file_dispose_list(&dispose);
- }
+ nfsd_file_cache_purge();
kfree(nfsd_file_hashtbl);
nfsd_file_hashtbl = NULL;
}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index adf7e78b8e43..98976f71caa8 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -39,6 +39,7 @@ struct nfsd_file {
};

int nfsd_file_cache_init(void);
+void nfsd_file_cache_purge(void);
void nfsd_file_cache_shutdown(void);
void nfsd_file_put(struct nfsd_file *nf);
__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
--
2.4.3


2015-08-05 21:13:54

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 14/18] nfsd: have _fh_update take a knfsd_fh instead of a svc_fh

This also mandates that we pass in the fh_maxsize parm as a separate
value.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfsfh.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 350041a40fe5..a945500db17c 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -388,20 +388,20 @@ out:
* an inode. In this case a call to fh_update should be made
* before the fh goes out on the wire ...
*/
-static void _fh_update(struct svc_fh *fhp, struct svc_export *exp,
- struct dentry *dentry)
+static void _fh_update(struct knfsd_fh *kfh, int fh_maxsize,
+ struct svc_export *exp, struct dentry *dentry)
{
if (dentry != exp->ex_path.dentry) {
- struct fid *fid = (struct fid *)
- (fhp->fh_handle.fh_fsid + fhp->fh_handle.fh_size/4 - 1);
- int maxsize = (fhp->fh_maxsize - fhp->fh_handle.fh_size)/4;
+ struct fid *fid =
+ (struct fid *)(kfh->fh_fsid + kfh->fh_size/4 - 1);
+ int maxsize = (fh_maxsize - kfh->fh_size)/4;
int subtreecheck = !(exp->ex_flags & NFSEXP_NOSUBTREECHECK);

- fhp->fh_handle.fh_fileid_type =
+ kfh->fh_fileid_type =
exportfs_encode_fh(dentry, fid, &maxsize, subtreecheck);
- fhp->fh_handle.fh_size += maxsize * 4;
+ kfh->fh_size += maxsize * 4;
} else {
- fhp->fh_handle.fh_fileid_type = FILEID_ROOT;
+ kfh->fh_fileid_type = FILEID_ROOT;
}
}

@@ -574,7 +574,8 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
exp->ex_fsid, exp->ex_uuid);

if (inode)
- _fh_update(fhp, exp, dentry);
+ _fh_update(&fhp->fh_handle, fhp->fh_maxsize,
+ exp, dentry);
if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) {
fh_put(fhp);
return nfserr_opnotsupp;
@@ -605,7 +606,8 @@ fh_update(struct svc_fh *fhp)
if (fhp->fh_handle.fh_fileid_type != FILEID_ROOT)
return 0;

- _fh_update(fhp, fhp->fh_export, dentry);
+ _fh_update(&fhp->fh_handle, fhp->fh_maxsize,
+ fhp->fh_export, dentry);
if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID)
return nfserr_opnotsupp;
}
--
2.4.3


2015-08-05 21:13:54

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 15/18] nfsd: have set_version_and_fsid_type take a knfsd_fh instead of svc_fh

Note however that we do still have it take a svc_fh for the ref_fh, but
that could be changed in the future if it's helpful to do so.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfsfh.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index a945500db17c..b601f291d825 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -454,7 +454,9 @@ static bool fsid_type_ok_for_exp(u8 fsid_type, struct svc_export *exp)
}


-static void set_version_and_fsid_type(struct svc_fh *fhp, struct svc_export *exp, struct svc_fh *ref_fh)
+static void
+set_version_and_fsid_type(struct knfsd_fh *kfh, int maxsize,
+ struct svc_export *exp, struct svc_fh *ref_fh)
{
u8 version;
u8 fsid_type;
@@ -487,7 +489,7 @@ retry:
} else if (exp->ex_flags & NFSEXP_FSID) {
fsid_type = FSID_NUM;
} else if (exp->ex_uuid) {
- if (fhp->fh_maxsize >= 64) {
+ if (maxsize >= 64) {
if (is_root_export(exp))
fsid_type = FSID_UUID16;
else
@@ -503,9 +505,9 @@ retry:
fsid_type = FSID_ENCODE_DEV;
else
fsid_type = FSID_DEV;
- fhp->fh_handle.fh_version = version;
+ kfh->fh_version = version;
if (version)
- fhp->fh_handle.fh_fsid_type = fsid_type;
+ kfh->fh_fsid_type = fsid_type;
}

__be32
@@ -533,7 +535,8 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
* the reference filehandle (if it is in the same export)
* or the export options.
*/
- set_version_and_fsid_type(fhp, exp, ref_fh);
+ set_version_and_fsid_type(&fhp->fh_handle, fhp->fh_maxsize,
+ exp, ref_fh);

if (ref_fh == fhp)
fh_put(ref_fh);
--
2.4.3


2015-08-05 21:13:55

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 16/18] nfsd: add a fh_compose_shallow

In a later patch, we'll need to be able to cook up a knfsd_fh from a
dentry/export combo. Usually nfsd works with svc_fh's, but that takes
a lot of extra baggage that we don't really need here.

Add a routine that encodes the filehandle directly into a knfsd_fh
instead. Much like we have a fh_copy_shallow, the new routine is called
fh_compose_shallow.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfsfh.c | 112 +++++++++++++++++++++++++++++---------------------------
fs/nfsd/nfsfh.h | 2 +
2 files changed, 60 insertions(+), 54 deletions(-)

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index b601f291d825..ee6d4b746467 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -511,32 +511,64 @@ retry:
}

__be32
-fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
- struct svc_fh *ref_fh)
+fh_compose_shallow(struct knfsd_fh *kfh, int maxsize, struct svc_export *exp,
+ struct dentry *dentry, struct svc_fh *ref_fh)
{
- /* ref_fh is a reference file handle.
- * if it is non-null and for the same filesystem, then we should compose
- * a filehandle which is of the same version, where possible.
- * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
- * Then create a 32byte filehandle using nfs_fhbase_old
- *
- */
+ struct inode *inode = d_inode(dentry);
+ dev_t ex_dev = exp_sb(exp)->s_dev;

- struct inode * inode = d_inode(dentry);
- dev_t ex_dev = exp_sb(exp)->s_dev;
-
- dprintk("nfsd: fh_compose(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
- MAJOR(ex_dev), MINOR(ex_dev),
+ dprintk("nfsd: %s(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
+ __func__, MAJOR(ex_dev), MINOR(ex_dev),
(long) d_inode(exp->ex_path.dentry)->i_ino,
- dentry,
- (inode ? inode->i_ino : 0));
+ dentry, (inode ? inode->i_ino : 0));

- /* Choose filehandle version and fsid type based on
- * the reference filehandle (if it is in the same export)
- * or the export options.
+ /*
+ * Choose filehandle version and fsid type based on the reference
+ * filehandle (if it is in the same export) or the export options.
*/
- set_version_and_fsid_type(&fhp->fh_handle, fhp->fh_maxsize,
- exp, ref_fh);
+ set_version_and_fsid_type(kfh, maxsize, exp, ref_fh);
+ if (kfh->fh_version == 0xca) {
+ /* old style filehandle please */
+ memset(&kfh->fh_base, 0, NFS_FHSIZE);
+ kfh->fh_size = NFS_FHSIZE;
+ kfh->ofh_dcookie = 0xfeebbaca;
+ kfh->ofh_dev = old_encode_dev(ex_dev);
+ kfh->ofh_xdev = kfh->ofh_dev;
+ kfh->ofh_xino =
+ ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
+ kfh->ofh_dirino = ino_t_to_u32(parent_ino(dentry));
+ if (inode)
+ _fh_update_old(dentry, exp, kfh);
+ } else {
+ kfh->fh_size = key_len(kfh->fh_fsid_type) + 4;
+ kfh->fh_auth_type = 0;
+
+ mk_fsid(kfh->fh_fsid_type, kfh->fh_fsid, ex_dev,
+ d_inode(exp->ex_path.dentry)->i_ino,
+ exp->ex_fsid, exp->ex_uuid);
+
+ if (inode)
+ _fh_update(kfh, maxsize, exp, dentry);
+
+ if (kfh->fh_fileid_type == FILEID_INVALID)
+ return nfserr_opnotsupp;
+ }
+ return nfs_ok;
+}
+
+/*
+ * ref_fh is a reference file handle.
+ * if it is non-null and for the same filesystem, then we should compose
+ * a filehandle which is of the same version, where possible.
+ * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
+ * Then create a 32byte filehandle using nfs_fhbase_old
+ *
+ */
+__be32
+fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
+ struct svc_fh *ref_fh)
+{
+ __be32 status;

if (ref_fh == fhp)
fh_put(ref_fh);
@@ -553,39 +585,11 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
fhp->fh_dentry = dget(dentry); /* our internal copy */
fhp->fh_export = exp_get(exp);

- if (fhp->fh_handle.fh_version == 0xca) {
- /* old style filehandle please */
- memset(&fhp->fh_handle.fh_base, 0, NFS_FHSIZE);
- fhp->fh_handle.fh_size = NFS_FHSIZE;
- fhp->fh_handle.ofh_dcookie = 0xfeebbaca;
- fhp->fh_handle.ofh_dev = old_encode_dev(ex_dev);
- fhp->fh_handle.ofh_xdev = fhp->fh_handle.ofh_dev;
- fhp->fh_handle.ofh_xino =
- ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
- fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(dentry));
- if (inode)
- _fh_update_old(dentry, exp, &fhp->fh_handle);
- } else {
- fhp->fh_handle.fh_size =
- key_len(fhp->fh_handle.fh_fsid_type) + 4;
- fhp->fh_handle.fh_auth_type = 0;
-
- mk_fsid(fhp->fh_handle.fh_fsid_type,
- fhp->fh_handle.fh_fsid,
- ex_dev,
- d_inode(exp->ex_path.dentry)->i_ino,
- exp->ex_fsid, exp->ex_uuid);
-
- if (inode)
- _fh_update(&fhp->fh_handle, fhp->fh_maxsize,
- exp, dentry);
- if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) {
- fh_put(fhp);
- return nfserr_opnotsupp;
- }
- }
-
- return 0;
+ status = fh_compose_shallow(&fhp->fh_handle, fhp->fh_maxsize, exp,
+ dentry, ref_fh);
+ if (status != nfs_ok)
+ fh_put(fhp);
+ return status;
}

/*
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 1e90dad4926b..dec472131588 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -159,6 +159,8 @@ extern char * SVCFH_fmt(struct svc_fh *fhp);
* Function prototypes
*/
__be32 fh_verify(struct svc_rqst *, struct svc_fh *, umode_t, int);
+__be32 fh_compose_shallow(struct knfsd_fh *, int, struct svc_export *,
+ struct dentry *, struct svc_fh *);
__be32 fh_compose(struct svc_fh *, struct svc_export *, struct dentry *, struct svc_fh *);
__be32 fh_update(struct svc_fh *);
void fh_put(struct svc_fh *);
--
2.4.3


2015-08-05 21:13:56

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 17/18] nfsd: close cached files prior to a REMOVE or RENAME that would replace target

It's not uncommon for some workloads to do a bunch of I/O to a file and
delete it just afterward. If knfsd has a cached open file however, then
the file may still be open when the dentry is unlinked. If the
underlying filesystem is nfs, then that could trigger it to do a
sillyrename.

On a REMOVE or RENAME scan the nfsd_file cache for open files that
correspond to the inode, and proactively unhash and put their
references. This should prevent any delete-on-last-close activity from
occurring, solely due to knfsd's open file cache.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/filecache.c | 27 +++++++++++++++++++++++++++
fs/nfsd/filecache.h | 1 +
fs/nfsd/trace.h | 17 +++++++++++++++++
fs/nfsd/vfs.c | 23 +++++++++++++++++++++--
4 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 44dee985864b..a114f3a69117 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -272,6 +272,33 @@ nfsd_file_find_locked(struct knfsd_fh *fh, unsigned int may_flags,
return NULL;
}

+/**
+ * nfsd_file_force_close - attempt to forcibly close a nfsd_file
+ * @fh: filehandle of the file to attempt to remove
+ *
+ * Walk the whole hash bucket, looking for any files that correspond to "fh".
+ * If any do, then unhash them and put the hashtable reference to them.
+ */
+void
+nfsd_file_close_fh(struct knfsd_fh *fh)
+{
+ struct nfsd_file *nf;
+ struct hlist_node *tmp;
+ unsigned int hashval = file_hashval(fh);
+ LIST_HEAD(dispose);
+
+ spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) {
+ if (fh_match(&nf->nf_handle, fh)) {
+ nfsd_file_unhash(nf);
+ nfsd_file_put_locked(nf, &dispose);
+ }
+ }
+ spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ trace_nfsd_file_close_fh(hashval, fh, !list_empty(&dispose));
+ nfsd_file_dispose_list(&dispose);
+}
+
__be32
nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int may_flags, struct nfsd_file **pnf)
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index b8a59d615729..40c47cac2c33 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -44,6 +44,7 @@ int nfsd_file_cache_init(void);
void nfsd_file_cache_purge(void);
void nfsd_file_cache_shutdown(void);
void nfsd_file_put(struct nfsd_file *nf);
+void nfsd_file_close_fh(struct knfsd_fh *fh);
__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int may_flags, struct nfsd_file **nfp);
#endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index ff26d91f8106..c45d318e3a0b 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -147,6 +147,23 @@ TRACE_EVENT(nfsd_file_acquire,
show_nf_may(__entry->nf_may), __entry->nf_time,
__entry->nf_file, be32_to_cpu(__entry->status))
);
+
+TRACE_EVENT(nfsd_file_close_fh,
+ TP_PROTO(unsigned int hash, struct knfsd_fh *handle, int found),
+ TP_ARGS(hash, handle, found),
+ TP_STRUCT__entry(
+ __field(unsigned int, hash)
+ __field_struct(struct knfsd_fh, handle)
+ __field(int, found)
+ ),
+ TP_fast_assign(
+ __entry->hash = hash;
+ __entry->handle = *handle;
+ __entry->found = found;
+ ),
+ TP_printk("hash=0x%x handle=%s found=%d", __entry->hash,
+ show_nf_fh(__entry->handle), __entry->found)
+);
#endif /* _NFSD_TRACE_H */

#undef TRACE_INCLUDE_PATH
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 8308d786c1e2..c70c6fbec4d7 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1581,6 +1581,21 @@ out_nfserr:
goto out_unlock;
}

+static __be32
+nfsd_close_cached_files(struct dentry *dentry, struct svc_fh *fhp)
+{
+ __be32 err = nfs_ok;
+ struct knfsd_fh kfh = { 0 };
+
+ if (d_really_is_positive(dentry) && S_ISREG(d_inode(dentry)->i_mode)) {
+ err = fh_compose_shallow(&kfh, fhp->fh_maxsize,
+ fhp->fh_export, dentry, fhp);
+ if (err == nfs_ok)
+ nfsd_file_close_fh(&kfh);
+ }
+ return err;
+}
+
/*
* Rename a file
* N.B. After this call _both_ ffhp and tfhp need an fh_put
@@ -1650,6 +1665,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
if (ffhp->fh_export->ex_path.dentry != tfhp->fh_export->ex_path.dentry)
goto out_dput_new;

+ nfsd_close_cached_files(ndentry, tfhp);
host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
if (!host_err) {
host_err = commit_metadata(tfhp);
@@ -1719,10 +1735,13 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
if (!type)
type = d_inode(rdentry)->i_mode & S_IFMT;

- if (type != S_IFDIR)
+ if (type != S_IFDIR) {
+ nfsd_close_cached_files(rdentry, fhp);
host_err = vfs_unlink(dirp, rdentry, NULL);
- else
+ } else {
host_err = vfs_rmdir(dirp, rdentry);
+ }
+
if (!host_err)
host_err = commit_metadata(fhp);
dput(rdentry);
--
2.4.3


2015-08-05 21:13:52

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 12/18] nfsd: move include of state.h from trace.c to trace.h

Any file which includes trace.h will need to include state.h, even if
they aren't using any state tracepoints. Ensure that we include any
headers that might be needed in trace.h instead of relying on the
*.c files to have the right ones.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/trace.c | 2 --
fs/nfsd/trace.h | 2 ++
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/trace.c b/fs/nfsd/trace.c
index 82f89070594c..90967466a1e5 100644
--- a/fs/nfsd/trace.c
+++ b/fs/nfsd/trace.c
@@ -1,5 +1,3 @@

-#include "state.h"
-
#define CREATE_TRACE_POINTS
#include "trace.h"
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index c668520c344b..0befe762762b 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -9,6 +9,8 @@

#include <linux/tracepoint.h>

+#include "state.h"
+
DECLARE_EVENT_CLASS(nfsd_stateid_class,
TP_PROTO(stateid_t *stp),
TP_ARGS(stp),
--
2.4.3


2015-08-05 21:13:57

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 18/18] nfsd: call flush_delayed_fput from nfsd_file_close_fh

...when there are open files to be closed.

When knfsd does an fput(), it gets queued to a list and a workqueue job
is then scheduled to do the actual __fput work. In the case of knfsd
closing down the file prior to a REMOVE or RENAME, we really want to
ensure that those files are closed prior to returning. When there are
files to be closed, call flush_delayed_fput to ensure this.

There are deadlock possibilities if you call flush_delayed_fput while
holding locks, however. In the case of nfsd_rename, we don't even do the
lookups of the dentries to be renamed until we've locked for rename.

Once we've figured out what the target dentry is for a rename, check to
see whether there are cached open files associated with it. If there
are, then unwind all of the locking, close them all, and then reattempt
the rename.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/file_table.c | 1 +
fs/nfsd/filecache.c | 33 ++++++++++++++++++++++++++++++++-
fs/nfsd/filecache.h | 1 +
fs/nfsd/trace.h | 10 +++++++++-
fs/nfsd/vfs.c | 52 ++++++++++++++++++++++++++++++++++++++++++++--------
5 files changed, 87 insertions(+), 10 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 7f9d407c7595..33898e72618c 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -257,6 +257,7 @@ void flush_delayed_fput(void)
{
delayed_fput(NULL);
}
+EXPORT_SYMBOL_GPL(flush_delayed_fput);

static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index a114f3a69117..ce8b8e4f7cf3 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -273,6 +273,34 @@ nfsd_file_find_locked(struct knfsd_fh *fh, unsigned int may_flags,
}

/**
+ * nfsd_file_is_cached - are there any cached open files for this fh?
+ * @fh: filehandle of the file to check
+ *
+ * Scan the hashtable for open files that match this fh. Returns true if there
+ * are any, and false if not.
+ */
+bool
+nfsd_file_is_cached(struct knfsd_fh *fh)
+{
+ bool ret = false;
+ struct nfsd_file *nf;
+ unsigned int hashval = file_hashval(fh);
+
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head,
+ nf_node) {
+ if (fh_match(&nf->nf_handle, fh)) {
+ ret = true;
+ break;
+ }
+ }
+ rcu_read_unlock();
+ trace_nfsd_file_is_cached(hashval, fh, (int)ret);
+ return ret;
+}
+
+
+/**
* nfsd_file_force_close - attempt to forcibly close a nfsd_file
* @fh: filehandle of the file to attempt to remove
*
@@ -296,7 +324,10 @@ nfsd_file_close_fh(struct knfsd_fh *fh)
}
spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
trace_nfsd_file_close_fh(hashval, fh, !list_empty(&dispose));
- nfsd_file_dispose_list(&dispose);
+ if (!list_empty(&dispose)) {
+ nfsd_file_dispose_list(&dispose);
+ flush_delayed_fput();
+ }
}

__be32
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 40c47cac2c33..2a347e16f095 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -44,6 +44,7 @@ int nfsd_file_cache_init(void);
void nfsd_file_cache_purge(void);
void nfsd_file_cache_shutdown(void);
void nfsd_file_put(struct nfsd_file *nf);
+bool nfsd_file_is_cached(struct knfsd_fh *fh);
void nfsd_file_close_fh(struct knfsd_fh *fh);
__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int may_flags, struct nfsd_file **nfp);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index c45d318e3a0b..150ce332fbd7 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -148,7 +148,7 @@ TRACE_EVENT(nfsd_file_acquire,
__entry->nf_file, be32_to_cpu(__entry->status))
);

-TRACE_EVENT(nfsd_file_close_fh,
+DECLARE_EVENT_CLASS(nfsd_file_search_class,
TP_PROTO(unsigned int hash, struct knfsd_fh *handle, int found),
TP_ARGS(hash, handle, found),
TP_STRUCT__entry(
@@ -164,6 +164,14 @@ TRACE_EVENT(nfsd_file_close_fh,
TP_printk("hash=0x%x handle=%s found=%d", __entry->hash,
show_nf_fh(__entry->handle), __entry->found)
);
+
+#define DEFINE_NFSD_FILE_SEARCH_EVENT(name) \
+DEFINE_EVENT(nfsd_file_search_class, name, \
+ TP_PROTO(unsigned int hash, struct knfsd_fh *handle, int found), \
+ TP_ARGS(hash, handle, found))
+
+DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_fh);
+DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_is_cached);
#endif /* _NFSD_TRACE_H */

#undef TRACE_INCLUDE_PATH
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c70c6fbec4d7..464e524e8f6e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1596,6 +1596,22 @@ nfsd_close_cached_files(struct dentry *dentry, struct svc_fh *fhp)
return err;
}

+static bool
+nfsd_has_cached_files(struct dentry *dentry, struct svc_fh *fhp)
+{
+ __be32 err;
+ bool ret = false;
+ struct knfsd_fh kfh = { 0 };
+
+ if (d_really_is_positive(dentry) && S_ISREG(d_inode(dentry)->i_mode)) {
+ err = fh_compose_shallow(&kfh, fhp->fh_maxsize,
+ fhp->fh_export, dentry, fhp);
+ if (err == nfs_ok)
+ ret = nfsd_file_is_cached(&kfh);
+ }
+ return ret;
+}
+
/*
* Rename a file
* N.B. After this call _both_ ffhp and tfhp need an fh_put
@@ -1608,6 +1624,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
struct inode *fdir, *tdir;
__be32 err;
int host_err;
+ bool has_cached = false;

err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
if (err)
@@ -1626,6 +1643,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
if (!flen || isdotent(fname, flen) || !tlen || isdotent(tname, tlen))
goto out;

+retry:
host_err = fh_want_write(ffhp);
if (host_err) {
err = nfserrno(host_err);
@@ -1665,12 +1683,16 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
if (ffhp->fh_export->ex_path.dentry != tfhp->fh_export->ex_path.dentry)
goto out_dput_new;

- nfsd_close_cached_files(ndentry, tfhp);
- host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
- if (!host_err) {
- host_err = commit_metadata(tfhp);
- if (!host_err)
- host_err = commit_metadata(ffhp);
+ if (nfsd_has_cached_files(ndentry, tfhp)) {
+ has_cached = true;
+ goto out_dput_old;
+ } else {
+ host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
+ if (!host_err) {
+ host_err = commit_metadata(tfhp);
+ if (!host_err)
+ host_err = commit_metadata(ffhp);
+ }
}
out_dput_new:
dput(ndentry);
@@ -1683,12 +1705,26 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
* as that would do the wrong thing if the two directories
* were the same, so again we do it by hand.
*/
- fill_post_wcc(ffhp);
- fill_post_wcc(tfhp);
+ if (!has_cached) {
+ fill_post_wcc(ffhp);
+ fill_post_wcc(tfhp);
+ }
unlock_rename(tdentry, fdentry);
ffhp->fh_locked = tfhp->fh_locked = 0;
fh_drop_write(ffhp);

+ /*
+ * If the target dentry has cached open files, then we need to try to
+ * close them prior to doing the rename. Flushing delayed fput
+ * shouldn't be done with locks held however, so we delay it until this
+ * point and then reattempt the whole shebang.
+ */
+ if (has_cached) {
+ has_cached = false;
+ nfsd_close_cached_files(ndentry, tfhp);
+ dput(ndentry);
+ goto retry;
+ }
out:
return err;
}
--
2.4.3


2015-08-05 21:13:53

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 13/18] nfsd: add new tracepoints for nfsd_file cache

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/filecache.c | 19 ++++++++--
fs/nfsd/trace.h | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 0adcc674441d..44dee985864b 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -14,6 +14,7 @@
#include "nfsd.h"
#include "nfsfh.h"
#include "filecache.h"
+#include "trace.h"

#define NFSDDBG_FACILITY NFSDDBG_FH

@@ -74,6 +75,7 @@ nfsd_file_alloc(struct knfsd_fh *fh, unsigned int may, unsigned int hashval)
if (may & NFSD_MAY_READ)
__set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
}
+ trace_nfsd_file_alloc(nf);
}
return nf;
}
@@ -81,6 +83,7 @@ nfsd_file_alloc(struct knfsd_fh *fh, unsigned int may, unsigned int hashval)
static void
nfsd_file_put_final(struct nfsd_file *nf)
{
+ trace_nfsd_file_put_final(nf);
if (nf->nf_file)
fput(nf->nf_file);
kfree_rcu(nf, nf_rcu);
@@ -89,6 +92,7 @@ nfsd_file_put_final(struct nfsd_file *nf)
static void
nfsd_file_unhash(struct nfsd_file *nf)
{
+ trace_nfsd_file_unhash(nf);
if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
hlist_del_rcu(&nf->nf_node);
nfsd_file_count_dec();
@@ -98,6 +102,7 @@ nfsd_file_unhash(struct nfsd_file *nf)
static void
nfsd_file_put_locked(struct nfsd_file *nf, struct list_head *dispose)
{
+ trace_nfsd_file_put_locked(nf);
if (!atomic_dec_and_test(&nf->nf_ref)) {
nf->nf_time = jiffies;
return;
@@ -110,6 +115,7 @@ nfsd_file_put_locked(struct nfsd_file *nf, struct list_head *dispose)
void
nfsd_file_put(struct nfsd_file *nf)
{
+ trace_nfsd_file_put(nf);
if (!atomic_dec_and_lock(&nf->nf_ref,
&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock)) {
nf->nf_time = jiffies;
@@ -284,8 +290,11 @@ retry:

if (!new) {
new = nfsd_file_alloc(&fhp->fh_handle, may_flags, hashval);
- if (!new)
+ if (!new) {
+ trace_nfsd_file_acquire(hashval, fh, may_flags, NULL,
+ nfserr_jukebox);
return nfserr_jukebox;
+ }
}

spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
@@ -364,13 +373,17 @@ wait_for_construction:
}
}
out:
- if (status == nfs_ok)
+ if (status == nfs_ok) {
*pnf = nf;
- else
+ } else {
nfsd_file_put(nf);
+ nf = NULL;
+ }

if (new)
nfsd_file_put(new);
+
+ trace_nfsd_file_acquire(hashval, fh, may_flags, nf, status);
return status;
open_file:
status = nfsd_open(rqstp, fhp, S_IFREG, may_flags, &nf->nf_file);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 0befe762762b..ff26d91f8106 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -10,6 +10,8 @@
#include <linux/tracepoint.h>

#include "state.h"
+#include "filecache.h"
+#include "vfs.h"

DECLARE_EVENT_CLASS(nfsd_stateid_class,
TP_PROTO(stateid_t *stp),
@@ -48,6 +50,103 @@ DEFINE_STATEID_EVENT(layout_recall_done);
DEFINE_STATEID_EVENT(layout_recall_fail);
DEFINE_STATEID_EVENT(layout_recall_release);

+#define show_nf_flags(val) \
+ __print_flags(val, "|", \
+ { 1 << NFSD_FILE_HASHED, "HASHED" }, \
+ { 1 << NFSD_FILE_PENDING, "PENDING" }, \
+ { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \
+ { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" })
+
+/* FIXME: This should probably be fleshed out in the future. */
+#define show_nf_may(val) \
+ __print_flags(val, "|", \
+ { NFSD_MAY_READ, "READ" }, \
+ { NFSD_MAY_WRITE, "WRITE" }, \
+ { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" })
+
+#define show_nf_fh(fh) \
+ __print_hex((char *)&fh.fh_base, fh.fh_size)
+
+DECLARE_EVENT_CLASS(nfsd_file_class,
+ TP_PROTO(struct nfsd_file *nf),
+ TP_ARGS(nf),
+ TP_STRUCT__entry(
+ __field(unsigned int, nf_hashval)
+ __field_struct(struct knfsd_fh, nf_handle)
+ __field(int, nf_ref)
+ __field(unsigned long, nf_flags)
+ __field(unsigned char, nf_may)
+ __field(unsigned long, nf_time)
+ __field(struct file *, nf_file)
+ ),
+ TP_fast_assign(
+ __entry->nf_hashval = nf->nf_hashval;
+ __entry->nf_handle = nf->nf_handle;
+ __entry->nf_ref = atomic_read(&nf->nf_ref);
+ __entry->nf_flags = nf->nf_flags;
+ __entry->nf_may = nf->nf_may;
+ __entry->nf_time = nf->nf_time;
+ __entry->nf_file = nf->nf_file;
+ ),
+ TP_printk("hash=0x%x handle=%s ref=%d flags=%s may=%s time=%lu file=%p",
+ __entry->nf_hashval,
+ show_nf_fh(__entry->nf_handle),
+ __entry->nf_ref,
+ show_nf_flags(__entry->nf_flags),
+ show_nf_may(__entry->nf_may),
+ __entry->nf_time,
+ __entry->nf_file)
+)
+
+#define DEFINE_NFSD_FILE_EVENT(name) \
+DEFINE_EVENT(nfsd_file_class, name, \
+ TP_PROTO(struct nfsd_file *nf), \
+ TP_ARGS(nf))
+
+DEFINE_NFSD_FILE_EVENT(nfsd_file_alloc);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_put);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_put_locked);
+
+TRACE_EVENT(nfsd_file_acquire,
+ TP_PROTO(unsigned int hash, struct knfsd_fh *handle,
+ unsigned int may_flags, struct nfsd_file *nf,
+ __be32 status),
+
+ TP_ARGS(hash, handle, may_flags, nf, status),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, hash)
+ __field_struct(struct knfsd_fh, handle)
+ __field(unsigned int, may_flags)
+ __field(int, nf_ref)
+ __field(unsigned long, nf_flags)
+ __field(unsigned char, nf_may)
+ __field(unsigned long, nf_time)
+ __field(struct file *, nf_file)
+ __field(__be32, status)
+ ),
+
+ TP_fast_assign(
+ __entry->hash = hash;
+ __entry->handle = *handle;
+ __entry->may_flags = may_flags;
+ __entry->nf_ref = nf ? atomic_read(&nf->nf_ref) : 0;
+ __entry->nf_flags = nf ? nf->nf_flags : 0;
+ __entry->nf_may = nf ? nf->nf_may : 0;
+ __entry->nf_time = nf ? nf->nf_time : 0;
+ __entry->nf_file = nf ? nf->nf_file : NULL;
+ __entry->status = status;
+ ),
+
+ TP_printk("hash=0x%x handle=%s may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_time=%lu nf_file=0x%p status=%u",
+ __entry->hash, show_nf_fh(__entry->handle),
+ show_nf_may(__entry->may_flags), __entry->nf_ref,
+ show_nf_flags(__entry->nf_flags),
+ show_nf_may(__entry->nf_may), __entry->nf_time,
+ __entry->nf_file, be32_to_cpu(__entry->status))
+);
#endif /* _NFSD_TRACE_H */

#undef TRACE_INCLUDE_PATH
--
2.4.3


2015-08-07 15:25:58

by Kinglong Mee

[permalink] [raw]
Subject: Re: [PATCH v2 02/18] nfsd: move some file caching helpers to a common header

On 8/6/2015 05:13, Jeff Layton wrote:
> We'll want to reuse some of this for common open file caching
> infrastructure.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfsd/filecache.h | 25 +++++++++++++++++++++++++
> fs/nfsd/nfs4state.c | 17 ++---------------
> 2 files changed, 27 insertions(+), 15 deletions(-)
> create mode 100644 fs/nfsd/filecache.h
>
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> new file mode 100644
> index 000000000000..9051ee54faa3
> --- /dev/null
> +++ b/fs/nfsd/filecache.h
> @@ -0,0 +1,25 @@
> +#ifndef _FS_NFSD_FILECACHE_H
> +#define _FS_NFSD_FILECACHE_H
> +
> +#include <linux/jhash.h>
> +#include <linux/sunrpc/xdr.h>
> +
> +#include "export.h"
> +
> +/* hash table for nfs4_file */
> +#define NFSD_FILE_HASH_BITS 8
> +#define NFSD_FILE_HASH_SIZE (1 << NFSD_FILE_HASH_BITS)
> +
> +static inline unsigned int
> +nfsd_fh_hashval(struct knfsd_fh *fh)
> +{
> + return jhash2(fh->fh_base.fh_pad, XDR_QUADLEN(fh->fh_size), 0);
> +}
> +
> +static inline unsigned int
> +file_hashval(struct knfsd_fh *fh)

Update to nfsd_file_hashval maybe better ?

thanks,
Kinglong Mee

> +{
> + return nfsd_fh_hashval(fh) & (NFSD_FILE_HASH_SIZE - 1);
> +}
> +
> +#endif /* _FS_NFSD_FILECACHE_H */
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 43903abd6189..b3b306bd1830 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -49,6 +49,7 @@
>
> #include "netns.h"
> #include "pnfs.h"
> +#include "filecache.h"
>
> #define NFSDDBG_FACILITY NFSDDBG_PROC
>
> @@ -381,21 +382,7 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername)
> return ret & OWNER_HASH_MASK;
> }
>
> -/* hash table for nfs4_file */
> -#define FILE_HASH_BITS 8
> -#define FILE_HASH_SIZE (1 << FILE_HASH_BITS)
> -
> -static unsigned int nfsd_fh_hashval(struct knfsd_fh *fh)
> -{
> - return jhash2(fh->fh_base.fh_pad, XDR_QUADLEN(fh->fh_size), 0);
> -}
> -
> -static unsigned int file_hashval(struct knfsd_fh *fh)
> -{
> - return nfsd_fh_hashval(fh) & (FILE_HASH_SIZE - 1);
> -}
> -
> -static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
> +static struct hlist_head file_hashtbl[NFSD_FILE_HASH_SIZE];
>
> static void
> __nfs4_file_get_access(struct nfs4_file *fp, u32 access)
>

2015-08-07 15:26:31

by Kinglong Mee

[permalink] [raw]
Subject: Re: [PATCH v2 03/18] nfsd: convert laundry_wq to something less nfsd4 specific

On 8/6/2015 05:13, Jeff Layton wrote:
> Currently, nfsd uses a singlethread workqueue for this, but that's not
> necessary. If we have multiple namespaces, then there's no need to
> serialize the laundromat runs since they are only requeued at the end of
> the work itself.
>
> Also, create_singlethread_workqueue adds the WQ_MEM_RECLAIM flag, which
> doesn't really seem to be necessary. The laundromat jobs are always
> kicked off via a timer, and not from memory reclaim paths. There's no
> need for a rescuer thread.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 23 +++++------------------
> fs/nfsd/nfsd.h | 1 +
> fs/nfsd/nfssvc.c | 14 +++++++++++++-
> 3 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index b3b306bd1830..c94859122e6f 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4366,7 +4366,6 @@ nfs4_laundromat(struct nfsd_net *nn)
> return new_timeo;
> }
>
> -static struct workqueue_struct *laundry_wq;
> static void laundromat_main(struct work_struct *);
>
> static void
> @@ -4380,7 +4379,7 @@ laundromat_main(struct work_struct *laundry)
>
> t = nfs4_laundromat(nn);
> dprintk("NFSD: laundromat_main - sleeping for %ld seconds\n", t);
> - queue_delayed_work(laundry_wq, &nn->laundromat_work, t*HZ);
> + queue_delayed_work(nfsd_laundry_wq, &nn->laundromat_work, t*HZ);
> }
>
> static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)
> @@ -6569,7 +6568,8 @@ nfs4_state_start_net(struct net *net)
> nfsd4_client_tracking_init(net);
> printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
> nn->nfsd4_grace, net);
> - queue_delayed_work(laundry_wq, &nn->laundromat_work, nn->nfsd4_grace * HZ);
> + queue_delayed_work(nfsd_laundry_wq, &nn->laundromat_work,
> + nn->nfsd4_grace * HZ);
> return 0;
> }
>
> @@ -6583,22 +6583,10 @@ nfs4_state_start(void)
> ret = set_callback_cred();
> if (ret)
> return -ENOMEM;
> - laundry_wq = create_singlethread_workqueue("nfsd4");
> - if (laundry_wq == NULL) {
> - ret = -ENOMEM;
> - goto out_recovery;
> - }
> ret = nfsd4_create_callback_queue();
> - if (ret)
> - goto out_free_laundry;
> -
> - set_max_delegations();
> -
> - return 0;
> + if (!ret)
> + set_max_delegations();
>
> -out_free_laundry:
> - destroy_workqueue(laundry_wq);
> -out_recovery:
> return ret;
> }
>
> @@ -6635,7 +6623,6 @@ nfs4_state_shutdown_net(struct net *net)
> void
> nfs4_state_shutdown(void)
> {
> - destroy_workqueue(laundry_wq);
> nfsd4_destroy_callback_queue();
> }
>
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index cf980523898b..0199415344ff 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -62,6 +62,7 @@ struct readdir_cd {
> extern struct svc_program nfsd_program;
> extern struct svc_version nfsd_version2, nfsd_version3,
> nfsd_version4;
> +extern struct workqueue_struct *nfsd_laundry_wq;
> extern struct mutex nfsd_mutex;
> extern spinlock_t nfsd_drc_lock;
> extern unsigned long nfsd_drc_max_mem;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index ad4e2377dd63..ced9944201a0 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -17,6 +17,7 @@
> #include <linux/lockd/bind.h>
> #include <linux/nfsacl.h>
> #include <linux/seq_file.h>
> +#include <linux/workqueue.h>
> #include <net/net_namespace.h>
> #include "nfsd.h"
> #include "cache.h"
> @@ -28,6 +29,9 @@
> extern struct svc_program nfsd_program;
> static int nfsd(void *vrqstp);
>
> +/* A workqueue for nfsd-related cleanup tasks */
> +struct workqueue_struct *nfsd_laundry_wq;
> +
> /*
> * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members
> * of the svc_serv struct. In particular, ->sv_nrthreads but also to some
> @@ -224,11 +228,19 @@ static int nfsd_startup_generic(int nrservs)
> if (ret)
> goto dec_users;
>
> + ret = -ENOMEM;
> + nfsd_laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd-laundry");
> + if (!nfsd_laundry_wq)
> + goto out_racache;
> +
> ret = nfs4_state_start();
> if (ret)
> - goto out_racache;
> + goto out_wq;
> return 0;
>
> +out_wq:
> + destroy_workqueue(nfsd_laundry_wq);
> + nfsd_laundry_wq = NULL;
> out_racache:
> nfsd_racache_shutdown();
> dec_users:

Is a destroy_workqueue(nfsd_laundry_wq) needed in nfsd_shutdown_generic() ?

thanks,
Kinglong Mee

2015-08-07 15:29:10

by Kinglong Mee

[permalink] [raw]
Subject: Re: [PATCH v2 04/18] nfsd: add a new struct file caching facility to nfsd

On 8/6/2015 05:13, Jeff Layton wrote:
> Currently, NFSv2/3 reads and writes have to open a file, do the read or
> write and then close it again for each RPC. This is highly inefficient,
> especially when the underlying filesystem has a relatively slow open
> routine.
>
> This patch adds a new open file cache to knfsd. Rather than doing an
> open for each RPC, the read/write handlers can call into this cache to
> see if there is one already there for the correct filehandle and
> NFS_MAY_READ/WRITE flags.
>
> If there isn't an entry, then we create a new one and attempt to
> perform the open. If there is, then we wait until the entry is fully
> instantiated and return it if it is at the end of the wait. If it's
> not, then we attempt to take over construction.
>
> Since the main goal is to speed up NFSv2/3 I/O, we don't want to
> close these files on last put of these objects. We need to keep them
> around for a little while since we never know when the next READ/WRITE
> will come in.
>
> Cache entries have a hardcoded 1s timeout, and we have a recurring
> workqueue job that walks the cache and purges any entries that have
> expired.
>
> Signed-off-by: Jeff Layton <[email protected]>
... snip ...
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index 9051ee54faa3..adf7e78b8e43 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -4,6 +4,7 @@
> #include <linux/jhash.h>
> #include <linux/sunrpc/xdr.h>
>
> +#include "nfsfh.h"
> #include "export.h"
>
> /* hash table for nfs4_file */
> @@ -22,4 +23,24 @@ file_hashval(struct knfsd_fh *fh)
> return nfsd_fh_hashval(fh) & (NFSD_FILE_HASH_SIZE - 1);
> }
>
> +struct nfsd_file {

There is a nfs4_file in nfsd, it's not easy to distinguish them for a new folk.
More comments or a meaningful name is better.

> + struct hlist_node nf_node;
> + struct list_head nf_dispose;
> + struct rcu_head nf_rcu;
> + struct file *nf_file;
> + unsigned long nf_time;
> +#define NFSD_FILE_HASHED (0)

Why not using hlist_unhashed()?

thanks,
Kinglong Mee

> +#define NFSD_FILE_PENDING (1)
> + unsigned long nf_flags;
> + struct knfsd_fh nf_handle;
> + unsigned int nf_hashval;
> + atomic_t nf_ref;
> + unsigned char nf_may;
> +};
> +
> +int nfsd_file_cache_init(void);
> +void nfsd_file_cache_shutdown(void);
> +void nfsd_file_put(struct nfsd_file *nf);
> +__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> + unsigned int may_flags, struct nfsd_file **nfp);
> #endif /* _FS_NFSD_FILECACHE_H */
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index ced9944201a0..0572441e23ec 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -23,6 +23,7 @@
> #include "cache.h"
> #include "vfs.h"
> #include "netns.h"
> +#include "filecache.h"
>
> #define NFSDDBG_FACILITY NFSDDBG_SVC
>
> @@ -233,11 +234,17 @@ static int nfsd_startup_generic(int nrservs)
> if (!nfsd_laundry_wq)
> goto out_racache;
>
> - ret = nfs4_state_start();
> + ret = nfsd_file_cache_init();
> if (ret)
> goto out_wq;
> +
> + ret = nfs4_state_start();
> + if (ret)
> + goto out_nfsd_file;
> return 0;
>
> +out_nfsd_file:
> + nfsd_file_cache_shutdown();
> out_wq:
> destroy_workqueue(nfsd_laundry_wq);
> nfsd_laundry_wq = NULL;
> @@ -254,6 +261,7 @@ static void nfsd_shutdown_generic(void)
> return;
>
> nfs4_state_shutdown();
> + nfsd_file_cache_shutdown();
> nfsd_racache_shutdown();
> }
>
>

2015-08-07 15:30:25

by Kinglong Mee

[permalink] [raw]
Subject: Re: [PATCH v2 06/18] nfsd: hook up nfsd_read to the nfsd_file cache


On 8/6/2015 05:13, Jeff Layton wrote:
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfsd/vfs.c | 20 +++++++-------------
> 1 file changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 59234d1d8d8e..fd688c86af66 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -980,20 +980,14 @@ out_nfserr:
> __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
> {
> - struct file *file;
> - struct raparms *ra;
> - __be32 err;
> -
> - err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
> - if (err)
> - return err;
> -
> - ra = nfsd_init_raparms(file);
> - err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
> - if (ra)
> - nfsd_put_raparams(file, ra);

Drop the raparms here ?

thanks,
Kinglong Mee

> - fput(file);
> + __be32 err;
> + struct nfsd_file *nf;
>
> + err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
> + if (err == nfs_ok)
> + err = nfsd_vfs_read(rqstp, nf->nf_file, offset, vec, vlen,
> + count);
> + nfsd_file_put(nf);
> return err;
> }
>
>

2015-08-07 15:34:27

by Kinglong Mee

[permalink] [raw]
Subject: Re: [PATCH v2 16/18] nfsd: add a fh_compose_shallow

On 8/6/2015 05:13, Jeff Layton wrote:
> In a later patch, we'll need to be able to cook up a knfsd_fh from a
> dentry/export combo. Usually nfsd works with svc_fh's, but that takes
> a lot of extra baggage that we don't really need here.
>
> Add a routine that encodes the filehandle directly into a knfsd_fh
> instead. Much like we have a fh_copy_shallow, the new routine is called
> fh_compose_shallow.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfsd/nfsfh.c | 112 +++++++++++++++++++++++++++++---------------------------
> fs/nfsd/nfsfh.h | 2 +
> 2 files changed, 60 insertions(+), 54 deletions(-)
>
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index b601f291d825..ee6d4b746467 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -511,32 +511,64 @@ retry:
> }
>
> __be32
> -fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> - struct svc_fh *ref_fh)
> +fh_compose_shallow(struct knfsd_fh *kfh, int maxsize, struct svc_export *exp,
> + struct dentry *dentry, struct svc_fh *ref_fh)
> {
> - /* ref_fh is a reference file handle.
> - * if it is non-null and for the same filesystem, then we should compose
> - * a filehandle which is of the same version, where possible.
> - * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
> - * Then create a 32byte filehandle using nfs_fhbase_old
> - *
> - */
> + struct inode *inode = d_inode(dentry);
> + dev_t ex_dev = exp_sb(exp)->s_dev;
>
> - struct inode * inode = d_inode(dentry);
> - dev_t ex_dev = exp_sb(exp)->s_dev;
> -
> - dprintk("nfsd: fh_compose(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
> - MAJOR(ex_dev), MINOR(ex_dev),
> + dprintk("nfsd: %s(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
> + __func__, MAJOR(ex_dev), MINOR(ex_dev),
> (long) d_inode(exp->ex_path.dentry)->i_ino,
> - dentry,
> - (inode ? inode->i_ino : 0));
> + dentry, (inode ? inode->i_ino : 0));
>
> - /* Choose filehandle version and fsid type based on
> - * the reference filehandle (if it is in the same export)
> - * or the export options.
> + /*
> + * Choose filehandle version and fsid type based on the reference
> + * filehandle (if it is in the same export) or the export options.
> */
> - set_version_and_fsid_type(&fhp->fh_handle, fhp->fh_maxsize,
> - exp, ref_fh);
> + set_version_and_fsid_type(kfh, maxsize, exp, ref_fh);
> + if (kfh->fh_version == 0xca) {
> + /* old style filehandle please */
> + memset(&kfh->fh_base, 0, NFS_FHSIZE);
> + kfh->fh_size = NFS_FHSIZE;
> + kfh->ofh_dcookie = 0xfeebbaca;
> + kfh->ofh_dev = old_encode_dev(ex_dev);
> + kfh->ofh_xdev = kfh->ofh_dev;
> + kfh->ofh_xino =
> + ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
> + kfh->ofh_dirino = ino_t_to_u32(parent_ino(dentry));
> + if (inode)
> + _fh_update_old(dentry, exp, kfh);
> + } else {
> + kfh->fh_size = key_len(kfh->fh_fsid_type) + 4;
> + kfh->fh_auth_type = 0;
> +
> + mk_fsid(kfh->fh_fsid_type, kfh->fh_fsid, ex_dev,
> + d_inode(exp->ex_path.dentry)->i_ino,
> + exp->ex_fsid, exp->ex_uuid);
> +
> + if (inode)
> + _fh_update(kfh, maxsize, exp, dentry);
> +
> + if (kfh->fh_fileid_type == FILEID_INVALID)
> + return nfserr_opnotsupp;
> + }
> + return nfs_ok;
> +}
> +
> +/*
> + * ref_fh is a reference file handle.
> + * if it is non-null and for the same filesystem, then we should compose
> + * a filehandle which is of the same version, where possible.
> + * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
> + * Then create a 32byte filehandle using nfs_fhbase_old
> + *
> + */
> +__be32
> +fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> + struct svc_fh *ref_fh)
> +{
> + __be32 status;
>
> if (ref_fh == fhp)
> fh_put(ref_fh);

set_version_and_fsid_type will use fh_dentry in ref_fh,
So it *must* be used with a valid ref_fh.

With this patch, fh_put(ref_fh) will put fh_dentry and set to NULL!!!

Also, pynfs4.1's LKPP1d failed as,
LKPP1d st_lookupp.testLookupp : FAILURE
LOOKUPP returned
'\x01\x00\x07\x00\x02\x00\x00\x00\x00\x00\x00\x004\x93\x1f\x04L*C\x9b\x95L]\x83\x0f\xa4\xbe\x8c',
expected '\x01\x00\x01\x00\x00\x00\x00\x00'

Move set_version_and_fsid_type back, before fh_put(ref_fh).

thanks,
Kinglong Mee

> @@ -553,39 +585,11 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> fhp->fh_dentry = dget(dentry); /* our internal copy */
> fhp->fh_export = exp_get(exp);
>
> - if (fhp->fh_handle.fh_version == 0xca) {
> - /* old style filehandle please */
> - memset(&fhp->fh_handle.fh_base, 0, NFS_FHSIZE);
> - fhp->fh_handle.fh_size = NFS_FHSIZE;
> - fhp->fh_handle.ofh_dcookie = 0xfeebbaca;
> - fhp->fh_handle.ofh_dev = old_encode_dev(ex_dev);
> - fhp->fh_handle.ofh_xdev = fhp->fh_handle.ofh_dev;
> - fhp->fh_handle.ofh_xino =
> - ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
> - fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(dentry));
> - if (inode)
> - _fh_update_old(dentry, exp, &fhp->fh_handle);
> - } else {
> - fhp->fh_handle.fh_size =
> - key_len(fhp->fh_handle.fh_fsid_type) + 4;
> - fhp->fh_handle.fh_auth_type = 0;
> -
> - mk_fsid(fhp->fh_handle.fh_fsid_type,
> - fhp->fh_handle.fh_fsid,
> - ex_dev,
> - d_inode(exp->ex_path.dentry)->i_ino,
> - exp->ex_fsid, exp->ex_uuid);
> -
> - if (inode)
> - _fh_update(&fhp->fh_handle, fhp->fh_maxsize,
> - exp, dentry);
> - if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) {
> - fh_put(fhp);
> - return nfserr_opnotsupp;
> - }
> - }
> -
> - return 0;
> + status = fh_compose_shallow(&fhp->fh_handle, fhp->fh_maxsize, exp,
> + dentry, ref_fh);
> + if (status != nfs_ok)
> + fh_put(fhp);
> + return status;
> }
>
> /*
> diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> index 1e90dad4926b..dec472131588 100644
> --- a/fs/nfsd/nfsfh.h
> +++ b/fs/nfsd/nfsfh.h
> @@ -159,6 +159,8 @@ extern char * SVCFH_fmt(struct svc_fh *fhp);
> * Function prototypes
> */
> __be32 fh_verify(struct svc_rqst *, struct svc_fh *, umode_t, int);
> +__be32 fh_compose_shallow(struct knfsd_fh *, int, struct svc_export *,
> + struct dentry *, struct svc_fh *);
> __be32 fh_compose(struct svc_fh *, struct svc_export *, struct dentry *, struct svc_fh *);
> __be32 fh_update(struct svc_fh *);
> void fh_put(struct svc_fh *);
>

2015-08-07 17:07:12

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 02/18] nfsd: move some file caching helpers to a common header

On Fri, 7 Aug 2015 23:25:27 +0800
Kinglong Mee <[email protected]> wrote:

> On 8/6/2015 05:13, Jeff Layton wrote:
> > We'll want to reuse some of this for common open file caching
> > infrastructure.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfsd/filecache.h | 25 +++++++++++++++++++++++++
> > fs/nfsd/nfs4state.c | 17 ++---------------
> > 2 files changed, 27 insertions(+), 15 deletions(-)
> > create mode 100644 fs/nfsd/filecache.h
> >
> > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> > new file mode 100644
> > index 000000000000..9051ee54faa3
> > --- /dev/null
> > +++ b/fs/nfsd/filecache.h
> > @@ -0,0 +1,25 @@
> > +#ifndef _FS_NFSD_FILECACHE_H
> > +#define _FS_NFSD_FILECACHE_H
> > +
> > +#include <linux/jhash.h>
> > +#include <linux/sunrpc/xdr.h>
> > +
> > +#include "export.h"
> > +
> > +/* hash table for nfs4_file */
> > +#define NFSD_FILE_HASH_BITS 8
> > +#define NFSD_FILE_HASH_SIZE (1 << NFSD_FILE_HASH_BITS)
> > +
> > +static inline unsigned int
> > +nfsd_fh_hashval(struct knfsd_fh *fh)
> > +{
> > + return jhash2(fh->fh_base.fh_pad, XDR_QUADLEN(fh->fh_size), 0);
> > +}
> > +
> > +static inline unsigned int
> > +file_hashval(struct knfsd_fh *fh)
>
> Update to nfsd_file_hashval maybe better ?
>
> thanks,
> Kinglong Mee
>

Suggestions? Both hashtables are hashing on the filehandle, so I'm not
sure what would be better...

> > +{
> > + return nfsd_fh_hashval(fh) & (NFSD_FILE_HASH_SIZE - 1);
> > +}
> > +
> > +#endif /* _FS_NFSD_FILECACHE_H */
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 43903abd6189..b3b306bd1830 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -49,6 +49,7 @@
> >
> > #include "netns.h"
> > #include "pnfs.h"
> > +#include "filecache.h"
> >
> > #define NFSDDBG_FACILITY NFSDDBG_PROC
> >
> > @@ -381,21 +382,7 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername)
> > return ret & OWNER_HASH_MASK;
> > }
> >
> > -/* hash table for nfs4_file */
> > -#define FILE_HASH_BITS 8
> > -#define FILE_HASH_SIZE (1 << FILE_HASH_BITS)
> > -
> > -static unsigned int nfsd_fh_hashval(struct knfsd_fh *fh)
> > -{
> > - return jhash2(fh->fh_base.fh_pad, XDR_QUADLEN(fh->fh_size), 0);
> > -}
> > -
> > -static unsigned int file_hashval(struct knfsd_fh *fh)
> > -{
> > - return nfsd_fh_hashval(fh) & (FILE_HASH_SIZE - 1);
> > -}
> > -
> > -static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
> > +static struct hlist_head file_hashtbl[NFSD_FILE_HASH_SIZE];
> >
> > static void
> > __nfs4_file_get_access(struct nfs4_file *fp, u32 access)
> >


--
Jeff Layton <[email protected]>

2015-08-07 17:12:03

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 03/18] nfsd: convert laundry_wq to something less nfsd4 specific

On Fri, 7 Aug 2015 23:26:02 +0800
Kinglong Mee <[email protected]> wrote:

> On 8/6/2015 05:13, Jeff Layton wrote:
> > Currently, nfsd uses a singlethread workqueue for this, but that's not
> > necessary. If we have multiple namespaces, then there's no need to
> > serialize the laundromat runs since they are only requeued at the end of
> > the work itself.
> >
> > Also, create_singlethread_workqueue adds the WQ_MEM_RECLAIM flag, which
> > doesn't really seem to be necessary. The laundromat jobs are always
> > kicked off via a timer, and not from memory reclaim paths. There's no
> > need for a rescuer thread.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfsd/nfs4state.c | 23 +++++------------------
> > fs/nfsd/nfsd.h | 1 +
> > fs/nfsd/nfssvc.c | 14 +++++++++++++-
> > 3 files changed, 19 insertions(+), 19 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index b3b306bd1830..c94859122e6f 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -4366,7 +4366,6 @@ nfs4_laundromat(struct nfsd_net *nn)
> > return new_timeo;
> > }
> >
> > -static struct workqueue_struct *laundry_wq;
> > static void laundromat_main(struct work_struct *);
> >
> > static void
> > @@ -4380,7 +4379,7 @@ laundromat_main(struct work_struct *laundry)
> >
> > t = nfs4_laundromat(nn);
> > dprintk("NFSD: laundromat_main - sleeping for %ld seconds\n", t);
> > - queue_delayed_work(laundry_wq, &nn->laundromat_work, t*HZ);
> > + queue_delayed_work(nfsd_laundry_wq, &nn->laundromat_work, t*HZ);
> > }
> >
> > static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)
> > @@ -6569,7 +6568,8 @@ nfs4_state_start_net(struct net *net)
> > nfsd4_client_tracking_init(net);
> > printk(KERN_INFO "NFSD: starting %ld-second grace period (net %p)\n",
> > nn->nfsd4_grace, net);
> > - queue_delayed_work(laundry_wq, &nn->laundromat_work, nn->nfsd4_grace * HZ);
> > + queue_delayed_work(nfsd_laundry_wq, &nn->laundromat_work,
> > + nn->nfsd4_grace * HZ);
> > return 0;
> > }
> >
> > @@ -6583,22 +6583,10 @@ nfs4_state_start(void)
> > ret = set_callback_cred();
> > if (ret)
> > return -ENOMEM;
> > - laundry_wq = create_singlethread_workqueue("nfsd4");
> > - if (laundry_wq == NULL) {
> > - ret = -ENOMEM;
> > - goto out_recovery;
> > - }
> > ret = nfsd4_create_callback_queue();
> > - if (ret)
> > - goto out_free_laundry;
> > -
> > - set_max_delegations();
> > -
> > - return 0;
> > + if (!ret)
> > + set_max_delegations();
> >
> > -out_free_laundry:
> > - destroy_workqueue(laundry_wq);
> > -out_recovery:
> > return ret;
> > }
> >
> > @@ -6635,7 +6623,6 @@ nfs4_state_shutdown_net(struct net *net)
> > void
> > nfs4_state_shutdown(void)
> > {
> > - destroy_workqueue(laundry_wq);
> > nfsd4_destroy_callback_queue();
> > }
> >
> > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > index cf980523898b..0199415344ff 100644
> > --- a/fs/nfsd/nfsd.h
> > +++ b/fs/nfsd/nfsd.h
> > @@ -62,6 +62,7 @@ struct readdir_cd {
> > extern struct svc_program nfsd_program;
> > extern struct svc_version nfsd_version2, nfsd_version3,
> > nfsd_version4;
> > +extern struct workqueue_struct *nfsd_laundry_wq;
> > extern struct mutex nfsd_mutex;
> > extern spinlock_t nfsd_drc_lock;
> > extern unsigned long nfsd_drc_max_mem;
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index ad4e2377dd63..ced9944201a0 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -17,6 +17,7 @@
> > #include <linux/lockd/bind.h>
> > #include <linux/nfsacl.h>
> > #include <linux/seq_file.h>
> > +#include <linux/workqueue.h>
> > #include <net/net_namespace.h>
> > #include "nfsd.h"
> > #include "cache.h"
> > @@ -28,6 +29,9 @@
> > extern struct svc_program nfsd_program;
> > static int nfsd(void *vrqstp);
> >
> > +/* A workqueue for nfsd-related cleanup tasks */
> > +struct workqueue_struct *nfsd_laundry_wq;
> > +
> > /*
> > * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members
> > * of the svc_serv struct. In particular, ->sv_nrthreads but also to some
> > @@ -224,11 +228,19 @@ static int nfsd_startup_generic(int nrservs)
> > if (ret)
> > goto dec_users;
> >
> > + ret = -ENOMEM;
> > + nfsd_laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd-laundry");
> > + if (!nfsd_laundry_wq)
> > + goto out_racache;
> > +
> > ret = nfs4_state_start();
> > if (ret)
> > - goto out_racache;
> > + goto out_wq;
> > return 0;
> >
> > +out_wq:
> > + destroy_workqueue(nfsd_laundry_wq);
> > + nfsd_laundry_wq = NULL;
> > out_racache:
> > nfsd_racache_shutdown();
> > dec_users:
>
> Is a destroy_workqueue(nfsd_laundry_wq) needed in nfsd_shutdown_generic() ?
>
> thanks,
> Kinglong Mee

Yes! I had that in an earlier version of this, but somehow it got
dropped. Good catch. I'll fix that for the next respin.

--
Jeff Layton <[email protected]>

2015-08-07 17:19:00

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 04/18] nfsd: add a new struct file caching facility to nfsd

On Fri, 7 Aug 2015 23:28:41 +0800
Kinglong Mee <[email protected]> wrote:

> On 8/6/2015 05:13, Jeff Layton wrote:
> > Currently, NFSv2/3 reads and writes have to open a file, do the read or
> > write and then close it again for each RPC. This is highly inefficient,
> > especially when the underlying filesystem has a relatively slow open
> > routine.
> >
> > This patch adds a new open file cache to knfsd. Rather than doing an
> > open for each RPC, the read/write handlers can call into this cache to
> > see if there is one already there for the correct filehandle and
> > NFS_MAY_READ/WRITE flags.
> >
> > If there isn't an entry, then we create a new one and attempt to
> > perform the open. If there is, then we wait until the entry is fully
> > instantiated and return it if it is at the end of the wait. If it's
> > not, then we attempt to take over construction.
> >
> > Since the main goal is to speed up NFSv2/3 I/O, we don't want to
> > close these files on last put of these objects. We need to keep them
> > around for a little while since we never know when the next READ/WRITE
> > will come in.
> >
> > Cache entries have a hardcoded 1s timeout, and we have a recurring
> > workqueue job that walks the cache and purges any entries that have
> > expired.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> ... snip ...
> > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> > index 9051ee54faa3..adf7e78b8e43 100644
> > --- a/fs/nfsd/filecache.h
> > +++ b/fs/nfsd/filecache.h
> > @@ -4,6 +4,7 @@
> > #include <linux/jhash.h>
> > #include <linux/sunrpc/xdr.h>
> >
> > +#include "nfsfh.h"
> > #include "export.h"
> >
> > /* hash table for nfs4_file */
> > @@ -22,4 +23,24 @@ file_hashval(struct knfsd_fh *fh)
> > return nfsd_fh_hashval(fh) & (NFSD_FILE_HASH_SIZE - 1);
> > }
> >
> > +struct nfsd_file {
>
> There is a nfs4_file in nfsd, it's not easy to distinguish them for a new folk.
> More comments or a meaningful name is better.
>

Maybe. Again, any suggestions? My hope was that eventually we can unify
the two caches somehow but maybe that's naive.

> > + struct hlist_node nf_node;
> > + struct list_head nf_dispose;
> > + struct rcu_head nf_rcu;
> > + struct file *nf_file;
> > + unsigned long nf_time;
> > +#define NFSD_FILE_HASHED (0)
>
> Why not using hlist_unhashed()?
>

These entries are removed from the list using hlist_del_rcu(), and
hlist_unhashed will not return true after that.


> > +#define NFSD_FILE_PENDING (1)
> > + unsigned long nf_flags;
> > + struct knfsd_fh nf_handle;
> > + unsigned int nf_hashval;
> > + atomic_t nf_ref;
> > + unsigned char nf_may;
> > +};
> > +
> > +int nfsd_file_cache_init(void);
> > +void nfsd_file_cache_shutdown(void);
> > +void nfsd_file_put(struct nfsd_file *nf);
> > +__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > + unsigned int may_flags, struct nfsd_file **nfp);
> > #endif /* _FS_NFSD_FILECACHE_H */
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index ced9944201a0..0572441e23ec 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -23,6 +23,7 @@
> > #include "cache.h"
> > #include "vfs.h"
> > #include "netns.h"
> > +#include "filecache.h"
> >
> > #define NFSDDBG_FACILITY NFSDDBG_SVC
> >
> > @@ -233,11 +234,17 @@ static int nfsd_startup_generic(int nrservs)
> > if (!nfsd_laundry_wq)
> > goto out_racache;
> >
> > - ret = nfs4_state_start();
> > + ret = nfsd_file_cache_init();
> > if (ret)
> > goto out_wq;
> > +
> > + ret = nfs4_state_start();
> > + if (ret)
> > + goto out_nfsd_file;
> > return 0;
> >
> > +out_nfsd_file:
> > + nfsd_file_cache_shutdown();
> > out_wq:
> > destroy_workqueue(nfsd_laundry_wq);
> > nfsd_laundry_wq = NULL;
> > @@ -254,6 +261,7 @@ static void nfsd_shutdown_generic(void)
> > return;
> >
> > nfs4_state_shutdown();
> > + nfsd_file_cache_shutdown();
> > nfsd_racache_shutdown();
> > }
> >
> >


--
Jeff Layton <[email protected]>

2015-08-07 17:26:53

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 06/18] nfsd: hook up nfsd_read to the nfsd_file cache

On Fri, 7 Aug 2015 23:29:52 +0800
Kinglong Mee <[email protected]> wrote:

>
> On 8/6/2015 05:13, Jeff Layton wrote:
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfsd/vfs.c | 20 +++++++-------------
> > 1 file changed, 7 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 59234d1d8d8e..fd688c86af66 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -980,20 +980,14 @@ out_nfserr:
> > __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
> > {
> > - struct file *file;
> > - struct raparms *ra;
> > - __be32 err;
> > -
> > - err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
> > - if (err)
> > - return err;
> > -
> > - ra = nfsd_init_raparms(file);
> > - err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
> > - if (ra)
> > - nfsd_put_raparams(file, ra);
>
> Drop the raparms here ?
>
>

I'm not sure I understand your question. Are you asking why I dropped
the raparms from this code?

If so, the reason is that we shouldn't need it any longer. We only keep
that cache now because we do an open for every READ RPC. With this, a
streaming read should end up using the same struct file, at least as
long as there's not _too_ long a delay between READ RPCs. The normal
vfs readahead machinery should work properly with this change.

I'd like to eventually have us hook this up to the nfs4_file cache as
well. I stopped short of that here since I didn't need that immediately
for what I'm working on, but it should be possible to make the
nfs4_file cache use the nfsd_file cache instead of calling dentry_open
directly. Once we do that, then I don't think we'll need the raparms
cache at all anymore.

> > - fput(file);
> > + __be32 err;
> > + struct nfsd_file *nf;
> >
> > + err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
> > + if (err == nfs_ok)
> > + err = nfsd_vfs_read(rqstp, nf->nf_file, offset, vec, vlen,
> > + count);
> > + nfsd_file_put(nf);
> > return err;
> > }
> >
> >


--
Jeff Layton <[email protected]>

2015-08-07 17:56:15

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 16/18] nfsd: add a fh_compose_shallow

On Fri, 7 Aug 2015 23:33:58 +0800
Kinglong Mee <[email protected]> wrote:

> On 8/6/2015 05:13, Jeff Layton wrote:
> > In a later patch, we'll need to be able to cook up a knfsd_fh from a
> > dentry/export combo. Usually nfsd works with svc_fh's, but that takes
> > a lot of extra baggage that we don't really need here.
> >
> > Add a routine that encodes the filehandle directly into a knfsd_fh
> > instead. Much like we have a fh_copy_shallow, the new routine is called
> > fh_compose_shallow.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfsd/nfsfh.c | 112 +++++++++++++++++++++++++++++---------------------------
> > fs/nfsd/nfsfh.h | 2 +
> > 2 files changed, 60 insertions(+), 54 deletions(-)
> >
> > diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> > index b601f291d825..ee6d4b746467 100644
> > --- a/fs/nfsd/nfsfh.c
> > +++ b/fs/nfsd/nfsfh.c
> > @@ -511,32 +511,64 @@ retry:
> > }
> >
> > __be32
> > -fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> > - struct svc_fh *ref_fh)
> > +fh_compose_shallow(struct knfsd_fh *kfh, int maxsize, struct svc_export *exp,
> > + struct dentry *dentry, struct svc_fh *ref_fh)
> > {
> > - /* ref_fh is a reference file handle.
> > - * if it is non-null and for the same filesystem, then we should compose
> > - * a filehandle which is of the same version, where possible.
> > - * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
> > - * Then create a 32byte filehandle using nfs_fhbase_old
> > - *
> > - */
> > + struct inode *inode = d_inode(dentry);
> > + dev_t ex_dev = exp_sb(exp)->s_dev;
> >
> > - struct inode * inode = d_inode(dentry);
> > - dev_t ex_dev = exp_sb(exp)->s_dev;
> > -
> > - dprintk("nfsd: fh_compose(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
> > - MAJOR(ex_dev), MINOR(ex_dev),
> > + dprintk("nfsd: %s(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
> > + __func__, MAJOR(ex_dev), MINOR(ex_dev),
> > (long) d_inode(exp->ex_path.dentry)->i_ino,
> > - dentry,
> > - (inode ? inode->i_ino : 0));
> > + dentry, (inode ? inode->i_ino : 0));
> >
> > - /* Choose filehandle version and fsid type based on
> > - * the reference filehandle (if it is in the same export)
> > - * or the export options.
> > + /*
> > + * Choose filehandle version and fsid type based on the reference
> > + * filehandle (if it is in the same export) or the export options.
> > */
> > - set_version_and_fsid_type(&fhp->fh_handle, fhp->fh_maxsize,
> > - exp, ref_fh);
> > + set_version_and_fsid_type(kfh, maxsize, exp, ref_fh);
> > + if (kfh->fh_version == 0xca) {
> > + /* old style filehandle please */
> > + memset(&kfh->fh_base, 0, NFS_FHSIZE);
> > + kfh->fh_size = NFS_FHSIZE;
> > + kfh->ofh_dcookie = 0xfeebbaca;
> > + kfh->ofh_dev = old_encode_dev(ex_dev);
> > + kfh->ofh_xdev = kfh->ofh_dev;
> > + kfh->ofh_xino =
> > + ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
> > + kfh->ofh_dirino = ino_t_to_u32(parent_ino(dentry));
> > + if (inode)
> > + _fh_update_old(dentry, exp, kfh);
> > + } else {
> > + kfh->fh_size = key_len(kfh->fh_fsid_type) + 4;
> > + kfh->fh_auth_type = 0;
> > +
> > + mk_fsid(kfh->fh_fsid_type, kfh->fh_fsid, ex_dev,
> > + d_inode(exp->ex_path.dentry)->i_ino,
> > + exp->ex_fsid, exp->ex_uuid);
> > +
> > + if (inode)
> > + _fh_update(kfh, maxsize, exp, dentry);
> > +
> > + if (kfh->fh_fileid_type == FILEID_INVALID)
> > + return nfserr_opnotsupp;
> > + }
> > + return nfs_ok;
> > +}
> > +
> > +/*
> > + * ref_fh is a reference file handle.
> > + * if it is non-null and for the same filesystem, then we should compose
> > + * a filehandle which is of the same version, where possible.
> > + * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
> > + * Then create a 32byte filehandle using nfs_fhbase_old
> > + *
> > + */
> > +__be32
> > +fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> > + struct svc_fh *ref_fh)
> > +{
> > + __be32 status;
> >
> > if (ref_fh == fhp)
> > fh_put(ref_fh);
>
> set_version_and_fsid_type will use fh_dentry in ref_fh,
> So it *must* be used with a valid ref_fh.
>
> With this patch, fh_put(ref_fh) will put fh_dentry and set to NULL!!!
>
> Also, pynfs4.1's LKPP1d failed as,
> LKPP1d st_lookupp.testLookupp : FAILURE
> LOOKUPP returned
> '\x01\x00\x07\x00\x02\x00\x00\x00\x00\x00\x00\x004\x93\x1f\x04L*C\x9b\x95L]\x83\x0f\xa4\xbe\x8c',
> expected '\x01\x00\x01\x00\x00\x00\x00\x00'
>
> Move set_version_and_fsid_type back, before fh_put(ref_fh).
>
> thanks,
> Kinglong Mee
>

Ahh ok, it took me a minute but I see what you're saying now. Yes, this
is broken. I'll fix it, but I'm yet convinced that that is the best way.

Thanks,
Jeff

> > @@ -553,39 +585,11 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> > fhp->fh_dentry = dget(dentry); /* our internal copy */
> > fhp->fh_export = exp_get(exp);
> >
> > - if (fhp->fh_handle.fh_version == 0xca) {
> > - /* old style filehandle please */
> > - memset(&fhp->fh_handle.fh_base, 0, NFS_FHSIZE);
> > - fhp->fh_handle.fh_size = NFS_FHSIZE;
> > - fhp->fh_handle.ofh_dcookie = 0xfeebbaca;
> > - fhp->fh_handle.ofh_dev = old_encode_dev(ex_dev);
> > - fhp->fh_handle.ofh_xdev = fhp->fh_handle.ofh_dev;
> > - fhp->fh_handle.ofh_xino =
> > - ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
> > - fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(dentry));
> > - if (inode)
> > - _fh_update_old(dentry, exp, &fhp->fh_handle);
> > - } else {
> > - fhp->fh_handle.fh_size =
> > - key_len(fhp->fh_handle.fh_fsid_type) + 4;
> > - fhp->fh_handle.fh_auth_type = 0;
> > -
> > - mk_fsid(fhp->fh_handle.fh_fsid_type,
> > - fhp->fh_handle.fh_fsid,
> > - ex_dev,
> > - d_inode(exp->ex_path.dentry)->i_ino,
> > - exp->ex_fsid, exp->ex_uuid);
> > -
> > - if (inode)
> > - _fh_update(&fhp->fh_handle, fhp->fh_maxsize,
> > - exp, dentry);
> > - if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) {
> > - fh_put(fhp);
> > - return nfserr_opnotsupp;
> > - }
> > - }
> > -
> > - return 0;
> > + status = fh_compose_shallow(&fhp->fh_handle, fhp->fh_maxsize, exp,
> > + dentry, ref_fh);
> > + if (status != nfs_ok)
> > + fh_put(fhp);
> > + return status;
> > }
> >
> > /*
> > diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> > index 1e90dad4926b..dec472131588 100644
> > --- a/fs/nfsd/nfsfh.h
> > +++ b/fs/nfsd/nfsfh.h
> > @@ -159,6 +159,8 @@ extern char * SVCFH_fmt(struct svc_fh *fhp);
> > * Function prototypes
> > */
> > __be32 fh_verify(struct svc_rqst *, struct svc_fh *, umode_t, int);
> > +__be32 fh_compose_shallow(struct knfsd_fh *, int, struct svc_export *,
> > + struct dentry *, struct svc_fh *);
> > __be32 fh_compose(struct svc_fh *, struct svc_export *, struct dentry *, struct svc_fh *);
> > __be32 fh_update(struct svc_fh *);
> > void fh_put(struct svc_fh *);
> >


--
Jeff Layton <[email protected]>

2015-08-07 18:25:00

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 16/18] nfsd: add a fh_compose_shallow

On Fri, 7 Aug 2015 13:56:10 -0400
Jeff Layton <[email protected]> wrote:

> On Fri, 7 Aug 2015 23:33:58 +0800
> Kinglong Mee <[email protected]> wrote:
>
> > On 8/6/2015 05:13, Jeff Layton wrote:
> > > In a later patch, we'll need to be able to cook up a knfsd_fh from a
> > > dentry/export combo. Usually nfsd works with svc_fh's, but that takes
> > > a lot of extra baggage that we don't really need here.
> > >
> > > Add a routine that encodes the filehandle directly into a knfsd_fh
> > > instead. Much like we have a fh_copy_shallow, the new routine is called
> > > fh_compose_shallow.
> > >
> > > Signed-off-by: Jeff Layton <[email protected]>
> > > ---
> > > fs/nfsd/nfsfh.c | 112 +++++++++++++++++++++++++++++---------------------------
> > > fs/nfsd/nfsfh.h | 2 +
> > > 2 files changed, 60 insertions(+), 54 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> > > index b601f291d825..ee6d4b746467 100644
> > > --- a/fs/nfsd/nfsfh.c
> > > +++ b/fs/nfsd/nfsfh.c
> > > @@ -511,32 +511,64 @@ retry:
> > > }
> > >
> > > __be32
> > > -fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> > > - struct svc_fh *ref_fh)
> > > +fh_compose_shallow(struct knfsd_fh *kfh, int maxsize, struct svc_export *exp,
> > > + struct dentry *dentry, struct svc_fh *ref_fh)
> > > {
> > > - /* ref_fh is a reference file handle.
> > > - * if it is non-null and for the same filesystem, then we should compose
> > > - * a filehandle which is of the same version, where possible.
> > > - * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
> > > - * Then create a 32byte filehandle using nfs_fhbase_old
> > > - *
> > > - */
> > > + struct inode *inode = d_inode(dentry);
> > > + dev_t ex_dev = exp_sb(exp)->s_dev;
> > >
> > > - struct inode * inode = d_inode(dentry);
> > > - dev_t ex_dev = exp_sb(exp)->s_dev;
> > > -
> > > - dprintk("nfsd: fh_compose(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
> > > - MAJOR(ex_dev), MINOR(ex_dev),
> > > + dprintk("nfsd: %s(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
> > > + __func__, MAJOR(ex_dev), MINOR(ex_dev),
> > > (long) d_inode(exp->ex_path.dentry)->i_ino,
> > > - dentry,
> > > - (inode ? inode->i_ino : 0));
> > > + dentry, (inode ? inode->i_ino : 0));
> > >
> > > - /* Choose filehandle version and fsid type based on
> > > - * the reference filehandle (if it is in the same export)
> > > - * or the export options.
> > > + /*
> > > + * Choose filehandle version and fsid type based on the reference
> > > + * filehandle (if it is in the same export) or the export options.
> > > */
> > > - set_version_and_fsid_type(&fhp->fh_handle, fhp->fh_maxsize,
> > > - exp, ref_fh);
> > > + set_version_and_fsid_type(kfh, maxsize, exp, ref_fh);
> > > + if (kfh->fh_version == 0xca) {
> > > + /* old style filehandle please */
> > > + memset(&kfh->fh_base, 0, NFS_FHSIZE);
> > > + kfh->fh_size = NFS_FHSIZE;
> > > + kfh->ofh_dcookie = 0xfeebbaca;
> > > + kfh->ofh_dev = old_encode_dev(ex_dev);
> > > + kfh->ofh_xdev = kfh->ofh_dev;
> > > + kfh->ofh_xino =
> > > + ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
> > > + kfh->ofh_dirino = ino_t_to_u32(parent_ino(dentry));
> > > + if (inode)
> > > + _fh_update_old(dentry, exp, kfh);
> > > + } else {
> > > + kfh->fh_size = key_len(kfh->fh_fsid_type) + 4;
> > > + kfh->fh_auth_type = 0;
> > > +
> > > + mk_fsid(kfh->fh_fsid_type, kfh->fh_fsid, ex_dev,
> > > + d_inode(exp->ex_path.dentry)->i_ino,
> > > + exp->ex_fsid, exp->ex_uuid);
> > > +
> > > + if (inode)
> > > + _fh_update(kfh, maxsize, exp, dentry);
> > > +
> > > + if (kfh->fh_fileid_type == FILEID_INVALID)
> > > + return nfserr_opnotsupp;
> > > + }
> > > + return nfs_ok;
> > > +}
> > > +
> > > +/*
> > > + * ref_fh is a reference file handle.
> > > + * if it is non-null and for the same filesystem, then we should compose
> > > + * a filehandle which is of the same version, where possible.
> > > + * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
> > > + * Then create a 32byte filehandle using nfs_fhbase_old
> > > + *
> > > + */
> > > +__be32
> > > +fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> > > + struct svc_fh *ref_fh)
> > > +{
> > > + __be32 status;
> > >
> > > if (ref_fh == fhp)
> > > fh_put(ref_fh);
> >
> > set_version_and_fsid_type will use fh_dentry in ref_fh,
> > So it *must* be used with a valid ref_fh.
> >
> > With this patch, fh_put(ref_fh) will put fh_dentry and set to NULL!!!
> >
> > Also, pynfs4.1's LKPP1d failed as,
> > LKPP1d st_lookupp.testLookupp : FAILURE
> > LOOKUPP returned
> > '\x01\x00\x07\x00\x02\x00\x00\x00\x00\x00\x00\x004\x93\x1f\x04L*C\x9b\x95L]\x83\x0f\xa4\xbe\x8c',
> > expected '\x01\x00\x01\x00\x00\x00\x00\x00'
> >
> > Move set_version_and_fsid_type back, before fh_put(ref_fh).
> >
> > thanks,
> > Kinglong Mee
> >
>
> Ahh ok, it took me a minute but I see what you're saying now. Yes, this
> is broken. I'll fix it, but I'm yet convinced that that is the best way.
>
> Thanks,
> Jeff
>

Ok, I think this may be a better fix. I'll roll this into the original
patch before I send the next respin, but wanted to send it along first
to see whether you see any issue with it

The idea here is to instead encode the filehandle first, and only do
the fh_put for the ref_fh and take dentry/export references if that
succeeds.

Does that look like a reasonable fix to you, or am I missing something?
I'll also make sure I run pynfs against this before I submit the next
iteration of the series.

--
Jeff Layton <[email protected]>

2015-08-08 00:14:45

by Kinglong Mee

[permalink] [raw]
Subject: Re: [PATCH v2 04/18] nfsd: add a new struct file caching facility to nfsd

On 8/8/2015 01:18, Jeff Layton wrote:
> On Fri, 7 Aug 2015 23:28:41 +0800
> Kinglong Mee <[email protected]> wrote:
>
>> On 8/6/2015 05:13, Jeff Layton wrote:
>>> Currently, NFSv2/3 reads and writes have to open a file, do the read or
>>> write and then close it again for each RPC. This is highly inefficient,
>>> especially when the underlying filesystem has a relatively slow open
>>> routine.
>>>
>>> This patch adds a new open file cache to knfsd. Rather than doing an
>>> open for each RPC, the read/write handlers can call into this cache to
>>> see if there is one already there for the correct filehandle and
>>> NFS_MAY_READ/WRITE flags.
>>>
>>> If there isn't an entry, then we create a new one and attempt to
>>> perform the open. If there is, then we wait until the entry is fully
>>> instantiated and return it if it is at the end of the wait. If it's
>>> not, then we attempt to take over construction.
>>>
>>> Since the main goal is to speed up NFSv2/3 I/O, we don't want to
>>> close these files on last put of these objects. We need to keep them
>>> around for a little while since we never know when the next READ/WRITE
>>> will come in.
>>>
>>> Cache entries have a hardcoded 1s timeout, and we have a recurring
>>> workqueue job that walks the cache and purges any entries that have
>>> expired.
>>>
>>> Signed-off-by: Jeff Layton <[email protected]>
>> ... snip ...
>>> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
>>> index 9051ee54faa3..adf7e78b8e43 100644
>>> --- a/fs/nfsd/filecache.h
>>> +++ b/fs/nfsd/filecache.h
>>> @@ -4,6 +4,7 @@
>>> #include <linux/jhash.h>
>>> #include <linux/sunrpc/xdr.h>
>>>
>>> +#include "nfsfh.h"
>>> #include "export.h"
>>>
>>> /* hash table for nfs4_file */
>>> @@ -22,4 +23,24 @@ file_hashval(struct knfsd_fh *fh)
>>> return nfsd_fh_hashval(fh) & (NFSD_FILE_HASH_SIZE - 1);
>>> }
>>>
>>> +struct nfsd_file {
>>
>> There is a nfs4_file in nfsd, it's not easy to distinguish them for a new folk.
>> More comments or a meaningful name is better.
>>
>
> Maybe. Again, any suggestions? My hope was that eventually we can unify
> the two caches somehow but maybe that's naive.

I cannot find a better name for the new file cache. More comments.

I don't agree with merging those two into one cache.

nfsv4's file cache is a state resource of client which will exist since close
or lease expire. But nfsd_file cache is a temporary resource for nfsv2/v3 client
without state.

Also, for nfsv4's conflict checking, should we check the temporary file cache
for nfsv2/v3 too?

>
>>> + struct hlist_node nf_node;
>>> + struct list_head nf_dispose;
>>> + struct rcu_head nf_rcu;
>>> + struct file *nf_file;
>>> + unsigned long nf_time;
>>> +#define NFSD_FILE_HASHED (0)
>>
>> Why not using hlist_unhashed()?
>>
>
> These entries are removed from the list using hlist_del_rcu(), and
> hlist_unhashed will not return true after that.

As I understand, NFSD_FILE_HASHED means the file cache is added to
nfsd_file_hashtbl, and increased the global file count.

With calling hlist_del_rcu, file cache has be deleted from the hashtbl,
and clear the NFSD_FILE_HASHED bit.

As using in the codes, the bit and hlist_unhashed have the same meaning.
Any mistake of my understand?

+static void
+nfsd_file_unhash(struct nfsd_file *nf)
+{
+ if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+ hlist_del_rcu(&nf->nf_node);
+ nfsd_file_count_dec();
+ }
+}

thanks,
Kinglong Mee

>
>
>>> +#define NFSD_FILE_PENDING (1)
>>> + unsigned long nf_flags;
>>> + struct knfsd_fh nf_handle;
>>> + unsigned int nf_hashval;
>>> + atomic_t nf_ref;
>>> + unsigned char nf_may;
>>> +};
>>> +
>>> +int nfsd_file_cache_init(void);
>>> +void nfsd_file_cache_shutdown(void);
>>> +void nfsd_file_put(struct nfsd_file *nf);
>>> +__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>> + unsigned int may_flags, struct nfsd_file **nfp);
>>> #endif /* _FS_NFSD_FILECACHE_H */
>>> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
>>> index ced9944201a0..0572441e23ec 100644
>>> --- a/fs/nfsd/nfssvc.c
>>> +++ b/fs/nfsd/nfssvc.c
>>> @@ -23,6 +23,7 @@
>>> #include "cache.h"
>>> #include "vfs.h"
>>> #include "netns.h"
>>> +#include "filecache.h"
>>>
>>> #define NFSDDBG_FACILITY NFSDDBG_SVC
>>>
>>> @@ -233,11 +234,17 @@ static int nfsd_startup_generic(int nrservs)
>>> if (!nfsd_laundry_wq)
>>> goto out_racache;
>>>
>>> - ret = nfs4_state_start();
>>> + ret = nfsd_file_cache_init();
>>> if (ret)
>>> goto out_wq;
>>> +
>>> + ret = nfs4_state_start();
>>> + if (ret)
>>> + goto out_nfsd_file;
>>> return 0;
>>>
>>> +out_nfsd_file:
>>> + nfsd_file_cache_shutdown();
>>> out_wq:
>>> destroy_workqueue(nfsd_laundry_wq);
>>> nfsd_laundry_wq = NULL;
>>> @@ -254,6 +261,7 @@ static void nfsd_shutdown_generic(void)
>>> return;
>>>
>>> nfs4_state_shutdown();
>>> + nfsd_file_cache_shutdown();
>>> nfsd_racache_shutdown();
>>> }
>>>
>>>
>
>

2015-08-08 00:20:23

by Kinglong Mee

[permalink] [raw]
Subject: Re: [PATCH v2 06/18] nfsd: hook up nfsd_read to the nfsd_file cache

On 8/8/2015 01:26, Jeff Layton wrote:
> On Fri, 7 Aug 2015 23:29:52 +0800
> Kinglong Mee <[email protected]> wrote:
>
>>
>> On 8/6/2015 05:13, Jeff Layton wrote:
>>> Signed-off-by: Jeff Layton <[email protected]>
>>> ---
>>> fs/nfsd/vfs.c | 20 +++++++-------------
>>> 1 file changed, 7 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>> index 59234d1d8d8e..fd688c86af66 100644
>>> --- a/fs/nfsd/vfs.c
>>> +++ b/fs/nfsd/vfs.c
>>> @@ -980,20 +980,14 @@ out_nfserr:
>>> __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>> loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
>>> {
>>> - struct file *file;
>>> - struct raparms *ra;
>>> - __be32 err;
>>> -
>>> - err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
>>> - if (err)
>>> - return err;
>>> -
>>> - ra = nfsd_init_raparms(file);
>>> - err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
>>> - if (ra)
>>> - nfsd_put_raparams(file, ra);
>>
>> Drop the raparms here ?
>>
>>
>
> I'm not sure I understand your question. Are you asking why I dropped
> the raparms from this code?

Sorry. Yes, I'm asking why you dropped the raparms.

>
> If so, the reason is that we shouldn't need it any longer. We only keep
> that cache now because we do an open for every READ RPC. With this, a
> streaming read should end up using the same struct file, at least as
> long as there's not _too_ long a delay between READ RPCs. The normal
> vfs readahead machinery should work properly with this change.
>
> I'd like to eventually have us hook this up to the nfs4_file cache as
> well. I stopped short of that here since I didn't need that immediately
> for what I'm working on, but it should be possible to make the
> nfs4_file cache use the nfsd_file cache instead of calling dentry_open
> directly. Once we do that, then I don't think we'll need the raparms
> cache at all anymore.

Got it.

thanks,
Kinglong Mee

>
>>> - fput(file);
>>> + __be32 err;
>>> + struct nfsd_file *nf;
>>>
>>> + err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
>>> + if (err == nfs_ok)
>>> + err = nfsd_vfs_read(rqstp, nf->nf_file, offset, vec, vlen,
>>> + count);
>>> + nfsd_file_put(nf);
>>> return err;
>>> }
>>>
>>>
>
>

2015-08-08 00:28:17

by Kinglong Mee

[permalink] [raw]
Subject: Re: [PATCH v2 16/18] nfsd: add a fh_compose_shallow

On 8/8/2015 02:24, Jeff Layton wrote:
> On Fri, 7 Aug 2015 13:56:10 -0400
> Jeff Layton <[email protected]> wrote:
>
>> On Fri, 7 Aug 2015 23:33:58 +0800
>> Kinglong Mee <[email protected]> wrote:
>>
>>> On 8/6/2015 05:13, Jeff Layton wrote:
>>>> In a later patch, we'll need to be able to cook up a knfsd_fh from a
>>>> dentry/export combo. Usually nfsd works with svc_fh's, but that takes
>>>> a lot of extra baggage that we don't really need here.
>>>>
>>>> Add a routine that encodes the filehandle directly into a knfsd_fh
>>>> instead. Much like we have a fh_copy_shallow, the new routine is called
>>>> fh_compose_shallow.
>>>>
>>>> Signed-off-by: Jeff Layton <[email protected]>
>>>> ---
>>>> fs/nfsd/nfsfh.c | 112 +++++++++++++++++++++++++++++---------------------------
>>>> fs/nfsd/nfsfh.h | 2 +
>>>> 2 files changed, 60 insertions(+), 54 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
>>>> index b601f291d825..ee6d4b746467 100644
>>>> --- a/fs/nfsd/nfsfh.c
>>>> +++ b/fs/nfsd/nfsfh.c
>>>> @@ -511,32 +511,64 @@ retry:
>>>> }
>>>>
>>>> __be32
>>>> -fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
>>>> - struct svc_fh *ref_fh)
>>>> +fh_compose_shallow(struct knfsd_fh *kfh, int maxsize, struct svc_export *exp,
>>>> + struct dentry *dentry, struct svc_fh *ref_fh)
>>>> {
>>>> - /* ref_fh is a reference file handle.
>>>> - * if it is non-null and for the same filesystem, then we should compose
>>>> - * a filehandle which is of the same version, where possible.
>>>> - * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
>>>> - * Then create a 32byte filehandle using nfs_fhbase_old
>>>> - *
>>>> - */
>>>> + struct inode *inode = d_inode(dentry);
>>>> + dev_t ex_dev = exp_sb(exp)->s_dev;
>>>>
>>>> - struct inode * inode = d_inode(dentry);
>>>> - dev_t ex_dev = exp_sb(exp)->s_dev;
>>>> -
>>>> - dprintk("nfsd: fh_compose(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
>>>> - MAJOR(ex_dev), MINOR(ex_dev),
>>>> + dprintk("nfsd: %s(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
>>>> + __func__, MAJOR(ex_dev), MINOR(ex_dev),
>>>> (long) d_inode(exp->ex_path.dentry)->i_ino,
>>>> - dentry,
>>>> - (inode ? inode->i_ino : 0));
>>>> + dentry, (inode ? inode->i_ino : 0));
>>>>
>>>> - /* Choose filehandle version and fsid type based on
>>>> - * the reference filehandle (if it is in the same export)
>>>> - * or the export options.
>>>> + /*
>>>> + * Choose filehandle version and fsid type based on the reference
>>>> + * filehandle (if it is in the same export) or the export options.
>>>> */
>>>> - set_version_and_fsid_type(&fhp->fh_handle, fhp->fh_maxsize,
>>>> - exp, ref_fh);
>>>> + set_version_and_fsid_type(kfh, maxsize, exp, ref_fh);
>>>> + if (kfh->fh_version == 0xca) {
>>>> + /* old style filehandle please */
>>>> + memset(&kfh->fh_base, 0, NFS_FHSIZE);
>>>> + kfh->fh_size = NFS_FHSIZE;
>>>> + kfh->ofh_dcookie = 0xfeebbaca;
>>>> + kfh->ofh_dev = old_encode_dev(ex_dev);
>>>> + kfh->ofh_xdev = kfh->ofh_dev;
>>>> + kfh->ofh_xino =
>>>> + ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
>>>> + kfh->ofh_dirino = ino_t_to_u32(parent_ino(dentry));
>>>> + if (inode)
>>>> + _fh_update_old(dentry, exp, kfh);
>>>> + } else {
>>>> + kfh->fh_size = key_len(kfh->fh_fsid_type) + 4;
>>>> + kfh->fh_auth_type = 0;
>>>> +
>>>> + mk_fsid(kfh->fh_fsid_type, kfh->fh_fsid, ex_dev,
>>>> + d_inode(exp->ex_path.dentry)->i_ino,
>>>> + exp->ex_fsid, exp->ex_uuid);
>>>> +
>>>> + if (inode)
>>>> + _fh_update(kfh, maxsize, exp, dentry);
>>>> +
>>>> + if (kfh->fh_fileid_type == FILEID_INVALID)
>>>> + return nfserr_opnotsupp;
>>>> + }
>>>> + return nfs_ok;
>>>> +}
>>>> +
>>>> +/*
>>>> + * ref_fh is a reference file handle.
>>>> + * if it is non-null and for the same filesystem, then we should compose
>>>> + * a filehandle which is of the same version, where possible.
>>>> + * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
>>>> + * Then create a 32byte filehandle using nfs_fhbase_old
>>>> + *
>>>> + */
>>>> +__be32
>>>> +fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
>>>> + struct svc_fh *ref_fh)
>>>> +{
>>>> + __be32 status;
>>>>
>>>> if (ref_fh == fhp)
>>>> fh_put(ref_fh);
>>>
>>> set_version_and_fsid_type will use fh_dentry in ref_fh,
>>> So it *must* be used with a valid ref_fh.
>>>
>>> With this patch, fh_put(ref_fh) will put fh_dentry and set to NULL!!!
>>>
>>> Also, pynfs4.1's LKPP1d failed as,
>>> LKPP1d st_lookupp.testLookupp : FAILURE
>>> LOOKUPP returned
>>> '\x01\x00\x07\x00\x02\x00\x00\x00\x00\x00\x00\x004\x93\x1f\x04L*C\x9b\x95L]\x83\x0f\xa4\xbe\x8c',
>>> expected '\x01\x00\x01\x00\x00\x00\x00\x00'
>>>
>>> Move set_version_and_fsid_type back, before fh_put(ref_fh).
>>>
>>> thanks,
>>> Kinglong Mee
>>>
>>
>> Ahh ok, it took me a minute but I see what you're saying now. Yes, this
>> is broken. I'll fix it, but I'm yet convinced that that is the best way.
>>
>> Thanks,
>> Jeff
>>
>
> Ok, I think this may be a better fix. I'll roll this into the original
> patch before I send the next respin, but wanted to send it along first
> to see whether you see any issue with it
>
> The idea here is to instead encode the filehandle first, and only do
> the fh_put for the ref_fh and take dentry/export references if that
> succeeds.

Must make sure the order as following,

set_version_and_fsid_type(fhp, exp, ref_fh);
if (ref_fh == fhp)
fh_put(ref_fh);
fhp->fh_dentry = dget(dentry); /* our internal copy */
fhp->fh_export = exp_get(exp);

>
> Does that look like a reasonable fix to you, or am I missing something?
> I'll also make sure I run pynfs against this before I submit the next
> iteration of the series.

I will look forward to it.

thanks,
Kinglong Mee

2015-08-08 10:36:28

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 04/18] nfsd: add a new struct file caching facility to nfsd

On Sat, 8 Aug 2015 08:14:14 +0800
Kinglong Mee <[email protected]> wrote:

> On 8/8/2015 01:18, Jeff Layton wrote:
> > On Fri, 7 Aug 2015 23:28:41 +0800
> > Kinglong Mee <[email protected]> wrote:
> >
> >> On 8/6/2015 05:13, Jeff Layton wrote:
> >>> Currently, NFSv2/3 reads and writes have to open a file, do the read or
> >>> write and then close it again for each RPC. This is highly inefficient,
> >>> especially when the underlying filesystem has a relatively slow open
> >>> routine.
> >>>
> >>> This patch adds a new open file cache to knfsd. Rather than doing an
> >>> open for each RPC, the read/write handlers can call into this cache to
> >>> see if there is one already there for the correct filehandle and
> >>> NFS_MAY_READ/WRITE flags.
> >>>
> >>> If there isn't an entry, then we create a new one and attempt to
> >>> perform the open. If there is, then we wait until the entry is fully
> >>> instantiated and return it if it is at the end of the wait. If it's
> >>> not, then we attempt to take over construction.
> >>>
> >>> Since the main goal is to speed up NFSv2/3 I/O, we don't want to
> >>> close these files on last put of these objects. We need to keep them
> >>> around for a little while since we never know when the next READ/WRITE
> >>> will come in.
> >>>
> >>> Cache entries have a hardcoded 1s timeout, and we have a recurring
> >>> workqueue job that walks the cache and purges any entries that have
> >>> expired.
> >>>
> >>> Signed-off-by: Jeff Layton <[email protected]>
> >> ... snip ...
> >>> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> >>> index 9051ee54faa3..adf7e78b8e43 100644
> >>> --- a/fs/nfsd/filecache.h
> >>> +++ b/fs/nfsd/filecache.h
> >>> @@ -4,6 +4,7 @@
> >>> #include <linux/jhash.h>
> >>> #include <linux/sunrpc/xdr.h>
> >>>
> >>> +#include "nfsfh.h"
> >>> #include "export.h"
> >>>
> >>> /* hash table for nfs4_file */
> >>> @@ -22,4 +23,24 @@ file_hashval(struct knfsd_fh *fh)
> >>> return nfsd_fh_hashval(fh) & (NFSD_FILE_HASH_SIZE - 1);
> >>> }
> >>>
> >>> +struct nfsd_file {
> >>
> >> There is a nfs4_file in nfsd, it's not easy to distinguish them for a new folk.
> >> More comments or a meaningful name is better.
> >>
> >
> > Maybe. Again, any suggestions? My hope was that eventually we can unify
> > the two caches somehow but maybe that's naive.
>
> I cannot find a better name for the new file cache. More comments.
>
> I don't agree with merging those two into one cache.
>
> nfsv4's file cache is a state resource of client which will exist since close
> or lease expire. But nfsd_file cache is a temporary resource for nfsv2/v3 client
> without state.
>

You're probably right here. It was idle thought from when I first
started this work. What we probably would want to do however is to
layer the nfs4_file cache on top of this cache instead of having it
manage filps on its own.

I tried to design this cache so that it can handle O_RDWR opens, even
though the current callers don't actually ever request those. It should
be possible to hook up the nfs4_file cache to it, though I'd prefer to
wait until we have this code in place first and then do that later.

> Also, for nfsv4's conflict checking, should we check the temporary file cache
> for nfsv2/v3 too?
>

You mean for share/deny modes? We traditionally have not done that, and
I don't see a compelling reason to start now. That would be a separate
project in its own right, IMO. We'd need to lift the share/deny mode
handling into this new cache.

There's also the problem of there not being any protocol support for
that in NFSv2/3. What would we return to the client if there's a deny
mode conflict when it's trying to do (e.g.) a READ?


> >
> >>> + struct hlist_node nf_node;
> >>> + struct list_head nf_dispose;
> >>> + struct rcu_head nf_rcu;
> >>> + struct file *nf_file;
> >>> + unsigned long nf_time;
> >>> +#define NFSD_FILE_HASHED (0)
> >>
> >> Why not using hlist_unhashed()?
> >>
> >
> > These entries are removed from the list using hlist_del_rcu(), and
> > hlist_unhashed will not return true after that.
>
> As I understand, NFSD_FILE_HASHED means the file cache is added to
> nfsd_file_hashtbl, and increased the global file count.
>
> With calling hlist_del_rcu, file cache has be deleted from the hashtbl,
> and clear the NFSD_FILE_HASHED bit.
>

That's correct.

> As using in the codes, the bit and hlist_unhashed have the same meaning.
> Any mistake of my understand?
>

hlist_unhashed() won't work here:

static inline int hlist_unhashed(const struct hlist_node *h)
{
return !h->pprev;
}

...but:

static inline void hlist_del_rcu(struct hlist_node *n)
{
__hlist_del(n);
n->pprev = LIST_POISON2;
}

...so after a hlist_del_rcu, hlist_unhashed will still return false. In
order to get it to return true, we'd need to use hlist_del_init, but
that would mean that we couldn't safely traverse the hash chain under
the rcu_read_lock.

> +static void
> +nfsd_file_unhash(struct nfsd_file *nf)
> +{
> + if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> + hlist_del_rcu(&nf->nf_node);
> + nfsd_file_count_dec();
> + }
> +}
>
> thanks,
> Kinglong Mee
>
> >
> >
> >>> +#define NFSD_FILE_PENDING (1)
> >>> + unsigned long nf_flags;
> >>> + struct knfsd_fh nf_handle;
> >>> + unsigned int nf_hashval;
> >>> + atomic_t nf_ref;
> >>> + unsigned char nf_may;
> >>> +};
> >>> +
> >>> +int nfsd_file_cache_init(void);
> >>> +void nfsd_file_cache_shutdown(void);
> >>> +void nfsd_file_put(struct nfsd_file *nf);
> >>> +__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>> + unsigned int may_flags, struct nfsd_file **nfp);
> >>> #endif /* _FS_NFSD_FILECACHE_H */
> >>> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> >>> index ced9944201a0..0572441e23ec 100644
> >>> --- a/fs/nfsd/nfssvc.c
> >>> +++ b/fs/nfsd/nfssvc.c
> >>> @@ -23,6 +23,7 @@
> >>> #include "cache.h"
> >>> #include "vfs.h"
> >>> #include "netns.h"
> >>> +#include "filecache.h"
> >>>
> >>> #define NFSDDBG_FACILITY NFSDDBG_SVC
> >>>
> >>> @@ -233,11 +234,17 @@ static int nfsd_startup_generic(int nrservs)
> >>> if (!nfsd_laundry_wq)
> >>> goto out_racache;
> >>>
> >>> - ret = nfs4_state_start();
> >>> + ret = nfsd_file_cache_init();
> >>> if (ret)
> >>> goto out_wq;
> >>> +
> >>> + ret = nfs4_state_start();
> >>> + if (ret)
> >>> + goto out_nfsd_file;
> >>> return 0;
> >>>
> >>> +out_nfsd_file:
> >>> + nfsd_file_cache_shutdown();
> >>> out_wq:
> >>> destroy_workqueue(nfsd_laundry_wq);
> >>> nfsd_laundry_wq = NULL;
> >>> @@ -254,6 +261,7 @@ static void nfsd_shutdown_generic(void)
> >>> return;
> >>>
> >>> nfs4_state_shutdown();
> >>> + nfsd_file_cache_shutdown();
> >>> nfsd_racache_shutdown();
> >>> }
> >>>
> >>>
> >
> >


--
Jeff Layton <[email protected]>

2015-08-08 10:38:20

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 16/18] nfsd: add a fh_compose_shallow

On Sat, 8 Aug 2015 08:27:47 +0800
Kinglong Mee <[email protected]> wrote:

> On 8/8/2015 02:24, Jeff Layton wrote:
> > On Fri, 7 Aug 2015 13:56:10 -0400
> > Jeff Layton <[email protected]> wrote:
> >
> >> On Fri, 7 Aug 2015 23:33:58 +0800
> >> Kinglong Mee <[email protected]> wrote:
> >>
> >>> On 8/6/2015 05:13, Jeff Layton wrote:
> >>>> In a later patch, we'll need to be able to cook up a knfsd_fh from a
> >>>> dentry/export combo. Usually nfsd works with svc_fh's, but that takes
> >>>> a lot of extra baggage that we don't really need here.
> >>>>
> >>>> Add a routine that encodes the filehandle directly into a knfsd_fh
> >>>> instead. Much like we have a fh_copy_shallow, the new routine is called
> >>>> fh_compose_shallow.
> >>>>
> >>>> Signed-off-by: Jeff Layton <[email protected]>
> >>>> ---
> >>>> fs/nfsd/nfsfh.c | 112 +++++++++++++++++++++++++++++---------------------------
> >>>> fs/nfsd/nfsfh.h | 2 +
> >>>> 2 files changed, 60 insertions(+), 54 deletions(-)
> >>>>
> >>>> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> >>>> index b601f291d825..ee6d4b746467 100644
> >>>> --- a/fs/nfsd/nfsfh.c
> >>>> +++ b/fs/nfsd/nfsfh.c
> >>>> @@ -511,32 +511,64 @@ retry:
> >>>> }
> >>>>
> >>>> __be32
> >>>> -fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> >>>> - struct svc_fh *ref_fh)
> >>>> +fh_compose_shallow(struct knfsd_fh *kfh, int maxsize, struct svc_export *exp,
> >>>> + struct dentry *dentry, struct svc_fh *ref_fh)
> >>>> {
> >>>> - /* ref_fh is a reference file handle.
> >>>> - * if it is non-null and for the same filesystem, then we should compose
> >>>> - * a filehandle which is of the same version, where possible.
> >>>> - * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
> >>>> - * Then create a 32byte filehandle using nfs_fhbase_old
> >>>> - *
> >>>> - */
> >>>> + struct inode *inode = d_inode(dentry);
> >>>> + dev_t ex_dev = exp_sb(exp)->s_dev;
> >>>>
> >>>> - struct inode * inode = d_inode(dentry);
> >>>> - dev_t ex_dev = exp_sb(exp)->s_dev;
> >>>> -
> >>>> - dprintk("nfsd: fh_compose(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
> >>>> - MAJOR(ex_dev), MINOR(ex_dev),
> >>>> + dprintk("nfsd: %s(exp %02x:%02x/%ld %pd2, ino=%ld)\n",
> >>>> + __func__, MAJOR(ex_dev), MINOR(ex_dev),
> >>>> (long) d_inode(exp->ex_path.dentry)->i_ino,
> >>>> - dentry,
> >>>> - (inode ? inode->i_ino : 0));
> >>>> + dentry, (inode ? inode->i_ino : 0));
> >>>>
> >>>> - /* Choose filehandle version and fsid type based on
> >>>> - * the reference filehandle (if it is in the same export)
> >>>> - * or the export options.
> >>>> + /*
> >>>> + * Choose filehandle version and fsid type based on the reference
> >>>> + * filehandle (if it is in the same export) or the export options.
> >>>> */
> >>>> - set_version_and_fsid_type(&fhp->fh_handle, fhp->fh_maxsize,
> >>>> - exp, ref_fh);
> >>>> + set_version_and_fsid_type(kfh, maxsize, exp, ref_fh);
> >>>> + if (kfh->fh_version == 0xca) {
> >>>> + /* old style filehandle please */
> >>>> + memset(&kfh->fh_base, 0, NFS_FHSIZE);
> >>>> + kfh->fh_size = NFS_FHSIZE;
> >>>> + kfh->ofh_dcookie = 0xfeebbaca;
> >>>> + kfh->ofh_dev = old_encode_dev(ex_dev);
> >>>> + kfh->ofh_xdev = kfh->ofh_dev;
> >>>> + kfh->ofh_xino =
> >>>> + ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino);
> >>>> + kfh->ofh_dirino = ino_t_to_u32(parent_ino(dentry));
> >>>> + if (inode)
> >>>> + _fh_update_old(dentry, exp, kfh);
> >>>> + } else {
> >>>> + kfh->fh_size = key_len(kfh->fh_fsid_type) + 4;
> >>>> + kfh->fh_auth_type = 0;
> >>>> +
> >>>> + mk_fsid(kfh->fh_fsid_type, kfh->fh_fsid, ex_dev,
> >>>> + d_inode(exp->ex_path.dentry)->i_ino,
> >>>> + exp->ex_fsid, exp->ex_uuid);
> >>>> +
> >>>> + if (inode)
> >>>> + _fh_update(kfh, maxsize, exp, dentry);
> >>>> +
> >>>> + if (kfh->fh_fileid_type == FILEID_INVALID)
> >>>> + return nfserr_opnotsupp;
> >>>> + }
> >>>> + return nfs_ok;
> >>>> +}
> >>>> +
> >>>> +/*
> >>>> + * ref_fh is a reference file handle.
> >>>> + * if it is non-null and for the same filesystem, then we should compose
> >>>> + * a filehandle which is of the same version, where possible.
> >>>> + * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca
> >>>> + * Then create a 32byte filehandle using nfs_fhbase_old
> >>>> + *
> >>>> + */
> >>>> +__be32
> >>>> +fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
> >>>> + struct svc_fh *ref_fh)
> >>>> +{
> >>>> + __be32 status;
> >>>>
> >>>> if (ref_fh == fhp)
> >>>> fh_put(ref_fh);
> >>>
> >>> set_version_and_fsid_type will use fh_dentry in ref_fh,
> >>> So it *must* be used with a valid ref_fh.
> >>>
> >>> With this patch, fh_put(ref_fh) will put fh_dentry and set to NULL!!!
> >>>
> >>> Also, pynfs4.1's LKPP1d failed as,
> >>> LKPP1d st_lookupp.testLookupp : FAILURE
> >>> LOOKUPP returned
> >>> '\x01\x00\x07\x00\x02\x00\x00\x00\x00\x00\x00\x004\x93\x1f\x04L*C\x9b\x95L]\x83\x0f\xa4\xbe\x8c',
> >>> expected '\x01\x00\x01\x00\x00\x00\x00\x00'
> >>>
> >>> Move set_version_and_fsid_type back, before fh_put(ref_fh).
> >>>
> >>> thanks,
> >>> Kinglong Mee
> >>>
> >>
> >> Ahh ok, it took me a minute but I see what you're saying now. Yes, this
> >> is broken. I'll fix it, but I'm yet convinced that that is the best way.
> >>
> >> Thanks,
> >> Jeff
> >>
> >
> > Ok, I think this may be a better fix. I'll roll this into the original
> > patch before I send the next respin, but wanted to send it along first
> > to see whether you see any issue with it
> >
> > The idea here is to instead encode the filehandle first, and only do
> > the fh_put for the ref_fh and take dentry/export references if that
> > succeeds.
>
> Must make sure the order as following,
>
> set_version_and_fsid_type(fhp, exp, ref_fh);
> if (ref_fh == fhp)
> fh_put(ref_fh);
> fhp->fh_dentry = dget(dentry); /* our internal copy */
> fhp->fh_export = exp_get(exp);
>

Ok, I think this change should satisfy that requirement.

> >
> > Does that look like a reasonable fix to you, or am I missing something?
> > I'll also make sure I run pynfs against this before I submit the next
> > iteration of the series.
>
> I will look forward to it.
>

I ran pynfs against my original patches and saw the breakage you saw.
This patch seems to have fixed it. I'll roll that change into v3 of the
series.

Thanks for the review so far!
--
Jeff Layton <[email protected]>

2015-08-09 07:12:05

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 01/18] nfsd: include linux/nfs4.h in export.h

Looks fine,

Reviewed-by: Christoph Hellwig <[email protected]>

2015-08-09 07:14:38

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 03/18] nfsd: convert laundry_wq to something less nfsd4 specific

On Wed, Aug 05, 2015 at 05:13:21PM -0400, Jeff Layton wrote:
> Currently, nfsd uses a singlethread workqueue for this, but that's not
> necessary. If we have multiple namespaces, then there's no need to
> serialize the laundromat runs since they are only requeued at the end of
> the work itself.
>
> Also, create_singlethread_workqueue adds the WQ_MEM_RECLAIM flag, which
> doesn't really seem to be necessary. The laundromat jobs are always
> kicked off via a timer, and not from memory reclaim paths. There's no
> need for a rescuer thread.

Why would you change it to an unbound WQ? I'd really prefer to split
the change of workqueue semantics and making it globall available, too.

2015-08-09 07:17:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 04/18] nfsd: add a new struct file caching facility to nfsd

On Wed, Aug 05, 2015 at 05:13:22PM -0400, Jeff Layton wrote:
> Currently, NFSv2/3 reads and writes have to open a file, do the read or
> write and then close it again for each RPC. This is highly inefficient,
> especially when the underlying filesystem has a relatively slow open
> routine.

.. as do many NFSv4 reads and writes as they often get special stateids
passed from the client. Seems a little odd that we take more care
of this for the legacy protocols than the current one.

I think I'd rather see a nfs4_file cache and turn v2/3 into something
like the special stateid case. Did you consider that?

2015-08-09 07:18:15

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 12/18] nfsd: move include of state.h from trace.c to trace.h

On Wed, Aug 05, 2015 at 05:13:30PM -0400, Jeff Layton wrote:
> Any file which includes trace.h will need to include state.h, even if
> they aren't using any state tracepoints. Ensure that we include any
> headers that might be needed in trace.h instead of relying on the
> *.c files to have the right ones.
>
> Signed-off-by: Jeff Layton <[email protected]>

Looks fine,

Reviewed-by: Christoph Hellwig <[email protected]>

2015-08-09 07:21:14

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 14/18] nfsd: have _fh_update take a knfsd_fh instead of a svc_fh

Looks fine,

Reviewed-by: Christoph Hellwig <[email protected]>

Can you maybe rename it to __fh_update to fit the normal naming scheme
in the kernel while you're at it?

2015-08-09 07:21:55

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 15/18] nfsd: have set_version_and_fsid_type take a knfsd_fh instead of svc_fh

On Wed, Aug 05, 2015 at 05:13:33PM -0400, Jeff Layton wrote:
> Note however that we do still have it take a svc_fh for the ref_fh, but
> that could be changed in the future if it's helpful to do so.
>
> Signed-off-by: Jeff Layton <[email protected]>

Looks fine,

Reviewed-by: Christoph Hellwig <[email protected]>

2015-08-09 11:11:40

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 03/18] nfsd: convert laundry_wq to something less nfsd4 specific

On Sun, 9 Aug 2015 00:14:38 -0700
Christoph Hellwig <[email protected]> wrote:

> On Wed, Aug 05, 2015 at 05:13:21PM -0400, Jeff Layton wrote:
> > Currently, nfsd uses a singlethread workqueue for this, but that's not
> > necessary. If we have multiple namespaces, then there's no need to
> > serialize the laundromat runs since they are only requeued at the end of
> > the work itself.
> >
> > Also, create_singlethread_workqueue adds the WQ_MEM_RECLAIM flag, which
> > doesn't really seem to be necessary. The laundromat jobs are always
> > kicked off via a timer, and not from memory reclaim paths. There's no
> > need for a rescuer thread.
>
> Why would you change it to an unbound WQ? I'd really prefer to split
> the change of workqueue semantics and making it globall available, too.

create_singlethread_workqueue already makes an unbound workqueue. This
patch just lifts the "max_active" value to the default, and removes the
WQ_MEM_RECLAIM flag.

We certainly could turn this into a bound workqueue, but given the sort
of job that the laundromat runs I'm not sure we'd benefit much from the
locality.

...and sure, I can turn this into two patches if you'd prefer.

--
Jeff Layton <[email protected]>

2015-08-09 11:19:35

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 04/18] nfsd: add a new struct file caching facility to nfsd

On Sun, 9 Aug 2015 00:17:02 -0700
Christoph Hellwig <[email protected]> wrote:

> On Wed, Aug 05, 2015 at 05:13:22PM -0400, Jeff Layton wrote:
> > Currently, NFSv2/3 reads and writes have to open a file, do the read or
> > write and then close it again for each RPC. This is highly inefficient,
> > especially when the underlying filesystem has a relatively slow open
> > routine.
>
> .. as do many NFSv4 reads and writes as they often get special stateids
> passed from the client. Seems a little odd that we take more care
> of this for the legacy protocols than the current one.
>

This is just an initial step. I'd like to see the nfs4_file cache
layered on top of this eventually. That's a bit more work though, and I
wanted to get this piece merged first before I did that part.

> I think I'd rather see a nfs4_file cache and turn v2/3 into something
> like the special stateid case. Did you consider that?

I started looking at extending the nfs4_file cache to NFSv2/3, but it's
actually rather difficult...

We traditionally haven't dealt with share/deny modes in NFSv3, so we'd
need a mechanism to bypass all of that stuff for legacy protocols. We'd
also have to convert that cache from one that frees objects when the
last one is put to one that keeps them around for a bit.

That also means that the v2/3 opens have to keep around extra fields
that aren't really needed, and deal with the really godawful locking of
the NFSv4 code.

I really think this approach is a better one. We can still use this
cache from the NFSv4 code and wiring it in shouldn't be too hard. It's
mostly a matter of plumbing in struct nfsd_file objects where that code
is passing around struct file objects now.

--
Jeff Layton <[email protected]>

2015-08-10 08:26:23

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 03/18] nfsd: convert laundry_wq to something less nfsd4 specific

On Sun, Aug 09, 2015 at 07:11:37AM -0400, Jeff Layton wrote:
> create_singlethread_workqueue already makes an unbound workqueue. This
> patch just lifts the "max_active" value to the default, and removes the
> WQ_MEM_RECLAIM flag.
>
> We certainly could turn this into a bound workqueue, but given the sort
> of job that the laundromat runs I'm not sure we'd benefit much from the
> locality.
>
> ...and sure, I can turn this into two patches if you'd prefer.

The patch was just rather confusing to me. Do you want the existing
laundromat to scale better with lots of namespaces? Sounds reasonable,
but I don't really see the use case.

Looking at the later patches I now see you're overloading a totally
different job to it. I don't think there's a point given how cheap
workqueues are these days. Even more it seems like you really should
use the mm/list_lru.c infrastructure and a shrinker for a your file
cache.

2015-08-10 08:28:08

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 04/18] nfsd: add a new struct file caching facility to nfsd

> I started looking at extending the nfs4_file cache to NFSv2/3, but it's
> actually rather difficult...
>
> We traditionally haven't dealt with share/deny modes in NFSv3, so we'd
> need a mechanism to bypass all of that stuff for legacy protocols. We'd
> also have to convert that cache from one that frees objects when the
> last one is put to one that keeps them around for a bit.
>
> That also means that the v2/3 opens have to keep around extra fields
> that aren't really needed, and deal with the really godawful locking of
> the NFSv4 code.
>
> I really think this approach is a better one. We can still use this
> cache from the NFSv4 code and wiring it in shouldn't be too hard. It's
> mostly a matter of plumbing in struct nfsd_file objects where that code
> is passing around struct file objects now.

Ok. I'd still like to it wired up to NFSv4 - with my changes to make all
temporary file opens happens from nfs4_preprocess_stateid_op it should
be fairly simple.

2015-08-10 11:23:54

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 03/18] nfsd: convert laundry_wq to something less nfsd4 specific

On Mon, 10 Aug 2015 01:26:22 -0700
Christoph Hellwig <[email protected]> wrote:

> On Sun, Aug 09, 2015 at 07:11:37AM -0400, Jeff Layton wrote:
> > create_singlethread_workqueue already makes an unbound workqueue. This
> > patch just lifts the "max_active" value to the default, and removes the
> > WQ_MEM_RECLAIM flag.
> >
> > We certainly could turn this into a bound workqueue, but given the sort
> > of job that the laundromat runs I'm not sure we'd benefit much from the
> > locality.
> >
> > ...and sure, I can turn this into two patches if you'd prefer.
>
> The patch was just rather confusing to me. Do you want the existing
> laundromat to scale better with lots of namespaces? Sounds reasonable,
> but I don't really see the use case.
>
> Looking at the later patches I now see you're overloading a totally
> different job to it. I don't think there's a point given how cheap
> workqueues are these days. Even more it seems like you really should
> use the mm/list_lru.c infrastructure and a shrinker for a your file
> cache.

Right, it's a laundry job of a different sort, so I figured using the
laundry_wq would make sense. I also just saw absolutely no reason to
serialize all of the nfsd4 laundromat jobs (if there were ever more
than one on the box at a time), so it was an opportunity to clean that
up.

I did consider a shrinker and LRU list for this. The problem there is
that shrinkers are triggered on memory pressure. Keeping these files
open after they've been idle for a long period of time would prevent
the kernel from handing out leases on them, so closing them after a
reasonable idle period seemed like the right thing to do.

I suppose however we could use a shrinker/LRU _and_ add a mechanism
that would cause the kernel to close idle nfsd_files for an inode when
there is an attempt to do a F_SETLEASE. That would probably work,
unless I'm missing other reasons that keeping unused files open might
be problematic. Are there any?

--
Jeff Layton <[email protected]>

2015-08-10 11:32:02

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 04/18] nfsd: add a new struct file caching facility to nfsd

On Mon, 10 Aug 2015 01:28:07 -0700
Christoph Hellwig <[email protected]> wrote:

> > I started looking at extending the nfs4_file cache to NFSv2/3, but it's
> > actually rather difficult...
> >
> > We traditionally haven't dealt with share/deny modes in NFSv3, so we'd
> > need a mechanism to bypass all of that stuff for legacy protocols. We'd
> > also have to convert that cache from one that frees objects when the
> > last one is put to one that keeps them around for a bit.
> >
> > That also means that the v2/3 opens have to keep around extra fields
> > that aren't really needed, and deal with the really godawful locking of
> > the NFSv4 code.
> >
> > I really think this approach is a better one. We can still use this
> > cache from the NFSv4 code and wiring it in shouldn't be too hard. It's
> > mostly a matter of plumbing in struct nfsd_file objects where that code
> > is passing around struct file objects now.
>
> Ok. I'd still like to it wired up to NFSv4 - with my changes to make all
> temporary file opens happens from nfs4_preprocess_stateid_op it should
> be fairly simple.


Ok, I should be able to add that in. I'll plan to do so before I send
out a v3 of the patchset.

--
Jeff Layton <[email protected]>

2015-08-10 11:36:49

by Kinglong Mee

[permalink] [raw]
Subject: Re: [PATCH v2 04/18] nfsd: add a new struct file caching facility to nfsd

On 8/8/2015 18:36, Jeff Layton wrote:
> On Sat, 8 Aug 2015 08:14:14 +0800
> Kinglong Mee <[email protected]> wrote:
>
>> On 8/8/2015 01:18, Jeff Layton wrote:
>>> On Fri, 7 Aug 2015 23:28:41 +0800
>>> Kinglong Mee <[email protected]> wrote:
>>>
>>>> On 8/6/2015 05:13, Jeff Layton wrote:
>>>>> Currently, NFSv2/3 reads and writes have to open a file, do the read or
>>>>> write and then close it again for each RPC. This is highly inefficient,
>>>>> especially when the underlying filesystem has a relatively slow open
>>>>> routine.
>>>>>
>>>>> This patch adds a new open file cache to knfsd. Rather than doing an
>>>>> open for each RPC, the read/write handlers can call into this cache to
>>>>> see if there is one already there for the correct filehandle and
>>>>> NFS_MAY_READ/WRITE flags.
>>>>>
>>>>> If there isn't an entry, then we create a new one and attempt to
>>>>> perform the open. If there is, then we wait until the entry is fully
>>>>> instantiated and return it if it is at the end of the wait. If it's
>>>>> not, then we attempt to take over construction.
>>>>>
>>>>> Since the main goal is to speed up NFSv2/3 I/O, we don't want to
>>>>> close these files on last put of these objects. We need to keep them
>>>>> around for a little while since we never know when the next READ/WRITE
>>>>> will come in.
>>>>>
>>>>> Cache entries have a hardcoded 1s timeout, and we have a recurring
>>>>> workqueue job that walks the cache and purges any entries that have
>>>>> expired.
>>>>>
>>>>> Signed-off-by: Jeff Layton <[email protected]>
>>>> ... snip ...
>>>>> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
>>>>> index 9051ee54faa3..adf7e78b8e43 100644
>>>>> --- a/fs/nfsd/filecache.h
>>>>> +++ b/fs/nfsd/filecache.h
>>>>> @@ -4,6 +4,7 @@
>>>>> #include <linux/jhash.h>
>>>>> #include <linux/sunrpc/xdr.h>
>>>>>
>>>>> +#include "nfsfh.h"
>>>>> #include "export.h"
>>>>>
>>>>> /* hash table for nfs4_file */
>>>>> @@ -22,4 +23,24 @@ file_hashval(struct knfsd_fh *fh)
>>>>> return nfsd_fh_hashval(fh) & (NFSD_FILE_HASH_SIZE - 1);
>>>>> }
>>>>>
>>>>> +struct nfsd_file {
>>>>
>>>> There is a nfs4_file in nfsd, it's not easy to distinguish them for a new folk.
>>>> More comments or a meaningful name is better.
>>>>
>>>
>>> Maybe. Again, any suggestions? My hope was that eventually we can unify
>>> the two caches somehow but maybe that's naive.
>>
>> I cannot find a better name for the new file cache. More comments.
>>
>> I don't agree with merging those two into one cache.
>>
>> nfsv4's file cache is a state resource of client which will exist since close
>> or lease expire. But nfsd_file cache is a temporary resource for nfsv2/v3 client
>> without state.
>>
>
> You're probably right here. It was idle thought from when I first
> started this work. What we probably would want to do however is to
> layer the nfs4_file cache on top of this cache instead of having it
> manage filps on its own.
>
> I tried to design this cache so that it can handle O_RDWR opens, even
> though the current callers don't actually ever request those. It should
> be possible to hook up the nfs4_file cache to it, though I'd prefer to
> wait until we have this code in place first and then do that later.
>
>> Also, for nfsv4's conflict checking, should we check the temporary file cache
>> for nfsv2/v3 too?
>>
>
> You mean for share/deny modes? We traditionally have not done that, and
> I don't see a compelling reason to start now. That would be a separate
> project in its own right, IMO. We'd need to lift the share/deny mode
> handling into this new cache.

OK.
I will look forward to the new project.

>
> There's also the problem of there not being any protocol support for
> that in NFSv2/3. What would we return to the client if there's a deny
> mode conflict when it's trying to do (e.g.) a READ?
>

It's my worry too.
Without this cache, this case only influence an NFSv2/3 RPC process time,
but, with this cache, it's more than 1s at worst.

>
>>>
>>>>> + struct hlist_node nf_node;
>>>>> + struct list_head nf_dispose;
>>>>> + struct rcu_head nf_rcu;
>>>>> + struct file *nf_file;
>>>>> + unsigned long nf_time;
>>>>> +#define NFSD_FILE_HASHED (0)
>>>>
>>>> Why not using hlist_unhashed()?
>>>>
>>>
>>> These entries are removed from the list using hlist_del_rcu(), and
>>> hlist_unhashed will not return true after that.
>>
>> As I understand, NFSD_FILE_HASHED means the file cache is added to
>> nfsd_file_hashtbl, and increased the global file count.
>>
>> With calling hlist_del_rcu, file cache has be deleted from the hashtbl,
>> and clear the NFSD_FILE_HASHED bit.
>>
>
> That's correct.
>
>> As using in the codes, the bit and hlist_unhashed have the same meaning.
>> Any mistake of my understand?
>>
>
> hlist_unhashed() won't work here:
>
> static inline int hlist_unhashed(const struct hlist_node *h)
> {
> return !h->pprev;
> }
>
> ...but:
>
> static inline void hlist_del_rcu(struct hlist_node *n)
> {
> __hlist_del(n);
> n->pprev = LIST_POISON2;
> }

Got it.
You are right.

thanks,
Kinglong Mee

2015-08-10 12:10:39

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 03/18] nfsd: convert laundry_wq to something less nfsd4 specific

On Mon, Aug 10, 2015 at 07:23:51AM -0400, Jeff Layton wrote:
> I did consider a shrinker and LRU list for this. The problem there is
> that shrinkers are triggered on memory pressure. Keeping these files
> open after they've been idle for a long period of time would prevent
> the kernel from handing out leases on them, so closing them after a
> reasonable idle period seemed like the right thing to do.

True.

> I suppose however we could use a shrinker/LRU _and_ add a mechanism
> that would cause the kernel to close idle nfsd_files for an inode when
> there is an attempt to do a F_SETLEASE. That would probably work,
> unless I'm missing other reasons that keeping unused files open might
> be problematic. Are there any?

That seems reasonable. Keepign the file open also will prevent
unmounting the file system, although currently any NFS export already
causes that as well.

2015-08-10 12:14:59

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 03/18] nfsd: convert laundry_wq to something less nfsd4 specific

On Mon, 10 Aug 2015 05:10:35 -0700
Christoph Hellwig <[email protected]> wrote:

> On Mon, Aug 10, 2015 at 07:23:51AM -0400, Jeff Layton wrote:
> > I did consider a shrinker and LRU list for this. The problem there is
> > that shrinkers are triggered on memory pressure. Keeping these files
> > open after they've been idle for a long period of time would prevent
> > the kernel from handing out leases on them, so closing them after a
> > reasonable idle period seemed like the right thing to do.
>
> True.
>
> > I suppose however we could use a shrinker/LRU _and_ add a mechanism
> > that would cause the kernel to close idle nfsd_files for an inode when
> > there is an attempt to do a F_SETLEASE. That would probably work,
> > unless I'm missing other reasons that keeping unused files open might
> > be problematic. Are there any?
>
> That seems reasonable. Keepign the file open also will prevent
> unmounting the file system, although currently any NFS export already
> causes that as well.

Yes, though that's the reason for the new ->flush hook in the sunrpc
cache code. On any export table change, we'll clean out the nfsd_file
cache to help ensure that you'll be able to unmount soon after
unexporting a filesystem.

In any case, I'll look at the shrinker/lru thing for the next respin
and see whether adding a hook into the setlease code might be
reasonable.

Thanks for the review so far,
--
Jeff Layton <[email protected]>

2015-08-10 14:33:50

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v2 03/18] nfsd: convert laundry_wq to something less nfsd4 specific

On Mon, Aug 10, 2015 at 08:14:56AM -0400, Jeff Layton wrote:
> On Mon, 10 Aug 2015 05:10:35 -0700
> Christoph Hellwig <[email protected]> wrote:
>
> > On Mon, Aug 10, 2015 at 07:23:51AM -0400, Jeff Layton wrote:
> > > I did consider a shrinker and LRU list for this. The problem there is
> > > that shrinkers are triggered on memory pressure. Keeping these files
> > > open after they've been idle for a long period of time would prevent
> > > the kernel from handing out leases on them, so closing them after a
> > > reasonable idle period seemed like the right thing to do.
> >
> > True.
> >
> > > I suppose however we could use a shrinker/LRU _and_ add a mechanism
> > > that would cause the kernel to close idle nfsd_files for an inode when
> > > there is an attempt to do a F_SETLEASE. That would probably work,
> > > unless I'm missing other reasons that keeping unused files open might
> > > be problematic. Are there any?
> >
> > That seems reasonable. Keepign the file open also will prevent
> > unmounting the file system, although currently any NFS export already
> > causes that as well.
>
> Yes, though that's the reason for the new ->flush hook in the sunrpc
> cache code. On any export table change, we'll clean out the nfsd_file
> cache to help ensure that you'll be able to unmount soon after
> unexporting a filesystem.

There are definitely people with scripts that try to unexport and then
immediately unmount, e.g. to migrate a filesystem elsewhere. They
already run into problems thanks to export caches, locks, and v4 state.
A complete shutdown of nfsd is currently the only supported way to
unmount. Still, I wouldn't be surprised if there are people who
(possibly just out of luck) have a working setup now that will start
failing after we take these additional references.

Extending the unlock_* interfaces or getting Kinglong's stuff working
would help.

--b.

>
> In any case, I'll look at the shrinker/lru thing for the next respin
> and see whether adding a hook into the setlease code might be
> reasonable.
>
> Thanks for the review so far,
> --
> Jeff Layton <[email protected]>