2015-10-20 17:33:58

by Jeff Layton

Subject: [PATCH v6 00/19] nfsd: open file caching

v6:
- rename delayed_fput infrastructure to global_fput
- drop list_lru_rotate patch, rework LRU list code to use LRU_ROTATE
and a new NFSD_FILE_REFERENCED flag
- rework fsnotify_mark handling to be done with separate allocation
- bugfixes

v5:
- switch to using flush_delayed_fput instead of __fput_sync
- hash on inode->i_ino instead of inode pointer
- add /proc/fs/nfsd/file_cache_stats file to track stats on the hash
- eliminate extra fh_verify in nfsd_file_acquire

v4:
- squash some of the patches down into one patch to reduce churn
- close cached open files after unlink instead of before
- don't just close files after nfsd does an unlink, must do it
after any vfs-layer unlink. Use fsnotify to handle that.
- use a SRCU notifier chain for setlease
- add patch to allow non-kthreads to do a fput_sync

v3:
- open files are now hashed on inode pointer instead of fh
- eliminate the recurring workqueue job in favor of shrinker/LRU and
notifier from lease setting code
- have nfsv4 use the cache as well
- removal of raparms cache

v2:
- changelog cleanups and clarifications
- allow COMMIT to use cached open files
- tracepoints for nfsd_file cache
- proactively close open files prior to REMOVE, or a RENAME over a
positive dentry

This is the sixth posting of this patchset. For those just tuning in
now, the basic idea here is to allow nfsd to keep a cache of open
file descriptions, primarily to help NFSv3 workloads but also to clean
up some nastiness in the NFSv4 file handling.

Of course, we can't keep files open indefinitely, so much of the new
infrastructure is geared toward allowing the files to be closed on
demand, in the following cases:

- when the inode's link count goes to zero
- when userspace wants to set a lease on the inode
- when a shrinker callback occurs for the nfsd_file cache
- when filesystems are unexported or nfsd is shut down (the whole
cache is cleaned out)

This allows nfsd to keep the file open basically indefinitely. The
tricky part is the setlease case. To handle that properly we have to
allow userland processes to queue the fputs involved to a workqueue.
This means allowing non-kthreads to queue fputs to the delayed_fput
infrastructure.

Al, can you look and weigh in? Do the delayed_fput/global_fput
changes look reasonable?

Jeff Layton (19):
nfsd: move include of state.h from trace.c to trace.h
fs: have flush_delayed_fput flush the workqueue job
fs: add a kerneldoc header to fput
fs: rename "delayed_fput" infrastructure to "fput_global"
fs: add fput_global
fsnotify: export several symbols
locks: create a new notifier chain for lease attempts
sunrpc: add a new cache_detail operation for when a cache is flushed
nfsd: add a new struct file caching facility to nfsd
nfsd: keep some rudimentary stats on nfsd_file cache
nfsd: allow filecache open to skip fh_verify check
nfsd: hook up nfsd_write to the new nfsd_file cache
nfsd: hook up nfsd_read to the nfsd_file cache
nfsd: hook nfsd_commit up to the nfsd_file cache
nfsd: convert nfs4_file->fi_fds array to use nfsd_files
nfsd: have nfsd_test_lock use the nfsd_file cache
nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
nfsd: rip out the raparms cache

fs/file_table.c | 94 ++++--
fs/locks.c | 37 +++
fs/nfsd/Kconfig | 2 +
fs/nfsd/Makefile | 3 +-
fs/nfsd/export.c | 14 +
fs/nfsd/filecache.c | 712 +++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/filecache.h | 44 +++
fs/nfsd/nfs3proc.c | 2 +-
fs/nfsd/nfs4layouts.c | 12 +-
fs/nfsd/nfs4proc.c | 32 +-
fs/nfsd/nfs4state.c | 174 +++++------
fs/nfsd/nfs4xdr.c | 16 +-
fs/nfsd/nfsctl.c | 10 +
fs/nfsd/nfsproc.c | 2 +-
fs/nfsd/nfssvc.c | 16 +-
fs/nfsd/state.h | 10 +-
fs/nfsd/trace.c | 2 -
fs/nfsd/trace.h | 137 +++++++++
fs/nfsd/vfs.c | 269 ++++------------
fs/nfsd/vfs.h | 11 +-
fs/nfsd/xdr4.h | 15 +-
fs/notify/group.c | 2 +
fs/notify/inode_mark.c | 1 +
fs/notify/mark.c | 4 +
include/linux/file.h | 3 +-
include/linux/fs.h | 1 +
include/linux/sunrpc/cache.h | 1 +
init/main.c | 2 +-
net/sunrpc/cache.c | 3 +
29 files changed, 1249 insertions(+), 382 deletions(-)
create mode 100644 fs/nfsd/filecache.c
create mode 100644 fs/nfsd/filecache.h

--
2.4.3



2015-10-20 17:33:59

by Jeff Layton

Subject: [PATCH v6 01/19] nfsd: move include of state.h from trace.c to trace.h

Any file which includes trace.h will need to include state.h, even if
it isn't using any state tracepoints. Ensure that we include any
headers that might be needed in trace.h instead of relying on the
*.c files to have the right ones.

Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
fs/nfsd/trace.c | 2 --
fs/nfsd/trace.h | 2 ++
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/trace.c b/fs/nfsd/trace.c
index 82f89070594c..90967466a1e5 100644
--- a/fs/nfsd/trace.c
+++ b/fs/nfsd/trace.c
@@ -1,5 +1,3 @@

-#include "state.h"
-
#define CREATE_TRACE_POINTS
#include "trace.h"
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index c668520c344b..0befe762762b 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -9,6 +9,8 @@

#include <linux/tracepoint.h>

+#include "state.h"
+
DECLARE_EVENT_CLASS(nfsd_stateid_class,
TP_PROTO(stateid_t *stp),
TP_ARGS(stp),
--
2.4.3


2015-10-20 17:34:00

by Jeff Layton

Subject: [PATCH v6 02/19] fs: have flush_delayed_fput flush the workqueue job

I think there's a potential race in flush_delayed_fput. A kthread does
an fput() and that file gets added to the list and the delayed work is
scheduled. More than 1 jiffy passes, and the workqueue thread picks up
the work and starts running it. Then the kthread calls
flush_delayed_fput. It sees that the list is empty and returns
immediately, even though the __fput for its file may not have run yet.

Close this by making flush_delayed_fput use flush_delayed_work instead,
which should immediately schedule the work to run if it's not already,
and block until the workqueue job completes.
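
To make the window concrete, here is a rough single-threaded userspace
model of the race. It is a sketch only, not kernel code: struct
model_state and every model_* function are invented for illustration.
It tracks files still on the list separately from files the worker has
already detached but not yet closed, and compares draining the list
with waiting for the job.

```c
#include <assert.h>

/* Hypothetical model of the delayed fput state; not a kernel structure. */
struct model_state {
	int queued;	/* on the list, not yet picked up by the worker */
	int in_flight;	/* detached by the worker, final close not yet run */
	int closed;	/* final close has completed */
};

/* A kthread drops the last reference: the file lands on the list. */
static void model_fput(struct model_state *s)
{
	s->queued++;
}

/* The worker starts: it detaches the whole list before closing anything. */
static void model_worker_start(struct model_state *s)
{
	s->in_flight += s->queued;
	s->queued = 0;
}

/* The worker finishes: everything it detached is now closed. */
static void model_worker_finish(struct model_state *s)
{
	assert(s->in_flight >= 0);
	s->closed += s->in_flight;
	s->in_flight = 0;
}

/*
 * Old behaviour: drain whatever is on the list in the caller's context.
 * Returns the number of files the caller did NOT wait for.
 */
static int model_flush_by_draining_list(struct model_state *s)
{
	s->closed += s->queued;
	s->queued = 0;
	return s->in_flight;
}

/*
 * New behaviour: kick the job if needed and wait for it to complete.
 * Returns the number of files the caller did NOT wait for.
 */
static int model_flush_by_waiting(struct model_state *s)
{
	model_worker_start(s);
	model_worker_finish(s);
	return s->in_flight;
}
```

With one file queued and the worker already started, the list-draining
flush returns while that file is still in flight, whereas the waiting
flush returns only after it has been closed.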

Cc: Al Viro <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/file_table.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index ad17e05ebf95..52cc6803c07a 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -244,6 +244,8 @@ static void ____fput(struct callback_head *work)
__fput(container_of(work, struct file, f_u.fu_rcuhead));
}

+static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
+
/*
* If kernel thread really needs to have the final fput() it has done
* to complete, call this. The only user right now is the boot - we
@@ -256,11 +258,9 @@ static void ____fput(struct callback_head *work)
*/
void flush_delayed_fput(void)
{
- delayed_fput(NULL);
+ flush_delayed_work(&delayed_fput_work);
}

-static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
-
void fput(struct file *file)
{
if (atomic_long_dec_and_test(&file->f_count)) {
--
2.4.3


2015-10-20 17:34:06

by Jeff Layton

Subject: [PATCH v6 08/19] sunrpc: add a new cache_detail operation for when a cache is flushed

When the exports table is changed, exportfs will usually write a new
time to the "flush" file in the nfsd.export cache procfile. This tells
the kernel to flush any entries that are older than that value.

This gives us a mechanism to tell whether an unexport might have
occurred. Add a new ->flush cache_detail operation that is called after
flushing the cache whenever someone writes to a "flush" file.
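
The hook is optional, so the caller has to check for NULL before
invoking it. A minimal userspace sketch of that pattern follows; struct
cache_ops, demo_purge() and cache_write_flush() are hypothetical
stand-ins and not the sunrpc API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct cache_detail. */
struct cache_ops {
	void (*flush)(void);	/* optional: may be NULL */
};

static int purge_calls;	/* counts how often the hook ran */

/* Example hook: a real user would purge its own dependent cache here. */
static void demo_purge(void)
{
	purge_calls++;
}

/* Models the tail of a "flush" write: expire entries, then call the hook. */
static void cache_write_flush(const struct cache_ops *ops)
{
	assert(ops != NULL);
	/* ... expiring old cache entries would happen here ... */
	if (ops->flush)
		ops->flush();
}
```

Caches that do not set the hook are unaffected, which is why only the
one cache that cares needs to be touched.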

Signed-off-by: Jeff Layton <[email protected]>
---
include/linux/sunrpc/cache.h | 1 +
net/sunrpc/cache.c | 3 +++
2 files changed, 4 insertions(+)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 03d3b4c92d9f..d1c10a978bb2 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -98,6 +98,7 @@ struct cache_detail {
int has_died);

struct cache_head * (*alloc)(void);
+ void (*flush)(void);
int (*match)(struct cache_head *orig, struct cache_head *new);
void (*init)(struct cache_head *orig, struct cache_head *new);
void (*update)(struct cache_head *orig, struct cache_head *new);
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 4a2340a54401..60da9aa2bdc5 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1451,6 +1451,9 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
cd->nextcheck = seconds_since_boot();
cache_flush();

+ if (cd->flush)
+ cd->flush();
+
*ppos += count;
return count;
}
--
2.4.3


2015-10-20 17:34:06

by Jeff Layton

Subject: [PATCH v6 05/19] fs: add fput_global

When nfsd caches a file, we want to be able to close it down in advance
of setlease attempts. Setting a lease is generally done at the behest of
userland, so we need a mechanism to ensure that a userland task can
completely close a file without having to return back to userspace.

To do this, we borrow the global delayed_fput infrastructure that
kthreads use. fput_global will queue to the global_fput list if
the last reference was put. The caller can then use fput_global_flush
to force the final __fput to run.

While we're at it, export fput_global_flush as nfsd will need to use
that as well.
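
The intended calling pattern is "put a batch, then flush once if any
put was the final one". Below is a rough userspace model of that
pattern, assuming a simple singly linked pending list; struct
model_file and the model_* functions are invented for illustration and
only mimic the shape of fput_global and fput_global_flush.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct file. */
struct model_file {
	atomic_long refcount;
	bool closed;			/* set when the final close runs */
	struct model_file *next;	/* link on the pending list */
};

static struct model_file *pending;	/* stands in for the global list */

/* Drop a reference; queue the file if that was the last one. */
static bool model_fput_global(struct model_file *f)
{
	if (atomic_fetch_sub(&f->refcount, 1) == 1) {
		f->next = pending;
		pending = f;
		return true;	/* caller should flush afterward */
	}
	return false;
}

/* Run the deferred final close for everything queued so far. */
static void model_fput_global_flush(void)
{
	while (pending) {
		struct model_file *f = pending;

		pending = f->next;
		assert(!f->closed);	/* each file is closed exactly once */
		f->closed = true;
	}
}

/* Put a batch, flushing only if some final reference was dropped. */
static bool model_close_all(struct model_file **files, size_t n)
{
	bool flush = false;

	for (size_t i = 0; i < n; i++)
		if (model_fput_global(files[i]))
			flush = true;
	if (flush)
		model_fput_global_flush();
	return flush;
}
```

Returning a bool from the put keeps the common case cheap: when nothing
was the last reference, the caller can skip the flush entirely.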

Cc: Al Viro <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/file_table.c | 33 +++++++++++++++++++++++++++++++++
include/linux/file.h | 1 +
2 files changed, 34 insertions(+)

diff --git a/fs/file_table.c b/fs/file_table.c
index 0bf8ddc680ab..d214c45765e6 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -262,6 +262,7 @@ void fput_global_flush(void)
{
flush_delayed_work(&global_fput_work);
}
+EXPORT_SYMBOL(fput_global_flush);

/**
* fput - put a struct file reference
@@ -304,6 +305,38 @@ void fput(struct file *file)
}
EXPORT_SYMBOL(fput);

+/**
+ * fput_global - do an fput without using task_work
+ * @file: file of which to put the reference
+ *
+ * When fput is called in the context of a userland process, it'll queue the
+ * actual work (__fput()) to be done just before returning to userland. In some
+ * cases however, we need to ensure that the __fput runs before that point.
+ *
+ * There is no safe way to flush work that has been queued via task_work_add
+ * however, so to do this we borrow the global_fput infrastructure that
+ * kthreads use. The userland process can use fput_global() on one or more
+ * struct files and then call fput_global_flush() to ensure that they are
+ * completely closed before proceeding.
+ *
+ * The main user is nfsd, which uses this to ensure that all cached but
+ * otherwise unused files are closed to allow a userland request for a lease
+ * to proceed.
+ *
+ * Returns true if the final fput was done, false otherwise. The caller can
+ * use this to determine whether to fput_global_flush afterward.
+ */
+bool fput_global(struct file *file)
+{
+ if (atomic_long_dec_and_test(&file->f_count)) {
+ if (llist_add(&file->f_u.fu_llist, &global_fput_list))
+ schedule_delayed_work(&global_fput_work, 1);
+ return true;
+ }
+ return false;
+}
+EXPORT_SYMBOL(fput_global);
+
/*
* synchronous analog of fput(); for kernel threads that might be needed
* in some umount() (and thus can't use fput_global_flush() without
diff --git a/include/linux/file.h b/include/linux/file.h
index 73bb7cee57e9..7803aed00271 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -12,6 +12,7 @@
struct file;

extern void fput(struct file *);
+extern bool fput_global(struct file *);

struct file_operations;
struct vfsmount;
--
2.4.3


2015-10-20 17:34:04

by Jeff Layton

Subject: [PATCH v6 06/19] fsnotify: export several symbols

With nfsd's new open-file caching infrastructure, nfsd needs a way to
know when unlinks occur so it can close files that it may be holding
open.
fsnotify fits the bill nicely, but the symbols aren't currently exported
to modules. Export some of its symbols so nfsd can use this
infrastructure.

Cc: Eric Paris <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/notify/group.c | 2 ++
fs/notify/inode_mark.c | 1 +
fs/notify/mark.c | 4 ++++
3 files changed, 7 insertions(+)

diff --git a/fs/notify/group.c b/fs/notify/group.c
index d16b62cb2854..295d08800126 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -81,6 +81,7 @@ void fsnotify_put_group(struct fsnotify_group *group)
if (atomic_dec_and_test(&group->refcnt))
fsnotify_final_destroy_group(group);
}
+EXPORT_SYMBOL_GPL(fsnotify_put_group);

/*
* Create a new fsnotify_group and hold a reference for the group returned.
@@ -109,6 +110,7 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)

return group;
}
+EXPORT_SYMBOL_GPL(fsnotify_alloc_group);

int fsnotify_fasync(int fd, struct file *file, int on)
{
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index e785fd954c30..76e608b8894c 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -87,6 +87,7 @@ struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group,

return mark;
}
+EXPORT_SYMBOL_GPL(fsnotify_find_inode_mark);

/*
* If we are setting a mark mask on an inode mark we should pin the inode
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index fc0df4442f7b..c2bd670d4704 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -109,6 +109,7 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
mark->free_mark(mark);
}
}
+EXPORT_SYMBOL_GPL(fsnotify_put_mark);

/* Calculate mask of events for a list of marks */
u32 fsnotify_recalc_mask(struct hlist_head *head)
@@ -208,6 +209,7 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
mutex_unlock(&group->mark_mutex);
fsnotify_free_mark(mark);
}
+EXPORT_SYMBOL_GPL(fsnotify_destroy_mark);

void fsnotify_destroy_marks(struct hlist_head *head, spinlock_t *lock)
{
@@ -402,6 +404,7 @@ int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
mutex_unlock(&group->mark_mutex);
return ret;
}
+EXPORT_SYMBOL_GPL(fsnotify_add_mark);

/*
* Given a list of marks, find the mark associated with given group. If found
@@ -492,6 +495,7 @@ void fsnotify_init_mark(struct fsnotify_mark *mark,
atomic_set(&mark->refcnt, 1);
mark->free_mark = free_mark;
}
+EXPORT_SYMBOL_GPL(fsnotify_init_mark);

static int fsnotify_mark_destroy(void *ignored)
{
--
2.4.3


2015-10-20 17:34:05

by Jeff Layton

Subject: [PATCH v6 07/19] locks: create a new notifier chain for lease attempts

With the new file caching infrastructure in nfsd, we can end up holding
files open for an indefinite period of time, even when they are idle.
This may prevent the kernel from handing out leases on the file,
which is something we don't want to block.

Fix this by running a SRCU notifier call chain whenever any lease
attempt is made. nfsd can then purge the nfsd_file cache of any entries
associated with that inode, and use the fput_global infrastructure to
make sure the final __fput is done.

Since SRCU is only conditionally compiled in, we must only define the
new chain if it's enabled, and users of the chain must ensure that
SRCU is enabled.
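
For readers unfamiliar with notifier chains, the control flow is
roughly as follows. This is a plain userspace sketch with an ordinary
linked list and no SRCU protection; struct model_notifier and the
model_* functions are made up for illustration. Only the rule that
F_UNLCK requests are not announced mirrors the patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical userspace analogue of a notifier block. */
struct model_notifier {
	int (*call)(struct model_notifier *nb, long arg, void *data);
	struct model_notifier *next;
};

static struct model_notifier *lease_chain;	/* head of the chain */
static int closed_inodes;			/* bumped by the demo callback */

/* Add a subscriber to the front of the chain. */
static void model_chain_register(struct model_notifier *nb)
{
	assert(nb->call != NULL);
	nb->next = lease_chain;
	lease_chain = nb;
}

/* Announce a lease attempt; removing a lease (F_UNLCK) is not announced. */
static void model_setlease_notifier(long arg, void *lease)
{
	if (arg == F_UNLCK)
		return;
	for (struct model_notifier *nb = lease_chain; nb; nb = nb->next)
		nb->call(nb, arg, lease);
}

/* Demo subscriber: pretends to close cached files for the inode. */
static int model_close_cached(struct model_notifier *nb, long arg, void *data)
{
	(void)nb;
	(void)arg;
	(void)data;
	closed_inodes++;
	return 0;
}
```

The subscriber runs before the lease is actually attempted, which gives
it a chance to drop its cached references first.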

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 37 +++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 1 +
2 files changed, 38 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index 2a54c800a223..add3eeb79ace 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1780,6 +1780,40 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp,
}
EXPORT_SYMBOL(generic_setlease);

+#if IS_ENABLED(CONFIG_SRCU)
+/*
+ * Kernel subsystems can register to be notified on any attempt to set
+ * a new lease with the lease_notifier_chain. This is used by (e.g.) nfsd
+ * to close files that it may have cached when there is an attempt to set a
+ * conflicting lease.
+ */
+struct srcu_notifier_head lease_notifier_chain;
+EXPORT_SYMBOL_GPL(lease_notifier_chain);
+
+static inline void
+lease_notifier_chain_init(void)
+{
+ srcu_init_notifier_head(&lease_notifier_chain);
+}
+
+static inline void
+setlease_notifier(long arg, struct file_lock *lease)
+{
+ if (arg != F_UNLCK)
+ srcu_notifier_call_chain(&lease_notifier_chain, arg, lease);
+}
+#else /* !IS_ENABLED(CONFIG_SRCU) */
+static inline void
+lease_notifier_chain_init(void)
+{
+}
+
+static inline void
+setlease_notifier(long arg, struct file_lock *lease)
+{
+}
+#endif /* IS_ENABLED(CONFIG_SRCU) */
+
/**
* vfs_setlease - sets a lease on an open file
* @filp: file pointer
@@ -1800,6 +1834,8 @@ EXPORT_SYMBOL(generic_setlease);
int
vfs_setlease(struct file *filp, long arg, struct file_lock **lease, void **priv)
{
+ if (lease)
+ setlease_notifier(arg, *lease);
if (filp->f_op->setlease)
return filp->f_op->setlease(filp, arg, lease, priv);
else
@@ -2696,6 +2732,7 @@ static int __init filelock_init(void)
for_each_possible_cpu(i)
INIT_HLIST_HEAD(per_cpu_ptr(&file_lock_list, i));

+ lease_notifier_chain_init();
return 0;
}

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 72d8a844c692..c0df75909dd4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1042,6 +1042,7 @@ extern int fcntl_setlease(unsigned int fd, struct file *filp, long arg);
extern int fcntl_getlease(struct file *filp);

/* fs/locks.c */
+extern struct srcu_notifier_head lease_notifier_chain;
void locks_free_lock_context(struct file_lock_context *ctx);
void locks_free_lock(struct file_lock *fl);
extern void locks_init_lock(struct file_lock *);
--
2.4.3


2015-10-20 17:34:07

by Jeff Layton

Subject: [PATCH v6 09/19] nfsd: add a new struct file caching facility to nfsd

Currently, NFSv2/3 reads and writes have to open a file, do the read or
write and then close it again for each RPC. This is highly inefficient,
especially when the underlying filesystem has a relatively slow open
routine.

This patch adds a new open file cache to knfsd. Rather than doing an
open for each RPC, the read/write handlers can call into this cache to
see if there is one already there for the correct filehandle and
NFS_MAY_READ/WRITE flags.

If there isn't an entry, then we create a new one and attempt to
perform the open. If there is, then we wait until the entry is fully
instantiated. If it is instantiated at the end of the wait, we return
it; if not, then we attempt to take over construction.
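
The find-or-create part of that lookup can be sketched in userspace as
below. This is a simplified model under several assumptions: it has no
locking, it leaves out the wait for instantiation and the takeover
step, it matches the access mask exactly, and it uses a generic
multiplicative hash in place of hash_long(). All model_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_HASH_BITS	4
#define MODEL_HASH_SIZE	(1u << MODEL_HASH_BITS)
#define MODEL_MAY_READ	0x1u
#define MODEL_MAY_WRITE	0x2u
#define MODEL_MAY_MASK	(MODEL_MAY_READ | MODEL_MAY_WRITE)

/* Hypothetical cache entry, keyed on inode number plus access mask. */
struct model_nfsd_file {
	uint64_t ino;
	unsigned int may;
	int ref;
	struct model_nfsd_file *next;
};

static struct model_nfsd_file *model_tbl[MODEL_HASH_SIZE];
static int model_opens;	/* how many "real" opens were needed */

/* Multiplicative hash of the inode number down to a bucket index. */
static unsigned int model_hash(uint64_t ino)
{
	return (unsigned int)((ino * 0x9E3779B97F4A7C15ull) >>
			      (64 - MODEL_HASH_BITS));
}

/* Return a cached entry for (ino, may), creating one on a miss. */
static struct model_nfsd_file *model_acquire(uint64_t ino, unsigned int may)
{
	unsigned int h = model_hash(ino);
	struct model_nfsd_file *nf;

	may &= MODEL_MAY_MASK;
	for (nf = model_tbl[h]; nf; nf = nf->next) {
		if (nf->ino == ino && nf->may == may) {
			nf->ref++;	/* cache hit: no open needed */
			return nf;
		}
	}

	nf = calloc(1, sizeof(*nf));
	if (!nf)
		return NULL;
	nf->ino = ino;
	nf->may = may;
	nf->ref = 2;	/* one for the table, one for the caller */
	nf->next = model_tbl[h];
	model_tbl[h] = nf;
	model_opens++;	/* cache miss: this is where the open would go */
	return nf;
}
```

Two reads of the same inode share one entry and one open, while a write
to that inode gets its own entry because the access mask differs.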

Since the main goal is to speed up NFSv2/3 I/O, we don't want to
close these files on last put of these objects. We need to keep them
around for a little while since we never know when the next READ/WRITE
will come in.
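
One way to picture "a little while" is a second-chance scan: an entry
survives a shrinker pass if someone still holds it or if it was put
since the previous pass. The userspace sketch below models only that
decision, without locking or atomics; struct model_entry and the
model_* names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum model_lru_status { MODEL_LRU_ROTATE, MODEL_LRU_REMOVED };

/* Hypothetical cache entry for the LRU decision only. */
struct model_entry {
	int ref;		/* 1 means only the hash table holds it */
	bool referenced;	/* set on every put, cleared by the scan */
};

/* Put path: mark the entry as recently used, then drop the reference. */
static void model_put(struct model_entry *e)
{
	assert(e->ref > 0);
	e->referenced = true;
	e->ref--;
}

/* Scan path: check the count first, then test and clear the flag. */
static enum model_lru_status model_lru_cb(struct model_entry *e)
{
	if (e->ref > 1)
		return MODEL_LRU_ROTATE;	/* still in use */
	if (e->referenced) {
		e->referenced = false;
		return MODEL_LRU_ROTATE;	/* used since the last scan */
	}
	e->ref--;	/* drop the table's reference */
	return MODEL_LRU_REMOVED;
}
```

An idle entry therefore needs two consecutive scans without an
intervening put before it is dropped.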

Note that this patch just adds the infrastructure to allow the caching
of open files. Later patches will actually make nfsd use it.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/Kconfig | 2 +
fs/nfsd/Makefile | 3 +-
fs/nfsd/export.c | 14 ++
fs/nfsd/filecache.c | 671 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/filecache.h | 43 ++++
fs/nfsd/nfssvc.c | 10 +-
fs/nfsd/trace.h | 135 +++++++++++
fs/nfsd/vfs.c | 3 +-
fs/nfsd/vfs.h | 1 +
9 files changed, 879 insertions(+), 3 deletions(-)
create mode 100644 fs/nfsd/filecache.c
create mode 100644 fs/nfsd/filecache.h

diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index a0b77fc1bd39..95e0a91d41ef 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -6,6 +6,8 @@ config NFSD
select SUNRPC
select EXPORTFS
select NFS_ACL_SUPPORT if NFSD_V2_ACL
+ select SRCU
+ select FSNOTIFY
depends on MULTIUSER
help
Choose Y here if you want to allow other computers to access
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index 9a6028e120c6..8908bb467727 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -10,7 +10,8 @@ obj-$(CONFIG_NFSD) += nfsd.o
nfsd-y += trace.o

nfsd-y += nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \
- export.o auth.o lockd.o nfscache.o nfsxdr.o stats.o
+ export.o auth.o lockd.o nfscache.o nfsxdr.o \
+ stats.o filecache.o
nfsd-$(CONFIG_NFSD_FAULT_INJECTION) += fault_inject.o
nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
nfsd-$(CONFIG_NFSD_V3) += nfs3proc.o nfs3xdr.o
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index b4d84b579f20..4b504edff121 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -21,6 +21,7 @@
#include "nfsfh.h"
#include "netns.h"
#include "pnfs.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_EXPORT

@@ -231,6 +232,18 @@ static struct cache_head *expkey_alloc(void)
return NULL;
}

+static void
+expkey_flush(void)
+{
+ /*
+ * Take the nfsd_mutex here to ensure that the file cache is not
+ * destroyed while we're in the middle of flushing.
+ */
+ mutex_lock(&nfsd_mutex);
+ nfsd_file_cache_purge();
+ mutex_unlock(&nfsd_mutex);
+}
+
static struct cache_detail svc_expkey_cache_template = {
.owner = THIS_MODULE,
.hash_size = EXPKEY_HASHMAX,
@@ -243,6 +256,7 @@ static struct cache_detail svc_expkey_cache_template = {
.init = expkey_init,
.update = expkey_update,
.alloc = expkey_alloc,
+ .flush = expkey_flush,
};

static int
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
new file mode 100644
index 000000000000..2cbdf9889993
--- /dev/null
+++ b/fs/nfsd/filecache.c
@@ -0,0 +1,671 @@
+/*
+ * Open file cache.
+ *
+ * (c) 2015 - Jeff Layton <[email protected]>
+ */
+
+#include <linux/hash.h>
+#include <linux/slab.h>
+#include <linux/hash.h>
+#include <linux/file.h>
+#include <linux/sched.h>
+#include <linux/list_lru.h>
+#include <linux/fsnotify_backend.h>
+
+#include "vfs.h"
+#include "nfsd.h"
+#include "nfsfh.h"
+#include "filecache.h"
+#include "trace.h"
+
+#define NFSDDBG_FACILITY NFSDDBG_FH
+
+/* FIXME: dynamically size this for the machine somehow? */
+#define NFSD_FILE_HASH_BITS 12
+#define NFSD_FILE_HASH_SIZE (1 << NFSD_FILE_HASH_BITS)
+
+/* We only care about NFSD_MAY_READ/WRITE for this cache */
+#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
+
+struct nfsd_fcache_bucket {
+ struct hlist_head nfb_head;
+ spinlock_t nfb_lock;
+};
+
+static struct kmem_cache *nfsd_file_slab;
+static struct kmem_cache *nfsd_file_mark_slab;
+static struct nfsd_fcache_bucket *nfsd_file_hashtbl;
+static struct list_lru nfsd_file_lru;
+static struct fsnotify_group *nfsd_file_fsnotify_group;
+
+static void
+nfsd_file_slab_free(struct rcu_head *rcu)
+{
+ struct nfsd_file *nf = container_of(rcu, struct nfsd_file, nf_rcu);
+
+ kmem_cache_free(nfsd_file_slab, nf);
+}
+
+static void
+nfsd_file_mark_free(struct fsnotify_mark *mark)
+{
+ struct nfsd_file_mark *nfm = container_of(mark, struct nfsd_file_mark,
+ nfm_mark);
+
+ kmem_cache_free(nfsd_file_mark_slab, nfm);
+}
+
+static struct nfsd_file_mark *
+nfsd_file_mark_get(struct nfsd_file_mark *nfm)
+{
+ if (!atomic_inc_not_zero(&nfm->nfm_ref))
+ return NULL;
+ return nfm;
+}
+
+static void
+nfsd_file_mark_put(struct nfsd_file_mark *nfm)
+{
+ if (atomic_dec_and_test(&nfm->nfm_ref))
+ fsnotify_destroy_mark(&nfm->nfm_mark, nfsd_file_fsnotify_group);
+}
+
+static struct nfsd_file_mark *
+nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode)
+{
+ int err;
+ struct fsnotify_mark *mark;
+ struct nfsd_file_mark *nfm = NULL, *new = NULL;
+
+ do {
+ mark = fsnotify_find_inode_mark(nfsd_file_fsnotify_group,
+ inode);
+ if (mark) {
+ nfm = nfsd_file_mark_get(container_of(mark,
+ struct nfsd_file_mark,
+ nfm_mark));
+ fsnotify_put_mark(mark);
+ if (likely(nfm))
+ break;
+ }
+
+ /* allocate a new nfm */
+ if (!new) {
+ new = kmem_cache_alloc(nfsd_file_mark_slab, GFP_KERNEL);
+ if (!new)
+ return NULL;
+ fsnotify_init_mark(&new->nfm_mark, nfsd_file_mark_free);
+ atomic_set(&new->nfm_ref, 1);
+ }
+
+ err = fsnotify_add_mark(&new->nfm_mark,
+ nfsd_file_fsnotify_group, nf->nf_inode,
+ NULL, false);
+ if (likely(!err)) {
+ nfm = new;
+ new = NULL;
+ }
+ } while (unlikely(err == -EEXIST));
+
+ if (new)
+ kmem_cache_free(nfsd_file_mark_slab, new);
+ return nfm;
+}
+
+static struct nfsd_file *
+nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval)
+{
+ struct nfsd_file *nf;
+
+ nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL);
+ if (nf) {
+ INIT_HLIST_NODE(&nf->nf_node);
+ INIT_LIST_HEAD(&nf->nf_lru);
+ nf->nf_file = NULL;
+ nf->nf_flags = 0;
+ nf->nf_inode = inode;
+ nf->nf_hashval = hashval;
+ atomic_set(&nf->nf_ref, 1);
+ nf->nf_may = NFSD_FILE_MAY_MASK & may;
+ if (may & NFSD_MAY_NOT_BREAK_LEASE) {
+ if (may & NFSD_MAY_WRITE)
+ __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+ if (may & NFSD_MAY_READ)
+ __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+ }
+ nf->nf_mark = NULL;
+ trace_nfsd_file_alloc(nf);
+ }
+ return nf;
+}
+
+static void
+nfsd_file_put_final(struct nfsd_file *nf)
+{
+ trace_nfsd_file_put_final(nf);
+ if (nf->nf_mark)
+ nfsd_file_mark_put(nf->nf_mark);
+ if (nf->nf_file)
+ fput(nf->nf_file);
+ call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
+}
+
+static bool
+nfsd_file_put_final_delayed(struct nfsd_file *nf)
+{
+ bool flush = false;
+
+ trace_nfsd_file_put_final(nf);
+ if (nf->nf_mark)
+ nfsd_file_mark_put(nf->nf_mark);
+ if (nf->nf_file)
+ flush = fput_global(nf->nf_file);
+ call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
+ return flush;
+}
+
+static bool
+nfsd_file_unhash(struct nfsd_file *nf)
+{
+ lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+
+ trace_nfsd_file_unhash(nf);
+ if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+ clear_bit(NFSD_FILE_HASHED, &nf->nf_flags);
+ hlist_del_rcu(&nf->nf_node);
+ list_lru_del(&nfsd_file_lru, &nf->nf_lru);
+ return true;
+ }
+ return false;
+}
+
+static void
+nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *dispose)
+{
+ lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+
+ trace_nfsd_file_unhash_and_release_locked(nf);
+ if (!nfsd_file_unhash(nf))
+ return;
+ if (!atomic_dec_and_test(&nf->nf_ref))
+ return;
+
+ list_add(&nf->nf_lru, dispose);
+}
+
+void
+nfsd_file_put(struct nfsd_file *nf)
+{
+ trace_nfsd_file_put(nf);
+ set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
+ smp_mb__after_atomic();
+ if (atomic_dec_and_test(&nf->nf_ref)) {
+ WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags));
+ nfsd_file_put_final(nf);
+ }
+}
+
+struct nfsd_file *
+nfsd_file_get(struct nfsd_file *nf)
+{
+ if (likely(atomic_inc_not_zero(&nf->nf_ref)))
+ return nf;
+ return NULL;
+}
+
+static void
+nfsd_file_dispose_list(struct list_head *dispose)
+{
+ struct nfsd_file *nf;
+
+ while (!list_empty(dispose)) {
+ nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
+ list_del(&nf->nf_lru);
+ nfsd_file_put_final(nf);
+ }
+}
+
+static void
+nfsd_file_dispose_list_sync(struct list_head *dispose)
+{
+ bool flush = false;
+ struct nfsd_file *nf;
+
+ while (!list_empty(dispose)) {
+ nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
+ list_del(&nf->nf_lru);
+ if (nfsd_file_put_final_delayed(nf))
+ flush = true;
+ }
+ if (flush)
+ fput_global_flush();
+}
+
+static enum lru_status
+nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
+ spinlock_t *lock, void *arg)
+ __releases(lock)
+ __acquires(lock)
+{
+ struct nfsd_file *nf = list_entry(item, struct nfsd_file, nf_lru);
+ bool unhashed;
+
+ /*
+ * Do a lockless refcount check. The hashtable holds one reference, so
+ * we look to see if anything else has a reference, or if any have
+ * been put since the shrinker last ran. Those don't get unhashed and
+ * released.
+ *
+ * Note that in the put path, we set the flag and then decrement the
+ * counter. Here we check the counter and then test and clear the flag.
+ * That order is deliberate to ensure that we can do this locklessly.
+ */
+ if (atomic_read(&nf->nf_ref) > 1 ||
+ test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags))
+ return LRU_ROTATE;
+
+ spin_unlock(lock);
+ spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+ unhashed = nfsd_file_unhash(nf);
+ spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock);
+ if (unhashed)
+ nfsd_file_put(nf);
+ spin_lock(lock);
+ return unhashed ? LRU_REMOVED_RETRY : LRU_RETRY;
+}
+
+static unsigned long
+nfsd_file_lru_count(struct shrinker *s, struct shrink_control *sc)
+{
+ return list_lru_count(&nfsd_file_lru);
+}
+
+static unsigned long
+nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc)
+{
+ return list_lru_shrink_walk(&nfsd_file_lru, sc, nfsd_file_lru_cb, NULL);
+}
+
+static struct shrinker nfsd_file_shrinker = {
+ .scan_objects = nfsd_file_lru_scan,
+ .count_objects = nfsd_file_lru_count,
+ .seeks = 1,
+};
+
+static void
+__nfsd_file_close_inode(struct inode *inode, unsigned int hashval,
+ struct list_head *dispose)
+{
+ struct nfsd_file *nf;
+ struct hlist_node *tmp;
+
+ spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) {
+ if (inode == nf->nf_inode)
+ nfsd_file_unhash_and_release_locked(nf, dispose);
+ }
+ spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+}
+
+/**
+ * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file
+ * @inode: inode of the file to attempt to remove
+ *
+ * Walk the whole hash bucket, looking for any files that correspond to "inode".
+ * If any do, then unhash them and put the hashtable reference to them and
+ * destroy any that had their last reference put. Also ensure that any of the
+ * fputs also have their final __fput done as well.
+ */
+void
+nfsd_file_close_inode_sync(struct inode *inode)
+{
+ unsigned int hashval = (unsigned int)hash_long(inode->i_ino,
+ NFSD_FILE_HASH_BITS);
+ LIST_HEAD(dispose);
+
+ __nfsd_file_close_inode(inode, hashval, &dispose);
+ trace_nfsd_file_close_inode_sync(inode, hashval, !list_empty(&dispose));
+ nfsd_file_dispose_list_sync(&dispose);
+}
+
+/**
+ * nfsd_file_close_inode - attempt to forcibly close a nfsd_file
+ * @inode: inode of the file to attempt to remove
+ *
+ * Walk the whole hash bucket, looking for any files that correspond to "inode".
+ * If any do, then unhash them and put the hashtable reference to them and
+ * destroy any that had their last reference put.
+ */
+static void
+nfsd_file_close_inode(struct inode *inode)
+{
+ unsigned int hashval = (unsigned int)hash_long(inode->i_ino,
+ NFSD_FILE_HASH_BITS);
+ LIST_HEAD(dispose);
+
+ __nfsd_file_close_inode(inode, hashval, &dispose);
+ trace_nfsd_file_close_inode(inode, hashval, !list_empty(&dispose));
+ nfsd_file_dispose_list(&dispose);
+}
+
+static int
+nfsd_file_lease_notifier_call(struct notifier_block *nb, unsigned long arg,
+ void *data)
+{
+ struct file_lock *fl = data;
+
+ /* Only close files for F_SETLEASE leases */
+ if (fl->fl_flags & FL_LEASE)
+ nfsd_file_close_inode_sync(file_inode(fl->fl_file));
+ return 0;
+}
+
+static struct notifier_block nfsd_file_lease_notifier = {
+ .notifier_call = nfsd_file_lease_notifier_call,
+};
+
+static int
+nfsd_file_fsnotify_handle_event(struct fsnotify_group *group,
+ struct inode *inode,
+ struct fsnotify_mark *inode_mark,
+ struct fsnotify_mark *vfsmount_mark,
+ u32 mask, void *data, int data_type,
+ const unsigned char *file_name, u32 cookie)
+{
+ trace_nfsd_file_fsnotify_handle_event(inode, mask);
+
+ /* Should be no marks on non-regular files */
+ if (!S_ISREG(inode->i_mode)) {
+ WARN_ON_ONCE(1);
+ return 0;
+ }
+
+ /* ...and we don't do anything with vfsmount marks */
+ BUG_ON(vfsmount_mark);
+
+ /* don't close files if this was not the last link */
+ if (mask & FS_ATTRIB) {
+ if (inode->i_nlink)
+ return 0;
+ }
+
+ nfsd_file_close_inode(inode);
+ return 0;
+}
+
+
+static const struct fsnotify_ops nfsd_file_fsnotify_ops = {
+ .handle_event = nfsd_file_fsnotify_handle_event,
+};
+
+int
+nfsd_file_cache_init(void)
+{
+ int ret = -ENOMEM;
+ unsigned int i;
+
+ if (nfsd_file_hashtbl)
+ return 0;
+
+ nfsd_file_hashtbl = kcalloc(NFSD_FILE_HASH_SIZE,
+ sizeof(*nfsd_file_hashtbl), GFP_KERNEL);
+ if (!nfsd_file_hashtbl) {
+ pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n");
+ goto out_err;
+ }
+
+ nfsd_file_slab = kmem_cache_create("nfsd_file",
+ sizeof(struct nfsd_file), 0, 0, NULL);
+ if (!nfsd_file_slab) {
+ pr_err("nfsd: unable to create nfsd_file_slab\n");
+ goto out_err;
+ }
+
+ nfsd_file_mark_slab = kmem_cache_create("nfsd_file_mark",
+ sizeof(struct nfsd_file_mark), 0, 0, NULL);
+ if (!nfsd_file_mark_slab) {
+ pr_err("nfsd: unable to create nfsd_file_mark_slab\n");
+ goto out_err;
+ }
+
+
+ ret = list_lru_init(&nfsd_file_lru);
+ if (ret) {
+ pr_err("nfsd: failed to init nfsd_file_lru: %d\n", ret);
+ goto out_err;
+ }
+
+ ret = register_shrinker(&nfsd_file_shrinker);
+ if (ret) {
+ pr_err("nfsd: failed to register nfsd_file_shrinker: %d\n", ret);
+ goto out_lru;
+ }
+
+ ret = srcu_notifier_chain_register(&lease_notifier_chain,
+ &nfsd_file_lease_notifier);
+ if (ret) {
+ pr_err("nfsd: unable to register lease notifier: %d\n", ret);
+ goto out_shrinker;
+ }
+
+ nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops);
+ if (IS_ERR(nfsd_file_fsnotify_group)) {
+ ret = PTR_ERR(nfsd_file_fsnotify_group);
+ pr_err("nfsd: unable to create fsnotify group: %d\n", ret);
+ nfsd_file_fsnotify_group = NULL;
+ goto out_notifier;
+ }
+
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ INIT_HLIST_HEAD(&nfsd_file_hashtbl[i].nfb_head);
+ spin_lock_init(&nfsd_file_hashtbl[i].nfb_lock);
+ }
+out:
+ return ret;
+out_notifier:
+ srcu_notifier_chain_unregister(&lease_notifier_chain,
+ &nfsd_file_lease_notifier);
+out_shrinker:
+ unregister_shrinker(&nfsd_file_shrinker);
+out_lru:
+ list_lru_destroy(&nfsd_file_lru);
+out_err:
+ kmem_cache_destroy(nfsd_file_slab);
+ nfsd_file_slab = NULL;
+ kmem_cache_destroy(nfsd_file_mark_slab);
+ nfsd_file_mark_slab = NULL;
+ kfree(nfsd_file_hashtbl);
+ nfsd_file_hashtbl = NULL;
+ goto out;
+}
+
+void
+nfsd_file_cache_purge(void)
+{
+ unsigned int i;
+ struct nfsd_file *nf;
+ LIST_HEAD(dispose);
+
+ if (!nfsd_file_hashtbl)
+ return;
+
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ spin_lock(&nfsd_file_hashtbl[i].nfb_lock);
+ while (!hlist_empty(&nfsd_file_hashtbl[i].nfb_head)) {
+ nf = hlist_entry(nfsd_file_hashtbl[i].nfb_head.first,
+ struct nfsd_file, nf_node);
+ nfsd_file_unhash_and_release_locked(nf, &dispose);
+ }
+ spin_unlock(&nfsd_file_hashtbl[i].nfb_lock);
+ nfsd_file_dispose_list(&dispose);
+ }
+}
+
+void
+nfsd_file_cache_shutdown(void)
+{
+ LIST_HEAD(dispose);
+
+ srcu_notifier_chain_unregister(&lease_notifier_chain,
+ &nfsd_file_lease_notifier);
+ unregister_shrinker(&nfsd_file_shrinker);
+ nfsd_file_cache_purge();
+ list_lru_destroy(&nfsd_file_lru);
+ rcu_barrier();
+ fsnotify_put_group(nfsd_file_fsnotify_group);
+ nfsd_file_fsnotify_group = NULL;
+ kmem_cache_destroy(nfsd_file_slab);
+ nfsd_file_slab = NULL;
+ kmem_cache_destroy(nfsd_file_mark_slab);
+ nfsd_file_mark_slab = NULL;
+ kfree(nfsd_file_hashtbl);
+ nfsd_file_hashtbl = NULL;
+}
+
+/*
+ * Search one bucket of nfsd_file_hashtbl[] for a file. The bucket is chosen by
+ * hashing inode->i_ino; entries are matched on the inode pointer and on the
+ * NFSD_MAY_READ/WRITE flags. A file that is open for r/w is usable for either.
+ */
+static struct nfsd_file *
+nfsd_file_find_locked(struct inode *inode, unsigned int may_flags,
+ unsigned int hashval)
+{
+ struct nfsd_file *nf;
+ unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
+
+ hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head,
+ nf_node) {
+ if ((need & nf->nf_may) != need)
+ continue;
+ if (nf->nf_inode == inode)
+ return nfsd_file_get(nf);
+ }
+ return NULL;
+}
+
+__be32
+nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ unsigned int may_flags, struct nfsd_file **pnf)
+{
+ __be32 status;
+ struct nfsd_file *nf, *new = NULL;
+ struct inode *inode;
+ unsigned int hashval;
+
+ /* FIXME: skip this if fh_dentry is already set? */
+ status = fh_verify(rqstp, fhp, S_IFREG,
+ may_flags|NFSD_MAY_OWNER_OVERRIDE);
+ if (status != nfs_ok)
+ return status;
+
+ inode = d_inode(fhp->fh_dentry);
+ hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS);
+retry:
+ rcu_read_lock();
+ nf = nfsd_file_find_locked(inode, may_flags, hashval);
+ rcu_read_unlock();
+ if (nf)
+ goto wait_for_construction;
+
+ if (!new) {
+ new = nfsd_file_alloc(inode, may_flags, hashval);
+ if (!new) {
+ trace_nfsd_file_acquire(hashval, inode, may_flags, NULL,
+ nfserr_jukebox);
+ return nfserr_jukebox;
+ }
+ }
+
+ spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ nf = nfsd_file_find_locked(inode, may_flags, hashval);
+ if (likely(nf == NULL)) {
+ /* Take reference for the hashtable */
+ atomic_inc(&new->nf_ref);
+ __set_bit(NFSD_FILE_HASHED, &new->nf_flags);
+ __set_bit(NFSD_FILE_PENDING, &new->nf_flags);
+ list_lru_add(&nfsd_file_lru, &new->nf_lru);
+ hlist_add_head_rcu(&new->nf_node,
+ &nfsd_file_hashtbl[hashval].nfb_head);
+ spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+ nf = new;
+ new = NULL;
+ goto open_file;
+ }
+ spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
+
+wait_for_construction:
+ wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE);
+
+ /* Did construction of this file fail? */
+ if (!nf->nf_file) {
+ /*
+ * We can only take over construction for this nfsd_file if the
+ * MAY flags are equal. Otherwise, we put the reference and try
+ * again.
+ */
+ if ((may_flags & NFSD_FILE_MAY_MASK) != nf->nf_may) {
+ nfsd_file_put(nf);
+ goto retry;
+ }
+
+ /* try to take over construction for this file */
+ if (test_and_set_bit(NFSD_FILE_PENDING, &nf->nf_flags))
+ goto wait_for_construction;
+
+ /* sync up the BREAK_* flags with our may_flags */
+ if (may_flags & NFSD_MAY_NOT_BREAK_LEASE) {
+ if (may_flags & NFSD_MAY_WRITE)
+ set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+ if (may_flags & NFSD_MAY_READ)
+ set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+ } else {
+ clear_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags);
+ clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+ }
+
+ goto open_file;
+ }
+
+ if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) {
+ bool write = (may_flags & NFSD_MAY_WRITE);
+
+ if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) ||
+ (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) {
+ status = nfserrno(nfsd_open_break_lease(
+ file_inode(nf->nf_file), may_flags));
+ if (status == nfs_ok) {
+ clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
+ if (write)
+ clear_bit(NFSD_FILE_BREAK_WRITE,
+ &nf->nf_flags);
+ }
+ }
+ }
+out:
+ if (status == nfs_ok) {
+ *pnf = nf;
+ } else {
+ nfsd_file_put(nf);
+ nf = NULL;
+ }
+
+ if (new)
+ nfsd_file_put(new);
+
+ trace_nfsd_file_acquire(hashval, inode, may_flags, nf, status);
+ return status;
+open_file:
+ if (!nf->nf_mark) {
+ nf->nf_mark = nfsd_file_mark_find_or_create(nf, inode);
+ if (!nf->nf_mark)
+ status = nfserr_jukebox;
+ }
+ /* FIXME: should we abort opening if the link count goes to 0? */
+ if (status == nfs_ok)
+ status = nfsd_open(rqstp, fhp, S_IFREG, may_flags, &nf->nf_file);
+ clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
+ smp_mb__after_atomic();
+ wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
+ goto out;
+}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
new file mode 100644
index 000000000000..a045c7fe3655
--- /dev/null
+++ b/fs/nfsd/filecache.h
@@ -0,0 +1,43 @@
+#ifndef _FS_NFSD_FILECACHE_H
+#define _FS_NFSD_FILECACHE_H
+
+#include <linux/fsnotify_backend.h>
+
+struct nfsd_file_mark {
+ struct fsnotify_mark nfm_mark;
+ atomic_t nfm_ref;
+};
+
+/*
+ * A representation of a file that has been opened by knfsd. These are hashed
+ * in the hashtable by inode->i_ino and matched on the inode pointer. Note that
+ * this object doesn't hold a reference to the inode by itself, so the nf_inode
+ * pointer should never be dereferenced, only used for comparison.
+ */
+struct nfsd_file {
+ struct hlist_node nf_node;
+ struct list_head nf_lru;
+ struct rcu_head nf_rcu;
+ struct file *nf_file;
+#define NFSD_FILE_HASHED (0)
+#define NFSD_FILE_PENDING (1)
+#define NFSD_FILE_BREAK_READ (2)
+#define NFSD_FILE_BREAK_WRITE (3)
+#define NFSD_FILE_REFERENCED (4)
+ unsigned long nf_flags;
+ struct inode *nf_inode;
+ unsigned int nf_hashval;
+ atomic_t nf_ref;
+ unsigned char nf_may;
+ struct nfsd_file_mark *nf_mark;
+};
+
+int nfsd_file_cache_init(void);
+void nfsd_file_cache_purge(void);
+void nfsd_file_cache_shutdown(void);
+void nfsd_file_put(struct nfsd_file *nf);
+struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
+void nfsd_file_close_inode_sync(struct inode *inode);
+__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ unsigned int may_flags, struct nfsd_file **nfp);
+#endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index ad4e2377dd63..d816bb3faa6e 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -22,6 +22,7 @@
#include "cache.h"
#include "vfs.h"
#include "netns.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_SVC

@@ -224,11 +225,17 @@ static int nfsd_startup_generic(int nrservs)
if (ret)
goto dec_users;

- ret = nfs4_state_start();
+ ret = nfsd_file_cache_init();
if (ret)
goto out_racache;
+
+ ret = nfs4_state_start();
+ if (ret)
+ goto out_file_cache;
return 0;

+out_file_cache:
+ nfsd_file_cache_shutdown();
out_racache:
nfsd_racache_shutdown();
dec_users:
@@ -242,6 +249,7 @@ static void nfsd_shutdown_generic(void)
return;

nfs4_state_shutdown();
+ nfsd_file_cache_shutdown();
nfsd_racache_shutdown();
}

diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 0befe762762b..bbfa8aec762b 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -10,6 +10,8 @@
#include <linux/tracepoint.h>

#include "state.h"
+#include "filecache.h"
+#include "vfs.h"

DECLARE_EVENT_CLASS(nfsd_stateid_class,
TP_PROTO(stateid_t *stp),
@@ -48,6 +50,140 @@ DEFINE_STATEID_EVENT(layout_recall_done);
DEFINE_STATEID_EVENT(layout_recall_fail);
DEFINE_STATEID_EVENT(layout_recall_release);

+#define show_nf_flags(val) \
+ __print_flags(val, "|", \
+ { 1 << NFSD_FILE_HASHED, "HASHED" }, \
+ { 1 << NFSD_FILE_PENDING, "PENDING" }, \
+ { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \
+ { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \
+ { 1 << NFSD_FILE_REFERENCED, "REFERENCED" })
+
+/* FIXME: This should probably be fleshed out in the future. */
+#define show_nf_may(val) \
+ __print_flags(val, "|", \
+ { NFSD_MAY_READ, "READ" }, \
+ { NFSD_MAY_WRITE, "WRITE" }, \
+ { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" })
+
+DECLARE_EVENT_CLASS(nfsd_file_class,
+ TP_PROTO(struct nfsd_file *nf),
+ TP_ARGS(nf),
+ TP_STRUCT__entry(
+ __field(unsigned int, nf_hashval)
+ __field(void *, nf_inode)
+ __field(int, nf_ref)
+ __field(unsigned long, nf_flags)
+ __field(unsigned char, nf_may)
+ __field(struct file *, nf_file)
+ ),
+ TP_fast_assign(
+ __entry->nf_hashval = nf->nf_hashval;
+ __entry->nf_inode = nf->nf_inode;
+ __entry->nf_ref = atomic_read(&nf->nf_ref);
+ __entry->nf_flags = nf->nf_flags;
+ __entry->nf_may = nf->nf_may;
+ __entry->nf_file = nf->nf_file;
+ ),
+ TP_printk("hash=0x%x inode=0x%p ref=%d flags=%s may=%s file=%p",
+ __entry->nf_hashval,
+ __entry->nf_inode,
+ __entry->nf_ref,
+ show_nf_flags(__entry->nf_flags),
+ show_nf_may(__entry->nf_may),
+ __entry->nf_file)
+)
+
+#define DEFINE_NFSD_FILE_EVENT(name) \
+DEFINE_EVENT(nfsd_file_class, name, \
+ TP_PROTO(struct nfsd_file *nf), \
+ TP_ARGS(nf))
+
+DEFINE_NFSD_FILE_EVENT(nfsd_file_alloc);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_put);
+DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_release_locked);
+
+TRACE_EVENT(nfsd_file_acquire,
+ TP_PROTO(unsigned int hash, struct inode *inode,
+ unsigned int may_flags, struct nfsd_file *nf,
+ __be32 status),
+
+ TP_ARGS(hash, inode, may_flags, nf, status),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, hash)
+ __field(void *, inode)
+ __field(unsigned int, may_flags)
+ __field(int, nf_ref)
+ __field(unsigned long, nf_flags)
+ __field(unsigned char, nf_may)
+ __field(struct file *, nf_file)
+ __field(__be32, status)
+ ),
+
+ TP_fast_assign(
+ __entry->hash = hash;
+ __entry->inode = inode;
+ __entry->may_flags = may_flags;
+ __entry->nf_ref = nf ? atomic_read(&nf->nf_ref) : 0;
+ __entry->nf_flags = nf ? nf->nf_flags : 0;
+ __entry->nf_may = nf ? nf->nf_may : 0;
+ __entry->nf_file = nf ? nf->nf_file : NULL;
+ __entry->status = status;
+ ),
+
+ TP_printk("hash=0x%x inode=0x%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=0x%p status=%u",
+ __entry->hash, __entry->inode,
+ show_nf_may(__entry->may_flags), __entry->nf_ref,
+ show_nf_flags(__entry->nf_flags),
+ show_nf_may(__entry->nf_may), __entry->nf_file,
+ be32_to_cpu(__entry->status))
+);
+
+DECLARE_EVENT_CLASS(nfsd_file_search_class,
+ TP_PROTO(struct inode *inode, unsigned int hash, int found),
+ TP_ARGS(inode, hash, found),
+ TP_STRUCT__entry(
+ __field(struct inode *, inode)
+ __field(unsigned int, hash)
+ __field(int, found)
+ ),
+ TP_fast_assign(
+ __entry->inode = inode;
+ __entry->hash = hash;
+ __entry->found = found;
+ ),
+ TP_printk("hash=0x%x inode=0x%p found=%d", __entry->hash,
+ __entry->inode, __entry->found)
+);
+
+#define DEFINE_NFSD_FILE_SEARCH_EVENT(name) \
+DEFINE_EVENT(nfsd_file_search_class, name, \
+ TP_PROTO(struct inode *inode, unsigned int hash, int found), \
+ TP_ARGS(inode, hash, found))
+
+DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync);
+DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode);
+
+TRACE_EVENT(nfsd_file_fsnotify_handle_event,
+ TP_PROTO(struct inode *inode, u32 mask),
+ TP_ARGS(inode, mask),
+ TP_STRUCT__entry(
+ __field(struct inode *, inode)
+ __field(unsigned int, nlink)
+ __field(umode_t, mode)
+ __field(u32, mask)
+ ),
+ TP_fast_assign(
+ __entry->inode = inode;
+ __entry->nlink = inode->i_nlink;
+ __entry->mode = inode->i_mode;
+ __entry->mask = mask;
+ ),
+ TP_printk("inode=0x%p nlink=%u mode=0%ho mask=0x%x", __entry->inode,
+ __entry->nlink, __entry->mode, __entry->mask)
+);
#endif /* _NFSD_TRACE_H */

#undef TRACE_INCLUDE_PATH
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 4212aaacbb55..a144849cec10 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -623,7 +623,8 @@ nfsd_access(struct svc_rqst *rqstp, struct svc_fh *fhp, u32 *access, u32 *suppor
}
#endif /* CONFIG_NFSD_V3 */

-static int nfsd_open_break_lease(struct inode *inode, int access)
+int
+nfsd_open_break_lease(struct inode *inode, int access)
{
unsigned int mode;

diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index fcfc48cbe136..a877be59d5dd 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -69,6 +69,7 @@ __be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *,
__be32 nfsd_commit(struct svc_rqst *, struct svc_fh *,
loff_t, unsigned long);
#endif /* CONFIG_NFSD_V3 */
+int nfsd_open_break_lease(struct inode *, int);
__be32 nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
int, struct file **);
struct raparms;
--
2.4.3
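
For readers following the lookup in nfsd_file_find_locked(), here is a
minimal userspace sketch of the matching rule: an entry is usable when it
covers every requested access mode and refers to the same inode. The
names MAY_READ, MAY_WRITE, MAY_MASK, struct cached_file and bucket_find()
are hypothetical stand-ins invented for this illustration, and the flag
values are not the kernel's. It models only the comparison, not the RCU
walk, locking or reference counting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for NFSD_MAY_READ/WRITE; values are illustrative. */
#define MAY_READ   0x1u
#define MAY_WRITE  0x2u
#define MAY_MASK   (MAY_READ | MAY_WRITE)

struct cached_file {
	const void *inode;	/* compared only, never dereferenced */
	unsigned char may;	/* access modes the file was opened with */
};

/* An entry satisfies a request if it covers every requested mode. */
static bool cached_file_usable(const struct cached_file *cf,
			       const void *inode, unsigned int may_flags)
{
	unsigned char need = may_flags & MAY_MASK;

	if ((need & cf->may) != need)
		return false;
	return cf->inode == inode;
}

/* Linear scan over one bucket, mirroring the shape of the lookup loop. */
static const struct cached_file *
bucket_find(const struct cached_file *bucket, size_t n,
	    const void *inode, unsigned int may_flags)
{
	for (size_t i = 0; i < n; i++)
		if (cached_file_usable(&bucket[i], inode, may_flags))
			return &bucket[i];
	return NULL;
}
```

With one entry opened read-only and another opened r/w, a write request
skips the first and matches the second, while bits outside the mask are
ignored for matching purposes.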


2015-10-20 17:34:01

by Jeff Layton

Subject: [PATCH v6 03/19] fs: add a kerneldoc header to fput

...and move its EXPORT_SYMBOL just below the function.

Cc: Al Viro <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/file_table.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 52cc6803c07a..8cfeaee6323f 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -261,6 +261,25 @@ void flush_delayed_fput(void)
flush_delayed_work(&delayed_fput_work);
}

+/**
+ * fput - put a struct file reference
+ * @file: file of which to put the reference
+ *
+ * This function decrements the reference count of the struct file, and queues
+ * the file up for destruction if the count goes to zero. For most tasks we
+ * queue it to the task_work infrastructure, which will be run just before the
+ * task returns to userspace. kthreads however never return to userspace, so
+ * for those we add the file to a global list and schedule a delayed workqueue
+ * job to do the final cleanup work.
+ *
+ * Why not just do it synchronously? __fput can involve taking locks of all
+ * sorts, and doing it synchronously means that the callers must take extra care
+ * not to deadlock. That can be very difficult to ensure, so by deferring it
+ * until just before return to userland or to the workqueue, we sidestep that
+ * nastiness. Also, __fput can be quite stack intensive, so doing a final fput
+ * has the possibility of blowing up if we don't take steps to ensure that we
+ * have enough stack space to make it work.
+ */
void fput(struct file *file)
{
if (atomic_long_dec_and_test(&file->f_count)) {
@@ -281,6 +300,7 @@ void fput(struct file *file)
schedule_delayed_work(&delayed_fput_work, 1);
}
}
+EXPORT_SYMBOL(fput);

/*
* synchronous analog of fput(); for kernel threads that might be needed
@@ -299,7 +319,6 @@ void __fput_sync(struct file *file)
}
}

-EXPORT_SYMBOL(fput);

void put_filp(struct file *file)
{
--
2.4.3
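
The deferral described in the fput kerneldoc above can be modelled in a
few lines of userspace C. This is only a single-threaded sketch under
stated assumptions: struct obj, obj_put() and obj_flush() are
hypothetical names, the "pending" list stands in for both the task_work
queue and the global list, and no atomics or locking are modelled. The
point it illustrates is that the last put never runs the destructor
itself; it only queues the object for a later pass.

```c
#include <stddef.h>

/* Hypothetical, single-threaded model of a refcounted object. */
struct obj {
	long count;		/* reference count */
	int destroyed;		/* set once the deferred cleanup has run */
	struct obj *next;	/* link on the pending list */
};

static struct obj *pending;	/* objects awaiting final cleanup */

/* Drop a reference; on the last one, queue instead of destroying. */
static void obj_put(struct obj *o)
{
	if (--o->count == 0) {
		o->next = pending;
		pending = o;
	}
}

/* Run the deferred cleanup, as the task_work or workqueue job would. */
static int obj_flush(void)
{
	int n = 0;

	while (pending) {
		struct obj *o = pending;

		pending = o->next;
		o->destroyed = 1;
		n++;
	}
	return n;
}
```

A caller that needs the cleanup to have completed has to call the flush
routine explicitly, which is the role the flush helper plays for kernel
threads.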


2015-10-20 17:34:02

by Jeff Layton

Subject: [PATCH v6 04/19] fs: rename "delayed_fput" infrastructure to "fput_global"

delayed_fput work is only delayed until the workqueue job runs. It's
even possible that it will run immediately after you call it if the
workqueue job happens to be executing at the time. What it does
guarantee is that the final __fput will run in a different task than the
one that queued it. It's this property that we want to use in the
NFSD code to close files in advance of a setlease attempt by userland.

Change the name of the "delayed_fput" infrastructure to "fput_global"
as that better describes what this code does.

Cc: Al Viro <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/file_table.c | 38 ++++++++++++++++++++------------------
include/linux/file.h | 2 +-
init/main.c | 2 +-
3 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 8cfeaee6323f..0bf8ddc680ab 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -227,10 +227,10 @@ static void __fput(struct file *file)
mntput(mnt);
}

-static LLIST_HEAD(delayed_fput_list);
-static void delayed_fput(struct work_struct *unused)
+static LLIST_HEAD(global_fput_list);
+static void global_fput(struct work_struct *unused)
{
- struct llist_node *node = llist_del_all(&delayed_fput_list);
+ struct llist_node *node = llist_del_all(&global_fput_list);
struct llist_node *next;

for (; node; node = next) {
@@ -244,21 +244,23 @@ static void ____fput(struct callback_head *work)
__fput(container_of(work, struct file, f_u.fu_rcuhead));
}

-static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
+static DECLARE_DELAYED_WORK(global_fput_work, global_fput);

-/*
- * If kernel thread really needs to have the final fput() it has done
- * to complete, call this. The only user right now is the boot - we
- * *do* need to make sure our writes to binaries on initramfs has
- * not left us with opened struct file waiting for __fput() - execve()
- * won't work without that. Please, don't add more callers without
- * very good reasons; in particular, never call that with locks
- * held and never call that from a thread that might need to do
- * some work on any kind of umount.
+/**
+ * fput_global_flush - ensure that all global_fput work is complete
+ *
+ * If a kernel thread really needs to have the final fput() it has done to
+ * complete, call this. One of the main users is the boot - we *do* need to
+ * make sure our writes to binaries on initramfs have not left us with an open
+ * struct file waiting for __fput() - execve() won't work without that.
+ *
+ * Please, don't add more callers without very good reasons; in particular,
+ * never call that with locks held and never from a thread that might need to
+ * do some work on any kind of umount.
*/
-void flush_delayed_fput(void)
+void fput_global_flush(void)
{
- flush_delayed_work(&delayed_fput_work);
+ flush_delayed_work(&global_fput_work);
}

/**
@@ -296,15 +298,15 @@ void fput(struct file *file)
*/
}

- if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
- schedule_delayed_work(&delayed_fput_work, 1);
+ if (llist_add(&file->f_u.fu_llist, &global_fput_list))
+ schedule_delayed_work(&global_fput_work, 1);
}
}
EXPORT_SYMBOL(fput);

/*
* synchronous analog of fput(); for kernel threads that might be needed
- * in some umount() (and thus can't use flush_delayed_fput() without
+ * in some umount() (and thus can't use fput_global_flush() without
* risking deadlocks), need to wait for completion of __fput() and know
* for this specific struct file it won't involve anything that would
* need them. Use only if you really need it - at the very least,
diff --git a/include/linux/file.h b/include/linux/file.h
index f87d30882a24..73bb7cee57e9 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -70,7 +70,7 @@ extern void put_unused_fd(unsigned int fd);

extern void fd_install(unsigned int fd, struct file *file);

-extern void flush_delayed_fput(void);
+extern void fput_global_flush(void);
extern void __fput_sync(struct file *);

#endif /* __LINUX_FILE_H */
diff --git a/init/main.c b/init/main.c
index 9e64d7097f1a..6cb24a16ddd6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -941,7 +941,7 @@ static int __ref kernel_init(void *unused)
system_state = SYSTEM_RUNNING;
numa_default_policy();

- flush_delayed_fput();
+ fput_global_flush();

if (ramdisk_execute_command) {
ret = run_init_process(ramdisk_execute_command);
--
2.4.3
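
One detail in the hunk above that is easy to miss: llist_add() returns
true only when the list was empty before the add, so the work is
scheduled once per batch rather than once per file. The sketch below
models that idiom in single-threaded userspace C. The names struct node,
push_was_empty(), queue_final_put() and drain_all() are hypothetical, a
plain counter stands in for schedule_delayed_work(), and the lock-free
behaviour of the real llist is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next;
};

static struct node *head;	/* models global_fput_list */
static int work_scheduled;	/* counts would-be schedule_delayed_work() calls */

/* Push onto the list; report whether it was empty beforehand. */
static bool push_was_empty(struct node *n)
{
	bool was_empty = (head == NULL);

	n->next = head;
	head = n;
	return was_empty;
}

/* Queue a final put; only the first add after a drain schedules work. */
static void queue_final_put(struct node *n)
{
	if (push_was_empty(n))
		work_scheduled++;
}

/* Worker body: detach the whole list at once, then walk it. */
static int drain_all(void)
{
	struct node *n = head;
	int count = 0;

	head = NULL;
	for (; n; n = n->next)
		count++;
	return count;
}
```

Two puts queued back to back schedule the worker once; a put queued
after the drain schedules it again.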


2015-10-20 17:34:15

by Jeff Layton

Subject: [PATCH v6 17/19] nfsd: convert fi_deleg_file and ls_file fields to nfsd_file

Have them keep an nfsd_file reference instead of a struct file.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4layouts.c | 12 ++---
fs/nfsd/nfs4state.c | 131 ++++++++++++++++++++++++++------------------------
fs/nfsd/state.h | 6 +--
3 files changed, 76 insertions(+), 73 deletions(-)

diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 9ffef06b30d5..1ee1881cf8d0 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -144,8 +144,8 @@ nfsd4_free_layout_stateid(struct nfs4_stid *stid)
list_del_init(&ls->ls_perfile);
spin_unlock(&fp->fi_lock);

- vfs_setlease(ls->ls_file, F_UNLCK, NULL, (void **)&ls);
- fput(ls->ls_file);
+ vfs_setlease(ls->ls_file->nf_file, F_UNLCK, NULL, (void **)&ls);
+ nfsd_file_put(ls->ls_file);

if (ls->ls_recalled)
atomic_dec(&ls->ls_stid.sc_file->fi_lo_recalls);
@@ -169,7 +169,7 @@ nfsd4_layout_setlease(struct nfs4_layout_stateid *ls)
fl->fl_end = OFFSET_MAX;
fl->fl_owner = ls;
fl->fl_pid = current->tgid;
- fl->fl_file = ls->ls_file;
+ fl->fl_file = ls->ls_file->nf_file;

status = vfs_setlease(fl->fl_file, fl->fl_type, &fl, NULL);
if (status) {
@@ -207,13 +207,13 @@ nfsd4_alloc_layout_stateid(struct nfsd4_compound_state *cstate,
NFSPROC4_CLNT_CB_LAYOUT);

if (parent->sc_type == NFS4_DELEG_STID)
- ls->ls_file = get_file(fp->fi_deleg_file);
+ ls->ls_file = nfsd_file_get(fp->fi_deleg_file);
else
ls->ls_file = find_any_file(fp);
BUG_ON(!ls->ls_file);

if (nfsd4_layout_setlease(ls)) {
- fput(ls->ls_file);
+ nfsd_file_put(ls->ls_file);
put_nfs4_file(fp);
kmem_cache_free(nfs4_layout_stateid_cache, ls);
return NULL;
@@ -598,7 +598,7 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)

argv[0] = "/sbin/nfsd-recall-failed";
argv[1] = addr_str;
- argv[2] = ls->ls_file->f_path.mnt->mnt_sb->s_id;
+ argv[2] = ls->ls_file->nf_file->f_path.mnt->mnt_sb->s_id;
argv[3] = NULL;

error = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 4348af408ccb..4d58a6db6e41 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -289,18 +289,18 @@ put_nfs4_file(struct nfs4_file *fi)
}
}

-static struct file *
+static struct nfsd_file *
__nfs4_get_fd(struct nfs4_file *f, int oflag)
{
if (f->fi_fds[oflag])
- return get_file(f->fi_fds[oflag]->nf_file);
+ return nfsd_file_get(f->fi_fds[oflag]);
return NULL;
}

-static struct file *
+static struct nfsd_file *
find_writeable_file_locked(struct nfs4_file *f)
{
- struct file *ret;
+ struct nfsd_file *ret;

lockdep_assert_held(&f->fi_lock);

@@ -310,10 +310,10 @@ find_writeable_file_locked(struct nfs4_file *f)
return ret;
}

-static struct file *
+static struct nfsd_file *
find_writeable_file(struct nfs4_file *f)
{
- struct file *ret;
+ struct nfsd_file *ret;

spin_lock(&f->fi_lock);
ret = find_writeable_file_locked(f);
@@ -322,9 +322,10 @@ find_writeable_file(struct nfs4_file *f)
return ret;
}

-static struct file *find_readable_file_locked(struct nfs4_file *f)
+static struct nfsd_file *
+find_readable_file_locked(struct nfs4_file *f)
{
- struct file *ret;
+ struct nfsd_file *ret;

lockdep_assert_held(&f->fi_lock);

@@ -334,10 +335,10 @@ static struct file *find_readable_file_locked(struct nfs4_file *f)
return ret;
}

-static struct file *
+static struct nfsd_file *
find_readable_file(struct nfs4_file *f)
{
- struct file *ret;
+ struct nfsd_file *ret;

spin_lock(&f->fi_lock);
ret = find_readable_file_locked(f);
@@ -346,10 +347,10 @@ find_readable_file(struct nfs4_file *f)
return ret;
}

-struct file *
+struct nfsd_file *
find_any_file(struct nfs4_file *f)
{
- struct file *ret;
+ struct nfsd_file *ret;

spin_lock(&f->fi_lock);
ret = __nfs4_get_fd(f, O_RDWR);
@@ -761,16 +762,16 @@ nfs4_inc_and_copy_stateid(stateid_t *dst, struct nfs4_stid *stid)

static void nfs4_put_deleg_lease(struct nfs4_file *fp)
{
- struct file *filp = NULL;
+ struct nfsd_file *nf = NULL;

spin_lock(&fp->fi_lock);
if (fp->fi_deleg_file && --fp->fi_delegees == 0)
- swap(filp, fp->fi_deleg_file);
+ swap(nf, fp->fi_deleg_file);
spin_unlock(&fp->fi_lock);

- if (filp) {
- vfs_setlease(filp, F_UNLCK, NULL, (void **)&fp);
- fput(filp);
+ if (nf) {
+ vfs_setlease(nf->nf_file, F_UNLCK, NULL, (void **)&fp);
+ nfsd_file_put(nf);
}
}

@@ -1062,11 +1063,14 @@ static void nfs4_free_lock_stateid(struct nfs4_stid *stid)
{
struct nfs4_ol_stateid *stp = openlockstateid(stid);
struct nfs4_lockowner *lo = lockowner(stp->st_stateowner);
- struct file *file;
+ struct nfsd_file *nf;

- file = find_any_file(stp->st_stid.sc_file);
- if (file)
- filp_close(file, (fl_owner_t)lo);
+ nf = find_any_file(stp->st_stid.sc_file);
+ if (nf) {
+ get_file(nf->nf_file);
+ filp_close(nf->nf_file, (fl_owner_t)lo);
+ nfsd_file_put(nf);
+ }
nfs4_free_ol_stateid(stid);
}

@@ -3964,21 +3968,21 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
{
struct nfs4_file *fp = dp->dl_stid.sc_file;
struct file_lock *fl;
- struct file *filp;
+ struct nfsd_file *nf;
int status = 0;

fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
if (!fl)
return -ENOMEM;
- filp = find_readable_file(fp);
- if (!filp) {
+ nf = find_readable_file(fp);
+ if (!nf) {
/* We should always have a readable file here */
WARN_ON_ONCE(1);
locks_free_lock(fl);
return -EBADF;
}
- fl->fl_file = filp;
- status = vfs_setlease(filp, fl->fl_type, &fl, NULL);
+ fl->fl_file = nf->nf_file;
+ status = vfs_setlease(nf->nf_file, fl->fl_type, &fl, NULL);
if (fl)
locks_free_lock(fl);
if (status)
@@ -3996,7 +4000,7 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
hash_delegation_locked(dp, fp);
goto out_unlock;
}
- fp->fi_deleg_file = filp;
+ fp->fi_deleg_file = nf;
fp->fi_delegees = 1;
hash_delegation_locked(dp, fp);
spin_unlock(&fp->fi_lock);
@@ -4006,7 +4010,7 @@ out_unlock:
spin_unlock(&fp->fi_lock);
spin_unlock(&state_lock);
out_fput:
- fput(filp);
+ nfsd_file_put(nf);
return status;
}

@@ -4618,7 +4622,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
return nfs_ok;
}

-static struct file *
+static struct nfsd_file *
nfs4_find_file(struct nfs4_stid *s, int flags)
{
if (!s)
@@ -4628,7 +4632,7 @@ nfs4_find_file(struct nfs4_stid *s, int flags)
case NFS4_DELEG_STID:
if (WARN_ON_ONCE(!s->sc_file->fi_deleg_file))
return NULL;
- return get_file(s->sc_file->fi_deleg_file);
+ return nfsd_file_get(s->sc_file->fi_deleg_file);
case NFS4_OPEN_STID:
case NFS4_LOCK_STID:
if (flags & RD_STATE)
@@ -4657,21 +4661,17 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
struct file **filpp, bool *tmp_file, int flags)
{
int acc = (flags & RD_STATE) ? NFSD_MAY_READ : NFSD_MAY_WRITE;
- struct file *file;
+ struct nfsd_file *nf;
__be32 status;

- file = nfs4_find_file(s, flags);
- if (file) {
+ nf = nfs4_find_file(s, flags);
+ if (nf) {
status = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
acc | NFSD_MAY_OWNER_OVERRIDE);
- if (status) {
- fput(file);
- return status;
- }
-
- *filpp = file;
+ if (status)
+ goto out;
} else {
- status = nfsd_open(rqstp, fhp, S_IFREG, acc, filpp);
+ status = nfsd_file_acquire(rqstp, fhp, acc, &nf);
if (status)
return status;

@@ -4679,7 +4679,10 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
*tmp_file = true;
}

- return 0;
+ *filpp = get_file(nf->nf_file);
+out:
+ nfsd_file_put(nf);
+ return status;
}

/*
@@ -5413,7 +5416,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
struct nfs4_ol_stateid *lock_stp = NULL;
struct nfs4_ol_stateid *open_stp = NULL;
struct nfs4_file *fp;
- struct file *filp = NULL;
+ struct nfsd_file *nf = NULL;
struct file_lock *file_lock = NULL;
struct file_lock *conflock = NULL;
__be32 status = 0;
@@ -5498,8 +5501,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
case NFS4_READ_LT:
case NFS4_READW_LT:
spin_lock(&fp->fi_lock);
- filp = find_readable_file_locked(fp);
- if (filp)
+ nf = find_readable_file_locked(fp);
+ if (nf)
get_lock_access(lock_stp, NFS4_SHARE_ACCESS_READ);
spin_unlock(&fp->fi_lock);
file_lock->fl_type = F_RDLCK;
@@ -5507,8 +5510,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
case NFS4_WRITE_LT:
case NFS4_WRITEW_LT:
spin_lock(&fp->fi_lock);
- filp = find_writeable_file_locked(fp);
- if (filp)
+ nf = find_writeable_file_locked(fp);
+ if (nf)
get_lock_access(lock_stp, NFS4_SHARE_ACCESS_WRITE);
spin_unlock(&fp->fi_lock);
file_lock->fl_type = F_WRLCK;
@@ -5517,14 +5520,14 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
status = nfserr_inval;
goto out;
}
- if (!filp) {
+ if (!nf) {
status = nfserr_openmode;
goto out;
}

file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(&lock_sop->lo_owner));
file_lock->fl_pid = current->tgid;
- file_lock->fl_file = filp;
+ file_lock->fl_file = nf->nf_file;
file_lock->fl_flags = FL_POSIX;
file_lock->fl_lmops = &nfsd_posix_mng_ops;
file_lock->fl_start = lock->lk_offset;
@@ -5538,7 +5541,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
goto out;
}

- err = vfs_lock_file(filp, F_SETLK, file_lock, conflock);
+ err = vfs_lock_file(nf->nf_file, F_SETLK, file_lock, conflock);
switch (-err) {
case 0: /* success! */
nfs4_inc_and_copy_stateid(&lock->lk_resp_stateid, &lock_stp->st_stid);
@@ -5558,8 +5561,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
break;
}
out:
- if (filp)
- fput(filp);
+ if (nf)
+ nfsd_file_put(nf);
if (lock_stp) {
/* Bump seqid manually if the 4.0 replay owner is openowner */
if (cstate->replay_owner &&
@@ -5686,7 +5689,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
struct nfsd4_locku *locku)
{
struct nfs4_ol_stateid *stp;
- struct file *filp = NULL;
+ struct nfsd_file *nf;
struct file_lock *file_lock = NULL;
__be32 status;
int err;
@@ -5704,8 +5707,8 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
&stp, nn);
if (status)
goto out;
- filp = find_any_file(stp->st_stid.sc_file);
- if (!filp) {
+ nf = find_any_file(stp->st_stid.sc_file);
+ if (!nf) {
status = nfserr_lock_range;
goto put_stateid;
}
@@ -5713,13 +5716,13 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (!file_lock) {
dprintk("NFSD: %s: unable to allocate lock!\n", __func__);
status = nfserr_jukebox;
- goto fput;
+ goto put_file;
}

file_lock->fl_type = F_UNLCK;
file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(stp->st_stateowner));
file_lock->fl_pid = current->tgid;
- file_lock->fl_file = filp;
+ file_lock->fl_file = nf->nf_file;
file_lock->fl_flags = FL_POSIX;
file_lock->fl_lmops = &nfsd_posix_mng_ops;
file_lock->fl_start = locku->lu_offset;
@@ -5728,14 +5731,14 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
locku->lu_length);
nfs4_transform_lock_offset(file_lock);

- err = vfs_lock_file(filp, F_SETLK, file_lock, NULL);
+ err = vfs_lock_file(nf->nf_file, F_SETLK, file_lock, NULL);
if (err) {
dprintk("NFSD: nfs4_locku: vfs_lock_file failed!\n");
goto out_nfserr;
}
nfs4_inc_and_copy_stateid(&locku->lu_stateid, &stp->st_stid);
-fput:
- fput(filp);
+put_file:
+ nfsd_file_put(nf);
put_stateid:
up_write(&stp->st_rwsem);
nfs4_put_stid(&stp->st_stid);
@@ -5747,7 +5750,7 @@ out:

out_nfserr:
status = nfserrno(err);
- goto fput;
+ goto put_file;
}

/*
@@ -5760,17 +5763,17 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
{
struct file_lock *fl;
int status = false;
- struct file *filp = find_any_file(fp);
+ struct nfsd_file *nf = find_any_file(fp);
struct inode *inode;
struct file_lock_context *flctx;

- if (!filp) {
+ if (!nf) {
/* Any valid lock stateid should have some sort of access */
WARN_ON_ONCE(1);
return status;
}

- inode = file_inode(filp);
+ inode = file_inode(nf->nf_file);
flctx = inode->i_flctx;

if (flctx && !list_empty_careful(&flctx->flc_posix)) {
@@ -5783,7 +5786,7 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
}
spin_unlock(&flctx->flc_lock);
}
- fput(filp);
+ nfsd_file_put(nf);
return status;
}

diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 473faa436e07..ee23de10663c 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -501,7 +501,7 @@ struct nfs4_file {
*/
atomic_t fi_access[2];
u32 fi_share_deny;
- struct file *fi_deleg_file;
+ struct nfsd_file *fi_deleg_file;
int fi_delegees;
struct knfsd_fh fi_fhandle;
bool fi_had_conflict;
@@ -550,7 +550,7 @@ struct nfs4_layout_stateid {
spinlock_t ls_lock;
struct list_head ls_layouts;
u32 ls_layout_type;
- struct file *ls_file;
+ struct nfsd_file *ls_file;
struct nfsd4_callback ls_recall;
stateid_t ls_recall_sid;
bool ls_recalled;
@@ -615,7 +615,7 @@ static inline void get_nfs4_file(struct nfs4_file *fi)
{
atomic_inc(&fi->fi_ref);
}
-struct file *find_any_file(struct nfs4_file *f);
+struct nfsd_file *find_any_file(struct nfs4_file *f);

/* grace period management */
void nfsd4_end_grace(struct nfsd_net *nn);
--
2.4.3


2015-10-20 17:34:13

by Jeff Layton

Subject: [PATCH v6 16/19] nfsd: have nfsd_test_lock use the nfsd_file cache

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b72fa6816860..4348af408ccb 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5596,11 +5596,11 @@ out:
*/
static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file_lock *lock)
{
- struct file *file;
- __be32 err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
+ struct nfsd_file *nf;
+ __be32 err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
if (!err) {
- err = nfserrno(vfs_test_lock(file, lock));
- fput(file);
+ err = nfserrno(vfs_test_lock(nf->nf_file, lock));
+ nfsd_file_put(nf);
}
return err;
}
--
2.4.3


2015-10-20 17:34:08

by Jeff Layton

Subject: [PATCH v6 10/19] nfsd: keep some rudimentary stats on nfsd_file cache

Keep a per-chain entry count and maximum chain length, both protected
by the per-chain spinlock. When the stats file is read, we walk the
array of buckets and fetch the count from each.

Signed-off-by: Jeff Layton <[email protected]>
---
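Editorial sketch, not part of the patch: the per-bucket accounting
described above can be modelled in user space roughly as follows. The
names (HASH_SIZE, struct bucket, bucket_add, bucket_del, stats_collect)
are hypothetical stand-ins, the table size is arbitrary, and the
per-bucket spinlock is omitted, so treat this as an illustration of the
bookkeeping only.

```c
#include <assert.h>

#define HASH_SIZE 8	/* hypothetical; stands in for NFSD_FILE_HASH_SIZE */

struct bucket {
	unsigned int count;	/* entries currently hashed on this chain */
	unsigned int maxcount;	/* longest this chain has ever been */
};

struct bucket table[HASH_SIZE];

/* In the real code this would run with the per-bucket lock held. */
void bucket_add(unsigned int hashval)
{
	struct bucket *b = &table[hashval % HASH_SIZE];

	if (++b->count > b->maxcount)
		b->maxcount = b->count;
}

/* Mirror of the decrement done when an entry is unhashed. */
void bucket_del(unsigned int hashval)
{
	struct bucket *b = &table[hashval % HASH_SIZE];

	assert(b->count > 0);
	--b->count;
}

/* Walk every bucket, as the show routine for the stats file does. */
void stats_collect(unsigned int *total, unsigned int *longest)
{
	unsigned int i;

	*total = 0;
	*longest = 0;
	for (i = 0; i < HASH_SIZE; i++) {
		*total += table[i].count;
		if (table[i].count > *longest)
			*longest = table[i].count;
	}
}
```

As in the patch, the "longest chain" computed here is the longest
chain at the time of the read; the high-water mark lives in the
separate maxcount field and is not summed or reported by the walk.
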
fs/nfsd/filecache.c | 40 ++++++++++++++++++++++++++++++++++++++++
fs/nfsd/filecache.h | 1 +
fs/nfsd/nfsctl.c | 10 ++++++++++
3 files changed, 51 insertions(+)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 2cbdf9889993..aca3c3c638ec 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -11,6 +11,7 @@
#include <linux/sched.h>
#include <linux/list_lru.h>
#include <linux/fsnotify_backend.h>
+#include <linux/seq_file.h>

#include "vfs.h"
#include "nfsd.h"
@@ -30,6 +31,8 @@
struct nfsd_fcache_bucket {
struct hlist_head nfb_head;
spinlock_t nfb_lock;
+ unsigned int nfb_count;
+ unsigned int nfb_maxcount;
};

static struct kmem_cache *nfsd_file_slab;
@@ -171,6 +174,7 @@ nfsd_file_unhash(struct nfsd_file *nf)

trace_nfsd_file_unhash(nf);
if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+ --nfsd_file_hashtbl[nf->nf_hashval].nfb_count;
clear_bit(NFSD_FILE_HASHED, &nf->nf_flags);
hlist_del_rcu(&nf->nf_node);
list_lru_del(&nfsd_file_lru, &nf->nf_lru);
@@ -587,6 +591,9 @@ retry:
list_lru_add(&nfsd_file_lru, &new->nf_lru);
hlist_add_head_rcu(&new->nf_node,
&nfsd_file_hashtbl[hashval].nfb_head);
+ ++nfsd_file_hashtbl[hashval].nfb_count;
+ nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount,
+ nfsd_file_hashtbl[hashval].nfb_count);
spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
nf = new;
new = NULL;
@@ -669,3 +676,36 @@ open_file:
wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
goto out;
}
+
+/*
+ * Note that fields may be added, removed or reordered in the future. Programs
+ * scraping this file for info should test the labels to ensure they're
+ * getting the correct field.
+ */
+static int nfsd_file_cache_stats_show(struct seq_file *m, void *v)
+{
+ unsigned int i, count = 0, longest = 0;
+
+ /*
+ * No need for spinlocks here since we're not terribly interested in
+ * accuracy. We do take the nfsd_mutex simply to ensure that we
+ * don't end up racing with server shutdown
+ */
+ mutex_lock(&nfsd_mutex);
+ if (nfsd_file_hashtbl) {
+ for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) {
+ count += nfsd_file_hashtbl[i].nfb_count;
+ longest = max(longest, nfsd_file_hashtbl[i].nfb_count);
+ }
+ }
+ mutex_unlock(&nfsd_mutex);
+
+ seq_printf(m, "total entries: %u\n", count);
+ seq_printf(m, "longest chain: %u\n", longest);
+ return 0;
+}
+
+int nfsd_file_cache_stats_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, nfsd_file_cache_stats_show, NULL);
+}
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index a045c7fe3655..756aea5431da 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -40,4 +40,5 @@ struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
void nfsd_file_close_inode_sync(struct inode *inode);
__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int may_flags, struct nfsd_file **nfp);
+int nfsd_file_cache_stats_open(struct inode *, struct file *);
#endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 9690cb4dd588..eff44a277f70 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -22,6 +22,7 @@
#include "state.h"
#include "netns.h"
#include "pnfs.h"
+#include "filecache.h"

/*
* We have a single directory with several nodes in it.
@@ -36,6 +37,7 @@ enum {
NFSD_Threads,
NFSD_Pool_Threads,
NFSD_Pool_Stats,
+ NFSD_File_Cache_Stats,
NFSD_Reply_Cache_Stats,
NFSD_Versions,
NFSD_Ports,
@@ -220,6 +222,13 @@ static const struct file_operations pool_stats_operations = {
.owner = THIS_MODULE,
};

+static struct file_operations file_cache_stats_operations = {
+ .open = nfsd_file_cache_stats_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
static struct file_operations reply_cache_stats_operations = {
.open = nfsd_reply_cache_stats_open,
.read = seq_read,
@@ -1138,6 +1147,7 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
[NFSD_Pool_Stats] = {"pool_stats", &pool_stats_operations, S_IRUGO},
+ [NFSD_File_Cache_Stats] = {"file_cache_stats", &file_cache_stats_operations, S_IRUGO},
[NFSD_Reply_Cache_Stats] = {"reply_cache_stats", &reply_cache_stats_operations, S_IRUGO},
[NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
--
2.4.3


2015-10-20 17:34:09

by Jeff Layton

Subject: [PATCH v6 11/19] nfsd: allow filecache open to skip fh_verify check

Currently, we call fh_verify twice on the filehandle: once when we call
into nfsd_file_acquire, and then again from nfsd_open. The second call
is completely superfluous, and fh_verify can do some things that
require a fair bit of work (checking permissions, for instance).

Create a new nfsd_open_verified function that will do an nfsd_open on a
filehandle that has already been verified. Call that from the filecache
code.

Signed-off-by: Jeff Layton <[email protected]>
---
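Editorial sketch, not part of the patch: the refactoring described
above (a common worker plus one wrapper that verifies and one that
does not) can be illustrated in user space as below. All names here
(struct handle, handle_verify, handle_open_common, handle_open,
handle_open_verified, cache_acquire) are hypothetical, and the
"descriptor" value is a placeholder. The verify_calls counter exists
only to show that the cache-style caller verifies once.

```c
#include <assert.h>
#include <stdbool.h>

struct handle {
	bool verified;		/* set once the expensive check has passed */
	int verify_calls;	/* instrumentation for this sketch only */
};

/* Stand-in for an expensive check such as fh_verify. */
int handle_verify(struct handle *h)
{
	h->verify_calls++;
	h->verified = true;
	return 0;
}

/* Common worker: requires a handle that has already been verified. */
static int handle_open_common(struct handle *h, int *fd)
{
	assert(h->verified);
	*fd = 3;	/* pretend descriptor */
	return 0;
}

/* General entry point: verify first, then open. */
int handle_open(struct handle *h, int *fd)
{
	int err = handle_verify(h);

	if (!err)
		err = handle_open_common(h, fd);
	return err;
}

/* For callers that have verified the handle themselves. */
int handle_open_verified(struct handle *h, int *fd)
{
	return handle_open_common(h, fd);
}

/* Cache-style caller: verifies once up front, then opens. */
int cache_acquire(struct handle *h, int *fd)
{
	int err = handle_verify(h);

	if (!err)
		err = handle_open_verified(h, fd);
	return err;
}
```

The assert in the common worker plays the same role as the sanity
check the patch adds at the top of the shared helper: it catches a
caller that skipped verification.
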
fs/nfsd/filecache.c | 3 ++-
fs/nfsd/vfs.c | 63 +++++++++++++++++++++++++++++++++++------------------
fs/nfsd/vfs.h | 2 ++
3 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index aca3c3c638ec..3a4e423d2bb9 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -670,7 +670,8 @@ open_file:
}
/* FIXME: should we abort opening if the link count goes to 0? */
if (status == nfs_ok)
- status = nfsd_open(rqstp, fhp, S_IFREG, may_flags, &nf->nf_file);
+ status = nfsd_open_verified(rqstp, fhp, S_IFREG, may_flags,
+ &nf->nf_file);
clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
smp_mb__after_atomic();
wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index a144849cec10..cf4a2018d57a 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -640,9 +640,9 @@ nfsd_open_break_lease(struct inode *inode, int access)
* and additional flags.
* N.B. After this call fhp needs an fh_put
*/
-__be32
-nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
- int may_flags, struct file **filp)
+static __be32
+__nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
+ int may_flags, struct file **filp)
{
struct path path;
struct inode *inode;
@@ -651,24 +651,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
__be32 err;
int host_err = 0;

- validate_process_creds();
-
- /*
- * If we get here, then the client has already done an "open",
- * and (hopefully) checked permission - so allow OWNER_OVERRIDE
- * in case a chmod has now revoked permission.
- *
- * Arguably we should also allow the owner override for
- * directories, but we never have and it doesn't seem to have
- * caused anyone a problem. If we were to change this, note
- * also that our filldir callbacks would need a variant of
- * lookup_one_len that doesn't check permissions.
- */
- if (type == S_IFREG)
- may_flags |= NFSD_MAY_OWNER_OVERRIDE;
- err = fh_verify(rqstp, fhp, type, may_flags);
- if (err)
- goto out;
+ BUG_ON(!fhp->fh_dentry);

path.mnt = fhp->fh_export->ex_path.mnt;
path.dentry = fhp->fh_dentry;
@@ -723,6 +706,44 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
out_nfserr:
err = nfserrno(host_err);
out:
+ return err;
+}
+
+__be32
+nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
+ int may_flags, struct file **filp)
+{
+ __be32 err;
+
+ validate_process_creds();
+ /*
+ * If we get here, then the client has already done an "open",
+ * and (hopefully) checked permission - so allow OWNER_OVERRIDE
+ * in case a chmod has now revoked permission.
+ *
+ * Arguably we should also allow the owner override for
+ * directories, but we never have and it doesn't seem to have
+ * caused anyone a problem. If we were to change this, note
+ * also that our filldir callbacks would need a variant of
+ * lookup_one_len that doesn't check permissions.
+ */
+ if (type == S_IFREG)
+ may_flags |= NFSD_MAY_OWNER_OVERRIDE;
+ err = fh_verify(rqstp, fhp, type, may_flags);
+ if (!err)
+ err = __nfsd_open(rqstp, fhp, type, may_flags, filp);
+ validate_process_creds();
+ return err;
+}
+
+__be32
+nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
+ int may_flags, struct file **filp)
+{
+ __be32 err;
+
+ validate_process_creds();
+ err = __nfsd_open(rqstp, fhp, type, may_flags, filp);
validate_process_creds();
return err;
}
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index a877be59d5dd..b3beb896b08d 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -72,6 +72,8 @@ __be32 nfsd_commit(struct svc_rqst *, struct svc_fh *,
int nfsd_open_break_lease(struct inode *, int);
__be32 nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
int, struct file **);
+__be32 nfsd_open_verified(struct svc_rqst *, struct svc_fh *, umode_t,
+ int, struct file **);
struct raparms;
__be32 nfsd_splice_read(struct svc_rqst *,
struct file *, loff_t, unsigned long *);
--
2.4.3


2015-10-20 17:34:10

by Jeff Layton

Subject: [PATCH v6 12/19] nfsd: hook up nfsd_write to the new nfsd_file cache

Note that all callers currently pass in NULL for "file" anyway, so
there was already some dead code in here. Just eliminate that parameter
and have nfsd_write use the file cache instead of dealing directly
with a filp.

Signed-off-by: Jeff Layton <[email protected]>
---
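Editorial sketch, not part of the patch: the acquire/use/put calling
convention that nfsd_write switches to can be modelled in user space
as below. The one-slot "cache" and every name in it (struct
cached_file, slot, cache_acquire, cache_put, do_write) are
hypothetical simplifications; the point is only that repeated writes
reuse one open file and that each caller drops the reference it was
handed.

```c
#include <assert.h>
#include <stddef.h>

struct cached_file {
	int refcount;	/* references held by the cache and by callers */
	int opens;	/* how many times the backing file was opened */
	long written;	/* bytes "written" through this entry */
};

/* One-slot cache, just enough to show the calling convention. */
struct cached_file slot;

int cache_acquire(struct cached_file **cfp)
{
	if (slot.refcount == 0) {
		slot.opens++;		/* first use: open the file */
		slot.refcount = 1;	/* reference owned by the cache */
	}
	slot.refcount++;		/* reference handed to the caller */
	*cfp = &slot;
	return 0;
}

void cache_put(struct cached_file *cf)
{
	assert(cf->refcount > 1);	/* the cache keeps its own reference */
	cf->refcount--;
}

/* Shape of the converted write path: acquire, use, put. */
int do_write(size_t len)
{
	struct cached_file *cf;
	int err = cache_acquire(&cf);

	if (err == 0) {
		cf->written += (long)len;
		cache_put(cf);
	}
	return err;
}
```

Two calls to do_write() in this model open the file once, which is the
saving the cache is meant to provide over an open and close per
operation.
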
fs/nfsd/nfs3proc.c | 2 +-
fs/nfsd/nfsproc.c | 2 +-
fs/nfsd/vfs.c | 33 +++++++++++----------------------
fs/nfsd/vfs.h | 2 +-
4 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 7b755b7f785c..4e46ac511479 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -192,7 +192,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp, struct nfsd3_writeargs *argp,

fh_copy(&resp->fh, &argp->fh);
resp->committed = argp->stable;
- nfserr = nfsd_write(rqstp, &resp->fh, NULL,
+ nfserr = nfsd_write(rqstp, &resp->fh,
argp->offset,
rqstp->rq_vec, argp->vlen,
&cnt,
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 4cd78ef4c95c..9893095cbee1 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -213,7 +213,7 @@ nfsd_proc_write(struct svc_rqst *rqstp, struct nfsd_writeargs *argp,
SVCFH_fmt(&argp->fh),
argp->len, argp->offset);

- nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh), NULL,
+ nfserr = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh),
argp->offset,
rqstp->rq_vec, argp->vlen,
&cnt,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index cf4a2018d57a..cb1ab146246b 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -42,6 +42,7 @@

#include "nfsd.h"
#include "vfs.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_FILEOP

@@ -1030,30 +1031,18 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
* N.B. After this call fhp needs an fh_put
*/
__be32
-nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
- loff_t offset, struct kvec *vec, int vlen, unsigned long *cnt,
- int *stablep)
+nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
+ struct kvec *vec, int vlen, unsigned long *cnt, int *stablep)
{
- __be32 err = 0;
-
- if (file) {
- err = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
- NFSD_MAY_WRITE|NFSD_MAY_OWNER_OVERRIDE);
- if (err)
- goto out;
- err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen, cnt,
- stablep);
- } else {
- err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
- if (err)
- goto out;
-
- if (cnt)
- err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen,
- cnt, stablep);
- fput(file);
+ __be32 err;
+ struct nfsd_file *nf;
+
+ err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf);
+ if (err == nfs_ok) {
+ err = nfsd_vfs_write(rqstp, fhp, nf->nf_file, offset, vec,
+ vlen, cnt, stablep);
+ nfsd_file_put(nf);
}
-out:
return err;
}

diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index b3beb896b08d..80692e06302d 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -81,7 +81,7 @@ __be32 nfsd_readv(struct file *, loff_t, struct kvec *, int,
unsigned long *);
__be32 nfsd_read(struct svc_rqst *, struct svc_fh *,
loff_t, struct kvec *, int, unsigned long *);
-__be32 nfsd_write(struct svc_rqst *, struct svc_fh *,struct file *,
+__be32 nfsd_write(struct svc_rqst *, struct svc_fh *,
loff_t, struct kvec *,int, unsigned long *, int *);
__be32 nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct file *file, loff_t offset,
--
2.4.3


2015-10-20 17:34:11

by Jeff Layton

Subject: [PATCH v6 13/19] nfsd: hook up nfsd_read to the nfsd_file cache

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/vfs.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index cb1ab146246b..050db266ef80 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1008,20 +1008,15 @@ out_nfserr:
__be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
{
- struct file *file;
- struct raparms *ra;
- __be32 err;
-
- err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
- if (err)
- return err;
-
- ra = nfsd_init_raparms(file);
- err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
- if (ra)
- nfsd_put_raparams(file, ra);
- fput(file);
+ __be32 err;
+ struct nfsd_file *nf;

+ err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
+ if (err == nfs_ok) {
+ err = nfsd_vfs_read(rqstp, nf->nf_file, offset, vec, vlen,
+ count);
+ nfsd_file_put(nf);
+ }
return err;
}

--
2.4.3


2015-10-20 17:34:15

by Jeff Layton

Subject: [PATCH v6 18/19] nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache

Have nfs4_preprocess_stateid_op pass back an nfsd_file instead of a filp.
Since we now presume that the struct file will be persistent in most
cases, we can stop fiddling with the raparms in the read code. This
also means that we don't really care about the rd_tmp_file field
anymore.

Signed-off-by: Jeff Layton <[email protected]>
---
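Editorial sketch, not part of the patch: passing back a refcounted
object through an out-parameter changes who is responsible for the
final put. The user-space model below uses hypothetical names (struct
obj, obj_get, obj_put, lookup_checked) and a boolean in place of the
real permission check. On success the reference moves to the caller;
on failure it is dropped before returning, which is the shape of the
error path this patch adds to nfs4_check_file.

```c
#include <assert.h>
#include <stddef.h>

struct obj {
	int refcount;
};

struct obj *obj_get(struct obj *o)
{
	o->refcount++;
	return o;
}

void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	o->refcount--;
}

/*
 * On success, *out holds a reference that the caller must drop with
 * obj_put(). On failure, *out is NULL and no reference is leaked.
 */
int lookup_checked(struct obj *o, int permitted, struct obj **out)
{
	struct obj *ref;

	*out = NULL;
	ref = obj_get(o);
	if (!permitted) {
		obj_put(ref);	/* error path: drop the reference here */
		return -1;
	}
	*out = ref;		/* success: ownership moves to the caller */
	return 0;
}
```

Callers then follow the pattern seen throughout the diff: check the
status, use the object, and put it exactly once.
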
fs/nfsd/nfs4proc.c | 32 ++++++++++++++++----------------
fs/nfsd/nfs4state.c | 24 ++++++++++--------------
fs/nfsd/nfs4xdr.c | 16 +++++-----------
fs/nfsd/state.h | 2 +-
fs/nfsd/xdr4.h | 15 +++++++--------
5 files changed, 39 insertions(+), 50 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a9f096c7e99f..7e21763a35f2 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -758,7 +758,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
{
__be32 status;

- read->rd_filp = NULL;
+ read->rd_nf = NULL;
if (read->rd_offset >= OFFSET_MAX)
return nfserr_inval;

@@ -775,7 +775,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,

/* check stateid */
status = nfs4_preprocess_stateid_op(rqstp, cstate, &read->rd_stateid,
- RD_STATE, &read->rd_filp, &read->rd_tmp_file);
+ RD_STATE, &read->rd_nf);
if (status) {
dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
goto out;
@@ -921,7 +921,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,

if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
status = nfs4_preprocess_stateid_op(rqstp, cstate,
- &setattr->sa_stateid, WR_STATE, NULL, NULL);
+ &setattr->sa_stateid, WR_STATE, NULL);
if (status) {
dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
return status;
@@ -977,7 +977,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
struct nfsd4_write *write)
{
stateid_t *stateid = &write->wr_stateid;
- struct file *filp = NULL;
+ struct nfsd_file *nf = NULL;
__be32 status = nfs_ok;
unsigned long cnt;
int nvecs;
@@ -986,7 +986,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return nfserr_inval;

status = nfs4_preprocess_stateid_op(rqstp, cstate, stateid, WR_STATE,
- &filp, NULL);
+ &nf);
if (status) {
dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
return status;
@@ -999,10 +999,10 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
nvecs = fill_in_write_vector(rqstp->rq_vec, write);
WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec));

- status = nfsd_vfs_write(rqstp, &cstate->current_fh, filp,
+ status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf->nf_file,
write->wr_offset, rqstp->rq_vec, nvecs, &cnt,
&write->wr_how_written);
- fput(filp);
+ nfsd_file_put(nf);

write->wr_bytes_written = cnt;

@@ -1014,21 +1014,21 @@ nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
struct nfsd4_fallocate *fallocate, int flags)
{
__be32 status = nfserr_notsupp;
- struct file *file;
+ struct nfsd_file *nf;

status = nfs4_preprocess_stateid_op(rqstp, cstate,
&fallocate->falloc_stateid,
- WR_STATE, &file, NULL);
+ WR_STATE, &nf);
if (status != nfs_ok) {
dprintk("NFSD: nfsd4_fallocate: couldn't process stateid!\n");
return status;
}

- status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, file,
+ status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, nf->nf_file,
fallocate->falloc_offset,
fallocate->falloc_length,
flags);
- fput(file);
+ nfsd_file_put(nf);
return status;
}

@@ -1053,11 +1053,11 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
{
int whence;
__be32 status;
- struct file *file;
+ struct nfsd_file *nf;

status = nfs4_preprocess_stateid_op(rqstp, cstate,
&seek->seek_stateid,
- RD_STATE, &file, NULL);
+ RD_STATE, &nf);
if (status) {
dprintk("NFSD: nfsd4_seek: couldn't process stateid!\n");
return status;
@@ -1079,14 +1079,14 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
* Note: This call does change file->f_pos, but nothing in NFSD
* should ever file->f_pos.
*/
- seek->seek_pos = vfs_llseek(file, seek->seek_offset, whence);
+ seek->seek_pos = vfs_llseek(nf->nf_file, seek->seek_offset, whence);
if (seek->seek_pos < 0)
status = nfserrno(seek->seek_pos);
- else if (seek->seek_pos >= i_size_read(file_inode(file)))
+ else if (seek->seek_pos >= i_size_read(file_inode(nf->nf_file)))
seek->seek_eof = true;

out:
- fput(file);
+ nfsd_file_put(nf);
return status;
}

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 4d58a6db6e41..477fe34567d0 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4658,7 +4658,7 @@ nfs4_check_olstateid(struct svc_fh *fhp, struct nfs4_ol_stateid *ols, int flags)

static __be32
nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
- struct file **filpp, bool *tmp_file, int flags)
+ struct nfsd_file **nfp, int flags)
{
int acc = (flags & RD_STATE) ? NFSD_MAY_READ : NFSD_MAY_WRITE;
struct nfsd_file *nf;
@@ -4668,20 +4668,18 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
if (nf) {
status = nfsd_permission(rqstp, fhp->fh_export, fhp->fh_dentry,
acc | NFSD_MAY_OWNER_OVERRIDE);
- if (status)
+ if (status) {
+ nfsd_file_put(nf);
goto out;
+ }
} else {
status = nfsd_file_acquire(rqstp, fhp, acc, &nf);
if (status)
return status;
-
- if (tmp_file)
- *tmp_file = true;
}

- *filpp = get_file(nf->nf_file);
+ *nfp = nf;
out:
- nfsd_file_put(nf);
return status;
}

@@ -4691,7 +4689,7 @@ out:
__be32
nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate, stateid_t *stateid,
- int flags, struct file **filpp, bool *tmp_file)
+ int flags, struct nfsd_file **nfp)
{
struct svc_fh *fhp = &cstate->current_fh;
struct inode *ino = d_inode(fhp->fh_dentry);
@@ -4700,10 +4698,8 @@ nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
struct nfs4_stid *s = NULL;
__be32 status;

- if (filpp)
- *filpp = NULL;
- if (tmp_file)
- *tmp_file = false;
+ if (nfp)
+ *nfp = NULL;

if (grace_disallows_io(net, ino))
return nfserr_grace;
@@ -4740,8 +4736,8 @@ nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
status = nfs4_check_fh(fhp, s);

done:
- if (!status && filpp)
- status = nfs4_check_file(rqstp, fhp, s, filpp, tmp_file, flags);
+ if (status == nfs_ok && nfp)
+ status = nfs4_check_file(rqstp, fhp, s, nfp, flags);
out:
if (s)
nfs4_put_stid(s);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 325521ce389a..92e5e8f884d0 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -49,6 +49,7 @@
#include "cache.h"
#include "netns.h"
#include "pnfs.h"
+#include "filecache.h"

#ifdef CONFIG_NFSD_V4_SECURITY_LABEL
#include <linux/security.h>
@@ -3460,14 +3461,14 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
{
unsigned long maxcount;
struct xdr_stream *xdr = &resp->xdr;
- struct file *file = read->rd_filp;
+ struct file *file;
int starting_len = xdr->buf->len;
- struct raparms *ra = NULL;
__be32 *p;

if (nfserr)
goto out;

+ file = read->rd_nf->nf_file;
p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
if (!p) {
WARN_ON_ONCE(test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags));
@@ -3487,24 +3488,17 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
(xdr->buf->buflen - xdr->buf->len));
maxcount = min_t(unsigned long, maxcount, read->rd_length);

- if (read->rd_tmp_file)
- ra = nfsd_init_raparms(file);
-
if (file->f_op->splice_read &&
test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags))
nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount);
else
nfserr = nfsd4_encode_readv(resp, read, file, maxcount);

- if (ra)
- nfsd_put_raparams(file, ra);
-
if (nfserr)
xdr_truncate_encode(xdr, starting_len);
-
out:
- if (file)
- fput(file);
+ if (read->rd_nf)
+ nfsd_file_put(read->rd_nf);
return nfserr;
}

diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index ee23de10663c..732fbb3b5ef1 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -579,7 +579,7 @@ struct nfsd_net;

extern __be32 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate, stateid_t *stateid,
- int flags, struct file **filp, bool *tmp_file);
+ int flags, struct nfsd_file **filp);
__be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
stateid_t *stateid, unsigned char typemask,
struct nfs4_stid **s, struct nfsd_net *nn);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index ce7362c88b48..c167d7a5b0e6 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -268,15 +268,14 @@ struct nfsd4_open_downgrade {


struct nfsd4_read {
- stateid_t rd_stateid; /* request */
- u64 rd_offset; /* request */
- u32 rd_length; /* request */
- int rd_vlen;
- struct file *rd_filp;
- bool rd_tmp_file;
+ stateid_t rd_stateid; /* request */
+ u64 rd_offset; /* request */
+ u32 rd_length; /* request */
+ int rd_vlen;
+ struct nfsd_file *rd_nf;

- struct svc_rqst *rd_rqstp; /* response */
- struct svc_fh * rd_fhp; /* response */
+ struct svc_rqst *rd_rqstp; /* response */
+ struct svc_fh *rd_fhp; /* response */
};

struct nfsd4_readdir {
--
2.4.3


2015-10-20 17:34:17

by Jeff Layton

Subject: [PATCH v6 19/19] nfsd: rip out the raparms cache

Nothing uses it anymore.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfssvc.c | 14 +-----
fs/nfsd/vfs.c | 147 -------------------------------------------------------
fs/nfsd/vfs.h | 6 ---
3 files changed, 1 insertion(+), 166 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index d816bb3faa6e..d1034d119afb 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -216,18 +216,9 @@ static int nfsd_startup_generic(int nrservs)
if (nfsd_users++)
return 0;

- /*
- * Readahead param cache - will no-op if it already exists.
- * (Note therefore results will be suboptimal if number of
- * threads is modified after nfsd start.)
- */
- ret = nfsd_racache_init(2*nrservs);
- if (ret)
- goto dec_users;
-
ret = nfsd_file_cache_init();
if (ret)
- goto out_racache;
+ goto dec_users;

ret = nfs4_state_start();
if (ret)
@@ -236,8 +227,6 @@ static int nfsd_startup_generic(int nrservs)

out_file_cache:
nfsd_file_cache_shutdown();
-out_racache:
- nfsd_racache_shutdown();
dec_users:
nfsd_users--;
return ret;
@@ -250,7 +239,6 @@ static void nfsd_shutdown_generic(void)

nfs4_state_shutdown();
nfsd_file_cache_shutdown();
- nfsd_racache_shutdown();
}

static bool nfsd_needs_lockd(void)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 571f1000e670..aafb9b00b767 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -46,34 +46,6 @@

#define NFSDDBG_FACILITY NFSDDBG_FILEOP

-
-/*
- * This is a cache of readahead params that help us choose the proper
- * readahead strategy. Initially, we set all readahead parameters to 0
- * and let the VFS handle things.
- * If you increase the number of cached files very much, you'll need to
- * add a hash table here.
- */
-struct raparms {
- struct raparms *p_next;
- unsigned int p_count;
- ino_t p_ino;
- dev_t p_dev;
- int p_set;
- struct file_ra_state p_ra;
- unsigned int p_hindex;
-};
-
-struct raparm_hbucket {
- struct raparms *pb_head;
- spinlock_t pb_lock;
-} ____cacheline_aligned_in_smp;
-
-#define RAPARM_HASH_BITS 4
-#define RAPARM_HASH_SIZE (1<<RAPARM_HASH_BITS)
-#define RAPARM_HASH_MASK (RAPARM_HASH_SIZE-1)
-static struct raparm_hbucket raparm_hash[RAPARM_HASH_SIZE];
-
/*
* Called from nfsd_lookup and encode_dirent. Check if we have crossed
* a mount point.
@@ -749,65 +721,6 @@ nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
return err;
}

-struct raparms *
-nfsd_init_raparms(struct file *file)
-{
- struct inode *inode = file_inode(file);
- dev_t dev = inode->i_sb->s_dev;
- ino_t ino = inode->i_ino;
- struct raparms *ra, **rap, **frap = NULL;
- int depth = 0;
- unsigned int hash;
- struct raparm_hbucket *rab;
-
- hash = jhash_2words(dev, ino, 0xfeedbeef) & RAPARM_HASH_MASK;
- rab = &raparm_hash[hash];
-
- spin_lock(&rab->pb_lock);
- for (rap = &rab->pb_head; (ra = *rap); rap = &ra->p_next) {
- if (ra->p_ino == ino && ra->p_dev == dev)
- goto found;
- depth++;
- if (ra->p_count == 0)
- frap = rap;
- }
- depth = nfsdstats.ra_size;
- if (!frap) {
- spin_unlock(&rab->pb_lock);
- return NULL;
- }
- rap = frap;
- ra = *frap;
- ra->p_dev = dev;
- ra->p_ino = ino;
- ra->p_set = 0;
- ra->p_hindex = hash;
-found:
- if (rap != &rab->pb_head) {
- *rap = ra->p_next;
- ra->p_next = rab->pb_head;
- rab->pb_head = ra;
- }
- ra->p_count++;
- nfsdstats.ra_depth[depth*10/nfsdstats.ra_size]++;
- spin_unlock(&rab->pb_lock);
-
- if (ra->p_set)
- file->f_ra = ra->p_ra;
- return ra;
-}
-
-void nfsd_put_raparams(struct file *file, struct raparms *ra)
-{
- struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
-
- spin_lock(&rab->pb_lock);
- ra->p_ra = file->f_ra;
- ra->p_set = 1;
- ra->p_count--;
- spin_unlock(&rab->pb_lock);
-}
-
/*
* Grab and keep cached pages associated with a file in the svc_rqst
* so that they can be passed to the network sendmsg/sendpage routines
@@ -2017,63 +1930,3 @@ nfsd_permission(struct svc_rqst *rqstp, struct svc_export *exp,

return err? nfserrno(err) : 0;
}
-
-void
-nfsd_racache_shutdown(void)
-{
- struct raparms *raparm, *last_raparm;
- unsigned int i;
-
- dprintk("nfsd: freeing readahead buffers.\n");
-
- for (i = 0; i < RAPARM_HASH_SIZE; i++) {
- raparm = raparm_hash[i].pb_head;
- while(raparm) {
- last_raparm = raparm;
- raparm = raparm->p_next;
- kfree(last_raparm);
- }
- raparm_hash[i].pb_head = NULL;
- }
-}
-/*
- * Initialize readahead param cache
- */
-int
-nfsd_racache_init(int cache_size)
-{
- int i;
- int j = 0;
- int nperbucket;
- struct raparms **raparm = NULL;
-
-
- if (raparm_hash[0].pb_head)
- return 0;
- nperbucket = DIV_ROUND_UP(cache_size, RAPARM_HASH_SIZE);
- nperbucket = max(2, nperbucket);
- cache_size = nperbucket * RAPARM_HASH_SIZE;
-
- dprintk("nfsd: allocating %d readahead buffers.\n", cache_size);
-
- for (i = 0; i < RAPARM_HASH_SIZE; i++) {
- spin_lock_init(&raparm_hash[i].pb_lock);
-
- raparm = &raparm_hash[i].pb_head;
- for (j = 0; j < nperbucket; j++) {
- *raparm = kzalloc(sizeof(struct raparms), GFP_KERNEL);
- if (!*raparm)
- goto out_nomem;
- raparm = &(*raparm)->p_next;
- }
- *raparm = NULL;
- }
-
- nfsdstats.ra_size = cache_size;
- return 0;
-
-out_nomem:
- dprintk("nfsd: kmalloc failed, freeing readahead buffers\n");
- nfsd_racache_shutdown();
- return -ENOMEM;
-}
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 80692e06302d..1efccde4bf9a 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -39,8 +39,6 @@
typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned);

/* nfsd/vfs.c */
-int nfsd_racache_init(int);
-void nfsd_racache_shutdown(void);
int nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
struct svc_export **expp);
__be32 nfsd_lookup(struct svc_rqst *, struct svc_fh *,
@@ -74,7 +72,6 @@ __be32 nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
int, struct file **);
__be32 nfsd_open_verified(struct svc_rqst *, struct svc_fh *, umode_t,
int, struct file **);
-struct raparms;
__be32 nfsd_splice_read(struct svc_rqst *,
struct file *, loff_t, unsigned long *);
__be32 nfsd_readv(struct file *, loff_t, struct kvec *, int,
@@ -107,9 +104,6 @@ __be32 nfsd_statfs(struct svc_rqst *, struct svc_fh *,
__be32 nfsd_permission(struct svc_rqst *, struct svc_export *,
struct dentry *, int);

-struct raparms *nfsd_init_raparms(struct file *file);
-void nfsd_put_raparams(struct file *file, struct raparms *ra);
-
static inline int fh_want_write(struct svc_fh *fh)
{
int ret = mnt_want_write(fh->fh_export->ex_path.mnt);
--
2.4.3


2015-10-20 17:34:12

by Jeff Layton

Subject: [PATCH v6 15/19] nfsd: convert nfs4_file->fi_fds array to use nfsd_files

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 23 ++++++++++++-----------
fs/nfsd/state.h | 2 +-
2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 2b73e2885a82..b72fa6816860 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -49,6 +49,7 @@

#include "netns.h"
#include "pnfs.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_PROC

@@ -292,7 +293,7 @@ static struct file *
__nfs4_get_fd(struct nfs4_file *f, int oflag)
{
if (f->fi_fds[oflag])
- return get_file(f->fi_fds[oflag]);
+ return get_file(f->fi_fds[oflag]->nf_file);
return NULL;
}

@@ -449,17 +450,17 @@ static void __nfs4_file_put_access(struct nfs4_file *fp, int oflag)
might_lock(&fp->fi_lock);

if (atomic_dec_and_lock(&fp->fi_access[oflag], &fp->fi_lock)) {
- struct file *f1 = NULL;
- struct file *f2 = NULL;
+ struct nfsd_file *f1 = NULL;
+ struct nfsd_file *f2 = NULL;

swap(f1, fp->fi_fds[oflag]);
if (atomic_read(&fp->fi_access[1 - oflag]) == 0)
swap(f2, fp->fi_fds[O_RDWR]);
spin_unlock(&fp->fi_lock);
if (f1)
- fput(f1);
+ nfsd_file_put(f1);
if (f2)
- fput(f2);
+ nfsd_file_put(f2);
}
}

@@ -3841,7 +3842,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp,
struct nfsd4_open *open)
{
- struct file *filp = NULL;
+ struct nfsd_file *nf = NULL;
__be32 status;
int oflag = nfs4_access_to_omode(open->op_share_access);
int access = nfs4_access_to_access(open->op_share_access);
@@ -3877,18 +3878,18 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,

if (!fp->fi_fds[oflag]) {
spin_unlock(&fp->fi_lock);
- status = nfsd_open(rqstp, cur_fh, S_IFREG, access, &filp);
+ status = nfsd_file_acquire(rqstp, cur_fh, access, &nf);
if (status)
goto out_put_access;
spin_lock(&fp->fi_lock);
if (!fp->fi_fds[oflag]) {
- fp->fi_fds[oflag] = filp;
- filp = NULL;
+ fp->fi_fds[oflag] = nf;
+ nf = NULL;
}
}
spin_unlock(&fp->fi_lock);
- if (filp)
- fput(filp);
+ if (nf)
+ nfsd_file_put(nf);

status = nfsd4_truncate(rqstp, cur_fh, open);
if (status)
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 77fdf4de91ba..473faa436e07 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -491,7 +491,7 @@ struct nfs4_file {
};
struct list_head fi_clnt_odstate;
/* One each for O_RDONLY, O_WRONLY, O_RDWR: */
- struct file * fi_fds[3];
+ struct nfsd_file *fi_fds[3];
/*
* Each open or lock stateid contributes 0-4 to the counts
* below depending on which bits are set in st_access_bitmap:
--
2.4.3


2015-10-20 17:34:11

by Jeff Layton

Subject: [PATCH v6 14/19] nfsd: hook nfsd_commit up to the nfsd_file cache

Use cached filps if possible instead of opening a new one every time.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/vfs.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 050db266ef80..571f1000e670 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1055,9 +1055,9 @@ __be32
nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
loff_t offset, unsigned long count)
{
- struct file *file;
- loff_t end = LLONG_MAX;
- __be32 err = nfserr_inval;
+ struct nfsd_file *nf;
+ loff_t end = LLONG_MAX;
+ __be32 err = nfserr_inval;

if (offset < 0)
goto out;
@@ -1067,12 +1067,12 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
goto out;
}

- err = nfsd_open(rqstp, fhp, S_IFREG,
- NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &file);
+ err = nfsd_file_acquire(rqstp, fhp,
+ NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf);
if (err)
goto out;
if (EX_ISSYNC(fhp->fh_export)) {
- int err2 = vfs_fsync_range(file, offset, end, 0);
+ int err2 = vfs_fsync_range(nf->nf_file, offset, end, 0);

if (err2 != -EINVAL)
err = nfserrno(err2);
@@ -1080,7 +1080,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
err = nfserr_notsupp;
}

- fput(file);
+ nfsd_file_put(nf);
out:
return err;
}
--
2.4.3


2015-10-21 15:51:08

by J. Bruce Fields

Subject: Re: [PATCH v6 01/19] nfsd: move include of state.h from trace.c to trace.h

On Tue, Oct 20, 2015 at 01:33:34PM -0400, Jeff Layton wrote:
> Any file which includes trace.h will need to include state.h, even if
> they aren't using any state tracepoints. Ensure that we include any
> headers that might be needed in trace.h instead of relying on the
> *.c files to have the right ones.

Applied, thanks.

--b.

>
> Signed-off-by: Jeff Layton <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>
> ---
> fs/nfsd/trace.c | 2 --
> fs/nfsd/trace.h | 2 ++
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfsd/trace.c b/fs/nfsd/trace.c
> index 82f89070594c..90967466a1e5 100644
> --- a/fs/nfsd/trace.c
> +++ b/fs/nfsd/trace.c
> @@ -1,5 +1,3 @@
>
> -#include "state.h"
> -
> #define CREATE_TRACE_POINTS
> #include "trace.h"
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index c668520c344b..0befe762762b 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -9,6 +9,8 @@
>
> #include <linux/tracepoint.h>
>
> +#include "state.h"
> +
> DECLARE_EVENT_CLASS(nfsd_stateid_class,
> TP_PROTO(stateid_t *stp),
> TP_ARGS(stp),
> --
> 2.4.3

2015-10-21 15:57:42

by J. Bruce Fields

Subject: Re: [PATCH v6 00/19] nfsd: open file caching

I haven't reviewed this in detail yet, sorry.

I seem to recall that a filehandle cache was something Christoph (cc'd)
lobbied for for a long time, but I forget the issue--did getting rid of
the raparms cache allow some sort of further vfs cleanup?

But I don't know that this is justifiable just as cleanup, so some
understanding of the performance impact would be useful too.

--b.

On Tue, Oct 20, 2015 at 01:33:33PM -0400, Jeff Layton wrote:
> v6:
> - rename delayed_fput infrastructure to global_fput
> - drop list_lru_rotate patch, rework LRU list code to use LRU_ROTATE
> and a new NFSD_FILE_REFERENCED flag
> - rework fsnotify_mark handling to be done with separate allocation
> - bugfixes
>
> v5:
> - switch to using flush_delayed_fput instead of __fput_sync
> - hash on inode->i_ino instead of inode pointer
> - add /proc/fs/nfsd/file_cache_stats file to track stats on the hash
> - eliminate extra fh_verify in nfsd_file_acquire
>
> v4:
> - squash some of the patches down into one patch to reduce churn
> - close cached open files after unlink instead of before
> - don't just close files after nfsd does an unlink, must do it
> after any vfs-layer unlink. Use fsnotify to handle that.
> - use a SRCU notifier chain for setlease
> - add patch to allow non-kthreads to do a fput_sync
>
> v3:
> - open files are now hashed on inode pointer instead of fh
> - eliminate the recurring workqueue job in favor of shrinker/LRU and
> notifier from lease setting code
> - have nfsv4 use the cache as well
> - removal of raparms cache
>
> v2:
> - changelog cleanups and clarifications
> - allow COMMIT to use cached open files
> - tracepoints for nfsd_file cache
> - proactively close open files prior to REMOVE, or a RENAME over a
> positive dentry
>
> This is the sixth posting of this patchset. For those just tuning in
> now, the basic idea here is to allow nfsd to keep a cache of open
> file descriptions, primarily to help NFSv3 workloads but also to clean
> up some nastiness in the NFSv4 file handling.
>
> Of course, we can't keep files open indefinitely, so much of the new
> infrastructure is geared toward allowing the files to be closed on
> demand, in the following cases:
>
> - nfsd_files are released whenever the inode's link count goes to zero
> - they are released when userspace wants to set a lease on the inode
> - when a shrinker callback occurs for the nfsd_file cache
> - when filesystems are unexported or nfsd is shut down, we clean
> out the cache
>
> This allows nfsd to keep the file open basically indefinitely. The
> tricky part is the setlease case. To handle that properly we have to
> allow userland processes to queue the fputs involved to a workqueue.
> This means allowing non-kthreads to queue fputs to the delayed_fput
> infrastructure.
>
> Al, can you look and weigh in? Do the delayed_fput/global_fput
> changes look reasonable?
>
> Jeff Layton (19):
> nfsd: move include of state.h from trace.c to trace.h
> fs: have flush_delayed_fput flush the workqueue job
> fs: add a kerneldoc header to fput
> fs: rename "delayed_fput" infrastructure to "fput_global"
> fs: add fput_global
> fsnotify: export several symbols
> locks: create a new notifier chain for lease attempts
> sunrpc: add a new cache_detail operation for when a cache is flushed
> nfsd: add a new struct file caching facility to nfsd
> nfsd: keep some rudimentary stats on nfsd_file cache
> nfsd: allow filecache open to skip fh_verify check
> nfsd: hook up nfsd_write to the new nfsd_file cache
> nfsd: hook up nfsd_read to the nfsd_file cache
> nfsd: hook nfsd_commit up to the nfsd_file cache
> nfsd: convert nfs4_file->fi_fds array to use nfsd_files
> nfsd: have nfsd_test_lock use the nfsd_file cache
> nfsd: convert fi_deleg_file and ls_file fields to nfsd_file
> nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
> nfsd: rip out the raparms cache
>
> fs/file_table.c | 94 ++++--
> fs/locks.c | 37 +++
> fs/nfsd/Kconfig | 2 +
> fs/nfsd/Makefile | 3 +-
> fs/nfsd/export.c | 14 +
> fs/nfsd/filecache.c | 712 +++++++++++++++++++++++++++++++++++++++++++
> fs/nfsd/filecache.h | 44 +++
> fs/nfsd/nfs3proc.c | 2 +-
> fs/nfsd/nfs4layouts.c | 12 +-
> fs/nfsd/nfs4proc.c | 32 +-
> fs/nfsd/nfs4state.c | 174 +++++------
> fs/nfsd/nfs4xdr.c | 16 +-
> fs/nfsd/nfsctl.c | 10 +
> fs/nfsd/nfsproc.c | 2 +-
> fs/nfsd/nfssvc.c | 16 +-
> fs/nfsd/state.h | 10 +-
> fs/nfsd/trace.c | 2 -
> fs/nfsd/trace.h | 137 +++++++++
> fs/nfsd/vfs.c | 269 ++++------------
> fs/nfsd/vfs.h | 11 +-
> fs/nfsd/xdr4.h | 15 +-
> fs/notify/group.c | 2 +
> fs/notify/inode_mark.c | 1 +
> fs/notify/mark.c | 4 +
> include/linux/file.h | 3 +-
> include/linux/fs.h | 1 +
> include/linux/sunrpc/cache.h | 1 +
> init/main.c | 2 +-
> net/sunrpc/cache.c | 3 +
> 29 files changed, 1249 insertions(+), 382 deletions(-)
> create mode 100644 fs/nfsd/filecache.c
> create mode 100644 fs/nfsd/filecache.h
>
> --
> 2.4.3

2015-10-22 21:19:30

by J. Bruce Fields

Subject: Re: [PATCH v6 00/19] nfsd: open file caching

Looks like there's a leak--is this something you've seen already?

This is on my current nfsd-next, which has some other stuff too.

--b.

[ 819.980697] kmem_cache_destroy nfsd_file_mark: Slab cache still has objects
[ 819.981326] CPU: 0 PID: 4360 Comm: nfsd Not tainted 4.3.0-rc3-00040-ga6bca98 #360
[ 819.981969] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
[ 819.982805] ffff8800738d7d30 ffff8800738d7d20 ffffffff816053ac ffff880051ee5540
[ 819.983803] ffff8800738d7d58 ffffffff811813df ffff8800738d7d30 ffff8800738d7d30
[ 819.984782] ffff880074fd5e00 ffffffff822f9c80 ffff88007c64cf80 ffff8800738d7d68
[ 819.985751] Call Trace:
[ 819.985940] [<ffffffff816053ac>] dump_stack+0x4e/0x82
[ 819.986369] [<ffffffff811813df>] kmem_cache_destroy+0xef/0x100
[ 819.986899] [<ffffffffa00c2198>] nfsd_file_cache_shutdown+0x78/0xa0 [nfsd]
[ 819.987513] [<ffffffffa00b2c4d>] nfsd_shutdown_generic+0x1d/0x20 [nfsd]
[ 819.988100] [<ffffffffa00b2d2d>] nfsd_shutdown_net+0xdd/0x180 [nfsd]
[ 819.988656] [<ffffffffa00b2c55>] ? nfsd_shutdown_net+0x5/0x180 [nfsd]
[ 819.989218] [<ffffffffa00b2f34>] nfsd_last_thread+0x164/0x190 [nfsd]
[ 819.989770] [<ffffffffa00b2dd5>] ? nfsd_last_thread+0x5/0x190 [nfsd]
[ 819.990328] [<ffffffffa001463e>] svc_shutdown_net+0x2e/0x40 [sunrpc]
[ 819.990996] [<ffffffffa00b3936>] nfsd_destroy+0xd6/0x190 [nfsd]
[ 819.991719] [<ffffffffa00b3865>] ? nfsd_destroy+0x5/0x190 [nfsd]
[ 819.992373] [<ffffffffa00b3bb1>] nfsd+0x1c1/0x280 [nfsd]
[ 819.992960] [<ffffffffa00b39f5>] ? nfsd+0x5/0x280 [nfsd]
[ 819.993537] [<ffffffffa00b39f0>] ? nfsd_destroy+0x190/0x190 [nfsd]
[ 819.994195] [<ffffffff81098d6f>] kthread+0xef/0x110
[ 819.994734] [<ffffffff81a7677c>] ? _raw_spin_unlock_irq+0x2c/0x50
[ 819.995439] [<ffffffff81098c80>] ? kthread_create_on_node+0x200/0x200
[ 819.996129] [<ffffffff81a7744f>] ret_from_fork+0x3f/0x70
[ 819.996706] [<ffffffff81098c80>] ? kthread_create_on_node+0x200/0x200
[ 819.998854] nfsd: last server has exited, flushing export cache
[ 820.195957] NFSD: starting 20-second grace period (net ffffffff822f9c80)


2015-10-23 00:21:13

by Jeff Layton

Subject: Re: [PATCH v6 00/19] nfsd: open file caching

On Thu, 22 Oct 2015 17:19:28 -0400
"J. Bruce Fields" <[email protected]> wrote:

> Looks like there's a leak--is this something you've seen already?
>
> This is on my current nfsd-next, which has some other stuff too.
>
> --b.
>
> [ 819.980697] kmem_cache_destroy nfsd_file_mark: Slab cache still has objects
> [ 819.981326] CPU: 0 PID: 4360 Comm: nfsd Not tainted 4.3.0-rc3-00040-ga6bca98 #360
> [ 819.981969] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
> [ 819.982805] ffff8800738d7d30 ffff8800738d7d20 ffffffff816053ac ffff880051ee5540
> [ 819.983803] ffff8800738d7d58 ffffffff811813df ffff8800738d7d30 ffff8800738d7d30
> [ 819.984782] ffff880074fd5e00 ffffffff822f9c80 ffff88007c64cf80 ffff8800738d7d68
> [ 819.985751] Call Trace:
> [ 819.985940] [<ffffffff816053ac>] dump_stack+0x4e/0x82
> [ 819.986369] [<ffffffff811813df>] kmem_cache_destroy+0xef/0x100
> [ 819.986899] [<ffffffffa00c2198>] nfsd_file_cache_shutdown+0x78/0xa0 [nfsd]
> [ 819.987513] [<ffffffffa00b2c4d>] nfsd_shutdown_generic+0x1d/0x20 [nfsd]
> [ 819.988100] [<ffffffffa00b2d2d>] nfsd_shutdown_net+0xdd/0x180 [nfsd]
> [ 819.988656] [<ffffffffa00b2c55>] ? nfsd_shutdown_net+0x5/0x180 [nfsd]
> [ 819.989218] [<ffffffffa00b2f34>] nfsd_last_thread+0x164/0x190 [nfsd]
> [ 819.989770] [<ffffffffa00b2dd5>] ? nfsd_last_thread+0x5/0x190 [nfsd]
> [ 819.990328] [<ffffffffa001463e>] svc_shutdown_net+0x2e/0x40 [sunrpc]
> [ 819.990996] [<ffffffffa00b3936>] nfsd_destroy+0xd6/0x190 [nfsd]
> [ 819.991719] [<ffffffffa00b3865>] ? nfsd_destroy+0x5/0x190 [nfsd]
> [ 819.992373] [<ffffffffa00b3bb1>] nfsd+0x1c1/0x280 [nfsd]
> [ 819.992960] [<ffffffffa00b39f5>] ? nfsd+0x5/0x280 [nfsd]
> [ 819.993537] [<ffffffffa00b39f0>] ? nfsd_destroy+0x190/0x190 [nfsd]
> [ 819.994195] [<ffffffff81098d6f>] kthread+0xef/0x110
> [ 819.994734] [<ffffffff81a7677c>] ? _raw_spin_unlock_irq+0x2c/0x50
> [ 819.995439] [<ffffffff81098c80>] ? kthread_create_on_node+0x200/0x200
> [ 819.996129] [<ffffffff81a7744f>] ret_from_fork+0x3f/0x70
> [ 819.996706] [<ffffffff81098c80>] ? kthread_create_on_node+0x200/0x200
> [ 819.998854] nfsd: last server has exited, flushing export cache
> [ 820.195957] NFSD: starting 20-second grace period (net ffffffff822f9c80)
>

Thanks...interesting.

I'll go over the refcounting again to be sure but I suspect that this
might be a race between tearing down the cache and destruction of the
fsnotify marks.

fsnotify marks are destroyed by a dedicated thread that cleans them up
after the srcu grace period settles. That's a bit of a flimsy guarantee
unfortunately. We can use srcu_barrier(), but if the thread hasn't
picked up the list and started destroying yet then that may not help.

I'll look over that code -- maybe it's possible to use call_srcu
instead, which would allow us to use srcu_barrier to wait for them all
to complete.

Thanks!
--
Jeff Layton <[email protected]>
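
For readers following along, the suspected race can be illustrated with a
small userspace model. This is only a sketch under stated assumptions: it is
not the fsnotify code, and every name in it (mark_model, destroy_via_list,
destroy_via_callback, model_barrier, reaper_run, live_at_teardown) is
hypothetical. It assumes that a barrier waits only for callbacks that have
already been queued, so marks still parked on a reaper's list are invisible
to it, while marks queued as individual callbacks (the call_srcu-style
approach) are drained before the barrier returns.

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct mark_model {
	int live;	/* marks still allocated from the slab cache */
	int on_list;	/* marks parked on the reaper's list, not yet picked up */
	int queued;	/* marks whose free callback is already queued */
};

/* Scheme A: hand the mark to a reaper thread via a list. */
static void destroy_via_list(struct mark_model *m)
{
	m->on_list++;
}

/* Scheme B: queue a per-mark callback directly (call_srcu analogue). */
static void destroy_via_callback(struct mark_model *m)
{
	m->queued++;
}

/* Barrier analogue: waits only for callbacks that are already queued. */
static void model_barrier(struct mark_model *m)
{
	m->live -= m->queued;
	m->queued = 0;
}

/* The reaper wakes up and frees whatever is currently on its list. */
static void reaper_run(struct mark_model *m)
{
	m->live -= m->on_list;
	m->on_list = 0;
}

/*
 * Returns how many marks are still allocated at cache teardown, i.e.
 * right after the barrier returns. reaper_ran says whether the reaper
 * happened to process its list before the barrier was issued.
 */
static int live_at_teardown(int nmarks, int use_callback, int reaper_ran)
{
	struct mark_model m = { .live = nmarks, .on_list = 0, .queued = 0 };
	int i;

	assert(nmarks >= 0);

	for (i = 0; i < nmarks; i++) {
		if (use_callback)
			destroy_via_callback(&m);
		else
			destroy_via_list(&m);
	}
	if (reaper_ran)
		reaper_run(&m);
	model_barrier(&m);
	return m.live;
}
```

In this model, live_at_teardown(n, 0, 0) leaves all n marks allocated when
the cache is destroyed, which would correspond to the "Slab cache still has
objects" warning, whereas the callback scheme leaves none regardless of
what the reaper has done. Whether the real leak follows this pattern is
still to be confirmed.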