2015-09-07 14:01:13

by Kinglong Mee

Subject: [PATCH 0/6 v10] NFSD: Pin to vfsmount for nfsd exports cache

If there are mount points (not exported over NFS) under the pseudo root,
then after a client has accessed those entries under the root, they
*cannot* be unmounted until the export cache entries expire.

# cat /etc/exports
/nfs/xfs *(rw,insecure,no_subtree_check,no_root_squash)
/nfs/pnfs *(rw,insecure,no_subtree_check,no_root_squash)
# ll /nfs/
total 0
drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
drwxr-xr-x. 2 root root 6 Apr 20 22:01 xfs
# mount /dev/sde /nfs/test
# df
Filesystem 1K-blocks Used Available Use% Mounted on
......
/dev/sdd 1038336 32944 1005392 4% /nfs/pnfs
/dev/sdc 10475520 32928 10442592 1% /nfs/xfs
/dev/sde 999320 1284 929224 1% /nfs/test
# mount -t nfs 127.0.0.1:/nfs/ /mnt
# ll /mnt/*/
/mnt/pnfs/:
total 0
-rw-r--r--. 1 root root 0 Apr 21 22:23 attr
drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp

/mnt/xfs/:
total 0
# umount /nfs/test/
umount: /nfs/test/: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)

This happens because nfsd's exports cache holds a reference to
the path (here /nfs/test/), so it cannot be unmounted.

I don't think that is what users expect; they want to be able to umount
/nfs/test/. Bruce thinks users should also be able to umount /nfs/pnfs/
and /nfs/xfs.

This patch set makes nfsd exports pin to the vfsmount instead of
using mntget, so users can now umount any export mount point.

v3,
1. New helpers path_get_pin/path_put_unpin for path pin.
2. Use kzalloc for allocating memory.

v4, Thanks to Al Viro for his comments on the logic of fs_pin.
1. add a completion so that pin_kill can wait for the reference count to drop to zero.
2. add a work_struct so that pin_kill drops the reference indirectly.
3. free svc_export/svc_expkey in pin_kill, not in svc_export_put/svc_expkey_put.
4. svc_export_put/svc_expkey_put go through the pin_kill logic.

v5,
kill the fs_pin while holding a reference to the vfsmnt.

v6,
1. revert the change of v5
2. new helper legitimize_mntget() so that the nfsd exports/expkey cache
can get a vfsmount from an fs_pin
3. clean up some of the sunrpc cache code
4. switch cache_head in cache_detail from a singly linked list
to list_head
5. new validate/invalidate functions for handling reference count
increases/decreases (nfsd exports/expkey use them to grab the
reference to mnt)
6. delete cache_head directly from cache_detail in pin_kill

v7,
implement self-managed reference increase and decrease for nfsd exports/expkey

When the reference count of a cache_head increases (>1), grab a reference to mnt once,
and when it decreases to 1 (==1), drop the reference to mnt.

v8, Use hash_list for the sunrpc cache and a new method for nfsd's pin,

1. There is only one outlet from each cache: exp_find_key() for expkey,
exp_get_by_name() for export.
2. Any fsid-to-export or filehandle-to-export lookup calls one of these functions.
3. exp_get()/exp_put() increase/decrease the reference count of the export.

Call legitimize_mntget() in the only outlet functions exp_find_key()/
exp_get_by_name(); if it fails, return STALE. Otherwise, any valid
expkey/export from the cache is validated (it holds a reference to the vfsmnt).

Add mntget() in exp_get() and mntput() in exp_put(), because the exports
passed to exp_get/exp_put are returned from exp_find_key/exp_get_by_name.

For the expkey cache,
1. First, an fsid is passed to exp_find_key, which looks up a cache entry
in svc_expkey_lookup; on success, ekey->ek_path is pinned to the mount.
2. Then legitimize_mntget is called to get a reference to the vfsmnt
before returning from exp_find_key.
3. Any caller of exp_find_key that gets a valid cache entry must put the vfsmnt.

For the export cache,
1. First, a path (returned from exp_find_key) with a validated vfsmnt
is passed to exp_get_by_name; on success, exp->ex_path is pinned to the mount.
2. Then legitimize_mntget is called to get a reference to the vfsmnt
before returning from exp_get_by_name.
3. Any caller of exp_get_by_name that gets a valid cache entry must put the vfsmnt
with exp_put().
4. Any user of the exp returned from exp_get_by_name must call exp_get(),
which increases the reference count of the vfsmnt.

So that,
a. After the reference is taken in step 2, any umount of the filesystem gets -EBUSY.
b. After all references from step 4 are put, or before the reference is taken in step 2,
any umount of the filesystem calls pin_kill, deletes the cache entry directly,
and unpins the vfsmount.
c. Between steps 1 and 2, a reference to the exp/key cache entry is held but the
vfsmnt is not yet validated. Umount of the filesystem only waits for
exp_find_key/exp_get_by_name to put the cache reference when legitimize_mntget fails.
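
The three states a/b/c above can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions, not the nfsd code: it is single-threaded, has no locking, and the names `flow_mount`, `flow_entry`, `flow_find()`, `flow_release()` and `flow_umount_busy()` are hypothetical. The `users` counter stands for references taken through the cache outlet only; the pin itself is assumed not to count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a mount that is merely pinned has users == 0. */
struct flow_mount {
	bool unmounting;	/* stands in for MNT_UMOUNT and friends */
	int users;		/* references taken via the cache outlet */
};

struct flow_entry {
	struct flow_mount *mnt;
	int refs;		/* cache entry reference count */
};

/* Outlet function: steps 1 and 2 of the description. */
static struct flow_entry *flow_find(struct flow_entry *cached)
{
	cached->refs++;			/* step 1: lookup takes a cache reference */
	if (cached->mnt->unmounting) {
		cached->refs--;		/* legitimize failed: put the cache reference */
		return NULL;
	}
	cached->mnt->users++;		/* step 2: mount reference taken */
	return cached;
}

/* Caller side: put the mount reference, then the cache reference. */
static void flow_release(struct flow_entry *e)
{
	assert(e->mnt->users > 0 && e->refs > 0);
	e->mnt->users--;
	e->refs--;
}

/* umount would report -EBUSY only while someone holds a mount reference. */
static bool flow_umount_busy(const struct flow_mount *m)
{
	return m->users > 0;
}

/* Walks through cases a, b and c; returns 0 if the model behaves as described. */
static int flow_demo(void)
{
	struct flow_mount m = { .unmounting = false, .users = 0 };
	struct flow_entry e = { .mnt = &m, .refs = 1 };	/* 1 = held by the cache */
	struct flow_entry *got;

	if (flow_umount_busy(&m))
		return 1;		/* only pinned: umount may proceed */
	got = flow_find(&e);
	if (got != &e || !flow_umount_busy(&m))
		return 2;		/* case a: busy while in use */
	flow_release(got);
	if (flow_umount_busy(&m) || e.refs != 1)
		return 3;		/* case b: all references put */
	m.unmounting = true;
	if (flow_find(&e) != NULL || e.refs != 1 || m.users != 0)
		return 4;		/* case c: lookup loses the race */
	return 0;
}
```

Calling `flow_demo()` from a test harness exercises all three cases; the real code additionally has to wait in pin_kill for the transient cache reference of case c.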

v9, thanks to NeilBrown for his comments and updates
1. Drop the initialization of p->done from the old version,
2. Remove three sunrpc cache cleanup patches that Bruce has already applied,
3. Use a seqlock in legitimize_mntget instead of read_seqlock_excl,
4. Fix some bugs pointed out by Neil,
5. New pin_kill logic from Neil,
"allow pin_remove() to be called other than from ->kill()"

v10,
Fix the incorrect use of read_seqbegin_or_lock,
and an incorrect use of rcu_read_lock.

Kinglong Mee (5):
fs_pin: Export functions for specific filesystem
path: New helpers path_get_pin/path_put_unpin for path pin
fs: New helper legitimize_mntget() for getting a legitimize mnt
sunrpc: New helper cache_delete_entry for deleting cache_head directly
nfsd: Allows user un-mounting filesystem where nfsd exports base on

NeilBrown (1):
fs-pin: allow pin_remove() to be called other than from ->kill()

fs/fs_pin.c | 22 +++++++++-
fs/namei.c | 26 ++++++++++++
fs/namespace.c | 38 ++++++++++++++++++
fs/nfsd/export.c | 96 +++++++++++++++++++++++++++++++++++---------
fs/nfsd/export.h | 14 ++++++-
include/linux/fs_pin.h | 2 +-
include/linux/mount.h | 1 +
include/linux/path.h | 4 ++
include/linux/sunrpc/cache.h | 1 +
net/sunrpc/cache.c | 20 +++++++++
10 files changed, 202 insertions(+), 22 deletions(-)

--
2.4.3



2015-09-07 14:02:22

by Kinglong Mee

Subject: [PATCH 1/6 v10] fs-pin: allow pin_remove() to be called other than from ->kill()

From: NeilBrown <[email protected]>

fs-pin currently assumes when either the vfsmount or the fs_pin wants
to unpin, pin_kill() will be called.
This requires that the ->kill() function can wait for any transient
references to the fs_pin to be released. If the structure containing
the fs_pin doesn't already have the ability to wait for references,
this can be a burden.

As the fs_pin already has infrastructure for waiting, that can be
leveraged to remove the burden.

In this alternate scenario, only the vfsmount calls pin_kill() when it
wants to unpin. The owner of the fs_pin instead calls pin_remove().

The ->kill() function removes any long-term references, and then calls
pin_kill() (recursively).
When the last reference on (the structure containing) the fs_pin is
dropped, pin_remove() will be called and the (recursive) pin_kill()
call will complete.

For this to be safe, the final "put" must *not* free the structure if
pin_kill() has already been called, as that could leave ->kill()
accessing freed data.

So we provide a return value for pin_remove() which reports the old
->done value.

When final put calls pin_remove() it checks that value.
If it was 0, then pin_kill() has not called ->kill and will not,
so final put can free the data structure.
If it was -1, then pin_kill() has called ->kill, and ->kill will
free the data structure - final put must not touch it.

This makes the 'wait' infrastructure of fs_pin available to any
pinning client which wants to use it.

Signed-off-by: NeilBrown <[email protected]>
---
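
The final-put rule described in the commit message can be modelled in user space. The following is a minimal single-threaded sketch, not kernel code: `toy_pin`, `toy_obj`, `toy_pin_remove()`, `toy_kill_begin()`, `toy_kill_finish()` and `toy_final_put()` are hypothetical names, there is no locking or waiting, and only the `done` transitions (0, -1, 1) are represented.

```c
#include <assert.h>

/* Hypothetical user-space model of the ->done states; not kernel code. */
struct toy_pin {
	int done;	/* 0: idle, -1: kill in progress, 1: removed */
};

struct toy_obj {
	struct toy_pin pin;
	int freed_by_put;	/* stands in for kfree() from the final put */
	int freed_by_kill;	/* stands in for kfree() from ->kill() */
};

/* Like the patched pin_remove(): mark the pin removed, report the old state. */
static int toy_pin_remove(struct toy_pin *pin)
{
	int old = pin->done;

	pin->done = 1;
	return old;
}

/* Final put: free only if the kill path has not taken ownership. */
static void toy_final_put(struct toy_obj *obj)
{
	if (toy_pin_remove(&obj->pin) == 0)
		obj->freed_by_put++;
	/* old value -1: the kill path owns obj and will free it */
}

/* Kill path, first half: claim ownership if nobody removed the pin yet. */
static void toy_kill_begin(struct toy_obj *obj)
{
	if (obj->pin.done == 0)
		obj->pin.done = -1;
}

/* Kill path, second half: the kernel would have waited for done == 1 here. */
static void toy_kill_finish(struct toy_obj *obj)
{
	assert(obj->pin.done == 1);
	obj->freed_by_kill++;
}

/* Returns 0 if both orderings free the object exactly once. */
static int toy_demo(void)
{
	struct toy_obj a = { .pin = { .done = 0 } };
	struct toy_obj b = { .pin = { .done = 0 } };

	/* Case 1: no kill in flight, so the final put frees. */
	toy_final_put(&a);
	if (a.freed_by_put != 1 || a.freed_by_kill != 0)
		return 1;

	/* Case 2: kill started first, so the final put must not free. */
	toy_kill_begin(&b);
	toy_final_put(&b);
	if (b.freed_by_put != 0)
		return 2;
	toy_kill_finish(&b);
	if (b.freed_by_kill != 1)
		return 3;
	return 0;
}
```

In both orderings exactly one side frees the object, which is the property the return value of pin_remove() is meant to provide.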
fs/fs_pin.c | 18 +++++++++++++++++-
include/linux/fs_pin.h | 2 +-
2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/fs_pin.c b/fs/fs_pin.c
index 611b540..b7954a9 100644
--- a/fs/fs_pin.c
+++ b/fs/fs_pin.c
@@ -6,16 +6,32 @@

static DEFINE_SPINLOCK(pin_lock);

-void pin_remove(struct fs_pin *pin)
+/**
+ * pin_remove - disconnect an fs_pin from the pinned structure.
+ * @pin: The struct fs_pin which is pinning something.
+ *
+ * Detach a 'pin' which was added by pin_insert(). A return value
+ * of -1 implies that pin_kill() has already been called and that the
+ * ->kill() function now owns the data structure containing @pin.
+ * The function which called pin_remove() must not touch the data structure
+ * again (unless it is the ->kill() function itself).
+ * A return value of 0 implies an uneventful disconnect: pin_kill() has not called,
+ * and will not call, the ->kill() function on this @pin.
+ * Any other return value is a usage error - e.g. repeated call to pin_remove().
+ */
+int pin_remove(struct fs_pin *pin)
{
+ int ret;
spin_lock(&pin_lock);
hlist_del_init(&pin->m_list);
hlist_del_init(&pin->s_list);
spin_unlock(&pin_lock);
spin_lock_irq(&pin->wait.lock);
+ ret = pin->done;
pin->done = 1;
wake_up_locked(&pin->wait);
spin_unlock_irq(&pin->wait.lock);
+ return ret;
}

void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
index 3886b3b..2fe9d3b 100644
--- a/include/linux/fs_pin.h
+++ b/include/linux/fs_pin.h
@@ -18,7 +18,7 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
p->kill = kill;
}

-void pin_remove(struct fs_pin *);
+int pin_remove(struct fs_pin *);
void pin_insert_group(struct fs_pin *, struct vfsmount *, struct hlist_head *);
void pin_insert(struct fs_pin *, struct vfsmount *);
void pin_kill(struct fs_pin *);
--
2.4.3


2015-09-07 14:03:15

by Kinglong Mee

Subject: [PATCH 2/6 v10] fs_pin: Export functions for specific filesystem

Export functions for others who want to pin to a vfsmount,
e.g. nfsd's export cache.

These are needed for any module to participate in pinning.

mnt_pin_kill() and group_pin_kill() are just helper functions for
unmount etc. (and are in fs/internal.h), so they don't need to be exported.

v10, same as v4
add exporting of pin_kill.

Signed-off-by: Kinglong Mee <[email protected]>
---
fs/fs_pin.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/fs_pin.c b/fs/fs_pin.c
index b7954a9..b1a9654 100644
--- a/fs/fs_pin.c
+++ b/fs/fs_pin.c
@@ -33,6 +33,7 @@ int pin_remove(struct fs_pin *pin)
spin_unlock_irq(&pin->wait.lock);
return ret;
}
+EXPORT_SYMBOL(pin_remove);

void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p)
{
@@ -42,11 +43,13 @@ void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head
hlist_add_head(&pin->m_list, &real_mount(m)->mnt_pins);
spin_unlock(&pin_lock);
}
+EXPORT_SYMBOL(pin_insert_group);

void pin_insert(struct fs_pin *pin, struct vfsmount *m)
{
pin_insert_group(pin, m, &m->mnt_sb->s_pins);
}
+EXPORT_SYMBOL(pin_insert);

void pin_kill(struct fs_pin *p)
{
@@ -88,6 +91,7 @@ void pin_kill(struct fs_pin *p)
}
rcu_read_unlock();
}
+EXPORT_SYMBOL(pin_kill);

void mnt_pin_kill(struct mount *m)
{
--
2.4.3


2015-09-07 14:04:18

by Kinglong Mee

Subject: [PATCH 3/6 v10] path: New helpers path_get_pin/path_put_unpin for path pin

Two helpers for pinning a filesystem to a vfsmnt without using mntget.

v9, Update based on NeilBrown's new patch
v10, same as v9

Signed-off-by: Kinglong Mee <[email protected]>
---
fs/namei.c | 26 ++++++++++++++++++++++++++
include/linux/path.h | 4 ++++
2 files changed, 30 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index 29b9279..3a5b0eb 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -492,6 +492,32 @@ void path_put(const struct path *path)
}
EXPORT_SYMBOL(path_put);

+/**
+ * path_get_pin - get a reference to a path's dentry
+ * and pin to path's vfsmnt
+ * @path: path to get the reference to
+ * @p: the fs_pin pin to vfsmnt
+ */
+void path_get_pin(struct path *path, struct fs_pin *p)
+{
+ dget(path->dentry);
+ pin_insert_group(p, path->mnt, NULL);
+}
+EXPORT_SYMBOL(path_get_pin);
+
+/**
+ * path_put_unpin - put a reference to a path's dentry
+ * and remove pin to path's vfsmnt
+ * @path: path to put the reference to
+ * @p: the fs_pin removed from vfsmnt
+ */
+int path_put_unpin(struct path *path, struct fs_pin *p)
+{
+ dput(path->dentry);
+ return pin_remove(p);
+}
+EXPORT_SYMBOL(path_put_unpin);
+
#define EMBEDDED_LEVELS 2
struct nameidata {
struct path path;
diff --git a/include/linux/path.h b/include/linux/path.h
index d137218..40d376a 100644
--- a/include/linux/path.h
+++ b/include/linux/path.h
@@ -3,6 +3,7 @@

struct dentry;
struct vfsmount;
+struct fs_pin;

struct path {
struct vfsmount *mnt;
@@ -12,6 +13,9 @@ struct path {
extern void path_get(const struct path *);
extern void path_put(const struct path *);

+extern void path_get_pin(struct path *, struct fs_pin *);
+extern int path_put_unpin(struct path *, struct fs_pin *);
+
static inline int path_equal(const struct path *path1, const struct path *path2)
{
return path1->mnt == path2->mnt && path1->dentry == path2->dentry;
--
2.4.3


2015-09-07 14:05:11

by Kinglong Mee

Subject: [PATCH 4/6 v10] fs: New helper legitimize_mntget() for getting a legitimize mnt

New helper legitimize_mntget() takes a reference to a mnt only if none of
MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED is set; otherwise it returns NULL.

v9, Use read_seqbegin_or_lock instead of read_seqlock_excl
v10, fix the incorrect logic in the seqlock usage

Signed-off-by: Kinglong Mee <[email protected]>
---
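
The contract of the new helper (a reference is taken only when the mount is not being torn down) can be shown with a simplified user-space model. This sketch deliberately omits the seqlock retry loop and RCU that the real function needs; the `TOY_*` flag values, `toy_mount`, `toy_legitimize_get()` and `toy_put()` are hypothetical and do not match the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for the model; the kernel's MNT_* values differ. */
#define TOY_SYNC_UMOUNT	0x1u
#define TOY_UMOUNT	0x2u
#define TOY_DOOMED	0x4u

struct toy_mount {
	unsigned int flags;
	int count;	/* reference count */
};

/* Take a reference only if the mount is not being torn down. */
static struct toy_mount *toy_legitimize_get(struct toy_mount *m)
{
	if (m == NULL)
		return NULL;
	if (m->flags & (TOY_SYNC_UMOUNT | TOY_UMOUNT | TOY_DOOMED))
		return NULL;
	m->count++;
	return m;
}

/* Drop a reference previously taken by toy_legitimize_get(). */
static void toy_put(struct toy_mount *m)
{
	assert(m->count > 0);
	m->count--;
}

/* Returns 0 if live, dying and NULL mounts are handled as described. */
static int toy_legitimize_demo(void)
{
	struct toy_mount live = { .flags = 0, .count = 1 };
	struct toy_mount dying = { .flags = TOY_UMOUNT, .count = 1 };

	if (toy_legitimize_get(&live) != &live || live.count != 2)
		return 1;
	toy_put(&live);
	if (live.count != 1)
		return 2;
	/* A mount that is going away is refused and its count is untouched. */
	if (toy_legitimize_get(&dying) != NULL || dying.count != 1)
		return 3;
	if (toy_legitimize_get(NULL) != NULL)
		return 4;
	return 0;
}
```

Callers are expected to treat a NULL return as "the mount is going away" and to balance every successful call with a put, as exp_find_key()'s callers do with mntput() in patch 6.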
fs/namespace.c | 38 ++++++++++++++++++++++++++++++++++++++
include/linux/mount.h | 1 +
2 files changed, 39 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 0570729..6377672 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1153,6 +1153,44 @@ struct vfsmount *mntget(struct vfsmount *mnt)
}
EXPORT_SYMBOL(mntget);

+struct vfsmount *legitimize_mntget(struct vfsmount *vfsmnt)
+{
+ struct mount *mnt;
+ unsigned seq = 0;
+
+ if (vfsmnt == NULL)
+ return NULL;
+
+ rcu_read_lock();
+retry:
+ read_seqbegin_or_lock(&mount_lock, &seq);
+
+ if (vfsmnt->mnt_flags & (MNT_SYNC_UMOUNT | MNT_UMOUNT | MNT_DOOMED))
+ vfsmnt = NULL;
+ else {
+ mnt = real_mount(vfsmnt);
+ mnt_add_count(mnt, 1);
+ if (need_seqretry(&mount_lock, seq)) {
+ /* lost the race, need to try again */
+ if (vfsmnt->mnt_flags & MNT_SYNC_UMOUNT) {
+ /* no point trying... */
+ mnt_add_count(mnt, -1);
+ vfsmnt = NULL;
+ } else {
+ mntput(vfsmnt);
+ seq = 1;
+ goto retry;
+ }
+ }
+ }
+
+ done_seqretry(&mount_lock, seq);
+ rcu_read_unlock();
+
+ return vfsmnt;
+}
+EXPORT_SYMBOL(legitimize_mntget);
+
struct vfsmount *mnt_clone_internal(struct path *path)
{
struct mount *p;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index f822c3c..8ae9dc0 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -79,6 +79,7 @@ extern void mnt_drop_write(struct vfsmount *mnt);
extern void mnt_drop_write_file(struct file *file);
extern void mntput(struct vfsmount *mnt);
extern struct vfsmount *mntget(struct vfsmount *mnt);
+extern struct vfsmount *legitimize_mntget(struct vfsmount *vfsmnt);
extern struct vfsmount *mnt_clone_internal(struct path *path);
extern int __mnt_is_readonly(struct vfsmount *mnt);

--
2.4.3


2015-09-07 14:05:46

by Kinglong Mee

Subject: [PATCH 5/6 v10] sunrpc: New helper cache_delete_entry for deleting cache_head directly

A new helper, cache_delete_entry(), deletes a cache_head from its
cache_detail directly.

It will be used by pin_kill, so checking that the cache_detail is valid
before deleting is needed.

Because pin_kill is called rarely, the performance impact
is acceptable.

v9, delete duplicate checking of cache_detail
v10, same as v9

Signed-off-by: Kinglong Mee <[email protected]>
---
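
The helper is meant to be safe when the entry has already been unhashed by someone else (for example by the regular cache cleaner). A rough user-space model of that "unhash at most once, then drop the table's reference" behaviour follows. It is a sketch only: `toy_cache`, `toy_entry` and `toy_cache_delete()` are hypothetical names, there is no locking, and the assumption that the hash table itself holds one reference is inferred from the cache_put() at the end of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for cache_detail: only the entry counter is modelled. */
struct toy_cache {
	int entries;
};

/* Hypothetical stand-in for cache_head. */
struct toy_entry {
	bool hashed;	/* still linked into the cache's hash table? */
	bool cleaned;	/* models the CACHE_CLEANED flag */
	int refs;	/* assumed to include one reference held by the table */
};

/* Remove the entry from the cache at most once and drop the table's reference. */
static void toy_cache_delete(struct toy_cache *cd, struct toy_entry *e)
{
	if (cd == NULL || e == NULL)
		return;
	if (!e->hashed)
		return;		/* already removed by someone else */

	e->hashed = false;
	cd->entries--;
	e->cleaned = true;
	assert(e->refs > 0);
	e->refs--;		/* drop the reference the table held */
}

/* Returns 0 if deletion happens exactly once and bad arguments are ignored. */
static int toy_cache_demo(void)
{
	struct toy_cache cd = { .entries = 1 };
	struct toy_entry e = { .hashed = true, .cleaned = false, .refs = 2 };

	toy_cache_delete(&cd, &e);
	if (cd.entries != 0 || e.hashed || !e.cleaned || e.refs != 1)
		return 1;
	toy_cache_delete(&cd, &e);	/* second call is a no-op */
	if (cd.entries != 0 || e.refs != 1)
		return 2;
	toy_cache_delete(NULL, &e);
	toy_cache_delete(&cd, NULL);
	if (e.refs != 1)
		return 3;
	return 0;
}
```

The remaining reference in the demo belongs to a concurrent user of the entry; in the patch series, pin_kill then waits for that user's final put.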
include/linux/sunrpc/cache.h | 1 +
net/sunrpc/cache.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 03d3b4c..2824db5 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -210,6 +210,7 @@ extern int cache_check(struct cache_detail *detail,
struct cache_head *h, struct cache_req *rqstp);
extern void cache_flush(void);
extern void cache_purge(struct cache_detail *detail);
+extern void cache_delete_entry(struct cache_detail *cd, struct cache_head *h);
#define NEVER (0x7FFFFFFF)
extern void __init cache_initialize(void);
extern int cache_register_net(struct cache_detail *cd, struct net *net);
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 4a2340a..430b5fa 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -454,6 +454,26 @@ static int cache_clean(void)
return rv;
}

+void cache_delete_entry(struct cache_detail *detail, struct cache_head *h)
+{
+ if (!detail || !h)
+ return;
+
+ write_lock(&detail->hash_lock);
+ if (hlist_unhashed(&h->cache_list)) {
+ write_unlock(&detail->hash_lock);
+ return ;
+ }
+
+ hlist_del_init(&h->cache_list);
+ detail->entries--;
+ set_bit(CACHE_CLEANED, &h->flags);
+ write_unlock(&detail->hash_lock);
+
+ cache_put(h, detail);
+}
+EXPORT_SYMBOL_GPL(cache_delete_entry);
+
/*
* We want to regularly clean the cache, so we need to schedule some work ...
*/
--
2.4.3


2015-09-07 14:07:05

by Kinglong Mee

Subject: [PATCH 6/6 v10] nfsd: Allows user un-mounting filesystem where nfsd exports base on

If there are mount points (not exported over NFS) under the pseudo root,
then after a client has accessed those entries under the root, they
*cannot* be unmounted until the export cache entries expire.

# cat /etc/exports
/nfs/xfs *(rw,insecure,no_subtree_check,no_root_squash)
/nfs/pnfs *(rw,insecure,no_subtree_check,no_root_squash)
# ll /nfs/
total 0
drwxr-xr-x. 3 root root 84 Apr 21 22:27 pnfs
drwxr-xr-x. 3 root root 84 Apr 21 22:27 test
drwxr-xr-x. 2 root root 6 Apr 20 22:01 xfs
# mount /dev/sde /nfs/test
# df
Filesystem 1K-blocks Used Available Use% Mounted on
......
/dev/sdd 1038336 32944 1005392 4% /nfs/pnfs
/dev/sdc 10475520 32928 10442592 1% /nfs/xfs
/dev/sde 999320 1284 929224 1% /nfs/test
# mount -t nfs 127.0.0.1:/nfs/ /mnt
# ll /mnt/*/
/mnt/pnfs/:
total 0
-rw-r--r--. 1 root root 0 Apr 21 22:23 attr
drwxr-xr-x. 2 root root 6 Apr 21 22:19 tmp

/mnt/xfs/:
total 0
# umount /nfs/test/
umount: /nfs/test/: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)

This happens because nfsd's exports cache holds a reference to
the path (here /nfs/test/), so it cannot be unmounted.

I don't think that is what users expect; they want to be able to umount
/nfs/test/. Bruce thinks users should also be able to umount /nfs/pnfs/
and /nfs/xfs.

Also, use kzalloc instead of kmalloc for all memory allocation.
Thanks to Al Viro for his comments on the logic of fs_pin.

v3,
1. using path_get_pin/path_put_unpin for path pin
2. using kzalloc for memory allocating

v5, v4,
1. add a completion so that pin_kill can wait for the reference count to drop to zero.
2. add a work_struct so that pin_kill drops the reference indirectly.
3. free svc_export/svc_expkey in pin_kill, not in svc_export_put/svc_expkey_put.
4. svc_export_put/svc_expkey_put go through the pin_kill logic.

v6,
1. Pin vfsmnt to the mount point first; when the reference count increases (==2),
grab a reference to vfsmnt with mntget. When it decreases (==1),
drop the reference to vfsmnt, leaving the pin.
2. Delete cache_head directly from cache_detail.

v7, implement self reference increase and decrease for nfsd exports/expkey

v8, new method as follows,

1. There is only one outlet from each cache: exp_find_key() for expkey,
exp_get_by_name() for export.
2. Any fsid-to-export or filehandle-to-export lookup calls one of these functions.
3. exp_get()/exp_put() increase/decrease the reference count of the export.

Call legitimize_mntget() in the only outlet functions exp_find_key()/
exp_get_by_name(); if it fails, return STALE. Otherwise, any valid
expkey/export from the cache is validated (it holds a reference to the vfsmnt).

Add mntget() in exp_get() and mntput() in exp_put(), because the exports
passed to exp_get/exp_put are returned from exp_find_key/exp_get_by_name.

For the expkey cache,
1. First, an fsid is passed to exp_find_key, which looks up a cache entry
in svc_expkey_lookup; on success, ekey->ek_path is pinned to the mount.
2. Then legitimize_mntget is called to get a reference to the vfsmnt
before returning from exp_find_key.
3. Any caller of exp_find_key that gets a valid cache entry must put the vfsmnt.

For the export cache,
1. First, a path (returned from exp_find_key) with a validated vfsmnt
is passed to exp_get_by_name; on success, exp->ex_path is pinned to the mount.
2. Then legitimize_mntget is called to get a reference to the vfsmnt
before returning from exp_get_by_name.
3. Any caller of exp_get_by_name that gets a valid cache entry must put the vfsmnt
with exp_put().
4. Any user of the exp returned from exp_get_by_name must call exp_get(),
which increases the reference count of the vfsmnt.

So that,
a. After the reference is taken in step 2, any umount of the filesystem gets -EBUSY.
b. After all references from step 4 are put, or before the reference is taken in step 2,
any umount of the filesystem calls pin_kill, deletes the cache entry directly,
and unpins the vfsmount.
c. Between steps 1 and 2, a reference to the exp/key cache entry is held but the
vfsmnt is not yet validated. Umount of the filesystem only waits for
exp_find_key/exp_get_by_name to put the cache reference when legitimize_mntget fails.

v9, thanks to NeilBrown for his comments and updates
1. Fix two string formats of path name.
2. Based on Neil's patch
"allow pin_remove() to be called other than from ->kill()"

v10, Fix two incorrect uses of rcu_read_lock

Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: Kinglong Mee <[email protected]>
---
fs/nfsd/export.c | 96 +++++++++++++++++++++++++++++++++++++++++++++-----------
fs/nfsd/export.h | 14 +++++++--
2 files changed, 90 insertions(+), 20 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index b4d84b5..cd77dea 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -37,15 +37,23 @@
#define EXPKEY_HASHMAX (1 << EXPKEY_HASHBITS)
#define EXPKEY_HASHMASK (EXPKEY_HASHMAX -1)

+static void expkey_destroy(struct svc_expkey *key)
+{
+ auth_domain_put(key->ek_client);
+ kfree_rcu(key, rcu_head);
+}
+
static void expkey_put(struct kref *ref)
{
struct svc_expkey *key = container_of(ref, struct svc_expkey, h.ref);

if (test_bit(CACHE_VALID, &key->h.flags) &&
- !test_bit(CACHE_NEGATIVE, &key->h.flags))
- path_put(&key->ek_path);
- auth_domain_put(key->ek_client);
- kfree(key);
+ !test_bit(CACHE_NEGATIVE, &key->h.flags) &&
+ path_put_unpin(&key->ek_path, &key->ek_pin)) {
+ return ;
+ }
+
+ expkey_destroy(key);
}

static void expkey_request(struct cache_detail *cd,
@@ -119,6 +127,7 @@ static int expkey_parse(struct cache_detail *cd, char *mesg, int mlen)
if (key.h.expiry_time == 0)
goto out;

+ key.cd = cd;
key.ek_client = dom;
key.ek_fsidtype = fsidtype;
memcpy(key.ek_fsid, buf, len);
@@ -181,7 +190,11 @@ static int expkey_show(struct seq_file *m,
if (test_bit(CACHE_VALID, &h->flags) &&
!test_bit(CACHE_NEGATIVE, &h->flags)) {
seq_printf(m, " ");
- seq_path(m, &ek->ek_path, "\\ \t\n");
+ if (legitimize_mntget(ek->ek_path.mnt)) {
+ seq_path(m, &ek->ek_path, "\\ \t\n");
+ mntput(ek->ek_path.mnt);
+ } else
+ seq_printf(m, "Dir-unmounting");
}
seq_printf(m, "\n");
return 0;
@@ -210,6 +223,17 @@ static inline void expkey_init(struct cache_head *cnew,
new->ek_fsidtype = item->ek_fsidtype;

memcpy(new->ek_fsid, item->ek_fsid, sizeof(new->ek_fsid));
+ new->cd = item->cd;
+}
+
+static void expkey_pin_kill(struct fs_pin *pin)
+{
+ struct svc_expkey *key = container_of(pin, struct svc_expkey, ek_pin);
+ cache_delete_entry(key->cd, &key->h);
+ /* Must call pin_kill to wait the last reference be put */
+ rcu_read_lock();
+ pin_kill(&key->ek_pin);
+ expkey_destroy(key);
}

static inline void expkey_update(struct cache_head *cnew,
@@ -218,13 +242,14 @@ static inline void expkey_update(struct cache_head *cnew,
struct svc_expkey *new = container_of(cnew, struct svc_expkey, h);
struct svc_expkey *item = container_of(citem, struct svc_expkey, h);

+ init_fs_pin(&new->ek_pin, expkey_pin_kill);
new->ek_path = item->ek_path;
- path_get(&item->ek_path);
+ path_get_pin(&new->ek_path, &new->ek_pin);
}

static struct cache_head *expkey_alloc(void)
{
- struct svc_expkey *i = kmalloc(sizeof(*i), GFP_KERNEL);
+ struct svc_expkey *i = kzalloc(sizeof(*i), GFP_KERNEL);
if (i)
return &i->h;
else
@@ -306,14 +331,20 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc)
fsloc->locations = NULL;
}

-static void svc_export_put(struct kref *ref)
+static void svc_export_destroy(struct svc_export *exp)
{
- struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
- path_put(&exp->ex_path);
auth_domain_put(exp->ex_client);
nfsd4_fslocs_free(&exp->ex_fslocs);
kfree(exp->ex_uuid);
- kfree(exp);
+ kfree_rcu(exp, rcu_head);
+}
+
+static void svc_export_put(struct kref *ref)
+{
+ struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
+
+ if (!path_put_unpin(&exp->ex_path, &exp->ex_pin))
+ svc_export_destroy(exp);
}

static void svc_export_request(struct cache_detail *cd,
@@ -636,7 +667,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen)
if (expp == NULL)
err = -ENOMEM;
else
- exp_put(expp);
+ cache_put(&expp->h, expp->cd);
out4:
nfsd4_fslocs_free(&exp.ex_fslocs);
kfree(exp.ex_uuid);
@@ -664,7 +695,12 @@ static int svc_export_show(struct seq_file *m,
return 0;
}
exp = container_of(h, struct svc_export, h);
- seq_path(m, &exp->ex_path, " \t\n\\");
+ if (legitimize_mntget(exp->ex_path.mnt)) {
+ seq_path(m, &exp->ex_path, " \t\n\\");
+ mntput(exp->ex_path.mnt);
+ } else
+ seq_printf(m, "Dir-unmounting");
+
seq_putc(m, '\t');
seq_escape(m, exp->ex_client->name, " \t\n\\");
seq_putc(m, '(');
@@ -694,15 +730,26 @@ static int svc_export_match(struct cache_head *a, struct cache_head *b)
path_equal(&orig->ex_path, &new->ex_path);
}

+static void export_pin_kill(struct fs_pin *pin)
+{
+ struct svc_export *exp = container_of(pin, struct svc_export, ex_pin);
+ cache_delete_entry(exp->cd, &exp->h);
+ /* Must call pin_kill to wait the last reference be put */
+ rcu_read_lock();
+ pin_kill(&exp->ex_pin);
+ svc_export_destroy(exp);
+}
+
static void svc_export_init(struct cache_head *cnew, struct cache_head *citem)
{
struct svc_export *new = container_of(cnew, struct svc_export, h);
struct svc_export *item = container_of(citem, struct svc_export, h);

+ init_fs_pin(&new->ex_pin, export_pin_kill);
kref_get(&item->ex_client->ref);
new->ex_client = item->ex_client;
new->ex_path = item->ex_path;
- path_get(&item->ex_path);
+ path_get_pin(&new->ex_path, &new->ex_pin);
new->ex_fslocs.locations = NULL;
new->ex_fslocs.locations_count = 0;
new->ex_fslocs.migrated = 0;
@@ -740,7 +787,7 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem)

static struct cache_head *svc_export_alloc(void)
{
- struct svc_export *i = kmalloc(sizeof(*i), GFP_KERNEL);
+ struct svc_export *i = kzalloc(sizeof(*i), GFP_KERNEL);
if (i)
return &i->h;
else
@@ -809,6 +856,7 @@ exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
if (!clp)
return ERR_PTR(-ENOENT);

+ key.cd = cd;
key.ek_client = clp;
key.ek_fsidtype = fsid_type;
memcpy(key.ek_fsid, fsidv, key_len(fsid_type));
@@ -819,6 +867,12 @@ exp_find_key(struct cache_detail *cd, struct auth_domain *clp, int fsid_type,
err = cache_check(cd, &ek->h, reqp);
if (err)
return ERR_PTR(err);
+
+ if (!legitimize_mntget(ek->ek_path.mnt)) {
+ cache_put(&ek->h, ek->cd);
+ return ERR_PTR(-ENOENT);
+ }
+
return ek;
}

@@ -842,6 +896,8 @@ exp_get_by_name(struct cache_detail *cd, struct auth_domain *clp,
err = cache_check(cd, &exp->h, reqp);
if (err)
return ERR_PTR(err);
+
+ mntget(exp->ex_path.mnt);
return exp;
}

@@ -858,6 +914,7 @@ exp_parent(struct cache_detail *cd, struct auth_domain *clp, struct path *path)
struct dentry *parent = dget_parent(path->dentry);
dput(path->dentry);
path->dentry = parent;
+ exp_put(exp);
exp = exp_get_by_name(cd, clp, path, NULL);
}
dput(path->dentry);
@@ -928,7 +985,10 @@ static struct svc_export *exp_find(struct cache_detail *cd,
return ERR_CAST(ek);

exp = exp_get_by_name(cd, clp, &ek->ek_path, reqp);
- cache_put(&ek->h, nn->svc_expkey_cache);
+
+ /* Put the mnt get in exp_find_key() */
+ mntput(ek->ek_path.mnt);
+ cache_put(&ek->h, ek->cd);

if (IS_ERR(exp))
return ERR_CAST(exp);
@@ -1195,10 +1255,10 @@ static int e_show(struct seq_file *m, void *p)
return 0;
}

- exp_get(exp);
+ cache_get(&exp->h);
if (cache_check(cd, &exp->h, NULL))
return 0;
- exp_put(exp);
+ cache_put(&exp->h, exp->cd);
return svc_export_show(m, cd, cp);
}

diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h
index 2e31507..bf6bfff 100644
--- a/fs/nfsd/export.h
+++ b/fs/nfsd/export.h
@@ -4,6 +4,7 @@
#ifndef NFSD_EXPORT_H
#define NFSD_EXPORT_H

+#include <linux/fs_pin.h>
#include <linux/sunrpc/cache.h>
#include <uapi/linux/nfsd/export.h>
#include <linux/nfs4.h>
@@ -47,9 +48,10 @@ struct exp_flavor_info {

struct svc_export {
struct cache_head h;
+ struct cache_detail *cd;
+
struct auth_domain * ex_client;
int ex_flags;
- struct path ex_path;
kuid_t ex_anon_uid;
kgid_t ex_anon_gid;
int ex_fsid;
@@ -59,7 +61,10 @@ struct svc_export {
struct exp_flavor_info ex_flavors[MAX_SECINFO_LIST];
enum pnfs_layouttype ex_layout_type;
struct nfsd4_deviceid_map *ex_devid_map;
- struct cache_detail *cd;
+
+ struct path ex_path;
+ struct fs_pin ex_pin;
+ struct rcu_head rcu_head;
};

/* an "export key" (expkey) maps a filehandlefragement to an
@@ -68,12 +73,15 @@ struct svc_export {
*/
struct svc_expkey {
struct cache_head h;
+ struct cache_detail *cd;

struct auth_domain * ek_client;
int ek_fsidtype;
u32 ek_fsid[6];

struct path ek_path;
+ struct fs_pin ek_pin;
+ struct rcu_head rcu_head;
};

#define EX_ISSYNC(exp) (!((exp)->ex_flags & NFSEXP_ASYNC))
@@ -101,12 +109,14 @@ __be32 nfserrno(int errno);

static inline void exp_put(struct svc_export *exp)
{
+ mntput(exp->ex_path.mnt);
cache_put(&exp->h, exp->cd);
}

static inline struct svc_export *exp_get(struct svc_export *exp)
{
cache_get(&exp->h);
+ mntget(exp->ex_path.mnt);
return exp;
}
struct svc_export * rqst_exp_find(struct svc_rqst *, int, u32 *);
--
2.4.3