2010-12-29 13:14:32

by Kirill A. Shutemov

Subject: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

Prepare the nfs/sunrpc stack to use multiple instances of rpc_pipefs.
Client only for now.

Changelog:

v2:
- one of rpc_create() calls was missed initially, fixed;
- change logic for get_rpc_pipefs(NULL);
- export get_rpc_pipefs() so that it can be used from modules
(thanks to J. Bruce Fields);
- change "From:" and "Signed-off-by:" addresses.

v1:
- initial revision of the patchset.

Kirill A. Shutemov (12):
sunrpc: mount rpc_pipefs on initialization
sunrpc: introduce init_rpc_pipefs
sunrpc: push init_rpc_pipefs up to rpc_create() callers
sunrpc: tag svc_serv with rpc_pipefs mount point
sunrpc: get rpc_pipefs mount point for svc_serv from callers
lockd: get rpc_pipefs mount point from callers
sunrpc: get rpc_pipefs mount point for rpcb_create[_local] from callers
sunrpc: tag pipefs field of cache_detail with rpc_pipefs mount point
nfs: per-rpc_pipefs dns cache
sunrpc: introduce get_rpc_pipefs()
nfs: introduce mount option 'rpcmount'
sunrpc: make rpc_pipefs be mountable multiple times

fs/lockd/clntlock.c | 8 +-
fs/lockd/host.c | 12 +++-
fs/lockd/mon.c | 13 ++-
fs/lockd/svc.c | 4 +-
fs/nfs/cache_lib.c | 18 +---
fs/nfs/cache_lib.h | 3 +-
fs/nfs/callback.c | 6 +-
fs/nfs/callback.h | 3 +-
fs/nfs/client.c | 45 +++++++++--
fs/nfs/dns_resolve.c | 128 +++++++++++++++++++++++------
fs/nfs/dns_resolve.h | 8 +--
fs/nfs/inode.c | 8 +--
fs/nfs/internal.h | 10 ++-
fs/nfs/mount_clnt.c | 1 +
fs/nfs/namespace.c | 3 +-
fs/nfs/nfs4namespace.c | 20 +++--
fs/nfs/super.c | 20 +++++
fs/nfsd/nfs4callback.c | 2 +
fs/nfsd/nfssvc.c | 8 +-
include/linux/lockd/bind.h | 3 +-
include/linux/lockd/lockd.h | 4 +-
include/linux/nfs_fs_sb.h | 1 +
include/linux/sunrpc/cache.h | 9 +--
include/linux/sunrpc/clnt.h | 5 +-
include/linux/sunrpc/rpc_pipe_fs.h | 6 +-
include/linux/sunrpc/svc.h | 9 +-
net/sunrpc/cache.c | 16 +++--
net/sunrpc/clnt.c | 19 +++--
net/sunrpc/rpc_pipe.c | 156 ++++++++++++++++++++++++++++++-----
net/sunrpc/rpcb_clnt.c | 19 +++--
net/sunrpc/svc.c | 52 ++++++++-----
31 files changed, 448 insertions(+), 171 deletions(-)

--
1.7.3.4



2010-12-30 09:44:34

by Kirill A. Shutemov

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Thu, Dec 30, 2010 at 03:10:20AM -0600, Rob Landley wrote:
> On 12/30/2010 02:51 AM, Kirill A. Shutemov wrote:
> > On Wed, Dec 29, 2010 at 08:13:50PM -0600, Rob Landley wrote:
> >> On Wed, Dec 29, 2010 at 7:14 AM, Kirill A. Shutemov<[email protected]> wrote:
> >>>
> >>> Prepare nfs/sunrpc stack to use multiple instances of rpc_pipefs.
> >>> Only for client for now.
> >>
> >> What would a test case for this look like? (Is there some way to tell
> >> an nfs mount to use a specific instance of rpc_pipefs or something?)
> >
> > You can create a new instance of rpc_pipefs using the 'newinstance'
> > mount option.
> >
> > Then you can specify which rpc_pipefs to use with the 'rpcmount' mount
> > option of the nfs mount. If none is specified, '/var/lib/nfs/rpc_pipefs'
> > is used by default.
>
> That path is as the process performing the mount sees it?

Yep.

> > If no rpcmount mountoption, no rpc_pipefs was found at
> > '/var/lib/nfs/rpc_pipefs' and we are in init's mount namespace, we use
> > init_rpc_pipefs.
>
> It's the "we are in init's mount namespace" that I was wondering about.
>
> So if I naively chroot, nfs mount stops working the way it did before I
> chrooted unless I do an extra setup step?

No. It will work as before, since you are still in init's mount namespace.
Creating a new mount namespace changes the rules.

> I'm actually poking at getting nfs mount working in LXC containers with
> different network routing (mostly study so far, it took me a couple
> weeks just to get lxc to work for me and now I'm trying to wrap my head
> around Linux's NFS implementation), so I'm very interested in this...

--
Kirill A. Shutemov

2010-12-31 16:54:22

by Myklebust, Trond

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Thu, 2010-12-30 at 04:05 -0600, Rob Landley wrote:
> On 12/30/2010 03:44 AM, Kirill A. Shutemov wrote:
> >>> If no rpcmount mountoption, no rpc_pipefs was found at
> >>> '/var/lib/nfs/rpc_pipefs' and we are in init's mount namespace, we use
> >>> init_rpc_pipefs.
> >>
> >> It's the "we are in init's mount namespace" that I was wondering about.
> >>
> >> So if I naively chroot, nfs mount stops working the way it did before I
> >> chrooted unless I do an extra setup step?
> >
> > No. It will work as before, since you are still in init's mount namespace.
> > Creating a new mount namespace changes the rules.
>
> Ah, CLONE_NEWNS and then you need /var/lib/nfs/rpc_pipefs. Got it.
>
> I'm kind of surprised that the kernel cares about a specific path under
> /var/lib. (Seems like policy in the kernel somehow.) Can't it just
> check the current process's mount list to see if an instance of
> rpc_pipefs is mounted in the current namespace the way lxc looks for
> cgroups? Or are there potential performance/scalability issues with that?

The kernel doesn't give a damn about the /var/lib/nfs/rpc_pipefs bit.
That's all for the benefit of the userland utilities.

Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2010-12-29 13:14:32

by Kirill A. Shutemov

Subject: [PATCH v2 03/12] sunrpc: push init_rpc_pipefs up to rpc_create() callers

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
fs/lockd/host.c | 2 ++
fs/lockd/mon.c | 2 ++
fs/nfs/client.c | 2 ++
fs/nfs/mount_clnt.c | 2 ++
fs/nfsd/nfs4callback.c | 2 ++
include/linux/sunrpc/clnt.h | 1 +
net/sunrpc/clnt.c | 11 +++++++----
net/sunrpc/rpcb_clnt.c | 3 +++
8 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/lockd/host.c b/fs/lockd/host.c
index ed0c59f..b033a2d 100644
--- a/fs/lockd/host.c
+++ b/fs/lockd/host.c
@@ -14,6 +14,7 @@
#include <linux/in6.h>
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/svc.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/lockd/lockd.h>
#include <linux/mutex.h>

@@ -360,6 +361,7 @@ nlm_bind_host(struct nlm_host *host)
.authflavor = RPC_AUTH_UNIX,
.flags = (RPC_CLNT_CREATE_NOPING |
RPC_CLNT_CREATE_AUTOBIND),
+ .rpcmount = init_rpc_pipefs,
};

/*
diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
index e0c9189..37e5328 100644
--- a/fs/lockd/mon.c
+++ b/fs/lockd/mon.c
@@ -15,6 +15,7 @@
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/xprtsock.h>
#include <linux/sunrpc/svc.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/lockd/lockd.h>

#include <asm/unaligned.h>
@@ -78,6 +79,7 @@ static struct rpc_clnt *nsm_create(void)
.version = NSM_VERSION,
.authflavor = RPC_AUTH_NULL,
.flags = RPC_CLNT_CREATE_NOPING,
+ .rpcmount = init_rpc_pipefs,
};

return rpc_create(&args);
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 0870d0d..e041f39 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -25,6 +25,7 @@
#include <linux/sunrpc/metrics.h>
#include <linux/sunrpc/xprtsock.h>
#include <linux/sunrpc/xprtrdma.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/nfs_fs.h>
#include <linux/nfs_mount.h>
#include <linux/nfs4_mount.h>
@@ -614,6 +615,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp,
.program = &nfs_program,
.version = clp->rpc_ops->version,
.authflavor = flavor,
+ .rpcmount = init_rpc_pipefs,
};

if (discrtry)
diff --git a/fs/nfs/mount_clnt.c b/fs/nfs/mount_clnt.c
index 4f981f1..67b4b8d 100644
--- a/fs/nfs/mount_clnt.c
+++ b/fs/nfs/mount_clnt.c
@@ -13,6 +13,7 @@
#include <linux/in.h>
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/sched.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/nfs_fs.h>
#include "internal.h"

@@ -161,6 +162,7 @@ int nfs_mount(struct nfs_mount_request *info)
.program = &mnt_program,
.version = info->version,
.authflavor = RPC_AUTH_UNIX,
+ .rpcmount = init_rpc_pipefs,
};
struct rpc_clnt *mnt_clnt;
int status;
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 143da2e..a95150d 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -33,6 +33,7 @@

#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/svc_xprt.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/slab.h>
#include "nfsd.h"
#include "state.h"
@@ -488,6 +489,7 @@ int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *conn)
.version = 0,
.authflavor = clp->cl_flavor,
.flags = (RPC_CLNT_CREATE_NOPING | RPC_CLNT_CREATE_QUIET),
+ .rpcmount = init_rpc_pipefs,
};
struct rpc_clnt *client;

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index a5a55f2..f052712 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -116,6 +116,7 @@ struct rpc_create_args {
unsigned long flags;
char *client_name;
struct svc_xprt *bc_xprt; /* NFSv4.1 backchannel */
+ struct vfsmount *rpcmount;
};

/* Values for "flags" field */
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index da2507a..6d88fb7 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -96,7 +96,8 @@ static void rpc_unregister_client(struct rpc_clnt *clnt)
}

static int
-rpc_setup_pipedir(struct rpc_clnt *clnt, char *dir_name)
+rpc_setup_pipedir(struct rpc_clnt *clnt, struct vfsmount *rpcmount,
+ char *dir_name)
{
static uint32_t clntid;
struct nameidata nd;
@@ -112,7 +113,7 @@ rpc_setup_pipedir(struct rpc_clnt *clnt, char *dir_name)
if (dir_name == NULL)
return 0;

- path.mnt = mntget(init_rpc_pipefs);
+ path.mnt = mntget(rpcmount);
error = vfs_path_lookup(path.mnt->mnt_root, path.mnt, dir_name, 0, &nd);
if (error)
goto err;
@@ -226,7 +227,8 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru

atomic_set(&clnt->cl_count, 1);

- err = rpc_setup_pipedir(clnt, program->pipe_dir_name);
+ BUG_ON(!args->rpcmount);
+ err = rpc_setup_pipedir(clnt, args->rpcmount, program->pipe_dir_name);
if (err < 0)
goto out_no_path;

@@ -390,7 +392,8 @@ rpc_clone_client(struct rpc_clnt *clnt)
goto out_no_principal;
}
atomic_set(&new->cl_count, 1);
- err = rpc_setup_pipedir(new, clnt->cl_program->pipe_dir_name);
+ err = rpc_setup_pipedir(new, clnt->cl_path.mnt,
+ clnt->cl_program->pipe_dir_name);
if (err != 0)
goto out_no_path;
if (new->cl_auth)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index fa6d7ca..8d04380 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -27,6 +27,7 @@
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/sched.h>
#include <linux/sunrpc/xprtsock.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>

#ifdef RPC_DEBUG
# define RPCDBG_FACILITY RPCDBG_BIND
@@ -186,6 +187,7 @@ static int rpcb_create_local(void)
.version = RPCBVERS_2,
.authflavor = RPC_AUTH_UNIX,
.flags = RPC_CLNT_CREATE_NOPING,
+ .rpcmount = init_rpc_pipefs,
};
struct rpc_clnt *clnt, *clnt4;
int result = 0;
@@ -240,6 +242,7 @@ static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
.authflavor = RPC_AUTH_UNIX,
.flags = (RPC_CLNT_CREATE_NOPING |
RPC_CLNT_CREATE_NONPRIVPORT),
+ .rpcmount = init_rpc_pipefs,
};

switch (srvaddr->sa_family) {
--
1.7.3.4


2010-12-30 12:52:48

by Rob Landley

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On 12/30/2010 05:45 AM, Kirill A. Shutemov wrote:
> On Thu, Dec 30, 2010 at 05:05:22AM -0600, Rob Landley wrote:
>> On Thu, Dec 30, 2010 at 4:44 AM, Kirill A. Shutemov<[email protected]> wrote:
>>> On Thu, Dec 30, 2010 at 04:05:07AM -0600, Rob Landley wrote:
>>>> On 12/30/2010 03:44 AM, Kirill A. Shutemov wrote:
>>>>>>> If no rpcmount mountoption, no rpc_pipefs was found at
>>>>>>> '/var/lib/nfs/rpc_pipefs' and we are in init's mount namespace, we use
>>>>>>> init_rpc_pipefs.
>>>>>>
>>>>>> It's the "we are in init's mount namespace" that I was wondering about.
>>>>>>
>>>>>> So if I naively chroot, nfs mount stops working the way it did before I
>>>>>> chrooted unless I do an extra setup step?
>>>>>
>>>>> No. It will work as before, since you are still in init's mount namespace.
>>>>> Creating a new mount namespace changes the rules.
>>>>
>>>> Ah, CLONE_NEWNS and then you need /var/lib/nfs/rpc_pipefs. Got it.
>>>>
>>>> I'm kind of surprised that the kernel cares about a specific path under
>>>> /var/lib. (Seems like policy in the kernel somehow.)
>>>
>>> Yep. It's bad, but there is a way to override the default.
>>>
>>> Another way is to leave the 'rpcmount' mount option without a default.
>>> get_rpc_pipefs(NULL) in init's mount namespace will always return
>>> init_rpc_pipefs, without a filesystem lookup.
>>> get_rpc_pipefs(NULL) in a non-init mount namespace will always return
>>> an error.
>>>
>>> So you will have to specify the 'rpcmount' mount option for every nfs
>>> mount in a container. Hmm, I guess it may confuse users.
>>>
>>> Or we can try to move the default to userspace. /sbin/mount.nfs?
>>
>> /proc/sys/kernel/hotplug exists to tell the kernel where to find the hotplug
>> binary. Once upon a time /sbin/hotplug was the default value, and that was
>> there to override it. (They changed the default to blank (disabled) not due
>> to policy reasons, but due to adding the netlink hotplug notification
>> mechanism and making that the default.)
>>
>> I bring that up to point out that the general consensus about policy in the
>> kernel seems to be "when you really really can't avoid having any, make a
>> sane default the user can override".
>>
>> (Of course adding another entry to the crawling horror of /proc may not
>> be an improvement. But individual overrides at the mount -o level seem
>> like a non-optimal granularity for this...)
>
> Do you propose to implement default as sysctl parameter?

I was pointing out it's been done before.

I'd prefer autodetecting it so new namespaces and the base namespace
don't have magic policy _or_ require different mount invocations. An
ability to change the default for a value is less appealing than not
needing the value in the first place.

And changing the default would probably have to be per-container anyway
to be useful. (Which isn't _quite_ the same as per-namespace since you
can chroot without CLONE_NEWNS.)

(I keep thinking back to web service providers offering cheap web
hosting "with root access" via openvz containers and such. They're
administering their own boxes, but aren't big iron guys. This is yet
another thing for them to understand that didn't apply to the linux box
they have at home, and I'm just wondering if there's a way they don't
have to.)

>>>> Can't it just
>>>> check the current process's mount list to see if an instance of
>>>> rpc_pipefs is mounted in the current namespace the way lxc looks for
>>>> cgroups? Or are there potential performance/scalability issues with that?
>>>
>>> What should we do if we have several rpc_pipefs mounts in the namespace?
>>
>> You mean more than one inside a given process's view of the filesystem, taking
>> into account chroot like /proc/mounts does?
>>
>> Before this patch series, there was one instance systemwide. The patch changed
>> that to look at a fixed location in the filesystem relative to the current
>> chroot. Either way, there was one instance available to a given process doing
>> an nfs mount.
>>
>> What's the use case for having more than one visible to a given process?
>> (NUMA scalability? Some sort of multipath/VPN routing context?)
>
> It's not so obvious to me why we should restrict it. ;)

You can still provide a specific location with "-o rpcmount=/blah",
correct? So this isn't restricting it, this is autodetecting the
default value, using the visible mount point of the appropriate type.

> Currently, there is no association between rpc_pipefs and mount namespace,

There is in that the root context doesn't need to have this mounted, and
new namespaces do. So there's an existing association between a LACK of
a namespace and a different default behavior.

My understanding (correct me if I'm wrong) is that the historical
behavior is that there's only one, and it doesn't actually live anywhere
in the filesystem tree. You're adding a special location. I'm
wondering if there's any way for that location not to be special.

> so I don't see a simple way to restrict the number of rpc_pipefs instances per
> mount namespace. Associating a mount namespace with rpc_pipefs is not a good
> idea, I think.

I'm talking about associating a default rpc_pipefs instance with a
namespace, which it seems to me you're already doing by emulating the
legacy behavior. Before you CLONE_NEWNS you get a magic default mount
that doesn't exist in the tree. After you CLONE_NEWNS you get something
like -EINVAL unless you supply your own default. (I'm actually not sure
why new namespaces don't fall back to the magic global one...)

I'm suggesting that if the user doesn't specify -o rpcmount then the
default could be the first rpc_pipefs mount visible to the current
process context, rather than a specific path. Logic to do that exists
in the proc/self/mounts code (which I'm reading through now...).

(Your 00/12 post doesn't actually explain what can be _different_ about
the various instances of rpc_pipefs, and hence why you'd want to mount
it multiple times. I'm still coming up to speed on the guts of NFS.
The use case I'm trying to fix involves containers with different
network routing than the host, and this looks like potentially part of
the solution to that, but I'm still putting together enough context to
work out how....)

Rob
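
The autodetection Rob proposes (use the first rpc_pipefs mount visible to the calling process when no rpcmount option is given) can be illustrated from user space. The following is only a sketch under stated assumptions: it scans text in the /proc/self/mounts line format rather than walking kernel mount structures, the helper name first_rpc_pipefs() is hypothetical, and it does not decode escaped characters such as \040 in mount points. It is not part of the patch series.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan text in /proc/self/mounts format
 * ("device mountpoint fstype options freq passno", one entry per line)
 * and copy the mount point of the first entry whose filesystem type is
 * "rpc_pipefs" into out. Returns 0 on success, -1 if no such entry
 * exists or the mount point does not fit in out.
 */
static int first_rpc_pipefs(const char *mounts, char *out, size_t outlen)
{
	const char *line = mounts;

	while (*line) {
		char buf[1024], dev[256], dir[256], type[64];
		const char *eol = strchr(line, '\n');
		size_t n = eol ? (size_t)(eol - line) : strlen(line);

		/* Parse one line at a time so short lines cannot borrow
		 * fields from the next entry. Overlong lines are skipped. */
		if (n < sizeof(buf)) {
			memcpy(buf, line, n);
			buf[n] = '\0';
			if (sscanf(buf, "%255s %255s %63s", dev, dir, type) == 3 &&
			    strcmp(type, "rpc_pipefs") == 0) {
				if (strlen(dir) >= outlen)
					return -1;
				strcpy(out, dir);
				return 0;
			}
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```

With several rpc_pipefs mounts visible, this picks whichever is listed first, which is exactly the ambiguity Kirill raises earlier in the thread; an explicit rpcmount option would still be needed to choose a different instance.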

2010-12-29 13:14:34

by Kirill A. Shutemov

Subject: [PATCH v2 10/12] sunrpc: introduce get_rpc_pipefs()

Get rpc_pipefs mount point by path.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
include/linux/sunrpc/rpc_pipe_fs.h | 2 +
net/sunrpc/rpc_pipe.c | 38 ++++++++++++++++++++++++++++++++++++
2 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/include/linux/sunrpc/rpc_pipe_fs.h b/include/linux/sunrpc/rpc_pipe_fs.h
index b09bfa5..922057c 100644
--- a/include/linux/sunrpc/rpc_pipe_fs.h
+++ b/include/linux/sunrpc/rpc_pipe_fs.h
@@ -46,6 +46,8 @@ RPC_I(struct inode *inode)

extern struct vfsmount *init_rpc_pipefs;

+struct vfsmount *get_rpc_pipefs(const char *path);
+
extern int rpc_queue_upcall(struct inode *, struct rpc_pipe_msg *);

struct rpc_clnt;
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index b1e299b..fec6b2d 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -16,6 +16,7 @@
#include <linux/namei.h>
#include <linux/fsnotify.h>
#include <linux/kernel.h>
+#include <linux/nsproxy.h>

#include <asm/ioctls.h>
#include <linux/fs.h>
@@ -931,6 +932,43 @@ static const struct super_operations s_ops = {

#define RPCAUTH_GSSMAGIC 0x67596969

+struct vfsmount *get_rpc_pipefs(const char *p)
+{
+ int error;
+ struct vfsmount *rpcmount;
+ struct path path;
+
+ if (!p) {
+ /* Try to get with default rpcmount mount point */
+ rpcmount = get_rpc_pipefs("/var/lib/nfs/rpc_pipefs");
+
+ /*
+ * If nothing was found at default mount point and init's
+ * mount namespace is in use, use init_rpc_pipefs
+ */
+ if (IS_ERR(rpcmount) && (current->nsproxy->mnt_ns ==
+ init_task.nsproxy->mnt_ns))
+ return mntget(init_rpc_pipefs);
+
+ return rpcmount;
+ }
+
+ error = kern_path(p, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path);
+ if (error)
+ return ERR_PTR(error);
+
+ if (path.mnt->mnt_sb->s_magic != RPCAUTH_GSSMAGIC) {
+ path_put(&path);
+ return ERR_PTR(-EINVAL);
+ }
+
+ rpcmount = mntget(path.mnt);
+ path_put(&path);
+
+ return rpcmount;
+}
+EXPORT_SYMBOL_GPL(get_rpc_pipefs);
+
/*
* We have a single directory with 1 node in it.
*/
--
1.7.3.4
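
The selection rules that get_rpc_pipefs() implements in the patch above can be modelled in user space. This is a simplified sketch, not the kernel code: the model_* names and the flat mount table are assumptions made for illustration, a path matches only an exact mount point (kern_path() would resolve any path inside a mount), and reference counting via mntget()/mntput() is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one mounted filesystem instance. */
struct model_mount {
	const char *path;	/* mount point as the caller sees it */
	unsigned long magic;	/* superblock magic of the filesystem */
};

#define MODEL_RPC_PIPEFS_MAGIC	0x67596969UL	/* RPCAUTH_GSSMAGIC in the patch */
#define MODEL_DEFAULT_PATH	"/var/lib/nfs/rpc_pipefs"

/* Hypothetical description of the calling process's environment. */
struct model_ctx {
	const struct model_mount *mounts;	/* mounts visible to the caller */
	size_t nmounts;
	int in_init_mnt_ns;			/* caller shares init's mount namespace */
	const struct model_mount *init_pipefs;	/* models init_rpc_pipefs */
};

/* Look up path; it must exist and be an rpc_pipefs instance. */
static int model_lookup(const struct model_ctx *ctx, const char *path,
			const struct model_mount **out)
{
	size_t i;

	for (i = 0; i < ctx->nmounts; i++) {
		if (strcmp(ctx->mounts[i].path, path) != 0)
			continue;
		if (ctx->mounts[i].magic != MODEL_RPC_PIPEFS_MAGIC)
			return -EINVAL;	/* exists, but is not rpc_pipefs */
		*out = &ctx->mounts[i];
		return 0;
	}
	return -ENOENT;
}

/*
 * Mirror of the decision order in the patch:
 *  1. an explicit path is looked up and must be rpc_pipefs;
 *  2. with no path, the default location is tried first;
 *  3. if that fails and the caller is in init's mount namespace,
 *     fall back to the kernel-internal instance;
 *  4. otherwise the lookup error is returned.
 */
static int model_get_rpc_pipefs(const struct model_ctx *ctx, const char *path,
				const struct model_mount **out)
{
	int err;

	if (path != NULL)
		return model_lookup(ctx, path, out);

	err = model_lookup(ctx, MODEL_DEFAULT_PATH, out);
	if (err && ctx->in_init_mnt_ns) {
		*out = ctx->init_pipefs;
		return 0;
	}
	return err;
}
```

In this model a caller in a new mount namespace with nothing mounted at the default location gets -ENOENT when no path is given, which corresponds to the case discussed in the cover-letter thread where an nfs mount inside a container needs either its own rpc_pipefs at the default path or an explicit rpcmount option.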


2010-12-29 13:14:34

by Kirill A. Shutemov

Subject: [PATCH v2 07/12] sunrpc: get rpc_pipefs mount point for rpcb_create[_local] from callers

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
include/linux/sunrpc/clnt.h | 4 ++--
net/sunrpc/rpcb_clnt.c | 22 ++++++++++++----------
net/sunrpc/svc.c | 34 +++++++++++++++++++++-------------
3 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index f052712..59eda38 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -135,10 +135,10 @@ void rpc_shutdown_client(struct rpc_clnt *);
void rpc_release_client(struct rpc_clnt *);
void rpc_task_release_client(struct rpc_task *);

-int rpcb_register(u32, u32, int, unsigned short);
+int rpcb_register(u32, u32, int, unsigned short, struct vfsmount *);
int rpcb_v4_register(const u32 program, const u32 version,
const struct sockaddr *address,
- const char *netid);
+ const char *netid, struct vfsmount *rpcmount);
void rpcb_getport_async(struct rpc_task *);

void rpc_call_start(struct rpc_task *);
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 8d04380..867d177 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -27,7 +27,6 @@
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/sched.h>
#include <linux/sunrpc/xprtsock.h>
-#include <linux/sunrpc/rpc_pipe_fs.h>

#ifdef RPC_DEBUG
# define RPCDBG_FACILITY RPCDBG_BIND
@@ -175,7 +174,7 @@ static DEFINE_MUTEX(rpcb_create_local_mutex);
* Returns zero on success, otherwise a negative errno value
* is returned.
*/
-static int rpcb_create_local(void)
+static int rpcb_create_local(struct vfsmount *rpcmount)
{
struct rpc_create_args args = {
.net = &init_net,
@@ -187,7 +186,7 @@ static int rpcb_create_local(void)
.version = RPCBVERS_2,
.authflavor = RPC_AUTH_UNIX,
.flags = RPC_CLNT_CREATE_NOPING,
- .rpcmount = init_rpc_pipefs,
+ .rpcmount = rpcmount,
};
struct rpc_clnt *clnt, *clnt4;
int result = 0;
@@ -229,7 +228,8 @@ out:
}

static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
- size_t salen, int proto, u32 version)
+ size_t salen, int proto, u32 version,
+ struct vfsmount *rpcmount)
{
struct rpc_create_args args = {
.net = &init_net,
@@ -242,7 +242,7 @@ static struct rpc_clnt *rpcb_create(char *hostname, struct sockaddr *srvaddr,
.authflavor = RPC_AUTH_UNIX,
.flags = (RPC_CLNT_CREATE_NOPING |
RPC_CLNT_CREATE_NONPRIVPORT),
- .rpcmount = init_rpc_pipefs,
+ .rpcmount = rpcmount,
};

switch (srvaddr->sa_family) {
@@ -309,7 +309,8 @@ static int rpcb_register_call(struct rpc_clnt *clnt, struct rpc_message *msg)
* IN6ADDR_ANY (ie available for all AF_INET and AF_INET6
* addresses).
*/
-int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port)
+int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port,
+ struct vfsmount *rpcmount)
{
struct rpcbind_args map = {
.r_prog = prog,
@@ -322,7 +323,7 @@ int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port)
};
int error;

- error = rpcb_create_local();
+ error = rpcb_create_local(rpcmount);
if (error)
return error;

@@ -449,7 +450,8 @@ static int rpcb_unregister_all_protofamilies(struct rpc_message *msg)
* advertises the service on all IPv4 and IPv6 addresses.
*/
int rpcb_v4_register(const u32 program, const u32 version,
- const struct sockaddr *address, const char *netid)
+ const struct sockaddr *address, const char *netid,
+ struct vfsmount *rpcmount)
{
struct rpcbind_args map = {
.r_prog = program,
@@ -462,7 +464,7 @@ int rpcb_v4_register(const u32 program, const u32 version,
};
int error;

- error = rpcb_create_local();
+ error = rpcb_create_local(rpcmount);
if (error)
return error;
if (rpcb_local_clnt4 == NULL)
@@ -598,7 +600,7 @@ void rpcb_getport_async(struct rpc_task *task)
task->tk_pid, __func__, bind_version);

rpcb_clnt = rpcb_create(clnt->cl_server, sap, salen, xprt->prot,
- bind_version);
+ bind_version, clnt->cl_path.mnt);
if (IS_ERR(rpcb_clnt)) {
status = PTR_ERR(rpcb_clnt);
dprintk("RPC: %5u %s: rpcb_create failed, error %ld\n",
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 0bd6088..e0ae040 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -743,7 +743,8 @@ EXPORT_SYMBOL_GPL(svc_exit_thread);
*/
static int __svc_rpcb_register4(const u32 program, const u32 version,
const unsigned short protocol,
- const unsigned short port)
+ const unsigned short port,
+ struct vfsmount *rpcmount)
{
const struct sockaddr_in sin = {
.sin_family = AF_INET,
@@ -765,14 +766,16 @@ static int __svc_rpcb_register4(const u32 program, const u32 version,
}

error = rpcb_v4_register(program, version,
- (const struct sockaddr *)&sin, netid);
+ (const struct sockaddr *)&sin, netid,
+ rpcmount);

/*
* User space didn't support rpcbind v4, so retry this
* registration request with the legacy rpcbind v2 protocol.
*/
if (error == -EPROTONOSUPPORT)
- error = rpcb_register(program, version, protocol, port);
+ error = rpcb_register(program, version, protocol, port,
+ rpcmount);

return error;
}
@@ -790,7 +793,8 @@ static int __svc_rpcb_register4(const u32 program, const u32 version,
*/
static int __svc_rpcb_register6(const u32 program, const u32 version,
const unsigned short protocol,
- const unsigned short port)
+ const unsigned short port,
+ struct vfsmount *rpcmount)
{
const struct sockaddr_in6 sin6 = {
.sin6_family = AF_INET6,
@@ -812,7 +816,8 @@ static int __svc_rpcb_register6(const u32 program, const u32 version,
}

error = rpcb_v4_register(program, version,
- (const struct sockaddr *)&sin6, netid);
+ (const struct sockaddr *)&sin6, netid,
+ rpcmount);

/*
* User space didn't support rpcbind version 4, so we won't
@@ -835,19 +840,20 @@ static int __svc_register(const char *progname,
const u32 program, const u32 version,
const int family,
const unsigned short protocol,
- const unsigned short port)
+ const unsigned short port,
+ struct vfsmount *rpcmount)
{
int error = -EAFNOSUPPORT;

switch (family) {
case PF_INET:
error = __svc_rpcb_register4(program, version,
- protocol, port);
+ protocol, port, rpcmount);
break;
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
case PF_INET6:
error = __svc_rpcb_register6(program, version,
- protocol, port);
+ protocol, port, rpcmount);
#endif /* defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) */
}

@@ -893,7 +899,8 @@ int svc_register(const struct svc_serv *serv, const int family,
continue;

error = __svc_register(progp->pg_name, progp->pg_prog,
- i, family, proto, port);
+ i, family, proto, port,
+ serv->sv_rpcmount);
if (error < 0)
break;
}
@@ -910,18 +917,18 @@ int svc_register(const struct svc_serv *serv, const int family,
* in this case to clear all existing entries for [program, version].
*/
static void __svc_unregister(const u32 program, const u32 version,
- const char *progname)
+ const char *progname, struct vfsmount *rpcmount)
{
int error;

- error = rpcb_v4_register(program, version, NULL, "");
+ error = rpcb_v4_register(program, version, NULL, "", rpcmount);

/*
* User space didn't support rpcbind v4, so retry this
* request with the legacy rpcbind v2 protocol.
*/
if (error == -EPROTONOSUPPORT)
- error = rpcb_register(program, version, 0, 0);
+ error = rpcb_register(program, version, 0, 0, rpcmount);

dprintk("svc: %s(%sv%u), error %d\n",
__func__, progname, version, error);
@@ -950,7 +957,8 @@ static void svc_unregister(const struct svc_serv *serv)
if (progp->pg_vers[i]->vs_hidden)
continue;

- __svc_unregister(progp->pg_prog, i, progp->pg_name);
+ __svc_unregister(progp->pg_prog, i, progp->pg_name,
+ serv->sv_rpcmount);
}
}

--
1.7.3.4


2010-12-29 13:14:34

by Kirill A. Shutemov

Subject: [PATCH v2 11/12] nfs: introduce mount option 'rpcmount'

It specifies which rpc_pipefs instance to use. The default is
/var/lib/nfs/rpc_pipefs.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
fs/nfs/callback.c | 6 ++--
fs/nfs/callback.h | 3 +-
fs/nfs/client.c | 46 ++++++++++++++++++++++++++++++++++++--------
fs/nfs/internal.h | 10 +++++++-
fs/nfs/mount_clnt.c | 3 +-
fs/nfs/namespace.c | 3 +-
fs/nfs/nfs4namespace.c | 22 +++++++++++---------
fs/nfs/super.c | 20 +++++++++++++++++++
include/linux/nfs_fs_sb.h | 1 +
9 files changed, 86 insertions(+), 28 deletions(-)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index bef6abd..ef6d206 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -16,7 +16,6 @@
#include <linux/freezer.h>
#include <linux/kthread.h>
#include <linux/sunrpc/svcauth_gss.h>
-#include <linux/sunrpc/rpc_pipe_fs.h>
#if defined(CONFIG_NFS_V4_1)
#include <linux/sunrpc/bc_xprt.h>
#endif
@@ -239,7 +238,8 @@ static inline void nfs_callback_bc_serv(u32 minorversion, struct rpc_xprt *xprt,
/*
* Bring up the callback thread if it is not already up.
*/
-int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt)
+int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt,
+ struct vfsmount *rpcmount)
{
struct svc_serv *serv = NULL;
struct svc_rqst *rqstp;
@@ -254,7 +254,7 @@ int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt)
nfs_callback_bc_serv(minorversion, xprt, cb_info);
goto out;
}
- serv = svc_create(&nfs4_callback_program, init_rpc_pipefs,
+ serv = svc_create(&nfs4_callback_program, rpcmount,
NFS4_CALLBACK_BUFSIZE, NULL);
if (!serv) {
ret = -ENOMEM;
diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index 85a7cfd..ae27385 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -133,7 +133,8 @@ extern __be32 nfs4_callback_getattr(struct cb_getattrargs *args, struct cb_getat
extern __be32 nfs4_callback_recall(struct cb_recallargs *args, void *dummy);

#ifdef CONFIG_NFS_V4
-extern int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt);
+extern int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt,
+ struct vfsmount *rpcmount);
extern void nfs_callback_down(int minorversion);
extern int nfs4_validate_delegation_stateid(struct nfs_delegation *delegation,
const nfs4_stateid *stateid);
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index fbc013d..ccc400a 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -107,6 +107,7 @@ struct nfs_client_initdata {
const struct nfs_rpc_ops *rpc_ops;
int proto;
u32 minorversion;
+ struct vfsmount *rpcmount;
};

/*
@@ -143,6 +144,7 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
clp->cl_rpcclient = ERR_PTR(-EINVAL);

clp->cl_proto = cl_init->proto;
+ clp->cl_rpcmount = mntget(cl_init->rpcmount);

#ifdef CONFIG_NFS_V4
INIT_LIST_HEAD(&clp->cl_delegations);
@@ -231,6 +233,7 @@ static void nfs_free_client(struct nfs_client *clp)
if (clp->cl_machine_cred != NULL)
put_rpccred(clp->cl_machine_cred);

+ mntput(clp->cl_rpcmount);
kfree(clp->cl_hostname);
kfree(clp);

@@ -457,6 +460,9 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
/* Match the full socket address */
if (!nfs_sockaddr_cmp(sap, clap))
continue;
+ /* Match rpc_pipefs mount point */
+ if (clp->cl_rpcmount->mnt_sb != data->rpcmount->mnt_sb)
+ continue;

atomic_inc(&clp->cl_count);
return clp;
@@ -615,7 +621,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp,
.program = &nfs_program,
.version = clp->rpc_ops->version,
.authflavor = flavor,
- .rpcmount = init_rpc_pipefs,
+ .rpcmount = clp->cl_rpcmount,
};

if (discrtry)
@@ -650,7 +656,7 @@ static void nfs_destroy_server(struct nfs_server *server)
/*
* Version 2 or 3 lockd setup
*/
-static int nfs_start_lockd(struct nfs_server *server)
+static int nfs_start_lockd(struct nfs_server *server, struct vfsmount *rpcmount)
{
struct nlm_host *host;
struct nfs_client *clp = server->nfs_client;
@@ -661,7 +667,7 @@ static int nfs_start_lockd(struct nfs_server *server)
.nfs_version = clp->rpc_ops->version,
.noresvport = server->flags & NFS_MOUNT_NORESVPORT ?
1 : 0,
- .rpcmount = init_rpc_pipefs,
+ .rpcmount = rpcmount,
};

if (nlm_init.nfs_version > 3)
@@ -809,8 +815,16 @@ static int nfs_init_server(struct nfs_server *server,
cl_init.rpc_ops = &nfs_v3_clientops;
#endif

+ cl_init.rpcmount = get_rpc_pipefs(data->rpcmount);
+ if (IS_ERR(cl_init.rpcmount)) {
+ dprintk("<-- nfs_init_server() = error %ld\n",
+ PTR_ERR(cl_init.rpcmount));
+ return PTR_ERR(cl_init.rpcmount);
+ }
+
/* Allocate or find a client reference we can use */
clp = nfs_get_client(&cl_init);
+ mntput(cl_init.rpcmount);
if (IS_ERR(clp)) {
dprintk("<-- nfs_init_server() = error %ld\n", PTR_ERR(clp));
return PTR_ERR(clp);
@@ -842,7 +856,7 @@ static int nfs_init_server(struct nfs_server *server,
server->acdirmax = data->acdirmax * HZ;

/* Start lockd here, before we might error out */
- error = nfs_start_lockd(server);
+ error = nfs_start_lockd(server, clp->cl_rpcmount);
if (error < 0)
goto error;

@@ -1144,7 +1158,8 @@ static int nfs4_init_callback(struct nfs_client *clp)
}

error = nfs_callback_up(clp->cl_mvops->minor_version,
- clp->cl_rpcclient->cl_xprt);
+ clp->cl_rpcclient->cl_xprt,
+ clp->cl_rpcmount);
if (error < 0) {
dprintk("%s: failed to start callback. Error = %d\n",
__func__, error);
@@ -1244,7 +1259,8 @@ static int nfs4_set_client(struct nfs_server *server,
const char *ip_addr,
rpc_authflavor_t authflavour,
int proto, const struct rpc_timeout *timeparms,
- u32 minorversion)
+ u32 minorversion,
+ struct vfsmount *rpcmount)
{
struct nfs_client_initdata cl_init = {
.hostname = hostname,
@@ -1253,6 +1269,7 @@ static int nfs4_set_client(struct nfs_server *server,
.rpc_ops = &nfs_v4_clientops,
.proto = proto,
.minorversion = minorversion,
+ .rpcmount = rpcmount,
};
struct nfs_client *clp;
int error;
@@ -1363,6 +1380,7 @@ static int nfs4_init_server(struct nfs_server *server,
const struct nfs_parsed_mount_data *data)
{
struct rpc_timeout timeparms;
+ struct vfsmount *rpcmount;
int error;

dprintk("--> nfs4_init_server()\n");
@@ -1377,6 +1395,11 @@ static int nfs4_init_server(struct nfs_server *server,
server->caps |= NFS_CAP_READDIRPLUS;
server->options = data->options;

+ rpcmount = get_rpc_pipefs(data->rpcmount);
+ if (IS_ERR(rpcmount)) {
+ error = PTR_ERR(rpcmount);
+ goto error;
+ }
/* Get a client record */
error = nfs4_set_client(server,
data->nfs_server.hostname,
@@ -1386,7 +1409,9 @@ static int nfs4_init_server(struct nfs_server *server,
data->auth_flavors[0],
data->nfs_server.protocol,
&timeparms,
- data->minorversion);
+ data->minorversion,
+ rpcmount);
+ mntput(rpcmount);
if (error < 0)
goto error;

@@ -1476,7 +1501,10 @@ struct nfs_server *nfs4_create_referral_server(struct nfs_clone_mount *data,
data->authflavor,
parent_server->client->cl_xprt->prot,
parent_server->client->cl_timeout,
- parent_client->cl_mvops->minor_version);
+ parent_client->cl_mvops->minor_version,
+ parent_client->cl_rpcmount);
+
+
if (error < 0)
goto error;

@@ -1550,7 +1578,7 @@ struct nfs_server *nfs_clone_server(struct nfs_server *source,
(unsigned long long) server->fsid.major,
(unsigned long long) server->fsid.minor);

- error = nfs_start_lockd(server);
+ error = nfs_start_lockd(server, server->nfs_client->cl_rpcmount);
if (error < 0)
goto out_free_server;

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index e6356b7..cb31fd9 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -86,6 +86,7 @@ struct nfs_parsed_mount_data {
unsigned int version;
unsigned int minorversion;
char *fscache_uniq;
+ char *rpcmount;

struct {
struct sockaddr_storage address;
@@ -120,6 +121,7 @@ struct nfs_mount_request {
int noresvport;
unsigned int *auth_flav_len;
rpc_authflavor_t *auth_flavs;
+ struct vfsmount *rpcmount;
};

extern int nfs_mount(struct nfs_mount_request *info);
@@ -160,10 +162,14 @@ static inline void nfs_fs_proc_exit(void)

/* nfs4namespace.c */
#ifdef CONFIG_NFS_V4
-extern struct vfsmount *nfs_do_refmount(const struct vfsmount *mnt_parent, struct dentry *dentry);
+extern struct vfsmount *nfs_do_refmount(const struct vfsmount *mnt_parent,
+ struct dentry *dentry,
+ struct vfsmount *rpcmount);
#else
static inline
-struct vfsmount *nfs_do_refmount(const struct vfsmount *mnt_parent, struct dentry *dentry)
+struct vfsmount *nfs_do_refmount(const struct vfsmount *mnt_parent,
+ struct dentry *dentry,
+ struct vfsmount *rpcmount)
{
return ERR_PTR(-ENOENT);
}
diff --git a/fs/nfs/mount_clnt.c b/fs/nfs/mount_clnt.c
index 67b4b8d..9fd4157 100644
--- a/fs/nfs/mount_clnt.c
+++ b/fs/nfs/mount_clnt.c
@@ -13,7 +13,6 @@
#include <linux/in.h>
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/sched.h>
-#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/nfs_fs.h>
#include "internal.h"

@@ -162,7 +161,7 @@ int nfs_mount(struct nfs_mount_request *info)
.program = &mnt_program,
.version = info->version,
.authflavor = RPC_AUTH_UNIX,
- .rpcmount = init_rpc_pipefs,
+ .rpcmount = info->rpcmount,
};
struct rpc_clnt *mnt_clnt;
int status;
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index db6aa36..d47f6f5 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -135,7 +135,8 @@ static void * nfs_follow_mountpoint(struct dentry *dentry, struct nameidata *nd)
goto out_err;

if (fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL)
- mnt = nfs_do_refmount(nd->path.mnt, nd->path.dentry);
+ mnt = nfs_do_refmount(nd->path.mnt, nd->path.dentry,
+ server->nfs_client->cl_rpcmount);
else
mnt = nfs_do_submount(nd->path.mnt, nd->path.dentry, fh,
fattr);
diff --git a/fs/nfs/nfs4namespace.c b/fs/nfs/nfs4namespace.c
index 7a61fdb..92d5d63 100644
--- a/fs/nfs/nfs4namespace.c
+++ b/fs/nfs/nfs4namespace.c
@@ -14,7 +14,6 @@
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/sunrpc/clnt.h>
-#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/vfs.h>
#include <linux/inet.h>
#include "internal.h"
@@ -99,14 +98,13 @@ static int nfs4_validate_fspath(const struct vfsmount *mnt_parent,
}

static size_t nfs_parse_server_name(char *string, size_t len,
- struct sockaddr *sa, size_t salen)
+ struct sockaddr *sa, size_t salen, struct vfsmount *rpcmount)
{
ssize_t ret;

ret = rpc_pton(string, len, sa, salen);
if (ret == 0) {
- ret = nfs_dns_resolve_name(string, len, sa, salen,
- init_rpc_pipefs);
+ ret = nfs_dns_resolve_name(string, len, sa, salen, rpcmount);
if (ret < 0)
ret = 0;
}
@@ -115,7 +113,8 @@ static size_t nfs_parse_server_name(char *string, size_t len,

static struct vfsmount *try_location(struct nfs_clone_mount *mountdata,
char *page, char *page2,
- const struct nfs4_fs_location *location)
+ const struct nfs4_fs_location *location,
+ struct vfsmount *rpcmount)
{
const size_t addr_bufsize = sizeof(struct sockaddr_storage);
struct vfsmount *mnt = ERR_PTR(-ENOENT);
@@ -143,7 +142,7 @@ static struct vfsmount *try_location(struct nfs_clone_mount *mountdata,
continue;

mountdata->addrlen = nfs_parse_server_name(buf->data, buf->len,
- mountdata->addr, addr_bufsize);
+ mountdata->addr, addr_bufsize, rpcmount);
if (mountdata->addrlen == 0)
continue;

@@ -174,7 +173,8 @@ static struct vfsmount *try_location(struct nfs_clone_mount *mountdata,
*/
static struct vfsmount *nfs_follow_referral(const struct vfsmount *mnt_parent,
const struct dentry *dentry,
- const struct nfs4_fs_locations *locations)
+ const struct nfs4_fs_locations *locations,
+ struct vfsmount *rpcmount)
{
struct vfsmount *mnt = ERR_PTR(-ENOENT);
struct nfs_clone_mount mountdata = {
@@ -213,7 +213,7 @@ static struct vfsmount *nfs_follow_referral(const struct vfsmount *mnt_parent,
location->rootpath.ncomponents == 0)
continue;

- mnt = try_location(&mountdata, page, page2, location);
+ mnt = try_location(&mountdata, page, page2, location, rpcmount);
if (!IS_ERR(mnt))
break;
}
@@ -231,7 +231,9 @@ out:
* @dentry - dentry of referral
*
*/
-struct vfsmount *nfs_do_refmount(const struct vfsmount *mnt_parent, struct dentry *dentry)
+struct vfsmount *nfs_do_refmount(const struct vfsmount *mnt_parent,
+ struct dentry *dentry,
+ struct vfsmount *rpcmount)
{
struct vfsmount *mnt = ERR_PTR(-ENOMEM);
struct dentry *parent;
@@ -264,7 +266,7 @@ struct vfsmount *nfs_do_refmount(const struct vfsmount *mnt_parent, struct dentr
fs_locations->fs_path.ncomponents <= 0)
goto out_free;

- mnt = nfs_follow_referral(mnt_parent, dentry, fs_locations);
+ mnt = nfs_follow_referral(mnt_parent, dentry, fs_locations, rpcmount);
out_free:
__free_page(page);
kfree(fs_locations);
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 4100630..32b7e35 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -35,6 +35,7 @@
#include <linux/sunrpc/metrics.h>
#include <linux/sunrpc/xprtsock.h>
#include <linux/sunrpc/xprtrdma.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/nfs_fs.h>
#include <linux/nfs_mount.h>
#include <linux/nfs4_mount.h>
@@ -106,6 +107,7 @@ enum {
Opt_lookupcache,
Opt_fscache_uniq,
Opt_local_lock,
+ Opt_rpcmount,

/* Special mount options */
Opt_userspace, Opt_deprecated, Opt_sloppy,
@@ -178,6 +180,7 @@ static const match_table_t nfs_mount_option_tokens = {
{ Opt_lookupcache, "lookupcache=%s" },
{ Opt_fscache_uniq, "fsc=%s" },
{ Opt_local_lock, "local_lock=%s" },
+ { Opt_rpcmount, "rpcmount=%s" },

{ Opt_err, NULL }
};
@@ -1484,6 +1487,13 @@ static int nfs_parse_mount_options(char *raw,
return 0;
};
break;
+ case Opt_rpcmount:
+ string = match_strdup(args);
+ if (string == NULL)
+ goto out_nomem;
+ kfree(mnt->rpcmount);
+ mnt->rpcmount = string;
+ break;

/*
* Special options
@@ -1644,11 +1654,19 @@ static int nfs_try_mount(struct nfs_parsed_mount_data *args,
request.salen = args->mount_server.addrlen;
nfs_set_port(request.sap, &args->mount_server.port, 0);

+ request.rpcmount = get_rpc_pipefs(args->rpcmount);
+ if (IS_ERR(request.rpcmount)) {
+ dfprintk(MOUNT, "NFS: unable to get rpc_pipefs mount point, "
+ "error %ld\n", PTR_ERR(request.rpcmount));
+ return PTR_ERR(request.rpcmount);
+ }
+
/*
* Now ask the mount server to map our export path
* to a file handle.
*/
status = nfs_mount(&request);
+ mntput(request.rpcmount);
if (status != 0) {
dfprintk(MOUNT, "NFS: unable to mount server %s, error %d\n",
request.hostname, status);
@@ -2352,6 +2370,7 @@ out:
kfree(data->nfs_server.hostname);
kfree(data->mount_server.hostname);
kfree(data->fscache_uniq);
+ kfree(data->rpcmount);
security_free_mnt_opts(&data->lsm_opts);
out_free_fh:
nfs_free_fhandle(mntfh);
@@ -2947,6 +2966,7 @@ out:
kfree(data->nfs_server.export_path);
kfree(data->nfs_server.hostname);
kfree(data->fscache_uniq);
+ kfree(data->rpcmount);
out_free_data:
kfree(data);
dprintk("<-- nfs4_get_sb() = %d%s\n", error,
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 452d964..ee417c9 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -36,6 +36,7 @@ struct nfs_client {
struct list_head cl_share_link; /* link in global client list */
struct list_head cl_superblocks; /* List of nfs_server structs */

+ struct vfsmount *cl_rpcmount; /* rpc_pipefs mount point */
struct rpc_clnt * cl_rpcclient;
const struct nfs_rpc_ops *rpc_ops; /* NFS protocol vector */
int cl_proto; /* Network transport protocol */
--
1.7.3.4


2010-12-29 13:14:34

by Kirill A. Shutemov

Subject: [PATCH v2 12/12] sunrpc: make rpc_pipefs be mountable multiple times

To support containers, allow multiple independent instances of
rpc_pipefs. Use '-o newinstance' to create a new instance of the
filesystem. The semantics are the same as for devpts.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
net/sunrpc/rpc_pipe.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 79 insertions(+), 1 deletions(-)
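
The option handling described above can be sketched in userspace as
follows. This is only an illustration: the names ending in _sketch are
hypothetical, and the patch itself uses match_token() with a token table
rather than strcmp(). Treating a NULL string as "no options" is an
assumption of the sketch.

```c
#define _DEFAULT_SOURCE		/* for strsep() on glibc */
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Userspace stand-in for the patch's struct rpc_mount_opts. */
struct rpc_mount_opts_sketch {
	int newinstance;
};

/*
 * Walk a comma-separated option string. The string is modified in
 * place by strsep(). Unknown options are rejected with -EINVAL.
 */
int parse_opts_sketch(char *data, struct rpc_mount_opts_sketch *opts)
{
	char *p;

	assert(opts != NULL);
	opts->newinstance = 0;

	while ((p = strsep(&data, ",")) != NULL) {
		if (!*p)
			continue;	/* skip empty fields such as ",," */
		if (strcmp(p, "newinstance") == 0)
			opts->newinstance = 1;
		else
			return -EINVAL;	/* unknown option */
	}
	return 0;
}
```

With this model, "newinstance" sets the flag, an empty string leaves it
clear, and any other token fails the whole parse.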

diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index fec6b2d..7b693db 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -17,6 +17,7 @@
#include <linux/fsnotify.h>
#include <linux/kernel.h>
#include <linux/nsproxy.h>
+#include <linux/parser.h>

#include <asm/ioctls.h>
#include <linux/fs.h>
@@ -39,6 +40,49 @@ static struct kmem_cache *rpc_inode_cachep __read_mostly;

#define RPC_UPCALL_TIMEOUT (30*HZ)

+struct rpc_mount_opts {
+ int newinstance;
+};
+
+enum {
+ Opt_newinstance,
+
+ Opt_err
+};
+
+static const match_table_t tokens = {
+ {Opt_newinstance, "newinstance"},
+
+ {Opt_err, NULL}
+};
+
+static int
+parse_mount_options(char *data, struct rpc_mount_opts *opts)
+{
+ char *p;
+
+ opts->newinstance = 0;
+
+ while ((p = strsep(&data, ",")) != NULL) {
+ substring_t args[MAX_OPT_ARGS];
+ int token;
+
+ if (!*p)
+ continue;
+
+ token = match_token(p, tokens, args);
+ switch (token) {
+ case Opt_newinstance:
+ opts->newinstance = 1;
+ break;
+ default:
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
static void rpc_purge_list(struct rpc_inode *rpci, struct list_head *head,
void (*destroy_msg)(struct rpc_pipe_msg *), int err)
{
@@ -1039,11 +1083,45 @@ rpc_fill_super(struct super_block *sb, void *data, int silent)
return 0;
}

+static int
+compare_rpc_mnt_sb(struct super_block *s, void *p)
+{
+ if (init_rpc_pipefs)
+ return init_rpc_pipefs->mnt_sb == s;
+ return 0;
+}
+
static struct dentry *
rpc_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
- return mount_single(fs_type, flags, data, rpc_fill_super);
+ int error;
+ struct rpc_mount_opts opts;
+ struct super_block *s;
+
+ error = parse_mount_options(data, &opts);
+ if (error)
+ return ERR_PTR(error);
+
+ if (opts.newinstance)
+ s = sget(fs_type, NULL, set_anon_super, NULL);
+ else
+ s = sget(fs_type, compare_rpc_mnt_sb, set_anon_super, NULL);
+
+ if (IS_ERR(s))
+ return ERR_CAST(s);
+
+ if (!s->s_root) {
+ s->s_flags = flags;
+ error = rpc_fill_super(s, data, flags & MS_SILENT ? 1 : 0);
+ if (error) {
+ deactivate_locked_super(s);
+ return ERR_PTR(error);
+ }
+ s->s_flags |= MS_ACTIVE;
+ }
+
+ return dget(s->s_root);
}

static struct file_system_type rpc_pipe_fs_type = {
--
1.7.3.4


2010-12-29 13:14:32

by Kirill A. Shutemov

Subject: [PATCH v2 02/12] sunrpc: introduce init_rpc_pipefs

Introduce global variable init_rpc_pipefs and use it instead of
rpc_get_mount()/rpc_put_mount().

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
fs/nfs/cache_lib.c | 6 +++---
include/linux/sunrpc/rpc_pipe_fs.h | 4 ++--
net/sunrpc/clnt.c | 10 ++++------
net/sunrpc/rpc_pipe.c | 23 ++++++-----------------
4 files changed, 15 insertions(+), 28 deletions(-)
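
A rough userspace model of the reference pairing this change relies on:
each user takes its own reference with a mntget()-like call and later
drops the reference on the mount it actually holds. All names here
(mount_sketch, client_sketch and so on) are hypothetical, and a plain int
stands in for the kernel's mount refcount.

```c
#include <assert.h>
#include <stddef.h>

struct mount_sketch {
	int refs;			/* stand-in for the mount refcount */
};

/* Like mntget(): take a reference and hand the same pointer back. */
struct mount_sketch *mntget_sketch(struct mount_sketch *m)
{
	if (m)
		m->refs++;
	return m;
}

/* Like mntput(): drop one reference. */
void mntput_sketch(struct mount_sketch *m)
{
	if (m) {
		assert(m->refs > 0);
		m->refs--;
	}
}

struct client_sketch {
	struct mount_sketch *mnt;	/* the mount this client pinned */
};

/* Pin the mount; on a failed lookup, drop the reference again. */
int client_setup(struct client_sketch *c, struct mount_sketch *m,
		 int lookup_fails)
{
	c->mnt = mntget_sketch(m);
	if (lookup_fails) {
		mntput_sketch(c->mnt);
		c->mnt = NULL;
		return -1;
	}
	return 0;
}

/* Release whatever mount the client holds, not a global. */
void client_free(struct client_sketch *c)
{
	mntput_sketch(c->mnt);
	c->mnt = NULL;
}
```

Dropping the reference through the client's own pointer is what keeps
the accounting correct once more than one mount can exist.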

diff --git a/fs/nfs/cache_lib.c b/fs/nfs/cache_lib.c
index 8469031..dd7ca5f 100644
--- a/fs/nfs/cache_lib.c
+++ b/fs/nfs/cache_lib.c
@@ -117,7 +117,7 @@ int nfs_cache_register(struct cache_detail *cd)
struct vfsmount *mnt;
int ret;

- mnt = rpc_get_mount();
+ mnt = mntget(init_rpc_pipefs);
if (IS_ERR(mnt))
return PTR_ERR(mnt);
ret = vfs_path_lookup(mnt->mnt_root, mnt, "/cache", 0, &nd);
@@ -129,13 +129,13 @@ int nfs_cache_register(struct cache_detail *cd)
if (!ret)
return ret;
err:
- rpc_put_mount();
+ mntput(mnt);
return ret;
}

void nfs_cache_unregister(struct cache_detail *cd)
{
sunrpc_cache_unregister_pipefs(cd);
- rpc_put_mount();
+ mntput(init_rpc_pipefs);
}

diff --git a/include/linux/sunrpc/rpc_pipe_fs.h b/include/linux/sunrpc/rpc_pipe_fs.h
index cf14db9..b09bfa5 100644
--- a/include/linux/sunrpc/rpc_pipe_fs.h
+++ b/include/linux/sunrpc/rpc_pipe_fs.h
@@ -44,6 +44,8 @@ RPC_I(struct inode *inode)
return container_of(inode, struct rpc_inode, vfs_inode);
}

+extern struct vfsmount *init_rpc_pipefs;
+
extern int rpc_queue_upcall(struct inode *, struct rpc_pipe_msg *);

struct rpc_clnt;
@@ -60,8 +62,6 @@ extern void rpc_remove_cache_dir(struct dentry *);
extern struct dentry *rpc_mkpipe(struct dentry *, const char *, void *,
const struct rpc_pipe_ops *, int flags);
extern int rpc_unlink(struct dentry *);
-extern struct vfsmount *rpc_get_mount(void);
-extern void rpc_put_mount(void);
extern int register_rpc_pipefs(void);
extern void unregister_rpc_pipefs(void);

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 92ce94f..da2507a 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -112,9 +112,7 @@ rpc_setup_pipedir(struct rpc_clnt *clnt, char *dir_name)
if (dir_name == NULL)
return 0;

- path.mnt = rpc_get_mount();
- if (IS_ERR(path.mnt))
- return PTR_ERR(path.mnt);
+ path.mnt = mntget(init_rpc_pipefs);
error = vfs_path_lookup(path.mnt->mnt_root, path.mnt, dir_name, 0, &nd);
if (error)
goto err;
@@ -140,7 +138,7 @@ rpc_setup_pipedir(struct rpc_clnt *clnt, char *dir_name)
err_path_put:
path_put(&nd.path);
err:
- rpc_put_mount();
+ mntput(path.mnt);
return error;
}

@@ -251,7 +249,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, stru
out_no_auth:
if (!IS_ERR(clnt->cl_path.dentry)) {
rpc_remove_client_dir(clnt->cl_path.dentry);
- rpc_put_mount();
+ mntput(clnt->cl_path.mnt);
}
out_no_path:
kfree(clnt->cl_principal);
@@ -472,7 +470,7 @@ rpc_free_client(struct rpc_clnt *clnt)
clnt->cl_protname, clnt->cl_server);
if (!IS_ERR(clnt->cl_path.dentry)) {
rpc_remove_client_dir(clnt->cl_path.dentry);
- rpc_put_mount();
+ mntput(clnt->cl_path.mnt);
}
if (clnt->cl_parent != clnt) {
rpc_release_client(clnt->cl_parent);
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index 7f3fbdd..b1e299b 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -28,9 +28,10 @@
#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/sunrpc/cache.h>

-static struct vfsmount *rpc_mnt __read_mostly;
+struct vfsmount *init_rpc_pipefs __read_mostly;

static struct file_system_type rpc_pipe_fs_type;
+EXPORT_SYMBOL_GPL(init_rpc_pipefs);


static struct kmem_cache *rpc_inode_cachep __read_mostly;
@@ -412,18 +413,6 @@ struct rpc_filelist {
umode_t mode;
};

-struct vfsmount *rpc_get_mount(void)
-{
- return mntget(rpc_mnt);
-}
-EXPORT_SYMBOL_GPL(rpc_get_mount);
-
-void rpc_put_mount(void)
-{
- mntput(rpc_mnt);
-}
-EXPORT_SYMBOL_GPL(rpc_put_mount);
-
static int rpc_delete_dentry(struct dentry *dentry)
{
return 1;
@@ -1060,9 +1049,9 @@ int register_rpc_pipefs(void)
if (err)
goto destroy_cache;

- rpc_mnt = kern_mount(&rpc_pipe_fs_type);
- if (IS_ERR(rpc_mnt)) {
- err = PTR_ERR(rpc_mnt);
+ init_rpc_pipefs = kern_mount(&rpc_pipe_fs_type);
+ if (IS_ERR(init_rpc_pipefs)) {
+ err = PTR_ERR(init_rpc_pipefs);
goto unregister_fs;
}

@@ -1077,7 +1066,7 @@ destroy_cache:

void unregister_rpc_pipefs(void)
{
- mntput(rpc_mnt);
+ mntput(init_rpc_pipefs);
kmem_cache_destroy(rpc_inode_cachep);
unregister_filesystem(&rpc_pipe_fs_type);
}
--
1.7.3.4


2010-12-29 13:14:34

by Kirill A. Shutemov

Subject: [PATCH v2 05/12] sunrpc: get rpc_pipefs mount point for svc_serv from callers

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
fs/lockd/svc.c | 4 +++-
fs/nfs/callback.c | 4 +++-
fs/nfsd/nfssvc.c | 6 ++++--
include/linux/sunrpc/svc.h | 8 ++++----
net/sunrpc/svc.c | 18 +++++++++---------
5 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index abfff9d..32310b1 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -31,6 +31,7 @@
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/svc.h>
#include <linux/sunrpc/svcsock.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
#include <net/ip.h>
#include <linux/lockd/lockd.h>
#include <linux/nfs.h>
@@ -269,7 +270,8 @@ int lockd_up(void)
"lockd_up: no pid, %d users??\n", nlmsvc_users);

error = -ENOMEM;
- serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, NULL);
+ serv = svc_create(&nlmsvc_program, init_rpc_pipefs, LOCKD_BUFSIZE,
+ NULL);
if (!serv) {
printk(KERN_WARNING "lockd_up: create service failed\n");
goto out;
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 93a8b3b..bef6abd 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -16,6 +16,7 @@
#include <linux/freezer.h>
#include <linux/kthread.h>
#include <linux/sunrpc/svcauth_gss.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
#if defined(CONFIG_NFS_V4_1)
#include <linux/sunrpc/bc_xprt.h>
#endif
@@ -253,7 +254,8 @@ int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt)
nfs_callback_bc_serv(minorversion, xprt, cb_info);
goto out;
}
- serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, NULL);
+ serv = svc_create(&nfs4_callback_program, init_rpc_pipefs,
+ NFS4_CALLBACK_BUFSIZE, NULL);
if (!serv) {
ret = -ENOMEM;
goto out_err;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 2bae1d8..d96c32b 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -13,6 +13,7 @@

#include <linux/sunrpc/stats.h>
#include <linux/sunrpc/svcsock.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/lockd/bind.h>
#include <linux/nfsacl.h>
#include <linux/seq_file.h>
@@ -331,8 +332,9 @@ int nfsd_create_serv(void)
}
nfsd_reset_versions();

- nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize,
- nfsd_last_thread, nfsd, THIS_MODULE);
+ nfsd_serv = svc_create_pooled(&nfsd_program, init_rpc_pipefs,
+ nfsd_max_blksize, nfsd_last_thread, nfsd,
+ THIS_MODULE);
if (nfsd_serv == NULL)
return -ENOMEM;

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 3b6b26c..7f09411 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -400,13 +400,13 @@ struct svc_procedure {
/*
* Function prototypes.
*/
-struct svc_serv *svc_create(struct svc_program *, unsigned int,
- void (*shutdown)(struct svc_serv *));
+struct svc_serv *svc_create(struct svc_program *, struct vfsmount *,
+ unsigned int, void (*shutdown)(struct svc_serv *));
struct svc_rqst *svc_prepare_thread(struct svc_serv *serv,
struct svc_pool *pool);
void svc_exit_thread(struct svc_rqst *);
-struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int,
- void (*shutdown)(struct svc_serv *),
+struct svc_serv * svc_create_pooled(struct svc_program *, struct vfsmount *,
+ unsigned int, void (*shutdown)(struct svc_serv *),
svc_thread_fn, struct module *);
int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
int svc_pool_stats_open(struct svc_serv *serv, struct file *file);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index d2f7c03..0bd6088 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -28,7 +28,6 @@
#include <linux/sunrpc/svcsock.h>
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/bc_xprt.h>
-#include <linux/sunrpc/rpc_pipe_fs.h>

#define RPCDBG_FACILITY RPCDBG_SVCDSP

@@ -361,7 +360,8 @@ svc_pool_for_cpu(struct svc_serv *serv, int cpu)
* Create an RPC service
*/
static struct svc_serv *
-__svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
+__svc_create(struct svc_program *prog, struct vfsmount *rpcmount,
+ unsigned int bufsize, int npools,
void (*shutdown)(struct svc_serv *serv))
{
struct svc_serv *serv;
@@ -373,7 +373,7 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
return NULL;
serv->sv_name = prog->pg_name;
serv->sv_program = prog;
- serv->sv_rpcmount = mntget(init_rpc_pipefs);
+ serv->sv_rpcmount = mntget(rpcmount);
serv->sv_nrthreads = 1;
serv->sv_stats = prog->pg_stats;
if (bufsize > RPCSVC_MAXPAYLOAD)
@@ -429,22 +429,22 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
}

struct svc_serv *
-svc_create(struct svc_program *prog, unsigned int bufsize,
- void (*shutdown)(struct svc_serv *serv))
+svc_create(struct svc_program *prog, struct vfsmount *rpcmount,
+ unsigned int bufsize, void (*shutdown)(struct svc_serv *serv))
{
- return __svc_create(prog, bufsize, /*npools*/1, shutdown);
+ return __svc_create(prog, rpcmount, bufsize, /*npools*/1, shutdown);
}
EXPORT_SYMBOL_GPL(svc_create);

struct svc_serv *
-svc_create_pooled(struct svc_program *prog, unsigned int bufsize,
- void (*shutdown)(struct svc_serv *serv),
+svc_create_pooled(struct svc_program *prog, struct vfsmount *rpcmount,
+ unsigned int bufsize, void (*shutdown)(struct svc_serv *serv),
svc_thread_fn func, struct module *mod)
{
struct svc_serv *serv;
unsigned int npools = svc_pool_map_get();

- serv = __svc_create(prog, bufsize, npools, shutdown);
+ serv = __svc_create(prog, rpcmount, bufsize, npools, shutdown);

if (serv != NULL) {
serv->sv_function = func;
--
1.7.3.4


2010-12-30 11:05:24

by Rob Landley

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Thu, Dec 30, 2010 at 4:44 AM, Kirill A. Shutemov <[email protected]> wrote:
> On Thu, Dec 30, 2010 at 04:05:07AM -0600, Rob Landley wrote:
>> On 12/30/2010 03:44 AM, Kirill A. Shutemov wrote:
>> >>> If no rpcmount mountoption, no rpc_pipefs was found at
>> >>> '/var/lib/nfs/rpc_pipefs' and we are in init's mount namespace, we use
>> >>> init_rpc_pipefs.
>> >>
>> >> It's the "we are in init's mount namespace" that I was wondering about.
>> >>
>> >> So if I naively chroot, nfs mount stops working the way it did before I
>> >> chrooted unless I do an extra setup step?
>> >
>> > No. It will work as before since you are still in init's mount namespace.
>> > Creating new mount namespace changes rules.
>>
>> Ah, CLONE_NEWNS and then you need /var/lib/nfs/rpc_pipefs. Got it.
>>
>> I'm kind of surprised that the kernel cares about a specific path under
>> /var/lib. (Seems like policy in the kernel somehow.)
>
> Yep. It's bad, but there is a way to override the default.
>
> Another way is to leave the 'rpcmount' mount option without a default.
> get_rpc_pipefs(NULL) in init's mount namespace will always return
> init_rpc_pipefs, without a filesystem lookup.
> get_rpc_pipefs(NULL) in a non-init mount namespace will always return
> an error.
>
> So you will have to specify the 'rpcmount' mount option for every nfs mount
> in a container. Hmm, I guess it may confuse users.
>
> Or we can try to move the default to userspace. /sbin/mount.nfs?

/proc/sys/kernel/hotplug exists to tell the kernel where to find the hotplug
binary. Once upon a time /sbin/hotplug was the default value, and the /proc
entry was there to override it. (They changed the default to blank, i.e.
disabled, not for policy reasons but because the netlink hotplug notification
mechanism was added and made the default.)

I bring that up to point out that the general consensus about policy in the
kernel seems to be "when you really really can't avoid having any, make a
sane default the user can override".

(Of course adding another entry to the crawling horror of /proc may not
be an improvement. But individual overrides at the mount -o level seem
like a non-optimal granularity for this...)

>> Can't it just
>> check the current process's mount list to see if an instance of
>> rpc_pipefs is mounted in the current namespace the way lxc looks for
>> cgroups? Or are there potential performance/scalability issues with that?
>
> What should we do if we have several rpc_pipefs mounts in the namespace?

You mean more than one inside a given process's view of the filesystem, taking
into account chroot like /proc/mounts does?

Before this patch series, there was one instance systemwide. The patch changed
that to look at a fixed location in the filesystem relative to the current
chroot. Either way, there was one instance available to a given process doing
an nfs mount.

What's the use case for having more than one visible to a given process?
(NUMA scalability? Some sort of multipath/VPN routing context?)

Rob
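
For reference, the selection rules quoted in this message (explicit
'rpcmount' option first, then the default path, then init_rpc_pipefs only
in init's mount namespace) can be sketched as a small decision function.
This is an assumption-laden reading of the quoted text, not the patch's
get_rpc_pipefs() implementation; the struct, enum and function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum pipefs_choice {
	PIPEFS_FROM_OPTION,	/* look up the path given via rpcmount= */
	PIPEFS_FROM_DEFAULT,	/* instance found at the default path */
	PIPEFS_INIT,		/* fall back to init_rpc_pipefs */
	PIPEFS_ERROR		/* nothing usable */
};

struct pipefs_query {
	const char *rpcmount_opt;	/* NULL if no rpcmount= was given */
	int found_at_default;		/* rpc_pipefs at the default path? */
	int in_init_mnt_ns;		/* caller in init's mount namespace? */
};

/* Apply the rules in the order described in the thread. */
enum pipefs_choice choose_pipefs(const struct pipefs_query *q)
{
	assert(q != NULL);

	if (q->rpcmount_opt != NULL)
		return PIPEFS_FROM_OPTION;
	if (q->found_at_default)
		return PIPEFS_FROM_DEFAULT;
	if (q->in_init_mnt_ns)
		return PIPEFS_INIT;
	return PIPEFS_ERROR;
}
```

Under this reading, a plain chroot in init's mount namespace still ends
up at PIPEFS_INIT, while a new mount namespace without a mounted
rpc_pipefs and without the option gets an error.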

2010-12-29 13:14:32

by Kirill A. Shutemov

Subject: [PATCH v2 01/12] sunrpc: mount rpc_pipefs on initialization

Mount rpc_pipefs on register_rpc_pipefs() and replace
rpc_get_mount()/rpc_put_mount() implementation with mntget()/mntput().

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
net/sunrpc/rpc_pipe.c | 27 ++++++++++++++++-----------
1 files changed, 16 insertions(+), 11 deletions(-)
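
The error handling in register_rpc_pipefs() moves to goto-based
unwinding, where each label undoes one earlier step in reverse order. A
minimal userspace sketch of that pattern, with a hypothetical reg_state
structure and fault injection standing in for the real cache, filesystem
and mount steps:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Each bit marks one resource that is currently held. */
enum { HAVE_CACHE = 1, HAVE_FS = 2, HAVE_MOUNT = 4 };

struct reg_state {
	int live;	/* bitmask of resources currently held */
	int fail_at;	/* HAVE_* step to fail at, 0 for none */
};

/* Acquire one resource, or fail if fault injection asks for it. */
static int step(struct reg_state *st, int what)
{
	if (st->fail_at == what)
		return -ENOMEM;
	st->live |= what;
	return 0;
}

int register_sketch(struct reg_state *st)
{
	int err;

	assert(st != NULL);

	err = step(st, HAVE_CACHE);
	if (err)
		return err;
	err = step(st, HAVE_FS);
	if (err)
		goto destroy_cache;
	err = step(st, HAVE_MOUNT);
	if (err)
		goto unregister_fs;
	return 0;

unregister_fs:
	st->live &= ~HAVE_FS;		/* undo the filesystem step */
destroy_cache:
	st->live &= ~HAVE_CACHE;	/* undo the cache step */
	return err;
}
```

A failure at any step leaves nothing held, which is the property the
labels are there to guarantee.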

diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index 10a17a3..7f3fbdd 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -29,7 +29,6 @@
#include <linux/sunrpc/cache.h>

static struct vfsmount *rpc_mnt __read_mostly;
-static int rpc_mount_count;

static struct file_system_type rpc_pipe_fs_type;

@@ -415,18 +414,13 @@ struct rpc_filelist {

struct vfsmount *rpc_get_mount(void)
{
- int err;
-
- err = simple_pin_fs(&rpc_pipe_fs_type, &rpc_mnt, &rpc_mount_count);
- if (err != 0)
- return ERR_PTR(err);
- return rpc_mnt;
+ return mntget(rpc_mnt);
}
EXPORT_SYMBOL_GPL(rpc_get_mount);

void rpc_put_mount(void)
{
- simple_release_fs(&rpc_mnt, &rpc_mount_count);
+ mntput(rpc_mnt);
}
EXPORT_SYMBOL_GPL(rpc_put_mount);

@@ -1063,16 +1057,27 @@ int register_rpc_pipefs(void)
if (!rpc_inode_cachep)
return -ENOMEM;
err = register_filesystem(&rpc_pipe_fs_type);
- if (err) {
- kmem_cache_destroy(rpc_inode_cachep);
- return err;
+ if (err)
+ goto destroy_cache;
+
+ rpc_mnt = kern_mount(&rpc_pipe_fs_type);
+ if (IS_ERR(rpc_mnt)) {
+ err = PTR_ERR(rpc_mnt);
+ goto unregister_fs;
}

return 0;
+
+unregister_fs:
+ unregister_filesystem(&rpc_pipe_fs_type);
+destroy_cache:
+ kmem_cache_destroy(rpc_inode_cachep);
+ return err;
}

void unregister_rpc_pipefs(void)
{
+ mntput(rpc_mnt);
kmem_cache_destroy(rpc_inode_cachep);
unregister_filesystem(&rpc_pipe_fs_type);
}
--
1.7.3.4


2010-12-30 02:13:51

by Rob Landley

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Wed, Dec 29, 2010 at 7:14 AM, Kirill A. Shutemov <[email protected]> wrote:
>
> Prepare nfs/sunrpc stack to use multiple instances of rpc_pipefs.
> Only for client for now.

What would a test case for this look like? (Is there some way to tell
an nfs mount to use a specific instance of rpc_pipefs or something?)

Rob

2010-12-29 13:14:34

by Kirill A. Shutemov

Subject: [PATCH v2 09/12] nfs: per-rpc_pipefs dns cache

Lazy initialization of the dns cache: it is set up on the first call to
nfs_dns_resolve_name(). Every rpc_pipefs instance now has a separate dns cache.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
fs/nfs/cache_lib.c | 17 ++-----
fs/nfs/cache_lib.h | 3 +-
fs/nfs/dns_resolve.c | 128 ++++++++++++++++++++++++++++++++++++++----------
fs/nfs/dns_resolve.h | 8 +---
fs/nfs/inode.c | 8 +---
fs/nfs/nfs4namespace.c | 4 +-
6 files changed, 113 insertions(+), 55 deletions(-)
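
The lazy, per-instance lookup can be sketched in userspace as a
find-or-create over a list keyed by the identity of the rpc_pipefs
superblock. The types and names below are hypothetical, an opaque pointer
stands in for mnt_sb, and locking is left out entirely; the patch guards
its list with a spinlock.

```c
#include <assert.h>
#include <stdlib.h>

struct cache_entry_sketch {
	struct cache_entry_sketch *next;
	const void *sb_key;	/* identity of the rpc_pipefs instance */
	int id;			/* placeholder for the cache itself */
};

struct cache_list_sketch {
	struct cache_entry_sketch *head;
	int created;		/* how many caches have been created */
};

/* Return the cache for sb_key, creating it on first use. */
struct cache_entry_sketch *cache_get(struct cache_list_sketch *l,
				     const void *sb_key)
{
	struct cache_entry_sketch *e;

	assert(l != NULL);

	for (e = l->head; e != NULL; e = e->next)
		if (e->sb_key == sb_key)
			return e;	/* already set up for this instance */

	e = calloc(1, sizeof(*e));	/* zeroed, so no stale fields */
	if (e == NULL)
		return NULL;
	e->sb_key = sb_key;
	e->id = ++l->created;
	e->next = l->head;
	l->head = e;
	return e;
}
```

Two lookups with the same key return the same entry, while a different
key gets its own cache, which mirrors "every rpc_pipefs has a separate
dns cache".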

diff --git a/fs/nfs/cache_lib.c b/fs/nfs/cache_lib.c
index 0944d4e..9b99d9e 100644
--- a/fs/nfs/cache_lib.c
+++ b/fs/nfs/cache_lib.c
@@ -12,7 +12,6 @@
#include <linux/namei.h>
#include <linux/slab.h>
#include <linux/sunrpc/cache.h>
-#include <linux/sunrpc/rpc_pipe_fs.h>

#include "cache_lib.h"

@@ -111,25 +110,17 @@ int nfs_cache_wait_for_upcall(struct nfs_cache_defer_req *dreq)
return 0;
}

-int nfs_cache_register(struct cache_detail *cd)
+int nfs_cache_register(struct cache_detail *cd, struct vfsmount *rpcmount)
{
struct nameidata nd;
- struct vfsmount *mnt;
int ret;

- mnt = mntget(init_rpc_pipefs);
- if (IS_ERR(mnt))
- return PTR_ERR(mnt);
- ret = vfs_path_lookup(mnt->mnt_root, mnt, "/cache", 0, &nd);
+ ret = vfs_path_lookup(rpcmount->mnt_root, rpcmount, "/cache", 0, &nd);
if (ret)
- goto err;
- ret = sunrpc_cache_register_pipefs(mnt, nd.path.dentry,
+ return ret;
+ ret = sunrpc_cache_register_pipefs(rpcmount, nd.path.dentry,
cd->name, 0600, cd);
path_put(&nd.path);
- if (!ret)
- return ret;
-err:
- mntput(mnt);
return ret;
}

diff --git a/fs/nfs/cache_lib.h b/fs/nfs/cache_lib.h
index 76f856e..1d4a0a5 100644
--- a/fs/nfs/cache_lib.h
+++ b/fs/nfs/cache_lib.h
@@ -23,5 +23,6 @@ extern struct nfs_cache_defer_req *nfs_cache_defer_req_alloc(void);
extern void nfs_cache_defer_req_put(struct nfs_cache_defer_req *dreq);
extern int nfs_cache_wait_for_upcall(struct nfs_cache_defer_req *dreq);

-extern int nfs_cache_register(struct cache_detail *cd);
+extern int nfs_cache_register(struct cache_detail *cd,
+ struct vfsmount *rpcmount);
extern void nfs_cache_unregister(struct cache_detail *cd);
diff --git a/fs/nfs/dns_resolve.c b/fs/nfs/dns_resolve.c
index a6e711a..4b35323 100644
--- a/fs/nfs/dns_resolve.c
+++ b/fs/nfs/dns_resolve.c
@@ -12,7 +12,7 @@
#include <linux/dns_resolver.h>

ssize_t nfs_dns_resolve_name(char *name, size_t namelen,
- struct sockaddr *sa, size_t salen)
+ struct sockaddr *sa, size_t salen, struct vfsmount *rpcmount)
{
ssize_t ret;
char *ip_addr = NULL;
@@ -37,6 +37,7 @@ ssize_t nfs_dns_resolve_name(char *name, size_t namelen,
#include <linux/socket.h>
#include <linux/seq_file.h>
#include <linux/inet.h>
+#include <linux/mount.h>
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/cache.h>
#include <linux/sunrpc/svcauth.h>
@@ -47,7 +48,13 @@ ssize_t nfs_dns_resolve_name(char *name, size_t namelen,
#define NFS_DNS_HASHBITS 4
#define NFS_DNS_HASHTBL_SIZE (1 << NFS_DNS_HASHBITS)

-static struct cache_head *nfs_dns_table[NFS_DNS_HASHTBL_SIZE];
+static DEFINE_SPINLOCK(nfs_dns_resolve_lock);
+static LIST_HEAD(nfs_dns_resolve_list);
+
+struct nfs_dns_resolve_list {
+ struct list_head list;
+ struct cache_detail *cd;
+};

struct nfs_dns_ent {
struct cache_head h;
@@ -259,21 +266,6 @@ out:
return ret;
}

-static struct cache_detail nfs_dns_resolve = {
- .owner = THIS_MODULE,
- .hash_size = NFS_DNS_HASHTBL_SIZE,
- .hash_table = nfs_dns_table,
- .name = "dns_resolve",
- .cache_put = nfs_dns_ent_put,
- .cache_upcall = nfs_dns_upcall,
- .cache_parse = nfs_dns_parse,
- .cache_show = nfs_dns_show,
- .match = nfs_dns_match,
- .init = nfs_dns_ent_init,
- .update = nfs_dns_ent_update,
- .alloc = nfs_dns_ent_alloc,
-};
-
static int do_cache_lookup(struct cache_detail *cd,
struct nfs_dns_ent *key,
struct nfs_dns_ent **item,
@@ -336,37 +328,121 @@ out:
return ret;
}

+static struct cache_detail *nfs_alloc_dns_resolve(void)
+{
+ struct cache_detail *dns_resolve;
+ struct cache_head **hash_table;
+
+ dns_resolve = kmalloc(sizeof(*dns_resolve), GFP_KERNEL);
+ if (!dns_resolve)
+ return NULL;
+
+ hash_table = kmalloc(sizeof(*hash_table) * NFS_DNS_HASHTBL_SIZE,
+ GFP_KERNEL);
+ if (!hash_table) {
+ kfree(dns_resolve);
+ return NULL;
+ }
+
+ dns_resolve->owner = THIS_MODULE;
+ dns_resolve->hash_size = NFS_DNS_HASHTBL_SIZE;
+ dns_resolve->hash_table = hash_table;
+ dns_resolve->name = "dns_resolve";
+ dns_resolve->cache_put = nfs_dns_ent_put;
+ dns_resolve->cache_upcall = nfs_dns_upcall;
+ dns_resolve->cache_parse = nfs_dns_parse;
+ dns_resolve->cache_show = nfs_dns_show;
+ dns_resolve->match = nfs_dns_match;
+ dns_resolve->init = nfs_dns_ent_init;
+ dns_resolve->update = nfs_dns_ent_update;
+ dns_resolve->alloc = nfs_dns_ent_alloc;
+
+ return dns_resolve;
+}
+
+static void nfs_free_dns_resolve(struct cache_detail *dns_resolve)
+{
+ kfree(dns_resolve->hash_table);
+ kfree(dns_resolve);
+}
+
+static struct cache_detail *nfs_get_dns_resolve(struct vfsmount *rpcmount)
+{
+ struct nfs_dns_resolve_list *dns_resolve;
+ int error = 0;
+
+ spin_lock(&nfs_dns_resolve_lock);
+ list_for_each_entry(dns_resolve, &nfs_dns_resolve_list, list) {
+ if (dns_resolve->cd->u.pipefs.mnt->mnt_sb != rpcmount->mnt_sb)
+ continue;
+
+ spin_unlock(&nfs_dns_resolve_lock);
+ return dns_resolve->cd;
+ }
+
+ dns_resolve = kmalloc(sizeof(*dns_resolve), GFP_KERNEL);
+ if (dns_resolve)
+ dns_resolve->cd = nfs_alloc_dns_resolve();
+ if (!dns_resolve || !dns_resolve->cd) {
+ error = -ENOMEM;
+ goto err;
+ }
+
+ error = nfs_cache_register(dns_resolve->cd, rpcmount);
+ if (error)
+ goto err;
+
+ INIT_LIST_HEAD(&dns_resolve->list);
+ list_add(&dns_resolve->list, &nfs_dns_resolve_list);
+ spin_unlock(&nfs_dns_resolve_lock);
+
+ return dns_resolve->cd;
+err:
+ spin_unlock(&nfs_dns_resolve_lock);
+ if (dns_resolve)
+ kfree(dns_resolve->cd);
+ kfree(dns_resolve);
+ return dns_resolve->cd;
+}
+
ssize_t nfs_dns_resolve_name(char *name, size_t namelen,
- struct sockaddr *sa, size_t salen)
+ struct sockaddr *sa, size_t salen, struct vfsmount *rpcmount)
{
struct nfs_dns_ent key = {
.hostname = name,
.namelen = namelen,
};
+ struct cache_detail *dns_resolve;
struct nfs_dns_ent *item = NULL;
ssize_t ret;

- ret = do_cache_lookup_wait(&nfs_dns_resolve, &key, &item);
+ dns_resolve = nfs_get_dns_resolve(rpcmount);
+ ret = do_cache_lookup_wait(dns_resolve, &key, &item);
if (ret == 0) {
if (salen >= item->addrlen) {
memcpy(sa, &item->addr, item->addrlen);
ret = item->addrlen;
} else
ret = -EOVERFLOW;
- cache_put(&item->h, &nfs_dns_resolve);
+ cache_put(&item->h, dns_resolve);
} else if (ret == -ENOENT)
ret = -ESRCH;
return ret;
}

-int nfs_dns_resolver_init(void)
-{
- return nfs_cache_register(&nfs_dns_resolve);
-}
-
void nfs_dns_resolver_destroy(void)
{
- nfs_cache_unregister(&nfs_dns_resolve);
+ struct nfs_dns_resolve_list *dns_resolve, *tmp;
+
+ spin_lock(&nfs_dns_resolve_lock);
+ list_for_each_entry_safe(dns_resolve, tmp, &nfs_dns_resolve_list,
+ list) {
+ nfs_cache_unregister(dns_resolve->cd);
+ nfs_free_dns_resolve(dns_resolve->cd);
+ list_del(&dns_resolve->list);
+ kfree(dns_resolve);
+ }
+ spin_unlock(&nfs_dns_resolve_lock);
}

#endif
diff --git a/fs/nfs/dns_resolve.h b/fs/nfs/dns_resolve.h
index 199bb55..a9ae700 100644
--- a/fs/nfs/dns_resolve.h
+++ b/fs/nfs/dns_resolve.h
@@ -8,19 +8,13 @@


#ifdef CONFIG_NFS_USE_KERNEL_DNS
-static inline int nfs_dns_resolver_init(void)
-{
- return 0;
-}
-
static inline void nfs_dns_resolver_destroy(void)
{}
#else
-extern int nfs_dns_resolver_init(void);
extern void nfs_dns_resolver_destroy(void);
#endif

extern ssize_t nfs_dns_resolve_name(char *name, size_t namelen,
- struct sockaddr *sa, size_t salen);
+ struct sockaddr *sa, size_t salen, struct vfsmount *rpcmount);

#endif
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index e67e31c..9fed17c 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1528,10 +1528,6 @@ static int __init init_nfs_fs(void)

err = nfs_idmap_init();
if (err < 0)
- goto out9;
-
- err = nfs_dns_resolver_init();
- if (err < 0)
goto out8;

err = nfs_fscache_register();
@@ -1592,10 +1588,8 @@ out5:
out6:
nfs_fscache_unregister();
out7:
- nfs_dns_resolver_destroy();
-out8:
nfs_idmap_quit();
-out9:
+out8:
return err;
}

diff --git a/fs/nfs/nfs4namespace.c b/fs/nfs/nfs4namespace.c
index 3c2a172..7a61fdb 100644
--- a/fs/nfs/nfs4namespace.c
+++ b/fs/nfs/nfs4namespace.c
@@ -14,6 +14,7 @@
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/sunrpc/clnt.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/vfs.h>
#include <linux/inet.h>
#include "internal.h"
@@ -104,7 +105,8 @@ static size_t nfs_parse_server_name(char *string, size_t len,

ret = rpc_pton(string, len, sa, salen);
if (ret == 0) {
- ret = nfs_dns_resolve_name(string, len, sa, salen);
+ ret = nfs_dns_resolve_name(string, len, sa, salen,
+ init_rpc_pipefs);
if (ret < 0)
ret = 0;
}
--
1.7.3.4
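
For reference, nfs_get_dns_resolve() in the patch above is a find-or-create lookup keyed on the rpc_pipefs superblock. The following is only a userspace sketch of one way to structure such a lookup so that allocation happens with the lock dropped and the error path never touches freed memory. All toy_* names are hypothetical, the C11 atomic flag merely stands in for a kernel spinlock, and the cache registration step is left out.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Toy spinlock built on a C11 atomic flag (stand-in for spin_lock()). */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;

static void toy_spin_lock(void)
{
	while (atomic_flag_test_and_set(&toy_lock))
		;	/* busy-wait until the flag is released */
}

static void toy_spin_unlock(void)
{
	atomic_flag_clear(&toy_lock);
}

struct toy_entry {
	struct toy_entry *next;
	const void *key;	/* stands in for the rpc_pipefs superblock */
};

static struct toy_entry *toy_list;

/* Returns the entry for key, creating it if needed; NULL if malloc fails. */
struct toy_entry *toy_get_entry(const void *key)
{
	struct toy_entry *e, *fresh;

	/* First pass: look for an existing entry under the lock. */
	toy_spin_lock();
	for (e = toy_list; e; e = e->next)
		if (e->key == key)
			break;
	toy_spin_unlock();
	if (e)
		return e;

	/* Allocate with the lock dropped, since allocation may sleep. */
	fresh = malloc(sizeof(*fresh));
	if (!fresh)
		return NULL;
	fresh->key = key;

	/* Recheck: another caller may have added the entry meanwhile. */
	toy_spin_lock();
	for (e = toy_list; e; e = e->next)
		if (e->key == key)
			break;
	if (!e) {
		fresh->next = toy_list;
		toy_list = fresh;
		e = fresh;
		fresh = NULL;	/* ownership moved to the list */
	}
	toy_spin_unlock();

	free(fresh);	/* only non-NULL if we lost the race; free(NULL) is a no-op */
	return e;
}
```

Entries are never removed in this sketch, so returning a pointer after dropping the lock is safe here; a real implementation would also need a lifetime rule for the returned object.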


2010-12-30 11:45:15

by Kirill A. Shutemov

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Thu, Dec 30, 2010 at 05:05:22AM -0600, Rob Landley wrote:
> On Thu, Dec 30, 2010 at 4:44 AM, Kirill A. Shutemov <[email protected]> wrote:
> > On Thu, Dec 30, 2010 at 04:05:07AM -0600, Rob Landley wrote:
> >> On 12/30/2010 03:44 AM, Kirill A. Shutemov wrote:
> >> >>> If no rpcmount mountoption, no rpc_pipefs was found at
> >> >>> '/var/lib/nfs/rpc_pipefs' and we are in init's mount namespace, we use
> >> >>> init_rpc_pipefs.
> >> >>
> >> >> It's the "we are in init's mount namespace" that I was wondering about.
> >> >>
> >> >> So if I naively chroot, nfs mount stops working the way it did before I
> >> >> chrooted unless I do an extra setup step?
> >> >
> >> > No. It will work as before since you are still in init's mount namespace.
> >> > Creating a new mount namespace changes the rules.
> >>
> >> Ah, CLONE_NEWNS and then you need /var/lib/nfs/rpc_pipefs. Got it.
> >>
> >> I'm kind of surprised that the kernel cares about a specific path under
> >> /var/lib. (Seems like policy in the kernel somehow.)
> >
> > Yep. It's bad, but there is a way to override the default.
> >
> > The other way is to leave the 'rpcmount' mount option without a default.
> > get_rpc_pipefs(NULL) in init's mount namespace will always return
> > init_rpc_pipefs, without a filesystem lookup.
> > get_rpc_pipefs(NULL) in a non-init mount namespace will always return
> > an error.
> >
> > So you will have to specify the 'rpcmount' mount option for every nfs
> > mount in a container. Hmm, I guess it may confuse users.
> >
> > Or we can try to move the default to userspace. /sbin/mount.nfs?
>
> /proc/sys/kernel/hotplug exists to tell the kernel where to find the hotplug
> binary. Once upon a time /sbin/hotplug was the default value, and that was
> there to override it. (They changed the default to blank (disabled) not due
> to policy reasons, but due to adding the netlink hotplug notification
> mechanism and making that the default.)
>
> I bring that up to point out that the general consensus about policy in the
> kernel seems to be "when you really really can't avoid having any, make a
> sane default the user can override".
>
> (Of course adding another entry to the crawling horror of /proc may not
> be an improvement. But individual overrides at the mount -o level seem
> like a non-optimal granularity for this...)

Do you propose implementing the default as a sysctl parameter?

> >> Can't it just
> >> check the current process's mount list to see if an instance of
> >> rpc_pipefs is mounted in the current namespace the way lxc looks for
> >> cgroups? Or are there potential performance/scalability issues with that?
> >
> > What should we do if we have several rpc_pipefs mounts in the namespace?
>
> You mean more than one inside a given process's view of the filesystem, taking
> into account chroot like /proc/mounts does?
>
> Before this patch series, there was one instance systemwide. The patch changed
> that to look at a fixed location in the filesystem relative to the current
> chroot. Either way, there was one instance available to a given process doing
> an nfs mount.
>
> What's the use case for having more than one visible to a given process?
> (NUMA scalability? Some sort of multipath/VPN routing context?)

It's not so obvious to me why we should restrict it. ;)

Currently, there is no association between rpc_pipefs and mount namespace,
so I don't see a simple way to restrict the number of rpc_pipefs instances
per mount namespace. Associating a mount namespace with rpc_pipefs is not a
good idea, I think.

--
Kirill A. Shutemov

2010-12-31 13:03:30

by Kirill A. Shutemov

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Thu, Dec 30, 2010 at 06:52:43AM -0600, Rob Landley wrote:
> On 12/30/2010 05:45 AM, Kirill A. Shutemov wrote:
> > Currently, there is no association between rpc_pipefs and mount namespace,
>
> There is in that the root context doesn't need to have this mounted, and
> new namespaces do. So there's an existing association between a LACK of
> a namespace and a different default behavior.
>
> My understanding (correct me if I'm wrong) is that the historical
> behavior is that there's only one, and it doesn't actually live anywhere
> in the filesystem tree. You're adding a special location. I'm
> wondering if there's any way for that location not to be special.

/var/lib/nfs/rpc_pipefs is the default path where the userspace part of the
NFS stack (gssd, idmapd) expects to see rpc_pipefs.

> > so I don't see simple way to restrict number of rpc_pipefs per mount
> > namespace. Associating mount namespace with rpc_pipefs is not a good idea,
> > I think.
>
> I'm talking about associating a default rpc_pipefs instance with a
> namespace, which it seems to me you're already doing by emulating the
> legacy behavior. Before you CLONE_NEWNS you get a magic default mount
> that doesn't exist in the tree. After you CLONE_NEWNS you get something
> like -EINVAL unless you supply your own default.

The root namespace is special. In the case of nfsroot you need rpc_pipefs
before the root filesystem is available.

> (I'm actually not sure
> why new namespaces don't fall back to the magic global one...)

It breaks isolation. A container should not use the host's rpc_pipefs without
the host's permission.

> I'm suggesting that if the user doesn't specify -o rpcmount then the
> default could be the first rpc_pipefs mount visible to the current
> process context, rather than a specific path. Logic to do that exists
> in the proc/self/mounts code (which I'm reading through now...).

/* iterate_mounts() callback: stop at the first rpc_pipefs visible to us. */
static int check_rpc_pipefs(struct vfsmount *mnt, void *arg)
{
	struct vfsmount **rpcmount = arg;
	struct path path = {
		.mnt = mnt,
		.dentry = mnt->mnt_root,
	};

	if (!mnt->mnt_sb)
		return 0;
	if (mnt->mnt_sb->s_magic != RPCAUTH_GSSMAGIC)
		return 0;

	/* Skip mounts outside the current process's root (chroot). */
	if (!path_is_under(&path, &current->fs->root))
		return 0;

	*rpcmount = mntget(mnt);
	return 1;
}

struct vfsmount *get_rpc_pipefs(const char *p)
{
	int error;
	struct vfsmount *rpcmount = ERR_PTR(-EINVAL);
	struct path path;

	if (!p) {
		/* No path given: scan the mounts in our mount namespace. */
		iterate_mounts(check_rpc_pipefs, &rpcmount,
				current->nsproxy->mnt_ns->root);

		/* Only init's mount namespace falls back to init_rpc_pipefs. */
		if (IS_ERR(rpcmount) && (current->nsproxy->mnt_ns ==
					init_task.nsproxy->mnt_ns))
			return mntget(init_rpc_pipefs);

		return rpcmount;
	}

	error = kern_path(p, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path);
	if (error)
		return ERR_PTR(error);

	/* Leaves rpcmount at ERR_PTR(-EINVAL) if p is not a usable rpc_pipefs. */
	check_rpc_pipefs(path.mnt, &rpcmount);
	path_put(&path);

	return rpcmount;
}
EXPORT_SYMBOL_GPL(get_rpc_pipefs);

Something like this? Patch to replace patch #10 attached.
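
To illustrate what happens when several rpc_pipefs instances are visible, here is a small userspace model of the "first match wins" callback iteration used above. It is only a sketch: the toy_* names and the magic value are placeholders, not kernel symbols, and the real mount walk is replaced by a plain array.

```c
#include <stddef.h>

/* Placeholder value; NOT the real RPCAUTH_GSSMAGIC. */
#define TOY_RPC_MAGIC 0x1234UL

/* Toy stand-in for a mount: only the fields the sketch needs. */
struct toy_mount {
	unsigned long magic;	/* filesystem magic number */
	int under_root;		/* nonzero if visible under the caller's root */
};

typedef int (*toy_mount_cb)(struct toy_mount *mnt, void *arg);

/* Calls cb on each mount in order; stops at the first nonzero return. */
static int toy_iterate_mounts(toy_mount_cb cb, void *arg,
			      struct toy_mount *mounts, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = cb(&mounts[i], arg);
		if (ret)
			return ret;
	}
	return 0;
}

/* Callback: record the mount and stop if it is a visible rpc_pipefs. */
static int toy_check_rpc_pipefs(struct toy_mount *mnt, void *arg)
{
	struct toy_mount **found = arg;

	if (mnt->magic != TOY_RPC_MAGIC)
		return 0;
	if (!mnt->under_root)
		return 0;

	*found = mnt;
	return 1;
}

/* Returns the first matching mount, or NULL if there is none. */
struct toy_mount *toy_first_rpc_pipefs(struct toy_mount *mounts, size_t n)
{
	struct toy_mount *found = NULL;

	toy_iterate_mounts(toy_check_rpc_pipefs, &found, mounts, n);
	return found;
}
```

With this shape, the answer to "which instance is used if there are several" depends purely on iteration order, so later matches are never examined.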

--
Kirill A. Shutemov


Attachments:
(No filename) (3.15 kB)
sunrpc-introduce-get_rpc_pipefs.patch (2.41 kB)

2010-12-29 13:14:34

by Kirill A. Shutemov

Subject: [PATCH v2 06/12] lockd: get rpc_pipefs mount point from callers

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
fs/lockd/clntlock.c | 8 +++++---
fs/lockd/host.c | 14 +++++++++++---
fs/lockd/mon.c | 15 ++++++++-------
fs/lockd/svc.c | 6 ++----
fs/nfs/client.c | 1 +
fs/nfsd/nfssvc.c | 2 +-
include/linux/lockd/bind.h | 3 ++-
include/linux/lockd/lockd.h | 4 +++-
8 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
index 25509eb..1179c18 100644
--- a/fs/lockd/clntlock.c
+++ b/fs/lockd/clntlock.c
@@ -56,13 +56,14 @@ struct nlm_host *nlmclnt_init(const struct nlmclnt_initdata *nlm_init)
u32 nlm_version = (nlm_init->nfs_version == 2) ? 1 : 4;
int status;

- status = lockd_up();
+ status = lockd_up(nlm_init->rpcmount);
if (status < 0)
return ERR_PTR(status);

host = nlmclnt_lookup_host(nlm_init->address, nlm_init->addrlen,
nlm_init->protocol, nlm_version,
- nlm_init->hostname, nlm_init->noresvport);
+ nlm_init->hostname, nlm_init->noresvport,
+ nlm_init->rpcmount);
if (host == NULL) {
lockd_down();
return ERR_PTR(-ENOLCK);
@@ -223,7 +224,8 @@ reclaimer(void *ptr)
allow_signal(SIGKILL);

down_write(&host->h_rwsem);
- lockd_up(); /* note: this cannot fail as lockd is already running */
+ /* note: this cannot fail as lockd is already running */
+ lockd_up(host->h_rpcmount);

dprintk("lockd: reclaiming locks for host %s\n", host->h_name);

diff --git a/fs/lockd/host.c b/fs/lockd/host.c
index b033a2d..757d1d3 100644
--- a/fs/lockd/host.c
+++ b/fs/lockd/host.c
@@ -14,9 +14,9 @@
#include <linux/in6.h>
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/svc.h>
-#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/lockd/lockd.h>
#include <linux/mutex.h>
+#include <linux/mount.h>

#include <net/ipv6.h>

@@ -44,6 +44,7 @@ struct nlm_lookup_host_info {
const struct sockaddr *src_sap; /* our address (optional) */
const size_t src_len; /* it's length */
const int noresvport; /* use non-priv port */
+ struct vfsmount *rpcmount; /* rpc_pipefs mount point */
};

/*
@@ -128,6 +129,8 @@ static struct nlm_host *nlm_lookup_host(struct nlm_lookup_host_info *ni)
if (ni->server && ni->src_len != 0 &&
!rpc_cmp_addr(nlm_srcaddr(host), ni->src_sap))
continue;
+ if (host->h_rpcmount->mnt_sb != ni->rpcmount->mnt_sb)
+ continue;

/* Move to head of hash chain. */
hlist_del(&host->h_hash);
@@ -171,6 +174,7 @@ static struct nlm_host *nlm_lookup_host(struct nlm_lookup_host_info *ni)
host->h_srcaddrlen = ni->src_len;
host->h_version = ni->version;
host->h_proto = ni->protocol;
+ host->h_rpcmount = mntget(ni->rpcmount);
host->h_rpcclnt = NULL;
mutex_init(&host->h_mutex);
host->h_nextrebind = jiffies + NLM_HOST_REBIND;
@@ -212,6 +216,7 @@ nlm_destroy_host(struct nlm_host *host)

nsm_unmonitor(host);
nsm_release(host->h_nsmhandle);
+ mntput(host->h_rpcmount);

clnt = host->h_rpcclnt;
if (clnt != NULL)
@@ -238,7 +243,8 @@ struct nlm_host *nlmclnt_lookup_host(const struct sockaddr *sap,
const unsigned short protocol,
const u32 version,
const char *hostname,
- int noresvport)
+ int noresvport,
+ struct vfsmount *rpcmount)
{
struct nlm_lookup_host_info ni = {
.server = 0,
@@ -249,6 +255,7 @@ struct nlm_host *nlmclnt_lookup_host(const struct sockaddr *sap,
.hostname = hostname,
.hostname_len = strlen(hostname),
.noresvport = noresvport,
+ .rpcmount = rpcmount,
};

dprintk("lockd: %s(host='%s', vers=%u, proto=%s)\n", __func__,
@@ -295,6 +302,7 @@ struct nlm_host *nlmsvc_lookup_host(const struct svc_rqst *rqstp,
.hostname = hostname,
.hostname_len = hostname_len,
.src_len = rqstp->rq_addrlen,
+ .rpcmount = rqstp->rq_server->sv_rpcmount,
};

dprintk("lockd: %s(host='%*s', vers=%u, proto=%s)\n", __func__,
@@ -361,7 +369,7 @@ nlm_bind_host(struct nlm_host *host)
.authflavor = RPC_AUTH_UNIX,
.flags = (RPC_CLNT_CREATE_NOPING |
RPC_CLNT_CREATE_AUTOBIND),
- .rpcmount = init_rpc_pipefs,
+ .rpcmount = host->h_rpcmount,
};

/*
diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
index 37e5328..526d486 100644
--- a/fs/lockd/mon.c
+++ b/fs/lockd/mon.c
@@ -15,7 +15,6 @@
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/xprtsock.h>
#include <linux/sunrpc/svc.h>
-#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/lockd/lockd.h>

#include <asm/unaligned.h>
@@ -63,7 +62,7 @@ static inline struct sockaddr *nsm_addr(const struct nsm_handle *nsm)
return (struct sockaddr *)&nsm->sm_addr;
}

-static struct rpc_clnt *nsm_create(void)
+static struct rpc_clnt *nsm_create(struct vfsmount *rpcmount)
{
struct sockaddr_in sin = {
.sin_family = AF_INET,
@@ -79,13 +78,14 @@ static struct rpc_clnt *nsm_create(void)
.version = NSM_VERSION,
.authflavor = RPC_AUTH_NULL,
.flags = RPC_CLNT_CREATE_NOPING,
- .rpcmount = init_rpc_pipefs,
+ .rpcmount = rpcmount,
};

return rpc_create(&args);
}

-static int nsm_mon_unmon(struct nsm_handle *nsm, u32 proc, struct nsm_res *res)
+static int nsm_mon_unmon(struct nsm_handle *nsm, u32 proc, struct nsm_res *res,
+ struct vfsmount *rpcmount)
{
struct rpc_clnt *clnt;
int status;
@@ -101,7 +101,7 @@ static int nsm_mon_unmon(struct nsm_handle *nsm, u32 proc, struct nsm_res *res)
.rpc_resp = res,
};

- clnt = nsm_create();
+ clnt = nsm_create(rpcmount);
if (IS_ERR(clnt)) {
status = PTR_ERR(clnt);
dprintk("lockd: failed to create NSM upcall transport, "
@@ -151,7 +151,7 @@ int nsm_monitor(const struct nlm_host *host)
*/
nsm->sm_mon_name = nsm_use_hostnames ? nsm->sm_name : nsm->sm_addrbuf;

- status = nsm_mon_unmon(nsm, NSMPROC_MON, &res);
+ status = nsm_mon_unmon(nsm, NSMPROC_MON, &res, host->h_rpcmount);
if (unlikely(res.status != 0))
status = -EIO;
if (unlikely(status < 0)) {
@@ -185,7 +185,8 @@ void nsm_unmonitor(const struct nlm_host *host)
&& nsm->sm_monitored && !nsm->sm_sticky) {
dprintk("lockd: nsm_unmonitor(%s)\n", nsm->sm_name);

- status = nsm_mon_unmon(nsm, NSMPROC_UNMON, &res);
+ status = nsm_mon_unmon(nsm, NSMPROC_UNMON, &res,
+ host->h_rpcmount);
if (res.status != 0)
status = -EIO;
if (status < 0)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 32310b1..7387b04 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -31,7 +31,6 @@
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/svc.h>
#include <linux/sunrpc/svcsock.h>
-#include <linux/sunrpc/rpc_pipe_fs.h>
#include <net/ip.h>
#include <linux/lockd/lockd.h>
#include <linux/nfs.h>
@@ -249,7 +248,7 @@ out_err:
/*
* Bring up the lockd process if it's not already up.
*/
-int lockd_up(void)
+int lockd_up(struct vfsmount *rpcmount)
{
struct svc_serv *serv;
int error = 0;
@@ -270,8 +269,7 @@ int lockd_up(void)
"lockd_up: no pid, %d users??\n", nlmsvc_users);

error = -ENOMEM;
- serv = svc_create(&nlmsvc_program, init_rpc_pipefs, LOCKD_BUFSIZE,
- NULL);
+ serv = svc_create(&nlmsvc_program, rpcmount, LOCKD_BUFSIZE, NULL);
if (!serv) {
printk(KERN_WARNING "lockd_up: create service failed\n");
goto out;
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index e041f39..fbc013d 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -661,6 +661,7 @@ static int nfs_start_lockd(struct nfs_server *server)
.nfs_version = clp->rpc_ops->version,
.noresvport = server->flags & NFS_MOUNT_NORESVPORT ?
1 : 0,
+ .rpcmount = init_rpc_pipefs,
};

if (nlm_init.nfs_version > 3)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index d96c32b..17d78d3 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -220,7 +220,7 @@ static int nfsd_startup(unsigned short port, int nrservs)
ret = nfsd_init_socks(port);
if (ret)
goto out_racache;
- ret = lockd_up();
+ ret = lockd_up(init_rpc_pipefs);
if (ret)
goto out_racache;
ret = nfs4_state_start();
diff --git a/include/linux/lockd/bind.h b/include/linux/lockd/bind.h
index fbc48f8..97cd4bf 100644
--- a/include/linux/lockd/bind.h
+++ b/include/linux/lockd/bind.h
@@ -42,6 +42,7 @@ struct nlmclnt_initdata {
unsigned short protocol;
u32 nfs_version;
int noresvport;
+ struct vfsmount *rpcmount;
};

/*
@@ -53,7 +54,7 @@ extern void nlmclnt_done(struct nlm_host *host);

extern int nlmclnt_proc(struct nlm_host *host, int cmd,
struct file_lock *fl);
-extern int lockd_up(void);
+extern int lockd_up(struct vfsmount *rpcmount);
extern void lockd_down(void);

#endif /* LINUX_LOCKD_BIND_H */
diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
index 2dee05e..e30b07d 100644
--- a/include/linux/lockd/lockd.h
+++ b/include/linux/lockd/lockd.h
@@ -44,6 +44,7 @@ struct nlm_host {
size_t h_addrlen;
struct sockaddr_storage h_srcaddr; /* our address (optional) */
size_t h_srcaddrlen;
+ struct vfsmount *h_rpcmount; /* rpc_pipefs mount point */
struct rpc_clnt *h_rpcclnt; /* RPC client to talk to peer */
char *h_name; /* remote hostname */
u32 h_version; /* interface version */
@@ -222,7 +223,8 @@ struct nlm_host *nlmclnt_lookup_host(const struct sockaddr *sap,
const unsigned short protocol,
const u32 version,
const char *hostname,
- int noresvport);
+ int noresvport,
+ struct vfsmount *rpcmount);
struct nlm_host *nlmsvc_lookup_host(const struct svc_rqst *rqstp,
const char *hostname,
const size_t hostname_len);
--
1.7.3.4


2010-12-30 09:24:22

by Rob Landley

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On 12/30/2010 02:51 AM, Kirill A. Shutemov wrote:
> On Wed, Dec 29, 2010 at 08:13:50PM -0600, Rob Landley wrote:
>> On Wed, Dec 29, 2010 at 7:14 AM, Kirill A. Shutemov<[email protected]> wrote:
>>>
>>> Prepare nfs/sunrpc stack to use multiple instances of rpc_pipefs.
>>> Only for client for now.
>>
>> What would a test case for this look like? (Is there some way to tell
>> an nfs mount to use a specific instance of rpc_pipefs or something?)
>
> You can create a new instance of rpc_pipefs using the 'newinstance'
> mount option.
>
> Then you can specify which rpc_pipefs to use with the 'rpcmount' mount option
> of the nfs mount. If none is specified, '/var/lib/nfs/rpc_pipefs' is used by
> default.

Is that path resolved as the process performing the mount sees it?

> If no rpcmount mountoption, no rpc_pipefs was found at
> '/var/lib/nfs/rpc_pipefs' and we are in init's mount namespace, we use
> init_rpc_pipefs.

It's the "we are in init's mount namespace" that I was wondering about.

So if I naively chroot, nfs mount stops working the way it did before I
chrooted unless I do an extra setup step?

I'm actually poking at getting nfs mount working in LXC containers with
different network routing (mostly study so far, it took me a couple
weeks just to get lxc to work for me and now I'm trying to wrap my head
around Linux's NFS implementation), so I'm very interested in this...

Rob

2010-12-29 13:14:32

by Kirill A. Shutemov

Subject: [PATCH v2 04/12] sunrpc: tag svc_serv with rpc_pipefs mount point

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
include/linux/sunrpc/svc.h | 1 +
net/sunrpc/svc.c | 4 ++++
2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 5a3085b..3b6b26c 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -64,6 +64,7 @@ struct svc_pool {
*/
struct svc_serv {
struct svc_program * sv_program; /* RPC program */
+ struct vfsmount * sv_rpcmount; /* rpc_pipefs mount point*/
struct svc_stat * sv_stats; /* RPC statistics */
spinlock_t sv_lock;
unsigned int sv_nrthreads; /* # of server threads */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 6359c42..d2f7c03 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -20,6 +20,7 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/slab.h>
+#include <linux/mount.h>

#include <linux/sunrpc/types.h>
#include <linux/sunrpc/xdr.h>
@@ -27,6 +28,7 @@
#include <linux/sunrpc/svcsock.h>
#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/bc_xprt.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>

#define RPCDBG_FACILITY RPCDBG_SVCDSP

@@ -371,6 +373,7 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
return NULL;
serv->sv_name = prog->pg_name;
serv->sv_program = prog;
+ serv->sv_rpcmount = mntget(init_rpc_pipefs);
serv->sv_nrthreads = 1;
serv->sv_stats = prog->pg_stats;
if (bufsize > RPCSVC_MAXPAYLOAD)
@@ -492,6 +495,7 @@ svc_destroy(struct svc_serv *serv)
svc_sock_destroy(serv->bc_xprt);
#endif /* CONFIG_NFS_V4_1 */

+ mntput(serv->sv_rpcmount);
svc_unregister(serv);
kfree(serv->sv_pools);
kfree(serv);
--
1.7.3.4


2010-12-29 13:14:34

by Kirill A. Shutemov

Subject: [PATCH v2 08/12] sunrpc: tag pipefs field of cache_detail with rpc_pipefs mount point

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
fs/nfs/cache_lib.c | 3 +--
include/linux/sunrpc/cache.h | 9 +++------
net/sunrpc/cache.c | 16 ++++++++++------
3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/cache_lib.c b/fs/nfs/cache_lib.c
index dd7ca5f..0944d4e 100644
--- a/fs/nfs/cache_lib.c
+++ b/fs/nfs/cache_lib.c
@@ -123,7 +123,7 @@ int nfs_cache_register(struct cache_detail *cd)
ret = vfs_path_lookup(mnt->mnt_root, mnt, "/cache", 0, &nd);
if (ret)
goto err;
- ret = sunrpc_cache_register_pipefs(nd.path.dentry,
+ ret = sunrpc_cache_register_pipefs(mnt, nd.path.dentry,
cd->name, 0600, cd);
path_put(&nd.path);
if (!ret)
@@ -136,6 +136,5 @@ err:
void nfs_cache_unregister(struct cache_detail *cd)
{
sunrpc_cache_unregister_pipefs(cd);
- mntput(init_rpc_pipefs);
}

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 6950c98..d34a621 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -64,10 +64,6 @@ struct cache_detail_procfs {
struct proc_dir_entry *flush_ent, *channel_ent, *content_ent;
};

-struct cache_detail_pipefs {
- struct dentry *dir;
-};
-
struct cache_detail {
struct module * owner;
int hash_size;
@@ -114,7 +110,7 @@ struct cache_detail {

union {
struct cache_detail_procfs procfs;
- struct cache_detail_pipefs pipefs;
+ struct path pipefs;
} u;
};

@@ -201,7 +197,8 @@ extern int cache_register_net(struct cache_detail *cd, struct net *net);
extern void cache_unregister(struct cache_detail *cd);
extern void cache_unregister_net(struct cache_detail *cd, struct net *net);

-extern int sunrpc_cache_register_pipefs(struct dentry *parent, const char *,
+extern int sunrpc_cache_register_pipefs(struct vfsmount *rpcmount,
+ struct dentry *parent, const char *,
mode_t, struct cache_detail *);
extern void sunrpc_cache_unregister_pipefs(struct cache_detail *);

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index e433e75..ed50d49 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -28,6 +28,7 @@
#include <linux/workqueue.h>
#include <linux/mutex.h>
#include <linux/pagemap.h>
+#include <linux/mount.h>
#include <asm/ioctls.h>
#include <linux/sunrpc/types.h>
#include <linux/sunrpc/cache.h>
@@ -1753,7 +1754,8 @@ const struct file_operations cache_flush_operations_pipefs = {
.llseek = no_llseek,
};

-int sunrpc_cache_register_pipefs(struct dentry *parent,
+int sunrpc_cache_register_pipefs(struct vfsmount *rpcmount,
+ struct dentry *parent,
const char *name, mode_t umode,
struct cache_detail *cd)
{
@@ -1766,9 +1768,10 @@ int sunrpc_cache_register_pipefs(struct dentry *parent,
q.len = strlen(name);
q.hash = full_name_hash(q.name, q.len);
dir = rpc_create_cache_dir(parent, &q, umode, cd);
- if (!IS_ERR(dir))
- cd->u.pipefs.dir = dir;
- else {
+ if (!IS_ERR(dir)) {
+ cd->u.pipefs.mnt = mntget(rpcmount);
+ cd->u.pipefs.dentry = dir;
+ } else {
sunrpc_destroy_cache_detail(cd);
ret = PTR_ERR(dir);
}
@@ -1778,8 +1781,9 @@ EXPORT_SYMBOL_GPL(sunrpc_cache_register_pipefs);

void sunrpc_cache_unregister_pipefs(struct cache_detail *cd)
{
- rpc_remove_cache_dir(cd->u.pipefs.dir);
- cd->u.pipefs.dir = NULL;
+ rpc_remove_cache_dir(cd->u.pipefs.dentry);
+ cd->u.pipefs.dentry = NULL;
+ mntput(cd->u.pipefs.mnt);
sunrpc_destroy_cache_detail(cd);
}
EXPORT_SYMBOL_GPL(sunrpc_cache_unregister_pipefs);
--
1.7.3.4


2010-12-30 10:05:11

by Rob Landley

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On 12/30/2010 03:44 AM, Kirill A. Shutemov wrote:
>>> If no rpcmount mountoption, no rpc_pipefs was found at
>>> '/var/lib/nfs/rpc_pipefs' and we are in init's mount namespace, we use
>>> init_rpc_pipefs.
>>
>> It's the "we are in init's mount namespace" that I was wondering about.
>>
>> So if I naively chroot, nfs mount stops working the way it did before I
>> chrooted unless I do an extra setup step?
>
> No. It will work as before since you are still in init's mount namespace.
> Creating a new mount namespace changes the rules.

Ah, CLONE_NEWNS and then you need /var/lib/nfs/rpc_pipefs. Got it.

I'm kind of surprised that the kernel cares about a specific path under
/var/lib. (Seems like policy in the kernel somehow.) Can't it just
check the current process's mount list to see if an instance of
rpc_pipefs is mounted in the current namespace the way lxc looks for
cgroups? Or are there potential performance/scalability issues with that?

Rob

2010-12-30 10:44:17

by Kirill A. Shutemov

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Thu, Dec 30, 2010 at 04:05:07AM -0600, Rob Landley wrote:
> On 12/30/2010 03:44 AM, Kirill A. Shutemov wrote:
> >>> If no rpcmount mountoption, no rpc_pipefs was found at
> >>> '/var/lib/nfs/rpc_pipefs' and we are in init's mount namespace, we use
> >>> init_rpc_pipefs.
> >>
> >> It's the "we are in init's mount namespace" that I was wondering about.
> >>
> >> So if I naively chroot, nfs mount stops working the way it did before I
> >> chrooted unless I do an extra setup step?
> >
> > No. It will work as before since you are still in init's mount namespace.
> > Creating a new mount namespace changes the rules.
>
> Ah, CLONE_NEWNS and then you need /var/lib/nfs/rpc_pipefs. Got it.
>
> I'm kind of surprised that the kernel cares about a specific path under
> /var/lib. (Seems like policy in the kernel somehow.)

Yep. It's bad, but there is a way to override the default.

The other way is to leave the 'rpcmount' mount option without a default.
get_rpc_pipefs(NULL) in init's mount namespace will always return
init_rpc_pipefs, without a filesystem lookup.
get_rpc_pipefs(NULL) in a non-init mount namespace will always return
an error.

So you will have to specify the 'rpcmount' mount option for every nfs mount
in a container. Hmm, I guess it may confuse users.

Or we can try to move the default to userspace. /sbin/mount.nfs?

> Can't it just
> check the current process's mount list to see if an instance of
> rpc_pipefs is mounted in the current namespace the way lxc looks for
> cgroups? Or are there potential performance/scalability issues with that?

What should we do if we have several rpc_pipefs mounts in the namespace?

--
Kirill A. Shutemov

2010-12-30 08:51:40

by Kirill A. Shutemov

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Wed, Dec 29, 2010 at 08:13:50PM -0600, Rob Landley wrote:
> On Wed, Dec 29, 2010 at 7:14 AM, Kirill A. Shutemov <[email protected]> wrote:
> >
> > Prepare nfs/sunrpc stack to use multiple instances of rpc_pipefs.
> > Only for client for now.
>
> What would a test case for this look like? (Is there some way to tell
> an nfs mount to use a specific instance of rpc_pipefs or something?)

You can create a new instance of rpc_pipefs using the 'newinstance'
mount option.

Then you can specify which rpc_pipefs to use with the 'rpcmount' mount option
of the nfs mount. If none is specified, '/var/lib/nfs/rpc_pipefs' is used by
default. If there is no 'rpcmount' mount option, no rpc_pipefs is found at
'/var/lib/nfs/rpc_pipefs', and we are in init's mount namespace, we use
init_rpc_pipefs.
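
The lookup order described here can be modelled as a small decision function. This is only a userspace sketch: the enum and function names are hypothetical, and the final error case is an assumption about what happens when none of the three conditions holds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels; none of these names exist in the kernel. */
enum pipefs_choice {
	PIPEFS_FROM_OPTION,	/* path given via the 'rpcmount' mount option */
	PIPEFS_FROM_DEFAULT,	/* rpc_pipefs found at /var/lib/nfs/rpc_pipefs */
	PIPEFS_INIT,		/* fall back to init_rpc_pipefs */
	PIPEFS_ERROR		/* assumed: nothing usable, the mount fails */
};

/*
 * rpcmount_opt:     value of the 'rpcmount' option, or NULL if absent
 * found_at_default: whether an rpc_pipefs is mounted at the default path
 * in_init_mnt_ns:   whether the caller is in init's mount namespace
 *
 * The sketch does not model an 'rpcmount' path that turns out not to be
 * an rpc_pipefs mount.
 */
enum pipefs_choice choose_pipefs(const char *rpcmount_opt,
				 bool found_at_default,
				 bool in_init_mnt_ns)
{
	if (rpcmount_opt != NULL)
		return PIPEFS_FROM_OPTION;
	if (found_at_default)
		return PIPEFS_FROM_DEFAULT;
	if (in_init_mnt_ns)
		return PIPEFS_INIT;
	return PIPEFS_ERROR;
}
```

Note that a plain chroot does not change in_init_mnt_ns; only creating a new mount namespace does.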

--
Kirill A. Shutemov

2011-01-05 11:42:03

by Al Viro

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Wed, Dec 29, 2010 at 03:14:18PM +0200, Kirill A. Shutemov wrote:
> Prepare nfs/sunrpc stack to use multiple instances of rpc_pipefs.

Won't that make sunrpc impossible to rmmod once you've got it in?
Note that having a reference to vfsmount pins it down, which pins
the superblock down, which pins the file_system_type down, which
pins the damn module down. So cleanup_sunrpc() won't be ever called,
AFAICS...
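
The pinning chain can be illustrated with a small userspace model in C. This
is a sketch under simplifying assumptions, not kernel code: all of the struct
and function names are hypothetical, and the real reference counting is more
involved. It only shows why a mount reference held by the module itself keeps
the module's use count above zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the reference chain; not kernel code. */
struct module_model  { int refcount; };
struct fs_type_model { struct module_model *owner; };
struct super_model   { struct fs_type_model *type; int active; };
struct mount_model   { struct super_model *sb; int count; };

/*
 * Mounting pins the superblock; the first activation of a superblock
 * pins the module that owns the filesystem type.
 */
void model_mount(struct mount_model *mnt, struct super_model *sb)
{
	assert(mnt && sb && sb->type && sb->type->owner);
	if (sb->active++ == 0)
		sb->type->owner->refcount++;
	mnt->sb = sb;
	mnt->count = 1;
}

/* Dropping the last mount reference releases the chain in reverse order. */
void model_mntput(struct mount_model *mnt)
{
	assert(mnt && mnt->count > 0);
	if (--mnt->count > 0)
		return;
	if (--mnt->sb->active == 0)
		mnt->sb->type->owner->refcount--;
}

/* rmmod is refused while anything still holds a module reference. */
bool model_can_rmmod(const struct module_model *mod)
{
	return mod->refcount == 0;
}
```

If the module mounts rpc_pipefs at init time and only drops that mount in its
cleanup function, model_can_rmmod() stays false until cleanup runs, and
cleanup only runs once rmmod is allowed, so the two wait on each other.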

2011-01-03 20:48:58

by Rob Landley

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On 12/31/2010 10:54 AM, Trond Myklebust wrote:
>> I'm kind of surprised that the kernel cares about a specific path under
>> /var/lib. (Seems like policy in the kernel somehow.) Can't it just
>> check the current process's mount list to see if an instance of
>> rpc_pipefs is mounted in the current namespace the way lxc looks for
>> cgroups? Or are there potential performance/scalability issues with that?
>
> The kernel doesn't give a damn about the /var/lib/nfs/rpc_pipefs bit.
> That's all for the benefit of the userland utilities.

Are you saying that if you go into a container and that mount point
doesn't exist, the kernel will still be able to find and use rpc_pipefs?
Without userspace creating a specific magic path and mounting a
filesystem on it?

If so, I misread the patch...

Rob


2011-01-03 20:39:07

by Rob Landley

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On 12/31/2010 07:03 AM, Kirill A. Shutemov wrote:

> EXPORT_SYMBOL_GPL(get_rpc_pipefs);
>
> Something like this? Patch to replace patch #10 attached.

Looks good to me. Thanks.

Acked-by: Rob Landley <[email protected]>

Rob



2011-01-05 13:40:29

by Kirill A. Shutemov

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Wed, Jan 05, 2011 at 11:41:55AM +0000, Al Viro wrote:
> On Wed, Dec 29, 2010 at 03:14:18PM +0200, Kirill A. Shutemov wrote:
> > Prepare nfs/sunrpc stack to use multiple instances of rpc_pipefs.
>
> Won't that make sunrpc impossible to rmmod once you've got it in?

Oops.. Nice catch.

I'll fix it by replacing the remaining references to init_rpc_pipefs with
get_rpc_pipefs(NULL) at the end of the patchset and moving the init_rpc_pipefs
mounting into get_rpc_pipefs(). So we will have a temporary regression in the
middle of the patchset, but I think that's not a problem.

--
Kirill A. Shutemov

2011-01-07 11:12:22

by Kirill A. Shutemov

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Wed, Jan 05, 2011 at 11:41:55AM +0000, Al Viro wrote:
> On Wed, Dec 29, 2010 at 03:14:18PM +0200, Kirill A. Shutemov wrote:
> > Prepare nfs/sunrpc stack to use multiple instances of rpc_pipefs.
>
> Won't that make sunrpc impossible to rmmod once you've got it in?
> Note that having a reference to vfsmount pins it down, which pins
> the superblock down, which pins the file_system_type down, which
> pins the damn module down. So cleanup_sunrpc() won't be ever called,
> AFAICS...

Hm.. rpc_pipe_fs_type.owner = NULL seems to fix the problem.
Is it a valid solution in this case?

--
Kirill A. Shutemov

2011-01-07 11:19:27

by Kirill A. Shutemov

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Fri, Jan 07, 2011 at 01:12:22PM +0200, Kirill A. Shutemov wrote:
> On Wed, Jan 05, 2011 at 11:41:55AM +0000, Al Viro wrote:
> > On Wed, Dec 29, 2010 at 03:14:18PM +0200, Kirill A. Shutemov wrote:
> > > Prepare nfs/sunrpc stack to use multiple instances of rpc_pipefs.
> >
> > Won't that make sunrpc impossible to rmmod once you've got it in?
> > Note that having a reference to vfsmount pins it down, which pins
> > the superblock down, which pins the file_system_type down, which
> > pins the damn module down. So cleanup_sunrpc() won't be ever called,
> > AFAICS...
>
> Hm.. rpc_pipe_fs_type.owner = NULL seems to fix the problem.
> Is it a valid solution in this case?

Please ignore this. :)

--
Kirill A. Shutemov

2011-01-03 16:53:57

by Kirill A. Shutemov

Subject: Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time

On Fri, Dec 31, 2010 at 03:03:29PM +0200, Kirill A. Shutemov wrote:
> On Thu, Dec 30, 2010 at 06:52:43AM -0600, Rob Landley wrote:
> > On 12/30/2010 05:45 AM, Kirill A. Shutemov wrote:
> > > Currently, there is no association between rpc_pipefs and mount namespace,
> >
> > There is in that the root context doesn't need to have this mounted, and
> > new namespaces do. So there's an existing association between a LACK of
> > a namespace and a different default behavior.
> >
> > My understanding (correct me if I'm wrong) is that the historical
> > behavior is that there's only one, and it doesn't actually live anywhere
> > in the filesystem tree. You're adding a special location. I'm
> > wondering if there's any way for that location not to be special.
>
> /var/lib/nfs/rpc_pipefs is the default path where the userspace part of the
> NFS stack (gssd, idmapd) expects to find rpc_pipefs.
>
> > > so I don't see simple way to restrict number of rpc_pipefs per mount
> > > namespace. Associating mount namespace with rpc_pipefs is not a good idea,
> > > I think.
> >
> > I'm talking about associating a default rpc_pipefs instance with a
> > namespace, which it seems to me you're already doing by emulating the
> > legacy behavior. Before you CLONE_NEWNS you get a magic default mount
> > that doesn't exist in the tree. After you CLONE_NEWNS you get something
> > like -EINVAL unless you supply your own default.
>
> The root namespace is special. In the nfsroot case you need rpc_pipefs
> before the root filesystem is available.
>
> > (I'm actually not sure
> > why new namespaces don't fall back to the magic global one...)
>
> It breaks isolation. A container should not use the host's rpc_pipefs
> without the host's permission.
>
> > I'm suggesting that if the user doesn't specify -o rpcmount then the
> > default could be the first rpc_pipefs mount visible to the current
> > process context, rather than a specific path. Logic to do that exists
> > in the proc/self/mounts code (which I'm reading through now...).
>
> static int check_rpc_pipefs(struct vfsmount *mnt, void *arg)
> {
> 	struct vfsmount **rpcmount = arg;
> 	struct path path = {
> 		.mnt = mnt,
> 		.dentry = mnt->mnt_root,
> 	};
>
> 	if (!mnt->mnt_sb)
> 		return 0;
> 	if (mnt->mnt_sb->s_magic != RPCAUTH_GSSMAGIC)
> 		return 0;
>
> 	if (!path_is_under(&path, &current->fs->root))
> 		return 0;
>
> 	*rpcmount = mntget(mnt);
> 	return 1;
> }
>
> struct vfsmount *get_rpc_pipefs(const char *p)
> {
> 	int error;
> 	struct vfsmount *rpcmount = ERR_PTR(-EINVAL);
> 	struct path path;
>
> 	if (!p) {
> 		iterate_mounts(check_rpc_pipefs, &rpcmount,
> 				current->nsproxy->mnt_ns->root);
>
> 		if (IS_ERR(rpcmount) && (current->nsproxy->mnt_ns ==
> 					init_task.nsproxy->mnt_ns))
> 			return mntget(init_rpc_pipefs);
>
> 		return rpcmount;
> 	}
>
> 	error = kern_path(p, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path);
> 	if (error)
> 		return ERR_PTR(error);
>
> 	check_rpc_pipefs(path.mnt, &rpcmount);
> 	path_put(&path);
>
> 	return rpcmount;
> }
> EXPORT_SYMBOL_GPL(get_rpc_pipefs);
>
> Something like this? Patch to replace patch #10 attached.

Any comments?

--
Kirill A. Shutemov