2013-02-01 12:55:15

by Stanislav Kinsbursky

Subject: [PATCH v2 0/4] nfsd: make it work in a container

This patch set finally enables NFSd in a container.
I've tested it in a container with its own root, as well as its own pid, net
and mount namespaces.

There are some limitations, listed below:
1) Only the nfsdcld client tracker is supported in a container. It is deprecated
and going to be removed soon. The UMH tracker requires switching root. The legacy
tracker requires something like an RB tree of opened inodes to make sure that any
recovery directory is opened only once.
2) Enabled versions are controlled globally (this should be fixed).
3) The server should be stopped by writing "0" to
/proc/fs/nfsd/threads instead of sending signals to the NFSd threads (they
run in init_pid). Sending signals will either not work, if the container has
its own pid namespace, or will kill all nfsd threads for all containers in the
init_pid namespace.
4) Currently, if a container is stopped without stopping the NFS server (i.e. its
init was killed), the NFSd kthreads remain running. One possible solution
is to not hold the network namespace via the NFSd service sockets, but to register
a per-net callback and kill all the threads on network namespace exit.
5) The NFSd filesystem superblock holds the network namespace. I.e. if some process
holds a container's NFSd superblock, then the whole container's network
namespace stays alive even if the container has already been destroyed.

There may be more limitations, which are not clear to me yet.
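
Limitation 3 above can be illustrated with a minimal userspace sketch. The helper name nfsd_set_threads() is hypothetical and not part of this series; only the path and the value "0" come from the text above. It assumes the nfsd filesystem is mounted inside the container and that the caller is allowed to write to it.

```c
#include <assert.h>
#include <stdio.h>

/* Control file named in the cover letter; requires a mounted nfsd fs. */
#define NFSD_THREADS_PATH "/proc/fs/nfsd/threads"

/*
 * Hypothetical helper: write a thread count to an nfsd "threads" file.
 * Writing 0 asks the server belonging to that mount's network namespace
 * to stop, without signalling any kthread directly.
 * Returns 0 on success, -1 on failure.
 */
static int nfsd_set_threads(const char *path, int count)
{
	FILE *f;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%d\n", count) < 0) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer, so a failed write shows up here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A caller inside the container would use nfsd_set_threads(NFSD_THREADS_PATH, 0) to stop the server, rather than sending a signal to the nfsd threads.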

v2:
1) removed root swap - deprecated
2) rebased on current tree

The following series implements...

---

Stanislav Kinsbursky (4):
nfsd: containerize NFSd filesystem
nfsd: use proper net while reading "exports" file
nfsd: disable usermode helper client tracker in container
nfsd: enable NFSv4 state in containers


fs/nfsd/nfs4recover.c | 6 ++++
fs/nfsd/nfs4state.c | 10 ------
fs/nfsd/nfsctl.c | 77 +++++++++++++++++++++++++++++++++++++------------
fs/nfsd/nfssvc.c | 5 +--
4 files changed, 66 insertions(+), 32 deletions(-)



2013-02-01 12:55:18

by Stanislav Kinsbursky

Subject: [PATCH 1/4] nfsd: containerize NFSd filesystem

This patch makes the NFSd file system superblock be created per network namespace.
This makes it possible to get the proper network namespace from the superblock
instead of using the hard-coded "init_net".

Note: the NFSd fs superblock holds the network namespace. This guarantees that
the network namespace won't disappear from underneath it.
This, obviously, means that if a container's "init" is killed (it is the creator
of the network namespace, but not of the mount namespace), the network namespace
won't be destroyed.

Signed-off-by: Stanislav Kinsbursky <[email protected]>
---
fs/nfsd/nfsctl.c | 46 +++++++++++++++++++++++++++++++++-------------
fs/nfsd/nfssvc.c | 5 ++---
2 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 65889ec..2eb03c1 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -220,6 +220,7 @@ static ssize_t write_unlock_ip(struct file *file, char *buf, size_t size)
struct sockaddr *sap = (struct sockaddr *)&address;
size_t salen = sizeof(address);
char *fo_path;
+ struct net *net = file->f_dentry->d_sb->s_fs_info;

/* sanity check */
if (size == 0)
@@ -232,7 +233,7 @@ static ssize_t write_unlock_ip(struct file *file, char *buf, size_t size)
if (qword_get(&buf, fo_path, size) < 0)
return -EINVAL;

- if (rpc_pton(&init_net, fo_path, size, sap, salen) == 0)
+ if (rpc_pton(net, fo_path, size, sap, salen) == 0)
return -EINVAL;

return nlmsvc_unlock_all_by_ip(sap);
@@ -317,6 +318,7 @@ static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
int len;
struct auth_domain *dom;
struct knfsd_fh fh;
+ struct net *net = file->f_dentry->d_sb->s_fs_info;

if (size == 0)
return -EINVAL;
@@ -352,7 +354,7 @@ static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
if (!dom)
return -ENOMEM;

- len = exp_rootfh(&init_net, dom, path, &fh, maxsize);
+ len = exp_rootfh(net, dom, path, &fh, maxsize);
auth_domain_put(dom);
if (len)
return len;
@@ -396,7 +398,7 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
{
char *mesg = buf;
int rv;
- struct net *net = &init_net;
+ struct net *net = file->f_dentry->d_sb->s_fs_info;

if (size > 0) {
int newthreads;
@@ -447,7 +449,7 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size)
int len;
int npools;
int *nthreads;
- struct net *net = &init_net;
+ struct net *net = file->f_dentry->d_sb->s_fs_info;

mutex_lock(&nfsd_mutex);
npools = nfsd_nrpools(net);
@@ -510,7 +512,7 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size)
unsigned minor;
ssize_t tlen = 0;
char *sep;
- struct net *net = &init_net;
+ struct net *net = file->f_dentry->d_sb->s_fs_info;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);

if (size>0) {
@@ -792,7 +794,7 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size,
static ssize_t write_ports(struct file *file, char *buf, size_t size)
{
ssize_t rv;
- struct net *net = &init_net;
+ struct net *net = file->f_dentry->d_sb->s_fs_info;

mutex_lock(&nfsd_mutex);
rv = __write_ports(file, buf, size, net);
@@ -827,7 +829,7 @@ int nfsd_max_blksize;
static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
{
char *mesg = buf;
- struct net *net = &init_net;
+ struct net *net = file->f_dentry->d_sb->s_fs_info;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);

if (size > 0) {
@@ -923,7 +925,8 @@ static ssize_t nfsd4_write_time(struct file *file, char *buf, size_t size,
*/
static ssize_t write_leasetime(struct file *file, char *buf, size_t size)
{
- struct nfsd_net *nn = net_generic(&init_net, nfsd_net_id);
+ struct net *net = file->f_dentry->d_sb->s_fs_info;
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);
return nfsd4_write_time(file, buf, size, &nn->nfsd4_lease, nn);
}

@@ -939,7 +942,8 @@ static ssize_t write_leasetime(struct file *file, char *buf, size_t size)
*/
static ssize_t write_gracetime(struct file *file, char *buf, size_t size)
{
- struct nfsd_net *nn = net_generic(&init_net, nfsd_net_id);
+ struct net *net = file->f_dentry->d_sb->s_fs_info;
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);
return nfsd4_write_time(file, buf, size, &nn->nfsd4_grace, nn);
}

@@ -995,7 +999,8 @@ static ssize_t __write_recoverydir(struct file *file, char *buf, size_t size,
static ssize_t write_recoverydir(struct file *file, char *buf, size_t size)
{
ssize_t rv;
- struct nfsd_net *nn = net_generic(&init_net, nfsd_net_id);
+ struct net *net = file->f_dentry->d_sb->s_fs_info;
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);

mutex_lock(&nfsd_mutex);
rv = __write_recoverydir(file, buf, size, nn);
@@ -1037,20 +1042,35 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
#endif
/* last one */ {""}
};
- return simple_fill_super(sb, 0x6e667364, nfsd_files);
+ struct net *net = data;
+ int ret;
+
+ ret = simple_fill_super(sb, 0x6e667364, nfsd_files);
+ if (ret)
+ return ret;
+ sb->s_fs_info = get_net(net);
+ return 0;
}

static struct dentry *nfsd_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
- return mount_single(fs_type, flags, data, nfsd_fill_super);
+ return mount_ns(fs_type, flags, current->nsproxy->net_ns, nfsd_fill_super);
+}
+
+static void nfsd_umount(struct super_block *sb)
+{
+ struct net *net = sb->s_fs_info;
+
+ kill_litter_super(sb);
+ put_net(net);
}

static struct file_system_type nfsd_fs_type = {
.owner = THIS_MODULE,
.name = "nfsd",
.mount = nfsd_mount,
- .kill_sb = kill_litter_super,
+ .kill_sb = nfsd_umount,
};

#ifdef CONFIG_PROC_FS
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index cee62ab..9b539c3 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -703,8 +703,7 @@ nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp)
int nfsd_pool_stats_open(struct inode *inode, struct file *file)
{
int ret;
- struct net *net = &init_net;
- struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ struct nfsd_net *nn = net_generic(inode->i_sb->s_fs_info, nfsd_net_id);

mutex_lock(&nfsd_mutex);
if (nn->nfsd_serv == NULL) {
@@ -721,7 +720,7 @@ int nfsd_pool_stats_open(struct inode *inode, struct file *file)
int nfsd_pool_stats_release(struct inode *inode, struct file *file)
{
int ret = seq_release(inode, file);
- struct net *net = &init_net;
+ struct net *net = inode->i_sb->s_fs_info;

mutex_lock(&nfsd_mutex);
/* this function really, really should have been called svc_put() */
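
The reference-holding behaviour described in the note of this patch can be modelled in userspace. This is only a sketch: struct net and struct super_block below are simplified stand-ins, the plain integer counter replaces the kernel's real reference counting, and model_fill_super() / model_umount() are hypothetical names mirroring nfsd_fill_super() and nfsd_umount() from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the fields used here. */
struct net { int count; };
struct super_block { void *s_fs_info; };

/* Userspace models of get_net()/put_net(): a plain counter, no atomics. */
static struct net *get_net(struct net *net)
{
	net->count++;
	return net;
}

static void put_net(struct net *net)
{
	assert(net->count > 0);
	net->count--;
}

/* The namespace is considered alive while anyone holds a reference. */
static int net_alive(const struct net *net)
{
	return net->count > 0;
}

/* Mirrors nfsd_fill_super(): the superblock takes a reference at mount. */
static int model_fill_super(struct super_block *sb, struct net *net)
{
	sb->s_fs_info = get_net(net);
	return 0;
}

/* Mirrors nfsd_umount(): the reference is dropped only at unmount. */
static void model_umount(struct super_block *sb)
{
	struct net *net = sb->s_fs_info;

	sb->s_fs_info = NULL;
	put_net(net);
}
```

In this model, a namespace that starts with one reference (the container's init) and is then mounted stays alive after init drops its reference, and goes away only after model_umount() runs.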


2013-02-18 05:10:58

by Stanislav Kinsbursky

Subject: Re: [PATCH v2 0/4] nfsd: make it work in a container

15.02.2013 20:20, J. Bruce Fields wrote:
> On Fri, Feb 01, 2013 at 03:56:05PM +0300, Stanislav Kinsbursky wrote:
>> This patch set finally enables NFSd in a container.
>> I've tested it in a container with its own root, as well as its own pid, net
>> and mount namespaces.
>
> Thanks, these look fine to me; applying. They should show up in my
> for-3.9 branch sometime today.
>

Thanks!


--
Best regards,
Stanislav Kinsbursky

2013-02-01 12:55:33

by Stanislav Kinsbursky

Subject: [PATCH 4/4] nfsd: enable NFSv4 state in containers

NFSd is now ready to operate in network-namespace-based containers.
So let's drop the check for "init_net" and let it fly.

Signed-off-by: Stanislav Kinsbursky <[email protected]>
---
fs/nfsd/nfs4state.c | 10 ----------
1 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 4db46aa..b7e2bc4 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4914,16 +4914,6 @@ nfs4_state_start_net(struct net *net)
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
int ret;

- /*
- * FIXME: For now, we hang most of the pernet global stuff off of
- * init_net until nfsd is fully containerized. Eventually, we'll
- * need to pass a net pointer into this function, take a reference
- * to that instead and then do most of the rest of this on a per-net
- * basis.
- */
- if (net != &init_net)
- return -EINVAL;
-
ret = nfs4_state_create_net(net);
if (ret)
return ret;


2013-02-01 12:55:25

by Stanislav Kinsbursky

Subject: [PATCH 2/4] nfsd: use proper net while reading "exports" file

The function "exports_open" is used for both the "/proc/fs/nfs/exports" and
"/proc/fs/nfsd/exports" files.
Now that the NFSd filesystem is containerised, the proper net can be taken from
the superblock for a "/proc/fs/nfsd/exports" reader.
But for "/proc/fs/nfs/exports" only current->nsproxy->net_ns can be used.

Signed-off-by: Stanislav Kinsbursky <[email protected]>
---
fs/nfsd/nfsctl.c | 31 +++++++++++++++++++++++++------
1 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 2eb03c1..1825f72 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -125,11 +125,11 @@ static const struct file_operations transaction_ops = {
.llseek = default_llseek,
};

-static int exports_open(struct inode *inode, struct file *file)
+static int exports_net_open(struct net *net, struct file *file)
{
int err;
struct seq_file *seq;
- struct nfsd_net *nn = net_generic(&init_net, nfsd_net_id);
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);

err = seq_open(file, &nfs_exports_op);
if (err)
@@ -140,8 +140,26 @@ static int exports_open(struct inode *inode, struct file *file)
return 0;
}

-static const struct file_operations exports_operations = {
- .open = exports_open,
+static int exports_proc_open(struct inode *inode, struct file *file)
+{
+ return exports_net_open(current->nsproxy->net_ns, file);
+}
+
+static const struct file_operations exports_proc_operations = {
+ .open = exports_proc_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+ .owner = THIS_MODULE,
+};
+
+static int exports_nfsd_open(struct inode *inode, struct file *file)
+{
+ return exports_net_open(inode->i_sb->s_fs_info, file);
+}
+
+static const struct file_operations exports_nfsd_operations = {
+ .open = exports_nfsd_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release,
@@ -1018,7 +1036,7 @@ static ssize_t write_recoverydir(struct file *file, char *buf, size_t size)
static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
{
static struct tree_descr nfsd_files[] = {
- [NFSD_List] = {"exports", &exports_operations, S_IRUGO},
+ [NFSD_List] = {"exports", &exports_nfsd_operations, S_IRUGO},
[NFSD_Export_features] = {"export_features",
&export_features_operations, S_IRUGO},
[NFSD_FO_UnlockIP] = {"unlock_ip",
@@ -1081,7 +1099,8 @@ static int create_proc_exports_entry(void)
entry = proc_mkdir("fs/nfs", NULL);
if (!entry)
return -ENOMEM;
- entry = proc_create("exports", 0, entry, &exports_operations);
+ entry = proc_create("exports", 0, entry,
+ &exports_proc_operations);
if (!entry)
return -ENOMEM;
return 0;
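
The split between the two open paths in this patch can be sketched in userspace. The types are simplified stand-ins, struct task is a hypothetical replacement for current->nsproxy, and open_via_nfsd_fs() / open_via_proc() are illustrative names for exports_nfsd_open() and exports_proc_open(). The model returns the resolved namespace instead of opening a seq_file.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; "name" exists only to tell namespaces apart. */
struct net { const char *name; };
struct super_block { struct net *s_fs_info; };
struct task { struct net *net_ns; };	/* models current->nsproxy->net_ns */

/* Shared helper: operates on whichever namespace the caller resolved. */
static const struct net *exports_net_open(const struct net *net)
{
	assert(net != NULL);
	return net;
}

/* nfsd filesystem path: namespace fixed at mount time via the superblock. */
static const struct net *open_via_nfsd_fs(const struct super_block *sb)
{
	return exports_net_open(sb->s_fs_info);
}

/* procfs path: no per-net superblock, so use the reader's own namespace. */
static const struct net *open_via_proc(const struct task *reader)
{
	return exports_net_open(reader->net_ns);
}
```

The two paths can resolve to different namespaces: a reader living in one namespace that opens an nfsd mount created in another gets the mount's namespace through the first path and its own through the second.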


2013-02-01 12:55:27

by Stanislav Kinsbursky

Subject: [PATCH 3/4] nfsd: disable usermode helper client tracker in container

This tracker uses the khelper kthread to execute binaries.
Execution itself is done from kthread context, i.e. the global root is used.
This is not suitable for containers with their own root.
So disable this tracker in containers for now.

Note: one possible solution is to pass an "init" callback to khelper, which
will swap the root to the desired one.

Signed-off-by: Stanislav Kinsbursky <[email protected]>
---
fs/nfsd/nfs4recover.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index ba6fdd4..e0ae1cf 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -1185,6 +1185,12 @@ bin_to_hex_dup(const unsigned char *src, int srclen)
static int
nfsd4_umh_cltrack_init(struct net __attribute__((unused)) *net)
{
+ /* XXX: The usermode helper is not working in a container yet. */
+ if (net != &init_net) {
+ WARN(1, KERN_ERR "NFSD: attempt to initialize umh client "
+ "tracking in a container!\n");
+ return -EINVAL;
+ }
return nfsd4_umh_cltrack_upcall("init", NULL, NULL);
}



2013-02-15 16:20:23

by J. Bruce Fields

Subject: Re: [PATCH v2 0/4] nfsd: make it work in a container

On Fri, Feb 01, 2013 at 03:56:05PM +0300, Stanislav Kinsbursky wrote:
> This patch set finally enables NFSd in a container.
> I've tested it in a container with its own root, as well as its own pid, net
> and mount namespaces.

Thanks, these look fine to me; applying. They should show up in my
for-3.9 branch sometime today.

--b.
