2010-07-29 19:56:45

by Serge Hallyn

Subject: [PATCH 1/3] cgroup : add clone_children control file

This patch is a response to a previous thread about the ns_cgroup.

https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html

It adds a control file, 'cgroup.clone_children', to each cgroup.
This control file is a boolean specifying whether a child cgroup should
be a clone of its parent cgroup. The default value is 'false', and a
newly created cgroup inherits the value of its parent.

When the flag is set, creating a child cgroup calls the post_clone
callback of every subsystem that provides one.

At present, cpuset is the only subsystem that implements the post_clone
callback.

The option can be set at mount time by specifying the 'clone_children' mount
option.
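To illustrate the intended semantics, here is a minimal user-space model (not kernel code) of what cgroup_create() does with the flag after this patch: the child inherits CGRP_CLONE_CHILDREN from its parent, and only when the parent has it set does a post_clone-style callback copy the parent's settings. The struct model_cgroup type, its cpus/mems strings and the model_* functions are hypothetical stand-ins. The real cpuset post_clone also refuses to copy when a sibling cpuset is cpu- or mem-exclusive; this sketch omits that check.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of a cgroup; NOT the kernel's struct cgroup. */
struct model_cgroup {
    bool clone_children; /* mirrors the CGRP_CLONE_CHILDREN bit */
    char cpus[32];       /* stand-in for cpuset's allowed CPUs */
    char mems[32];       /* stand-in for cpuset's allowed memory nodes */
};

/* Stand-in for a subsystem's post_clone: copy the parent's settings. */
static void model_post_clone(struct model_cgroup *child,
                             const struct model_cgroup *parent)
{
    memcpy(child->cpus, parent->cpus, sizeof child->cpus);
    memcpy(child->mems, parent->mems, sizeof child->mems);
}

/*
 * Mirrors the order used in cgroup_create(): inherit the flag from the
 * parent, then run post_clone only if the parent has clone_children set.
 */
static void model_cgroup_create(struct model_cgroup *child,
                                const struct model_cgroup *parent)
{
    assert(child != NULL && parent != NULL);
    memset(child, 0, sizeof *child); /* a new cgroup starts unconfigured */
    child->clone_children = parent->clone_children;
    if (parent->clone_children)
        model_post_clone(child, parent);
}
```

Without the flag, the child's cpus/mems stay empty, which is why (per the cpuset.c comment) no task can attach to a new cpuset until they are set by hand.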

Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Serge E. Hallyn <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Paul Menage <[email protected]>
---
Documentation/cgroups/cgroups.txt | 14 +++++++++++-
include/linux/cgroup.h | 4 +++
kernel/cgroup.c | 39 +++++++++++++++++++++++++++++++++++++
3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index b34823f..190018b 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -18,7 +18,8 @@ CONTENTS:
1.2 Why are cgroups needed ?
1.3 How are cgroups implemented ?
1.4 What does notify_on_release do ?
- 1.5 How do I use cgroups ?
+ 1.5 What does clone_children do ?
+ 1.6 How do I use cgroups ?
2. Usage Examples and Syntax
2.1 Basic Usage
2.2 Attaching processes
@@ -293,7 +294,16 @@ notify_on_release in the root cgroup at system boot is disabled
value of their parents notify_on_release setting. The default value of
a cgroup hierarchy's release_agent path is empty.

-1.5 How do I use cgroups ?
+1.5 What does clone_children do ?
+---------------------------------
+
+If the clone_children flag is enabled (1) in a cgroup, then all
+cgroups created beneath it will call the post_clone callback of each
+subsystem attached to the newly created cgroup. Usually, when a
+subsystem implements this callback, it copies the values of the
+parent cgroup; this is the case for cpuset.
+
+1.6 How do I use cgroups ?
--------------------------

To start a new job that is to be contained within a cgroup, using
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index e3d00fd..f3cbd73 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -154,6 +154,10 @@ enum {
* A thread in rmdir() is wating for this cgroup.
*/
CGRP_WAIT_ON_RMDIR,
+ /*
+ * Clone cgroup values when creating a new child cgroup
+ */
+ CGRP_CLONE_CHILDREN,
};

/* which pidlist file are we talking about? */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 3ac6f5b..dfbff78 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -229,6 +229,7 @@ inline int cgroup_is_removed(const struct cgroup *cgrp)
/* bits in struct cgroupfs_root flags field */
enum {
ROOT_NOPREFIX, /* mounted subsystems have no named prefix */
+ ROOT_CLONE_CHILDREN, /* mounted subsystems will inherit from parent */
};

static int cgroup_is_releasable(const struct cgroup *cgrp)
@@ -244,6 +245,11 @@ static int notify_on_release(const struct cgroup *cgrp)
return test_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
}

+static int clone_children(const struct cgroup *cgrp)
+{
+ return test_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
+}
+
/*
* for_each_subsys() allows you to iterate on each subsystem attached to
* an active hierarchy
@@ -1038,6 +1044,8 @@ static int cgroup_show_options(struct seq_file *seq, struct vfsmount *vfs)
seq_printf(seq, ",%s", ss->name);
if (test_bit(ROOT_NOPREFIX, &root->flags))
seq_puts(seq, ",noprefix");
+ if (test_bit(ROOT_CLONE_CHILDREN, &root->flags))
+ seq_puts(seq, ",clone_children");
if (strlen(root->release_agent_path))
seq_printf(seq, ",release_agent=%s", root->release_agent_path);
if (strlen(root->name))
@@ -1097,6 +1105,8 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
opts->none = true;
} else if (!strcmp(token, "noprefix")) {
set_bit(ROOT_NOPREFIX, &opts->flags);
+ } else if (!strcmp(token, "clone_children")) {
+ set_bit(ROOT_CLONE_CHILDREN, &opts->flags);
} else if (!strncmp(token, "release_agent=", 14)) {
/* Specifying two release agents is forbidden */
if (opts->release_agent)
@@ -1357,6 +1367,8 @@ static struct cgroupfs_root *cgroup_root_from_opts(struct cgroup_sb_opts *opts)
strcpy(root->release_agent_path, opts->release_agent);
if (opts->name)
strcpy(root->name, opts->name);
+ if (test_bit(ROOT_CLONE_CHILDREN, &opts->flags))
+ set_bit(CGRP_CLONE_CHILDREN, &root->top_cgroup.flags);
return root;
}

@@ -3150,6 +3162,23 @@ fail:
return ret;
}

+static u64 cgroup_clone_children_read(struct cgroup *cgrp,
+ struct cftype *cft)
+{
+ return clone_children(cgrp);
+}
+
+static int cgroup_clone_children_write(struct cgroup *cgrp,
+ struct cftype *cft,
+ u64 val)
+{
+ if (val)
+ set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
+ else
+ clear_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
+ return 0;
+}
+
/*
* for the common functions, 'private' gives the type of file
*/
@@ -3180,6 +3209,11 @@ static struct cftype files[] = {
.write_string = cgroup_write_event_control,
.mode = S_IWUGO,
},
+ {
+ .name = "cgroup.clone_children",
+ .read_u64 = cgroup_clone_children_read,
+ .write_u64 = cgroup_clone_children_write,
+ },
};

static struct cftype cft_release_agent = {
@@ -3309,6 +3343,9 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
if (notify_on_release(parent))
set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);

+ if (clone_children(parent))
+ set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);
+
for_each_subsys(root, ss) {
struct cgroup_subsys_state *css = ss->create(ss, cgrp);

@@ -3323,6 +3360,8 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
goto err_destroy;
}
/* At error, ->destroy() callback has to free assigned ID. */
+ if (clone_children(parent) && ss->post_clone)
+ ss->post_clone(ss, cgrp);
}

cgroup_lock_hierarchy(root);
--
1.7.0.4


2010-07-29 19:57:45

by Serge Hallyn

Subject: [PATCH 2/3] cgroup : make the mount options parsing more accurate

The current code does not detect 'all' combined with a subsystem name,
which is IMHO mutually exclusive. Also, as soon as any option is
specified, even one that is not a subsystem name, the 'all' option has
to be given explicitly alongside it.
e.g.:
not detected : mount -t cgroup -o all,freezer cgroup /cgroup
not flexible : mount -t cgroup -o noprefix,all cgroup /cgroup

This patch fixes both issues and makes the code a bit clearer by
replacing the 'else if' chain with 'continue' blocks in the loop.
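A minimal user-space sketch of the new parsing rules follows, assuming a hypothetical three-entry subsystem table ("cpuset", "freezer", "devices") and a portable next_token() helper standing in for the kernel's strsep(). It returns a subsystem bitmask or a negative errno value. It deliberately omits release_agent=, name=, disabled subsystems and the later validation done by the real parse_cgroupfs_options().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical subsystem table; the kernel iterates its own subsys[] array. */
static const char *const subsys_names[] = { "cpuset", "freezer", "devices" };
#define NSUBSYS ((int)(sizeof subsys_names / sizeof subsys_names[0]))

/* Portable stand-in for strsep(&o, ","): empty fields are preserved. */
static char *next_token(char **o)
{
    char *start = *o, *comma;

    if (start == NULL)
        return NULL;
    comma = strchr(start, ',');
    if (comma) {
        *comma = '\0';
        *o = comma + 1;
    } else {
        *o = NULL;
    }
    return start;
}

/* Returns a bitmask of selected subsystems (>= 0), or -EINVAL / -ENOENT. */
static int parse_opts(const char *data)
{
    char buf[256];
    char *o = NULL, *token;
    bool all_ss = false, one_ss = false, none = false;
    int mask = 0, i;

    assert(data == NULL || strlen(data) < sizeof buf);
    if (data) {                 /* NULL data: fall through to the default */
        strcpy(buf, data);
        o = buf;
    }

    while ((token = next_token(&o)) != NULL) {
        if (*token == '\0')
            return -EINVAL;
        if (strcmp(token, "none") == 0) {
            none = true;
            continue;
        }
        if (strcmp(token, "all") == 0) {
            if (one_ss)         /* subsystem name then 'all': rejected */
                return -EINVAL;
            all_ss = true;
            continue;
        }
        if (strcmp(token, "noprefix") == 0 ||
            strcmp(token, "clone_children") == 0)
            continue;           /* flags only; they select no subsystem */

        for (i = 0; i < NSUBSYS; i++) {
            if (strcmp(token, subsys_names[i]) != 0)
                continue;
            if (all_ss)         /* 'all' then subsystem name: rejected */
                return -EINVAL;
            mask |= 1 << i;
            one_ss = true;
            break;
        }
        if (i == NSUBSYS)
            return -ENOENT;     /* unknown option */
    }

    /* 'all' given, or no subsystem/'none' selection at all: pick everything. */
    if (all_ss || (!one_ss && !none))
        mask = (1 << NSUBSYS) - 1;
    return mask;
}
```

With this sketch, parse_opts("all,freezer") yields -EINVAL, while parse_opts("noprefix") and parse_opts(NULL) both select every subsystem, which is the behaviour the commit message asks for.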

Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Serge E. Hallyn <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Paul Menage <[email protected]>
---
kernel/cgroup.c | 91 +++++++++++++++++++++++++++++++++++++------------------
1 files changed, 61 insertions(+), 30 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index dfbff78..09fb6f9 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1074,7 +1074,8 @@ struct cgroup_sb_opts {
*/
static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
{
- char *token, *o = data ?: "all";
+ char *token, *o = data;
+ bool all_ss = false, one_ss = false;
unsigned long mask = (unsigned long)-1;
int i;
bool module_pin_failed = false;
@@ -1088,26 +1089,30 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
memset(opts, 0, sizeof(*opts));

while ((token = strsep(&o, ",")) != NULL) {
+
if (!*token)
return -EINVAL;
- if (!strcmp(token, "all")) {
- /* Add all non-disabled subsystems */
- opts->subsys_bits = 0;
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
- struct cgroup_subsys *ss = subsys[i];
- if (ss == NULL)
- continue;
- if (!ss->disabled)
- opts->subsys_bits |= 1ul << i;
- }
- } else if (!strcmp(token, "none")) {
+ if (!strcmp(token, "none")) {
/* Explicitly have no subsystems */
opts->none = true;
- } else if (!strcmp(token, "noprefix")) {
+ continue;
+ }
+ if (!strcmp(token, "all")) {
+ /* Mutually exclusive option 'all' + subsystem name */
+ if (one_ss)
+ return -EINVAL;
+ all_ss = true;
+ continue;
+ }
+ if (!strcmp(token, "noprefix")) {
set_bit(ROOT_NOPREFIX, &opts->flags);
- } else if (!strcmp(token, "clone_children")) {
+ continue;
+ }
+ if (!strcmp(token, "clone_children")) {
set_bit(ROOT_CLONE_CHILDREN, &opts->flags);
- } else if (!strncmp(token, "release_agent=", 14)) {
+ continue;
+ }
+ if (!strncmp(token, "release_agent=", 14)) {
/* Specifying two release agents is forbidden */
if (opts->release_agent)
return -EINVAL;
@@ -1115,7 +1120,9 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
kstrndup(token + 14, PATH_MAX, GFP_KERNEL);
if (!opts->release_agent)
return -ENOMEM;
- } else if (!strncmp(token, "name=", 5)) {
+ continue;
+ }
+ if (!strncmp(token, "name=", 5)) {
const char *name = token + 5;
/* Can't specify an empty name */
if (!strlen(name))
@@ -1137,20 +1144,44 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
GFP_KERNEL);
if (!opts->name)
return -ENOMEM;
- } else {
- struct cgroup_subsys *ss;
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
- ss = subsys[i];
- if (ss == NULL)
- continue;
- if (!strcmp(token, ss->name)) {
- if (!ss->disabled)
- set_bit(i, &opts->subsys_bits);
- break;
- }
- }
- if (i == CGROUP_SUBSYS_COUNT)
- return -ENOENT;
+
+ continue;
+ }
+
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ struct cgroup_subsys *ss = subsys[i];
+ if (ss == NULL)
+ continue;
+ if (strcmp(token, ss->name))
+ continue;
+ if (ss->disabled)
+ continue;
+
+ /* Mutually exclusive option 'all' + subsystem name */
+ if (all_ss)
+ return -EINVAL;
+ set_bit(i, &opts->subsys_bits);
+ one_ss = true;
+
+ break;
+ }
+ if (i == CGROUP_SUBSYS_COUNT)
+ return -ENOENT;
+ }
+
+ /*
+ * If the 'all' option was specified, select all the subsystems;
+ * otherwise, if none of 'all', 'none' or a subsystem name was
+ * specified, default to 'all'.
+ */
+ if (all_ss || (!all_ss && !one_ss && !opts->none)) {
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ struct cgroup_subsys *ss = subsys[i];
+ if (ss == NULL)
+ continue;
+ if (ss->disabled)
+ continue;
+ set_bit(i, &opts->subsys_bits);
}
}

--
1.7.0.4

2010-07-29 19:58:27

by Serge Hallyn

Subject: [PATCH 3/3] cgroup : remove the ns_cgroup

The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier.

For example, a single process cannot create a large number of namespaces
without interacting with this cgroup, and namespace creation time grows
exponentially because of the nested cgroup directory depth
(e.g. /cgroup/<pid>/.../<pid>/...).

This was spotted when a single process created many network namespaces.
The goal was 4096 network namespaces, but at 820 netns creation had
become dramatically slow: creating one namespace went from 10 msec to
10 sec. After five hours, the target number of netns had still not been
reached. Without the ns_cgroup interaction, 4K netns are created in
2 minutes.

The workaround is to mount the cgroup hierarchy with all the subsystems
except ns_cgroup. This is a little weird and hard to manage from an
administration point of view, because we have to know which cgroup
subsystems are available on the system and we cannot simply do
'mount -t cgroup cgroup /cgroup'.

With the 'clone_children' parameter added to cgroups by patch 1/3, we
should be able to remove the ns_cgroup and instead create the cgroup and
add a task to it manually, consistently with the rest of the subsystems.
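As a sketch of that manual replacement for the automatic ns_cgroup directory, the helper below creates a child cgroup and moves a task into it by writing its PID to the cgroup v1 'tasks' file. The helper name cgroup_create_and_attach() and the mount point used in the example are assumptions for illustration. With 'clone_children' set on the parent, the mkdir() is the point where subsystems such as cpuset would get their post_clone callback.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: create <parent_dir>/<name> and move 'pid' into it
 * by writing the PID to the new cgroup's 'tasks' file (cgroup v1).
 * Returns 0 on success, -1 on failure (errno is left by the failing call).
 */
static int cgroup_create_and_attach(const char *parent_dir, const char *name,
                                    long pid)
{
    char path[4096];
    FILE *f;
    int n;

    assert(parent_dir != NULL && name != NULL);

    n = snprintf(path, sizeof path, "%s/%s", parent_dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    /* On cgroupfs, mkdir() creates the cgroup; with clone_children set on
     * the parent, this is where post_clone callbacks would run. */
    if (mkdir(path, 0755) != 0)
        return -1;

    n = snprintf(path, sizeof path, "%s/%s/tasks", parent_dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    /* Output is buffered, so a rejected write may only surface at fclose(). */
    if (fprintf(f, "%ld\n", pid) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

For example, after 'mount -t cgroup -o clone_children cgroup /cgroup', a container manager might call cgroup_create_and_attach("/cgroup", "ns1", (long)getpid()) right after creating its namespaces (getpid() needs <unistd.h>).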

This patch removes the ns_cgroup as suggested in the following thread:

https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

The 'cgroup_clone' function is removed because it is no longer used.

Changelog: Jul 29 (seh): remove references to ns_cgroup_clone(), fix up
some documentation, and remove CONFIG_CGROUP_NS references.

Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Serge E. Hallyn <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
---
Documentation/cgroups/cgroups.txt | 2 +-
arch/ia64/configs/generic_defconfig | 1 -
arch/mips/configs/bcm47xx_defconfig | 1 -
arch/mips/configs/ip27_defconfig | 1 -
arch/mips/configs/sb1250-swarm_defconfig | 1 -
arch/powerpc/configs/cell_defconfig | 1 -
arch/powerpc/configs/ppc64_defconfig | 1 -
arch/powerpc/configs/ppc64e_defconfig | 1 -
arch/powerpc/configs/ppc6xx_defconfig | 1 -
arch/powerpc/configs/pseries_defconfig | 1 -
arch/s390/defconfig | 1 -
arch/sh/configs/sdk7786_defconfig | 1 -
arch/sh/configs/se7206_defconfig | 1 -
arch/sh/configs/sh7724_generic_defconfig | 1 -
arch/sh/configs/sh7770_generic_defconfig | 1 -
arch/sh/configs/shx3_defconfig | 1 -
arch/sh/configs/urquell_defconfig | 1 -
arch/x86/configs/i386_defconfig | 1 -
arch/x86/configs/x86_64_defconfig | 1 -
include/linux/cgroup.h | 3 -
include/linux/cgroup_subsys.h | 6 --
include/linux/nsproxy.h | 9 ---
init/Kconfig | 9 ---
kernel/Makefile | 1 -
kernel/cgroup.c | 116 ------------------------------
kernel/cpuset.c | 7 +-
kernel/fork.c | 6 --
kernel/ns_cgroup.c | 110 ----------------------------
kernel/nsproxy.c | 4 -
29 files changed, 4 insertions(+), 287 deletions(-)
delete mode 100644 kernel/ns_cgroup.c

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 190018b..6a5ba63 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -618,7 +618,7 @@ always handled well.
void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
(cgroup_mutex held by caller)

-Called at the end of cgroup_clone() to do any parameter
+Called during cgroup_create() to do any parameter
initialization which might be required before a task could attach. For
example in cpusets, no task may attach before 'cpus' and 'mems' are set
up.
diff --git a/arch/ia64/configs/generic_defconfig b/arch/ia64/configs/generic_defconfig
index 6a4cc50..d257546 100644
--- a/arch/ia64/configs/generic_defconfig
+++ b/arch/ia64/configs/generic_defconfig
@@ -25,7 +25,6 @@ CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=20
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-# CONFIG_CGROUP_NS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CPUSETS=y
diff --git a/arch/mips/configs/bcm47xx_defconfig b/arch/mips/configs/bcm47xx_defconfig
index bbd826b..cb75ade 100644
--- a/arch/mips/configs/bcm47xx_defconfig
+++ b/arch/mips/configs/bcm47xx_defconfig
@@ -199,7 +199,6 @@ CONFIG_TINY_RCU=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-CONFIG_CGROUP_NS=y
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CPUSETS is not set
diff --git a/arch/mips/configs/ip27_defconfig b/arch/mips/configs/ip27_defconfig
index 84b6503..036ad3e 100644
--- a/arch/mips/configs/ip27_defconfig
+++ b/arch/mips/configs/ip27_defconfig
@@ -209,7 +209,6 @@ CONFIG_LOG_BUF_SHIFT=15
# CONFIG_GROUP_SCHED is not set
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-# CONFIG_CGROUP_NS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CPUSETS=y
diff --git a/arch/mips/configs/sb1250-swarm_defconfig b/arch/mips/configs/sb1250-swarm_defconfig
index 7f07bf0..0f059f2 100644
--- a/arch/mips/configs/sb1250-swarm_defconfig
+++ b/arch/mips/configs/sb1250-swarm_defconfig
@@ -197,7 +197,6 @@ CONFIG_SYSVIPC_SYSCTL=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-# CONFIG_CGROUP_NS is not set
CONFIG_CPUSETS=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
diff --git a/arch/powerpc/configs/cell_defconfig b/arch/powerpc/configs/cell_defconfig
index 9433719..cfa9177 100644
--- a/arch/powerpc/configs/cell_defconfig
+++ b/arch/powerpc/configs/cell_defconfig
@@ -76,7 +76,6 @@ CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-# CONFIG_CGROUP_NS is not set
CONFIG_CPUSETS=y
# CONFIG_GROUP_SCHED is not set
# CONFIG_USER_SCHED is not set
diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index 369f4e0..2bc8f44 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -86,7 +86,6 @@ CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-# CONFIG_CGROUP_NS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CPUSETS=y
diff --git a/arch/powerpc/configs/ppc64e_defconfig b/arch/powerpc/configs/ppc64e_defconfig
index 403e82e..ac0f13d 100644
--- a/arch/powerpc/configs/ppc64e_defconfig
+++ b/arch/powerpc/configs/ppc64e_defconfig
@@ -100,7 +100,6 @@ CONFIG_LOG_BUF_SHIFT=17
# CONFIG_GROUP_SCHED is not set
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-# CONFIG_CGROUP_NS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CPUSETS=y
diff --git a/arch/powerpc/configs/ppc6xx_defconfig b/arch/powerpc/configs/ppc6xx_defconfig
index 12dc7c4..8175eba 100644
--- a/arch/powerpc/configs/ppc6xx_defconfig
+++ b/arch/powerpc/configs/ppc6xx_defconfig
@@ -88,7 +88,6 @@ CONFIG_AUDIT_TREE=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-CONFIG_CGROUP_NS=y
# CONFIG_CGROUP_FREEZER is not set
CONFIG_CGROUP_DEVICE=y
CONFIG_GROUP_SCHED=y
diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index 16ae717..6d96b61 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -85,7 +85,6 @@ CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
diff --git a/arch/s390/defconfig b/arch/s390/defconfig
index 253f158..08e9f82 100644
--- a/arch/s390/defconfig
+++ b/arch/s390/defconfig
@@ -72,7 +72,6 @@ CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-CONFIG_CGROUP_NS=y
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CPUSETS is not set
diff --git a/arch/sh/configs/sdk7786_defconfig b/arch/sh/configs/sdk7786_defconfig
index 2698245..0f6894e 100644
--- a/arch/sh/configs/sdk7786_defconfig
+++ b/arch/sh/configs/sdk7786_defconfig
@@ -84,7 +84,6 @@ CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
diff --git a/arch/sh/configs/se7206_defconfig b/arch/sh/configs/se7206_defconfig
index 910eaec..d2c0fc8 100644
--- a/arch/sh/configs/se7206_defconfig
+++ b/arch/sh/configs/se7206_defconfig
@@ -77,7 +77,6 @@ CONFIG_TREE_RCU_TRACE=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
-CONFIG_CGROUP_NS=y
# CONFIG_CGROUP_FREEZER is not set
CONFIG_CGROUP_DEVICE=y
# CONFIG_CPUSETS is not set
diff --git a/arch/sh/configs/sh7724_generic_defconfig b/arch/sh/configs/sh7724_generic_defconfig
index a6a9e68..c3a2f67 100644
--- a/arch/sh/configs/sh7724_generic_defconfig
+++ b/arch/sh/configs/sh7724_generic_defconfig
@@ -70,7 +70,6 @@ CONFIG_RCU_FANOUT=32
CONFIG_LOG_BUF_SHIFT=17
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-# CONFIG_CGROUP_NS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CPUSETS is not set
diff --git a/arch/sh/configs/sh7770_generic_defconfig b/arch/sh/configs/sh7770_generic_defconfig
index 4327f89..ce2357f 100644
--- a/arch/sh/configs/sh7770_generic_defconfig
+++ b/arch/sh/configs/sh7770_generic_defconfig
@@ -69,7 +69,6 @@ CONFIG_RCU_FANOUT=32
CONFIG_LOG_BUF_SHIFT=17
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-# CONFIG_CGROUP_NS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CPUSETS is not set
diff --git a/arch/sh/configs/shx3_defconfig b/arch/sh/configs/shx3_defconfig
index 42f6bd3..05ced6a 100644
--- a/arch/sh/configs/shx3_defconfig
+++ b/arch/sh/configs/shx3_defconfig
@@ -84,7 +84,6 @@ CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
# CONFIG_CPUSETS is not set
diff --git a/arch/sh/configs/urquell_defconfig b/arch/sh/configs/urquell_defconfig
index 28bb19d..e698dff 100644
--- a/arch/sh/configs/urquell_defconfig
+++ b/arch/sh/configs/urquell_defconfig
@@ -82,7 +82,6 @@ CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
index d28fad1..4428d5c 100644
--- a/arch/x86/configs/i386_defconfig
+++ b/arch/x86/configs/i386_defconfig
@@ -105,7 +105,6 @@ CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CPUSETS=y
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index 6c86acd..2ff947f 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -105,7 +105,6 @@ CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
-CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CPUSETS=y
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index f3cbd73..ddbdb77 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -552,9 +552,6 @@ static inline struct cgroup* task_cgroup(struct task_struct *task,
return task_subsys_state(task, subsys_id)->cgroup;
}

-int cgroup_clone(struct task_struct *tsk, struct cgroup_subsys *ss,
- char *nodename);
-
/* A cgroup_iter should be treated as an opaque object */
struct cgroup_iter {
struct list_head *cg_link;
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index ccefff0..4ba5259 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -19,12 +19,6 @@ SUBSYS(debug)

/* */

-#ifdef CONFIG_CGROUP_NS
-SUBSYS(ns)
-#endif
-
-/* */
-
#ifdef CONFIG_CGROUP_SCHED
SUBSYS(cpu_cgroup)
#endif
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index 7b370c7..50d20ab 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -81,13 +81,4 @@ static inline void get_nsproxy(struct nsproxy *ns)
atomic_inc(&ns->count);
}

-#ifdef CONFIG_CGROUP_NS
-int ns_cgroup_clone(struct task_struct *tsk, struct pid *pid);
-#else
-static inline int ns_cgroup_clone(struct task_struct *tsk, struct pid *pid)
-{
- return 0;
-}
-#endif
-
#endif
diff --git a/init/Kconfig b/init/Kconfig
index 5cff9a9..1124656 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -496,15 +496,6 @@ config CGROUP_DEBUG

Say N if unsure.

-config CGROUP_NS
- bool "Namespace cgroup subsystem"
- depends on CGROUPS
- help
- Provides a simple namespace cgroup subsystem to
- provide hierarchical naming of sets of namespaces,
- for instance virtual servers and checkpoint/restart
- jobs.
-
config CGROUP_FREEZER
bool "Freezer cgroup subsystem"
depends on CGROUPS
diff --git a/kernel/Makefile b/kernel/Makefile
index 057472f..a7ee5f4 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -62,7 +62,6 @@ obj-$(CONFIG_COMPAT) += compat.o
obj-$(CONFIG_CGROUPS) += cgroup.o
obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
obj-$(CONFIG_CPUSETS) += cpuset.o
-obj-$(CONFIG_CGROUP_NS) += ns_cgroup.o
obj-$(CONFIG_UTS_NS) += utsname.o
obj-$(CONFIG_USER_NS) += user_namespace.o
obj-$(CONFIG_PID_NS) += pid_namespace.o
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 09fb6f9..7ec5bad 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4212,122 +4212,6 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
}

/**
- * cgroup_clone - clone the cgroup the given subsystem is attached to
- * @tsk: the task to be moved
- * @subsys: the given subsystem
- * @nodename: the name for the new cgroup
- *
- * Duplicate the current cgroup in the hierarchy that the given
- * subsystem is attached to, and move this task into the new
- * child.
- */
-int cgroup_clone(struct task_struct *tsk, struct cgroup_subsys *subsys,
- char *nodename)
-{
- struct dentry *dentry;
- int ret = 0;
- struct cgroup *parent, *child;
- struct inode *inode;
- struct css_set *cg;
- struct cgroupfs_root *root;
- struct cgroup_subsys *ss;
-
- /* We shouldn't be called by an unregistered subsystem */
- BUG_ON(!subsys->active);
-
- /* First figure out what hierarchy and cgroup we're dealing
- * with, and pin them so we can drop cgroup_mutex */
- mutex_lock(&cgroup_mutex);
- again:
- root = subsys->root;
- if (root == &rootnode) {
- mutex_unlock(&cgroup_mutex);
- return 0;
- }
-
- /* Pin the hierarchy */
- if (!atomic_inc_not_zero(&root->sb->s_active)) {
- /* We race with the final deactivate_super() */
- mutex_unlock(&cgroup_mutex);
- return 0;
- }
-
- /* Keep the cgroup alive */
- task_lock(tsk);
- parent = task_cgroup(tsk, subsys->subsys_id);
- cg = tsk->cgroups;
- get_css_set(cg);
- task_unlock(tsk);
-
- mutex_unlock(&cgroup_mutex);
-
- /* Now do the VFS work to create a cgroup */
- inode = parent->dentry->d_inode;
-
- /* Hold the parent directory mutex across this operation to
- * stop anyone else deleting the new cgroup */
- mutex_lock(&inode->i_mutex);
- dentry = lookup_one_len(nodename, parent->dentry, strlen(nodename));
- if (IS_ERR(dentry)) {
- printk(KERN_INFO
- "cgroup: Couldn't allocate dentry for %s: %ld\n", nodename,
- PTR_ERR(dentry));
- ret = PTR_ERR(dentry);
- goto out_release;
- }
-
- /* Create the cgroup directory, which also creates the cgroup */
- ret = vfs_mkdir(inode, dentry, 0755);
- child = __d_cgrp(dentry);
- dput(dentry);
- if (ret) {
- printk(KERN_INFO
- "Failed to create cgroup %s: %d\n", nodename,
- ret);
- goto out_release;
- }
-
- /* The cgroup now exists. Retake cgroup_mutex and check
- * that we're still in the same state that we thought we
- * were. */
- mutex_lock(&cgroup_mutex);
- if ((root != subsys->root) ||
- (parent != task_cgroup(tsk, subsys->subsys_id))) {
- /* Aargh, we raced ... */
- mutex_unlock(&inode->i_mutex);
- put_css_set(cg);
-
- deactivate_super(root->sb);
- /* The cgroup is still accessible in the VFS, but
- * we're not going to try to rmdir() it at this
- * point. */
- printk(KERN_INFO
- "Race in cgroup_clone() - leaking cgroup %s\n",
- nodename);
- goto again;
- }
-
- /* do any required auto-setup */
- for_each_subsys(root, ss) {
- if (ss->post_clone)
- ss->post_clone(ss, child);
- }
-
- /* All seems fine. Finish by moving the task into the new cgroup */
- ret = cgroup_attach_task(child, tsk);
- mutex_unlock(&cgroup_mutex);
-
- out_release:
- mutex_unlock(&inode->i_mutex);
-
- mutex_lock(&cgroup_mutex);
- put_css_set(cg);
- mutex_unlock(&cgroup_mutex);
- deactivate_super(root->sb);
- return ret;
-}
-
-/**
* cgroup_is_descendant - see if @cgrp is a descendant of @task's cgrp
* @cgrp: the cgroup in question
* @task: the task in question
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 02b9611..4613840 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1829,10 +1829,9 @@ static int cpuset_populate(struct cgroup_subsys *ss, struct cgroup *cont)
}

/*
- * post_clone() is called at the end of cgroup_clone().
- * 'cgroup' was just created automatically as a result of
- * a cgroup_clone(), and the current task is about to
- * be moved into 'cgroup'.
+ * post_clone() is called during cgroup_create() when the
+ * clone_children mount argument was specified. The cgroup
+ * can not yet have any tasks.
*
* Currently we refuse to set up the cgroup - thereby
* refusing the task to be entered, and as a result refusing
diff --git a/kernel/fork.c b/kernel/fork.c
index b6cce14..c391b1d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1167,12 +1167,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
if (clone_flags & CLONE_THREAD)
p->tgid = current->tgid;

- if (current->nsproxy != p->nsproxy) {
- retval = ns_cgroup_clone(p, pid);
- if (retval)
- goto bad_fork_free_pid;
- }
-
p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
/*
* Clear TID on mm_release()?
diff --git a/kernel/ns_cgroup.c b/kernel/ns_cgroup.c
deleted file mode 100644
index 2a5dfec..0000000
--- a/kernel/ns_cgroup.c
+++ /dev/null
@@ -1,110 +0,0 @@
-/*
- * ns_cgroup.c - namespace cgroup subsystem
- *
- * Copyright 2006, 2007 IBM Corp
- */
-
-#include <linux/module.h>
-#include <linux/cgroup.h>
-#include <linux/fs.h>
-#include <linux/proc_fs.h>
-#include <linux/slab.h>
-#include <linux/nsproxy.h>
-
-struct ns_cgroup {
- struct cgroup_subsys_state css;
-};
-
-struct cgroup_subsys ns_subsys;
-
-static inline struct ns_cgroup *cgroup_to_ns(
- struct cgroup *cgroup)
-{
- return container_of(cgroup_subsys_state(cgroup, ns_subsys_id),
- struct ns_cgroup, css);
-}
-
-int ns_cgroup_clone(struct task_struct *task, struct pid *pid)
-{
- char name[PROC_NUMBUF];
-
- snprintf(name, PROC_NUMBUF, "%d", pid_vnr(pid));
- return cgroup_clone(task, &ns_subsys, name);
-}
-
-/*
- * Rules:
- * 1. you can only enter a cgroup which is a descendant of your current
- * cgroup
- * 2. you can only place another process into a cgroup if
- * a. you have CAP_SYS_ADMIN
- * b. your cgroup is an ancestor of task's destination cgroup
- * (hence either you are in the same cgroup as task, or in an
- * ancestor cgroup thereof)
- */
-static int ns_can_attach(struct cgroup_subsys *ss, struct cgroup *new_cgroup,
- struct task_struct *task, bool threadgroup)
-{
- if (current != task) {
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
-
- if (!cgroup_is_descendant(new_cgroup, current))
- return -EPERM;
- }
-
- if (!cgroup_is_descendant(new_cgroup, task))
- return -EPERM;
-
- if (threadgroup) {
- struct task_struct *c;
- rcu_read_lock();
- list_for_each_entry_rcu(c, &task->thread_group, thread_group) {
- if (!cgroup_is_descendant(new_cgroup, c)) {
- rcu_read_unlock();
- return -EPERM;
- }
- }
- rcu_read_unlock();
- }
-
- return 0;
-}
-
-/*
- * Rules: you can only create a cgroup if
- * 1. you are capable(CAP_SYS_ADMIN)
- * 2. the target cgroup is a descendant of your own cgroup
- */
-static struct cgroup_subsys_state *ns_create(struct cgroup_subsys *ss,
- struct cgroup *cgroup)
-{
- struct ns_cgroup *ns_cgroup;
-
- if (!capable(CAP_SYS_ADMIN))
- return ERR_PTR(-EPERM);
- if (!cgroup_is_descendant(cgroup, current))
- return ERR_PTR(-EPERM);
-
- ns_cgroup = kzalloc(sizeof(*ns_cgroup), GFP_KERNEL);
- if (!ns_cgroup)
- return ERR_PTR(-ENOMEM);
- return &ns_cgroup->css;
-}
-
-static void ns_destroy(struct cgroup_subsys *ss,
- struct cgroup *cgroup)
-{
- struct ns_cgroup *ns_cgroup;
-
- ns_cgroup = cgroup_to_ns(cgroup);
- kfree(ns_cgroup);
-}
-
-struct cgroup_subsys ns_subsys = {
- .name = "ns",
- .can_attach = ns_can_attach,
- .create = ns_create,
- .destroy = ns_destroy,
- .subsys_id = ns_subsys_id,
-};
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f74e6c0..014a90d 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -198,10 +198,6 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
goto out;
}

- err = ns_cgroup_clone(current, task_pid(current));
- if (err)
- put_nsproxy(*new_nsp);
-
out:
return err;
}
--
1.7.0.4

2010-07-29 21:40:21

by Matt Helsley

Subject: Re: [PATCH 3/3] cgroup : remove the ns_cgroup

On Thu, Jul 29, 2010 at 02:58:12PM -0500, Serge E. Hallyn wrote:
> The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier.
>
> For example, a single process can not handle a big amount of namespaces
> without interacting with this cgroup and falling in an exponential creation
> time due to the nested cgroup directory depth (eg. /cgroup/<pid>/.../<pid>/...).
>
> This was spotted when a single process created multiple network namespaces.
> The objective was 4096 network namespaces, but at 820 netns creation had become
> dramatically slow: the creation time for a namespace had increased from 10 msec
> to 10 sec. After five hours, the expected number of netns had not been reached.
> Without the ns_cgroup interaction, 4K netns are created in 2 minutes.

Is this problem related to Andrew's post here re:

[Bugme-new] [Bug 16417] New: Slow context switches with SMP and CONFIG_FAIR_GROUP_SCHED

>
> To work around that, we have to mount the cgroup with all the subsystems
> except the ns_cgroup. That is a little weird and hard to manage from an
> administration point of view, because we have to know which cgroup subsystems
> are available on the system and we can't do a simple 'mount -t cgroup cgroup /cgroup'.
>
> With the previous patch, which adds a 'clone_children' parameter to a cgroup,
> we should be able to remove the ns_cgroup and manually create the cgroup and
> add a task to it, consistently with the rest of the subsystems.
>
> This patch removes the ns_cgroup as suggested in the following thread:
>
> https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html
>
> The 'cgroup_clone' function is removed because it is no longer used.
>
> Changelog: Jul 29 (seh): remove references to ns_cgroup_clone(), fix up
> some documentation, and remove CONFIG_CGROUP_NS references.
>
> Signed-off-by: Daniel Lezcano <[email protected]>
> Signed-off-by: Serge E. Hallyn <[email protected]>
> Cc: Eric W. Biederman <[email protected]>
> Cc: Paul Menage <[email protected]>
> Cc: Jamal Hadi Salim <[email protected]>

Good riddance to cgroup_clone(). I seem to recall it required some
fairly nasty code contortions and only the ns cgroup needed/used it.

Acked-by: Matt Helsley <[email protected]>

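The commit message contrasts mounting every subsystem except ns with a plain 'mount -t cgroup cgroup /cgroup'. Once ns is gone the plain form works again, and PATCH 1/3 lets clone_children be requested at mount time. A minimal sketch using the mount(2) system call follows, assuming cgroup v1 semantics where an option string that names no subsystem mounts all available ones. The helper names cgroup_mount_opts() and mount_cgroup() are hypothetical; only the option-string builder can be exercised without privileges, since the mount itself needs CAP_SYS_ADMIN.

```c
#include <stdio.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: build the mount data string. @subsys is a
 * comma-separated subsystem list such as "cpuset", or NULL/"" for all
 * subsystems. Returns 0 on success, -1 if @buf is too small.
 */
static int cgroup_mount_opts(char *buf, size_t len, const char *subsys,
			     int clone_children)
{
	const char *flag = clone_children ? "clone_children" : "";
	int n;

	if (subsys && *subsys)
		n = snprintf(buf, len, "%s%s%s", subsys,
			     clone_children ? "," : "", flag);
	else
		n = snprintf(buf, len, "%s", flag);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Roughly 'mount -t cgroup -o <opts> cgroup <target>'; needs CAP_SYS_ADMIN. */
static int mount_cgroup(const char *target, const char *subsys,
			int clone_children)
{
	char opts[256];

	if (cgroup_mount_opts(opts, sizeof(opts), subsys, clone_children))
		return -1;
	return mount("cgroup", target, "cgroup", 0, opts);
}
```

For example, mount_cgroup("/cgroup", NULL, 1) would correspond to 'mount -t cgroup -o clone_children cgroup /cgroup'.
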
> ---
> Documentation/cgroups/cgroups.txt | 2 +-
> arch/ia64/configs/generic_defconfig | 1 -
> arch/mips/configs/bcm47xx_defconfig | 1 -
> arch/mips/configs/ip27_defconfig | 1 -
> arch/mips/configs/sb1250-swarm_defconfig | 1 -
> arch/powerpc/configs/cell_defconfig | 1 -
> arch/powerpc/configs/ppc64_defconfig | 1 -
> arch/powerpc/configs/ppc64e_defconfig | 1 -
> arch/powerpc/configs/ppc6xx_defconfig | 1 -
> arch/powerpc/configs/pseries_defconfig | 1 -
> arch/s390/defconfig | 1 -
> arch/sh/configs/sdk7786_defconfig | 1 -
> arch/sh/configs/se7206_defconfig | 1 -
> arch/sh/configs/sh7724_generic_defconfig | 1 -
> arch/sh/configs/sh7770_generic_defconfig | 1 -
> arch/sh/configs/shx3_defconfig | 1 -
> arch/sh/configs/urquell_defconfig | 1 -
> arch/x86/configs/i386_defconfig | 1 -
> arch/x86/configs/x86_64_defconfig | 1 -
> include/linux/cgroup.h | 3 -
> include/linux/cgroup_subsys.h | 6 --
> include/linux/nsproxy.h | 9 ---
> init/Kconfig | 9 ---
> kernel/Makefile | 1 -
> kernel/cgroup.c | 116 ------------------------------
> kernel/cpuset.c | 7 +-
> kernel/fork.c | 6 --
> kernel/ns_cgroup.c | 110 ----------------------------
> kernel/nsproxy.c | 4 -
> 29 files changed, 4 insertions(+), 287 deletions(-)
> delete mode 100644 kernel/ns_cgroup.c
>
> diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
> index 190018b..6a5ba63 100644
> --- a/Documentation/cgroups/cgroups.txt
> +++ b/Documentation/cgroups/cgroups.txt
> @@ -618,7 +618,7 @@ always handled well.
> void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
> (cgroup_mutex held by caller)
>
> -Called at the end of cgroup_clone() to do any parameter
> +Called during cgroup_create() to do any parameter
> initialization which might be required before a task could attach. For
> example in cpusets, no task may attach before 'cpus' and 'mems' are set
> up.
> diff --git a/arch/ia64/configs/generic_defconfig b/arch/ia64/configs/generic_defconfig
> index 6a4cc50..d257546 100644
> --- a/arch/ia64/configs/generic_defconfig
> +++ b/arch/ia64/configs/generic_defconfig
> @@ -25,7 +25,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=20
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> CONFIG_CPUSETS=y
> diff --git a/arch/mips/configs/bcm47xx_defconfig b/arch/mips/configs/bcm47xx_defconfig
> index bbd826b..cb75ade 100644
> --- a/arch/mips/configs/bcm47xx_defconfig
> +++ b/arch/mips/configs/bcm47xx_defconfig
> @@ -199,7 +199,6 @@ CONFIG_TINY_RCU=y
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> # CONFIG_CPUSETS is not set
> diff --git a/arch/mips/configs/ip27_defconfig b/arch/mips/configs/ip27_defconfig
> index 84b6503..036ad3e 100644
> --- a/arch/mips/configs/ip27_defconfig
> +++ b/arch/mips/configs/ip27_defconfig
> @@ -209,7 +209,6 @@ CONFIG_LOG_BUF_SHIFT=15
> # CONFIG_GROUP_SCHED is not set
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> CONFIG_CPUSETS=y
> diff --git a/arch/mips/configs/sb1250-swarm_defconfig b/arch/mips/configs/sb1250-swarm_defconfig
> index 7f07bf0..0f059f2 100644
> --- a/arch/mips/configs/sb1250-swarm_defconfig
> +++ b/arch/mips/configs/sb1250-swarm_defconfig
> @@ -197,7 +197,6 @@ CONFIG_SYSVIPC_SYSCTL=y
> CONFIG_LOG_BUF_SHIFT=15
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> CONFIG_CPUSETS=y
> CONFIG_GROUP_SCHED=y
> CONFIG_FAIR_GROUP_SCHED=y
> diff --git a/arch/powerpc/configs/cell_defconfig b/arch/powerpc/configs/cell_defconfig
> index 9433719..cfa9177 100644
> --- a/arch/powerpc/configs/cell_defconfig
> +++ b/arch/powerpc/configs/cell_defconfig
> @@ -76,7 +76,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=15
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> CONFIG_CPUSETS=y
> # CONFIG_GROUP_SCHED is not set
> # CONFIG_USER_SCHED is not set
> diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
> index 369f4e0..2bc8f44 100644
> --- a/arch/powerpc/configs/ppc64_defconfig
> +++ b/arch/powerpc/configs/ppc64_defconfig
> @@ -86,7 +86,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> CONFIG_CPUSETS=y
> diff --git a/arch/powerpc/configs/ppc64e_defconfig b/arch/powerpc/configs/ppc64e_defconfig
> index 403e82e..ac0f13d 100644
> --- a/arch/powerpc/configs/ppc64e_defconfig
> +++ b/arch/powerpc/configs/ppc64e_defconfig
> @@ -100,7 +100,6 @@ CONFIG_LOG_BUF_SHIFT=17
> # CONFIG_GROUP_SCHED is not set
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> CONFIG_CPUSETS=y
> diff --git a/arch/powerpc/configs/ppc6xx_defconfig b/arch/powerpc/configs/ppc6xx_defconfig
> index 12dc7c4..8175eba 100644
> --- a/arch/powerpc/configs/ppc6xx_defconfig
> +++ b/arch/powerpc/configs/ppc6xx_defconfig
> @@ -88,7 +88,6 @@ CONFIG_AUDIT_TREE=y
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
> # CONFIG_CGROUP_FREEZER is not set
> CONFIG_CGROUP_DEVICE=y
> CONFIG_GROUP_SCHED=y
> diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
> index 16ae717..6d96b61 100644
> --- a/arch/powerpc/configs/pseries_defconfig
> +++ b/arch/powerpc/configs/pseries_defconfig
> @@ -85,7 +85,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
> CONFIG_CGROUP_FREEZER=y
> CONFIG_CGROUP_DEVICE=y
> CONFIG_CPUSETS=y
> diff --git a/arch/s390/defconfig b/arch/s390/defconfig
> index 253f158..08e9f82 100644
> --- a/arch/s390/defconfig
> +++ b/arch/s390/defconfig
> @@ -72,7 +72,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> # CONFIG_CPUSETS is not set
> diff --git a/arch/sh/configs/sdk7786_defconfig b/arch/sh/configs/sdk7786_defconfig
> index 2698245..0f6894e 100644
> --- a/arch/sh/configs/sdk7786_defconfig
> +++ b/arch/sh/configs/sdk7786_defconfig
> @@ -84,7 +84,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> CONFIG_CGROUP_DEBUG=y
> -CONFIG_CGROUP_NS=y
> CONFIG_CGROUP_FREEZER=y
> CONFIG_CGROUP_DEVICE=y
> CONFIG_CPUSETS=y
> diff --git a/arch/sh/configs/se7206_defconfig b/arch/sh/configs/se7206_defconfig
> index 910eaec..d2c0fc8 100644
> --- a/arch/sh/configs/se7206_defconfig
> +++ b/arch/sh/configs/se7206_defconfig
> @@ -77,7 +77,6 @@ CONFIG_TREE_RCU_TRACE=y
> CONFIG_LOG_BUF_SHIFT=14
> CONFIG_CGROUPS=y
> CONFIG_CGROUP_DEBUG=y
> -CONFIG_CGROUP_NS=y
> # CONFIG_CGROUP_FREEZER is not set
> CONFIG_CGROUP_DEVICE=y
> # CONFIG_CPUSETS is not set
> diff --git a/arch/sh/configs/sh7724_generic_defconfig b/arch/sh/configs/sh7724_generic_defconfig
> index a6a9e68..c3a2f67 100644
> --- a/arch/sh/configs/sh7724_generic_defconfig
> +++ b/arch/sh/configs/sh7724_generic_defconfig
> @@ -70,7 +70,6 @@ CONFIG_RCU_FANOUT=32
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> # CONFIG_CPUSETS is not set
> diff --git a/arch/sh/configs/sh7770_generic_defconfig b/arch/sh/configs/sh7770_generic_defconfig
> index 4327f89..ce2357f 100644
> --- a/arch/sh/configs/sh7770_generic_defconfig
> +++ b/arch/sh/configs/sh7770_generic_defconfig
> @@ -69,7 +69,6 @@ CONFIG_RCU_FANOUT=32
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> # CONFIG_CPUSETS is not set
> diff --git a/arch/sh/configs/shx3_defconfig b/arch/sh/configs/shx3_defconfig
> index 42f6bd3..05ced6a 100644
> --- a/arch/sh/configs/shx3_defconfig
> +++ b/arch/sh/configs/shx3_defconfig
> @@ -84,7 +84,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=14
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
> CONFIG_CGROUP_FREEZER=y
> CONFIG_CGROUP_DEVICE=y
> # CONFIG_CPUSETS is not set
> diff --git a/arch/sh/configs/urquell_defconfig b/arch/sh/configs/urquell_defconfig
> index 28bb19d..e698dff 100644
> --- a/arch/sh/configs/urquell_defconfig
> +++ b/arch/sh/configs/urquell_defconfig
> @@ -82,7 +82,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=14
> CONFIG_CGROUPS=y
> CONFIG_CGROUP_DEBUG=y
> -CONFIG_CGROUP_NS=y
> CONFIG_CGROUP_FREEZER=y
> CONFIG_CGROUP_DEVICE=y
> CONFIG_CPUSETS=y
> diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
> index d28fad1..4428d5c 100644
> --- a/arch/x86/configs/i386_defconfig
> +++ b/arch/x86/configs/i386_defconfig
> @@ -105,7 +105,6 @@ CONFIG_FAIR_GROUP_SCHED=y
> CONFIG_CGROUP_SCHED=y
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
> CONFIG_CGROUP_FREEZER=y
> # CONFIG_CGROUP_DEVICE is not set
> CONFIG_CPUSETS=y
> diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
> index 6c86acd..2ff947f 100644
> --- a/arch/x86/configs/x86_64_defconfig
> +++ b/arch/x86/configs/x86_64_defconfig
> @@ -105,7 +105,6 @@ CONFIG_FAIR_GROUP_SCHED=y
> CONFIG_CGROUP_SCHED=y
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
> CONFIG_CGROUP_FREEZER=y
> # CONFIG_CGROUP_DEVICE is not set
> CONFIG_CPUSETS=y
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index f3cbd73..ddbdb77 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -552,9 +552,6 @@ static inline struct cgroup* task_cgroup(struct task_struct *task,
> return task_subsys_state(task, subsys_id)->cgroup;
> }
>
> -int cgroup_clone(struct task_struct *tsk, struct cgroup_subsys *ss,
> - char *nodename);
> -
> /* A cgroup_iter should be treated as an opaque object */
> struct cgroup_iter {
> struct list_head *cg_link;
> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> index ccefff0..4ba5259 100644
> --- a/include/linux/cgroup_subsys.h
> +++ b/include/linux/cgroup_subsys.h
> @@ -19,12 +19,6 @@ SUBSYS(debug)
>
> /* */
>
> -#ifdef CONFIG_CGROUP_NS
> -SUBSYS(ns)
> -#endif
> -
> -/* */
> -
> #ifdef CONFIG_CGROUP_SCHED
> SUBSYS(cpu_cgroup)
> #endif
> diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
> index 7b370c7..50d20ab 100644
> --- a/include/linux/nsproxy.h
> +++ b/include/linux/nsproxy.h
> @@ -81,13 +81,4 @@ static inline void get_nsproxy(struct nsproxy *ns)
> atomic_inc(&ns->count);
> }
>
> -#ifdef CONFIG_CGROUP_NS
> -int ns_cgroup_clone(struct task_struct *tsk, struct pid *pid);
> -#else
> -static inline int ns_cgroup_clone(struct task_struct *tsk, struct pid *pid)
> -{
> - return 0;
> -}
> -#endif
> -
> #endif
> diff --git a/init/Kconfig b/init/Kconfig
> index 5cff9a9..1124656 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -496,15 +496,6 @@ config CGROUP_DEBUG
>
> Say N if unsure.
>
> -config CGROUP_NS
> - bool "Namespace cgroup subsystem"
> - depends on CGROUPS
> - help
> - Provides a simple namespace cgroup subsystem to
> - provide hierarchical naming of sets of namespaces,
> - for instance virtual servers and checkpoint/restart
> - jobs.
> -
> config CGROUP_FREEZER
> bool "Freezer cgroup subsystem"
> depends on CGROUPS
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 057472f..a7ee5f4 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -62,7 +62,6 @@ obj-$(CONFIG_COMPAT) += compat.o
> obj-$(CONFIG_CGROUPS) += cgroup.o
> obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
> obj-$(CONFIG_CPUSETS) += cpuset.o
> -obj-$(CONFIG_CGROUP_NS) += ns_cgroup.o
> obj-$(CONFIG_UTS_NS) += utsname.o
> obj-$(CONFIG_USER_NS) += user_namespace.o
> obj-$(CONFIG_PID_NS) += pid_namespace.o
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 09fb6f9..7ec5bad 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4212,122 +4212,6 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
> }
>
> /**
> - * cgroup_clone - clone the cgroup the given subsystem is attached to
> - * @tsk: the task to be moved
> - * @subsys: the given subsystem
> - * @nodename: the name for the new cgroup
> - *
> - * Duplicate the current cgroup in the hierarchy that the given
> - * subsystem is attached to, and move this task into the new
> - * child.
> - */
> -int cgroup_clone(struct task_struct *tsk, struct cgroup_subsys *subsys,
> - char *nodename)
> -{
> - struct dentry *dentry;
> - int ret = 0;
> - struct cgroup *parent, *child;
> - struct inode *inode;
> - struct css_set *cg;
> - struct cgroupfs_root *root;
> - struct cgroup_subsys *ss;
> -
> - /* We shouldn't be called by an unregistered subsystem */
> - BUG_ON(!subsys->active);
> -
> - /* First figure out what hierarchy and cgroup we're dealing
> - * with, and pin them so we can drop cgroup_mutex */
> - mutex_lock(&cgroup_mutex);
> - again:
> - root = subsys->root;
> - if (root == &rootnode) {
> - mutex_unlock(&cgroup_mutex);
> - return 0;
> - }
> -
> - /* Pin the hierarchy */
> - if (!atomic_inc_not_zero(&root->sb->s_active)) {
> - /* We race with the final deactivate_super() */
> - mutex_unlock(&cgroup_mutex);
> - return 0;
> - }
> -
> - /* Keep the cgroup alive */
> - task_lock(tsk);
> - parent = task_cgroup(tsk, subsys->subsys_id);
> - cg = tsk->cgroups;
> - get_css_set(cg);
> - task_unlock(tsk);
> -
> - mutex_unlock(&cgroup_mutex);
> -
> - /* Now do the VFS work to create a cgroup */
> - inode = parent->dentry->d_inode;
> -
> - /* Hold the parent directory mutex across this operation to
> - * stop anyone else deleting the new cgroup */
> - mutex_lock(&inode->i_mutex);
> - dentry = lookup_one_len(nodename, parent->dentry, strlen(nodename));
> - if (IS_ERR(dentry)) {
> - printk(KERN_INFO
> - "cgroup: Couldn't allocate dentry for %s: %ld\n", nodename,
> - PTR_ERR(dentry));
> - ret = PTR_ERR(dentry);
> - goto out_release;
> - }
> -
> - /* Create the cgroup directory, which also creates the cgroup */
> - ret = vfs_mkdir(inode, dentry, 0755);
> - child = __d_cgrp(dentry);
> - dput(dentry);
> - if (ret) {
> - printk(KERN_INFO
> - "Failed to create cgroup %s: %d\n", nodename,
> - ret);
> - goto out_release;
> - }
> -
> - /* The cgroup now exists. Retake cgroup_mutex and check
> - * that we're still in the same state that we thought we
> - * were. */
> - mutex_lock(&cgroup_mutex);
> - if ((root != subsys->root) ||
> - (parent != task_cgroup(tsk, subsys->subsys_id))) {
> - /* Aargh, we raced ... */
> - mutex_unlock(&inode->i_mutex);
> - put_css_set(cg);
> -
> - deactivate_super(root->sb);
> - /* The cgroup is still accessible in the VFS, but
> - * we're not going to try to rmdir() it at this
> - * point. */
> - printk(KERN_INFO
> - "Race in cgroup_clone() - leaking cgroup %s\n",
> - nodename);
> - goto again;
> - }
> -
> - /* do any required auto-setup */
> - for_each_subsys(root, ss) {
> - if (ss->post_clone)
> - ss->post_clone(ss, child);
> - }
> -
> - /* All seems fine. Finish by moving the task into the new cgroup */
> - ret = cgroup_attach_task(child, tsk);
> - mutex_unlock(&cgroup_mutex);
> -
> - out_release:
> - mutex_unlock(&inode->i_mutex);
> -
> - mutex_lock(&cgroup_mutex);
> - put_css_set(cg);
> - mutex_unlock(&cgroup_mutex);
> - deactivate_super(root->sb);
> - return ret;
> -}
> -
> -/**
> * cgroup_is_descendant - see if @cgrp is a descendant of @task's cgrp
> * @cgrp: the cgroup in question
> * @task: the task in question
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 02b9611..4613840 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1829,10 +1829,9 @@ static int cpuset_populate(struct cgroup_subsys *ss, struct cgroup *cont)
> }
>
> /*
> - * post_clone() is called at the end of cgroup_clone().
> - * 'cgroup' was just created automatically as a result of
> - * a cgroup_clone(), and the current task is about to
> - * be moved into 'cgroup'.
> + * post_clone() is called during cgroup_create() when the
> + * clone_children mount argument was specified. The cgroup
> + * can not yet have any tasks.
> *
> * Currently we refuse to set up the cgroup - thereby
> * refusing the task to be entered, and as a result refusing
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b6cce14..c391b1d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1167,12 +1167,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> if (clone_flags & CLONE_THREAD)
> p->tgid = current->tgid;
>
> - if (current->nsproxy != p->nsproxy) {
> - retval = ns_cgroup_clone(p, pid);
> - if (retval)
> - goto bad_fork_free_pid;
> - }
> -
> p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
> /*
> * Clear TID on mm_release()?
> diff --git a/kernel/ns_cgroup.c b/kernel/ns_cgroup.c
> deleted file mode 100644
> index 2a5dfec..0000000
> --- a/kernel/ns_cgroup.c
> +++ /dev/null
> @@ -1,110 +0,0 @@
> -/*
> - * ns_cgroup.c - namespace cgroup subsystem
> - *
> - * Copyright 2006, 2007 IBM Corp
> - */
> -
> -#include <linux/module.h>
> -#include <linux/cgroup.h>
> -#include <linux/fs.h>
> -#include <linux/proc_fs.h>
> -#include <linux/slab.h>
> -#include <linux/nsproxy.h>
> -
> -struct ns_cgroup {
> - struct cgroup_subsys_state css;
> -};
> -
> -struct cgroup_subsys ns_subsys;
> -
> -static inline struct ns_cgroup *cgroup_to_ns(
> - struct cgroup *cgroup)
> -{
> - return container_of(cgroup_subsys_state(cgroup, ns_subsys_id),
> - struct ns_cgroup, css);
> -}
> -
> -int ns_cgroup_clone(struct task_struct *task, struct pid *pid)
> -{
> - char name[PROC_NUMBUF];
> -
> - snprintf(name, PROC_NUMBUF, "%d", pid_vnr(pid));
> - return cgroup_clone(task, &ns_subsys, name);
> -}
> -
> -/*
> - * Rules:
> - * 1. you can only enter a cgroup which is a descendant of your current
> - * cgroup
> - * 2. you can only place another process into a cgroup if
> - * a. you have CAP_SYS_ADMIN
> - * b. your cgroup is an ancestor of task's destination cgroup
> - * (hence either you are in the same cgroup as task, or in an
> - * ancestor cgroup thereof)
> - */
> -static int ns_can_attach(struct cgroup_subsys *ss, struct cgroup *new_cgroup,
> - struct task_struct *task, bool threadgroup)
> -{
> - if (current != task) {
> - if (!capable(CAP_SYS_ADMIN))
> - return -EPERM;
> -
> - if (!cgroup_is_descendant(new_cgroup, current))
> - return -EPERM;
> - }
> -
> - if (!cgroup_is_descendant(new_cgroup, task))
> - return -EPERM;
> -
> - if (threadgroup) {
> - struct task_struct *c;
> - rcu_read_lock();
> - list_for_each_entry_rcu(c, &task->thread_group, thread_group) {
> - if (!cgroup_is_descendant(new_cgroup, c)) {
> - rcu_read_unlock();
> - return -EPERM;
> - }
> - }
> - rcu_read_unlock();
> - }
> -
> - return 0;
> -}
> -
> -/*
> - * Rules: you can only create a cgroup if
> - * 1. you are capable(CAP_SYS_ADMIN)
> - * 2. the target cgroup is a descendant of your own cgroup
> - */
> -static struct cgroup_subsys_state *ns_create(struct cgroup_subsys *ss,
> - struct cgroup *cgroup)
> -{
> - struct ns_cgroup *ns_cgroup;
> -
> - if (!capable(CAP_SYS_ADMIN))
> - return ERR_PTR(-EPERM);
> - if (!cgroup_is_descendant(cgroup, current))
> - return ERR_PTR(-EPERM);
> -
> - ns_cgroup = kzalloc(sizeof(*ns_cgroup), GFP_KERNEL);
> - if (!ns_cgroup)
> - return ERR_PTR(-ENOMEM);
> - return &ns_cgroup->css;
> -}
> -
> -static void ns_destroy(struct cgroup_subsys *ss,
> - struct cgroup *cgroup)
> -{
> - struct ns_cgroup *ns_cgroup;
> -
> - ns_cgroup = cgroup_to_ns(cgroup);
> - kfree(ns_cgroup);
> -}
> -
> -struct cgroup_subsys ns_subsys = {
> - .name = "ns",
> - .can_attach = ns_can_attach,
> - .create = ns_create,
> - .destroy = ns_destroy,
> - .subsys_id = ns_subsys_id,
> -};
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index f74e6c0..014a90d 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -198,10 +198,6 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
> goto out;
> }
>
> - err = ns_cgroup_clone(current, task_pid(current));
> - if (err)
> - put_nsproxy(*new_nsp);
> -
> out:
> return err;
> }
> --
> 1.7.0.4
>
> _______________________________________________
> Containers mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/containers

2010-07-29 21:47:03

by Paul Menage

Subject: Re: [PATCH 3/3] cgroup : remove the ns_cgroup

On Thu, Jul 29, 2010 at 12:58 PM, Serge E. Hallyn
<[email protected]> wrote:
> The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier.
>
> For example, a single process cannot handle a large number of namespaces
> without interacting with this cgroup and falling into an exponential creation
> time due to the nested cgroup directory depth (e.g. /cgroup/<pid>/.../<pid>/...).
>
> This was spotted when a single process created multiple network namespaces.
> The objective was 4096 network namespaces, but at 820 netns creation had become
> dramatically slow: the creation time for a namespace had increased from 10 msec
> to 10 sec. After five hours, the expected number of netns had not been reached.
> Without the ns_cgroup interaction, 4K netns are created in 2 minutes.
>
> To work around that, we have to mount the cgroup with all the subsystems
> except the ns_cgroup. That is a little weird and hard to manage from an
> administration point of view, because we have to know which cgroup subsystems
> are available on the system and we can't do a simple 'mount -t cgroup cgroup /cgroup'.
>
> With the previous patch, which adds a 'clone_children' parameter to a cgroup,
> we should be able to remove the ns_cgroup and manually create the cgroup and
> add a task to it, consistently with the rest of the subsystems.
>
> This patch removes the ns_cgroup as suggested in the following thread:
>
> https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html
>
> The 'cgroup_clone' function is removed because it is no longer used.
>
> Changelog: Jul 29 (seh): remove references to ns_cgroup_clone(), fix up
> some documentation, and remove CONFIG_CGROUP_NS references.
>
> Signed-off-by: Daniel Lezcano <[email protected]>
> Signed-off-by: Serge E. Hallyn <[email protected]>
> Cc: Eric W. Biederman <[email protected]>
> Cc: Paul Menage <[email protected]>
> Cc: Jamal Hadi Salim <[email protected]>

Acked-by: Paul Menage <[email protected]>

Thanks

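To make the '/cgroup/<pid>/.../<pid>/...' nesting in the commit message concrete: the removed ns_cgroup_clone() named each automatically created cgroup after the task's pid (a "%d" snprintf of pid_vnr()), and cgroup_clone() created it beneath the task's current cgroup and moved the task into it. A single process that unshares N times therefore ends up N levels deep. The sketch below only models the resulting path string in userspace; the function name ns_cgroup_path() is hypothetical and it does not reproduce the kernel's timing behaviour.

```c
#include <stdio.h>

/*
 * Hypothetical model: start at @mnt and append one "/<pid>" component
 * per unshare, as the old ns_cgroup did for a task that kept calling
 * unshare(). Returns the resulting depth, or -1 if @buf is too small.
 */
static int ns_cgroup_path(char *buf, size_t len, const char *mnt,
			  int pid, int unshares)
{
	size_t used;
	int i, n;

	n = snprintf(buf, len, "%s", mnt);
	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;
	for (i = 0; i < unshares; i++) {
		/* each unshare nests one more pid-named directory */
		n = snprintf(buf + used, len - used, "/%d", pid);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return unshares;
}
```

For example, ns_cgroup_path(buf, sizeof(buf), "/cgroup", 42, 3) leaves "/cgroup/42/42/42" in buf; in the netns test above, the process would have been 820 levels deep.
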
> ---
> Documentation/cgroups/cgroups.txt | 2 +-
> arch/ia64/configs/generic_defconfig | 1 -
> arch/mips/configs/bcm47xx_defconfig | 1 -
> arch/mips/configs/ip27_defconfig | 1 -
> arch/mips/configs/sb1250-swarm_defconfig | 1 -
> arch/powerpc/configs/cell_defconfig | 1 -
> arch/powerpc/configs/ppc64_defconfig | 1 -
> arch/powerpc/configs/ppc64e_defconfig | 1 -
> arch/powerpc/configs/ppc6xx_defconfig | 1 -
> arch/powerpc/configs/pseries_defconfig | 1 -
> arch/s390/defconfig | 1 -
> arch/sh/configs/sdk7786_defconfig | 1 -
> arch/sh/configs/se7206_defconfig | 1 -
> arch/sh/configs/sh7724_generic_defconfig | 1 -
> arch/sh/configs/sh7770_generic_defconfig | 1 -
> arch/sh/configs/shx3_defconfig | 1 -
> arch/sh/configs/urquell_defconfig | 1 -
> arch/x86/configs/i386_defconfig | 1 -
> arch/x86/configs/x86_64_defconfig | 1 -
> include/linux/cgroup.h | 3 -
> include/linux/cgroup_subsys.h | 6 --
> include/linux/nsproxy.h | 9 ---
> init/Kconfig | 9 ---
> kernel/Makefile | 1 -
> kernel/cgroup.c | 116 ------------------------------
> kernel/cpuset.c | 7 +-
> kernel/fork.c | 6 --
> kernel/ns_cgroup.c | 110 ----------------------------
> kernel/nsproxy.c | 4 -
> 29 files changed, 4 insertions(+), 287 deletions(-)
> delete mode 100644 kernel/ns_cgroup.c
>
> diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
> index 190018b..6a5ba63 100644
> --- a/Documentation/cgroups/cgroups.txt
> +++ b/Documentation/cgroups/cgroups.txt
> @@ -618,7 +618,7 @@ always handled well.
> void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
> (cgroup_mutex held by caller)
>
> -Called at the end of cgroup_clone() to do any parameter
> +Called during cgroup_create() to do any parameter
> initialization which might be required before a task could attach. For
> example in cpusets, no task may attach before 'cpus' and 'mems' are set
> up.
> diff --git a/arch/ia64/configs/generic_defconfig b/arch/ia64/configs/generic_defconfig
> index 6a4cc50..d257546 100644
> --- a/arch/ia64/configs/generic_defconfig
> +++ b/arch/ia64/configs/generic_defconfig
> @@ -25,7 +25,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=20
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> CONFIG_CPUSETS=y
> diff --git a/arch/mips/configs/bcm47xx_defconfig b/arch/mips/configs/bcm47xx_defconfig
> index bbd826b..cb75ade 100644
> --- a/arch/mips/configs/bcm47xx_defconfig
> +++ b/arch/mips/configs/bcm47xx_defconfig
> @@ -199,7 +199,6 @@ CONFIG_TINY_RCU=y
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> # CONFIG_CPUSETS is not set
> diff --git a/arch/mips/configs/ip27_defconfig b/arch/mips/configs/ip27_defconfig
> index 84b6503..036ad3e 100644
> --- a/arch/mips/configs/ip27_defconfig
> +++ b/arch/mips/configs/ip27_defconfig
> @@ -209,7 +209,6 @@ CONFIG_LOG_BUF_SHIFT=15
> # CONFIG_GROUP_SCHED is not set
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> CONFIG_CPUSETS=y
> diff --git a/arch/mips/configs/sb1250-swarm_defconfig b/arch/mips/configs/sb1250-swarm_defconfig
> index 7f07bf0..0f059f2 100644
> --- a/arch/mips/configs/sb1250-swarm_defconfig
> +++ b/arch/mips/configs/sb1250-swarm_defconfig
> @@ -197,7 +197,6 @@ CONFIG_SYSVIPC_SYSCTL=y
> CONFIG_LOG_BUF_SHIFT=15
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> CONFIG_CPUSETS=y
> CONFIG_GROUP_SCHED=y
> CONFIG_FAIR_GROUP_SCHED=y
> diff --git a/arch/powerpc/configs/cell_defconfig b/arch/powerpc/configs/cell_defconfig
> index 9433719..cfa9177 100644
> --- a/arch/powerpc/configs/cell_defconfig
> +++ b/arch/powerpc/configs/cell_defconfig
> @@ -76,7 +76,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=15
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> CONFIG_CPUSETS=y
> # CONFIG_GROUP_SCHED is not set
> # CONFIG_USER_SCHED is not set
> diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
> index 369f4e0..2bc8f44 100644
> --- a/arch/powerpc/configs/ppc64_defconfig
> +++ b/arch/powerpc/configs/ppc64_defconfig
> @@ -86,7 +86,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> CONFIG_CPUSETS=y
> diff --git a/arch/powerpc/configs/ppc64e_defconfig b/arch/powerpc/configs/ppc64e_defconfig
> index 403e82e..ac0f13d 100644
> --- a/arch/powerpc/configs/ppc64e_defconfig
> +++ b/arch/powerpc/configs/ppc64e_defconfig
> @@ -100,7 +100,6 @@ CONFIG_LOG_BUF_SHIFT=17
> # CONFIG_GROUP_SCHED is not set
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> CONFIG_CPUSETS=y
> diff --git a/arch/powerpc/configs/ppc6xx_defconfig b/arch/powerpc/configs/ppc6xx_defconfig
> index 12dc7c4..8175eba 100644
> --- a/arch/powerpc/configs/ppc6xx_defconfig
> +++ b/arch/powerpc/configs/ppc6xx_defconfig
> @@ -88,7 +88,6 @@ CONFIG_AUDIT_TREE=y
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
> # CONFIG_CGROUP_FREEZER is not set
> CONFIG_CGROUP_DEVICE=y
> CONFIG_GROUP_SCHED=y
> diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
> index 16ae717..6d96b61 100644
> --- a/arch/powerpc/configs/pseries_defconfig
> +++ b/arch/powerpc/configs/pseries_defconfig
> @@ -85,7 +85,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
> CONFIG_CGROUP_FREEZER=y
> CONFIG_CGROUP_DEVICE=y
> CONFIG_CPUSETS=y
> diff --git a/arch/s390/defconfig b/arch/s390/defconfig
> index 253f158..08e9f82 100644
> --- a/arch/s390/defconfig
> +++ b/arch/s390/defconfig
> @@ -72,7 +72,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
> # CONFIG_CGROUP_FREEZER is not set
> # CONFIG_CGROUP_DEVICE is not set
> # CONFIG_CPUSETS is not set
> diff --git a/arch/sh/configs/sdk7786_defconfig b/arch/sh/configs/sdk7786_defconfig
> index 2698245..0f6894e 100644
> --- a/arch/sh/configs/sdk7786_defconfig
> +++ b/arch/sh/configs/sdk7786_defconfig
> @@ -84,7 +84,6 @@ CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=17
> CONFIG_CGROUPS=y
> CONFIG_CGROUP_DEBUG=y
> -CONFIG_CGROUP_NS=y
> CONFIG_CGROUP_FREEZER=y
> CONFIG_CGROUP_DEVICE=y
> CONFIG_CPUSETS=y
> diff --git a/arch/sh/configs/se7206_defconfig b/arch/sh/configs/se7206_defconfig
> index 910eaec..d2c0fc8 100644
> --- a/arch/sh/configs/se7206_defconfig
> +++ b/arch/sh/configs/se7206_defconfig
> @@ -77,7 +77,6 @@ CONFIG_TREE_RCU_TRACE=y
> CONFIG_LOG_BUF_SHIFT=14
> CONFIG_CGROUPS=y
> CONFIG_CGROUP_DEBUG=y
> -CONFIG_CGROUP_NS=y
> # CONFIG_CGROUP_FREEZER is not set
> CONFIG_CGROUP_DEVICE=y
> # CONFIG_CPUSETS is not set
> diff --git a/arch/sh/configs/sh7724_generic_defconfig b/arch/sh/configs/sh7724_generic_defconfig
> index a6a9e68..c3a2f67 100644
> --- a/arch/sh/configs/sh7724_generic_defconfig
> +++ b/arch/sh/configs/sh7724_generic_defconfig
> @@ -70,7 +70,6 @@ CONFIG_RCU_FANOUT=32
>  CONFIG_LOG_BUF_SHIFT=17
>  CONFIG_CGROUPS=y
>  # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
>  # CONFIG_CGROUP_FREEZER is not set
>  # CONFIG_CGROUP_DEVICE is not set
>  # CONFIG_CPUSETS is not set
> diff --git a/arch/sh/configs/sh7770_generic_defconfig b/arch/sh/configs/sh7770_generic_defconfig
> index 4327f89..ce2357f 100644
> --- a/arch/sh/configs/sh7770_generic_defconfig
> +++ b/arch/sh/configs/sh7770_generic_defconfig
> @@ -69,7 +69,6 @@ CONFIG_RCU_FANOUT=32
>  CONFIG_LOG_BUF_SHIFT=17
>  CONFIG_CGROUPS=y
>  # CONFIG_CGROUP_DEBUG is not set
> -# CONFIG_CGROUP_NS is not set
>  # CONFIG_CGROUP_FREEZER is not set
>  # CONFIG_CGROUP_DEVICE is not set
>  # CONFIG_CPUSETS is not set
> diff --git a/arch/sh/configs/shx3_defconfig b/arch/sh/configs/shx3_defconfig
> index 42f6bd3..05ced6a 100644
> --- a/arch/sh/configs/shx3_defconfig
> +++ b/arch/sh/configs/shx3_defconfig
> @@ -84,7 +84,6 @@ CONFIG_IKCONFIG_PROC=y
>  CONFIG_LOG_BUF_SHIFT=14
>  CONFIG_CGROUPS=y
>  # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
>  CONFIG_CGROUP_FREEZER=y
>  CONFIG_CGROUP_DEVICE=y
>  # CONFIG_CPUSETS is not set
> diff --git a/arch/sh/configs/urquell_defconfig b/arch/sh/configs/urquell_defconfig
> index 28bb19d..e698dff 100644
> --- a/arch/sh/configs/urquell_defconfig
> +++ b/arch/sh/configs/urquell_defconfig
> @@ -82,7 +82,6 @@ CONFIG_IKCONFIG_PROC=y
>  CONFIG_LOG_BUF_SHIFT=14
>  CONFIG_CGROUPS=y
>  CONFIG_CGROUP_DEBUG=y
> -CONFIG_CGROUP_NS=y
>  CONFIG_CGROUP_FREEZER=y
>  CONFIG_CGROUP_DEVICE=y
>  CONFIG_CPUSETS=y
> diff --git a/arch/x86/configs/i386_defconfig b/arch/x86/configs/i386_defconfig
> index d28fad1..4428d5c 100644
> --- a/arch/x86/configs/i386_defconfig
> +++ b/arch/x86/configs/i386_defconfig
> @@ -105,7 +105,6 @@ CONFIG_FAIR_GROUP_SCHED=y
>  CONFIG_CGROUP_SCHED=y
>  CONFIG_CGROUPS=y
>  # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
>  CONFIG_CGROUP_FREEZER=y
>  # CONFIG_CGROUP_DEVICE is not set
>  CONFIG_CPUSETS=y
> diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
> index 6c86acd..2ff947f 100644
> --- a/arch/x86/configs/x86_64_defconfig
> +++ b/arch/x86/configs/x86_64_defconfig
> @@ -105,7 +105,6 @@ CONFIG_FAIR_GROUP_SCHED=y
>  CONFIG_CGROUP_SCHED=y
>  CONFIG_CGROUPS=y
>  # CONFIG_CGROUP_DEBUG is not set
> -CONFIG_CGROUP_NS=y
>  CONFIG_CGROUP_FREEZER=y
>  # CONFIG_CGROUP_DEVICE is not set
>  CONFIG_CPUSETS=y
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index f3cbd73..ddbdb77 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -552,9 +552,6 @@ static inline struct cgroup* task_cgroup(struct task_struct *task,
>        return task_subsys_state(task, subsys_id)->cgroup;
>  }
>
> -int cgroup_clone(struct task_struct *tsk, struct cgroup_subsys *ss,
> -                                                       char *nodename);
> -
>  /* A cgroup_iter should be treated as an opaque object */
>  struct cgroup_iter {
>        struct list_head *cg_link;
> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> index ccefff0..4ba5259 100644
> --- a/include/linux/cgroup_subsys.h
> +++ b/include/linux/cgroup_subsys.h
> @@ -19,12 +19,6 @@ SUBSYS(debug)
>
>  /* */
>
> -#ifdef CONFIG_CGROUP_NS
> -SUBSYS(ns)
> -#endif
> -
> -/* */
> -
>  #ifdef CONFIG_CGROUP_SCHED
>  SUBSYS(cpu_cgroup)
>  #endif
> diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
> index 7b370c7..50d20ab 100644
> --- a/include/linux/nsproxy.h
> +++ b/include/linux/nsproxy.h
> @@ -81,13 +81,4 @@ static inline void get_nsproxy(struct nsproxy *ns)
>        atomic_inc(&ns->count);
>  }
>
> -#ifdef CONFIG_CGROUP_NS
> -int ns_cgroup_clone(struct task_struct *tsk, struct pid *pid);
> -#else
> -static inline int ns_cgroup_clone(struct task_struct *tsk, struct pid *pid)
> -{
> -       return 0;
> -}
> -#endif
> -
>  #endif
> diff --git a/init/Kconfig b/init/Kconfig
> index 5cff9a9..1124656 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -496,15 +496,6 @@ config CGROUP_DEBUG
>
>          Say N if unsure.
>
> -config CGROUP_NS
> -       bool "Namespace cgroup subsystem"
> -       depends on CGROUPS
> -       help
> -         Provides a simple namespace cgroup subsystem to
> -         provide hierarchical naming of sets of namespaces,
> -         for instance virtual servers and checkpoint/restart
> -         jobs.
> -
>  config CGROUP_FREEZER
>        bool "Freezer cgroup subsystem"
>        depends on CGROUPS
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 057472f..a7ee5f4 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -62,7 +62,6 @@ obj-$(CONFIG_COMPAT) += compat.o
>  obj-$(CONFIG_CGROUPS) += cgroup.o
>  obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
>  obj-$(CONFIG_CPUSETS) += cpuset.o
> -obj-$(CONFIG_CGROUP_NS) += ns_cgroup.o
>  obj-$(CONFIG_UTS_NS) += utsname.o
>  obj-$(CONFIG_USER_NS) += user_namespace.o
>  obj-$(CONFIG_PID_NS) += pid_namespace.o
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 09fb6f9..7ec5bad 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4212,122 +4212,6 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
>  }
>
>  /**
> - * cgroup_clone - clone the cgroup the given subsystem is attached to
> - * @tsk: the task to be moved
> - * @subsys: the given subsystem
> - * @nodename: the name for the new cgroup
> - *
> - * Duplicate the current cgroup in the hierarchy that the given
> - * subsystem is attached to, and move this task into the new
> - * child.
> - */
> -int cgroup_clone(struct task_struct *tsk, struct cgroup_subsys *subsys,
> -                                                       char *nodename)
> -{
> -       struct dentry *dentry;
> -       int ret = 0;
> -       struct cgroup *parent, *child;
> -       struct inode *inode;
> -       struct css_set *cg;
> -       struct cgroupfs_root *root;
> -       struct cgroup_subsys *ss;
> -
> -       /* We shouldn't be called by an unregistered subsystem */
> -       BUG_ON(!subsys->active);
> -
> -       /* First figure out what hierarchy and cgroup we're dealing
> -        * with, and pin them so we can drop cgroup_mutex */
> -       mutex_lock(&cgroup_mutex);
> - again:
> -       root = subsys->root;
> -       if (root == &rootnode) {
> -               mutex_unlock(&cgroup_mutex);
> -               return 0;
> -       }
> -
> -       /* Pin the hierarchy */
> -       if (!atomic_inc_not_zero(&root->sb->s_active)) {
> -               /* We race with the final deactivate_super() */
> -               mutex_unlock(&cgroup_mutex);
> -               return 0;
> -       }
> -
> -       /* Keep the cgroup alive */
> -       task_lock(tsk);
> -       parent = task_cgroup(tsk, subsys->subsys_id);
> -       cg = tsk->cgroups;
> -       get_css_set(cg);
> -       task_unlock(tsk);
> -
> -       mutex_unlock(&cgroup_mutex);
> -
> -       /* Now do the VFS work to create a cgroup */
> -       inode = parent->dentry->d_inode;
> -
> -       /* Hold the parent directory mutex across this operation to
> -        * stop anyone else deleting the new cgroup */
> -       mutex_lock(&inode->i_mutex);
> -       dentry = lookup_one_len(nodename, parent->dentry, strlen(nodename));
> -       if (IS_ERR(dentry)) {
> -               printk(KERN_INFO
> -                      "cgroup: Couldn't allocate dentry for %s: %ld\n", nodename,
> -                      PTR_ERR(dentry));
> -               ret = PTR_ERR(dentry);
> -               goto out_release;
> -       }
> -
> -       /* Create the cgroup directory, which also creates the cgroup */
> -       ret = vfs_mkdir(inode, dentry, 0755);
> -       child = __d_cgrp(dentry);
> -       dput(dentry);
> -       if (ret) {
> -               printk(KERN_INFO
> -                      "Failed to create cgroup %s: %d\n", nodename,
> -                      ret);
> -               goto out_release;
> -       }
> -
> -       /* The cgroup now exists. Retake cgroup_mutex and check
> -        * that we're still in the same state that we thought we
> -        * were. */
> -       mutex_lock(&cgroup_mutex);
> -       if ((root != subsys->root) ||
> -           (parent != task_cgroup(tsk, subsys->subsys_id))) {
> -               /* Aargh, we raced ... */
> -               mutex_unlock(&inode->i_mutex);
> -               put_css_set(cg);
> -
> -               deactivate_super(root->sb);
> -               /* The cgroup is still accessible in the VFS, but
> -                * we're not going to try to rmdir() it at this
> -                * point. */
> -               printk(KERN_INFO
> -                      "Race in cgroup_clone() - leaking cgroup %s\n",
> -                      nodename);
> -               goto again;
> -       }
> -
> -       /* do any required auto-setup */
> -       for_each_subsys(root, ss) {
> -               if (ss->post_clone)
> -                       ss->post_clone(ss, child);
> -       }
> -
> -       /* All seems fine. Finish by moving the task into the new cgroup */
> -       ret = cgroup_attach_task(child, tsk);
> -       mutex_unlock(&cgroup_mutex);
> -
> - out_release:
> -       mutex_unlock(&inode->i_mutex);
> -
> -       mutex_lock(&cgroup_mutex);
> -       put_css_set(cg);
> -       mutex_unlock(&cgroup_mutex);
> -       deactivate_super(root->sb);
> -       return ret;
> -}
> -
> -/**
>  * cgroup_is_descendant - see if @cgrp is a descendant of @task's cgrp
>  * @cgrp: the cgroup in question
>  * @task: the task in question
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 02b9611..4613840 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1829,10 +1829,9 @@ static int cpuset_populate(struct cgroup_subsys *ss, struct cgroup *cont)
>  }
>
>  /*
> - * post_clone() is called at the end of cgroup_clone().
> - * 'cgroup' was just created automatically as a result of
> - * a cgroup_clone(), and the current task is about to
> - * be moved into 'cgroup'.
> + * post_clone() is called during cgroup_create() when the
> + * clone_children mount argument was specified.  The cgroup
> + * can not yet have any tasks.
>  *
>  * Currently we refuse to set up the cgroup - thereby
>  * refusing the task to be entered, and as a result refusing
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b6cce14..c391b1d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1167,12 +1167,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>        if (clone_flags & CLONE_THREAD)
>                p->tgid = current->tgid;
>
> -       if (current->nsproxy != p->nsproxy) {
> -               retval = ns_cgroup_clone(p, pid);
> -               if (retval)
> -                       goto bad_fork_free_pid;
> -       }
> -
>        p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
>        /*
>         * Clear TID on mm_release()?
> diff --git a/kernel/ns_cgroup.c b/kernel/ns_cgroup.c
> deleted file mode 100644
> index 2a5dfec..0000000
> --- a/kernel/ns_cgroup.c
> +++ /dev/null
> @@ -1,110 +0,0 @@
> -/*
> - * ns_cgroup.c - namespace cgroup subsystem
> - *
> - * Copyright 2006, 2007 IBM Corp
> - */
> -
> -#include <linux/module.h>
> -#include <linux/cgroup.h>
> -#include <linux/fs.h>
> -#include <linux/proc_fs.h>
> -#include <linux/slab.h>
> -#include <linux/nsproxy.h>
> -
> -struct ns_cgroup {
> -       struct cgroup_subsys_state css;
> -};
> -
> -struct cgroup_subsys ns_subsys;
> -
> -static inline struct ns_cgroup *cgroup_to_ns(
> -               struct cgroup *cgroup)
> -{
> -       return container_of(cgroup_subsys_state(cgroup, ns_subsys_id),
> -                           struct ns_cgroup, css);
> -}
> -
> -int ns_cgroup_clone(struct task_struct *task, struct pid *pid)
> -{
> -       char name[PROC_NUMBUF];
> -
> -       snprintf(name, PROC_NUMBUF, "%d", pid_vnr(pid));
> -       return cgroup_clone(task, &ns_subsys, name);
> -}
> -
> -/*
> - * Rules:
> - *   1. you can only enter a cgroup which is a descendant of your current
> - *     cgroup
> - *   2. you can only place another process into a cgroup if
> - *     a. you have CAP_SYS_ADMIN
> - *     b. your cgroup is an ancestor of task's destination cgroup
> - *       (hence either you are in the same cgroup as task, or in an
> - *        ancestor cgroup thereof)
> - */
> -static int ns_can_attach(struct cgroup_subsys *ss, struct cgroup *new_cgroup,
> -                        struct task_struct *task, bool threadgroup)
> -{
> -       if (current != task) {
> -               if (!capable(CAP_SYS_ADMIN))
> -                       return -EPERM;
> -
> -               if (!cgroup_is_descendant(new_cgroup, current))
> -                       return -EPERM;
> -       }
> -
> -       if (!cgroup_is_descendant(new_cgroup, task))
> -               return -EPERM;
> -
> -       if (threadgroup) {
> -               struct task_struct *c;
> -               rcu_read_lock();
> -               list_for_each_entry_rcu(c, &task->thread_group, thread_group) {
> -                       if (!cgroup_is_descendant(new_cgroup, c)) {
> -                               rcu_read_unlock();
> -                               return -EPERM;
> -                       }
> -               }
> -               rcu_read_unlock();
> -       }
> -
> -       return 0;
> -}
> -
> -/*
> - * Rules: you can only create a cgroup if
> - *     1. you are capable(CAP_SYS_ADMIN)
> - *     2. the target cgroup is a descendant of your own cgroup
> - */
> -static struct cgroup_subsys_state *ns_create(struct cgroup_subsys *ss,
> -                                               struct cgroup *cgroup)
> -{
> -       struct ns_cgroup *ns_cgroup;
> -
> -       if (!capable(CAP_SYS_ADMIN))
> -               return ERR_PTR(-EPERM);
> -       if (!cgroup_is_descendant(cgroup, current))
> -               return ERR_PTR(-EPERM);
> -
> -       ns_cgroup = kzalloc(sizeof(*ns_cgroup), GFP_KERNEL);
> -       if (!ns_cgroup)
> -               return ERR_PTR(-ENOMEM);
> -       return &ns_cgroup->css;
> -}
> -
> -static void ns_destroy(struct cgroup_subsys *ss,
> -                       struct cgroup *cgroup)
> -{
> -       struct ns_cgroup *ns_cgroup;
> -
> -       ns_cgroup = cgroup_to_ns(cgroup);
> -       kfree(ns_cgroup);
> -}
> -
> -struct cgroup_subsys ns_subsys = {
> -       .name = "ns",
> -       .can_attach = ns_can_attach,
> -       .create = ns_create,
> -       .destroy  = ns_destroy,
> -       .subsys_id = ns_subsys_id,
> -};
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index f74e6c0..014a90d 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -198,10 +198,6 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
>                goto out;
>        }
>
> -       err = ns_cgroup_clone(current, task_pid(current));
> -       if (err)
> -               put_nsproxy(*new_nsp);
> -
>  out:
>        return err;
>  }
> --
> 1.7.0.4
>
>

2010-07-29 22:40:15

by Serge Hallyn

[permalink] [raw]
Subject: Re: [PATCH 3/3] cgroup : remove the ns_cgroup

Quoting Matt Helsley ([email protected]):
> On Thu, Jul 29, 2010 at 02:58:12PM -0500, Serge E. Hallyn wrote:
> > The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier.
> >
> > For example, a single process cannot handle a large number of namespaces
> > without interacting with this cgroup and falling into an exponential creation
> > time due to the nested cgroup directory depth (e.g. /cgroup/<pid>/.../<pid>/...).
> >
> > This was spotted when creating a single process using multiple network namespaces:
> > the objective was 4096 network namespaces, but at 820 netns creation had become
> > dramatically slow, with the creation time for a namespace increasing from 10msec
> > to 10sec. After five hours, the expected number of netns had still not been reached.
> > Without the ns_cgroup interaction, 4K netns are created in 2 minutes.
>
> Is this problem related to Andrew's post here re:
>
> [Bugme-new] [Bug 16417] New: Slow context switches with SMP and CONFIG_FAIR_GROUP_SCHED

Hm, I don't think so (though it should be trivial to test :). The
situation Daniel (the real patch and intro author) cites is I believe
mainly due to the time spent traversing very deep paths. Whereas
Pierre doesn't seem to be even unsharing at all. Rather he's just
creating cgroups with mkdir.

Still I could be wrong.

BTW in the past the only reason I saw for keeping ns cgroup was
to lock tasks into a devices cgroup. Until that lazy guy who was
going to do it gets off his butt and implements user namespaces,
you'll just have to use LSMs, which is the right way.

-serge

2010-07-29 23:01:12

by Matt Helsley

[permalink] [raw]
Subject: Remaining work for userns (WAS Re: [PATCH 3/3] cgroup : remove the ns_cgroup)

On Thu, Jul 29, 2010 at 05:39:57PM -0500, Serge E. Hallyn wrote:
> Quoting Matt Helsley ([email protected]):
> > On Thu, Jul 29, 2010 at 02:58:12PM -0500, Serge E. Hallyn wrote:

<snip>

>
> BTW in the past the only reason I saw for keeping ns cgroup was
> to lock tasks into a devices cgroup. Until that lazy guy who was
> going to do it gets off his butt and implements user namespaces,
> you'll just have to use LSMs, which is the right way.

And the only missing piece of userns is replacing the cred checks
right? If so, it might be possible to come up with a coccinelle semantic
patch which would do all/most of the hard work -- depends on whether
all the checks fit a small number of semantic patterns.

Cheers,
-Matt Helsley

2010-07-29 23:21:35

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: Remaining work for userns (WAS Re: [PATCH 3/3] cgroup : remove the ns_cgroup)

Quoting Matt Helsley ([email protected]):
> On Thu, Jul 29, 2010 at 05:39:57PM -0500, Serge E. Hallyn wrote:
> > Quoting Matt Helsley ([email protected]):
> > > On Thu, Jul 29, 2010 at 02:58:12PM -0500, Serge E. Hallyn wrote:
>
> <snip>
>
> >
> > BTW in the past the only reason I saw for keeping ns cgroup was
> > to lock tasks into a devices cgroup. Until that lazy guy who was
> > going to do it gets off his butt and implements user namespaces,
> > you'll just have to use LSMs, which is the right way.
>
> And the only missing piece of userns is replacing the cred checks
> right? If so, it might be possible to come up with a coccinelle semantic
> patch which would do all/most of the hard work -- depends on whether
> all the checks fit a small number of semantic patterns.

I think the thing that always puts the brakes on when I get started
is siginfo_t. We need some way to reference user namespaces in there,
without enforcing lifetime rules on siginfo.

What you mention is definitely a chunk as well, so if you are interested
in pursuing that that'd be great.

Also, reviewing the patches at the top of
http://git.kernel.org/?p=linux/kernel/git/sergeh/linux-cr.git;a=shortlog;h=refs/heads/userns.feb16.1
to give us some fresh feedback on the general approach is
valuable.

And from there, the whole discussion (which we've had several times
in the past) about how to have the VFS map userids should probably be
had again. (I believe august 2008 was the last time we really got
into that)

thanks,
-serge

2010-07-31 00:23:37

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Remaining work for userns

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Matt Helsley ([email protected]):
>> On Thu, Jul 29, 2010 at 05:39:57PM -0500, Serge E. Hallyn wrote:
>> > Quoting Matt Helsley ([email protected]):
>> > > On Thu, Jul 29, 2010 at 02:58:12PM -0500, Serge E. Hallyn wrote:
>>
>> <snip>
>>
>> >
>> > BTW in the past the only reason I saw for keeping ns cgroup was
>> > to lock tasks into a devices cgroup. Until that lazy guy who was
>> > going to do it gets off his butt and implements user namespaces,
>> > you'll just have to use LSMs, which is the right way.
>>
>> And the only missing piece of userns is replacing the cred checks
>> right? If so, it might be possible to come up with a coccinelle semantic
>> patch which would do all/most of the hard work -- depends on whether
>> all the checks fit a small number of semantic patterns.
>
> I think the thing that always puts the brakes on when I get started
> is siginfo_t. We need some way to reference user namespaces in there,
> without enforcing lifetime rules on siginfo.

As I recall signal delivery in the kernel lands the signal in the
queue of the destination process before the syscall returns. If that
is true we should be able to handle signal delivery by just doing
whatever conversions are needed during delivery.

aka the userns should just be task->nsproxy->user_ns for
task->signal->queue. We cannot unshare the user namespace so there
are no nasty races.

I am reminded that I may want to play with the user namespace and
unshare when I get setns refreshed and reviewed for inclusion. Still
none of that should affect the fact that a task should never be
able to change user namespaces.

> What you mention is definitely a chunk as well, so if you are interested
> in pursuing that that'd be great.
>
> Also, reviewing the patches at the top of
> http://git.kernel.org/?p=linux/kernel/git/sergeh/linux-cr.git;a=shortlog;h=refs/heads/userns.feb16.1
> to give us some fresh feedback on the general approach is
> valuable.
>
> And from there, the whole discussion (which we've had several times
> in the past) about how to have the VFS map userids should probably be
> had again. (I believe August 2008 was the last time we really got
> into that)

We now have user_ns_map_uid and user_ns_map_gid in next-next.git
Serge, I'm not certain how that interacts with your other work, but
it is definitely something we want to build on.

Eric

2010-08-03 08:26:08

by Li Zefan

[permalink] [raw]
Subject: Re: [PATCH 1/3] cgroup : add clone_children control file

Cc: Andrew Morton (to pick up those patches)

Serge E. Hallyn wrote:
> This patch is sent as an answer to a previous thread around the ns_cgroup.
>
> https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html
>
> It adds a control file 'clone_children' for a cgroup.
> This control file is a boolean specifying if the child cgroup should
> be a clone of the parent cgroup or not. The default value is 'false'.
>
> This flag makes the child cgroup call the post_clone callback of each
> subsystem, if it is available.
>
> At present, the cpuset is the only subsystem which implements the post_clone
> callback.
>
> The option can be set at mount time by specifying the 'clone_children' mount
> option.
>
> Signed-off-by: Daniel Lezcano <[email protected]>
> Signed-off-by: Serge E. Hallyn <[email protected]>
> Cc: Eric W. Biederman <[email protected]>
> Cc: Paul Menage <[email protected]>

Reviewed-by: Li Zefan <[email protected]>
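To make the clone_children behaviour described above concrete, here is a minimal userspace model of it. It is a hypothetical sketch, not kernel code: `struct subsys`, `create_child()` and the `cpuset_clones` counter are invented for illustration. The only behaviour taken from the patch description is that a child created under a parent with clone_children enabled invokes each subsystem's post_clone callback when one is provided, and that the flag defaults to false.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cgroup subsystem
 * (not the kernel's struct cgroup_subsys). */
struct subsys {
    const char *name;
    /* Optional hook modelled on post_clone(); NULL if not implemented. */
    void (*post_clone)(struct subsys *ss, int child_id);
};

/* Counts how often the cpuset-like hook ran (illustration only). */
static int cpuset_clones;

static void cpuset_post_clone(struct subsys *ss, int child_id)
{
    (void)ss;
    (void)child_id;
    /* A real cpuset hook would initialise the child from its parent here. */
    cpuset_clones++;
}

/*
 * Model of creating a child cgroup: if the parent has clone_children
 * set, call post_clone for every subsystem that provides one.
 * Returns the number of hooks that were called.
 */
static int create_child(bool parent_clone_children, struct subsys *ss,
                        size_t nr_subsys, int child_id)
{
    int called = 0;

    if (!parent_clone_children)
        return 0;   /* default ('false'): the child is not a clone */

    for (size_t i = 0; i < nr_subsys; i++) {
        if (ss[i].post_clone) {
            ss[i].post_clone(&ss[i], child_id);
            called++;
        }
    }
    return called;
}
```

For instance, with one entry that provides a hook and one whose `post_clone` is NULL, `create_child(true, ...)` runs exactly one hook, while `create_child(false, ...)` runs none.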

2010-08-03 08:26:18

by Li Zefan

[permalink] [raw]
Subject: Re: [PATCH 3/3] cgroup : remove the ns_cgroup

Cc: Andrew

Serge E. Hallyn wrote:
> The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier.
>
> For example, a single process cannot handle a large number of namespaces
> without interacting with this cgroup and falling into an exponential creation
> time due to the nested cgroup directory depth (e.g. /cgroup/<pid>/.../<pid>/...).
>
> This was spotted when creating a single process using multiple network namespaces:
> the objective was 4096 network namespaces, but at 820 netns creation had become
> dramatically slow, with the creation time for a namespace increasing from 10msec
> to 10sec. After five hours, the expected number of netns had still not been reached.
> Without the ns_cgroup interaction, 4K netns are created in 2 minutes.
>
> To work around that, we have to mount the cgroup with all the subsystems
> except the ns_cgroup. This is a little weird and hard to manage from an
> administration point of view, because we have to know which cgroup subsystems
> are available on the system, and we can't do a simple 'mount -t cgroup cgroup /cgroup'.
>
> With the previous patch, which adds a 'clone_children' parameter to a cgroup,
> we should be able to remove the ns_cgroup and manually manage cgroup creation
> and task attachment consistently with the rest of the subsystems.
>
> This patch removes the ns_cgroup as suggested in the following thread:
>
> https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html
>
> The 'cgroup_clone' function is removed because it is no longer used.
>
> Changelog: Jul 29 (seh): remove references to ns_cgroup_clone(), fix up
> some documentation, and remove CONFIG_CGROUP_NS references.
>
> Signed-off-by: Daniel Lezcano <[email protected]>
> Signed-off-by: Serge E. Hallyn <[email protected]>
> Cc: Eric W. Biederman <[email protected]>
> Cc: Paul Menage <[email protected]>
> Cc: Jamal Hadi Salim <[email protected]>

Reviewed-by: Li Zefan <[email protected]>
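Without the ns_cgroup, the container tooling performs the creation and attachment steps itself. The following is a minimal userspace sketch, assuming a cgroup v1 hierarchy is already mounted under the caller-supplied `parent` path and exposes the per-cgroup `tasks` file; the helper name `cgroup_create_and_attach()` is hypothetical and the error handling is deliberately simple.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create the directory <parent>/<name> (reusing it if
 * it already exists) and move task 'pid' into it by writing the PID to the
 * cgroup's "tasks" file. Returns 0 on success, -1 on failure with errno set.
 */
static int cgroup_create_and_attach(const char *parent, const char *name,
                                    pid_t pid)
{
    char dir[4096];
    /* Large enough for dir plus "/tasks", so the second snprintf cannot truncate. */
    char tasks[sizeof(dir) + sizeof("/tasks")];
    FILE *f;
    int n;

    n = snprintf(dir, sizeof(dir), "%s/%s", parent, name);
    if (n < 0 || (size_t)n >= sizeof(dir)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* On a mounted cgroupfs, mkdir() is what creates the child cgroup. */
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return -1;

    snprintf(tasks, sizeof(tasks), "%s/tasks", dir);
    f = fopen(tasks, "w");
    if (f == NULL)
        return -1;

    /* stdio buffers this write, so a refused attach may only be
     * reported when fclose() flushes the stream. */
    if (fprintf(f, "%ld\n", (long)pid) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

On a real hierarchy the write to `tasks` can still be refused, for example by a subsystem's can_attach() callback. On a cpuset hierarchy, a freshly created child typically cannot accept tasks until its CPUs and memory nodes are set, which is the gap the clone_children flag is meant to fill.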

2010-08-03 08:26:43

by Li Zefan

[permalink] [raw]
Subject: Re: [PATCH 2/3] cgroup : make the mount options parsing more accurate

Cc: Andrew

Serge E. Hallyn wrote:
> The current code does not detect 'all' combined with a subsystem name,
> which are IMHO mutually exclusive; and when any other option is specified,
> even one that is not a subsystem name, we have to specify the 'all' option
> explicitly alongside it.
> e.g.:
> not detected : mount -t cgroup -o all,freezer cgroup /cgroup
> not flexible : mount -t cgroup -o noprefix,all cgroup /cgroup
>
> This patch fixes this and makes the code a bit clearer by replacing the
> 'else if' chain with 'continue' blocks in the loop.
>
> Signed-off-by: Daniel Lezcano <[email protected]>
> Signed-off-by: Serge E. Hallyn <[email protected]>
> Cc: Eric W. Biederman <[email protected]>
> Cc: Paul Menage <[email protected]>

Reviewed-by: Li Zefan <[email protected]>
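The parsing rule described in this patch (reject 'all' combined with an explicit subsystem name, and fall back to all subsystems when none is named, even if other options such as 'noprefix' are present) can be sketched as a loop that uses `continue` for each recognised option. This is a hypothetical, simplified model: the subsystem table, `struct opts` and `parse_opts()` are invented for illustration, and the real kernel parser performs further checks that are omitted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subsystem table, for illustration only. */
static const char *const known_subsys[] = { "cpuset", "freezer", "devices", "cpu" };
#define NR_SUBSYS (sizeof(known_subsys) / sizeof(known_subsys[0]))

struct opts {
    bool all;
    bool noprefix;
    bool clone_children;
    unsigned long subsys_bits;   /* bit i set => known_subsys[i] requested */
};

/*
 * Parse a comma-separated option string (modified in place).
 * Returns 0 on success, -1 for an unknown option or for 'all'
 * combined with an explicit subsystem name.
 */
static int parse_opts(char *data, struct opts *o)
{
    memset(o, 0, sizeof(*o));

    for (char *token = data, *next; token != NULL; token = next) {
        char *comma = strchr(token, ',');

        next = comma ? comma + 1 : NULL;
        if (comma)
            *comma = '\0';

        if (*token == '\0')
            continue;                   /* skip empty fields */
        if (strcmp(token, "all") == 0) {
            o->all = true;
            continue;
        }
        if (strcmp(token, "noprefix") == 0) {
            o->noprefix = true;
            continue;
        }
        if (strcmp(token, "clone_children") == 0) {
            o->clone_children = true;
            continue;
        }

        size_t i;
        for (i = 0; i < NR_SUBSYS; i++)
            if (strcmp(token, known_subsys[i]) == 0)
                break;
        if (i == NR_SUBSYS)
            return -1;                  /* unknown option */
        o->subsys_bits |= 1UL << i;
    }

    /* 'all' and explicit subsystem names are mutually exclusive. */
    if (o->all && o->subsys_bits)
        return -1;

    /* No subsystem named: default to all of them, whatever else was given. */
    if (!o->subsys_bits) {
        o->all = true;
        o->subsys_bits = (1UL << NR_SUBSYS) - 1;
    }
    return 0;
}
```

In this sketch, `-o all,freezer` is rejected, while `-o noprefix` alone implies all subsystems instead of requiring `noprefix,all`.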