2010-12-15 09:34:44

by Li Zefan

Subject: [PATCH v2 0/6] cgroups: Bindable cgroup subsystems

Stephane posted a patchset to add a perf_cgroup subsystem, so that perf
can be used to monitor all threads belonging to a cgroup.

But if you have already mounted a cgroup hierarchy without perf_cgroup
and the hierarchy has sub-cgroups, you can't bind perf_cgroup to it,
and thus you can't use the per-cgroup perf feature.

This patchset alleviates the pain: a subsystem can now be bound to, or
unbound from, a hierarchy which has sub-cgroups in it.

Some subsystems still can't take advantage of this patchset, memcgroup
and cpuset for example.

For cpuset, if a hierarchy has a sub-cgroup and that cgroup has tasks,
we can't automatically decide the sub-cgroup's cpuset.mems and
cpuset.cpus when binding cpuset to the hierarchy.

For memcgroup, the problem is that it uses css_get/put(), and due to
some complexity, for now bindable subsystems should not use css_get/put().

Usage:

# mount -t cgroup -o cpuset xxx /mnt
# mkdir /mnt/tmp
# echo $$ > /mnt/tmp/tasks

(add cpuacct to the hierarchy)
# mount -o remount,cpuset,cpuacct xxx /mnt

(remove it from the hierarchy)
# mount -o remount,cpuset xxx /mnt

There's another limitation: cpuacct must not be bound to any mounted
hierarchy before the above operation. But that's not a problem, as you
can remove it from one hierarchy and then bind it to another.

Changelog v2:

- Fix some bugs.
- Split the can_bind flag into bindable and unbindable flags.
- Provide a __css_tryget() so a bindable subsystem can pin a cgroup
via it.
- ...

---
Documentation/cgroups/cgroups.txt | 37 +++-
include/linux/cgroup.h | 39 +++-
kernel/cgroup.c | 391 +++++++++++++++++++++++++++++++------
kernel/cgroup_freezer.c | 1 +
kernel/sched.c | 2 +
security/device_cgroup.c | 2 +
6 files changed, 398 insertions(+), 74 deletions(-)


2010-12-15 09:34:58

by Li Zefan

Subject: [PATCH v2 1/6] cgroups: Shrink struct cgroup_subsys

Turn the int flags (active, disabled, early_init) into single-bit bool
bitfields.

On x86_32, sizeof(struct cgroup_subsys) shrinks from 276 bytes to 264
bytes.
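As a stand-alone illustration (not part of the patch; the struct and
field names below are hypothetical), packing per-flag ints into
single-bit bool bitfields lets all the flags share one storage unit:

#include <stdio.h>
#include <stdbool.h>

struct flags_as_ints {			/* before: one int per flag */
	int active;
	int disabled;
	int early_init;
	bool use_id;
};

struct flags_as_bitfields {		/* after: flags share one byte */
	bool active:1;
	bool disabled:1;
	bool early_init:1;
	bool use_id:1;
};

int main(void)
{
	/* Typically prints "16 1" on x86; exact sizes are ABI-dependent. */
	printf("%zu %zu\n", sizeof(struct flags_as_ints),
	       sizeof(struct flags_as_bitfields));
	return 0;
}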

Acked-by: Paul Menage <[email protected]>
Signed-off-by: Li Zefan <[email protected]>
---
include/linux/cgroup.h | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ed4ba11..63d953d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -481,14 +481,16 @@ struct cgroup_subsys {
void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);

int subsys_id;
- int active;
- int disabled;
- int early_init;
+
+ bool active:1;
+ bool disabled:1;
+ bool early_init:1;
/*
* True if this subsys uses ID. ID is not available before cgroup_init()
* (not available in early_init time.)
*/
- bool use_id;
+ bool use_id:1;
+
#define MAX_CGROUP_TYPE_NAMELEN 32
const char *name;

--
1.6.3

2010-12-15 09:35:17

by Li Zefan

Subject: [PATCH v2 2/6] cgroups: Allow to bind a subsystem to a cgroup hierarchy

Stephane posted a patchset to add a perf_cgroup subsystem, so that perf
can be used to monitor all threads belonging to a cgroup.

But if you have already mounted a cgroup hierarchy without perf_cgroup
and the hierarchy has sub-cgroups, you can't bind perf_cgroup to it,
and thus you can't use the per-cgroup perf feature.

This patch alleviates the pain: a subsystem can now be bound to a
hierarchy which has sub-cgroups in it.

Matt also commented that users will appreciate this feature.

For a cgroup subsystem to become bindable, the bindable flag of
struct cgroup_subsys should be set.
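For illustration only (the subsystem name, callbacks and id below are
hypothetical; patch 4 of this series sets the flag on the real debug,
cpuacct and devices subsystems), opting in is a single extra
initializer in the subsystem definition:

/* Hypothetical subsystem; only the .bindable line is new with this patch.
 * foo_create(), foo_destroy() and foo_subsys_id are assumed to exist. */
struct cgroup_subsys foo_subsys = {
	.name		= "foo",
	.create		= foo_create,
	.destroy	= foo_destroy,
	.subsys_id	= foo_subsys_id,
	.bindable	= true,	/* may be bound to a populated hierarchy */
};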

But due to some constraints, not all subsystems can take advantage of
this patch. For example, we can't decide a cgroup's cpuset.mems and
cpuset.cpus automatically, so cpuset is not bindable.

Usage:

# mount -t cgroup -o cpuset xxx /mnt
# mkdir /mnt/tmp
# echo $$ > /mnt/tmp/tasks

(assume cpuacct is bindable, and we add cpuacct to the hierarchy)

# mount -o remount,cpuset,cpuacct xxx /mnt

Changelog v2:

- Add more code comments.
- Use rcu_assign_pointer in hierarchy_update_css_sets().
- Fix to nullify css pointers in hierarchy_attach_css_failed().
- Fix to call post_clone() for newly-created css.

Signed-off-by: Li Zefan <[email protected]>
---
include/linux/cgroup.h | 5 +
kernel/cgroup.c | 273 ++++++++++++++++++++++++++++++++++++++----------
2 files changed, 221 insertions(+), 57 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 63d953d..d8c4e22 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -490,6 +490,11 @@ struct cgroup_subsys {
* (not available in early_init time.)
*/
bool use_id:1;
+ /*
+ * Indicate if this subsystem can be bound to a cgroup hierarchy
+ * which has child cgroups.
+ */
+ bool bindable:1;

#define MAX_CGROUP_TYPE_NAMELEN 32
const char *name;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 66a416b..caac80f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -57,6 +57,7 @@
#include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
#include <linux/eventfd.h>
#include <linux/poll.h>
+#include <linux/bitops.h>

#include <asm/atomic.h>

@@ -871,18 +872,13 @@ static void remove_dir(struct dentry *d)

static void cgroup_clear_directory(struct dentry *dentry)
{
- struct list_head *node;
+ struct dentry *d, *tmp;

BUG_ON(!mutex_is_locked(&dentry->d_inode->i_mutex));
spin_lock(&dcache_lock);
- node = dentry->d_subdirs.next;
- while (node != &dentry->d_subdirs) {
- struct dentry *d = list_entry(node, struct dentry, d_u.d_child);
- list_del_init(node);
- if (d->d_inode) {
- /* This should never be called on a cgroup
- * directory with child cgroups */
- BUG_ON(d->d_inode->i_mode & S_IFDIR);
+ list_for_each_entry_safe(d, tmp, &dentry->d_subdirs, d_u.d_child) {
+ if (d->d_inode && !(d->d_inode->i_mode & S_IFDIR)) {
+ list_del_init(&d->d_u.d_child);
d = dget_locked(d);
spin_unlock(&dcache_lock);
d_delete(d);
@@ -890,7 +886,6 @@ static void cgroup_clear_directory(struct dentry *dentry)
dput(d);
spin_lock(&dcache_lock);
}
- node = dentry->d_subdirs.next;
}
spin_unlock(&dcache_lock);
}
@@ -935,6 +930,171 @@ void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css)
css_put(css);
}

+static void init_cgroup_css(struct cgroup_subsys_state *css,
+ struct cgroup_subsys *ss,
+ struct cgroup *cgrp)
+{
+ css->cgroup = cgrp;
+ atomic_set(&css->refcnt, 1);
+ css->flags = 0;
+ css->id = NULL;
+ if (cgrp == dummytop)
+ set_bit(CSS_ROOT, &css->flags);
+ BUG_ON(cgrp->subsys[ss->subsys_id]);
+ cgrp->subsys[ss->subsys_id] = css;
+}
+
+static int cgroup_attach_css(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+ struct cgroup_subsys_state *css;
+ int ret;
+
+ css = ss->create(ss, cgrp);
+ if (IS_ERR(css))
+ return PTR_ERR(css);
+ init_cgroup_css(css, ss, cgrp);
+
+ if (ss->use_id) {
+ ret = alloc_css_id(ss, cgrp->parent, cgrp);
+ if (ret)
+ return ret;
+ }
+ /* At error, ->destroy() callback has to free assigned ID. */
+
+ if (clone_children(cgrp->parent) && ss->post_clone)
+ ss->post_clone(ss, cgrp);
+
+ return 0;
+}
+
+/*
+ * cgroup_walk_hierarchy - iterate through a cgroup hierarchy
+ * @process_cgroup: callback called on each cgroup in the hierarchy
+ * @data: will be passed to @process_cgroup
+ * @top_cgrp: the root cgroup of the hierarchy
+ *
+ * It's a pre-order traversal, so a parent cgroup will be processed before
+ * its children.
+ */
+static int cgroup_walk_hierarchy(int (*process_cgroup)(struct cgroup *, void *),
+ void *data, struct cgroup *top_cgrp)
+{
+ struct cgroup *parent = top_cgrp;
+ struct cgroup *child;
+ struct list_head *node;
+ int ret;
+
+ node = parent->children.next;
+repeat:
+ while (node != &parent->children) {
+ child = list_entry(node, struct cgroup, sibling);
+
+ /* Process this cgroup */
+ ret = process_cgroup(child, data);
+ if (ret)
+ return ret;
+
+ /* Process its children */
+ if (!list_empty(&child->children)) {
+ parent = child;
+ node = parent->children.next;
+ goto repeat;
+ } else
+ node = node->next;
+ }
+
+ /* Process its siblings */
+ if (parent != top_cgrp) {
+ child = parent;
+ parent = child->parent;
+ node = child->sibling.next;
+ goto repeat;
+ }
+
+ return 0;
+}
+
+/*
+ * If hierarchy_attach_css() failed, do some cleanup.
+ */
+static int hierarchy_attach_css_failed(struct cgroup *cgrp, void *data)
+{
+ unsigned long added_bits = (unsigned long)data;
+ int i;
+
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
+ if (cgrp->subsys[i]) {
+ subsys[i]->destroy(subsys[i], cgrp);
+ cgrp->subsys[i] = NULL;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Allocate css objects of added subsystems, and attach them to the
+ * existing cgroup.
+ */
+static int hierarchy_attach_css(struct cgroup *cgrp, void *data)
+{
+ unsigned long added_bits = (unsigned long)data;
+ int i;
+ int ret = 0;
+
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
+ ret = cgroup_attach_css(subsys[i], cgrp);
+ if (ret)
+ break;
+ }
+
+ if (ret)
+ cgroup_walk_hierarchy(hierarchy_attach_css_failed, data,
+ cgrp->top_cgroup);
+ return ret;
+}
+
+/*
+ * After attaching new css objects to the cgroup, we need to entangle
+ * them into the existing css_sets.
+ */
+static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
+{
+ unsigned long added_bits = (unsigned long)data;
+ int i;
+ struct cg_cgroup_link *link;
+
+ write_lock(&css_set_lock);
+ list_for_each_entry(link, &cgrp->css_sets, cgrp_link_list) {
+ struct css_set *cg = link->cg;
+ struct hlist_head *hhead;
+
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
+ rcu_assign_pointer(cg->subsys[i], cgrp->subsys[i]);
+
+ /* rehash */
+ hlist_del(&cg->hlist);
+ hhead = css_set_hash(cg->subsys);
+ hlist_add_head(&cg->hlist, hhead);
+ }
+ write_unlock(&css_set_lock);
+
+ return 0;
+}
+
+/*
+ * Re-populate each cgroup directory.
+ *
+ * Note root cgroup's inode mutex is held.
+ */
+static int hierarchy_populate_dir(struct cgroup *cgrp, void *data)
+{
+ mutex_lock_nested(&cgrp->dentry->d_inode->i_mutex, I_MUTEX_CHILD);
+ cgroup_populate_dir(cgrp);
+ mutex_unlock(&cgrp->dentry->d_inode->i_mutex);
+ return 0;
+}
+
/*
* Call with cgroup_mutex held. Drops reference counts on modules, including
* any duplicate ones that parse_cgroupfs_options took. If this function
@@ -946,36 +1106,59 @@ static int rebind_subsystems(struct cgroupfs_root *root,
unsigned long added_bits, removed_bits;
struct cgroup *cgrp = &root->top_cgroup;
int i;
+ int err;

BUG_ON(!mutex_is_locked(&cgroup_mutex));

removed_bits = root->actual_subsys_bits & ~final_bits;
added_bits = final_bits & ~root->actual_subsys_bits;
+
/* Check that any added subsystems are currently free */
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
- unsigned long bit = 1UL << i;
- struct cgroup_subsys *ss = subsys[i];
- if (!(bit & added_bits))
- continue;
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
/*
* Nobody should tell us to do a subsys that doesn't exist:
* parse_cgroupfs_options should catch that case and refcounts
* ensure that subsystems won't disappear once selected.
*/
- BUG_ON(ss == NULL);
- if (ss->root != &rootnode) {
+ BUG_ON(subsys[i] == NULL);
+ if (subsys[i]->root != &rootnode) {
/* Subsystem isn't free */
return -EBUSY;
}
}

- /* Currently we don't handle adding/removing subsystems when
- * any child cgroups exist. This is theoretically supportable
- * but involves complex error handling, so it's being left until
- * later */
- if (root->number_of_cgroups > 1)
+ /* Removing will be supported later */
+ if (root->number_of_cgroups > 1 && removed_bits)
return -EBUSY;

+ /*
+ * For non-trivial hierarchy, check that added subsystems
+ * are all bindable
+ */
+ if (root->number_of_cgroups > 1) {
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
+ if (!subsys[i]->bindable)
+ return -EBUSY;
+ }
+
+ /* Attach css objects to the top cgroup */
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
+ BUG_ON(cgrp->subsys[i]);
+ BUG_ON(!dummytop->subsys[i]);
+ BUG_ON(dummytop->subsys[i]->cgroup != dummytop);
+
+ cgrp->subsys[i] = dummytop->subsys[i];
+ cgrp->subsys[i]->cgroup = cgrp;
+ }
+
+ err = cgroup_walk_hierarchy(hierarchy_attach_css,
+ (void *)added_bits, cgrp);
+ if (err)
+ goto failed;
+
+ cgroup_walk_hierarchy(hierarchy_update_css_sets,
+ (void *)added_bits, cgrp);
+
/* Process each subsystem */
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
@@ -983,12 +1166,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
if (bit & added_bits) {
/* We're binding this subsystem to this hierarchy */
BUG_ON(ss == NULL);
- BUG_ON(cgrp->subsys[i]);
- BUG_ON(!dummytop->subsys[i]);
- BUG_ON(dummytop->subsys[i]->cgroup != dummytop);
mutex_lock(&ss->hierarchy_mutex);
- cgrp->subsys[i] = dummytop->subsys[i];
- cgrp->subsys[i]->cgroup = cgrp;
list_move(&ss->sibling, &root->subsys_list);
ss->root = root;
if (ss->bind)
@@ -1001,10 +1179,10 @@ static int rebind_subsystems(struct cgroupfs_root *root,
BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
mutex_lock(&ss->hierarchy_mutex);
- if (ss->bind)
- ss->bind(ss, dummytop);
dummytop->subsys[i]->cgroup = dummytop;
cgrp->subsys[i] = NULL;
+ if (ss->bind)
+ ss->bind(ss, dummytop);
subsys[i]->root = &rootnode;
list_move(&ss->sibling, &rootnode.subsys_list);
mutex_unlock(&ss->hierarchy_mutex);
@@ -1031,6 +1209,12 @@ static int rebind_subsystems(struct cgroupfs_root *root,
synchronize_rcu();

return 0;
+
+failed:
+ for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
+ cgrp->subsys[i] = NULL;
+
+ return err;
}

static int cgroup_show_options(struct seq_file *seq, struct vfsmount *vfs)
@@ -1286,6 +1470,7 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)

/* (re)populate subsystem files */
cgroup_populate_dir(cgrp);
+ cgroup_walk_hierarchy(hierarchy_populate_dir, NULL, cgrp);

if (opts.release_agent)
strcpy(root->release_agent_path, opts.release_agent);
@@ -3313,20 +3498,6 @@ static int cgroup_populate_dir(struct cgroup *cgrp)
return 0;
}

-static void init_cgroup_css(struct cgroup_subsys_state *css,
- struct cgroup_subsys *ss,
- struct cgroup *cgrp)
-{
- css->cgroup = cgrp;
- atomic_set(&css->refcnt, 1);
- css->flags = 0;
- css->id = NULL;
- if (cgrp == dummytop)
- set_bit(CSS_ROOT, &css->flags);
- BUG_ON(cgrp->subsys[ss->subsys_id]);
- cgrp->subsys[ss->subsys_id] = css;
-}
-
static void cgroup_lock_hierarchy(struct cgroupfs_root *root)
{
/* We need to take each hierarchy_mutex in a consistent order */
@@ -3401,21 +3572,9 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
set_bit(CGRP_CLONE_CHILDREN, &cgrp->flags);

for_each_subsys(root, ss) {
- struct cgroup_subsys_state *css = ss->create(ss, cgrp);
-
- if (IS_ERR(css)) {
- err = PTR_ERR(css);
+ err = cgroup_attach_css(ss, cgrp);
+ if (err)
goto err_destroy;
- }
- init_cgroup_css(css, ss, cgrp);
- if (ss->use_id) {
- err = alloc_css_id(ss, parent, cgrp);
- if (err)
- goto err_destroy;
- }
- /* At error, ->destroy() callback has to free assigned ID. */
- if (clone_children(parent) && ss->post_clone)
- ss->post_clone(ss, cgrp);
}

cgroup_lock_hierarchy(root);
--
1.6.3

2010-12-15 09:35:36

by Li Zefan

Subject: [PATCH v2 3/6] cgroups: Allow to unbind subsystem from a cgroup hierarchy

This allows us to unbind a cgroup subsystem from a hierarchy
which has sub-cgroups in it.

If a subsystem is to support unbinding, it should use __css_tryget()
instead of css_get() when pinning a cgroup via the css refcnt.
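A rough sketch of the intended pinning pattern (foo_subsys_id and
do_something() below are hypothetical, and releasing the reference via
the existing css_put() is an assumption):

struct cgroup_subsys_state *css;

rcu_read_lock();
css = task_subsys_state(current, foo_subsys_id);
if (__css_tryget(css)) {
	/* css (and its cgroup) can't go away while we hold the ref */
	do_something(css);
	css_put(css);
}
rcu_read_unlock();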

Usage:

# mount -t cgroup -o cpuset,cpuacct xxx /mnt
# mkdir /mnt/tmp
# echo $$ > /mnt/tmp/tasks

(remove it from the hierarchy)
# mount -o remount,cpuset xxx /mnt

Changelog v2:

- Allow a cgroup subsystem to use css refcnt.
- Add more code comments.
- Use rcu_assign_pointer() in hierarchy_update_css_sets().
- Split can_bind flag to bindable and unbindable flags.

Signed-off-by: Li Zefan <[email protected]>
---
include/linux/cgroup.h | 17 ++++++
kernel/cgroup.c | 139 +++++++++++++++++++++++++++++++++++++++++------
2 files changed, 138 insertions(+), 18 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index d8c4e22..17579b2 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -110,6 +110,18 @@ static inline bool css_is_removed(struct cgroup_subsys_state *css)
}

/*
+ * For a subsystem which supports unbinding, call this to get css
+ * refcnt. Called with rcu_read_lock or cgroup_mutex held.
+ */
+
+static inline bool __css_tryget(struct cgroup_subsys_state *css)
+{
+ if (test_bit(CSS_ROOT, &css->flags))
+ return true;
+ return atomic_inc_not_zero(&css->refcnt);
+}
+
+/*
* Call css_tryget() to take a reference on a css if your existing
* (known-valid) reference isn't already ref-counted. Returns false if
* the css has been destroyed.
@@ -495,6 +507,11 @@ struct cgroup_subsys {
* which has child cgroups.
*/
bool bindable:1;
+ /*
+ * Indicate if this subsystem can be removed from a cgroup hierarchy
+ * which has child cgroups.
+ */
+ bool unbindable:1;

#define MAX_CGROUP_TYPE_NAMELEN 32
const char *name;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index caac80f..463575d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1055,12 +1055,61 @@ static int hierarchy_attach_css(struct cgroup *cgrp, void *data)
}

/*
- * After attaching new css objects to the cgroup, we need to entangle
- * them into the existing css_sets.
+ * Reset those css objects whose refcnts are cleared.
*/
-static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
+static int hierarchy_reset_css_refs(struct cgroup *cgrp, void *data)
+{
+ unsigned long removed_bits = (unsigned long)data;
+ int i;
+
+ for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+ if (atomic_read(&css->refcnt) == 0)
+ atomic_set(&css->refcnt, 1);
+ }
+ return 0;
+}
+
+/*
+ * Clear all the css objects' refcnt to 0. If there's a refcnt > 1,
+ * return failure.
+ */
+static int hierarchy_clear_css_refs(struct cgroup *cgrp, void *data)
+{
+ unsigned long removed_bits = (unsigned long)data;
+ int i;
+
+ for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+ struct cgroup_subsys_state *css = cgrp->subsys[i];
+
+ if (atomic_cmpxchg(&css->refcnt, 1, 0) != 1)
+ goto failed;
+ }
+ return 0;
+failed:
+ hierarchy_reset_css_refs(struct cgroup *cgrp, void *data);
+ return -EBUSY;
+}
+
+/*
+ * We're removing some subsystems from cgroup hierarchy, and here we
+ * remove and destroy the css objects from each cgroup.
+ */
+static int hierarchy_remove_css(struct cgroup *cgrp, void *data)
+{
+ unsigned long removed_bits = (unsigned long)data;
+ int i;
+
+ for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+ subsys[i]->destroy(subsys[i], cgrp);
+ cgrp->subsys[i] = NULL;
+ }
+
+ return 0;
+}
+
+static int hierarchy_update_css_sets(struct cgroup *cgrp,
+ unsigned long bits, bool add)
{
- unsigned long added_bits = (unsigned long)data;
int i;
struct cg_cgroup_link *link;

@@ -1069,8 +1118,14 @@ static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
struct css_set *cg = link->cg;
struct hlist_head *hhead;

- for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
- rcu_assign_pointer(cg->subsys[i], cgrp->subsys[i]);
+ for_each_set_bit(i, &bits, CGROUP_SUBSYS_COUNT) {
+ if (add)
+ rcu_assign_pointer(cg->subsys[i],
+ cgrp->subsys[i]);
+ else
+ rcu_assign_pointer(cg->subsys[i],
+ dummytop->subsys[i]);
+ }

/* rehash */
hlist_del(&cg->hlist);
@@ -1083,6 +1138,30 @@ static int hierarchy_update_css_sets(struct cgroup *cgrp, void *data)
}

/*
+ * After attaching new css objects to the cgroup, we need to entangle
+ * them into the existing css_sets.
+ */
+static int hierarchy_add_to_css_sets(struct cgroup *cgrp, void *data)
+{
+ unsigned long added_bits = (unsigned long)data;
+
+ hierarchy_update_css_sets(cgrp, added_bits, true);
+ return 0;
+}
+
+/*
+ * Before detaching and destroying css objects from the cgroup, we
+ * should disentangle them from the existing css_sets.
+ */
+static int hierarchy_remove_from_css_sets(struct cgroup *cgrp, void *data)
+{
+ unsigned long removed_bits = (unsigned long)data;
+
+ hierarchy_update_css_sets(cgrp, removed_bits, false);
+ return 0;
+}
+
+/*
* Re-populate each cgroup directory.
*
* Note root cgroup's inode mutex is held.
@@ -1127,18 +1206,17 @@ static int rebind_subsystems(struct cgroupfs_root *root,
}
}

- /* Removing will be supported later */
- if (root->number_of_cgroups > 1 && removed_bits)
- return -EBUSY;
-
/*
* For non-trivial hierarchy, check that added subsystems
- * are all bindable
+ * are all bindable and removed subsystems are all unbindable
*/
if (root->number_of_cgroups > 1) {
for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
if (!subsys[i]->bindable)
return -EBUSY;
+ for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT)
+ if (!subsys[i]->unbindable)
+ return -EBUSY;
}

/* Attach css objects to the top cgroup */
@@ -1154,9 +1232,14 @@ static int rebind_subsystems(struct cgroupfs_root *root,
err = cgroup_walk_hierarchy(hierarchy_attach_css,
(void *)added_bits, cgrp);
if (err)
- goto failed;
+ goto out;
+
+ err = cgroup_walk_hierarchy(hierarchy_clear_css_refs,
+ (void *)removed_bits, cgrp);
+ if (err)
+ goto out_remove_css;

- cgroup_walk_hierarchy(hierarchy_update_css_sets,
+ cgroup_walk_hierarchy(hierarchy_add_to_css_sets,
(void *)added_bits, cgrp);

/* Process each subsystem */
@@ -1176,11 +1259,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
} else if (bit & removed_bits) {
/* We're removing this subsystem */
BUG_ON(ss == NULL);
- BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
- BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
mutex_lock(&ss->hierarchy_mutex);
- dummytop->subsys[i]->cgroup = dummytop;
- cgrp->subsys[i] = NULL;
if (ss->bind)
ss->bind(ss, dummytop);
subsys[i]->root = &rootnode;
@@ -1206,11 +1285,35 @@ static int rebind_subsystems(struct cgroupfs_root *root,
}
}
root->subsys_bits = root->actual_subsys_bits = final_bits;
+
+ for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+ BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
+ BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
+
+ dummytop->subsys[i]->cgroup = dummytop;
+ cgrp->subsys[i] = NULL;
+ }
+
+ cgroup_walk_hierarchy(hierarchy_remove_from_css_sets,
+ (void *)removed_bits, cgrp);
+
+ /*
+ * There might be some pointers to the cgroup_subsys_state
+ * that we are going to destroy.
+ */
+ synchronize_rcu();
+
+ cgroup_walk_hierarchy(hierarchy_remove_css,
+ (void *)removed_bits, cgrp);
+
synchronize_rcu();

return 0;

-failed:
+out_remove_css:
+ cgroup_walk_hierarchy(hierarchy_remove_css,
+ (void *)added_bits, cgrp);
+out:
for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
cgrp->subsys[i] = NULL;

--
1.6.3

2010-12-15 09:36:07

by Li Zefan

Subject: [PATCH v2 4/6] cgroups: Mark some subsystems bindable/unbindable

For those subsystems (debug, cpuacct, net_cls and devices),
setting the bindable/unbindable flag is sufficient.

Mark the freezer subsystem as bindable but not unbindable, because
sub-cgroups can be in the FROZEN state.

Signed-off-by: Li Zefan <[email protected]>
---
kernel/cgroup.c | 6 +++++-
kernel/cgroup_freezer.c | 1 +
kernel/sched.c | 2 ++
security/device_cgroup.c | 2 ++
4 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 463575d..fa2c5de 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1063,6 +1063,8 @@ static int hierarchy_reset_css_refs(struct cgroup *cgrp, void *data)
int i;

for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) {
+ struct cgroup_subsys_state *css = cgrp->subsys[i];
+
if (atomic_read(&css->refcnt) == 0)
atomic_set(&css->refcnt, 1);
}
@@ -1086,7 +1088,7 @@ static int hierarchy_clear_css_refs(struct cgroup *cgrp, void *data)
}
return 0;
failed:
- hierarchy_reset_css_refs(struct cgroup *cgrp, void *data);
+ hierarchy_reset_css_refs(cgrp, data);
return -EBUSY;
}

@@ -5201,5 +5203,7 @@ struct cgroup_subsys debug_subsys = {
.destroy = debug_destroy,
.populate = debug_populate,
.subsys_id = debug_subsys_id,
+ .bindable = true,
+ .unbindable = true,
};
#endif /* CONFIG_CGROUP_DEBUG */
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index e7bebb7..213ecd9 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -393,4 +393,5 @@ struct cgroup_subsys freezer_subsys = {
.attach = NULL,
.fork = freezer_fork,
.exit = NULL,
+ .bindable = true,
};
diff --git a/kernel/sched.c b/kernel/sched.c
index dc91a4d..930ee2e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -9346,6 +9346,8 @@ struct cgroup_subsys cpuacct_subsys = {
.destroy = cpuacct_destroy,
.populate = cpuacct_populate,
.subsys_id = cpuacct_subsys_id,
+ .bindable = true,
+ .unbindable = true,
};
#endif /* CONFIG_CGROUP_CPUACCT */

diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 8d9c48f..51321e9 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -473,6 +473,8 @@ struct cgroup_subsys devices_subsys = {
.destroy = devcgroup_destroy,
.populate = devcgroup_populate,
.subsys_id = devices_subsys_id,
+ .bindable = true,
+ .unbindable = true,
};

int devcgroup_inode_permission(struct inode *inode, int mask)
--
1.6.3

2010-12-15 09:36:18

by Li Zefan

Subject: [PATCH v2 5/6] cgroups: Trigger BUG if an unbindable subsystem calls css_get()

For now, unbindable subsystems should not use css_get/put(), so check
for this misuse.

Signed-off-by: Li Zefan <[email protected]>
---
include/linux/cgroup.h | 7 +++++--
kernel/cgroup.c | 5 +++++
2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 17579b2..e8ad9f1 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -80,13 +80,15 @@ struct cgroup_subsys_state {

/* bits in struct cgroup_subsys_state flags field */
enum {
- CSS_ROOT, /* This CSS is the root of the subsystem */
- CSS_REMOVED, /* This CSS is dead */
+ CSS_ROOT, /* This CSS is the root of the subsystem */
+ CSS_REMOVED, /* This CSS is dead */
+ CSS_NO_GET, /* Forbid calling css_get/put() */
};

/* Caller must verify that the css is not for root cgroup */
static inline void __css_get(struct cgroup_subsys_state *css, int count)
{
+ BUG_ON(test_bit(CSS_NO_GET, &css->flags));
atomic_add(count, &css->refcnt);
}

@@ -131,6 +133,7 @@ static inline bool css_tryget(struct cgroup_subsys_state *css)
{
if (test_bit(CSS_ROOT, &css->flags))
return true;
+ BUG_ON(test_bit(CSS_NO_GET, &css->flags));
while (!atomic_inc_not_zero(&css->refcnt)) {
if (test_bit(CSS_REMOVED, &css->flags))
return false;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index fa2c5de..d49a459 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -938,6 +938,11 @@ static void init_cgroup_css(struct cgroup_subsys_state *css,
atomic_set(&css->refcnt, 1);
css->flags = 0;
css->id = NULL;
+
+ /* For now, unbindable subsystems should not call css_get/put(). */
+ if (ss->unbindable)
+ set_bit(CSS_NO_GET, &css->flags);
+
if (cgrp == dummytop)
set_bit(CSS_ROOT, &css->flags);
BUG_ON(cgrp->subsys[ss->subsys_id]);
--
1.6.3

2010-12-15 09:36:38

by Li Zefan

Subject: [PATCH v2 6/6] cgroups: Update documentation for bindable subsystems

Provide a usage example, update the documentation of the bind()
callback, etc.

Signed-off-by: Li Zefan <[email protected]>
---
Documentation/cgroups/cgroups.txt | 37 +++++++++++++++++++++++++++++--------
1 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 190018b..4e772cc 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -363,17 +363,23 @@ Note this will add ns to the hierarchy but won't remove memory or
cpuset, because the new options are appended to the old ones:
# mount -o remount,ns /dev/cgroup

+Some subsystems can be bound to a mounted hierarchy, or removed from
+it, even if there are sub-cgroups in it:
+# mount -t cgroup -o freezer hier1 /dev/cgroup
+# echo $$ > /dev/cgroup/my_cgroup
+# mount -o freezer,cpuset hier1 /dev/cgroup
+(failed)
+# mount -o freezer,cpuacct hier1 /dev/cgroup
+# mount -o cpuacct hier1 /dev/cgroup
+
+Note that cpuacct should sit in the default hierarchy before the remount.
+
To Specify a hierarchy's release_agent:
# mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
xxx /dev/cgroup

Note that specifying 'release_agent' more than once will return failure.

-Note that changing the set of subsystems is currently only supported
-when the hierarchy consists of a single (root) cgroup. Supporting
-the ability to arbitrarily bind/unbind subsystems from an existing
-cgroup hierarchy is intended to be implemented in the future.
-
Then under /dev/cgroup you can find a tree that corresponds to the
tree of the cgroups in the system. For instance, /dev/cgroup
is the cgroup that holds the whole system.
@@ -523,6 +529,15 @@ module initcall a call to cgroup_load_subsys(), and in its exitcall a
call to cgroup_unload_subsys(). It should also set its_subsys.module =
THIS_MODULE in its .c file.

+If a subsystem has the bindable flag set, it normally has to support
+side-effect-free movement of a task into any just-created cgroup,
+i.e. it's probably not suitable for any subsystem where
+can_attach() might return false for the newly-created cgroup, or
+attach() might have side-effects in those same cases.
+
+If a subsystem has the unbindable flag set, it normally has to support
+side-effect-free movement of a task into the root cgroup.
+
Each subsystem may export the following methods. The only mandatory
methods are create/destroy. Any others that are null are presumed to
be successful no-ops.
@@ -627,9 +642,15 @@ void bind(struct cgroup_subsys *ss, struct cgroup *root)
(cgroup_mutex and ss->hierarchy_mutex held by caller)

Called when a cgroup subsystem is rebound to a different hierarchy
-and root cgroup. Currently this will only involve movement between
-the default hierarchy (which never has sub-cgroups) and a hierarchy
-that is being created/destroyed (and hence has no sub-cgroups).
+and root cgroup.
+
+For non-bindable subsystems, this will only involve movement
+between the default hierarchy (which never has sub-cgroups) and a
+hierarchy that is being created/destroyed (and hence has no sub-cgroups).
+
+For bindable subsystems, this may also involve movement between the
+default hierarchy and a mounted hierarchy that's populated with
+sub-cgroups.

4. Questions
============
--
1.6.3