2009-12-31 05:11:19

by Ben Blum

[permalink] [raw]
Subject: [PATCH v4 0/4] cgroups: support for module-loadable subsystems

[This is a revision of http://lkml.org/lkml/2009/12/21/211 ]

This patch series implements support for building, loading, and
unloading subsystems as modules, both within and outside the kernel
source tree. It provides an interface cgroup_load_subsys() and
cgroup_unload_subsys() which modular subsystems can use to register and
depart during runtime. The net_cls classifier subsystem serves as the
example for a subsystem which can be converted into a module using these
changes.

Patch #1 sets up the subsys[] array so its contents can be dynamic as
modules appear and (eventually) disappear. Iterations over the array are
modified to handle when subsystems are absent, and the dynamic section
of the array is protected by cgroup_mutex.

Patch #2 implements an interface for modules to load subsystems, called
cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
pointer in struct cgroup_subsys.

Patch #3 adds a mechanism for unloading modular subsystems, which
includes a more advanced rework of the rudimentary reference counting
introduced in patch 2.

Patch #4 modifies the net_cls subsystem, which already had some module
declarations, to be configurable as a module, which also serves as a
simple proof-of-concept.

Part of implementing patches 2 and 4 involved updating css pointers in
each css_set when the module appears or leaves. In doing this, it was
discovered that css_sets always remain linked to the dummy cgroup,
regardless of whether or not any subsystems are actually bound to it
(i.e., not mounted on an actual hierarchy). The subsystem loading and
unloading code therefore should keep in mind the special cases where the
added subsystem is the only one in the dummy cgroup (and therefore all
css_sets need to be linked back into it) and where the removed subsys
was the only one in the dummy cgroup (and therefore all css_sets should
be unlinked from it) - however, as all css_sets always stay attached to
the dummy cgroup anyway, these cases are ignored. Any fix that addresses
this issue should also make sure these cases are addressed in the
subsystem loading and unloading code.

-- bblum

---
Documentation/cgroups/cgroups.txt | 9
include/linux/cgroup.h | 18 +
kernel/cgroup.c | 388 +++++++++++++++++++++++++++++++++-----
net/sched/Kconfig | 5
net/sched/cls_cgroup.c | 36 ++-
5 files changed, 400 insertions(+), 56 deletions(-)


2009-12-31 05:12:55

by Ben Blum

[permalink] [raw]
Subject: [PATCH v4 1/4] cgroups: revamp subsys array

Make subsys[] able to be dynamically populated to support modular subsystems

From: Ben Blum <[email protected]>

This patch reworks the way the subsys[] array is used so that subsystems can
register themselves after boot time, and enables the internals of cgroups to
be able to handle when subsystems are not present or may appear/disappear.

Signed-off-by: Ben Blum <[email protected]>
Acked-by: Li Zefan <[email protected]>
---

include/linux/cgroup.h | 10 ++++-
kernel/cgroup.c | 96 ++++++++++++++++++++++++++++++++++++++++--------
2 files changed, 88 insertions(+), 18 deletions(-)


diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 0008dee..83da43d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -39,13 +39,19 @@ extern int cgroupstats_build(struct cgroupstats *stats,

extern const struct file_operations proc_cgroup_operations;

-/* Define the enumeration of all cgroup subsystems */
+/* Define the enumeration of all builtin cgroup subsystems */
#define SUBSYS(_x) _x ## _subsys_id,
enum cgroup_subsys_id {
#include <linux/cgroup_subsys.h>
- CGROUP_SUBSYS_COUNT
+ CGROUP_BUILTIN_SUBSYS_COUNT
};
#undef SUBSYS
+/*
+ * This define indicates the maximum number of subsystems that can be loaded
+ * at once. We limit to this many since cgroupfs_root has subsys_bits to keep
+ * track of all of them.
+ */
+#define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))

/* Per-subsystem/per-cgroup state maintained by the system. */
struct cgroup_subsys_state {
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0249f4b..402e828 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -56,10 +56,14 @@

static DEFINE_MUTEX(cgroup_mutex);

-/* Generate an array of cgroup subsystem pointers */
+/*
+ * Generate an array of cgroup subsystem pointers. At boot time, this is
+ * populated up to CGROUP_BUILTIN_SUBSYS_COUNT, and modular subsystems are
+ * registered after that. The mutable section of this array is protected by
+ * cgroup_mutex.
+ */
#define SUBSYS(_x) &_x ## _subsys,
-
-static struct cgroup_subsys *subsys[] = {
+static struct cgroup_subsys *subsys[CGROUP_SUBSYS_COUNT] = {
#include <linux/cgroup_subsys.h>
};

@@ -433,8 +437,11 @@ static struct css_set *find_existing_css_set(
struct hlist_node *node;
struct css_set *cg;

- /* Built the set of subsystem state objects that we want to
- * see in the new css_set */
+ /*
+ * Build the set of subsystem state objects that we want to see in the
+ * new css_set. while subsystems can change globally, the entries here
+ * won't change, so no need for locking.
+ */
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
if (root->subsys_bits & (1UL << i)) {
/* Subsystem is in this hierarchy. So we want
@@ -869,7 +876,9 @@ void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css)
css_put(css);
}

-
+/*
+ * Call with cgroup_mutex held.
+ */
static int rebind_subsystems(struct cgroupfs_root *root,
unsigned long final_bits)
{
@@ -877,6 +886,8 @@ static int rebind_subsystems(struct cgroupfs_root *root,
struct cgroup *cgrp = &root->top_cgroup;
int i;

+ BUG_ON(!mutex_is_locked(&cgroup_mutex));
+
removed_bits = root->actual_subsys_bits & ~final_bits;
added_bits = final_bits & ~root->actual_subsys_bits;
/* Check that any added subsystems are currently free */
@@ -885,6 +896,12 @@ static int rebind_subsystems(struct cgroupfs_root *root,
struct cgroup_subsys *ss = subsys[i];
if (!(bit & added_bits))
continue;
+ /*
+ * Nobody should tell us to do a subsys that doesn't exist:
+ * parse_cgroupfs_options should catch that case and refcounts
+ * ensure that subsystems won't disappear once selected.
+ */
+ BUG_ON(ss == NULL);
if (ss->root != &rootnode) {
/* Subsystem isn't free */
return -EBUSY;
@@ -904,6 +921,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
unsigned long bit = 1UL << i;
if (bit & added_bits) {
/* We're binding this subsystem to this hierarchy */
+ BUG_ON(ss == NULL);
BUG_ON(cgrp->subsys[i]);
BUG_ON(!dummytop->subsys[i]);
BUG_ON(dummytop->subsys[i]->cgroup != dummytop);
@@ -917,6 +935,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
mutex_unlock(&ss->hierarchy_mutex);
} else if (bit & removed_bits) {
/* We're removing this subsystem */
+ BUG_ON(ss == NULL);
BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
mutex_lock(&ss->hierarchy_mutex);
@@ -929,6 +948,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
mutex_unlock(&ss->hierarchy_mutex);
} else if (bit & final_bits) {
/* Subsystem state should already exist */
+ BUG_ON(ss == NULL);
BUG_ON(!cgrp->subsys[i]);
} else {
/* Subsystem state shouldn't exist */
@@ -971,14 +991,18 @@ struct cgroup_sb_opts {

};

-/* Convert a hierarchy specifier into a bitmask of subsystems and
- * flags. */
+/*
+ * Convert a hierarchy specifier into a bitmask of subsystems and flags. Call
+ * with cgroup_mutex held to protect the subsys[] array.
+ */
static int parse_cgroupfs_options(char *data,
struct cgroup_sb_opts *opts)
{
char *token, *o = data ?: "all";
unsigned long mask = (unsigned long)-1;

+ BUG_ON(!mutex_is_locked(&cgroup_mutex));
+
#ifdef CONFIG_CPUSETS
mask = ~(1UL << cpuset_subsys_id);
#endif
@@ -994,6 +1018,8 @@ static int parse_cgroupfs_options(char *data,
opts->subsys_bits = 0;
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+ if (ss == NULL)
+ continue;
if (!ss->disabled)
opts->subsys_bits |= 1ul << i;
}
@@ -1038,6 +1064,8 @@ static int parse_cgroupfs_options(char *data,
int i;
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
ss = subsys[i];
+ if (ss == NULL)
+ continue;
if (!strcmp(token, ss->name)) {
if (!ss->disabled)
set_bit(i, &opts->subsys_bits);
@@ -1291,7 +1319,9 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
struct cgroupfs_root *new_root;

/* First find the desired set of subsystems */
+ mutex_lock(&cgroup_mutex);
ret = parse_cgroupfs_options(data, &opts);
+ mutex_unlock(&cgroup_mutex);
if (ret)
goto out_err;

@@ -2878,8 +2908,14 @@ static void cgroup_lock_hierarchy(struct cgroupfs_root *root)
/* We need to take each hierarchy_mutex in a consistent order */
int i;

+ /*
+ * No worry about a race with rebind_subsystems that might mess up the
+ * locking order, since both parties are under cgroup_mutex.
+ */
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+ if (ss == NULL)
+ continue;
if (ss->root == root)
mutex_lock(&ss->hierarchy_mutex);
}
@@ -2891,6 +2927,8 @@ static void cgroup_unlock_hierarchy(struct cgroupfs_root *root)

for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+ if (ss == NULL)
+ continue;
if (ss->root == root)
mutex_unlock(&ss->hierarchy_mutex);
}
@@ -3011,11 +3049,16 @@ static int cgroup_has_css_refs(struct cgroup *cgrp)
* synchronization other than RCU, and the subsystem linked
* list isn't RCU-safe */
int i;
+ /*
+ * We won't need to lock the subsys array, because the subsystems
+ * we're concerned about aren't going anywhere since our cgroup root
+ * has a reference on them.
+ */
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
struct cgroup_subsys_state *css;
- /* Skip subsystems not in this hierarchy */
- if (ss->root != cgrp->root)
+ /* Skip subsystems not present or not in this hierarchy */
+ if (ss == NULL || ss->root != cgrp->root)
continue;
css = cgrp->subsys[ss->subsys_id];
/* When called from check_for_release() it's possible
@@ -3236,7 +3279,8 @@ int __init cgroup_init_early(void)
for (i = 0; i < CSS_SET_TABLE_SIZE; i++)
INIT_HLIST_HEAD(&css_set_table[i]);

- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ /* at bootup time, we don't worry about modular subsystems */
+ for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];

BUG_ON(!ss->name);
@@ -3271,7 +3315,8 @@ int __init cgroup_init(void)
if (err)
return err;

- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ /* at bootup time, we don't worry about modular subsystems */
+ for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
if (!ss->early_init)
cgroup_init_subsys(ss);
@@ -3380,9 +3425,16 @@ static int proc_cgroupstats_show(struct seq_file *m, void *v)
int i;

seq_puts(m, "#subsys_name\thierarchy\tnum_cgroups\tenabled\n");
+ /*
+ * ideally we don't want subsystems moving around while we do this.
+ * cgroup_mutex is also necessary to guarantee an atomic snapshot of
+ * subsys/hierarchy state.
+ */
mutex_lock(&cgroup_mutex);
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+ if (ss == NULL)
+ continue;
seq_printf(m, "%s\t%d\t%d\t%d\n",
ss->name, ss->root->hierarchy_id,
ss->root->number_of_cgroups, !ss->disabled);
@@ -3440,7 +3492,12 @@ void cgroup_fork_callbacks(struct task_struct *child)
{
if (need_forkexit_callback) {
int i;
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ /*
+ * forkexit callbacks are only supported for builtin
+ * subsystems, and the builtin section of the subsys array is
+ * immutable, so we don't need to lock the subsys array here.
+ */
+ for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
if (ss->fork)
ss->fork(ss, child);
@@ -3509,7 +3566,11 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
struct css_set *cg;

if (run_callbacks && need_forkexit_callback) {
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ /*
+ * modular subsystems can't use callbacks, so no need to lock
+ * the subsys array
+ */
+ for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
if (ss->exit)
ss->exit(ss, tsk);
@@ -3800,8 +3861,11 @@ static int __init cgroup_disable(char *str)
while ((token = strsep(&str, ",")) != NULL) {
if (!*token)
continue;
-
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ /*
+ * cgroup_disable, being at boot time, can't know about module
+ * subsystems, so we don't worry about them.
+ */
+ for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];

if (!strcmp(token, ss->name)) {


Attachments:
cgroups-revamp-subsys-array.patch (10.54 kB)

2009-12-31 05:13:46

by Ben Blum

[permalink] [raw]
Subject: [PATCH v4 2/4] cgroups: subsystem module loading interface


On Thu, Dec 31, 2009 at 12:10:50AM -0500, Ben Blum wrote:
> [This is a revision of http://lkml.org/lkml/2009/12/21/211 ]
>
> This patch series implements support for building, loading, and
> unloading subsystems as modules, both within and outside the kernel
> source tree. It provides an interface cgroup_load_subsys() and
> cgroup_unload_subsys() which modular subsystems can use to register and
> depart during runtime. The net_cls classifier subsystem serves as the
> example for a subsystem which can be converted into a module using these
> changes.
>
> Patch #1 sets up the subsys[] array so its contents can be dynamic as
> modules appear and (eventually) disappear. Iterations over the array are
> modified to handle when subsystems are absent, and the dynamic section
> of the array is protected by cgroup_mutex.
>
> Patch #2 implements an interface for modules to load subsystems, called
> cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
> pointer in struct cgroup_subsys.
>
> Patch #3 adds a mechanism for unloading modular subsystems, which
> includes a more advanced rework of the rudimentary reference counting
> introduced in patch 2.
>
> Patch #4 modifies the net_cls subsystem, which already had some module
> declarations, to be configurable as a module, which also serves as a
> simple proof-of-concept.
>
> Part of implementing patches 2 and 4 involved updating css pointers in
> each css_set when the module appears or leaves. In doing this, it was
> discovered that css_sets always remain linked to the dummy cgroup,
> regardless of whether or not any subsystems are actually bound to it
> (i.e., not mounted on an actual hierarchy). The subsystem loading and
> unloading code therefore should keep in mind the special cases where the
> added subsystem is the only one in the dummy cgroup (and therefore all
> css_sets need to be linked back into it) and where the removed subsys
> was the only one in the dummy cgroup (and therefore all css_sets should
> be unlinked from it) - however, as all css_sets always stay attached to
> the dummy cgroup anyway, these cases are ignored. Any fix that addresses
> this issue should also make sure these cases are addressed in the
> subsystem loading and unloading code.
>
> -- bblum
>
> ---
> Documentation/cgroups/cgroups.txt | 9
> include/linux/cgroup.h | 18 +
> kernel/cgroup.c | 388 +++++++++++++++++++++++++++++++++-----
> net/sched/Kconfig | 5
> net/sched/cls_cgroup.c | 36 ++-
> 5 files changed, 400 insertions(+), 56 deletions(-)
>
>


Attachments:
(No filename) (2.55 kB)
cgroups-subsys-module-interface.patch (7.14 kB)
Download all attachments

2009-12-31 05:14:49

by Ben Blum

[permalink] [raw]
Subject: [PATCH v4 3/4] cgroups: subsystem module unloading

Provides support for unloading modular subsystems.

From: Ben Blum <[email protected]>

This patch adds a new function cgroup_unload_subsys which is to be used for
removing a loaded subsystem during module deletion. Reference counting of the
subsystems' modules is moved from once (at load time) to once per attached
hierarchy (in parse_cgroupfs_options and rebind_subsystems) (i.e., 0 or 1).

Signed-off-by: Ben Blum <[email protected]>
Acked-by: Li Zefan <[email protected]>
---

Documentation/cgroups/cgroups.txt | 5 +
include/linux/cgroup.h | 4 +
kernel/cgroup.c | 164 +++++++++++++++++++++++++++++++------
3 files changed, 145 insertions(+), 28 deletions(-)


diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 6ffcf81..7527bac 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -489,8 +489,9 @@ Each subsystem should:
- define a cgroup_subsys object called <name>_subsys

If a subsystem can be compiled as a module, it should also have in its
-module initcall a call to cgroup_load_subsys(&its_subsys_struct). It
-should also set its_subsys.module = THIS_MODULE in its .c file.
+module initcall a call to cgroup_load_subsys(), and in its exitcall a
+call to cgroup_unload_subsys(). It should also set its_subsys.module =
+THIS_MODULE in its .c file.

Each subsystem may export the following methods. The only mandatory
methods are create/destroy. Any others that are null are presumed to
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9461aed..9be4c22 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -37,6 +37,7 @@ extern void cgroup_exit(struct task_struct *p, int run_callbacks);
extern int cgroupstats_build(struct cgroupstats *stats,
struct dentry *dentry);
extern int cgroup_load_subsys(struct cgroup_subsys *ss);
+extern void cgroup_unload_subsys(struct cgroup_subsys *ss);

extern const struct file_operations proc_cgroup_operations;

@@ -264,7 +265,8 @@ struct css_set {
/*
* Set of subsystem states, one for each subsystem. This array
* is immutable after creation apart from the init_css_set
- * during subsystem registration (at boot time).
+ * during subsystem registration (at boot time) and modular subsystem
+ * loading/unloading.
*/
struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index d7ca4cf..cc2e1f6 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -878,7 +878,9 @@ void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css)
}

/*
- * Call with cgroup_mutex held.
+ * Call with cgroup_mutex held. Drops reference counts on modules, including
+ * any duplicate ones that parse_cgroupfs_options took. If this function
+ * returns an error, no reference counts are touched.
*/
static int rebind_subsystems(struct cgroupfs_root *root,
unsigned long final_bits)
@@ -934,6 +936,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
if (ss->bind)
ss->bind(ss, cgrp);
mutex_unlock(&ss->hierarchy_mutex);
+ /* refcount was already taken, and we're keeping it */
} else if (bit & removed_bits) {
/* We're removing this subsystem */
BUG_ON(ss == NULL);
@@ -947,10 +950,18 @@ static int rebind_subsystems(struct cgroupfs_root *root,
subsys[i]->root = &rootnode;
list_move(&ss->sibling, &rootnode.subsys_list);
mutex_unlock(&ss->hierarchy_mutex);
+ /* subsystem is now free - drop reference on module */
+ module_put(ss->module);
} else if (bit & final_bits) {
/* Subsystem state should already exist */
BUG_ON(ss == NULL);
BUG_ON(!cgrp->subsys[i]);
+ /*
+ * a refcount was taken, but we already had one, so
+ * drop the extra reference.
+ */
+ module_put(ss->module);
+ BUG_ON(ss->module && !module_refcount(ss->module));
} else {
/* Subsystem state shouldn't exist */
BUG_ON(cgrp->subsys[i]);
@@ -994,13 +1005,16 @@ struct cgroup_sb_opts {

/*
* Convert a hierarchy specifier into a bitmask of subsystems and flags. Call
- * with cgroup_mutex held to protect the subsys[] array.
+ * with cgroup_mutex held to protect the subsys[] array. This function takes
+ * refcounts on subsystems to be used, unless it returns error, in which case
+ * no refcounts are taken.
*/
-static int parse_cgroupfs_options(char *data,
- struct cgroup_sb_opts *opts)
+static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
{
char *token, *o = data ?: "all";
unsigned long mask = (unsigned long)-1;
+ int i;
+ bool module_pin_failed = false;

BUG_ON(!mutex_is_locked(&cgroup_mutex));

@@ -1015,7 +1029,6 @@ static int parse_cgroupfs_options(char *data,
return -EINVAL;
if (!strcmp(token, "all")) {
/* Add all non-disabled subsystems */
- int i;
opts->subsys_bits = 0;
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
@@ -1038,7 +1051,6 @@ static int parse_cgroupfs_options(char *data,
if (!opts->release_agent)
return -ENOMEM;
} else if (!strncmp(token, "name=", 5)) {
- int i;
const char *name = token + 5;
/* Can't specify an empty name */
if (!strlen(name))
@@ -1062,7 +1074,6 @@ static int parse_cgroupfs_options(char *data,
return -ENOMEM;
} else {
struct cgroup_subsys *ss;
- int i;
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
ss = subsys[i];
if (ss == NULL)
@@ -1101,9 +1112,54 @@ static int parse_cgroupfs_options(char *data,
if (!opts->subsys_bits && !opts->name)
return -EINVAL;

+ /*
+ * Grab references on all the modules we'll need, so the subsystems
+ * don't dance around before rebind_subsystems attaches them. This may
+ * take duplicate reference counts on a subsystem that's already used,
+ * but rebind_subsystems handles this case.
+ */
+ for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
+ unsigned long bit = 1UL << i;
+
+ if (!(bit & opts->subsys_bits))
+ continue;
+ if (!try_module_get(subsys[i]->module)) {
+ module_pin_failed = true;
+ break;
+ }
+ }
+ if (module_pin_failed) {
+ /*
+ * oops, one of the modules was going away. this means that we
+ * raced with a module_delete call, and to the user this is
+ * essentially a "subsystem doesn't exist" case.
+ */
+ for (i--; i >= CGROUP_BUILTIN_SUBSYS_COUNT; i--) {
+ /* drop refcounts only on the ones we took */
+ unsigned long bit = 1UL << i;
+
+ if (!(bit & opts->subsys_bits))
+ continue;
+ module_put(subsys[i]->module);
+ }
+ return -ENOENT;
+ }
+
return 0;
}

+static void drop_parsed_module_refcounts(unsigned long subsys_bits)
+{
+ int i;
+ for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
+ unsigned long bit = 1UL << i;
+
+ if (!(bit & subsys_bits))
+ continue;
+ module_put(subsys[i]->module);
+ }
+}
+
static int cgroup_remount(struct super_block *sb, int *flags, char *data)
{
int ret = 0;
@@ -1120,21 +1176,19 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
if (ret)
goto out_unlock;

- /* Don't allow flags to change at remount */
- if (opts.flags != root->flags) {
- ret = -EINVAL;
- goto out_unlock;
- }
-
- /* Don't allow name to change at remount */
- if (opts.name && strcmp(opts.name, root->name)) {
+ /* Don't allow flags or name to change at remount */
+ if (opts.flags != root->flags ||
+ (opts.name && strcmp(opts.name, root->name))) {
ret = -EINVAL;
+ drop_parsed_module_refcounts(opts.subsys_bits);
goto out_unlock;
}

ret = rebind_subsystems(root, opts.subsys_bits);
- if (ret)
+ if (ret) {
+ drop_parsed_module_refcounts(opts.subsys_bits);
goto out_unlock;
+ }

/* (re)populate subsystem files */
cgroup_populate_dir(cgrp);
@@ -1333,7 +1387,7 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
new_root = cgroup_root_from_opts(&opts);
if (IS_ERR(new_root)) {
ret = PTR_ERR(new_root);
- goto out_err;
+ goto drop_modules;
}
opts.new_root = new_root;

@@ -1342,7 +1396,7 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
if (IS_ERR(sb)) {
ret = PTR_ERR(sb);
cgroup_drop_root(opts.new_root);
- goto out_err;
+ goto drop_modules;
}

root = sb->s_fs_info;
@@ -1398,6 +1452,11 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
free_cg_links(&tmp_cg_links);
goto drop_new_super;
}
+ /*
+ * There must be no failure case after here, since rebinding
+ * takes care of subsystems' refcounts, which are explicitly
+ * dropped in the failure exit path.
+ */

/* EBUSY should be the only error here */
BUG_ON(ret);
@@ -1436,6 +1495,8 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
* any) is not needed
*/
cgroup_drop_root(opts.new_root);
+ /* no subsys rebinding, so refcounts don't change */
+ drop_parsed_module_refcounts(opts.subsys_bits);
}

simple_set_mnt(mnt, sb);
@@ -1445,6 +1506,8 @@ static int cgroup_get_sb(struct file_system_type *fs_type,

drop_new_super:
deactivate_locked_super(sb);
+ drop_modules:
+ drop_parsed_module_refcounts(opts.subsys_bits);
out_err:
kfree(opts.release_agent);
kfree(opts.name);
@@ -3366,13 +3429,6 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
lockdep_set_class(&ss->hierarchy_mutex, &ss->subsys_key);
ss->active = 1;

- /*
- * pin the subsystem's module so it doesn't go away. this shouldn't
- * fail, since the module's initcall calls us.
- * TODO: with module unloading, move this elsewhere
- */
- BUG_ON(!try_module_get(ss->module));
-
/* success! */
mutex_unlock(&cgroup_mutex);
return 0;
@@ -3380,6 +3436,64 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
EXPORT_SYMBOL_GPL(cgroup_load_subsys);

/**
+ * cgroup_unload_subsys: unload a modular subsystem
+ * @ss: the subsystem to unload
+ *
+ * This function should be called in a modular subsystem's exitcall. When this
+ * function is invoked, the refcount on the subsystem's module will be 0, so
+ * the subsystem will not be attached to any hierarchy.
+ */
+void cgroup_unload_subsys(struct cgroup_subsys *ss)
+{
+ struct cg_cgroup_link *link;
+ struct hlist_head *hhead;
+
+ BUG_ON(ss->module == NULL);
+
+ /*
+ * we shouldn't be called if the subsystem is in use, and the use of
+ * try_module_get in parse_cgroupfs_options should ensure that it
+ * doesn't start being used while we're killing it off.
+ */
+ BUG_ON(ss->root != &rootnode);
+
+ mutex_lock(&cgroup_mutex);
+ /* deassign the subsys_id */
+ BUG_ON(ss->subsys_id < CGROUP_BUILTIN_SUBSYS_COUNT);
+ subsys[ss->subsys_id] = NULL;
+
+ /* remove subsystem from rootnode's list of subsystems */
+ list_del(&ss->sibling);
+
+ /*
+ * disentangle the css from all css_sets attached to the dummytop. as
+ * in loading, we need to pay our respects to the hashtable gods.
+ */
+ write_lock(&css_set_lock);
+ list_for_each_entry(link, &dummytop->css_sets, cgrp_link_list) {
+ struct css_set *cg = link->cg;
+
+ hlist_del(&cg->hlist);
+ BUG_ON(!cg->subsys[ss->subsys_id]);
+ cg->subsys[ss->subsys_id] = NULL;
+ hhead = css_set_hash(cg->subsys);
+ hlist_add_head(&cg->hlist, hhead);
+ }
+ write_unlock(&css_set_lock);
+
+ /*
+ * remove subsystem's css from the dummytop and free it - need to free
+ * before marking as null because ss->destroy needs the cgrp->subsys
+ * pointer to find their state.
+ */
+ ss->destroy(ss, dummytop);
+ dummytop->subsys[ss->subsys_id] = NULL;
+
+ mutex_unlock(&cgroup_mutex);
+}
+EXPORT_SYMBOL_GPL(cgroup_unload_subsys);
+
+/**
* cgroup_init_early - cgroup initialization at system boot
*
* Initialize cgroups at system boot, and initialize any


Attachments:
cgroups-subsys-unloading.patch (11.69 kB)

2009-12-31 05:15:27

by Ben Blum

[permalink] [raw]
Subject: [PATCH v4 4/4] cgroups: net_cls as module

Allows the net_cls cgroup subsystem to be compiled as a module

From: Ben Blum <[email protected]>

This patch modifies net/sched/cls_cgroup.c to allow the net_cls subsystem to
be optionally compiled as a module instead of builtin. The cgroup_subsys
struct is moved around a bit to allow the subsys_id to be either declared as a
compile-time constant by the cgroup_subsys.h include in cgroup.h, or, if it's
a module, initialized within the struct by cgroup_load_subsys.

Signed-off-by: Ben Blum <[email protected]>
Acked-by: Li Zefan <[email protected]>
---

net/sched/Kconfig | 5 ++++-
net/sched/cls_cgroup.c | 36 +++++++++++++++++++++++++++---------
2 files changed, 31 insertions(+), 10 deletions(-)


diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 929218a..08ff9bc 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -328,13 +328,16 @@ config NET_CLS_FLOW
module will be called cls_flow.

config NET_CLS_CGROUP
- bool "Control Group Classifier"
+ tristate "Control Group Classifier"
select NET_CLS
depends on CGROUPS
---help---
Say Y here if you want to classify packets based on the control
cgroup of their process.

+ To compile this code as a module, choose M here: the
+ module will be called cls_cgroup.
+
config NET_EMATCH
bool "Extended Matches"
select NET_CLS
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index e4877ca..7f27d2c 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -24,6 +24,25 @@ struct cgroup_cls_state
u32 classid;
};

+static struct cgroup_subsys_state *cgrp_create(struct cgroup_subsys *ss,
+ struct cgroup *cgrp);
+static void cgrp_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp);
+static int cgrp_populate(struct cgroup_subsys *ss, struct cgroup *cgrp);
+
+struct cgroup_subsys net_cls_subsys = {
+ .name = "net_cls",
+ .create = cgrp_create,
+ .destroy = cgrp_destroy,
+ .populate = cgrp_populate,
+#ifdef CONFIG_NET_CLS_CGROUP
+ .subsys_id = net_cls_subsys_id,
+#else
+#define net_cls_subsys_id net_cls_subsys.subsys_id
+#endif
+ .module = THIS_MODULE,
+};
+
+
static inline struct cgroup_cls_state *cgrp_cls_state(struct cgroup *cgrp)
{
return container_of(cgroup_subsys_state(cgrp, net_cls_subsys_id),
@@ -79,14 +98,6 @@ static int cgrp_populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
return cgroup_add_files(cgrp, ss, ss_files, ARRAY_SIZE(ss_files));
}

-struct cgroup_subsys net_cls_subsys = {
- .name = "net_cls",
- .create = cgrp_create,
- .destroy = cgrp_destroy,
- .populate = cgrp_populate,
- .subsys_id = net_cls_subsys_id,
-};
-
struct cls_cgroup_head
{
u32 handle;
@@ -277,12 +288,19 @@ static struct tcf_proto_ops cls_cgroup_ops __read_mostly = {

static int __init init_cgroup_cls(void)
{
- return register_tcf_proto_ops(&cls_cgroup_ops);
+ int ret = register_tcf_proto_ops(&cls_cgroup_ops);
+ if (ret)
+ return ret;
+ ret = cgroup_load_subsys(&net_cls_subsys);
+ if (ret)
+ unregister_tcf_proto_ops(&cls_cgroup_ops);
+ return ret;
}

static void __exit exit_cgroup_cls(void)
{
unregister_tcf_proto_ops(&cls_cgroup_ops);
+ cgroup_unload_subsys(&net_cls_subsys);
}

module_init(init_cgroup_cls);


Attachments:
cgroups-net_cls-as-module.patch (3.37 kB)

2010-01-07 00:04:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems

On Thu, 31 Dec 2009 00:10:50 -0500
Ben Blum <[email protected]> wrote:

> This patch series implements support for building, loading, and
> unloading subsystems as modules, both within and outside the kernel
> source tree. It provides an interface cgroup_load_subsys() and
> cgroup_unload_subsys() which modular subsystems can use to register and
> depart during runtime. The net_cls classifier subsystem serves as the
> example for a subsystem which can be converted into a module using these
> changes.

What is the value in this? What are the usage scenarios? Why does the
benefit of this change exceed the cost/risk/etc of merging it?

2010-01-07 01:26:32

by Ben Blum

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems

On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> On Thu, 31 Dec 2009 00:10:50 -0500
> Ben Blum <[email protected]> wrote:
>
> > This patch series implements support for building, loading, and
> > unloading subsystems as modules, both within and outside the kernel
> > source tree. It provides an interface cgroup_load_subsys() and
> > cgroup_unload_subsys() which modular subsystems can use to register and
> > depart during runtime. The net_cls classifier subsystem serves as the
> > example for a subsystem which can be converted into a module using these
> > changes.
>
> What is the value in this? What are the usage scenarios? Why does the
> benefit of this change exceed the cost/risk/etc of merging it?

As discussed in the first posting of these patches, this provides the
ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
would have already been a module except for a lack of support from
cgroups, and the change also allows other module-loadable classifiers
to add subsystems of their own.

2010-01-07 03:10:49

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems

On Wed, 6 Jan 2010 20:26:06 -0500
Ben Blum <[email protected]> wrote:

> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> > On Thu, 31 Dec 2009 00:10:50 -0500
> > Ben Blum <[email protected]> wrote:
> >
> > > This patch series implements support for building, loading, and
> > > unloading subsystems as modules, both within and outside the kernel
> > > source tree. It provides an interface cgroup_load_subsys() and
> > > cgroup_unload_subsys() which modular subsystems can use to register and
> > > depart during runtime. The net_cls classifier subsystem serves as the
> > > example for a subsystem which can be converted into a module using these
> > > changes.
> >
> > What is the value in this? What are the usage scenarios? Why does the
> > benefit of this change exceed the cost/risk/etc of merging it?
>
> As discussed in the first posting of these patches, this provides the
> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> would have already been a module except for a lack of support from
> cgroups, and the change also allows other module-loadable classifiers
> to add subsystems of their own.

Hmm, do you have your own module in plan ?

Thanks,
-Kame

2010-01-07 06:42:26

by Li Zefan

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems

KAMEZAWA Hiroyuki wrote:
> On Wed, 6 Jan 2010 20:26:06 -0500
> Ben Blum <[email protected]> wrote:
>
>> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
>>> On Thu, 31 Dec 2009 00:10:50 -0500
>>> Ben Blum <[email protected]> wrote:
>>>
>>>> This patch series implements support for building, loading, and
>>>> unloading subsystems as modules, both within and outside the kernel
>>>> source tree. It provides an interface cgroup_load_subsys() and
>>>> cgroup_unload_subsys() which modular subsystems can use to register and
>>>> depart during runtime. The net_cls classifier subsystem serves as the
>>>> example for a subsystem which can be converted into a module using these
>>>> changes.
>>> What is the value in this? What are the usage scenarios? Why does the
>>> benefit of this change exceed the cost/risk/etc of merging it?
>> As discussed in the first posting of these patches, this provides the
>> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
>> would have already been a module except for a lack of support from
>> cgroups, and the change also allows other module-loadable classifiers
>> to add subsystems of their own.
>
> Hmm, do you have your own module in plan ?
>

Maybe the new blkio_cgroup can also be made module-able.

2010-01-07 07:19:43

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems

On Thu, 07 Jan 2010 14:42:19 +0800
Li Zefan <[email protected]> wrote:

> KAMEZAWA Hiroyuki wrote:
> > On Wed, 6 Jan 2010 20:26:06 -0500
> > Ben Blum <[email protected]> wrote:
> >
> >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> >>> On Thu, 31 Dec 2009 00:10:50 -0500
> >>> Ben Blum <[email protected]> wrote:
> >>>
> >>>> This patch series implements support for building, loading, and
> >>>> unloading subsystems as modules, both within and outside the kernel
> >>>> source tree. It provides an interface cgroup_load_subsys() and
> >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> >>>> depart during runtime. The net_cls classifier subsystem serves as the
> >>>> example for a subsystem which can be converted into a module using these
> >>>> changes.
> >>> What is the value in this? What are the usage scenarios? Why does the
> >>> benefit of this change exceed the cost/risk/etc of merging it?
> >> As discussed in the first posting of these patches, this provides the
> >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> >> would have already been a module except for a lack of support from
> >> cgroups, and the change also allows other module-loadable classifiers
> >> to add subsystems of their own.
> >
> > Hmm, do you have your own module in plan ?
> >
>
> Maybe the new blkio_cgroup can also be made module-able.
>

Hmm, I read the patch slightly. I'm not enough expert to review this patch..

I requst following as TODO.
(No objection to the direction/patch.)

1. Add documentation about load/unlod module.
It seems module unloading will not succuess while subsystem is mounted.
Right ?

2. Making this to be reasonable value.
#define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
I can't find why.

3. show whehter a subsys is a loadable module or not via /proc/cgroups

4. how to test ? load/unload NET_CLS is enough ?

Last one is question.

5. Is following path is safe ?

find_css_set() {
....
read_lock(&css_set_lock);
get template including pointer
read_unlock(&css_set_lock);

use template to build new css_set.


Thanks,
-Kame

2010-01-07 07:48:43

by Ben Blum

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems

On Thu, Jan 07, 2010 at 04:16:27PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 07 Jan 2010 14:42:19 +0800
> Li Zefan <[email protected]> wrote:
>
> > KAMEZAWA Hiroyuki wrote:
> > > On Wed, 6 Jan 2010 20:26:06 -0500
> > > Ben Blum <[email protected]> wrote:
> > >
> > >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> > >>> On Thu, 31 Dec 2009 00:10:50 -0500
> > >>> Ben Blum <[email protected]> wrote:
> > >>>
> > >>>> This patch series implements support for building, loading, and
> > >>>> unloading subsystems as modules, both within and outside the kernel
> > >>>> source tree. It provides an interface cgroup_load_subsys() and
> > >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> > >>>> depart during runtime. The net_cls classifier subsystem serves as the
> > >>>> example for a subsystem which can be converted into a module using these
> > >>>> changes.
> > >>> What is the value in this? What are the usage scenarios? Why does the
> > >>> benefit of this change exceed the cost/risk/etc of merging it?
> > >> As discussed in the first posting of these patches, this provides the
> > >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> > >> would have already been a module except for a lack of support from
> > >> cgroups, and the change also allows other module-loadable classifiers
> > >> to add subsystems of their own.
> > >
> > > Hmm, do you have your own module in plan ?
> > >
> >
> > Maybe the new blkio_cgroup can also be made module-able.
> >
>
> Hmm, I read the patch slightly. I'm not enough expert to review this patch..
>
> I requst following as TODO.
> (No objection to the direction/patch.)
>
> 1. Add documentation about load/unlod module.

could perhaps use more, i suppose.

> It seems module unloading will not succuess while subsystem is mounted.
> Right ?

yes, when you mount it, it takes a reference on the module, so you get
"module is in use".

> 2. Making this to be reasonable value.
> #define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
> I can't find why.

"We limit to this many since cgroupfs_root has subsys_bits to keep track
of all of them." should it be less, perhaps? the memory footprint is not
great, it is true, but implementing dynamically sized subsystem tracking
data structures requires much cleverer code in many places.

> 3. show whehter a subsys is a loadable module or not via /proc/cgroups

with just "y" or "n"? possible, and probably easy. do note that since
they are sorted by subsys_id, all the ones above a certain line will be
"n" and all below will be "y".

> 4. how to test ? load/unload NET_CLS is enough ?

load, mount, remount, unmount, mount with different combinations, unload
was the general approach I took, plus using gdb to examine state.

>
> Last one is question.
>
> 5. Is following path is safe ?
>
> find_css_set() {
> ....
> read_lock(&css_set_lock);
> get template including pointer
> read_unlock(&css_set_lock);
>
> use template to build new css_set.

should be, because that code is dealing with a cgrp's/css's specific
->subsys array, which state is protected by refcounts held by the
already mounted hierarchy, and the other entries in the array are not
cared about by the particular css in question.

>
>
> Thanks,
> -Kame
>
>
>

2010-01-07 07:54:33

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems

On Thu, 7 Jan 2010 02:48:12 -0500
Ben Blum <[email protected]> wrote:

> > 2. Making this to be reasonable value.
> > #define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
> > I can't find why.
>
> "We limit to this many since cgroupfs_root has subsys_bits to keep track
> of all of them." should it be less, perhaps?

It's ok if it's clear that
"this decistion is done by implementation choice, not by cgroup's nature"

> the memory footprint is not
> great, it is true, but implementing dynamically sized subsystem tracking
> data structures requires much cleverer code in many places.
>
yes. I don't request that.

> > 3. show whehter a subsys is a loadable module or not via /proc/cgroups
>
> with just "y" or "n"? possible, and probably easy. do note that since
> they are sorted by subsys_id, all the ones above a certain line will be
> "n" and all below will be "y".
>
yes.

#subsys_name hierarchy num_cgroups enabled module
cpuset 0 1 1 0

and 0/1 to show y/n ? (but this cause interface incompatibility...)


Thanks,
-Kame

2010-01-07 08:06:05

by Ben Blum

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems

On Thu, Jan 07, 2010 at 04:51:17PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 7 Jan 2010 02:48:12 -0500
> Ben Blum <[email protected]> wrote:
>
> > > 2. Making this to be reasonable value.
> > > #define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
> > > I can't find why.
> >
> > "We limit to this many since cgroupfs_root has subsys_bits to keep track
> > of all of them." should it be less, perhaps?
>
> It's ok if it's clear that
> "this decistion is done by implementation choice, not by cgroup's nature"

mhm, well, it is the upper limit due to nature, but why it and not a
smaller number is by choice.

>
> > the memory footprint is not
> > great, it is true, but implementing dynamically sized subsystem tracking
> > data structures requires much cleverer code in many places.
> >
> yes. I don't request that.

it might be possible to have a config option as CGROUP_EXTRA_SUBSYSTEMS
(with max being 64 or 32) which would add slots for subsystems outside
of the kernel tree, to avoid using up a lot of blank slots in typical
use cases. not entirely sure how to implement it in the scope of the
configuration world, just speculation.

> > > 3. show whehter a subsys is a loadable module or not via /proc/cgroups
> >
> > with just "y" or "n"? possible, and probably easy. do note that since
> > they are sorted by subsys_id, all the ones above a certain line will be
> > "n" and all below will be "y".
> >
> yes.
>
> #subsys_name hierarchy num_cgroups enabled module
> cpuset 0 1 1 0
>
> and 0/1 to show y/n ? (but this cause interface incompatibility...)

well, format should be agreed upon. 1/0 would be consistent with the
rest of the output.

>
>
> Thanks,
> -Kame
>
>
>

2010-01-07 08:14:58

by Ben Blum

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems

On Thu, Jan 07, 2010 at 02:42:19PM +0800, Li Zefan wrote:
> KAMEZAWA Hiroyuki wrote:
> > On Wed, 6 Jan 2010 20:26:06 -0500
> > Ben Blum <[email protected]> wrote:
> >
> >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> >>> On Thu, 31 Dec 2009 00:10:50 -0500
> >>> Ben Blum <[email protected]> wrote:
> >>>
> >>>> This patch series implements support for building, loading, and
> >>>> unloading subsystems as modules, both within and outside the kernel
> >>>> source tree. It provides an interface cgroup_load_subsys() and
> >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> >>>> depart during runtime. The net_cls classifier subsystem serves as the
> >>>> example for a subsystem which can be converted into a module using these
> >>>> changes.
> >>> What is the value in this? What are the usage scenarios? Why does the
> >>> benefit of this change exceed the cost/risk/etc of merging it?
> >> As discussed in the first posting of these patches, this provides the
> >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> >> would have already been a module except for a lack of support from
> >> cgroups, and the change also allows other module-loadable classifiers
> >> to add subsystems of their own.
> >
> > Hmm, do you have your own module in plan ?
> >
>
> Maybe the new blkio_cgroup can also be made module-able.

certainly looks promising. one thing I note while looking over it is
that it wants .use_id = 1, and the load_subsys interface neglects to
make a call to init_idr. adding calls to cgroup_init_idr has a few
possible complications... what use does blkio have for use_id?

2010-01-07 08:23:21

by Ben Blum

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems

On Thu, Jan 07, 2010 at 03:14:32AM -0500, Ben Blum wrote:
> On Thu, Jan 07, 2010 at 02:42:19PM +0800, Li Zefan wrote:
> > KAMEZAWA Hiroyuki wrote:
> > > On Wed, 6 Jan 2010 20:26:06 -0500
> > > Ben Blum <[email protected]> wrote:
> > >
> > >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> > >>> On Thu, 31 Dec 2009 00:10:50 -0500
> > >>> Ben Blum <[email protected]> wrote:
> > >>>
> > >>>> This patch series implements support for building, loading, and
> > >>>> unloading subsystems as modules, both within and outside the kernel
> > >>>> source tree. It provides an interface cgroup_load_subsys() and
> > >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> > >>>> depart during runtime. The net_cls classifier subsystem serves as the
> > >>>> example for a subsystem which can be converted into a module using these
> > >>>> changes.
> > >>> What is the value in this? What are the usage scenarios? Why does the
> > >>> benefit of this change exceed the cost/risk/etc of merging it?
> > >> As discussed in the first posting of these patches, this provides the
> > >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> > >> would have already been a module except for a lack of support from
> > >> cgroups, and the change also allows other module-loadable classifiers
> > >> to add subsystems of their own.
> > >
> > > Hmm, do you have your own module in plan ?
> > >
> >
> > Maybe the new blkio_cgroup can also be made module-able.
>
> certainly looks promising. one thing I note while looking over it is
> that it wants .use_id = 1, and the load_subsys interface neglects to
> make a call to init_idr. adding calls to cgroup_init_idr has a few
> possible complications... what use does blkio have for use_id?

oh, oops, while tracing the calls that init_idr does i ended up looking
at the wrong functions and made it out to be harder than it was (in
several places, even). still it would need to be included in load_subsys
and the single error case handled appropriately.

2010-01-08 05:28:35

by Ben Blum

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] cgroups: support for module-loadable subsystems

On Thu, Jan 07, 2010 at 02:42:19PM +0800, Li Zefan wrote:
> KAMEZAWA Hiroyuki wrote:
> > On Wed, 6 Jan 2010 20:26:06 -0500
> > Ben Blum <[email protected]> wrote:
> >
> >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> >>> On Thu, 31 Dec 2009 00:10:50 -0500
> >>> Ben Blum <[email protected]> wrote:
> >>>
> >>>> This patch series implements support for building, loading, and
> >>>> unloading subsystems as modules, both within and outside the kernel
> >>>> source tree. It provides an interface cgroup_load_subsys() and
> >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> >>>> depart during runtime. The net_cls classifier subsystem serves as the
> >>>> example for a subsystem which can be converted into a module using these
> >>>> changes.
> >>> What is the value in this? What are the usage scenarios? Why does the
> >>> benefit of this change exceed the cost/risk/etc of merging it?
> >> As discussed in the first posting of these patches, this provides the
> >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> >> would have already been a module except for a lack of support from
> >> cgroups, and the change also allows other module-loadable classifiers
> >> to add subsystems of their own.
> >
> > Hmm, do you have your own module in plan ?
> >
>
> Maybe the new blkio_cgroup can also be made module-able.

Ok, the following two patches make this happen (or at least pretend to
well enough to fool me). The first one adds use_id initialization in
cgroup_load_subsys, and the second rearranges config options and some
code as appropriate in block/ and adds EXPORT_SYMBOLs in cgroup.c.

-- bblum

---
block/Kconfig | 2 -
block/Kconfig.iosched | 2 -
block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++-----------
block/blk-cgroup.h | 10 ++++++--
include/linux/iocontext.h | 2 -
kernel/cgroup.c | 31 +++++++++++++++++++++-----
6 files changed, 77 insertions(+), 23 deletions(-)

2010-01-08 05:29:37

by Ben Blum

[permalink] [raw]
Subject: [RFC] [PATCH 1/2] cgroups: modular subsystems support for use_id


On Fri, Jan 08, 2010 at 12:27:34AM -0500, Ben Blum wrote:
> On Thu, Jan 07, 2010 at 02:42:19PM +0800, Li Zefan wrote:
> > KAMEZAWA Hiroyuki wrote:
> > > On Wed, 6 Jan 2010 20:26:06 -0500
> > > Ben Blum <[email protected]> wrote:
> > >
> > >> On Wed, Jan 06, 2010 at 04:04:14PM -0800, Andrew Morton wrote:
> > >>> On Thu, 31 Dec 2009 00:10:50 -0500
> > >>> Ben Blum <[email protected]> wrote:
> > >>>
> > >>>> This patch series implements support for building, loading, and
> > >>>> unloading subsystems as modules, both within and outside the kernel
> > >>>> source tree. It provides an interface cgroup_load_subsys() and
> > >>>> cgroup_unload_subsys() which modular subsystems can use to register and
> > >>>> depart during runtime. The net_cls classifier subsystem serves as the
> > >>>> example for a subsystem which can be converted into a module using these
> > >>>> changes.
> > >>> What is the value in this? What are the usage scenarios? Why does the
> > >>> benefit of this change exceed the cost/risk/etc of merging it?
> > >> As discussed in the first posting of these patches, this provides the
> > >> ability for arbitrary subsystems to be used with cgroups.. cls_cgroup
> > >> would have already been a module except for a lack of support from
> > >> cgroups, and the change also allows other module-loadable classifiers
> > >> to add subsystems of their own.
> > >
> > > Hmm, do you have your own module in plan ?
> > >
> >
> > Maybe the new blkio_cgroup can also be made module-able.
>
> Ok, the following two patches make this happen (or at least pretend to
> well enough to fool me). The first one adds use_id initialization in
> cgroup_load_subsys, and the second rearranges config options and some
> code as appropriate in block/ and adds EXPORT_SYMBOLs in cgroup.c.
>
> -- bblum
>
> ---
> block/Kconfig | 2 -
> block/Kconfig.iosched | 2 -
> block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++-----------
> block/blk-cgroup.h | 10 ++++++--
> include/linux/iocontext.h | 2 -
> kernel/cgroup.c | 31 +++++++++++++++++++++-----
> 6 files changed, 77 insertions(+), 23 deletions(-)
>
>


Attachments:
(No filename) (2.14 kB)
cgroups-module-use_id-support.patch (2.38 kB)
Download all attachments

2010-01-08 05:31:12

by Ben Blum

[permalink] [raw]
Subject: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module

Convert blk-cgroup to be buildable as a module

From: Ben Blum <[email protected]>

This patch modifies the Block I/O cgroup subsystem to be able to be built as a
module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
enhanced to support the new module dependency.

Signed-off-by: Ben Blum <[email protected]>
---
block/Kconfig | 2 +-
block/Kconfig.iosched | 2 +-
block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++----------
block/blk-cgroup.h | 10 +++++++-
include/linux/iocontext.h | 2 +-
kernel/cgroup.c | 8 +++++++
6 files changed, 60 insertions(+), 17 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index e20fbde..62a5921 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -78,7 +78,7 @@ config BLK_DEV_INTEGRITY
Protection. If in doubt, say N.

config BLK_CGROUP
- bool
+ tristate
depends on CGROUPS
default n
---help---
diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index b71abfb..fc71cf0 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -23,6 +23,7 @@ config IOSCHED_DEADLINE

config IOSCHED_CFQ
tristate "CFQ I/O scheduler"
+ select BLK_CGROUP if CFQ_GROUP_IOSCHED
default y
---help---
The CFQ I/O scheduler tries to distribute bandwidth equally
@@ -35,7 +36,6 @@ config IOSCHED_CFQ
config CFQ_GROUP_IOSCHED
bool "CFQ Group Scheduling support"
depends on IOSCHED_CFQ && CGROUPS
- select BLK_CGROUP
default n
---help---
Enable group IO scheduling in CFQ.
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1fa2654..6c73380 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -23,6 +23,31 @@ static LIST_HEAD(blkio_list);
struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT };
EXPORT_SYMBOL_GPL(blkio_root_cgroup);

+static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
+ struct cgroup *);
+static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
+ struct task_struct *, bool);
+static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *,
+ struct cgroup *, struct task_struct *, bool);
+static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *);
+static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
+
+struct cgroup_subsys blkio_subsys = {
+ .name = "blkio",
+ .create = blkiocg_create,
+ .can_attach = blkiocg_can_attach,
+ .attach = blkiocg_attach,
+ .destroy = blkiocg_destroy,
+ .populate = blkiocg_populate,
+#ifdef CONFIG_BLK_CGROUP
+ /* note: blkio_subsys_id is otherwise defined in blk-cgroup.h */
+ .subsys_id = blkio_subsys_id,
+#endif
+ .use_id = 1,
+ .module = THIS_MODULE,
+};
+EXPORT_SYMBOL_GPL(blkio_subsys);
+
bool blkiocg_css_tryget(struct blkio_cgroup *blkcg)
{
if (!css_tryget(&blkcg->css))
@@ -267,7 +292,8 @@ remove_entry:
done:
free_css_id(&blkio_subsys, &blkcg->css);
rcu_read_unlock();
- kfree(blkcg);
+ if (blkcg != &blkio_root_cgroup)
+ kfree(blkcg);
}

static struct cgroup_subsys_state *
@@ -333,17 +359,6 @@ static void blkiocg_attach(struct cgroup_subsys *subsys, struct cgroup *cgroup,
task_unlock(tsk);
}

-struct cgroup_subsys blkio_subsys = {
- .name = "blkio",
- .create = blkiocg_create,
- .can_attach = blkiocg_can_attach,
- .attach = blkiocg_attach,
- .destroy = blkiocg_destroy,
- .populate = blkiocg_populate,
- .subsys_id = blkio_subsys_id,
- .use_id = 1,
-};
-
void blkio_policy_register(struct blkio_policy_type *blkiop)
{
spin_lock(&blkio_list_lock);
@@ -359,3 +374,17 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
spin_unlock(&blkio_list_lock);
}
EXPORT_SYMBOL_GPL(blkio_policy_unregister);
+
+static int __init init_cgroup_blkio(void)
+{
+ return cgroup_load_subsys(&blkio_subsys);
+}
+
+static void __exit exit_cgroup_blkio(void)
+{
+ cgroup_unload_subsys(&blkio_subsys);
+}
+
+module_init(init_cgroup_blkio);
+module_exit(exit_cgroup_blkio);
+MODULE_LICENSE("GPL");
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 4d316df..57648c6 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -15,7 +15,13 @@

#include <linux/cgroup.h>

-#ifdef CONFIG_BLK_CGROUP
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
+
+#ifndef CONFIG_BLK_CGROUP
+/* When blk-cgroup is a module, its subsys_id isn't a compile-time constant */
+extern struct cgroup_subsys blkio_subsys;
+#define blkio_subsys_id blkio_subsys.subsys_id
+#endif

struct blkio_cgroup {
struct cgroup_subsys_state css;
@@ -94,7 +100,7 @@ static inline void blkiocg_update_blkio_group_dequeue_stats(
struct blkio_group *blkg, unsigned long dequeue) {}
#endif

-#ifdef CONFIG_BLK_CGROUP
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
extern struct blkio_cgroup blkio_root_cgroup;
extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
index a632359..b9f109d 100644
--- a/include/linux/iocontext.h
+++ b/include/linux/iocontext.h
@@ -68,7 +68,7 @@ struct io_context {
unsigned short ioprio;
unsigned short ioprio_changed;

-#ifdef CONFIG_BLK_CGROUP
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
unsigned short cgroup_changed;
#endif

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5af59eb..391ff41 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -690,6 +690,7 @@ void cgroup_lock(void)
{
mutex_lock(&cgroup_mutex);
}
+EXPORT_SYMBOL_GPL(cgroup_lock);

/**
* cgroup_unlock - release lock on cgroup changes
@@ -700,6 +701,7 @@ void cgroup_unlock(void)
{
mutex_unlock(&cgroup_mutex);
}
+EXPORT_SYMBOL_GPL(cgroup_unlock);

/*
* A couple of forward declarations required, due to cyclic reference loop:
@@ -1762,6 +1764,7 @@ bool cgroup_lock_live_group(struct cgroup *cgrp)
}
return true;
}
+EXPORT_SYMBOL_GPL(cgroup_lock_live_group);

static int cgroup_release_agent_write(struct cgroup *cgrp, struct cftype *cft,
const char *buffer)
@@ -4034,6 +4037,7 @@ void __css_put(struct cgroup_subsys_state *css)
rcu_read_unlock();
WARN_ON_ONCE(val < 1);
}
+EXPORT_SYMBOL_GPL(__css_put);

/*
* Notify userspace when a cgroup is released, by running the
@@ -4149,6 +4153,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
return cssid->id;
return 0;
}
+EXPORT_SYMBOL_GPL(css_id);

unsigned short css_depth(struct cgroup_subsys_state *css)
{
@@ -4158,6 +4163,7 @@ unsigned short css_depth(struct cgroup_subsys_state *css)
return cssid->depth;
return 0;
}
+EXPORT_SYMBOL_GPL(css_depth);

bool css_is_ancestor(struct cgroup_subsys_state *child,
const struct cgroup_subsys_state *root)
@@ -4194,6 +4200,7 @@ void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css)
spin_unlock(&ss->id_lock);
call_rcu(&id->rcu_head, __free_css_id_cb);
}
+EXPORT_SYMBOL_GPL(free_css_id);

/*
* This is called by init or create(). Then, calls to this function are
@@ -4310,6 +4317,7 @@ struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id)

return rcu_dereference(cssid->css);
}
+EXPORT_SYMBOL_GPL(css_lookup);

/**
* css_get_next - lookup next cgroup under specified hierarchy.


Attachments:
cgroups-blkio-as-module.patch (7.44 kB)

2010-01-08 15:11:46

by Vivek Goyal

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module

On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> Convert blk-cgroup to be buildable as a module
>
> From: Ben Blum <[email protected]>
>
> This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> enhanced to support the new module dependency.
>

Hi Ben,

I will give this patch a try.

So from blk-cgroup perspective, the advantage of allowing it as module
will be that we can save some memory if we are not using the controller?


> Signed-off-by: Ben Blum <[email protected]>
> ---
> block/Kconfig | 2 +-
> block/Kconfig.iosched | 2 +-
> block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++----------
> block/blk-cgroup.h | 10 +++++++-
> include/linux/iocontext.h | 2 +-
> kernel/cgroup.c | 8 +++++++
> 6 files changed, 60 insertions(+), 17 deletions(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index e20fbde..62a5921 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -78,7 +78,7 @@ config BLK_DEV_INTEGRITY
> Protection. If in doubt, say N.
>
> config BLK_CGROUP
> - bool
> + tristate
> depends on CGROUPS
> default n
> ---help---
> diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
> index b71abfb..fc71cf0 100644
> --- a/block/Kconfig.iosched
> +++ b/block/Kconfig.iosched
> @@ -23,6 +23,7 @@ config IOSCHED_DEADLINE
>
> config IOSCHED_CFQ
> tristate "CFQ I/O scheduler"
> + select BLK_CGROUP if CFQ_GROUP_IOSCHED
> default y
> ---help---
> The CFQ I/O scheduler tries to distribute bandwidth equally
> @@ -35,7 +36,6 @@ config IOSCHED_CFQ
> config CFQ_GROUP_IOSCHED
> bool "CFQ Group Scheduling support"
> depends on IOSCHED_CFQ && CGROUPS
> - select BLK_CGROUP
> default n
> ---help---
> Enable group IO scheduling in CFQ.
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 1fa2654..6c73380 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -23,6 +23,31 @@ static LIST_HEAD(blkio_list);
> struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT };
> EXPORT_SYMBOL_GPL(blkio_root_cgroup);
>
> +static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
> + struct cgroup *);
> +static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
> + struct task_struct *, bool);
> +static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *,
> + struct cgroup *, struct task_struct *, bool);
> +static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *);
> +static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
> +
> +struct cgroup_subsys blkio_subsys = {
> + .name = "blkio",
> + .create = blkiocg_create,
> + .can_attach = blkiocg_can_attach,
> + .attach = blkiocg_attach,
> + .destroy = blkiocg_destroy,
> + .populate = blkiocg_populate,
> +#ifdef CONFIG_BLK_CGROUP
> + /* note: blkio_subsys_id is otherwise defined in blk-cgroup.h */
> + .subsys_id = blkio_subsys_id,
> +#endif
> + .use_id = 1,
> + .module = THIS_MODULE,
> +};
> +EXPORT_SYMBOL_GPL(blkio_subsys);

Why are you moving blkio_subsys declaration up in the file. That way now
you have to declare all the functions before you can instanciate
blkio_subsys?

> +
> bool blkiocg_css_tryget(struct blkio_cgroup *blkcg)
> {
> if (!css_tryget(&blkcg->css))
> @@ -267,7 +292,8 @@ remove_entry:
> done:
> free_css_id(&blkio_subsys, &blkcg->css);
> rcu_read_unlock();
> - kfree(blkcg);
> + if (blkcg != &blkio_root_cgroup)
> + kfree(blkcg);
> }
>
> static struct cgroup_subsys_state *
> @@ -333,17 +359,6 @@ static void blkiocg_attach(struct cgroup_subsys *subsys, struct cgroup *cgroup,
> task_unlock(tsk);
> }
>
> -struct cgroup_subsys blkio_subsys = {
> - .name = "blkio",
> - .create = blkiocg_create,
> - .can_attach = blkiocg_can_attach,
> - .attach = blkiocg_attach,
> - .destroy = blkiocg_destroy,
> - .populate = blkiocg_populate,
> - .subsys_id = blkio_subsys_id,
> - .use_id = 1,
> -};
> -
> void blkio_policy_register(struct blkio_policy_type *blkiop)
> {
> spin_lock(&blkio_list_lock);
> @@ -359,3 +374,17 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
> spin_unlock(&blkio_list_lock);
> }
> EXPORT_SYMBOL_GPL(blkio_policy_unregister);
> +
> +static int __init init_cgroup_blkio(void)
> +{
> + return cgroup_load_subsys(&blkio_subsys);
> +}
> +
> +static void __exit exit_cgroup_blkio(void)
> +{
> + cgroup_unload_subsys(&blkio_subsys);
> +}
> +
> +module_init(init_cgroup_blkio);
> +module_exit(exit_cgroup_blkio);
> +MODULE_LICENSE("GPL");
> diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
> index 4d316df..57648c6 100644
> --- a/block/blk-cgroup.h
> +++ b/block/blk-cgroup.h
> @@ -15,7 +15,13 @@
>
> #include <linux/cgroup.h>
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> +
> +#ifndef CONFIG_BLK_CGROUP
> +/* When blk-cgroup is a module, its subsys_id isn't a compile-time constant */
> +extern struct cgroup_subsys blkio_subsys;
> +#define blkio_subsys_id blkio_subsys.subsys_id
> +#endif
>

Could above if conditions be simplified to just check for BLK_CGROUP_MODULE.

#ifdef CONFIG_BLK_CGROUP_MODULE
extern struct cgroup_subsys blkio_subsys;
#define blkio_subsys_id blkio_subsys.subsys_id
#endif

Thanks
Vivek

> struct blkio_cgroup {
> struct cgroup_subsys_state css;
> @@ -94,7 +100,7 @@ static inline void blkiocg_update_blkio_group_dequeue_stats(
> struct blkio_group *blkg, unsigned long dequeue) {}
> #endif
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> extern struct blkio_cgroup blkio_root_cgroup;
> extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
> extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
> diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
> index a632359..b9f109d 100644
> --- a/include/linux/iocontext.h
> +++ b/include/linux/iocontext.h
> @@ -68,7 +68,7 @@ struct io_context {
> unsigned short ioprio;
> unsigned short ioprio_changed;
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> unsigned short cgroup_changed;
> #endif
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 5af59eb..391ff41 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -690,6 +690,7 @@ void cgroup_lock(void)
> {
> mutex_lock(&cgroup_mutex);
> }
> +EXPORT_SYMBOL_GPL(cgroup_lock);
>
> /**
> * cgroup_unlock - release lock on cgroup changes
> @@ -700,6 +701,7 @@ void cgroup_unlock(void)
> {
> mutex_unlock(&cgroup_mutex);
> }
> +EXPORT_SYMBOL_GPL(cgroup_unlock);
>
> /*
> * A couple of forward declarations required, due to cyclic reference loop:
> @@ -1762,6 +1764,7 @@ bool cgroup_lock_live_group(struct cgroup *cgrp)
> }
> return true;
> }
> +EXPORT_SYMBOL_GPL(cgroup_lock_live_group);
>
> static int cgroup_release_agent_write(struct cgroup *cgrp, struct cftype *cft,
> const char *buffer)
> @@ -4034,6 +4037,7 @@ void __css_put(struct cgroup_subsys_state *css)
> rcu_read_unlock();
> WARN_ON_ONCE(val < 1);
> }
> +EXPORT_SYMBOL_GPL(__css_put);
>
> /*
> * Notify userspace when a cgroup is released, by running the
> @@ -4149,6 +4153,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
> return cssid->id;
> return 0;
> }
> +EXPORT_SYMBOL_GPL(css_id);
>
> unsigned short css_depth(struct cgroup_subsys_state *css)
> {
> @@ -4158,6 +4163,7 @@ unsigned short css_depth(struct cgroup_subsys_state *css)
> return cssid->depth;
> return 0;
> }
> +EXPORT_SYMBOL_GPL(css_depth);
>
> bool css_is_ancestor(struct cgroup_subsys_state *child,
> const struct cgroup_subsys_state *root)
> @@ -4194,6 +4200,7 @@ void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css)
> spin_unlock(&ss->id_lock);
> call_rcu(&id->rcu_head, __free_css_id_cb);
> }
> +EXPORT_SYMBOL_GPL(free_css_id);
>
> /*
> * This is called by init or create(). Then, calls to this function are
> @@ -4310,6 +4317,7 @@ struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id)
>
> return rcu_dereference(cssid->css);
> }
> +EXPORT_SYMBOL_GPL(css_lookup);
>
> /**
> * css_get_next - lookup next cgroup under specified hierarchy.

2010-01-08 16:34:19

by Vivek Goyal

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module

On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> Convert blk-cgroup to be buildable as a module
>
> From: Ben Blum <[email protected]>
>
> This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> enhanced to support the new module dependency.
>
> Signed-off-by: Ben Blum <[email protected]>

Two quick observations with testing.

You need to EXPORT cgroup_path.

Second, after loading the module, I mounted the blkio controller. But creating
a cgroup directory crashed.

Vivek

BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffff8106bcb4>] cgroup_mkdir+0x17e/0x350
PGD 1a7fe0067 PUD 1a0f19067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 3
Pid: 3984, comm: mkdir Not tainted 2.6.33-rc3-cgroup-module #2 /ProLiant DL380 G5
RIP: 0010:[<ffffffff8106bcb4>] [<ffffffff8106bcb4>] cgroup_mkdir+0x17e/0x350
RSP: 0018:ffff8801a0f17e38 EFLAGS: 00010203
RAX: ffff8801a8cbde00 RBX: ffffffffa02cc8e0 RCX: ffff8801a8cbde00
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff8801a770c000 R08: 0000000000000040 R09: ffff8801a0f17d48
R10: ffff8801a0f17ec8 R11: ffffffff8113efb4 R12: 0000000000000001
R13: ffff8801a0f22000 R14: ffff8801948aa980 R15: ffff8801a0f22030
FS: 00007ff9ecf5a710(0000) GS:ffff8800282c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 00000001a257e000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mkdir (pid: 3984, threadinfo ffff8801a0f16000, task ffff8801a8e2c040)
Stack:
0000000000000003 000001ed948a8db0 ffff8801a0e1e800 0000000000000000
<0> ffff8801a7745940 ffff8801948a8db0 ffff8801948aa980 0000000000000000
<0> 00000000000001ed 00007fff661cb6eb 0000000000000000 ffffffff810d837e
Call Trace:
[<ffffffff810d837e>] ? vfs_mkdir+0xd5/0x164
[<ffffffff810da743>] ? sys_mkdirat+0x85/0xd1
[<ffffffff810773be>] ? audit_syscall_entry+0x1b9/0x1e4
[<ffffffff8100286b>] ? system_call_fastpath+0x16/0x1b
Code: 4c 24 18 e8 82 f3 ff ff 31 f6 48 3d 00 f0 ff ff 48 89 c1 76 20 85 c0 74 35 45 31 e4 e9 5a 01 00 00 48 8b 7c 24 18 48 63 d6 ff c6 <66> 8b 44 57 20 66 89 44 51 20 44 39 e6 7c e7 8b 51 08 49 63 c4
RIP [<ffffffff8106bcb4>] cgroup_mkdir+0x17e/0x350
RSP <ffff8801a0f17e38>
CR2: 0000000000000020
---[ end trace 09377bc5e5c2b563 ]---

> ---
> block/Kconfig | 2 +-
> block/Kconfig.iosched | 2 +-
> block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++----------
> block/blk-cgroup.h | 10 +++++++-
> include/linux/iocontext.h | 2 +-
> kernel/cgroup.c | 8 +++++++
> 6 files changed, 60 insertions(+), 17 deletions(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index e20fbde..62a5921 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -78,7 +78,7 @@ config BLK_DEV_INTEGRITY
> Protection. If in doubt, say N.
>
> config BLK_CGROUP
> - bool
> + tristate
> depends on CGROUPS
> default n
> ---help---
> diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
> index b71abfb..fc71cf0 100644
> --- a/block/Kconfig.iosched
> +++ b/block/Kconfig.iosched
> @@ -23,6 +23,7 @@ config IOSCHED_DEADLINE
>
> config IOSCHED_CFQ
> tristate "CFQ I/O scheduler"
> + select BLK_CGROUP if CFQ_GROUP_IOSCHED
> default y
> ---help---
> The CFQ I/O scheduler tries to distribute bandwidth equally
> @@ -35,7 +36,6 @@ config IOSCHED_CFQ
> config CFQ_GROUP_IOSCHED
> bool "CFQ Group Scheduling support"
> depends on IOSCHED_CFQ && CGROUPS
> - select BLK_CGROUP
> default n
> ---help---
> Enable group IO scheduling in CFQ.
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 1fa2654..6c73380 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -23,6 +23,31 @@ static LIST_HEAD(blkio_list);
> struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT };
> EXPORT_SYMBOL_GPL(blkio_root_cgroup);
>
> +static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
> + struct cgroup *);
> +static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
> + struct task_struct *, bool);
> +static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *,
> + struct cgroup *, struct task_struct *, bool);
> +static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *);
> +static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
> +
> +struct cgroup_subsys blkio_subsys = {
> + .name = "blkio",
> + .create = blkiocg_create,
> + .can_attach = blkiocg_can_attach,
> + .attach = blkiocg_attach,
> + .destroy = blkiocg_destroy,
> + .populate = blkiocg_populate,
> +#ifdef CONFIG_BLK_CGROUP
> + /* note: blkio_subsys_id is otherwise defined in blk-cgroup.h */
> + .subsys_id = blkio_subsys_id,
> +#endif
> + .use_id = 1,
> + .module = THIS_MODULE,
> +};
> +EXPORT_SYMBOL_GPL(blkio_subsys);
> +
> bool blkiocg_css_tryget(struct blkio_cgroup *blkcg)
> {
> if (!css_tryget(&blkcg->css))
> @@ -267,7 +292,8 @@ remove_entry:
> done:
> free_css_id(&blkio_subsys, &blkcg->css);
> rcu_read_unlock();
> - kfree(blkcg);
> + if (blkcg != &blkio_root_cgroup)
> + kfree(blkcg);
> }
>
> static struct cgroup_subsys_state *
> @@ -333,17 +359,6 @@ static void blkiocg_attach(struct cgroup_subsys *subsys, struct cgroup *cgroup,
> task_unlock(tsk);
> }
>
> -struct cgroup_subsys blkio_subsys = {
> - .name = "blkio",
> - .create = blkiocg_create,
> - .can_attach = blkiocg_can_attach,
> - .attach = blkiocg_attach,
> - .destroy = blkiocg_destroy,
> - .populate = blkiocg_populate,
> - .subsys_id = blkio_subsys_id,
> - .use_id = 1,
> -};
> -
> void blkio_policy_register(struct blkio_policy_type *blkiop)
> {
> spin_lock(&blkio_list_lock);
> @@ -359,3 +374,17 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
> spin_unlock(&blkio_list_lock);
> }
> EXPORT_SYMBOL_GPL(blkio_policy_unregister);
> +
> +static int __init init_cgroup_blkio(void)
> +{
> + return cgroup_load_subsys(&blkio_subsys);
> +}
> +
> +static void __exit exit_cgroup_blkio(void)
> +{
> + cgroup_unload_subsys(&blkio_subsys);
> +}
> +
> +module_init(init_cgroup_blkio);
> +module_exit(exit_cgroup_blkio);
> +MODULE_LICENSE("GPL");
> diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
> index 4d316df..57648c6 100644
> --- a/block/blk-cgroup.h
> +++ b/block/blk-cgroup.h
> @@ -15,7 +15,13 @@
>
> #include <linux/cgroup.h>
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> +
> +#ifndef CONFIG_BLK_CGROUP
> +/* When blk-cgroup is a module, its subsys_id isn't a compile-time constant */
> +extern struct cgroup_subsys blkio_subsys;
> +#define blkio_subsys_id blkio_subsys.subsys_id
> +#endif
>
> struct blkio_cgroup {
> struct cgroup_subsys_state css;
> @@ -94,7 +100,7 @@ static inline void blkiocg_update_blkio_group_dequeue_stats(
> struct blkio_group *blkg, unsigned long dequeue) {}
> #endif
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> extern struct blkio_cgroup blkio_root_cgroup;
> extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
> extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
> diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
> index a632359..b9f109d 100644
> --- a/include/linux/iocontext.h
> +++ b/include/linux/iocontext.h
> @@ -68,7 +68,7 @@ struct io_context {
> unsigned short ioprio;
> unsigned short ioprio_changed;
>
> -#ifdef CONFIG_BLK_CGROUP
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> unsigned short cgroup_changed;
> #endif
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 5af59eb..391ff41 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -690,6 +690,7 @@ void cgroup_lock(void)
> {
> mutex_lock(&cgroup_mutex);
> }
> +EXPORT_SYMBOL_GPL(cgroup_lock);
>
> /**
> * cgroup_unlock - release lock on cgroup changes
> @@ -700,6 +701,7 @@ void cgroup_unlock(void)
> {
> mutex_unlock(&cgroup_mutex);
> }
> +EXPORT_SYMBOL_GPL(cgroup_unlock);
>
> /*
> * A couple of forward declarations required, due to cyclic reference loop:
> @@ -1762,6 +1764,7 @@ bool cgroup_lock_live_group(struct cgroup *cgrp)
> }
> return true;
> }
> +EXPORT_SYMBOL_GPL(cgroup_lock_live_group);
>
> static int cgroup_release_agent_write(struct cgroup *cgrp, struct cftype *cft,
> const char *buffer)
> @@ -4034,6 +4037,7 @@ void __css_put(struct cgroup_subsys_state *css)
> rcu_read_unlock();
> WARN_ON_ONCE(val < 1);
> }
> +EXPORT_SYMBOL_GPL(__css_put);
>
> /*
> * Notify userspace when a cgroup is released, by running the
> @@ -4149,6 +4153,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
> return cssid->id;
> return 0;
> }
> +EXPORT_SYMBOL_GPL(css_id);
>
> unsigned short css_depth(struct cgroup_subsys_state *css)
> {
> @@ -4158,6 +4163,7 @@ unsigned short css_depth(struct cgroup_subsys_state *css)
> return cssid->depth;
> return 0;
> }
> +EXPORT_SYMBOL_GPL(css_depth);
>
> bool css_is_ancestor(struct cgroup_subsys_state *child,
> const struct cgroup_subsys_state *root)
> @@ -4194,6 +4200,7 @@ void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css)
> spin_unlock(&ss->id_lock);
> call_rcu(&id->rcu_head, __free_css_id_cb);
> }
> +EXPORT_SYMBOL_GPL(free_css_id);
>
> /*
> * This is called by init or create(). Then, calls to this function are
> @@ -4310,6 +4317,7 @@ struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id)
>
> return rcu_dereference(cssid->css);
> }
> +EXPORT_SYMBOL_GPL(css_lookup);
>
> /**
> * css_get_next - lookup next cgroup under specified hierarchy.

2010-01-12 00:24:23

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module

On Fri, 8 Jan 2010 10:10:38 -0500
Vivek Goyal <[email protected]> wrote:

> On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> > Convert blk-cgroup to be buildable as a module
> >
> > From: Ben Blum <[email protected]>
> >
> > This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> > module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> > options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> > enhanced to support the new module dependency.
> >
>
> Hi Ben,
>
> I will give this patch a try.
>
> So from blk-cgroup perspective, the advantage of allowing it as module
> will be that we can save some memory if we are not using the controller?
>
Is "moduled" blkio cgroup safe after page-tracking by page_cgroup is
introduced ?

Thanks,
-Kame

2010-01-13 00:35:01

by Ben Blum

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module

On Fri, Jan 08, 2010 at 11:33:52AM -0500, Vivek Goyal wrote:
> On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> > Convert blk-cgroup to be buildable as a module
> >
> > From: Ben Blum <[email protected]>
> >
> > This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> > module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> > options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> > enhanced to support the new module dependency.
> >
> > Signed-off-by: Ben Blum <[email protected]>
>
> Two quick observations with testing.
>
> You need to EXPORT cgroup_path.
>
> Second, after loading the module, I mounted the blkio controller. But creating
> a cgroup directory crashed.
>
> Vivek

argh, good catches on both of them. didn't test with DEBUG_CFQ_IOSCHED
(for cgroup_path) or with making a sub-cgroup (for the crash); shame on
me. turns out it crashed because I had init_idr before init_css_set, and
init_css_set sets css->id = NULL explicitly. fixed patches forthcoming.

-- bblum

---
block/Kconfig | 2 -
block/Kconfig.iosched | 2 -
block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++-----------
block/blk-cgroup.h | 10 ++++++--
include/linux/iocontext.h | 2 -
kernel/cgroup.c | 34 ++++++++++++++++++++++++-----
6 files changed, 80 insertions(+), 23 deletions(-)

2010-01-13 00:35:10

by Ben Blum

[permalink] [raw]
Subject: [PATCH v2 2/2] cgroups: blkio subsystem as module

Convert blk-cgroup to be buildable as a module

From: Ben Blum <[email protected]>

This patch modifies the Block I/O cgroup subsystem to be able to be built as a
module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
enhanced to support the new module dependency.

Signed-off-by: Ben Blum <[email protected]>
---
block/Kconfig | 2 +-
block/Kconfig.iosched | 2 +-
block/blk-cgroup.c | 53 +++++++++++++++++++++++++++++++++++----------
block/blk-cgroup.h | 10 +++++++-
include/linux/iocontext.h | 2 +-
kernel/cgroup.c | 9 ++++++++
6 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index e20fbde..62a5921 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -78,7 +78,7 @@ config BLK_DEV_INTEGRITY
Protection. If in doubt, say N.

config BLK_CGROUP
- bool
+ tristate
depends on CGROUPS
default n
---help---
diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index b71abfb..fc71cf0 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -23,6 +23,7 @@ config IOSCHED_DEADLINE

config IOSCHED_CFQ
tristate "CFQ I/O scheduler"
+ select BLK_CGROUP if CFQ_GROUP_IOSCHED
default y
---help---
The CFQ I/O scheduler tries to distribute bandwidth equally
@@ -35,7 +36,6 @@ config IOSCHED_CFQ
config CFQ_GROUP_IOSCHED
bool "CFQ Group Scheduling support"
depends on IOSCHED_CFQ && CGROUPS
- select BLK_CGROUP
default n
---help---
Enable group IO scheduling in CFQ.
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1fa2654..6c73380 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -23,6 +23,31 @@ static LIST_HEAD(blkio_list);
struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT };
EXPORT_SYMBOL_GPL(blkio_root_cgroup);

+static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
+ struct cgroup *);
+static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
+ struct task_struct *, bool);
+static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *,
+ struct cgroup *, struct task_struct *, bool);
+static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *);
+static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
+
+struct cgroup_subsys blkio_subsys = {
+ .name = "blkio",
+ .create = blkiocg_create,
+ .can_attach = blkiocg_can_attach,
+ .attach = blkiocg_attach,
+ .destroy = blkiocg_destroy,
+ .populate = blkiocg_populate,
+#ifdef CONFIG_BLK_CGROUP
+ /* note: blkio_subsys_id is otherwise defined in blk-cgroup.h */
+ .subsys_id = blkio_subsys_id,
+#endif
+ .use_id = 1,
+ .module = THIS_MODULE,
+};
+EXPORT_SYMBOL_GPL(blkio_subsys);
+
bool blkiocg_css_tryget(struct blkio_cgroup *blkcg)
{
if (!css_tryget(&blkcg->css))
@@ -267,7 +292,8 @@ remove_entry:
done:
free_css_id(&blkio_subsys, &blkcg->css);
rcu_read_unlock();
- kfree(blkcg);
+ if (blkcg != &blkio_root_cgroup)
+ kfree(blkcg);
}

static struct cgroup_subsys_state *
@@ -333,17 +359,6 @@ static void blkiocg_attach(struct cgroup_subsys *subsys, struct cgroup *cgroup,
task_unlock(tsk);
}

-struct cgroup_subsys blkio_subsys = {
- .name = "blkio",
- .create = blkiocg_create,
- .can_attach = blkiocg_can_attach,
- .attach = blkiocg_attach,
- .destroy = blkiocg_destroy,
- .populate = blkiocg_populate,
- .subsys_id = blkio_subsys_id,
- .use_id = 1,
-};
-
void blkio_policy_register(struct blkio_policy_type *blkiop)
{
spin_lock(&blkio_list_lock);
@@ -359,3 +374,17 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop)
spin_unlock(&blkio_list_lock);
}
EXPORT_SYMBOL_GPL(blkio_policy_unregister);
+
+static int __init init_cgroup_blkio(void)
+{
+ return cgroup_load_subsys(&blkio_subsys);
+}
+
+static void __exit exit_cgroup_blkio(void)
+{
+ cgroup_unload_subsys(&blkio_subsys);
+}
+
+module_init(init_cgroup_blkio);
+module_exit(exit_cgroup_blkio);
+MODULE_LICENSE("GPL");
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 4d316df..57648c6 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -15,7 +15,13 @@

#include <linux/cgroup.h>

-#ifdef CONFIG_BLK_CGROUP
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
+
+#ifndef CONFIG_BLK_CGROUP
+/* When blk-cgroup is a module, its subsys_id isn't a compile-time constant */
+extern struct cgroup_subsys blkio_subsys;
+#define blkio_subsys_id blkio_subsys.subsys_id
+#endif

struct blkio_cgroup {
struct cgroup_subsys_state css;
@@ -94,7 +100,7 @@ static inline void blkiocg_update_blkio_group_dequeue_stats(
struct blkio_group *blkg, unsigned long dequeue) {}
#endif

-#ifdef CONFIG_BLK_CGROUP
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
extern struct blkio_cgroup blkio_root_cgroup;
extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
index a632359..b9f109d 100644
--- a/include/linux/iocontext.h
+++ b/include/linux/iocontext.h
@@ -68,7 +68,7 @@ struct io_context {
unsigned short ioprio;
unsigned short ioprio_changed;

-#ifdef CONFIG_BLK_CGROUP
+#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
unsigned short cgroup_changed;
#endif

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b4ae6ef..845a2e7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -690,6 +690,7 @@ void cgroup_lock(void)
{
mutex_lock(&cgroup_mutex);
}
+EXPORT_SYMBOL_GPL(cgroup_lock);

/**
* cgroup_unlock - release lock on cgroup changes
@@ -700,6 +701,7 @@ void cgroup_unlock(void)
{
mutex_unlock(&cgroup_mutex);
}
+EXPORT_SYMBOL_GPL(cgroup_unlock);

/*
* A couple of forward declarations required, due to cyclic reference loop:
@@ -1622,6 +1624,7 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
memmove(buf, start, buf + buflen - start);
return 0;
}
+EXPORT_SYMBOL_GPL(cgroup_path);

/**
* cgroup_attach_task - attach task 'tsk' to cgroup 'cgrp'
@@ -1762,6 +1765,7 @@ bool cgroup_lock_live_group(struct cgroup *cgrp)
}
return true;
}
+EXPORT_SYMBOL_GPL(cgroup_lock_live_group);

static int cgroup_release_agent_write(struct cgroup *cgrp, struct cftype *cft,
const char *buffer)
@@ -4036,6 +4040,7 @@ void __css_put(struct cgroup_subsys_state *css)
rcu_read_unlock();
WARN_ON_ONCE(val < 1);
}
+EXPORT_SYMBOL_GPL(__css_put);

/*
* Notify userspace when a cgroup is released, by running the
@@ -4151,6 +4156,7 @@ unsigned short css_id(struct cgroup_subsys_state *css)
return cssid->id;
return 0;
}
+EXPORT_SYMBOL_GPL(css_id);

unsigned short css_depth(struct cgroup_subsys_state *css)
{
@@ -4160,6 +4166,7 @@ unsigned short css_depth(struct cgroup_subsys_state *css)
return cssid->depth;
return 0;
}
+EXPORT_SYMBOL_GPL(css_depth);

bool css_is_ancestor(struct cgroup_subsys_state *child,
const struct cgroup_subsys_state *root)
@@ -4196,6 +4203,7 @@ void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css)
spin_unlock(&ss->id_lock);
call_rcu(&id->rcu_head, __free_css_id_cb);
}
+EXPORT_SYMBOL_GPL(free_css_id);

/*
* This is called by init or create(). Then, calls to this function are
@@ -4312,6 +4320,7 @@ struct cgroup_subsys_state *css_lookup(struct cgroup_subsys *ss, int id)

return rcu_dereference(cssid->css);
}
+EXPORT_SYMBOL_GPL(css_lookup);

/**
* css_get_next - lookup next cgroup under specified hierarchy.


Attachments:
cgroups-blkio-as-module.patch (7.68 kB)

2010-01-13 00:35:41

by Ben Blum

[permalink] [raw]
Subject: [PATCH v2 1/2] cgroups: modular subsystems support for use_id

This patch adds support for subsys.use_id in module-loadable subsystems.

From: Ben Blum <[email protected]>

Signed-off-by: Ben Blum <[email protected]>
---
kernel/cgroup.c | 25 +++++++++++++++++++------
1 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index cc2e1f6..b4ae6ef 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -240,7 +240,8 @@ struct cg_cgroup_link {
static struct css_set init_css_set;
static struct cg_cgroup_link init_css_set_link;

-static int cgroup_subsys_init_idr(struct cgroup_subsys *ss);
+static int cgroup_init_idr(struct cgroup_subsys *ss,
+ struct cgroup_subsys_state *css);

/* css_set_lock protects the list of css_set objects, and the
* chain of tasks off each css_set. Nests outside task->alloc_lock
@@ -3396,6 +3397,18 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)

/* our new subsystem will be attached to the dummy hierarchy. */
init_cgroup_css(css, ss, dummytop);
+ /* init_idr must be after init_cgroup_css because it sets css->id. */
+ if (ss->use_id) {
+ int ret = cgroup_init_idr(ss, css);
+ if (ret) {
+ dummytop->subsys[ss->subsys_id] = NULL;
+ ss->destroy(ss, dummytop);
+ subsys[i] = NULL;
+ mutex_unlock(&cgroup_mutex);
+ return ret;
+ }
+ }
+
/*
* Now we need to entangle the css into the existing css_sets. unlike
* in cgroup_init_subsys, there are now multiple css_sets, so each one
@@ -3484,7 +3497,8 @@ void cgroup_unload_subsys(struct cgroup_subsys *ss)
/*
* remove subsystem's css from the dummytop and free it - need to free
* before marking as null because ss->destroy needs the cgrp->subsys
- * pointer to find their state.
+ * pointer to find their state. note that this also takes care of
+ * freeing the css_id.
*/
ss->destroy(ss, dummytop);
dummytop->subsys[ss->subsys_id] = NULL;
@@ -3563,7 +3577,7 @@ int __init cgroup_init(void)
if (!ss->early_init)
cgroup_init_subsys(ss);
if (ss->use_id)
- cgroup_subsys_init_idr(ss);
+ cgroup_init_idr(ss, init_css_set.subsys[ss->subsys_id]);
}

/* Add init_css_set to the hash table */
@@ -4231,15 +4245,14 @@ err_out:

}

-static int __init cgroup_subsys_init_idr(struct cgroup_subsys *ss)
+static int __init_or_module cgroup_init_idr(struct cgroup_subsys *ss,
+ struct cgroup_subsys_state *rootcss)
{
struct css_id *newid;
- struct cgroup_subsys_state *rootcss;

spin_lock_init(&ss->id_lock);
idr_init(&ss->idr);

- rootcss = init_css_set.subsys[ss->subsys_id];
newid = get_new_cssid(ss, 0);
if (IS_ERR(newid))
return PTR_ERR(newid);


Attachments:
cgroups-module-use_id-support.patch (2.80 kB)

2010-01-14 09:16:00

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] cgroups: blkio subsystem as module

On Tue, Jan 12 2010, Ben Blum wrote:
> Convert blk-cgroup to be buildable as a module
>
> From: Ben Blum <[email protected]>
>
> This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> enhanced to support the new module dependency.

I'd like to add this for 2.6.34, but I'd like some sign-off(s) on the
cgroup side of things (for patch 1/2).

--
Jens Axboe

2010-01-14 09:29:35

by Li Zefan

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] cgroups: blkio subsystem as module

Jens Axboe wrote:
> On Tue, Jan 12 2010, Ben Blum wrote:
>> Convert blk-cgroup to be buildable as a module
>>
>> From: Ben Blum <[email protected]>
>>
>> This patch modifies the Block I/O cgroup subsystem to be able to be built as a
>> module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
>> options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
>> enhanced to support the new module dependency.
>
> I'd like to add this for 2.6.34, but I'd like some sign-off(s) on the
> cgroup side of things (for patch 1/2).
>

Yes, it's for .34. But these 2 patches can't be applied to block-tree,
because the patches that implement cgroup modular feature are in -mm,
and they will go into .34.

2010-01-14 09:32:14

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module

On Tue, Jan 12, 2010 at 5:51 AM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Fri, 8 Jan 2010 10:10:38 -0500
> Vivek Goyal <[email protected]> wrote:
>
>> On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
>> > Convert blk-cgroup to be buildable as a module
>> >
>> > From: Ben Blum <[email protected]>
>> >
>> > This patch modifies the Block I/O cgroup subsystem to be able to be built as a
>> > module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
>> > options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
>> > enhanced to support the new module dependency.
>> >
>>
>> Hi Ben,
>>
>> I will give this patch a try.
>>
>> So from blk-cgroup perspective, the advantage of allowing it as module
>> will be that we can save some memory if we are not using the controller?
>>
> Is "moduled" blkio cgroup safe after page-tracking by page_cgroup is
> introduced ?
>

My guess is it won't be, unless we start exposing page_cgroup API and
then make the module depend on memcg.

Balbir

2010-01-14 11:43:35

by Vivek Goyal

[permalink] [raw]
Subject: Re: [RFC] [PATCH 2/2] cgroups: blkio subsystem as module

On Thu, Jan 14, 2010 at 03:02:09PM +0530, Balbir Singh wrote:
> On Tue, Jan 12, 2010 at 5:51 AM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
> > On Fri, 8 Jan 2010 10:10:38 -0500
> > Vivek Goyal <[email protected]> wrote:
> >
> >> On Fri, Jan 08, 2010 at 12:30:21AM -0500, Ben Blum wrote:
> >> > Convert blk-cgroup to be buildable as a module
> >> >
> >> > From: Ben Blum <[email protected]>
> >> >
> >> > This patch modifies the Block I/O cgroup subsystem to be able to be built as a
> >> > module. As the CFQ disk scheduler optionally depends on blk-cgroup, config
> >> > options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
> >> > enhanced to support the new module dependency.
> >> >
> >>
> >> Hi Ben,
> >>
> >> I will give this patch a try.
> >>
> >> So from blk-cgroup perspective, the advantage of allowing it as module
> >> will be that we can save some memory if we are not using the controller?
> >>
> > Is "moduled" blkio cgroup safe after page-tracking by page_cgroup is
> > introduced ?
> >
>
> My guess is it won't be, unless we start exposing page_cgroup API and
> then make the module depend on memcg.

I think I agree. When we introduce page_cgroup based page tracking, either
we need to export page_cgroup API or we can force blkio controller to
compile as in-kernel if user selects the CONFIG_PAGE_TRACKING option.

So as of now, I can't think why we should not we allow compiling blkio as
module as long as core cgroup functionality supports it safely.

Thanks
Vivek