LinuxLists.cc - [PATCHSET v2 cgroup/for-3.15] cgroup: cleanups after kernfs conversion

2014-02-08 16:38:34

Subject: [PATCHSET v2 cgroup/for-3.15] cgroup: cleanups after kernfs conversion

Hello,

This is v2 of cleanups-after-kernfs-conversion patchset. Nothing
really changed since the last take[L]. It just got rebased on top of
the updated patches.

This patchset does a number of cleanups which are possible now that
cgroup is converted to kernfs. This patchset contains the following
eight patches.

0001-cgroup-warn-if-xattr-is-specified-with-sane_behavior.patch
0002-cgroup-relocate-cgroup_rm_cftypes.patch
0003-cgroup-remove-cftype_set.patch
0004-cgroup-simplify-dynamic-cftype-addition-and-removal.patch
0005-cgroup-make-cgroup-hold-onto-its-kernfs_node.patch
0006-cgroup-remove-cgroup-name.patch
0007-cgroup-rename-cgroupfs_root-number_of_cgroups-to-nr_.patch
0008-cgroup-remove-cgroupfs_root-refcnt.patch

This patchset is on top of

cgroup/for-3.15 f7cef064aa01 ("Merge branch 'driver-core-next' into cgroup/for-3.15")
+ [1] [PATCHSET v2 cgroup/for-3.15] cgroup: convert to kernfs

and also available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-post-kernfs-conversion

fs/kernfs/dir.c | 1
include/linux/cgroup.h | 91 ++++------
kernel/cgroup.c | 417 +++++++++++++++++--------------------------------
kernel/cpuset.c | 27 +--
kernel/sched/debug.c | 3
mm/memcontrol.c | 68 ++-----
7 files changed, 229 insertions(+), 390 deletions(-)

Thanks.

--
tejun

[L] http://lkml.kernel.org/g/[email protected]
[1] http://lkml.kernel.org/g/[email protected]

2014-02-08 16:38:37

Subject: [PATCH 1/8] cgroup: warn if "xattr" is specified with "sane_behavior"

Mount option "xattr" is no longer necessary as it's enabled by default
on kernfs. Warn if "xattr" is specified with "sane_behavior" so that
the option can be removed in the future.
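The following is a minimal userspace sketch of when the new warning fires, assuming a simplified option parser. The flag names SKETCH_ROOT_* and the function sketch_parse_opts() are hypothetical. The "__DEVEL__sane_behavior" spelling is assumed from kernels of this era. "xattr" is still accepted, but it is reported as redundant once sane_behavior is in effect.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits; the real CGRP_ROOT_* values live in cgroup.h. */
#define SKETCH_ROOT_XATTR         (1u << 0)
#define SKETCH_ROOT_SANE_BEHAVIOR (1u << 1)

/*
 * Parse a comma-separated option string in place (strtok() modifies it)
 * and return the number of warnings printed (0 or 1).
 */
static int sketch_parse_opts(char *data, unsigned int *flags)
{
	char *token;
	int warnings = 0;

	*flags = 0;
	for (token = strtok(data, ","); token; token = strtok(NULL, ",")) {
		if (!strcmp(token, "xattr"))
			*flags |= SKETCH_ROOT_XATTR;
		else if (!strcmp(token, "__DEVEL__sane_behavior"))
			*flags |= SKETCH_ROOT_SANE_BEHAVIOR;
		/* other options are ignored in this sketch */
	}

	/* Same condition as the new check: xattr is redundant under sane_behavior. */
	if ((*flags & SKETCH_ROOT_SANE_BEHAVIOR) && (*flags & SKETCH_ROOT_XATTR)) {
		fprintf(stderr, "cgroup: sane_behavior: xattr is always available, flag unnecessary\n");
		warnings++;
	}
	return warnings;
}
```

Parsing "xattr,__DEVEL__sane_behavior" sets both bits and prints the warning once. Parsing "xattr" alone sets only the xattr bit and prints nothing. As in the patch, the mount is not rejected. The user only gets a warning.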

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/cgroup.h | 2 ++
kernel/cgroup.c | 3 +++
2 files changed, 5 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b6c2652..6fe238e 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -262,6 +262,8 @@ enum {
* - "release_agent" and "notify_on_release" are removed.
* Replacement notification mechanism will be implemented.
*
+ * - "xattr" mount option is deprecated. kernfs always enables it.
+ *
* - cpuset: tasks will be kept in empty cpusets when hotplug happens
* and take masks of ancestors with non-empty cpus/mems, instead of
* being moved to an ancestor.
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 203438e..6b96516 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1265,6 +1265,9 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
pr_err("cgroup: sane_behavior: clone_children is not allowed\n");
return -EINVAL;
}
+
+ if (opts->flags & CGRP_ROOT_XATTR)
+ pr_warning("cgroup: sane_behavior: xattr is always available, flag unnecessary\n");
}

/*
--
1.8.5.3

2014-02-08 16:38:40

Subject: [PATCH 3/8] cgroup: remove cftype_set

cftype_set was added primarily to allow registering the same cftype
array more than once for different subsystems. Nobody uses or needs
such a thing, and it's already broken because each cftype has an ->ss
pointer which is initialized during registration.

Let's add a list_head ->node to cftype and use the first cftype entry
in each array to link the arrays, instead of allocating a separate
cftype_set. While at it, trigger a WARN during registration if a cft
appears to have been initialized already.

This simplifies cftype handling a bit.
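The sketch below shows the linking scheme described above in plain userspace C, under some simplifying assumptions. toy_cftype and toy_subsys are hypothetical cut-down stand-ins for struct cftype and struct cgroup_subsys. The list helpers are reimplemented locally rather than taken from <linux/list.h>. Only the first entry of each array carries a live ->node. Walking ss->cfts, and then walking each array up to its zero-length name, visits every file without any separately allocated cftype_set.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;
}

struct toy_subsys;

/* Cut-down cftype: only the fields this sketch needs. */
struct toy_cftype {
	char name[32];              /* array ends at an entry with name[0] == '\0' */
	struct toy_subsys *ss;      /* set on registration */
	struct list_head node;      /* only meaningful in the first entry */
};

struct toy_subsys {
	struct list_head cfts;      /* links the first entry of each array */
};

/* Register an array without allocating a wrapper; returns the number of warnings. */
static int toy_add_cftypes(struct toy_subsys *ss, struct toy_cftype *cfts)
{
	struct toy_cftype *cft;
	int warnings = 0;

	for (cft = cfts; cft->name[0] != '\0'; cft++) {
		if (cft->ss) {          /* stands in for WARN_ON(cft->ss || cft->kf_ops) */
			fprintf(stderr, "WARN: %s looks already registered\n", cft->name);
			warnings++;
		}
		cft->ss = ss;
	}
	list_add_tail(&cfts->node, &ss->cfts);
	return warnings;
}

static int toy_rm_cftypes(struct toy_cftype *cfts)
{
	struct toy_cftype *cft;

	if (!cfts || !cfts[0].ss)
		return -ENOENT;
	list_del(&cfts->node);
	for (cft = cfts; cft->name[0] != '\0'; cft++)
		cft->ss = NULL;
	return 0;
}

/* Walk ss->cfts, then each array from its first entry, counting files. */
static int toy_count_files(struct toy_subsys *ss)
{
	struct list_head *pos;
	int n = 0;

	for (pos = ss->cfts.next; pos != &ss->cfts; pos = pos->next) {
		struct toy_cftype *cft = container_of(pos, struct toy_cftype, node);

		for (; cft->name[0] != '\0'; cft++)
			n++;
	}
	return n;
}
```

Embedding the list node has a consequence: an array can sit on at most one list at a time. That is why registering the same array for two subsystems no longer makes sense and is worth a warning.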

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/cgroup.h | 26 +++++++++-----------------
kernel/cgroup.c | 41 +++++++++++++----------------------------
2 files changed, 22 insertions(+), 45 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 6fe238e..3c0c7e4 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -410,12 +410,11 @@ struct cftype {
unsigned int flags;

/*
- * The subsys this file belongs to. Initialized automatically
- * during registration. NULL for cgroup core files.
+ * Fields used for internal bookkeeping. Initialized automatically
+ * during registration.
*/
- struct cgroup_subsys *ss;
-
- /* kernfs_ops to use, initialized automatically during registration */
+ struct cgroup_subsys *ss; /* NULL for cgroup core files */
+ struct list_head node; /* anchored at ss->cfts */
struct kernfs_ops *kf_ops;

/*
@@ -470,16 +469,6 @@ struct cftype {
};

/*
- * cftype_sets describe cftypes belonging to a subsystem and are chained at
- * cgroup_subsys->cftsets. Each cftset points to an array of cftypes
- * terminated by zero length name.
- */
-struct cftype_set {
- struct list_head node; /* chained at subsys->cftsets */
- struct cftype *cfts;
-};
-
-/*
* See the comment above CGRP_ROOT_SANE_BEHAVIOR for details. This
* function can be called as long as @cgrp is accessible.
*/
@@ -595,8 +584,11 @@ struct cgroup_subsys {
/* link to parent, protected by cgroup_lock() */
struct cgroupfs_root *root;

- /* list of cftype_sets */
- struct list_head cftsets;
+ /*
+ * List of cftypes. Each entry is the first entry of an array
+ * terminated by zero length name.
+ */
+ struct list_head cfts;

/* base cftypes, automatically registered with subsys itself */
struct cftype *base_cftypes;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f718cbb..a3ade20 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1014,12 +1014,12 @@ static void cgroup_clear_dir(struct cgroup *cgrp, unsigned long subsys_mask)
int i;

for_each_subsys(ss, i) {
- struct cftype_set *set;
+ struct cftype *cfts;

if (!test_bit(i, &subsys_mask))
continue;
- list_for_each_entry(set, &ss->cftsets, node)
- cgroup_addrm_files(cgrp, set->cfts, false);
+ list_for_each_entry(cfts, &ss->cfts, node)
+ cgroup_addrm_files(cgrp, cfts, false);
}
}

@@ -2390,6 +2390,8 @@ static int cgroup_init_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
for (cft = cfts; cft->name[0] != '\0'; cft++) {
struct kernfs_ops *kf_ops;

+ WARN_ON(cft->ss || cft->kf_ops);
+
if (cft->seq_start)
kf_ops = &cgroup_kf_ops;
else
@@ -2428,26 +2430,15 @@ static int cgroup_init_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
*/
int cgroup_rm_cftypes(struct cftype *cfts)
{
- struct cftype *found = NULL;
- struct cftype_set *set;
-
if (!cfts || !cfts[0].ss)
return -ENOENT;

cgroup_cfts_prepare();
+ list_del(&cfts->node);
+ cgroup_cfts_commit(cfts, false);

- list_for_each_entry(set, &cfts[0].ss->cftsets, node) {
- if (set->cfts == cfts) {
- list_del(&set->node);
- kfree(set);
- found = cfts;
- break;
- }
- }
-
- cgroup_cfts_commit(found, false);
cgroup_exit_cftypes(cfts);
- return found ? 0 : -ENOENT;
+ return 0;
}

/**
@@ -2466,20 +2457,14 @@ int cgroup_rm_cftypes(struct cftype *cfts)
*/
int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
{
- struct cftype_set *set;
int ret;

- set = kzalloc(sizeof(*set), GFP_KERNEL);
- if (!set)
- return -ENOMEM;
-
ret = cgroup_init_cftypes(ss, cfts);
if (ret)
return ret;

cgroup_cfts_prepare();
- set->cfts = cfts;
- list_add_tail(&set->node, &ss->cftsets);
+ list_add_tail(&cfts->node, &ss->cfts);
ret = cgroup_cfts_commit(cfts, true);
if (ret)
cgroup_rm_cftypes(cfts);
@@ -3572,13 +3557,13 @@ static int cgroup_populate_dir(struct cgroup *cgrp, unsigned long subsys_mask)

/* process cftsets of each subsystem */
for_each_subsys(ss, i) {
- struct cftype_set *set;
+ struct cftype *cfts;

if (!test_bit(i, &subsys_mask))
continue;

- list_for_each_entry(set, &ss->cftsets, node) {
- ret = cgroup_addrm_files(cgrp, set->cfts, true);
+ list_for_each_entry(cfts, &ss->cfts, node) {
+ ret = cgroup_addrm_files(cgrp, cfts, true);
if (ret < 0)
goto err;
}
@@ -4167,7 +4152,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss)
mutex_lock(&cgroup_tree_mutex);
mutex_lock(&cgroup_mutex);

- INIT_LIST_HEAD(&ss->cftsets);
+ INIT_LIST_HEAD(&ss->cfts);

/* Create the top cgroup state for this subsystem */
ss->root = &cgroup_dummy_root;
--
1.8.5.3

2014-02-08 16:38:47

Subject: [PATCH 6/8] cgroup: remove cgroup->name

cgroup->name handling became quite complicated over time, involving a
dedicated struct cgroup_name for RCU protection. Now that cgroup is
on kernfs, we can drop all of it and simply use kernfs_name/path() and
friends. Replace cgroup->name and all related code with kernfs
name/path constructs.

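* Reimplement cgroup_name() and cgroup_path() as thin wrappers on top
of their kernfs counterparts, which involves semantic changes:
cgroup_name() now copies the name into a caller-supplied buffer, and
cgroup_path() returns a pointer into the buffer, or NULL on overflow,
instead of an error code. pr_cont_cgroup_name() and
pr_cont_cgroup_path() are added.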

* cgroup->name handling dropped from cgroup_rename().

* All users of cgroup_name/path() updated to the new semantics. Users
which were formatting the string just to printk it are converted
to use pr_cont_cgroup_name/path() instead, which simplifies things
quite a bit. As cgroup_name() no longer requires the RCU read lock
around it, RCU locking which was protecting only cgroup_name() is
removed.
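Here is a userspace sketch of the new cgroup_path() contract, under stated assumptions. struct toy_node and toy_node_path() are hypothetical stand-ins for kernfs_node and kernfs_path(). They are modeled on the behavior that the callers in this patch rely on: the path is built backwards from the end of the buffer, a pointer to its start is returned, and NULL signals overflow. toy_node_path_at_front() mirrors what the updated blkg_path() does with memmove() for callers that need the string at the start of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Toy tree node: a name and a parent pointer; the root has an empty name. */
struct toy_node {
	const char *name;
	struct toy_node *parent;
};

/*
 * Build "/a/b" right-aligned at the end of @buf and return a pointer to
 * its first character, or NULL if @buflen is too small.
 */
static char *toy_node_path(struct toy_node *kn, char *buf, size_t buflen)
{
	char *p;

	if (buflen == 0)
		return NULL;
	p = buf + buflen;
	*--p = '\0';

	do {
		size_t len = strlen(kn->name);

		if ((size_t)(p - buf) < len + 1)
			return NULL;            /* overflow */
		p -= len;
		memcpy(p, kn->name, len);
		*--p = '/';
		kn = kn->parent;
	} while (kn && kn->parent);         /* the root contributes only "/" */

	return p;
}

/* blkg_path()-style caller: shift the result to the front of @buf. */
static int toy_node_path_at_front(struct toy_node *kn, char *buf, size_t buflen)
{
	char *p = toy_node_path(kn, buf, buflen);

	if (!p)
		return -1;
	memmove(buf, p, buf + buflen - p);  /* includes the trailing NUL */
	return 0;
}
```

The returned pointer usually differs from the buffer itself. For that reason, callers such as proc_cgroup_show() and cgroup_release_agent() in this patch print or pass along the returned pointer rather than the buffer.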

v2: Comment above oom_info_lock updated as suggested by Michal.

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
---
block/blk-cgroup.h | 12 ++--
fs/kernfs/dir.c | 1 +
include/linux/cgroup.h | 55 +++++++++---------
kernel/cgroup.c | 148 ++++++++++++-------------------------------------
kernel/cpuset.c | 27 +++++----
kernel/sched/debug.c | 3 +-
mm/memcontrol.c | 68 ++++++-----------------
7 files changed, 103 insertions(+), 211 deletions(-)

diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 453b528..15a8d64 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -241,12 +241,16 @@ static inline struct blkcg_gq *pd_to_blkg(struct blkg_policy_data *pd)
*/
static inline int blkg_path(struct blkcg_gq *blkg, char *buf, int buflen)
{
- int ret;
+ char *p;

- ret = cgroup_path(blkg->blkcg->css.cgroup, buf, buflen);
- if (ret)
+ p = cgroup_path(blkg->blkcg->css.cgroup, buf, buflen);
+ if (!p) {
strncpy(buf, "<unavailable>", buflen);
- return ret;
+ return -ENAMETOOLONG;
+ }
+
+ memmove(buf, p, buf + buflen - p);
+ return 0;
}

/**
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index a347792..939684e 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -112,6 +112,7 @@ char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
spin_unlock_irqrestore(&kernfs_rename_lock, flags);
return p;
}
+EXPORT_SYMBOL_GPL(kernfs_path);

/**
* pr_cont_kernfs_name - pr_cont name of a kernfs_node
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 3c0c7e4..8202abb 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -138,11 +138,6 @@ enum {
CGRP_SANE_BEHAVIOR,
};

-struct cgroup_name {
- struct rcu_head rcu_head;
- char name[];
-};
-
struct cgroup {
unsigned long flags; /* "unsigned long" so bitops work */

@@ -177,19 +172,6 @@ struct cgroup {
*/
u64 serial_nr;

- /*
- * This is a copy of dentry->d_name, and it's needed because
- * we can't use dentry->d_name in cgroup_path().
- *
- * You must acquire rcu_read_lock() to access cgrp->name, and
- * the only place that can change it is rename(), which is
- * protected by parent dir's i_mutex.
- *
- * Normally you should use cgroup_name() wrapper rather than
- * access it directly.
- */
- struct cgroup_name __rcu *name;
-
/* Private pointers for each registered subsystem */
struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];

@@ -477,12 +459,6 @@ static inline bool cgroup_sane_behavior(const struct cgroup *cgrp)
return cgrp->root->flags & CGRP_ROOT_SANE_BEHAVIOR;
}

-/* Caller should hold rcu_read_lock() */
-static inline const char *cgroup_name(const struct cgroup *cgrp)
-{
- return rcu_dereference(cgrp->name)->name;
-}
-
/* returns ino associated with a cgroup, 0 indicates unmounted root */
static inline ino_t cgroup_ino(struct cgroup *cgrp)
{
@@ -501,14 +477,39 @@ static inline struct cftype *seq_cft(struct seq_file *seq)

struct cgroup_subsys_state *seq_css(struct seq_file *seq);

+/*
+ * Name / path handling functions. All are thin wrappers around the kernfs
+ * counterparts and can be called under any context.
+ */
+
+static inline int cgroup_name(struct cgroup *cgrp, char *buf, size_t buflen)
+{
+ return kernfs_name(cgrp->kn, buf, buflen);
+}
+
+static inline char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
+ size_t buflen)
+{
+ return kernfs_path(cgrp->kn, buf, buflen);
+}
+
+static inline void pr_cont_cgroup_name(struct cgroup *cgrp)
+{
+ pr_cont_kernfs_name(cgrp->kn);
+}
+
+static inline void pr_cont_cgroup_path(struct cgroup *cgrp)
+{
+ pr_cont_kernfs_path(cgrp->kn);
+}
+
+char *task_cgroup_path(struct task_struct *task, char *buf, size_t buflen);
+
int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
int cgroup_rm_cftypes(struct cftype *cfts);

bool cgroup_is_descendant(struct cgroup *cgrp, struct cgroup *ancestor);

-int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen);
-int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen);
-
int cgroup_task_count(const struct cgroup *cgrp);

/*
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index da8aef0..a48f4ca 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -145,8 +145,6 @@ static int cgroup_root_count;
/* hierarchy ID allocation and mapping, protected by cgroup_mutex */
static DEFINE_IDR(cgroup_hierarchy_idr);

-static struct cgroup_name root_cgroup_name = { .name = "/" };
-
/*
* Assign a monotonically increasing serial number to cgroups. It
* guarantees cgroups with bigger numbers are newer than those with smaller
@@ -888,17 +886,6 @@ static int cgroup_populate_dir(struct cgroup *cgrp, unsigned long subsys_mask);
static struct kernfs_syscall_ops cgroup_kf_syscall_ops;
static const struct file_operations proc_cgroupstats_operations;

-static struct cgroup_name *cgroup_alloc_name(const char *name_str)
-{
- struct cgroup_name *name;
-
- name = kmalloc(sizeof(*name) + strlen(name_str) + 1, GFP_KERNEL);
- if (!name)
- return NULL;
- strcpy(name->name, name_str);
- return name;
-}
-
static char *cgroup_file_name(struct cgroup *cgrp, const struct cftype *cft,
char *buf)
{
@@ -958,8 +945,6 @@ static void cgroup_free_fn(struct work_struct *work)
cgroup_pidlist_destroy_all(cgrp);

kernfs_put(cgrp->kn);
-
- kfree(rcu_dereference_raw(cgrp->name));
kfree(cgrp);
}

@@ -1375,7 +1360,6 @@ static void init_cgroup_root(struct cgroupfs_root *root)
INIT_LIST_HEAD(&root->root_list);
root->number_of_cgroups = 1;
cgrp->root = root;
- RCU_INIT_POINTER(cgrp->name, &root_cgroup_name);
init_cgroup_housekeeping(cgrp);
idr_init(&root->cgroup_idr);
}
@@ -1596,57 +1580,6 @@ static struct file_system_type cgroup_fs_type = {
static struct kobject *cgroup_kobj;

/**
- * cgroup_path - generate the path of a cgroup
- * @cgrp: the cgroup in question
- * @buf: the buffer to write the path into
- * @buflen: the length of the buffer
- *
- * Writes path of cgroup into buf. Returns 0 on success, -errno on error.
- *
- * We can't generate cgroup path using dentry->d_name, as accessing
- * dentry->name must be protected by irq-unsafe dentry->d_lock or parent
- * inode's i_mutex, while on the other hand cgroup_path() can be called
- * with some irq-safe spinlocks held.
- */
-int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
-{
- int ret = -ENAMETOOLONG;
- char *start;
-
- if (!cgrp->parent) {
- if (strlcpy(buf, "/", buflen) >= buflen)
- return -ENAMETOOLONG;
- return 0;
- }
-
- start = buf + buflen - 1;
- *start = '\0';
-
- rcu_read_lock();
- do {
- const char *name = cgroup_name(cgrp);
- int len;
-
- len = strlen(name);
- if ((start -= len) < buf)
- goto out;
- memcpy(start, name, len);
-
- if (--start < buf)
- goto out;
- *start = '/';
-
- cgrp = cgrp->parent;
- } while (cgrp->parent);
- ret = 0;
- memmove(buf, start, buf + buflen - start);
-out:
- rcu_read_unlock();
- return ret;
-}
-EXPORT_SYMBOL_GPL(cgroup_path);
-
-/**
* task_cgroup_path - cgroup path of a task in the first cgroup hierarchy
* @task: target task
* @buf: the buffer to write the path into
@@ -1657,16 +1590,14 @@ EXPORT_SYMBOL_GPL(cgroup_path);
* function grabs cgroup_mutex and shouldn't be used inside locks used by
* cgroup controller callbacks.
*
- * Returns 0 on success, fails with -%ENAMETOOLONG if @buflen is too short.
+ * Return value is the same as kernfs_path().
*/
-int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
+char *task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
{
struct cgroupfs_root *root;
struct cgroup *cgrp;
- int hierarchy_id = 1, ret = 0;
-
- if (buflen < 2)
- return -ENAMETOOLONG;
+ int hierarchy_id = 1;
+ char *path = NULL;

mutex_lock(&cgroup_mutex);

@@ -1674,14 +1605,15 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)

if (root) {
cgrp = task_cgroup_from_root(task, root);
- ret = cgroup_path(cgrp, buf, buflen);
+ path = cgroup_path(cgrp, buf, buflen);
} else {
/* if no hierarchy exists, everyone is in "/" */
- memcpy(buf, "/", 2);
+ if (strlcpy(buf, "/", buflen) < buflen)
+ path = buf;
}

mutex_unlock(&cgroup_mutex);
- return ret;
+ return path;
}
EXPORT_SYMBOL_GPL(task_cgroup_path);

@@ -2209,7 +2141,6 @@ static int cgroup_rename(struct kernfs_node *kn, struct kernfs_node *new_parent,
const char *new_name_str)
{
struct cgroup *cgrp = kn->priv;
- struct cgroup_name *name, *old_name;
int ret;

if (kernfs_type(kn) != KERNFS_DIR)
@@ -2224,25 +2155,13 @@ static int cgroup_rename(struct kernfs_node *kn, struct kernfs_node *new_parent,
if (cgroup_sane_behavior(cgrp))
return -EPERM;

- name = cgroup_alloc_name(new_name_str);
- if (!name)
- return -ENOMEM;
-
mutex_lock(&cgroup_tree_mutex);
mutex_lock(&cgroup_mutex);

ret = kernfs_rename(kn, new_parent, new_name_str);
- if (!ret) {
- old_name = rcu_dereference_protected(cgrp->name, true);
- rcu_assign_pointer(cgrp->name, name);
- } else {
- old_name = name;
- }

mutex_unlock(&cgroup_mutex);
mutex_unlock(&cgroup_tree_mutex);
-
- kfree_rcu(old_name, rcu_head);
return ret;
}

@@ -3717,14 +3636,13 @@ err_free:
/**
* cgroup_create - create a cgroup
* @parent: cgroup that will be parent of the new cgroup
- * @name_str: name of the new cgroup
+ * @name: name of the new cgroup
* @mode: mode to set on new cgroup
*/
-static long cgroup_create(struct cgroup *parent, const char *name_str,
+static long cgroup_create(struct cgroup *parent, const char *name,
umode_t mode)
{
struct cgroup *cgrp;
- struct cgroup_name *name;
struct cgroupfs_root *root = parent->root;
int ssid, err;
struct cgroup_subsys *ss;
@@ -3735,13 +3653,6 @@ static long cgroup_create(struct cgroup *parent, const char *name_str,
if (!cgrp)
return -ENOMEM;

- name = cgroup_alloc_name(name_str);
- if (!name) {
- err = -ENOMEM;
- goto err_free_cgrp;
- }
- rcu_assign_pointer(cgrp->name, name);
-
/*
* Temporarily set the pointer to NULL, so idr_find() won't return
* a half-baked cgroup.
@@ -3749,7 +3660,7 @@ static long cgroup_create(struct cgroup *parent, const char *name_str,
cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
if (cgrp->id < 0) {
err = -ENOMEM;
- goto err_free_name;
+ goto err_free_cgrp;
}

mutex_lock(&cgroup_tree_mutex);
@@ -3779,7 +3690,7 @@ static long cgroup_create(struct cgroup *parent, const char *name_str,
set_bit(CGRP_CPUSET_CLONE_CHILDREN, &cgrp->flags);

/* create the directory */
- kn = kernfs_create_dir(parent->kn, name->name, mode, cgrp);
+ kn = kernfs_create_dir(parent->kn, name, mode, cgrp);
if (IS_ERR(kn)) {
err = PTR_ERR(kn);
goto err_unlock;
@@ -3836,8 +3747,6 @@ err_unlock:
err_unlock_tree:
mutex_unlock(&cgroup_tree_mutex);
idr_remove(&root->cgroup_idr, cgrp->id);
-err_free_name:
- kfree(rcu_dereference_raw(cgrp->name));
err_free_cgrp:
kfree(cgrp);
return err;
@@ -4302,12 +4211,12 @@ int proc_cgroup_show(struct seq_file *m, void *v)
{
struct pid *pid;
struct task_struct *tsk;
- char *buf;
+ char *buf, *path;
int retval;
struct cgroupfs_root *root;

retval = -ENOMEM;
- buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ buf = kmalloc(PATH_MAX, GFP_KERNEL);
if (!buf)
goto out;

@@ -4335,10 +4244,12 @@ int proc_cgroup_show(struct seq_file *m, void *v)
root->name);
seq_putc(m, ':');
cgrp = task_cgroup_from_root(tsk, root);
- retval = cgroup_path(cgrp, buf, PAGE_SIZE);
- if (retval < 0)
+ path = cgroup_path(cgrp, buf, PATH_MAX);
+ if (!path) {
+ retval = -ENAMETOOLONG;
goto out_unlock;
- seq_puts(m, buf);
+ }
+ seq_puts(m, path);
seq_putc(m, '\n');
}

@@ -4586,16 +4497,17 @@ static void cgroup_release_agent(struct work_struct *work)
while (!list_empty(&release_list)) {
char *argv[3], *envp[3];
int i;
- char *pathbuf = NULL, *agentbuf = NULL;
+ char *pathbuf = NULL, *agentbuf = NULL, *path;
struct cgroup *cgrp = list_entry(release_list.next,
struct cgroup,
release_list);
list_del_init(&cgrp->release_list);
raw_spin_unlock(&release_list_lock);
- pathbuf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ pathbuf = kmalloc(PATH_MAX, GFP_KERNEL);
if (!pathbuf)
goto continue_free;
- if (cgroup_path(cgrp, pathbuf, PAGE_SIZE) < 0)
+ path = cgroup_path(cgrp, pathbuf, PATH_MAX);
+ if (!path)
goto continue_free;
agentbuf = kstrdup(cgrp->root->release_agent_path, GFP_KERNEL);
if (!agentbuf)
@@ -4603,7 +4515,7 @@ static void cgroup_release_agent(struct work_struct *work)

i = 0;
argv[i++] = agentbuf;
- argv[i++] = pathbuf;
+ argv[i++] = path;
argv[i] = NULL;

i = 0;
@@ -4753,6 +4665,11 @@ static int current_css_set_cg_links_read(struct seq_file *seq, void *v)
{
struct cgrp_cset_link *link;
struct css_set *cset;
+ char *name_buf;
+
+ name_buf = kmalloc(NAME_MAX + 1, GFP_KERNEL);
+ if (!name_buf)
+ return -ENOMEM;

read_lock(&css_set_lock);
rcu_read_lock();
@@ -4761,14 +4678,17 @@ static int current_css_set_cg_links_read(struct seq_file *seq, void *v)
struct cgroup *c = link->cgrp;
const char *name = "?";

- if (c != cgroup_dummy_top)
- name = cgroup_name(c);
+ if (c != cgroup_dummy_top) {
+ cgroup_name(c, name_buf, NAME_MAX + 1);
+ name = name_buf;
+ }

seq_printf(seq, "Root %d group %s\n",
c->root->hierarchy_id, name);
}
rcu_read_unlock();
read_unlock(&css_set_lock);
+ kfree(name_buf);
return 0;
}

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 2d018c7..e97a6e8 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2088,10 +2088,9 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
parent = parent_cs(parent);

if (cgroup_transfer_tasks(parent->css.cgroup, cs->css.cgroup)) {
- rcu_read_lock();
- printk(KERN_ERR "cpuset: failed to transfer tasks out of empty cpuset %s\n",
- cgroup_name(cs->css.cgroup));
- rcu_read_unlock();
+ printk(KERN_ERR "cpuset: failed to transfer tasks out of empty cpuset ");
+ pr_cont_cgroup_name(cs->css.cgroup);
+ pr_cont("\n");
}
}

@@ -2619,19 +2618,17 @@ void cpuset_print_task_mems_allowed(struct task_struct *tsk)
/* Statically allocated to prevent using excess stack. */
static char cpuset_nodelist[CPUSET_NODELIST_LEN];
static DEFINE_SPINLOCK(cpuset_buffer_lock);
-
struct cgroup *cgrp = task_cs(tsk)->css.cgroup;

- rcu_read_lock();
spin_lock(&cpuset_buffer_lock);

nodelist_scnprintf(cpuset_nodelist, CPUSET_NODELIST_LEN,
tsk->mems_allowed);
- printk(KERN_INFO "%s cpuset=%s mems_allowed=%s\n",
- tsk->comm, cgroup_name(cgrp), cpuset_nodelist);
+ printk(KERN_INFO "%s cpuset=", tsk->comm);
+ pr_cont_cgroup_name(cgrp);
+ pr_cont(" mems_allowed=%s\n", cpuset_nodelist);

spin_unlock(&cpuset_buffer_lock);
- rcu_read_unlock();
}

/*
@@ -2681,12 +2678,12 @@ int proc_cpuset_show(struct seq_file *m, void *unused_v)
{
struct pid *pid;
struct task_struct *tsk;
- char *buf;
+ char *buf, *p;
struct cgroup_subsys_state *css;
int retval;

retval = -ENOMEM;
- buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ buf = kmalloc(PATH_MAX, GFP_KERNEL);
if (!buf)
goto out;

@@ -2696,14 +2693,16 @@ int proc_cpuset_show(struct seq_file *m, void *unused_v)
if (!tsk)
goto out_free;

+ retval = -ENAMETOOLONG;
rcu_read_lock();
css = task_css(tsk, cpuset_cgrp_id);
- retval = cgroup_path(css->cgroup, buf, PAGE_SIZE);
+ p = cgroup_path(css->cgroup, buf, PATH_MAX);
rcu_read_unlock();
- if (retval < 0)
+ if (!p)
goto out_put_task;
- seq_puts(m, buf);
+ seq_puts(m, p);
seq_putc(m, '\n');
+ retval = 0;
out_put_task:
put_task_struct(tsk);
out_free:
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index dd52e7f..30eee3b 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -111,8 +111,7 @@ static char *task_group_path(struct task_group *tg)
if (autogroup_path(tg, group_path, PATH_MAX))
return group_path;

- cgroup_path(tg->css.cgroup, group_path, PATH_MAX);
- return group_path;
+ return cgroup_path(tg->css.cgroup, group_path, PATH_MAX);
}
#endif

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 102ab48..c1c2549 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1683,15 +1683,8 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
*/
void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
{
- /*
- * protects memcg_name and makes sure that parallel ooms do not
- * interleave
- */
+ /* oom_info_lock ensures that parallel ooms do not interleave */
static DEFINE_SPINLOCK(oom_info_lock);
- struct cgroup *task_cgrp;
- struct cgroup *mem_cgrp;
- static char memcg_name[PATH_MAX];
- int ret;
struct mem_cgroup *iter;
unsigned int i;

@@ -1701,36 +1694,14 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
spin_lock(&oom_info_lock);
rcu_read_lock();

- mem_cgrp = memcg->css.cgroup;
- task_cgrp = task_cgroup(p, memory_cgrp_id);
+ pr_info("Task in ");
+ pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
+ pr_info(" killed as a result of limit of ");
+ pr_cont_cgroup_path(memcg->css.cgroup);
+ pr_info("\n");

- ret = cgroup_path(task_cgrp, memcg_name, PATH_MAX);
- if (ret < 0) {
- /*
- * Unfortunately, we are unable to convert to a useful name
- * But we'll still print out the usage information
- */
- rcu_read_unlock();
- goto done;
- }
rcu_read_unlock();

- pr_info("Task in %s killed", memcg_name);
-
- rcu_read_lock();
- ret = cgroup_path(mem_cgrp, memcg_name, PATH_MAX);
- if (ret < 0) {
- rcu_read_unlock();
- goto done;
- }
- rcu_read_unlock();
-
- /*
- * Continues from above, so we don't need an KERN_ level
- */
- pr_cont(" as a result of limit of %s\n", memcg_name);
-done:
-
pr_info("memory: usage %llukB, limit %llukB, failcnt %llu\n",
res_counter_read_u64(&memcg->res, RES_USAGE) >> 10,
res_counter_read_u64(&memcg->res, RES_LIMIT) >> 10,
@@ -1745,13 +1716,8 @@ done:
res_counter_read_u64(&memcg->kmem, RES_FAILCNT));

for_each_mem_cgroup_tree(iter, memcg) {
- pr_info("Memory cgroup stats");
-
- rcu_read_lock();
- ret = cgroup_path(iter->css.cgroup, memcg_name, PATH_MAX);
- if (!ret)
- pr_cont(" for %s", memcg_name);
- rcu_read_unlock();
+ pr_info("Memory cgroup stats for ");
+ pr_cont_cgroup_path(iter->css.cgroup);
pr_cont(":");

for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
@@ -3401,7 +3367,7 @@ static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
struct kmem_cache *s)
{
struct kmem_cache *new = NULL;
- static char *tmp_name = NULL;
+ static char *tmp_path = NULL, *tmp_name = NULL;
static DEFINE_MUTEX(mutex); /* protects tmp_name */

BUG_ON(!memcg_can_account_kmem(memcg));
@@ -3413,18 +3379,20 @@ static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
* This static temporary buffer is used to prevent from
* pointless shortliving allocation.
*/
- if (!tmp_name) {
- tmp_name = kmalloc(PATH_MAX, GFP_KERNEL);
+ if (!tmp_path || !tmp_name) {
+ if (!tmp_path)
+ tmp_path = kmalloc(PATH_MAX, GFP_KERNEL);
if (!tmp_name)
+ tmp_name = kmalloc(NAME_MAX + 1, GFP_KERNEL);
+ if (!tmp_path || !tmp_name)
goto out;
}

- rcu_read_lock();
- snprintf(tmp_name, PATH_MAX, "%s(%d:%s)", s->name,
- memcg_cache_id(memcg), cgroup_name(memcg->css.cgroup));
- rcu_read_unlock();
+ cgroup_name(memcg->css.cgroup, tmp_name, NAME_MAX + 1);
+ snprintf(tmp_path, PATH_MAX, "%s(%d:%s)", s->name,
+ memcg_cache_id(memcg), tmp_name);

- new = kmem_cache_create_memcg(memcg, tmp_name, s->object_size, s->align,
+ new = kmem_cache_create_memcg(memcg, tmp_path, s->object_size, s->align,
(s->flags & ~SLAB_PANIC), s->ctor, s);
if (new)
new->allocflags |= __GFP_KMEMCG;
--
1.8.5.3

2014-02-08 16:38:45

Subject: [PATCH 5/8] cgroup: make cgroup hold onto its kernfs_node

cgroup currently releases its kernfs_node when it gets removed. While
not buggy, this makes cgroup->kn access rules more complicated than
necessary and leads to things like get/put protection around
kernfs_remove() in cgroup_destroy_locked(). In addition, we want to
use kernfs_name/path() and friends but also want to be able to
determine a cgroup's name between removal and release.

This patch makes cgroup hold onto its kernfs_node until freed so that
cgroup->kn is always accessible.
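A minimal sketch of the reference pattern follows, assuming a toy refcounted node. struct toy_rnode, toy_rnode_remove(), and the other toy_* helpers are hypothetical. In the patch, the real calls are kernfs_get() in cgroup_create(), kernfs_remove() in cgroup_destroy_locked(), and kernfs_put() in cgroup_free_fn(). The point is that removal drops only the tree's reference, so the cgroup can still read its node's name until the cgroup itself is freed.

```c
#include <stdlib.h>
#include <string.h>

/* Toy refcounted node standing in for struct kernfs_node. */
struct toy_rnode {
	int refcnt;
	int removed;            /* unlinked from the tree, memory still valid */
	char name[32];
};

static int toy_rnode_frees;     /* observable count of actual frees */

static struct toy_rnode *toy_rnode_create(const char *name)
{
	struct toy_rnode *kn = calloc(1, sizeof(*kn));

	if (!kn)
		return NULL;
	kn->refcnt = 1;         /* base reference owned by the tree */
	strncpy(kn->name, name, sizeof(kn->name) - 1);
	return kn;
}

static void toy_rnode_get(struct toy_rnode *kn)
{
	kn->refcnt++;
}

static void toy_rnode_put(struct toy_rnode *kn)
{
	if (--kn->refcnt == 0) {
		free(kn);
		toy_rnode_frees++;
	}
}

/* Removal unlinks the node and drops only the tree's base reference. */
static void toy_rnode_remove(struct toy_rnode *kn)
{
	kn->removed = 1;
	toy_rnode_put(kn);
}

struct toy_cgroup {
	struct toy_rnode *kn;
};

static int toy_cgroup_create(struct toy_cgroup *cgrp, const char *name)
{
	cgrp->kn = toy_rnode_create(name);
	if (!cgrp->kn)
		return -1;
	toy_rnode_get(cgrp->kn);        /* extra ref, dropped in toy_cgroup_free() */
	return 0;
}

static void toy_cgroup_destroy(struct toy_cgroup *cgrp)
{
	toy_rnode_remove(cgrp->kn);     /* cgrp->kn remains valid afterwards */
}

static void toy_cgroup_free(struct toy_cgroup *cgrp)
{
	toy_rnode_put(cgrp->kn);
	cgrp->kn = NULL;
}
```

Between toy_cgroup_destroy() and toy_cgroup_free(), the node is unlinked but its name is still readable. The patch relies on this to keep kernfs_name/path() usable on a removed cgroup.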

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/cgroup.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 6da820d..da8aef0 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -957,6 +957,8 @@ static void cgroup_free_fn(struct work_struct *work)

cgroup_pidlist_destroy_all(cgrp);

+ kernfs_put(cgrp->kn);
+
kfree(rcu_dereference_raw(cgrp->name));
kfree(cgrp);
}
@@ -3784,6 +3786,12 @@ static long cgroup_create(struct cgroup *parent, const char *name_str,
}
cgrp->kn = kn;

+ /*
+ * This extra ref will be put in cgroup_free_fn() and guarantees
+ * that @cgrp->kn is always accessible.
+ */
+ kernfs_get(kn);
+
cgrp->serial_nr = cgroup_serial_nr_next++;

/* allocation complete, commit to creation */
@@ -3964,7 +3972,6 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
{
struct cgroup *child;
struct cgroup_subsys_state *css;
- struct kernfs_node *kn;
bool empty;
int ssid;

@@ -4042,13 +4049,8 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
* clearing of cgrp->kn->priv backpointer, which should happen
* after all files under it have been removed.
*/
- kn = cgrp->kn;
- kernfs_get(kn);
-
- kernfs_remove(cgrp->kn);
-
+ kernfs_remove(cgrp->kn); /* @cgrp has an extra ref on its kn */
RCU_INIT_POINTER(*(void __rcu __force **)&cgrp->kn->priv, NULL);
- kernfs_put(kn);

mutex_lock(&cgroup_mutex);

--
1.8.5.3

2014-02-08 16:39:04

Subject: [PATCH 8/8] cgroup: remove cgroupfs_root->refcnt

Currently, cgroupfs_root and its ->top_cgroup are separately reference
counted, and the latter's count is ignored. There's no reason to do this
separately. This patch removes cgroupfs_root->refcnt and destroys
cgroupfs_root when the top_cgroup is released.
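Below is a userspace sketch of the ownership scheme this paragraph describes, assuming hypothetical toy_* types. The root embeds its top cgroup, and every child pins its parent. When a parentless cgroup is released, the release path recovers the enclosing root with container_of() and destroys it. The corresponding kernel hunk is cut off in this excerpt, so this models the described behavior rather than the patch's exact code.

```c
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_root;

struct toy_cgroup {
	int refcnt;
	struct toy_cgroup *parent;      /* NULL only for the top cgroup */
	struct toy_root *root;
};

struct toy_root {
	int nr_cgrps;                   /* cgroups not yet freed, top included */
	struct toy_cgroup top_cgroup;   /* embedded: the root dies with it */
};

static int toy_roots_destroyed;

static struct toy_root *toy_root_create(void)
{
	struct toy_root *root = calloc(1, sizeof(*root));

	if (!root)
		return NULL;
	root->nr_cgrps = 1;
	root->top_cgroup.refcnt = 1;    /* reference held by the mount */
	root->top_cgroup.root = root;
	return root;
}

static struct toy_cgroup *toy_cgroup_create(struct toy_cgroup *parent)
{
	struct toy_cgroup *cgrp = calloc(1, sizeof(*cgrp));

	if (!cgrp)
		return NULL;
	cgrp->refcnt = 1;
	cgrp->parent = parent;
	cgrp->root = parent->root;
	parent->refcnt++;               /* a child pins its parent until freed */
	parent->root->nr_cgrps++;
	return cgrp;
}

static void toy_cgroup_put(struct toy_cgroup *cgrp);

static void toy_cgroup_free(struct toy_cgroup *cgrp)
{
	cgrp->root->nr_cgrps--;
	if (cgrp->parent) {
		toy_cgroup_put(cgrp->parent);
		free(cgrp);
	} else {
		/* Top cgroup released: nr_cgrps is 0 here, so tear down the root. */
		free(container_of(cgrp, struct toy_root, top_cgroup));
		toy_roots_destroyed++;
	}
}

static void toy_cgroup_put(struct toy_cgroup *cgrp)
{
	if (--cgrp->refcnt == 0)
		toy_cgroup_free(cgrp);
}
```

With this shape there is only one count to reason about. The root disappears exactly when the last reference to its top cgroup goes away, including the references held by descendants.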

* cgroup_put() updated to ignore cgroup_is_dead() test for top
cgroups. cgroup_free_fn() updated to handle root destruction when
releasing a top cgroup.

* As root destruction is now bounced through cgroup destruction, it is
asynchronous. Update cgroup_mount() so that it waits for any pending
release; the wait is currently implemented using msleep(). Converting
this to a proper wait_queue isn't hard but is likely unnecessary.
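The next sketch shows the mount-side wait, assuming C11 atomics and a one-slot simulated registry. toy_root, toy_registry, and toy_registry_lookup() are hypothetical, and the 10 ms back-off interval is likewise an assumption. A mount that finds an existing root takes a reference only if the top cgroup's count is still non-zero, in the spirit of the kernel's atomic_inc_not_zero(). A zero count means destruction is under way, so the mount sleeps briefly and looks again, which is the msleep()-style polling the commit message mentions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct toy_root {
	atomic_int top_refcnt;      /* the root lives as long as its top cgroup */
};

/* Take a reference only if the count has not already dropped to zero. */
static bool toy_ref_get_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, @old is reloaded and the zero check runs again. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

static void toy_sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/*
 * Simulated one-slot registry. A dying root (count 0) stays visible for
 * @release_polls more lookups and then disappears, standing in for the
 * asynchronous destruction completing.
 */
struct toy_registry {
	struct toy_root *root;
	int release_polls;
};

static struct toy_root *toy_registry_lookup(struct toy_registry *reg)
{
	struct toy_root *root = reg->root;

	if (root && atomic_load(&root->top_refcnt) == 0 && reg->release_polls-- <= 0)
		reg->root = root = NULL;
	return root;
}

/*
 * Mount path: reuse a live root, wait out a dying one, or return NULL so
 * the caller creates a new hierarchy. *@retries counts the sleeps taken.
 */
static struct toy_root *toy_mount_get_root(struct toy_registry *reg, int *retries)
{
	struct toy_root *root;

	*retries = 0;
	for (;;) {
		root = toy_registry_lookup(reg);
		if (!root || toy_ref_get_unless_zero(&root->top_refcnt))
			return root;
		/* Release is asynchronous now; back off briefly and retry. */
		toy_sleep_ms(10);
		(*retries)++;
	}
}
```

Polling keeps the release path untouched. A wait_queue would instead require the release path to issue a wakeup. On a path this cold, the simpler design is probably adequate.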

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/cgroup.h | 4 +--
kernel/cgroup.c | 86 ++++++++++++++++++++++----------------------------
2 files changed, 39 insertions(+), 51 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b14abaf..6756c23 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -280,12 +280,10 @@ struct cgroupfs_root {
/* The bitmask of subsystems attached to this hierarchy */
unsigned long subsys_mask;

- atomic_t refcnt;
-
/* Unique id for this hierarchy. */
int hierarchy_id;

- /* The root cgroup for this hierarchy */
+ /* The root cgroup. Root is destroyed on its release. */
struct cgroup top_cgroup;

/* Number of cgroups in the hierarchy, used only for /proc/cgroups */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 13a8d2a..4c53e90 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -53,6 +53,7 @@
#include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
#include <linux/flex_array.h> /* used in cgroup_attach_task */
#include <linux/kthread.h>
+#include <linux/delay.h>

#include <linux/atomic.h>

@@ -728,37 +729,16 @@ static void cgroup_free_root(struct cgroupfs_root *root)
}
}

-static void cgroup_get_root(struct cgroupfs_root *root)
-{
- /*
- * The caller must ensure that @root is alive, which can be
- * achieved by holding a ref on one of the member cgroups or
- * following a registered reference to @root while holding
- * cgroup_tree_mutex.
- */
- WARN_ON_ONCE(atomic_read(&root->refcnt) <= 0);
- atomic_inc(&root->refcnt);
-}
-
-static void cgroup_put_root(struct cgroupfs_root *root)
+static void cgroup_destroy_root(struct cgroupfs_root *root)
{
struct cgroup *cgrp = &root->top_cgroup;
struct cgrp_cset_link *link, *tmp_link;
int ret;

- /*
- * @root's refcnt reaching zero and its deregistration should be
- * atomic w.r.t. cgroup_tree_mutex. This ensures that
- * cgroup_get_root() is safe to invoke if @root is registered.
- */
mutex_lock(&cgroup_tree_mutex);
- if (!atomic_dec_and_test(&root->refcnt)) {
- mutex_unlock(&cgroup_tree_mutex);
- return;
- }
mutex_lock(&cgroup_mutex);

- BUG_ON(atomic_read(&root->nr_cgrps) != 1);
+ BUG_ON(atomic_read(&root->nr_cgrps));
BUG_ON(!list_empty(&cgrp->children));

/* Rebind all subsystems back to the default hierarchy */
@@ -929,21 +909,24 @@ static void cgroup_free_fn(struct work_struct *work)
struct cgroup *cgrp = container_of(work, struct cgroup, destroy_work);

atomic_dec(&cgrp->root->nr_cgrps);
-
- /*
- * We get a ref to the parent, and put the ref when this cgroup is
- * being freed, so it's guaranteed that the parent won't be
- * destroyed before its children.
- */
- cgroup_put(cgrp->parent);
-
- /* put the root reference that we took when we created the cgroup */
- cgroup_put_root(cgrp->root);
-
cgroup_pidlist_destroy_all(cgrp);

- kernfs_put(cgrp->kn);
- kfree(cgrp);
+ if (cgrp->parent) {
+ /*
+ * We get a ref to the parent, and put the ref when this
+ * cgroup is being freed, so it's guaranteed that the
+ * parent won't be destroyed before its children.
+ */
+ cgroup_put(cgrp->parent);
+ kernfs_put(cgrp->kn);
+ kfree(cgrp);
+ } else {
+ /*
+ * This is top cgroup's refcnt reaching zero, which
+ * indicates that the root should be released.
+ */
+ cgroup_destroy_root(cgrp->root);
+ }
}

static void cgroup_free_rcu(struct rcu_head *head)
@@ -965,7 +948,7 @@ static void cgroup_put(struct cgroup *cgrp)
{
if (!atomic_dec_and_test(&cgrp->refcnt))
return;
- if (WARN_ON_ONCE(!cgroup_is_dead(cgrp)))
+ if (WARN_ON_ONCE(cgrp->parent && !cgroup_is_dead(cgrp)))
return;

/*
@@ -1354,7 +1337,6 @@ static void init_cgroup_root(struct cgroupfs_root *root)
{
struct cgroup *cgrp = &root->top_cgroup;

- atomic_set(&root->refcnt, 1);
INIT_LIST_HEAD(&root->root_list);
atomic_set(&root->nr_cgrps, 1);
cgrp->root = root;
@@ -1483,7 +1465,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
struct cgroup_sb_opts opts;
struct dentry *dentry;
int ret;
-
+retry:
mutex_lock(&cgroup_tree_mutex);
mutex_lock(&cgroup_mutex);

@@ -1529,7 +1511,21 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
}
}

- cgroup_get_root(root);
+ /*
+ * A root's lifetime is governed by its top cgroup. Zero
+ * ref indicates that the root is being destroyed. Wait for
+ * destruction to complete so that the subsystems are free.
+ * We can use wait_queue for the wait but this path is
+ * super cold. Let's just sleep for a bit and retry.
+ */
+ if (!atomic_inc_not_zero(&root->top_cgroup.refcnt)) {
+ mutex_unlock(&cgroup_mutex);
+ mutex_unlock(&cgroup_tree_mutex);
+ msleep(10);
+ goto retry;
+ }
+
+ ret = 0;
goto out_unlock;
}

@@ -1556,7 +1552,7 @@ out_unlock:

dentry = kernfs_mount(fs_type, flags, root->kf_root);
if (IS_ERR(dentry))
- cgroup_put_root(root);
+ cgroup_put(&root->top_cgroup);
return dentry;
}

@@ -1565,7 +1561,7 @@ static void cgroup_kill_sb(struct super_block *sb)
struct kernfs_root *kf_root = kernfs_root_from_sb(sb);
struct cgroupfs_root *root = cgroup_root_from_kf(kf_root);

- cgroup_put_root(root);
+ cgroup_put(&root->top_cgroup);
kernfs_kill_sb(sb);
}

@@ -3706,12 +3702,6 @@ static long cgroup_create(struct cgroup *parent, const char *name,
/* allocation complete, commit to creation */
list_add_tail_rcu(&cgrp->sibling, &cgrp->parent->children);
atomic_inc(&root->nr_cgrps);
-
- /*
- * Grab a reference on the root and parent so that they don't get
- * deleted while there are child cgroups.
- */
- cgroup_get_root(root);
cgroup_get(parent);

/*
--
1.8.5.3

2014-02-08 16:39:33

Subject: [PATCH 7/8] cgroup: rename cgroupfs_root->number_of_cgroups to ->nr_cgrps and make it atomic_t

root->number_of_cgroups is currently an integer protected with
cgroup_mutex. Except for sanity checks and proc reporting, the only
place it's used is to check whether the root has any children during
remount; however, this is a bit flawed as the counter is not
decremented when the cgroup is unlinked but when it's released,
meaning that there could be an extended period where all cgroups are
removed but remount is still not allowed because some internal objects
are lingering. While not perfect either, it'd be better to use an
emptiness test on root->top_cgroup.children.

This patch updates cgroup_remount() to test top_cgroup's children
instead, which leaves statistics printing in proc_cgroupstats_show()
as the only actual user of number_of_cgroups. Let's shorten its name
and make it an atomic_t so that we don't have to worry about its
synchronization. It's purely auxiliary at this point.
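
The difference between the two remount checks can be modeled in a few
lines of C11. This is a hypothetical sketch, not kernel code: struct
hier and struct grp stand in for cgroupfs_root and cgroup, and the tiny
list helpers only imitate <linux/list.h>. Removal unlinks a group
immediately while the statistics counter drops only on final release,
so a children-list test allows remount as soon as the hierarchy is
empty.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal circular list in the style of <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct hier {
	struct list_head top_children;	/* like root->top_cgroup.children */
	atomic_int nr_cgrps;		/* statistics only, like ->nr_cgrps */
};

struct grp {
	struct list_head sibling;
};

static void hier_init(struct hier *h)
{
	list_init(&h->top_children);
	atomic_init(&h->nr_cgrps, 1);	/* the top group counts itself */
}

static void grp_create(struct hier *h, struct grp *g)
{
	list_add_tail(&g->sibling, &h->top_children);
	atomic_fetch_add(&h->nr_cgrps, 1);
}

/* rmdir: the group is unlinked right away... */
static void grp_unlink(struct grp *g)
{
	list_del_init(&g->sibling);
}

/* ...but the counter only drops once the last reference is released. */
static void grp_release(struct hier *h)
{
	atomic_fetch_sub(&h->nr_cgrps, 1);
}

/* Remount gate after the patch: based on linkage, not on the counter. */
static bool hier_can_remount(const struct hier *h)
{
	return list_is_empty(&h->top_children);
}
```

Between grp_unlink() and grp_release() the old "count > 1" test would
still refuse the remount, while the list test already allows it.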

Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/cgroup.h | 4 ++--
kernel/cgroup.c | 16 +++++++---------
2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 8202abb..b14abaf 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -288,8 +288,8 @@ struct cgroupfs_root {
/* The root cgroup for this hierarchy */
struct cgroup top_cgroup;

- /* Tracks how many cgroups are currently defined in hierarchy.*/
- int number_of_cgroups;
+ /* Number of cgroups in the hierarchy, used only for /proc/cgroups */
+ atomic_t nr_cgrps;

/* A list running through the active hierarchies */
struct list_head root_list;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index a48f4ca..13a8d2a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -758,7 +758,7 @@ static void cgroup_put_root(struct cgroupfs_root *root)
}
mutex_lock(&cgroup_mutex);

- BUG_ON(root->number_of_cgroups != 1);
+ BUG_ON(atomic_read(&root->nr_cgrps) != 1);
BUG_ON(!list_empty(&cgrp->children));

/* Rebind all subsystems back to the default hierarchy */
@@ -928,9 +928,7 @@ static void cgroup_free_fn(struct work_struct *work)
{
struct cgroup *cgrp = container_of(work, struct cgroup, destroy_work);

- mutex_lock(&cgroup_mutex);
- cgrp->root->number_of_cgroups--;
- mutex_unlock(&cgroup_mutex);
+ atomic_dec(&cgrp->root->nr_cgrps);

/*
* We get a ref to the parent, and put the ref when this cgroup is
@@ -1318,7 +1316,7 @@ static int cgroup_remount(struct kernfs_root *kf_root, int *flags, char *data)
}

/* remounting is not allowed for populated hierarchies */
- if (root->number_of_cgroups > 1) {
+ if (!list_empty(&root->top_cgroup.children)) {
ret = -EBUSY;
goto out_unlock;
}
@@ -1358,7 +1356,7 @@ static void init_cgroup_root(struct cgroupfs_root *root)

atomic_set(&root->refcnt, 1);
INIT_LIST_HEAD(&root->root_list);
- root->number_of_cgroups = 1;
+ atomic_set(&root->nr_cgrps, 1);
cgrp->root = root;
init_cgroup_housekeeping(cgrp);
idr_init(&root->cgroup_idr);
@@ -1461,7 +1459,7 @@ static int cgroup_setup_root(struct cgroupfs_root *root)
write_unlock(&css_set_lock);

BUG_ON(!list_empty(&root_cgrp->children));
- BUG_ON(root->number_of_cgroups != 1);
+ BUG_ON(atomic_read(&root->nr_cgrps) != 1);

kernfs_activate(root_cgrp->kn);
ret = 0;
@@ -3707,7 +3705,7 @@ static long cgroup_create(struct cgroup *parent, const char *name,

/* allocation complete, commit to creation */
list_add_tail_rcu(&cgrp->sibling, &cgrp->parent->children);
- root->number_of_cgroups++;
+ atomic_inc(&root->nr_cgrps);

/*
* Grab a reference on the root and parent so that they don't get
@@ -4279,7 +4277,7 @@ static int proc_cgroupstats_show(struct seq_file *m, void *v)
for_each_subsys(ss, i)
seq_printf(m, "%s\t%d\t%d\t%d\n",
ss->name, ss->root->hierarchy_id,
- ss->root->number_of_cgroups, !ss->disabled);
+ atomic_read(&ss->root->nr_cgrps), !ss->disabled);

mutex_unlock(&cgroup_mutex);
return 0;
--
1.8.5.3

2014-02-08 16:40:03

Subject: [PATCH 4/8] cgroup: simplify dynamic cftype addition and removal

Dynamic cftype addition and removal using cgroup_add/rm_cftypes()
respectively has been quite hairy due to vfs i_mutex. As i_mutex
nests outside cgroup_mutex, cgroup_mutex has to be released and
regrabbed on each iteration through the hierarchy, complicating the
process. Now that i_mutex is no longer in play, it can be simplified.

* Just holding cgroup_tree_mutex is enough. No need to meddle with
cgroup_mutex.

* No reason to play the unlock - relock - check serial_nr dancing.
Everything can be done atomically while holding cgroup_tree_mutex.

* cgroup_cfts_prepare() is replaced with direct locking of
cgroup_tree_mutex.

* cgroup_cfts_commit() no longer fiddles with locking. It just
applies the cftypes change to the existing cgroups in the hierarchy.
Renamed to cgroup_apply_cftypes().
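
The resulting locking shape (a _locked helper that asserts the lock, a
thin wrapper that takes it, and an add path that rolls back without
ever dropping the lock) is sketched below in userspace C11. Everything
here is hypothetical: the atomic_flag spinlock stands in for
cgroup_tree_mutex, assert() for lockdep_assert_held(), and the
has_files[] array for the files created in each existing cgroup.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical lock that records whether it is held, so the _locked
 * helpers can assert it the way lockdep_assert_held() does.
 */
struct tree_lock {
	atomic_flag flag;
	bool held;
};

static struct tree_lock tree_mutex = { ATOMIC_FLAG_INIT, false };

static void lock_tree(struct tree_lock *l)
{
	while (atomic_flag_test_and_set(&l->flag))
		;	/* spin; the sketch is single-threaded anyway */
	l->held = true;
}

static void unlock_tree(struct tree_lock *l)
{
	l->held = false;
	atomic_flag_clear(&l->flag);
}

#define NR_GROUPS 3

/* Hypothetical state: registration plus per-group file presence. */
static bool registered;
static bool has_files[NR_GROUPS];
static int fail_at = -1;	/* test hook: adding to this group fails */

/* Like cgroup_apply_cftypes(): walk every group with the lock held. */
static int apply_files(bool is_add)
{
	assert(tree_mutex.held);
	for (int i = 0; i < NR_GROUPS; i++) {
		if (is_add && i == fail_at)
			return -ENOMEM;
		has_files[i] = is_add;
	}
	return 0;
}

/* Like cgroup_rm_cftypes_locked(): the caller already holds the lock. */
static int rm_files_locked(void)
{
	assert(tree_mutex.held);
	if (!registered)
		return -ENOENT;
	registered = false;
	apply_files(false);
	return 0;
}

static int rm_files(void)
{
	int ret;

	lock_tree(&tree_mutex);
	ret = rm_files_locked();
	unlock_tree(&tree_mutex);
	return ret;
}

/* Like cgroup_add_cftypes(): on failure, roll back without unlocking. */
static int add_files(void)
{
	int ret;

	lock_tree(&tree_mutex);
	registered = true;
	ret = apply_files(true);
	if (ret)
		rm_files_locked();
	unlock_tree(&tree_mutex);
	return ret;
}
```

Splitting out the _locked variant is what lets the failure path reuse
the removal logic; calling the locking wrapper from inside add_files()
would try to take the same lock twice.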

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/cgroup.c | 87 +++++++++++++++++++++------------------------------------
1 file changed, 32 insertions(+), 55 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index a3ade20..6da820d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2303,46 +2303,19 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
return 0;
}

-static void cgroup_cfts_prepare(void)
- __acquires(&cgroup_mutex)
-{
- /*
- * Thanks to the entanglement with vfs inode locking, we can't walk
- * the existing cgroups under cgroup_mutex and create files.
- * Instead, we use css_for_each_descendant_pre() and drop RCU read
- * lock before calling cgroup_addrm_files().
- */
- mutex_lock(&cgroup_tree_mutex);
- mutex_lock(&cgroup_mutex);
-}
-
-static int cgroup_cfts_commit(struct cftype *cfts, bool is_add)
- __releases(&cgroup_mutex)
+static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add)
{
LIST_HEAD(pending);
struct cgroup_subsys *ss = cfts[0].ss;
struct cgroup *root = &ss->root->top_cgroup;
- struct cgroup *prev = NULL;
struct cgroup_subsys_state *css;
- u64 update_before;
int ret = 0;

- mutex_unlock(&cgroup_mutex);
+ lockdep_assert_held(&cgroup_tree_mutex);

- /* %NULL @cfts indicates abort and don't bother if @ss isn't attached */
- if (!cfts || ss->root == &cgroup_dummy_root) {
- mutex_unlock(&cgroup_tree_mutex);
+ /* don't bother if @ss isn't attached */
+ if (ss->root == &cgroup_dummy_root)
return 0;
- }
-
- cgroup_get_root(ss->root);
-
- /*
- * All cgroups which are created after we drop cgroup_mutex will
- * have the updated set of files, so we only need to update the
- * cgroups created before the current @cgroup_serial_nr_next.
- */
- update_before = cgroup_serial_nr_next;

/* add/rm files for all cgroups created before */
css_for_each_descendant_pre(css, cgroup_css(root, ss)) {
@@ -2351,22 +2324,13 @@ static int cgroup_cfts_commit(struct cftype *cfts, bool is_add)
if (cgroup_is_dead(cgrp))
continue;

- cgroup_get(cgrp);
- if (prev)
- cgroup_put(prev);
- prev = cgrp;
-
- if (cgrp->serial_nr < update_before && !cgroup_is_dead(cgrp)) {
- ret = cgroup_addrm_files(cgrp, cfts, is_add);
- if (is_add)
- kernfs_activate(cgrp->kn);
- }
+ ret = cgroup_addrm_files(cgrp, cfts, is_add);
if (ret)
break;
}
- mutex_unlock(&cgroup_tree_mutex);
- cgroup_put(prev);
- cgroup_put_root(ss->root);
+
+ if (is_add && !ret)
+ kernfs_activate(root->kn);
return ret;
}

@@ -2417,6 +2381,19 @@ static int cgroup_init_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
return 0;
}

+static int cgroup_rm_cftypes_locked(struct cftype *cfts)
+{
+ lockdep_assert_held(&cgroup_tree_mutex);
+
+ if (!cfts || !cfts[0].ss)
+ return -ENOENT;
+
+ list_del(&cfts->node);
+ cgroup_apply_cftypes(cfts, false);
+ cgroup_exit_cftypes(cfts);
+ return 0;
+}
+
/**
* cgroup_rm_cftypes - remove an array of cftypes from a subsystem
* @cfts: zero-length name terminated array of cftypes
@@ -2430,15 +2407,12 @@ static int cgroup_init_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
*/
int cgroup_rm_cftypes(struct cftype *cfts)
{
- if (!cfts || !cfts[0].ss)
- return -ENOENT;
-
- cgroup_cfts_prepare();
- list_del(&cfts->node);
- cgroup_cfts_commit(cfts, false);
+ int ret;

- cgroup_exit_cftypes(cfts);
- return 0;
+ mutex_lock(&cgroup_tree_mutex);
+ ret = cgroup_rm_cftypes_locked(cfts);
+ mutex_unlock(&cgroup_tree_mutex);
+ return ret;
}

/**
@@ -2463,11 +2437,14 @@ int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
if (ret)
return ret;

- cgroup_cfts_prepare();
+ mutex_lock(&cgroup_tree_mutex);
+
list_add_tail(&cfts->node, &ss->cfts);
- ret = cgroup_cfts_commit(cfts, true);
+ ret = cgroup_apply_cftypes(cfts, true);
if (ret)
- cgroup_rm_cftypes(cfts);
+ cgroup_rm_cftypes_locked(cfts);
+
+ mutex_unlock(&cgroup_tree_mutex);
return ret;
}
EXPORT_SYMBOL_GPL(cgroup_add_cftypes);
--
1.8.5.3

2014-02-08 16:40:45

Subject: [PATCH 2/8] cgroup: relocate cgroup_rm_cftypes()

cftype handling is about to be revamped. Relocate cgroup_rm_cftypes()
above cgroup_add_cftypes() in preparation. This is pure relocation.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/cgroup.c | 70 ++++++++++++++++++++++++++++-----------------------------
1 file changed, 35 insertions(+), 35 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 6b96516..f718cbb 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2416,6 +2416,41 @@ static int cgroup_init_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
}

/**
+ * cgroup_rm_cftypes - remove an array of cftypes from a subsystem
+ * @cfts: zero-length name terminated array of cftypes
+ *
+ * Unregister @cfts. Files described by @cfts are removed from all
+ * existing cgroups and all future cgroups won't have them either. This
+ * function can be called anytime whether @cfts' subsys is attached or not.
+ *
+ * Returns 0 on successful unregistration, -ENOENT if @cfts is not
+ * registered.
+ */
+int cgroup_rm_cftypes(struct cftype *cfts)
+{
+ struct cftype *found = NULL;
+ struct cftype_set *set;
+
+ if (!cfts || !cfts[0].ss)
+ return -ENOENT;
+
+ cgroup_cfts_prepare();
+
+ list_for_each_entry(set, &cfts[0].ss->cftsets, node) {
+ if (set->cfts == cfts) {
+ list_del(&set->node);
+ kfree(set);
+ found = cfts;
+ break;
+ }
+ }
+
+ cgroup_cfts_commit(found, false);
+ cgroup_exit_cftypes(cfts);
+ return found ? 0 : -ENOENT;
+}
+
+/**
* cgroup_add_cftypes - add an array of cftypes to a subsystem
* @ss: target cgroup subsystem
* @cfts: zero-length name terminated array of cftypes
@@ -2453,41 +2488,6 @@ int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
EXPORT_SYMBOL_GPL(cgroup_add_cftypes);

/**
- * cgroup_rm_cftypes - remove an array of cftypes from a subsystem
- * @cfts: zero-length name terminated array of cftypes
- *
- * Unregister @cfts. Files described by @cfts are removed from all
- * existing cgroups and all future cgroups won't have them either. This
- * function can be called anytime whether @cfts' subsys is attached or not.
- *
- * Returns 0 on successful unregistration, -ENOENT if @cfts is not
- * registered.
- */
-int cgroup_rm_cftypes(struct cftype *cfts)
-{
- struct cftype *found = NULL;
- struct cftype_set *set;
-
- if (!cfts || !cfts[0].ss)
- return -ENOENT;
-
- cgroup_cfts_prepare();
-
- list_for_each_entry(set, &cfts[0].ss->cftsets, node) {
- if (set->cfts == cfts) {
- list_del(&set->node);
- kfree(set);
- found = cfts;
- break;
- }
- }
-
- cgroup_cfts_commit(found, false);
- cgroup_exit_cftypes(cfts);
- return found ? 0 : -ENOENT;
-}
-
-/**
* cgroup_task_count - count the number of tasks in a cgroup.
* @cgrp: the cgroup in question
*
--
1.8.5.3

2014-02-08 20:06:52

Subject: [PATCH v3 6/8] cgroup: remove cgroup->name

cgroup->name handling became quite complicated over time, involving a
dedicated struct cgroup_name for RCU protection. Now that cgroup is
on kernfs, we can drop all of it and simply use kernfs_name/path() and
friends. Replace cgroup->name and all related code with kernfs
name/path constructs.

* Reimplement cgroup_name() and cgroup_path() as thin wrappers on top
of their kernfs counterparts, which involves semantic changes:
cgroup_name() now takes a buffer, and cgroup_path() returns a pointer
into @buf (or NULL if @buflen is too short) instead of an error code.
pr_cont_cgroup_name() and pr_cont_cgroup_path() added.

* cgroup->name handling dropped from cgroup_rename().

* All users of cgroup_name/path() updated to the new semantics. Users
which were formatting the string just to printk it are converted
to use pr_cont_cgroup_name/path() instead, which simplifies things
quite a bit. As cgroup_name() no longer requires the RCU read lock
around it, RCU locking that was protecting only cgroup_name() is
removed.
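
For callers, the main semantic change is that cgroup_path() now
follows kernfs_path() and returns a pointer into @buf, or NULL if the
buffer is too short, instead of an int. The sketch below models that
convention in userspace C; struct pnode, pnode_path() and
path_to_start() are hypothetical. It assumes, as the blkg_path() hunk
suggests, that the returned pointer need not equal @buf: the path is
assembled backwards from the end of the buffer, so callers must use
the returned pointer or memmove() it to the front.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node: just a name and a parent pointer (NULL at root). */
struct pnode {
	const char *name;
	const struct pnode *parent;
};

/*
 * Build "/a/b" backwards from the end of @buf. Returns a pointer into
 * @buf where the path starts, or NULL if @buflen is too small.
 */
static char *pnode_path(const struct pnode *n, char *buf, size_t buflen)
{
	char *p;

	if (buflen < 2)
		return NULL;
	p = buf + buflen - 1;
	*p = '\0';

	if (!n->parent) {		/* the root is "/" */
		*--p = '/';
		return p;
	}

	for (; n->parent; n = n->parent) {
		size_t len = strlen(n->name);

		if ((size_t)(p - buf) < len + 1)	/* "/" + name */
			return NULL;
		p -= len;
		memcpy(p, n->name, len);
		*--p = '/';
	}
	return p;
}

/* Caller that needs the path at buf[0], mirroring blkg_path(). */
static int path_to_start(const struct pnode *n, char *buf, size_t buflen)
{
	char *p = pnode_path(n, buf, buflen);

	if (!p)
		return -1;
	memmove(buf, p, (size_t)(buf + buflen - p));	/* includes NUL */
	return 0;
}
```

Callers that only print the path, such as proc_cgroup_show() and
cgroup_release_agent() in this patch, can simply use the returned
pointer and skip the copy.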

v2: Comment above oom_info_lock updated as suggested by Michal.

v3: dummy_top doesn't have an associated kn, and
pr_cont_cgroup_name/path() ended up calling the matching kernfs
functions with a NULL kn, leading to an oops. Test for a NULL kn and
print "/" if so.

Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
---
block/blk-cgroup.h | 12 ++--
fs/kernfs/dir.c | 1 +
include/linux/cgroup.h | 63 ++++++++++++---------
kernel/cgroup.c | 148 ++++++++++++-------------------------------------
kernel/cpuset.c | 27 +++++----
kernel/sched/debug.c | 3 +-
mm/memcontrol.c | 68 ++++++-----------------
7 files changed, 111 insertions(+), 211 deletions(-)

diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 453b528..15a8d64 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -241,12 +241,16 @@ static inline struct blkcg_gq *pd_to_blkg(struct blkg_policy_data *pd)
*/
static inline int blkg_path(struct blkcg_gq *blkg, char *buf, int buflen)
{
- int ret;
+ char *p;

- ret = cgroup_path(blkg->blkcg->css.cgroup, buf, buflen);
- if (ret)
+ p = cgroup_path(blkg->blkcg->css.cgroup, buf, buflen);
+ if (!p) {
strncpy(buf, "<unavailable>", buflen);
- return ret;
+ return -ENAMETOOLONG;
+ }
+
+ memmove(buf, p, buf + buflen - p);
+ return 0;
}

/**
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index a347792..939684e 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -112,6 +112,7 @@ char *kernfs_path(struct kernfs_node *kn, char *buf, size_t buflen)
spin_unlock_irqrestore(&kernfs_rename_lock, flags);
return p;
}
+EXPORT_SYMBOL_GPL(kernfs_path);

/**
* pr_cont_kernfs_name - pr_cont name of a kernfs_node
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 3c0c7e4..3dc7dc1 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -138,11 +138,6 @@ enum {
CGRP_SANE_BEHAVIOR,
};

-struct cgroup_name {
- struct rcu_head rcu_head;
- char name[];
-};
-
struct cgroup {
unsigned long flags; /* "unsigned long" so bitops work */

@@ -177,19 +172,6 @@ struct cgroup {
*/
u64 serial_nr;

- /*
- * This is a copy of dentry->d_name, and it's needed because
- * we can't use dentry->d_name in cgroup_path().
- *
- * You must acquire rcu_read_lock() to access cgrp->name, and
- * the only place that can change it is rename(), which is
- * protected by parent dir's i_mutex.
- *
- * Normally you should use cgroup_name() wrapper rather than
- * access it directly.
- */
- struct cgroup_name __rcu *name;
-
/* Private pointers for each registered subsystem */
struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];

@@ -477,12 +459,6 @@ static inline bool cgroup_sane_behavior(const struct cgroup *cgrp)
return cgrp->root->flags & CGRP_ROOT_SANE_BEHAVIOR;
}

-/* Caller should hold rcu_read_lock() */
-static inline const char *cgroup_name(const struct cgroup *cgrp)
-{
- return rcu_dereference(cgrp->name)->name;
-}
-
/* returns ino associated with a cgroup, 0 indicates unmounted root */
static inline ino_t cgroup_ino(struct cgroup *cgrp)
{
@@ -501,14 +477,47 @@ static inline struct cftype *seq_cft(struct seq_file *seq)

struct cgroup_subsys_state *seq_css(struct seq_file *seq);

+/*
+ * Name / path handling functions. All are thin wrappers around the kernfs
+ * counterparts and can be called under any context.
+ */
+
+static inline int cgroup_name(struct cgroup *cgrp, char *buf, size_t buflen)
+{
+ return kernfs_name(cgrp->kn, buf, buflen);
+}
+
+static inline char * __must_check cgroup_path(struct cgroup *cgrp, char *buf,
+ size_t buflen)
+{
+ return kernfs_path(cgrp->kn, buf, buflen);
+}
+
+static inline void pr_cont_cgroup_name(struct cgroup *cgrp)
+{
+ /* dummy_top doesn't have a kn associated */
+ if (cgrp->kn)
+ pr_cont_kernfs_name(cgrp->kn);
+ else
+ pr_cont("/");
+}
+
+static inline void pr_cont_cgroup_path(struct cgroup *cgrp)
+{
+ /* dummy_top doesn't have a kn associated */
+ if (cgrp->kn)
+ pr_cont_kernfs_path(cgrp->kn);
+ else
+ pr_cont("/");
+}
+
+char *task_cgroup_path(struct task_struct *task, char *buf, size_t buflen);
+
int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
int cgroup_rm_cftypes(struct cftype *cfts);

bool cgroup_is_descendant(struct cgroup *cgrp, struct cgroup *ancestor);

-int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen);
-int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen);
-
int cgroup_task_count(const struct cgroup *cgrp);

/*
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index da8aef0..a48f4ca 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -145,8 +145,6 @@ static int cgroup_root_count;
/* hierarchy ID allocation and mapping, protected by cgroup_mutex */
static DEFINE_IDR(cgroup_hierarchy_idr);

-static struct cgroup_name root_cgroup_name = { .name = "/" };
-
/*
* Assign a monotonically increasing serial number to cgroups. It
* guarantees cgroups with bigger numbers are newer than those with smaller
@@ -888,17 +886,6 @@ static int cgroup_populate_dir(struct cgroup *cgrp, unsigned long subsys_mask);
static struct kernfs_syscall_ops cgroup_kf_syscall_ops;
static const struct file_operations proc_cgroupstats_operations;

-static struct cgroup_name *cgroup_alloc_name(const char *name_str)
-{
- struct cgroup_name *name;
-
- name = kmalloc(sizeof(*name) + strlen(name_str) + 1, GFP_KERNEL);
- if (!name)
- return NULL;
- strcpy(name->name, name_str);
- return name;
-}
-
static char *cgroup_file_name(struct cgroup *cgrp, const struct cftype *cft,
char *buf)
{
@@ -958,8 +945,6 @@ static void cgroup_free_fn(struct work_struct *work)
cgroup_pidlist_destroy_all(cgrp);

kernfs_put(cgrp->kn);
-
- kfree(rcu_dereference_raw(cgrp->name));
kfree(cgrp);
}

@@ -1375,7 +1360,6 @@ static void init_cgroup_root(struct cgroupfs_root *root)
INIT_LIST_HEAD(&root->root_list);
root->number_of_cgroups = 1;
cgrp->root = root;
- RCU_INIT_POINTER(cgrp->name, &root_cgroup_name);
init_cgroup_housekeeping(cgrp);
idr_init(&root->cgroup_idr);
}
@@ -1596,57 +1580,6 @@ static struct file_system_type cgroup_fs_type = {
static struct kobject *cgroup_kobj;

/**
- * cgroup_path - generate the path of a cgroup
- * @cgrp: the cgroup in question
- * @buf: the buffer to write the path into
- * @buflen: the length of the buffer
- *
- * Writes path of cgroup into buf. Returns 0 on success, -errno on error.
- *
- * We can't generate cgroup path using dentry->d_name, as accessing
- * dentry->name must be protected by irq-unsafe dentry->d_lock or parent
- * inode's i_mutex, while on the other hand cgroup_path() can be called
- * with some irq-safe spinlocks held.
- */
-int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
-{
- int ret = -ENAMETOOLONG;
- char *start;
-
- if (!cgrp->parent) {
- if (strlcpy(buf, "/", buflen) >= buflen)
- return -ENAMETOOLONG;
- return 0;
- }
-
- start = buf + buflen - 1;
- *start = '\0';
-
- rcu_read_lock();
- do {
- const char *name = cgroup_name(cgrp);
- int len;
-
- len = strlen(name);
- if ((start -= len) < buf)
- goto out;
- memcpy(start, name, len);
-
- if (--start < buf)
- goto out;
- *start = '/';
-
- cgrp = cgrp->parent;
- } while (cgrp->parent);
- ret = 0;
- memmove(buf, start, buf + buflen - start);
-out:
- rcu_read_unlock();
- return ret;
-}
-EXPORT_SYMBOL_GPL(cgroup_path);
-
-/**
* task_cgroup_path - cgroup path of a task in the first cgroup hierarchy
* @task: target task
* @buf: the buffer to write the path into
@@ -1657,16 +1590,14 @@ EXPORT_SYMBOL_GPL(cgroup_path);
* function grabs cgroup_mutex and shouldn't be used inside locks used by
* cgroup controller callbacks.
*
- * Returns 0 on success, fails with -%ENAMETOOLONG if @buflen is too short.
+ * Return value is the same as kernfs_path().
*/
-int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
+char *task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
{
struct cgroupfs_root *root;
struct cgroup *cgrp;
- int hierarchy_id = 1, ret = 0;
-
- if (buflen < 2)
- return -ENAMETOOLONG;
+ int hierarchy_id = 1;
+ char *path = NULL;

mutex_lock(&cgroup_mutex);

@@ -1674,14 +1605,15 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)

if (root) {
cgrp = task_cgroup_from_root(task, root);
- ret = cgroup_path(cgrp, buf, buflen);
+ path = cgroup_path(cgrp, buf, buflen);
} else {
/* if no hierarchy exists, everyone is in "/" */
- memcpy(buf, "/", 2);
+ if (strlcpy(buf, "/", buflen) < buflen)
+ path = buf;
}

mutex_unlock(&cgroup_mutex);
- return ret;
+ return path;
}
EXPORT_SYMBOL_GPL(task_cgroup_path);

@@ -2209,7 +2141,6 @@ static int cgroup_rename(struct kernfs_node *kn, struct kernfs_node *new_parent,
const char *new_name_str)
{
struct cgroup *cgrp = kn->priv;
- struct cgroup_name *name, *old_name;
int ret;

if (kernfs_type(kn) != KERNFS_DIR)
@@ -2224,25 +2155,13 @@ static int cgroup_rename(struct kernfs_node *kn, struct kernfs_node *new_parent,
if (cgroup_sane_behavior(cgrp))
return -EPERM;

- name = cgroup_alloc_name(new_name_str);
- if (!name)
- return -ENOMEM;
-
mutex_lock(&cgroup_tree_mutex);
mutex_lock(&cgroup_mutex);

ret = kernfs_rename(kn, new_parent, new_name_str);
- if (!ret) {
- old_name = rcu_dereference_protected(cgrp->name, true);
- rcu_assign_pointer(cgrp->name, name);
- } else {
- old_name = name;
- }

mutex_unlock(&cgroup_mutex);
mutex_unlock(&cgroup_tree_mutex);
-
- kfree_rcu(old_name, rcu_head);
return ret;
}

@@ -3717,14 +3636,13 @@ err_free:
/**
* cgroup_create - create a cgroup
* @parent: cgroup that will be parent of the new cgroup
- * @name_str: name of the new cgroup
+ * @name: name of the new cgroup
* @mode: mode to set on new cgroup
*/
-static long cgroup_create(struct cgroup *parent, const char *name_str,
+static long cgroup_create(struct cgroup *parent, const char *name,
umode_t mode)
{
struct cgroup *cgrp;
- struct cgroup_name *name;
struct cgroupfs_root *root = parent->root;
int ssid, err;
struct cgroup_subsys *ss;
@@ -3735,13 +3653,6 @@ static long cgroup_create(struct cgroup *parent, const char *name_str,
if (!cgrp)
return -ENOMEM;

- name = cgroup_alloc_name(name_str);
- if (!name) {
- err = -ENOMEM;
- goto err_free_cgrp;
- }
- rcu_assign_pointer(cgrp->name, name);
-
/*
* Temporarily set the pointer to NULL, so idr_find() won't return
* a half-baked cgroup.
@@ -3749,7 +3660,7 @@ static long cgroup_create(struct cgroup *parent, const char *name_str,
cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
if (cgrp->id < 0) {
err = -ENOMEM;
- goto err_free_name;
+ goto err_free_cgrp;
}

mutex_lock(&cgroup_tree_mutex);
@@ -3779,7 +3690,7 @@ static long cgroup_create(struct cgroup *parent, const char *name_str,
set_bit(CGRP_CPUSET_CLONE_CHILDREN, &cgrp->flags);

/* create the directory */
- kn = kernfs_create_dir(parent->kn, name->name, mode, cgrp);
+ kn = kernfs_create_dir(parent->kn, name, mode, cgrp);
if (IS_ERR(kn)) {
err = PTR_ERR(kn);
goto err_unlock;
@@ -3836,8 +3747,6 @@ err_unlock:
err_unlock_tree:
mutex_unlock(&cgroup_tree_mutex);
idr_remove(&root->cgroup_idr, cgrp->id);
-err_free_name:
- kfree(rcu_dereference_raw(cgrp->name));
err_free_cgrp:
kfree(cgrp);
return err;
@@ -4302,12 +4211,12 @@ int proc_cgroup_show(struct seq_file *m, void *v)
{
struct pid *pid;
struct task_struct *tsk;
- char *buf;
+ char *buf, *path;
int retval;
struct cgroupfs_root *root;

retval = -ENOMEM;
- buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ buf = kmalloc(PATH_MAX, GFP_KERNEL);
if (!buf)
goto out;

@@ -4335,10 +4244,12 @@ int proc_cgroup_show(struct seq_file *m, void *v)
root->name);
seq_putc(m, ':');
cgrp = task_cgroup_from_root(tsk, root);
- retval = cgroup_path(cgrp, buf, PAGE_SIZE);
- if (retval < 0)
+ path = cgroup_path(cgrp, buf, PATH_MAX);
+ if (!path) {
+ retval = -ENAMETOOLONG;
goto out_unlock;
- seq_puts(m, buf);
+ }
+ seq_puts(m, path);
seq_putc(m, '\n');
}

@@ -4586,16 +4497,17 @@ static void cgroup_release_agent(struct work_struct *work)
while (!list_empty(&release_list)) {
char *argv[3], *envp[3];
int i;
- char *pathbuf = NULL, *agentbuf = NULL;
+ char *pathbuf = NULL, *agentbuf = NULL, *path;
struct cgroup *cgrp = list_entry(release_list.next,
struct cgroup,
release_list);
list_del_init(&cgrp->release_list);
raw_spin_unlock(&release_list_lock);
- pathbuf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ pathbuf = kmalloc(PATH_MAX, GFP_KERNEL);
if (!pathbuf)
goto continue_free;
- if (cgroup_path(cgrp, pathbuf, PAGE_SIZE) < 0)
+ path = cgroup_path(cgrp, pathbuf, PATH_MAX);
+ if (!path)
goto continue_free;
agentbuf = kstrdup(cgrp->root->release_agent_path, GFP_KERNEL);
if (!agentbuf)
@@ -4603,7 +4515,7 @@ static void cgroup_release_agent(struct work_struct *work)

i = 0;
argv[i++] = agentbuf;
- argv[i++] = pathbuf;
+ argv[i++] = path;
argv[i] = NULL;

i = 0;
@@ -4753,6 +4665,11 @@ static int current_css_set_cg_links_read(struct seq_file *seq, void *v)
{
struct cgrp_cset_link *link;
struct css_set *cset;
+ char *name_buf;
+
+ name_buf = kmalloc(NAME_MAX + 1, GFP_KERNEL);
+ if (!name_buf)
+ return -ENOMEM;

read_lock(&css_set_lock);
rcu_read_lock();
@@ -4761,14 +4678,17 @@ static int current_css_set_cg_links_read(struct seq_file *seq, void *v)
struct cgroup *c = link->cgrp;
const char *name = "?";

- if (c != cgroup_dummy_top)
- name = cgroup_name(c);
+ if (c != cgroup_dummy_top) {
+ cgroup_name(c, name_buf, NAME_MAX + 1);
+ name = name_buf;
+ }

seq_printf(seq, "Root %d group %s\n",
c->root->hierarchy_id, name);
}
rcu_read_unlock();
read_unlock(&css_set_lock);
+ kfree(name_buf);
return 0;
}

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 2d018c7..e97a6e8 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2088,10 +2088,9 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
parent = parent_cs(parent);

if (cgroup_transfer_tasks(parent->css.cgroup, cs->css.cgroup)) {
- rcu_read_lock();
- printk(KERN_ERR "cpuset: failed to transfer tasks out of empty cpuset %s\n",
- cgroup_name(cs->css.cgroup));
- rcu_read_unlock();
+ printk(KERN_ERR "cpuset: failed to transfer tasks out of empty cpuset ");
+ pr_cont_cgroup_name(cs->css.cgroup);
+ pr_cont("\n");
}
}

@@ -2619,19 +2618,17 @@ void cpuset_print_task_mems_allowed(struct task_struct *tsk)
/* Statically allocated to prevent using excess stack. */
static char cpuset_nodelist[CPUSET_NODELIST_LEN];
static DEFINE_SPINLOCK(cpuset_buffer_lock);
-
struct cgroup *cgrp = task_cs(tsk)->css.cgroup;

- rcu_read_lock();
spin_lock(&cpuset_buffer_lock);

nodelist_scnprintf(cpuset_nodelist, CPUSET_NODELIST_LEN,
tsk->mems_allowed);
- printk(KERN_INFO "%s cpuset=%s mems_allowed=%s\n",
- tsk->comm, cgroup_name(cgrp), cpuset_nodelist);
+ printk(KERN_INFO "%s cpuset=", tsk->comm);
+ pr_cont_cgroup_name(cgrp);
+ pr_cont(" mems_allowed=%s\n", cpuset_nodelist);

spin_unlock(&cpuset_buffer_lock);
- rcu_read_unlock();
}

/*
@@ -2681,12 +2678,12 @@ int proc_cpuset_show(struct seq_file *m, void *unused_v)
{
struct pid *pid;
struct task_struct *tsk;
- char *buf;
+ char *buf, *p;
struct cgroup_subsys_state *css;
int retval;

retval = -ENOMEM;
- buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ buf = kmalloc(PATH_MAX, GFP_KERNEL);
if (!buf)
goto out;

@@ -2696,14 +2693,16 @@ int proc_cpuset_show(struct seq_file *m, void *unused_v)
if (!tsk)
goto out_free;

+ retval = -ENAMETOOLONG;
rcu_read_lock();
css = task_css(tsk, cpuset_cgrp_id);
- retval = cgroup_path(css->cgroup, buf, PAGE_SIZE);
+ p = cgroup_path(css->cgroup, buf, PATH_MAX);
rcu_read_unlock();
- if (retval < 0)
+ if (!p)
goto out_put_task;
- seq_puts(m, buf);
+ seq_puts(m, p);
seq_putc(m, '\n');
+ retval = 0;
out_put_task:
put_task_struct(tsk);
out_free:
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index dd52e7f..30eee3b 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -111,8 +111,7 @@ static char *task_group_path(struct task_group *tg)
if (autogroup_path(tg, group_path, PATH_MAX))
return group_path;

- cgroup_path(tg->css.cgroup, group_path, PATH_MAX);
- return group_path;
+ return cgroup_path(tg->css.cgroup, group_path, PATH_MAX);
}
#endif

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 102ab48..c1c2549 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1683,15 +1683,8 @@ static void move_unlock_mem_cgroup(struct mem_cgroup *memcg,
*/
void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
{
- /*
- * protects memcg_name and makes sure that parallel ooms do not
- * interleave
- */
+ /* oom_info_lock ensures that parallel ooms do not interleave */
static DEFINE_SPINLOCK(oom_info_lock);
- struct cgroup *task_cgrp;
- struct cgroup *mem_cgrp;
- static char memcg_name[PATH_MAX];
- int ret;
struct mem_cgroup *iter;
unsigned int i;

@@ -1701,36 +1694,14 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
spin_lock(&oom_info_lock);
rcu_read_lock();

- mem_cgrp = memcg->css.cgroup;
- task_cgrp = task_cgroup(p, memory_cgrp_id);
+ pr_info("Task in ");
+ pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
+ pr_info(" killed as a result of limit of ");
+ pr_cont_cgroup_path(memcg->css.cgroup);
+ pr_info("\n");

- ret = cgroup_path(task_cgrp, memcg_name, PATH_MAX);
- if (ret < 0) {
- /*
- * Unfortunately, we are unable to convert to a useful name
- * But we'll still print out the usage information
- */
- rcu_read_unlock();
- goto done;
- }
rcu_read_unlock();

- pr_info("Task in %s killed", memcg_name);
-
- rcu_read_lock();
- ret = cgroup_path(mem_cgrp, memcg_name, PATH_MAX);
- if (ret < 0) {
- rcu_read_unlock();
- goto done;
- }
- rcu_read_unlock();
-
- /*
- * Continues from above, so we don't need an KERN_ level
- */
- pr_cont(" as a result of limit of %s\n", memcg_name);
-done:
-
pr_info("memory: usage %llukB, limit %llukB, failcnt %llu\n",
res_counter_read_u64(&memcg->res, RES_USAGE) >> 10,
res_counter_read_u64(&memcg->res, RES_LIMIT) >> 10,
@@ -1745,13 +1716,8 @@ done:
res_counter_read_u64(&memcg->kmem, RES_FAILCNT));

for_each_mem_cgroup_tree(iter, memcg) {
- pr_info("Memory cgroup stats");
-
- rcu_read_lock();
- ret = cgroup_path(iter->css.cgroup, memcg_name, PATH_MAX);
- if (!ret)
- pr_cont(" for %s", memcg_name);
- rcu_read_unlock();
+ pr_info("Memory cgroup stats for ");
+ pr_cont_cgroup_path(iter->css.cgroup);
pr_cont(":");

for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
@@ -3401,7 +3367,7 @@ static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
struct kmem_cache *s)
{
struct kmem_cache *new = NULL;
- static char *tmp_name = NULL;
+ static char *tmp_path = NULL, *tmp_name = NULL;
static DEFINE_MUTEX(mutex); /* protects tmp_name */

BUG_ON(!memcg_can_account_kmem(memcg));
@@ -3413,18 +3379,20 @@ static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
* This static temporary buffer is used to prevent from
* pointless shortliving allocation.
*/
- if (!tmp_name) {
- tmp_name = kmalloc(PATH_MAX, GFP_KERNEL);
+ if (!tmp_path || !tmp_name) {
+ if (!tmp_path)
+ tmp_path = kmalloc(PATH_MAX, GFP_KERNEL);
if (!tmp_name)
+ tmp_name = kmalloc(NAME_MAX + 1, GFP_KERNEL);
+ if (!tmp_path || !tmp_name)
goto out;
}

- rcu_read_lock();
- snprintf(tmp_name, PATH_MAX, "%s(%d:%s)", s->name,
- memcg_cache_id(memcg), cgroup_name(memcg->css.cgroup));
- rcu_read_unlock();
+ cgroup_name(memcg->css.cgroup, tmp_name, NAME_MAX + 1);
+ snprintf(tmp_path, PATH_MAX, "%s(%d:%s)", s->name,
+ memcg_cache_id(memcg), tmp_name);

- new = kmem_cache_create_memcg(memcg, tmp_name, s->object_size, s->align,
+ new = kmem_cache_create_memcg(memcg, tmp_path, s->object_size, s->align,
(s->flags & ~SLAB_PANIC), s->ctor, s);
if (new)
new->allocflags |= __GFP_KMEMCG;
--
1.8.5.3

2014-02-12 07:52:46

Subject: Re: [PATCH 6/8] cgroup: remove cgroup->name

> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index dd52e7f..30eee3b 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -111,8 +111,7 @@ static char *task_group_path(struct task_group *tg)
> if (autogroup_path(tg, group_path, PATH_MAX))
> return group_path;
>
> - cgroup_path(tg->css.cgroup, group_path, PATH_MAX);
> - return group_path;
> + return cgroup_path(tg->css.cgroup, group_path, PATH_MAX);

The caller won't check the return value but pass it to printk/seq_printf,
but now cgroup_path() might return -ENAMETOOLONG..

2014-02-12 08:27:21

Subject: Re: [PATCH 6/8] cgroup: remove cgroup->name

On Wed, Feb 12, 2014 at 03:52:20PM +0800, Li Zefan wrote:
> The caller won't check the return value but pass it to printk/seq_printf,
> but now cgroup_path() might return -ENAMETOOLONG..

cgroup_path() returns NULL on overflow which printk handles fine, no?

Thanks.

--
tejun

2014-02-12 08:52:33

Subject: Re: [PATCH 6/8] cgroup: remove cgroup->name

On 2014/2/12 16:27, Tejun Heo wrote:
> On Wed, Feb 12, 2014 at 03:52:20PM +0800, Li Zefan wrote:
>> The caller won't check the return value but pass it to printk/seq_printf,
>> but now cgroup_path() might return -ENAMETOOLONG..
>
> cgroup_path() returns NULL on overflow which printk handles fine, no?
>

Ah, right. My mistake. I didn't take an afternoon nap and was a bit sleepy.

2014-02-12 08:59:18

Subject: Re: [PATCHSET v2 cgroup/for-3.15] cgroup: cleanups after kernfs conversion

On 2014/2/9 0:38, Tejun Heo wrote:
> Hello,
>
> This is v2 of cleanups-after-kernfs-conversion patchset. Nothing
> really changed since the last take[L]. It just got rebased on top of
> the updated patches.
>
> This patchset does a number of cleanups which are possible now that
> cgroup is converted to kernfs. This patchset contains the following
> eight patches.
>
> 0001-cgroup-warn-if-xattr-is-specified-with-sane_behavior.patch
> 0002-cgroup-relocate-cgroup_rm_cftypes.patch
> 0003-cgroup-remove-cftype_set.patch
> 0004-cgroup-simplify-dynamic-cftype-addition-and-removal.patch
> 0005-cgroup-make-cgroup-hold-onto-its-kernfs_node.patch
> 0006-cgroup-remove-cgroup-name.patch
> 0007-cgroup-rename-cgroupfs_root-number_of_cgroups-to-nr_.patch
> 0008-cgroup-remove-cgroupfs_root-refcnt.patch
>

Acked-by: Li Zefan <[email protected]>

2014-02-12 14:30:37

Subject: Re: [PATCHSET v2 cgroup/for-3.15] cgroup: cleanups after kernfs conversion

On Sat, Feb 08, 2014 at 11:38:21AM -0500, Tejun Heo wrote:
> Hello,
>
> This is v2 of cleanups-after-kernfs-conversion patchset. Nothing
> really changed since the last take[L]. It just got rebased on top of
> the updated patches.
>
> This patchset does a number of cleanups which are possible now that
> cgroup is converted to kernfs. This patchset contains the following
> eight patches.

Applied to cgroup/for-3.15.

Thanks.

--
tejun