Hello,
This patchset establishes conventions on low frequency events,
converts "cgroup.populated" to "cgroup.events" accordingly,
generalizes event handling and enables notifications for
"memory.events".
This patchset contains the following eight patches.
0001-cgroup-replace-cgroup.populated-with-cgroup.events.patch
0002-cgroup-replace-cftype-mode-with-CFTYPE_WORLD_WRITABL.patch
0003-cgroup-relocate-cgroup_populate_dir.patch
0004-cgroup-make-cgroup_addrm_files-clean-up-after-itself.patch
0005-cgroup-cosmetic-updates-to-rebind_subsystems.patch
0006-cgroup-restructure-file-creation-removal-handling.patch
0007-cgroup-generalize-obtaining-the-handles-of-and-notif.patch
0008-memcg-generate-file-modified-notifications-on-memory.patch
0001 replaces "cgroup.populated" with "cgroup.events". 0002-0006 are
prep patches. 0007 generalizes event notification. 0008 hooks up
event notifications for "memory.events".
This patchset is on top of cgroup/for-4.3 e753531991b8 ("Merge branch
'for-4.3-unified-base' into for-4.3") and available in the following
git branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-events
diffstat follows. Thanks.
Documentation/cgroups/unified-hierarchy.txt | 15 +
include/linux/cgroup-defs.h | 32 ++-
include/linux/cgroup.h | 13 +
kernel/cgroup.c | 264 ++++++++++++++--------------
kernel/cpuset.c | 6
mm/memcontrol.c | 8
6 files changed, 194 insertions(+), 144 deletions(-)
--
tejun
memcg already uses "memory.events" for event reporting and other
controllers may need event reporting too. Let's standardize on
"$SUBSYS.events" interface file for reporting events which don't
happen too frequently and thus can share event notification.
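For illustration only (not part of the patch), a minimal userspace
sketch of reading one key out of such an events file, assuming the
file holds one "<key> <value>" pair per line as in the "populated %d"
output of cgroup_events_show(). events_lookup() is a hypothetical
helper name, not an existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up @key in the text of an "events" file, assumed to hold one
 * "<key> <value>" pair per line.  Returns 0 and stores the value in
 * *@val on success, -1 if the key is absent.
 */
int events_lookup(const char *text, const char *key, long *val)
{
	size_t klen = strlen(key);
	const char *line = text;

	assert(text && val);

	while (line && *line) {
		/* the key must be followed by a single space, then the value */
		if (!strncmp(line, key, klen) && line[klen] == ' ' &&
		    sscanf(line + klen + 1, "%ld", val) == 1)
			return 0;

		/* advance to the start of the next line, if any */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A monitor would typically re-read the file and call this again each
time poll or [id]notify reports a modification.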
"cgroup.populated" is replaced with "populated" field in
"cgroup.events" and documentation is updated accordingly.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
---
Documentation/cgroups/unified-hierarchy.txt | 15 ++++++++++-----
include/linux/cgroup-defs.h | 2 +-
kernel/cgroup.c | 17 +++++++++--------
3 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/Documentation/cgroups/unified-hierarchy.txt b/Documentation/cgroups/unified-hierarchy.txt
index 1ee9caf..7ea2bc1 100644
--- a/Documentation/cgroups/unified-hierarchy.txt
+++ b/Documentation/cgroups/unified-hierarchy.txt
@@ -341,11 +341,11 @@ is riddled with issues.
unnecessarily complicated and probably done this way because event
delivery itself was expensive.
-Unified hierarchy implements an interface file "cgroup.populated"
-which can be used to monitor whether the cgroup's subhierarchy has
-tasks in it or not. Its value is 0 if there is no task in the cgroup
-and its descendants; otherwise, 1. poll and [id]notify events are
-triggered when the value changes.
+Unified hierarchy implements "populated" field in "cgroup.events"
+interface file which can be used to monitor whether the cgroup's
+subhierarchy has tasks in it or not. Its value is 0 if there is no
+task in the cgroup and its descendants; otherwise, 1. poll and
+[id]notify events are triggered when the value changes.
This is significantly lighter and simpler and trivially allows
delegating management of subhierarchy - subhierarchy monitoring can
@@ -435,6 +435,11 @@ may be specified in any order and not all pairs have to be specified.
the first entry in the file. Specific entries can use "default" as
its value to indicate inheritance of the default value.
+- For events which are not very high frequency, an interface file
+ "events" should be created which lists event key value pairs.
+ Whenever a notifiable event happens, file modified event should be
+ generated on the file.
+
5-4. Per-Controller Changes
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 5294f1f..74d241d 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -226,7 +226,7 @@ struct cgroup {
struct kernfs_node *kn; /* cgroup kernfs entry */
struct kernfs_node *procs_kn; /* kn for "cgroup.procs" */
- struct kernfs_node *populated_kn; /* kn for "cgroup.subtree_populated" */
+ struct kernfs_node *events_kn; /* kn for "cgroup.events" */
/*
* The bitmask of subsystems enabled on the child cgroups.
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c4d94a5..43535fc 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -519,8 +519,8 @@ static void cgroup_update_populated(struct cgroup *cgrp, bool populated)
if (!trigger)
break;
- if (cgrp->populated_kn)
- kernfs_notify(cgrp->populated_kn);
+ if (cgrp->events_kn)
+ kernfs_notify(cgrp->events_kn);
cgrp = cgroup_parent(cgrp);
} while (cgrp);
}
@@ -2944,9 +2944,10 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
goto out_unlock;
}
-static int cgroup_populated_show(struct seq_file *seq, void *v)
+static int cgroup_events_show(struct seq_file *seq, void *v)
{
- seq_printf(seq, "%d\n", (bool)seq_css(seq)->cgroup->populated_cnt);
+ seq_printf(seq, "populated %d\n",
+ (bool)seq_css(seq)->cgroup->populated_cnt);
return 0;
}
@@ -3113,8 +3114,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
if (cft->write == cgroup_procs_write)
cgrp->procs_kn = kn;
- else if (cft->seq_show == cgroup_populated_show)
- cgrp->populated_kn = kn;
+ else if (cft->seq_show == cgroup_events_show)
+ cgrp->events_kn = kn;
return 0;
}
@@ -4287,9 +4288,9 @@ static struct cftype cgroup_dfl_base_files[] = {
.write = cgroup_subtree_control_write,
},
{
- .name = "cgroup.populated",
+ .name = "cgroup.events",
.flags = CFTYPE_NOT_ON_ROOT,
- .seq_show = cgroup_populated_show,
+ .seq_show = cgroup_events_show,
},
{ } /* terminate */
};
--
2.4.3
cftype->mode allows controllers to give arbitrary permissions to
interface knobs. Except for "cgroup.event_control", the existing uses
are spurious.
* Some explicitly specify S_IRUGO | S_IWUSR even though that's the
default.
* "cpuset.memory_pressure" specifies S_IRUGO while also setting a
write callback which returns -EACCES. All it needs to do is simply
not set a write callback.
"cgroup.event_control" uses cftype->mode to make the file
world-writable. It's a misdesigned interface and we don't want
controllers to be tweaking interface file permissions in general.
This patch removes cftype->mode and all its spurious uses and
implements CFTYPE_WORLD_WRITABLE for "cgroup.event_control" which is
marked as compatibility-only.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
---
include/linux/cgroup-defs.h | 6 +-----
kernel/cgroup.c | 19 +++++++------------
kernel/cpuset.c | 6 ------
mm/memcontrol.c | 3 +--
4 files changed, 9 insertions(+), 25 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 74d241d..93f48ca 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -76,6 +76,7 @@ enum {
CFTYPE_ONLY_ON_ROOT = (1 << 0), /* only create on root cgrp */
CFTYPE_NOT_ON_ROOT = (1 << 1), /* don't create on root cgrp */
CFTYPE_NO_PREFIX = (1 << 3), /* (DON'T USE FOR NEW FILES) no subsys prefix */
+ CFTYPE_WORLD_WRITABLE = (1 << 4), /* (DON'T USE FOR NEW FILES) S_IWUGO */
/* internal flags, do not use outside cgroup core proper */
__CFTYPE_ONLY_ON_DFL = (1 << 16), /* only on default hierarchy */
@@ -324,11 +325,6 @@ struct cftype {
*/
char name[MAX_CFTYPE_NAME];
unsigned long private;
- /*
- * If not 0, file mode is set to this value, otherwise it will
- * be figured out automatically
- */
- umode_t mode;
/*
* The maximum length of string, excluding trailing nul, that can
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 43535fc..a909e4d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1044,23 +1044,21 @@ static char *cgroup_file_name(struct cgroup *cgrp, const struct cftype *cft,
* cgroup_file_mode - deduce file mode of a control file
* @cft: the control file in question
*
- * returns cft->mode if ->mode is not 0
- * returns S_IRUGO|S_IWUSR if it has both a read and a write handler
- * returns S_IRUGO if it has only a read handler
- * returns S_IWUSR if it has only a write hander
+ * S_IRUGO for read, S_IWUSR for write.
*/
static umode_t cgroup_file_mode(const struct cftype *cft)
{
umode_t mode = 0;
- if (cft->mode)
- return cft->mode;
-
if (cft->read_u64 || cft->read_s64 || cft->seq_show)
mode |= S_IRUGO;
- if (cft->write_u64 || cft->write_s64 || cft->write)
- mode |= S_IWUSR;
+ if (cft->write_u64 || cft->write_s64 || cft->write) {
+ if (cft->flags & CFTYPE_WORLD_WRITABLE)
+ mode |= S_IWUGO;
+ else
+ mode |= S_IWUSR;
+ }
return mode;
}
@@ -4270,7 +4268,6 @@ static struct cftype cgroup_dfl_base_files[] = {
.seq_show = cgroup_pidlist_show,
.private = CGROUP_FILE_PROCS,
.write = cgroup_procs_write,
- .mode = S_IRUGO | S_IWUSR,
},
{
.name = "cgroup.controllers",
@@ -4305,7 +4302,6 @@ static struct cftype cgroup_legacy_base_files[] = {
.seq_show = cgroup_pidlist_show,
.private = CGROUP_FILE_PROCS,
.write = cgroup_procs_write,
- .mode = S_IRUGO | S_IWUSR,
},
{
.name = "cgroup.clone_children",
@@ -4325,7 +4321,6 @@ static struct cftype cgroup_legacy_base_files[] = {
.seq_show = cgroup_pidlist_show,
.private = CGROUP_FILE_TASKS,
.write = cgroup_tasks_write,
- .mode = S_IRUGO | S_IWUSR,
},
{
.name = "notify_on_release",
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index ee14e3a..4da3f45 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1594,9 +1594,6 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
case FILE_MEMORY_PRESSURE_ENABLED:
cpuset_memory_pressure_enabled = !!val;
break;
- case FILE_MEMORY_PRESSURE:
- retval = -EACCES;
- break;
case FILE_SPREAD_PAGE:
retval = update_flag(CS_SPREAD_PAGE, cs, val);
break;
@@ -1863,9 +1860,6 @@ static struct cftype files[] = {
{
.name = "memory_pressure",
.read_u64 = cpuset_read_u64,
- .write_u64 = cpuset_write_u64,
- .private = FILE_MEMORY_PRESSURE,
- .mode = S_IRUGO,
},
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index acb93c5..78ba418 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4360,8 +4360,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
{
.name = "cgroup.event_control", /* XXX: for compat */
.write = memcg_write_event_control,
- .flags = CFTYPE_NO_PREFIX,
- .mode = S_IWUGO,
+ .flags = CFTYPE_NO_PREFIX | CFTYPE_WORLD_WRITABLE,
},
{
.name = "swappiness",
--
2.4.3
Move it upwards so that it's right below cgroup_clear_dir() and the
forward declaration is unnecessary.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
---
kernel/cgroup.c | 63 ++++++++++++++++++++++++++++-----------------------------
1 file changed, 31 insertions(+), 32 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index a909e4d..92b8cc7 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1024,7 +1024,6 @@ static struct cgroup *task_cgroup_from_root(struct task_struct *task,
* update of a tasks cgroup pointer by cgroup_attach_task()
*/
-static int cgroup_populate_dir(struct cgroup *cgrp, unsigned long subsys_mask);
static struct kernfs_syscall_ops cgroup_kf_syscall_ops;
static const struct file_operations proc_cgroupstats_operations;
@@ -1238,6 +1237,37 @@ static void cgroup_clear_dir(struct cgroup *cgrp, unsigned long subsys_mask)
}
}
+/**
+ * cgroup_populate_dir - create subsys files in a cgroup directory
+ * @cgrp: target cgroup
+ * @subsys_mask: mask of the subsystem ids whose files should be added
+ *
+ * On failure, no file is added.
+ */
+static int cgroup_populate_dir(struct cgroup *cgrp, unsigned long subsys_mask)
+{
+ struct cgroup_subsys *ss;
+ int i, ret = 0;
+
+ /* process cftsets of each subsystem */
+ for_each_subsys(ss, i) {
+ struct cftype *cfts;
+
+ if (!(subsys_mask & (1 << i)))
+ continue;
+
+ list_for_each_entry(cfts, &ss->cfts, node) {
+ ret = cgroup_addrm_files(cgrp, cfts, true);
+ if (ret < 0)
+ goto err;
+ }
+ }
+ return 0;
+err:
+ cgroup_clear_dir(cgrp, subsys_mask);
+ return ret;
+}
+
static int rebind_subsystems(struct cgroup_root *dst_root,
unsigned long ss_mask)
{
@@ -4337,37 +4367,6 @@ static struct cftype cgroup_legacy_base_files[] = {
{ } /* terminate */
};
-/**
- * cgroup_populate_dir - create subsys files in a cgroup directory
- * @cgrp: target cgroup
- * @subsys_mask: mask of the subsystem ids whose files should be added
- *
- * On failure, no file is added.
- */
-static int cgroup_populate_dir(struct cgroup *cgrp, unsigned long subsys_mask)
-{
- struct cgroup_subsys *ss;
- int i, ret = 0;
-
- /* process cftsets of each subsystem */
- for_each_subsys(ss, i) {
- struct cftype *cfts;
-
- if (!(subsys_mask & (1 << i)))
- continue;
-
- list_for_each_entry(cfts, &ss->cfts, node) {
- ret = cgroup_addrm_files(cgrp, cfts, true);
- if (ret < 0)
- goto err;
- }
- }
- return 0;
-err:
- cgroup_clear_dir(cgrp, subsys_mask);
- return ret;
-}
-
/*
* css destruction is four-stage process.
*
--
2.4.3
After a file creation failure, cgroup_addrm_files() didn't remove
the files which had already been created. When cgroup_populate_dir()
is the caller, this is fine as the caller performs cleanup; however,
for other callers, this may leave unactivated dangling files behind.
As kernfs directory removals are recursive, this doesn't lead to
permanent memory leak but it can, for example, fail future attempts to
create those files again.
There's no point in keeping around this sort of subtlety and it gets
in the way of planned updates to file handling. This patch makes
cgroup_addrm_files() clean up after itself on failures.
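For illustration only, the rollback pattern can be sketched in
userspace as below: on the first failed addition, remember where the
failure happened, flip the loop into removal mode and walk the array
again up to that point. All sketch_* names are hypothetical and the
"fail" field merely simulates a creation error.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical stand-in for a cftype entry; name == NULL terminates */
struct sketch_file {
	const char *name;
	int fail;	/* simulate a creation failure for this entry */
	int created;	/* set while the file exists */
};

static int sketch_add(struct sketch_file *f)
{
	if (f->fail)
		return -1;
	f->created = 1;
	return 0;
}

static void sketch_rm(struct sketch_file *f)
{
	f->created = 0;
}

/* add or remove @files; a failed addition undoes the earlier ones */
int sketch_addrm_files(struct sketch_file files[], int is_add)
{
	struct sketch_file *f, *f_end = NULL;
	int ret = 0;

	assert(files != NULL);
restart:
	for (f = files; f != f_end && f->name != NULL; f++) {
		if (is_add) {
			ret = sketch_add(f);
			if (ret) {
				/* remove everything before the failed entry */
				f_end = f;
				is_add = 0;
				goto restart;
			}
		} else {
			sketch_rm(f);
		}
	}
	return ret;
}
```

Reusing the same loop for the rollback keeps the skip conditions for
addition and removal identical, so only files which were actually
considered for creation get removed.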
Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
---
kernel/cgroup.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 92b8cc7..5e5a4e0 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3154,19 +3154,18 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
* @is_add: whether to add or remove
*
* Depending on @is_add, add or remove files defined by @cfts on @cgrp.
- * For removals, this function never fails. If addition fails, this
- * function doesn't remove files already added. The caller is responsible
- * for cleaning up.
+ * For removals, this function never fails.
*/
static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
bool is_add)
{
- struct cftype *cft;
+ struct cftype *cft, *cft_end = NULL;
int ret;
lockdep_assert_held(&cgroup_mutex);
- for (cft = cfts; cft->name[0] != '\0'; cft++) {
+restart:
+ for (cft = cfts; cft != cft_end && cft->name[0] != '\0'; cft++) {
/* does cft->flags tell us to skip this file on @cgrp? */
if ((cft->flags & __CFTYPE_ONLY_ON_DFL) && !cgroup_on_dfl(cgrp))
continue;
@@ -3182,7 +3181,9 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
if (ret) {
pr_warn("%s: failed to add %s, err=%d\n",
__func__, cft->name, ret);
- return ret;
+ cft_end = cft;
+ is_add = false;
+ goto restart;
}
} else {
cgroup_rm_file(cgrp, cft);
--
2.4.3
* Use local variables @scgrp and @dcgrp for @src_root->cgrp and
@dst_root->cgrp respectively.
* Use initializers to set @src_root and @css in the inner bind loop.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
---
kernel/cgroup.c | 31 +++++++++++++++----------------
1 file changed, 15 insertions(+), 16 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5e5a4e0..67d2ba3 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1271,6 +1271,7 @@ static int cgroup_populate_dir(struct cgroup *cgrp, unsigned long subsys_mask)
static int rebind_subsystems(struct cgroup_root *dst_root,
unsigned long ss_mask)
{
+ struct cgroup *dcgrp = &dst_root->cgrp;
struct cgroup_subsys *ss;
unsigned long tmp_ss_mask;
int ssid, i, ret;
@@ -1292,7 +1293,7 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
if (dst_root == &cgrp_dfl_root)
tmp_ss_mask &= ~cgrp_dfl_root_inhibit_ss_mask;
- ret = cgroup_populate_dir(&dst_root->cgrp, tmp_ss_mask);
+ ret = cgroup_populate_dir(dcgrp, tmp_ss_mask);
if (ret) {
if (dst_root != &cgrp_dfl_root)
return ret;
@@ -1318,42 +1319,40 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
cgroup_clear_dir(&ss->root->cgrp, 1 << ssid);
for_each_subsys_which(ss, ssid, &ss_mask) {
- struct cgroup_root *src_root;
- struct cgroup_subsys_state *css;
+ struct cgroup_root *src_root = ss->root;
+ struct cgroup *scgrp = &src_root->cgrp;
+ struct cgroup_subsys_state *css = cgroup_css(scgrp, ss);
struct css_set *cset;
- src_root = ss->root;
- css = cgroup_css(&src_root->cgrp, ss);
-
- WARN_ON(!css || cgroup_css(&dst_root->cgrp, ss));
+ WARN_ON(!css || cgroup_css(dcgrp, ss));
- RCU_INIT_POINTER(src_root->cgrp.subsys[ssid], NULL);
- rcu_assign_pointer(dst_root->cgrp.subsys[ssid], css);
+ RCU_INIT_POINTER(scgrp->subsys[ssid], NULL);
+ rcu_assign_pointer(dcgrp->subsys[ssid], css);
ss->root = dst_root;
- css->cgroup = &dst_root->cgrp;
+ css->cgroup = dcgrp;
down_write(&css_set_rwsem);
hash_for_each(css_set_table, i, cset, hlist)
list_move_tail(&cset->e_cset_node[ss->id],
- &dst_root->cgrp.e_csets[ss->id]);
+ &dcgrp->e_csets[ss->id]);
up_write(&css_set_rwsem);
src_root->subsys_mask &= ~(1 << ssid);
- src_root->cgrp.subtree_control &= ~(1 << ssid);
- cgroup_refresh_child_subsys_mask(&src_root->cgrp);
+ scgrp->subtree_control &= ~(1 << ssid);
+ cgroup_refresh_child_subsys_mask(scgrp);
/* default hierarchy doesn't enable controllers by default */
dst_root->subsys_mask |= 1 << ssid;
if (dst_root != &cgrp_dfl_root) {
- dst_root->cgrp.subtree_control |= 1 << ssid;
- cgroup_refresh_child_subsys_mask(&dst_root->cgrp);
+ dcgrp->subtree_control |= 1 << ssid;
+ cgroup_refresh_child_subsys_mask(dcgrp);
}
if (ss->bind)
ss->bind(css);
}
- kernfs_activate(dst_root->cgrp.kn);
+ kernfs_activate(dcgrp->kn);
return 0;
}
--
2.4.3
The file creation / removal path has always been a bit icky and the
planned notification update requires css during file creation.
Restructure as follows.
* cgroup_addrm_files() now takes both @css and @cgrp and is only
called directly by other file handling functions.
* cgroup_populate/clear_dir() are replaced with
css_populate/clear_dir() taking @css and @cgrp_override.
@cgrp_override is used only when files need to be created on /
removed from a cgroup which isn't attached to @css, which happens
during subsystem rebinds. Subsystem loops are moved to the callers.
* cgroup_add_file() now takes both @css and @cgrp. @css isn't used
yet but will be used by the planned notification update.
This patch doesn't cause any behavior changes.
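For illustration only, the @cgrp_override convention described above
boils down to the following userspace sketch. The sketch_* types are
hypothetical stand-ins. The patch itself uses the GNU "?:" shorthand
with the middle operand omitted; the portable form is written out
here.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical, minimal stand-ins for struct cgroup and the css */
struct sketch_cgroup {
	const char *name;
};

struct sketch_css {
	struct sketch_cgroup *cgroup;	/* cgroup this css is attached to */
};

/* pick the cgroup to operate on: the override if given, else css's own */
struct sketch_cgroup *sketch_target_cgroup(struct sketch_css *css,
					   struct sketch_cgroup *cgrp_override)
{
	assert(css != NULL);
	return cgrp_override ? cgrp_override : css->cgroup;
}
```

Most callers pass NULL and operate on the css's own cgroup; only the
rebind path needs to name a different target.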
Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
---
kernel/cgroup.c | 143 ++++++++++++++++++++++++++++++--------------------------
1 file changed, 76 insertions(+), 67 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 67d2ba3..b287522 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -200,7 +200,8 @@ static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss,
bool visible);
static void css_release(struct percpu_ref *ref);
static void kill_css(struct cgroup_subsys_state *css);
-static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
+static int cgroup_addrm_files(struct cgroup_subsys_state *css,
+ struct cgroup *cgrp, struct cftype cfts[],
bool is_add);
/* IDR wrappers which synchronize using cgroup_idr_lock */
@@ -1218,53 +1219,57 @@ static void cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft)
}
/**
- * cgroup_clear_dir - remove subsys files in a cgroup directory
- * @cgrp: target cgroup
- * @subsys_mask: mask of the subsystem ids whose files should be removed
+ * css_clear_dir - remove subsys files in a cgroup directory
+ * @css: target css
+ * @cgrp_override: specify if target cgroup is different from css->cgroup
*/
-static void cgroup_clear_dir(struct cgroup *cgrp, unsigned long subsys_mask)
+static void css_clear_dir(struct cgroup_subsys_state *css,
+ struct cgroup *cgrp_override)
{
- struct cgroup_subsys *ss;
- int i;
+ struct cgroup *cgrp = cgrp_override ?: css->cgroup;
+ struct cftype *cfts;
- for_each_subsys(ss, i) {
- struct cftype *cfts;
-
- if (!(subsys_mask & (1 << i)))
- continue;
- list_for_each_entry(cfts, &ss->cfts, node)
- cgroup_addrm_files(cgrp, cfts, false);
- }
+ list_for_each_entry(cfts, &css->ss->cfts, node)
+ cgroup_addrm_files(css, cgrp, cfts, false);
}
/**
- * cgroup_populate_dir - create subsys files in a cgroup directory
- * @cgrp: target cgroup
- * @subsys_mask: mask of the subsystem ids whose files should be added
+ * css_populate_dir - create subsys files in a cgroup directory
+ * @css: target css
+ * @cgrp_override: specify if target cgroup is different from css->cgroup
*
* On failure, no file is added.
*/
-static int cgroup_populate_dir(struct cgroup *cgrp, unsigned long subsys_mask)
+static int css_populate_dir(struct cgroup_subsys_state *css,
+ struct cgroup *cgrp_override)
{
- struct cgroup_subsys *ss;
- int i, ret = 0;
+ struct cgroup *cgrp = cgrp_override ?: css->cgroup;
+ struct cftype *cfts, *failed_cfts;
+ int ret;
- /* process cftsets of each subsystem */
- for_each_subsys(ss, i) {
- struct cftype *cfts;
+ if (!css->ss) {
+ if (cgroup_on_dfl(cgrp))
+ cfts = cgroup_dfl_base_files;
+ else
+ cfts = cgroup_legacy_base_files;
- if (!(subsys_mask & (1 << i)))
- continue;
+ return cgroup_addrm_files(&cgrp->self, cgrp, cfts, true);
+ }
- list_for_each_entry(cfts, &ss->cfts, node) {
- ret = cgroup_addrm_files(cgrp, cfts, true);
- if (ret < 0)
- goto err;
+ list_for_each_entry(cfts, &css->ss->cfts, node) {
+ ret = cgroup_addrm_files(css, cgrp, cfts, true);
+ if (ret < 0) {
+ failed_cfts = cfts;
+ goto err;
}
}
return 0;
err:
- cgroup_clear_dir(cgrp, subsys_mask);
+ list_for_each_entry(cfts, &css->ss->cfts, node) {
+ if (cfts == failed_cfts)
+ break;
+ cgroup_addrm_files(css, cgrp, cfts, false);
+ }
return ret;
}
@@ -1293,10 +1298,13 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
if (dst_root == &cgrp_dfl_root)
tmp_ss_mask &= ~cgrp_dfl_root_inhibit_ss_mask;
- ret = cgroup_populate_dir(dcgrp, tmp_ss_mask);
- if (ret) {
- if (dst_root != &cgrp_dfl_root)
- return ret;
+ for_each_subsys_which(ss, ssid, &tmp_ss_mask) {
+ struct cgroup *scgrp = &ss->root->cgrp;
+ int tssid;
+
+ ret = css_populate_dir(cgroup_css(scgrp, ss), dcgrp);
+ if (!ret)
+ continue;
/*
* Rebinding back to the default root is not allowed to
@@ -1304,20 +1312,27 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
* be rare. Moving subsystems back and forth even more so.
* Just warn about it and continue.
*/
- if (cgrp_dfl_root_visible) {
- pr_warn("failed to create files (%d) while rebinding 0x%lx to default root\n",
- ret, ss_mask);
- pr_warn("you may retry by moving them to a different hierarchy and unbinding\n");
+ if (dst_root == &cgrp_dfl_root) {
+ if (cgrp_dfl_root_visible) {
+ pr_warn("failed to create files (%d) while rebinding 0x%lx to default root\n",
+ ret, ss_mask);
+ pr_warn("you may retry by moving them to a different hierarchy and unbinding\n");
+ }
+ continue;
}
+
+ for_each_subsys_which(ss, tssid, &tmp_ss_mask) {
+ if (tssid == ssid)
+ break;
+ css_clear_dir(cgroup_css(scgrp, ss), dcgrp);
+ }
+ return ret;
}
/*
* Nothing can fail from this point on. Remove files for the
* removed subsystems and rebind each subsystem.
*/
- for_each_subsys_which(ss, ssid, &ss_mask)
- cgroup_clear_dir(&ss->root->cgrp, 1 << ssid);
-
for_each_subsys_which(ss, ssid, &ss_mask) {
struct cgroup_root *src_root = ss->root;
struct cgroup *scgrp = &src_root->cgrp;
@@ -1326,6 +1341,8 @@ static int rebind_subsystems(struct cgroup_root *dst_root,
WARN_ON(!css || cgroup_css(dcgrp, ss));
+ css_clear_dir(css, NULL);
+
RCU_INIT_POINTER(scgrp->subsys[ssid], NULL);
rcu_assign_pointer(dcgrp->subsys[ssid], css);
ss->root = dst_root;
@@ -1691,7 +1708,6 @@ static int cgroup_setup_root(struct cgroup_root *root, unsigned long ss_mask)
{
LIST_HEAD(tmp_links);
struct cgroup *root_cgrp = &root->cgrp;
- struct cftype *base_files;
struct css_set *cset;
int i, ret;
@@ -1730,12 +1746,7 @@ static int cgroup_setup_root(struct cgroup_root *root, unsigned long ss_mask)
}
root_cgrp->kn = root->kf_root->kn;
- if (root == &cgrp_dfl_root)
- base_files = cgroup_dfl_base_files;
- else
- base_files = cgroup_legacy_base_files;
-
- ret = cgroup_addrm_files(root_cgrp, base_files, true);
+ ret = css_populate_dir(&root_cgrp->self, NULL);
if (ret)
goto destroy_root;
@@ -2884,7 +2895,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
ret = create_css(child, ss,
cgrp->subtree_control & (1 << ssid));
else
- ret = cgroup_populate_dir(child, 1 << ssid);
+ ret = css_populate_dir(cgroup_css(child, ss),
+ NULL);
if (ret)
goto err_undo_css;
}
@@ -2917,7 +2929,7 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
if (css_disable & (1 << ssid)) {
kill_css(css);
} else {
- cgroup_clear_dir(child, 1 << ssid);
+ css_clear_dir(css, NULL);
if (ss->css_reset)
ss->css_reset(css);
}
@@ -2965,7 +2977,7 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
if (css_enable & (1 << ssid))
kill_css(css);
else
- cgroup_clear_dir(child, 1 << ssid);
+ css_clear_dir(css, NULL);
}
}
goto out_unlock;
@@ -3117,7 +3129,8 @@ static int cgroup_kn_set_ugid(struct kernfs_node *kn)
return kernfs_setattr(kn, &iattr);
}
-static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
+static int cgroup_add_file(struct cgroup_subsys_state *css, struct cgroup *cgrp,
+ struct cftype *cft)
{
char name[CGROUP_FILE_NAME_MAX];
struct kernfs_node *kn;
@@ -3148,14 +3161,16 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cftype *cft)
/**
* cgroup_addrm_files - add or remove files to a cgroup directory
- * @cgrp: the target cgroup
+ * @css: the target css
+ * @cgrp: the target cgroup (usually css->cgroup)
* @cfts: array of cftypes to be added
* @is_add: whether to add or remove
*
* Depending on @is_add, add or remove files defined by @cfts on @cgrp.
* For removals, this function never fails.
*/
-static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
+static int cgroup_addrm_files(struct cgroup_subsys_state *css,
+ struct cgroup *cgrp, struct cftype cfts[],
bool is_add)
{
struct cftype *cft, *cft_end = NULL;
@@ -3176,7 +3191,7 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
continue;
if (is_add) {
- ret = cgroup_add_file(cgrp, cft);
+ ret = cgroup_add_file(css, cgrp, cft);
if (ret) {
pr_warn("%s: failed to add %s, err=%d\n",
__func__, cft->name, ret);
@@ -3208,7 +3223,7 @@ static int cgroup_apply_cftypes(struct cftype *cfts, bool is_add)
if (cgroup_is_dead(cgrp))
continue;
- ret = cgroup_addrm_files(cgrp, cfts, is_add);
+ ret = cgroup_addrm_files(css, cgrp, cfts, is_add);
if (ret)
break;
}
@@ -4584,7 +4599,7 @@ static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss,
css->id = err;
if (visible) {
- err = cgroup_populate_dir(cgrp, 1 << ss->id);
+ err = css_populate_dir(css, NULL);
if (err)
goto err_free_id;
}
@@ -4610,7 +4625,7 @@ static int create_css(struct cgroup *cgrp, struct cgroup_subsys *ss,
err_list_del:
list_del_rcu(&css->sibling);
- cgroup_clear_dir(css->cgroup, 1 << css->ss->id);
+ css_clear_dir(css, NULL);
err_free_id:
cgroup_idr_remove(&ss->css_idr, css->id);
err_free_percpu_ref:
@@ -4627,7 +4642,6 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
struct cgroup_root *root;
struct cgroup_subsys *ss;
struct kernfs_node *kn;
- struct cftype *base_files;
int ssid, ret;
/* Do not accept '\n' to prevent making /proc/<pid>/cgroup unparsable.
@@ -4703,12 +4717,7 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
if (ret)
goto out_destroy;
- if (cgroup_on_dfl(cgrp))
- base_files = cgroup_dfl_base_files;
- else
- base_files = cgroup_legacy_base_files;
-
- ret = cgroup_addrm_files(cgrp, base_files, true);
+ ret = css_populate_dir(&cgrp->self, NULL);
if (ret)
goto out_destroy;
@@ -4795,7 +4804,7 @@ static void kill_css(struct cgroup_subsys_state *css)
* This must happen before css is disassociated with its cgroup.
* See seq_css() for details.
*/
- cgroup_clear_dir(css->cgroup, 1 << css->ss->id);
+ css_clear_dir(css, NULL);
/*
* Killing would put the base ref, but we need to keep it alive
--
2.4.3
cgroup core handles creations and removals of cgroup interface files
as described by cftypes. There are cases where the handle for a given
file instance is necessary, for example, to generate a file modified
event. Currently, this is handled by explicitly matching the callback
method pointer and storing the file handle manually in
cgroup_add_file(). While this simple approach works for cgroup core
files, it doesn't work for controller interface files.
This patch generalizes cgroup interface file handle handling. struct
cgroup_file is defined and each cftype can optionally tell cgroup core
to store the file handle by setting ->file_offset. A file handle
remains accessible as long as the containing css is accessible.
Both "cgroup.procs" and "cgroup.events" are converted to use the new
generic mechanism instead of hooking directly into cgroup_add_file().
Also, cgroup_file_notify() which takes a struct cgroup_file and
generates a file modified event on it is added and replaces explicit
kernfs_notify() invocations.
This generalizes cgroup file handle handling and allows controllers to
generate file modified notifications.
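For illustration only, the offset-based handle recording can be
sketched in userspace as below. All sketch_* names are hypothetical;
a "bound" flag stands in for the kernfs node pointer and a counter
stands in for kernfs_notify(). As in the patch, offset 0 means "no
handle requested", which works because the css sits at offset 0 of
its containing structure.

```c
#include <assert.h>
#include <stddef.h>

/* stand-in for struct cgroup_file */
struct sketch_file_handle {
	int bound;		/* models cfile->kn != NULL */
	int notify_count;	/* models delivered notifications */
};

/* stand-in for struct cgroup_subsys_state */
struct sketch_css {
	int id;
};

/* a controller's state embeds the css first, then its handles */
struct sketch_memcg {
	struct sketch_css css;
	struct sketch_file_handle events_file;
};

/* stand-in for struct cftype */
struct sketch_cftype {
	const char *name;
	size_t file_offset;	/* 0: don't record a handle */
};

/* on file creation, record the handle at css + file_offset */
void sketch_add_file(struct sketch_css *css, const struct sketch_cftype *cft)
{
	assert(css && cft);

	if (cft->file_offset) {
		struct sketch_file_handle *h =
			(struct sketch_file_handle *)((char *)css +
						      cft->file_offset);
		h->bound = 1;
	}
}

/* notify only if the file was actually created */
void sketch_file_notify(struct sketch_file_handle *h)
{
	if (h->bound)
		h->notify_count++;
}
```

A controller would set .file_offset with offsetof() on its own state
structure and later call the notify helper on the embedded handle
whenever one of its events fires.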
Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
---
include/linux/cgroup-defs.h | 26 ++++++++++++++++++++++++--
include/linux/cgroup.h | 13 +++++++++++++
kernel/cgroup.c | 26 +++++++++++++++++++-------
3 files changed, 56 insertions(+), 9 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 93f48ca..cc5898a 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -84,6 +84,17 @@ enum {
};
/*
+ * cgroup_file is the handle for a file instance created in a cgroup which
+ * is used, for example, to generate file changed notifications. This can
+ * be obtained by setting cftype->file_offset.
+ */
+struct cgroup_file {
+ /* do not access any fields from outside cgroup core */
+ struct list_head node; /* anchored at css->files */
+ struct kernfs_node *kn;
+};
+
+/*
* Per-subsystem/per-cgroup state maintained by the system. This is the
* fundamental structural building block that controllers deal with.
*
@@ -123,6 +134,9 @@ struct cgroup_subsys_state {
*/
u64 serial_nr;
+ /* all cgroup_files associated with this css */
+ struct list_head files;
+
/* percpu_ref killing and RCU release */
struct rcu_head rcu_head;
struct work_struct destroy_work;
@@ -226,8 +240,8 @@ struct cgroup {
int populated_cnt;
struct kernfs_node *kn; /* cgroup kernfs entry */
- struct kernfs_node *procs_kn; /* kn for "cgroup.procs" */
- struct kernfs_node *events_kn; /* kn for "cgroup.events" */
+ struct cgroup_file procs_file; /* handle for "cgroup.procs" */
+ struct cgroup_file events_file; /* handle for "cgroup.events" */
/*
* The bitmask of subsystems enabled on the child cgroups.
@@ -336,6 +350,14 @@ struct cftype {
unsigned int flags;
/*
+ * If non-zero, should contain the offset from the start of css to
+ * a struct cgroup_file field. cgroup will record the handle of
+ * the created file into it. The recorded handle can be used as
+ * long as the containing css remains accessible.
+ */
+ unsigned int file_offset;
+
+ /*
* Fields used for internal bookkeeping. Initialized automatically
* during registration.
*/
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index eb7ca55..00ddf3c 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -527,6 +527,19 @@ static inline void pr_cont_cgroup_path(struct cgroup *cgrp)
pr_cont_kernfs_path(cgrp->kn);
}
+/**
+ * cgroup_file_notify - generate a file modified event for a cgroup_file
+ * @cfile: target cgroup_file
+ *
+ * @cfile must have been obtained by setting cftype->file_offset.
+ */
+static inline void cgroup_file_notify(struct cgroup_file *cfile)
+{
+ /* might not have been created due to one of the CFTYPE selector flags */
+ if (cfile->kn)
+ kernfs_notify(cfile->kn);
+}
+
#else /* !CONFIG_CGROUPS */
struct cgroup_subsys_state;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b287522..4d0d522 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -520,8 +520,8 @@ static void cgroup_update_populated(struct cgroup *cgrp, bool populated)
if (!trigger)
break;
- if (cgrp->events_kn)
- kernfs_notify(cgrp->events_kn);
+ cgroup_file_notify(&cgrp->events_file);
+
cgrp = cgroup_parent(cgrp);
} while (cgrp);
}
@@ -1671,6 +1671,7 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
INIT_LIST_HEAD(&cgrp->self.sibling);
INIT_LIST_HEAD(&cgrp->self.children);
+ INIT_LIST_HEAD(&cgrp->self.files);
INIT_LIST_HEAD(&cgrp->cset_links);
INIT_LIST_HEAD(&cgrp->pidlists);
mutex_init(&cgrp->pidlist_mutex);
@@ -2462,7 +2463,7 @@ static int cgroup_procs_write_permission(struct task_struct *task,
cgrp = cgroup_parent(cgrp);
ret = -ENOMEM;
- inode = kernfs_get_inode(sb, cgrp->procs_kn);
+ inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
if (inode) {
ret = inode_permission(inode, MAY_WRITE);
iput(inode);
@@ -3152,10 +3153,14 @@ static int cgroup_add_file(struct cgroup_subsys_state *css, struct cgroup *cgrp,
return ret;
}
- if (cft->write == cgroup_procs_write)
- cgrp->procs_kn = kn;
- else if (cft->seq_show == cgroup_events_show)
- cgrp->events_kn = kn;
+ if (cft->file_offset) {
+ struct cgroup_file *cfile = (void *)css + cft->file_offset;
+
+ kernfs_get(kn);
+ cfile->kn = kn;
+ list_add(&cfile->node, &css->files);
+ }
+
return 0;
}
@@ -4307,6 +4312,7 @@ static int cgroup_clone_children_write(struct cgroup_subsys_state *css,
static struct cftype cgroup_dfl_base_files[] = {
{
.name = "cgroup.procs",
+ .file_offset = offsetof(struct cgroup, procs_file),
.seq_start = cgroup_pidlist_start,
.seq_next = cgroup_pidlist_next,
.seq_stop = cgroup_pidlist_stop,
@@ -4332,6 +4338,7 @@ static struct cftype cgroup_dfl_base_files[] = {
{
.name = "cgroup.events",
.flags = CFTYPE_NOT_ON_ROOT,
+ .file_offset = offsetof(struct cgroup, events_file),
.seq_show = cgroup_events_show,
},
{ } /* terminate */
@@ -4410,9 +4417,13 @@ static void css_free_work_fn(struct work_struct *work)
container_of(work, struct cgroup_subsys_state, destroy_work);
struct cgroup_subsys *ss = css->ss;
struct cgroup *cgrp = css->cgroup;
+ struct cgroup_file *cfile;
percpu_ref_exit(&css->refcnt);
+ list_for_each_entry(cfile, &css->files, node)
+ kernfs_put(cfile->kn);
+
if (ss) {
/* css free path */
int id = css->id;
@@ -4517,6 +4528,7 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
css->ss = ss;
INIT_LIST_HEAD(&css->sibling);
INIT_LIST_HEAD(&css->children);
+ INIT_LIST_HEAD(&css->files);
css->serial_nr = css_serial_nr_next++;
if (cgroup_parent(cgrp)) {
--
2.4.3
cgroup core only recently grew generic notification support. Wire up
"memory.events" so that it triggers a file modified event whenever its
content changes.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
---
mm/memcontrol.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 78ba418..10db5f1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -295,6 +295,9 @@ struct mem_cgroup {
/* OOM-Killer disable */
int oom_kill_disable;
+ /* handle for "memory.events" */
+ struct cgroup_file events_file;
+
/* protect arrays of thresholds */
struct mutex thresholds_lock;
@@ -5499,6 +5502,7 @@ static struct cftype memory_files[] = {
{
.name = "events",
.flags = CFTYPE_NOT_ON_ROOT,
+ .file_offset = offsetof(struct mem_cgroup, events_file),
.seq_show = memory_events_show,
},
{ } /* terminate */
@@ -5530,6 +5534,7 @@ void mem_cgroup_events(struct mem_cgroup *memcg,
unsigned int nr)
{
this_cpu_add(memcg->stat->events[idx], nr);
+ cgroup_file_notify(&memcg->events_file);
}
/**
--
2.4.3
On Tue, Aug 11, 2015 at 01:58:09PM -0400, Tejun Heo wrote:
> cgroup core only recently grew generic notification support. Wire up
> "memory.events" so that it triggers a file modified event whenever its
> content changes.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Li Zefan <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Michal Hocko <[email protected]>
So, this won't apply to the current -mm. Once the earlier part of the
series gets applied to cgroup/for-4.3, I'll refresh this patch on top
of -mm.
Thanks.
--
tejun
[Oops, this was hanging in to-be-posted since last week - sorry about that]
On Tue 11-08-15 14:02:36, Tejun Heo wrote:
> On Tue, Aug 11, 2015 at 01:58:09PM -0400, Tejun Heo wrote:
> > cgroup core only recently grew generic notification support. Wire up
> > "memory.events" so that it triggers a file modified event whenever its
> > content changes.
> >
> > Signed-off-by: Tejun Heo <[email protected]>
> > Cc: Li Zefan <[email protected]>
> > Cc: Johannes Weiner <[email protected]>
> > Cc: Michal Hocko <[email protected]>
I cannot say I would be fond of the offset logic but whatever suits the
cgroup core...
Acked-by: Michal Hocko <[email protected]>
> So, this won't apply to the current -mm. Once the earlier part of the
> series gets applied to cgroup/for-4.3, I'll refresh this patch on top
> of -mm.
I think you can route it via the same tree.
--
Michal Hocko
SUSE Labs
Hello, Michal.
On Mon, Aug 17, 2015 at 04:30:57PM +0200, Michal Hocko wrote:
> I cannot say I would be fond of the offset logic but whatever suits the
> cgroup core...
I don't particularly like it either but couldn't think of anything
prettier. :(
Thanks.
--
tejun
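The offset logic debated above can be illustrated outside the kernel.
What follows is a minimal userspace sketch, not kernel code: struct
file_handle, struct base_state, struct mem_state, struct file_type and
record_handle() are hypothetical stand-ins for cgroup_file, the css,
mem_cgroup, cftype and the relevant part of cgroup_add_file(). It
assumes, as the patch does, that the base state is the first member of
the containing structure and that a zero offset means no handle was
requested.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup_file. */
struct file_handle {
	const char *name;	/* stands in for the kernfs_node pointer */
};

/* Hypothetical stand-in for the css embedded in each controller state. */
struct base_state {
	int id;
};

/* Hypothetical stand-in for a controller structure such as mem_cgroup. */
struct mem_state {
	struct base_state base;	/* first member, so offsets are base-relative */
	long usage;
	struct file_handle events_file;
};

/* Hypothetical stand-in for struct cftype. */
struct file_type {
	const char *name;
	size_t file_offset;	/* 0 means "no handle requested" */
};

/*
 * Generic code only sees the base pointer.  If the file type carries a
 * non-zero offset, the handle lives at base + offset and gets filled in.
 * Returns the handle that was recorded, or NULL if none was requested.
 */
struct file_handle *record_handle(struct base_state *base,
				  const struct file_type *ft)
{
	struct file_handle *h;

	if (!ft->file_offset)
		return NULL;

	h = (struct file_handle *)((char *)base + ft->file_offset);
	h->name = ft->name;
	return h;
}
```

A caller would set .file_offset = offsetof(struct mem_state, events_file)
and pass the embedded base pointer; the handle then lands inside the
containing structure without the generic code knowing its type. The
cost is the one noted in the thread: nothing checks that the offset
really points at a handle field, so a wrong offset silently corrupts
the structure.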
On Tue, Aug 11, 2015 at 01:58:01PM -0400, Tejun Heo wrote:
> Hello,
>
> This patchset establishes conventions on low frequency events,
> converts "cgroup.populated" to "cgroup.events" accordingly,
> generalizes event handling and enable notifications for
> "memory.events".
>
> This patchset contains the following eight patches.
>
> 0001-cgroup-replace-cgroup.populated-with-cgroup.events.patch
> 0002-cgroup-replace-cftype-mode-with-CFTYPE_WORLD_WRITABL.patch
> 0003-cgroup-relocate-cgroup_populate_dir.patch
> 0004-cgroup-make-cgroup_addrm_files-clean-up-after-itself.patch
> 0005-cgroup-cosmetic-updates-to-rebind_subsystems.patch
> 0006-cgroup-restructure-file-creation-removal-handling.patch
> 0007-cgroup-generalize-obtaining-the-handles-of-and-notif.patch
> 0008-memcg-generate-file-modified-notifications-on-memory.patch
>
> 0001 replaces "cgroup.populated" with "cgroup.events". 0002-0006 are
> prep patches. 0007 generalizes event notification. 0008 hook up
> event notifications for "memory.events".
These look good to me.
Acked-by: Johannes Weiner <[email protected]>
Out of curiosity, do you envision additional entries for cgroup.events
in the near future?
Hello,
On Mon, Aug 17, 2015 at 11:29:20PM +0200, Johannes Weiner wrote:
> Out of curiosity, do you envision additional entries for cgroup.events
> in the near future?
I don't have anything specific in mind right now. I primarily want
to establish an interface convention for low-frequency event delivery,
and the format of memory.events seemed simple and extensible.
Thanks.
--
tejun
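From the userspace side, the convention discussed in this thread
amounts to a flat "key value" file plus a file modified notification.
Below is a minimal sketch of a consumer, under two assumptions that the
thread itself does not spell out: that the notification is observable
as POLLPRI on an open file descriptor, and that the file can be re-read
from offset 0 afterwards. events_lookup() and events_wait_and_read()
are hypothetical helper names, not part of any existing library.

```c
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Look up @key in a flat "key value\n" buffer such as the content of
 * "cgroup.events" or "memory.events".  Returns 0 and stores the value
 * in @val on success, -1 if the key is missing or has no number.
 */
int events_lookup(const char *buf, const char *key, long long *val)
{
	size_t klen = strlen(key);
	const char *p = buf;

	while (p && *p) {
		/* the key must match exactly and be followed by a space */
		if (!strncmp(p, key, klen) && p[klen] == ' ') {
			char *end;
			long long v = strtoll(p + klen + 1, &end, 10);

			if (end == p + klen + 1)
				return -1;
			*val = v;
			return 0;
		}
		/* advance to the start of the next line, if any */
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}

/*
 * Block until the file behind @fd is reported as modified, then re-read
 * it from offset 0 into @buf (NUL-terminated).  Returns the number of
 * bytes read, or -1 on error.
 * Assumption: the kernel reports the change as POLLPRI.
 */
ssize_t events_wait_and_read(int fd, char *buf, size_t len)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
	ssize_t n;

	if (len == 0)
		return -1;
	if (poll(&pfd, 1, -1) < 0)
		return -1;

	n = pread(fd, buf, len - 1, 0);
	if (n >= 0)
		buf[n] = '\0';
	return n;
}
```

A monitor would open "cgroup.events" or "memory.events" read-only, call
events_wait_and_read() in a loop, and use events_lookup() to pick out
the field it cares about, for example "populated". Because the
notification only says that something changed, the consumer has to
compare against the previous values itself.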