v4:
- Rebased to the latest for-4.21 branch of cgroup tree.
- Make each controller explicitly opt in to become bypassable by
setting the bypassable cgroup_subsys flag. Currently, only cpu
controller is made bypassable.
- Break out the cgroup_v2.rst documentation update as separate patch.
v3:
- Remove invalid cgroup subdirectory creation patch.
- Add use cases for the bypass mode and removing statements about
control files ownership in cgroup-v2.txt.
- Restrict bypass mode to non-domain (threaded) controllers only.
v3 patch - https://lkml.org/lkml/2017/8/9/604
This patchset introduces a new bypass mode to the cgroup v2 core to
give more freedom and flexibility to controllers which choose to become
bypassable the freedom to shape their own unique views of the virtual
cgroup hierarchies that can best suit thier own use cases.
Because of the inherent performance overhead in enabling cpu controller,
it is made bypassable so that the controller only needs to be enabled
at those cgroups that really need it instead of in every cgroups at a
given layer if at least one of them needs it.
The cpu controller performance problem is one of the major issues
in migrating from cgroup v1 to v2.
For example,
R - A(+) - B(#) - C(+)
\ D(#)
where "+" means the controller is enabled and "#" means the controller
is bypassed. For this controller's perspective, the cgroups are
equivalent to:
R - A|B|D - C
Underneath the root R, cgoups A, B and D are controlled by one set of
knobs and cgroup C is controlled by another set of knobs as a child of
cgroups A|B|C.
This patchset is layered on top of the "for-4.21" branch of Tejun's
cgroup git tree.
Patch 1 introduces a new bypass mode that allows a bypassable
controller to be disabled in a cgroup, but to be re-enabled again in its
children. This is enabled by writing the controller name prefixed with
'#' to the "cgroup.subtree_control" file. Then all its children will
have this controller in bypass mode.
Patch 2 extends the bypass mode mechanism to allow those child cgroups
that are put into the bypass mode for a particular bypassable controller
by their parent to be re-enabled again by writing the controller name
with the '+' prefix to the "cgroup.controllers" file.
Patch 3 extends the debug controller to expose additional controller
masks introduced by this patchset.
Patch 4 makes the cpu controller bypassable.
Patch 5 documents the new bypass mode in cgroup-v2.rst file.
Waiman Long (5):
cgroup: subtree_control bypass mode for bypassable controllers
cgroup: Allow reenabling of controller in bypass mode
cgroup: Make debug controller report new controller masks
sched/core: Make cpu cgroup controller bypassable
cgroup: Document bypass mode
Documentation/admin-guide/cgroup-v2.rst | 66 +++++---
include/linux/cgroup-defs.h | 26 +++-
kernel/cgroup/cgroup.c | 256 +++++++++++++++++++++++++-------
kernel/cgroup/debug.c | 2 +
kernel/sched/core.c | 1 +
5 files changed, 276 insertions(+), 75 deletions(-)
--
1.8.3.1
The cgroup-v2.rst file is updated to document the new bypass mode.
Signed-off-by: Waiman Long <[email protected]>
---
Documentation/admin-guide/cgroup-v2.rst | 66 ++++++++++++++++++++++++---------
1 file changed, 48 insertions(+), 18 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 07e0613..17e6584 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -372,9 +372,16 @@ disabled by writing to the "cgroup.subtree_control" file::
# echo "+cpu +memory -io" > cgroup.subtree_control
-Only controllers which are listed in "cgroup.controllers" can be
-enabled. When multiple operations are specified as above, either they
-all succeed or fail. If multiple operations on the same controller
+The prefixes '+', '-' and '#' are used to enable, disable or put
+a controller in the bypass mode respectively. In the bypass mode,
+a controller is disabled in a cgroup, but it can be enabled again in
+its child cgroups as it will still be listed in "cgroup.controllers".
+Bypass mode can only be used on bypassable controllers. Currently,
+only the cpu controller is bypassable.
+
+Only controllers which are listed in "cgroup.controllers" can be enabled
+or bypassed. When multiple operations are specified as above, either
+they all succeed or fail. If multiple operations on the same controller
are specified, the last one is effective.
Enabling a controller in a cgroup indicates that the distribution of
@@ -399,6 +406,20 @@ prefixed controller interface files from C and D. This means that the
controller interface files - anything which doesn't start with
"cgroup." are owned by the parent rather than the cgroup itself.
+Once a bypassable controller is put into bypass mode in
+"cgroup.subtree_control", that controller can optionally be enabled
+again in child cgroups by writing the controller name with the '+ prefix
+into "cgroup.controllers". Writing the controller name with the '#'
+prefix into "cgroup.controllers" resets the state back to bypass mode.
+The state of a bypassable controller cannot be changed anymore if it
+is enabled or bypassed in its "cgroup.subtree_control".
+
+The use of bypass mode thus allows a cgroup parent to have the ability
+to selectively enable a bypassable controller in a subset of its child
+cgroups instead of in either all or none of them. In other words, a
+bypassable can be enabled only on the cgroups that actually need it,
+if desired.
+
Top-down Constraint
~~~~~~~~~~~~~~~~~~~
@@ -406,10 +427,11 @@ Top-down Constraint
Resources are distributed top-down and a cgroup can further distribute
a resource only if the resource has been distributed to it from the
parent. This means that all non-root "cgroup.subtree_control" files
-can only contain controllers which are enabled in the parent's
-"cgroup.subtree_control" file. A controller can be enabled only if
-the parent has the controller enabled and a controller can't be
-disabled if one or more children have it enabled.
+can only contain controllers which are enabled or bypassed in the
+parent's "cgroup.subtree_control" file. A controller can be enabled
+or bypassed only if the parent has the controller enabled or bypassed
+and the state of a controller can't be changed if one or more children
+have it enabled or bypassed.
No Internal Process Constraint
@@ -834,11 +856,18 @@ All cgroup core files are prefixed with "cgroup."
should be granted along with the containing directory.
cgroup.controllers
- A read-only space separated values file which exists on all
+ A read-write space separated values file which exists on all
cgroups.
It shows space separated list of all controllers available to
- the cgroup. The controllers are not ordered.
+ the cgroup. Controller names with '#' prefix are in bypass mode.
+ The controllers are not ordered.
+
+ When a controller is set into bypass mode in its parent's
+ "cgroup.subtree_control", its name prefixed with '+' or '#'
+ can be written to enable it or reset it back to bypass mode
+ respectively. Controllers not in bypass mode are not allowed
+ to be written.
cgroup.subtree_control
A read-write space separated values file which exists on all
@@ -848,12 +877,12 @@ All cgroup core files are prefixed with "cgroup."
which are enabled to control resource distribution from the
cgroup to its children.
- Space separated list of controllers prefixed with '+' or '-'
- can be written to enable or disable controllers. A controller
- name prefixed with '+' enables the controller and '-'
- disables. If a controller appears more than once on the list,
- the last one is effective. When multiple enable and disable
- operations are specified, either all succeed or all fail.
+ Space separated list of controllers prefixed with '+', '-' or
+ '#' can be written to enable, disable or bypass controllers
+ respectively. If a controller appears more than once on
+ the list, the last one is effective. When multiple enable,
+ disable or bypass operations are specified, either all succeed
+ or all fail.
cgroup.events
A read-only flat-keyed file which exists on non-root cgroups.
@@ -904,17 +933,18 @@ Controllers
CPU
---
-The "cpu" controllers regulates distribution of CPU cycles. This
+The "cpu" controller regulates distribution of CPU cycles. This
controller implements weight and absolute bandwidth limit models for
normal scheduling policy and absolute bandwidth allocation model for
-realtime scheduling policy.
+realtime scheduling policy. The cpu controller is bypassable.
WARNING: cgroup2 doesn't yet support control of realtime processes and
the cpu controller can only be enabled when all RT processes are in
the root cgroup. Be aware that system management software may already
have placed RT processes into nonroot cgroups during the system boot
process, and these processes may need to be moved to the root cgroup
-before the cpu controller can be enabled.
+before the cpu controller can be enabled unless bypass mode is used
+in those non-root cgroups.
CPU Interface Files
--
1.8.3.1
Bypassable controllers set to bypass mode in the parent's
"cgroup.subtree_control" can now be optionally enabled by writing the
controller name with the '+' prefix to "cgroup.controllers". Using the
'#' prefix will reset it back to the bypass state.
This capability allows a cgroup parent to individually enable bypassable
controllers in a subset of its children instead of either all or none
of them. This increases the flexibility each controller has in shaping
the effective cgroup hierarchy to best suit its need.
Signed-off-by: Waiman Long <[email protected]>
---
include/linux/cgroup-defs.h | 7 +++
kernel/cgroup/cgroup.c | 109 ++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 113 insertions(+), 3 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 5bff798..ab1b355 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -388,6 +388,13 @@ struct cgroup {
u16 old_subtree_ss_mask;
u16 old_subtree_bypass;
+ /*
+ * The bitmask of subsystems that are set in its parent's
+ * ->subtree_bypass and explicitly enabled in this cgroup.
+ */
+ u16 enable_ss_mask;
+ u16 old_enable_ss_mask;
+
/* Private pointers for each registered subsystem */
struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a361c10..fa538f2 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -424,7 +424,7 @@ static u16 cgroup_control(struct cgroup *cgrp, bool show_bypass)
u16 root_ss_mask = cgrp->root->subsys_mask;
if (parent) {
- u16 ss_mask = parent->subtree_control;
+ u16 ss_mask = parent->subtree_control|cgrp->enable_ss_mask;
if (show_bypass)
ss_mask |= parent->subtree_bypass;
@@ -447,7 +447,7 @@ static u16 cgroup_ss_mask(struct cgroup *cgrp, bool show_bypass)
struct cgroup *parent = cgroup_parent(cgrp);
if (parent) {
- u16 ss_mask = parent->subtree_ss_mask;
+ u16 ss_mask = parent->subtree_ss_mask|cgrp->enable_ss_mask;
if (show_bypass)
@@ -2874,6 +2874,7 @@ static void cgroup_save_control(struct cgroup *cgrp)
dsct->old_subtree_control = dsct->subtree_control;
dsct->old_subtree_ss_mask = dsct->subtree_ss_mask;
dsct->old_subtree_bypass = dsct->subtree_bypass;
+ dsct->old_enable_ss_mask = dsct->enable_ss_mask;
dsct->old_dom_cgrp = dsct->dom_cgrp;
}
}
@@ -2919,6 +2920,7 @@ static void cgroup_restore_control(struct cgroup *cgrp)
dsct->subtree_control = dsct->old_subtree_control;
dsct->subtree_ss_mask = dsct->old_subtree_ss_mask;
dsct->subtree_bypass = dsct->old_subtree_bypass;
+ dsct->enable_ss_mask = dsct->old_enable_ss_mask;
dsct->dom_cgrp = dsct->old_dom_cgrp;
}
}
@@ -3197,7 +3199,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
}
cgroup_for_each_live_child(child, cgrp)
- child_enable |= child->subtree_control|child->subtree_bypass;
+ child_enable |= child->subtree_control|child->subtree_bypass|
+ child->enable_ss_mask;
/*
* Cannot change the state of a controller if enabled in children.
@@ -3230,6 +3233,105 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
return ret ?: nbytes;
}
+/*
+ * Change bypass status of controllers for a cgroup in the default hierarchy.
+ */
+static ssize_t cgroup_controllers_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes,
+ loff_t off)
+{
+ u16 enable = 0, bypass = 0;
+ struct cgroup *cgrp, *parent;
+ struct cgroup_subsys *ss;
+ char *tok;
+ int ssid, ret;
+
+ /*
+ * Parse input - space separated list of subsystem names prefixed
+ * with either + or #.
+ */
+ buf = strstrip(buf);
+ while ((tok = strsep(&buf, " "))) {
+ if (tok[0] == '\0')
+ continue;
+ do_each_subsys_mask(ss, ssid, ~cgrp_dfl_inhibit_ss_mask) {
+ if (!cgroup_ssid_enabled(ssid) ||
+ strcmp(tok + 1, ss->name))
+ continue;
+
+ if (*tok == '+') {
+ enable |= 1 << ssid;
+ bypass &= ~(1 << ssid);
+ } else if (*tok == '#') {
+ bypass |= 1 << ssid;
+ enable &= ~(1 << ssid);
+ } else {
+ return -EINVAL;
+ }
+ break;
+ } while_each_subsys_mask();
+ if (ssid == CGROUP_SUBSYS_COUNT)
+ return -EINVAL;
+ }
+
+ cgrp = cgroup_kn_lock_live(of->kn, true);
+ if (!cgrp)
+ return -ENODEV;
+
+ /*
+ * Write to root cgroup's controllers file is not allowed.
+ */
+ parent = cgroup_parent(cgrp);
+ if (!parent) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ /*
+ * Only controllers set into bypass mode in the parent cgroup
+ * can be specified here.
+ */
+ if (~parent->subtree_bypass & (enable|bypass)) {
+ ret = -ENOENT;
+ goto out_unlock;
+ }
+
+ /*
+ * Mask off irrelevant bits.
+ */
+ enable &= ~cgrp->enable_ss_mask;
+ bypass &= cgrp->enable_ss_mask;
+
+ if (!(enable|bypass)) {
+ ret = 0;
+ goto out_unlock;
+ }
+
+ /*
+ * We cannot change the bypass state of a controller that is enabled
+ * in subtree_control.
+ */
+ if ((cgrp->subtree_control|cgrp->subtree_bypass) & (enable|bypass)) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
+ /* Save and update control masks and prepare csses */
+ cgroup_save_control(cgrp);
+
+ cgrp->enable_ss_mask |= enable;
+ cgrp->enable_ss_mask &= ~bypass;
+
+ ret = cgroup_apply_control(cgrp);
+ cgroup_finalize_control(cgrp, ret);
+ kernfs_activate(cgrp->kn);
+ ret = 0;
+
+out_unlock:
+ cgroup_kn_unlock(of->kn);
+ return ret ?: nbytes;
+}
+
/**
* cgroup_enable_threaded - make @cgrp threaded
* @cgrp: the target cgroup
@@ -4573,6 +4675,7 @@ static ssize_t cgroup_threads_write(struct kernfs_open_file *of,
{
.name = "cgroup.controllers",
.seq_show = cgroup_controllers_show,
+ .write = cgroup_controllers_write,
},
{
.name = "cgroup.subtree_control",
--
1.8.3.1
The newly added cgroup controller masks (subtree_bypass and
enable_ss_mask) are now being reported in the debug.masks controller
file.
Signed-off-by: Waiman Long <[email protected]>
---
kernel/cgroup/debug.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/cgroup/debug.c b/kernel/cgroup/debug.c
index 5f1b873..176bb9e 100644
--- a/kernel/cgroup/debug.c
+++ b/kernel/cgroup/debug.c
@@ -263,6 +263,8 @@ static int cgroup_masks_read(struct seq_file *seq, void *v)
cgroup_masks_read_one(seq, "subtree_control", cgrp->subtree_control);
cgroup_masks_read_one(seq, "subtree_ss_mask", cgrp->subtree_ss_mask);
+ cgroup_masks_read_one(seq, "subtree_bypass", cgrp->subtree_bypass);
+ cgroup_masks_read_one(seq, "enable_ss_mask", cgrp->enable_ss_mask);
cgroup_kn_unlock(of->kn);
return 0;
--
1.8.3.1
Make the cpu cgroup controller bypassable in the default hierarchy so
that cpu controller could be activated only in those cgroups that really
need it instead of all the cgroups down to the lowest level that need it.
Signed-off-by: Waiman Long <[email protected]>
---
kernel/sched/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f12225f..ba4550c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7000,6 +7000,7 @@ struct cgroup_subsys cpu_cgrp_subsys = {
.dfl_cftypes = cpu_files,
.early_init = true,
.threaded = true,
+ .bypassable = true,
};
#endif /* CONFIG_CGROUP_SCHED */
--
1.8.3.1
A controller in the default hierarchy is bypassable if the bypassable
flag is set in its cgroup_subsys structure. The special prefix '#'
attached to a bypassable controller name can now be written into the
cgroup.subtree_control file to set that controller in bypass mode in
all the child cgroups. The controller will show up in the children's
cgroup.controllers file, but the corresponding control knobs will be
absent. However, that controller can be enabled or bypassed in its
children by writing to their respective subtree_control files.
In term of resource control, a bypassed cgroup was lumped into its
nearest non-bypassed ancestor. So all the tasks in that ancestor as
well as those in its bypassed direct descendants are controlled by the
same set of control knobs.
This mode is useful to those bypassable controllers where there are
costs to each additional layer of hierarchy. This mode will also allow
more freedom in how each controller can shape its effective hierarchy
independent of the others.
Signed-off-by: Waiman Long <[email protected]>
---
include/linux/cgroup-defs.h | 19 ++++--
kernel/cgroup/cgroup.c | 149 +++++++++++++++++++++++++++++---------------
2 files changed, 113 insertions(+), 55 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 8fcbae1..5bff798 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -375,16 +375,18 @@ struct cgroup {
struct cgroup_file events_file; /* handle for "cgroup.events" */
/*
- * The bitmask of subsystems enabled on the child cgroups.
- * ->subtree_control is the one configured through
- * "cgroup.subtree_control" while ->child_ss_mask is the effective
- * one which may have more subsystems enabled. Controller knobs
- * are made available iff it's enabled in ->subtree_control.
+ * The bitmask of subsystems enabled or bypassed on the child cgroups.
+ * ->subtree_control and ->subtree_bypass are the one configured
+ * through "cgroup.subtree_control" while ->subtree_ss_mask is the
+ * effective one which may have more subsystems enabled. Controller
+ * knobs are made available iff it's enabled in ->subtree_ss_mask.
*/
u16 subtree_control;
u16 subtree_ss_mask;
+ u16 subtree_bypass;
u16 old_subtree_control;
u16 old_subtree_ss_mask;
+ u16 old_subtree_bypass;
/* Private pointers for each registered subsystem */
struct cgroup_subsys_state __rcu *subsys[CGROUP_SUBSYS_COUNT];
@@ -647,6 +649,13 @@ struct cgroup_subsys {
bool broken_hierarchy:1;
bool warned_broken_hierarchy:1;
+ /*
+ * If %true, the controller, on the default hierarchy, can be
+ * bypassable. IOW, it can have a virtual hierarchy that is
+ * different from the default hierarchy.
+ */
+ bool bypassable:1;
+
/* the following two fields are initialized automtically during boot */
int id;
const char *name;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index e06994f..a361c10 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -175,6 +175,9 @@ struct cgroup_subsys *cgroup_subsys[] = {
/* some controllers can be threaded on the default hierarchy */
static u16 cgrp_dfl_threaded_ss_mask;
+/* some controllers can be bypassable on the default hierarchy */
+static u16 cgrp_dfl_bypass_ss_mask;
+
/* The list of hierarchy roots */
LIST_HEAD(cgroup_roots);
static int cgroup_root_count;
@@ -366,7 +369,8 @@ static bool cgroup_can_be_thread_root(struct cgroup *cgrp)
return false;
/* and no domain controllers can be enabled */
- if (cgrp->subtree_control & ~cgrp_dfl_threaded_ss_mask)
+ if ((cgrp->subtree_control|cgrp->subtree_bypass) &
+ ~cgrp_dfl_threaded_ss_mask)
return false;
return true;
@@ -388,7 +392,8 @@ bool cgroup_is_thread_root(struct cgroup *cgrp)
* enabled is a thread root.
*/
if (cgroup_has_tasks(cgrp) &&
- (cgrp->subtree_control & cgrp_dfl_threaded_ss_mask))
+ ((cgrp->subtree_control|cgrp->subtree_bypass)
+ & cgrp_dfl_threaded_ss_mask))
return true;
return false;
@@ -413,7 +418,7 @@ static bool cgroup_is_valid_domain(struct cgroup *cgrp)
}
/* subsystems visibly enabled on a cgroup */
-static u16 cgroup_control(struct cgroup *cgrp)
+static u16 cgroup_control(struct cgroup *cgrp, bool show_bypass)
{
struct cgroup *parent = cgroup_parent(cgrp);
u16 root_ss_mask = cgrp->root->subsys_mask;
@@ -421,6 +426,9 @@ static u16 cgroup_control(struct cgroup *cgrp)
if (parent) {
u16 ss_mask = parent->subtree_control;
+ if (show_bypass)
+ ss_mask |= parent->subtree_bypass;
+
/* threaded cgroups can only have threaded controllers */
if (cgroup_is_threaded(cgrp))
ss_mask &= cgrp_dfl_threaded_ss_mask;
@@ -434,13 +442,17 @@ static u16 cgroup_control(struct cgroup *cgrp)
}
/* subsystems enabled on a cgroup */
-static u16 cgroup_ss_mask(struct cgroup *cgrp)
+static u16 cgroup_ss_mask(struct cgroup *cgrp, bool show_bypass)
{
struct cgroup *parent = cgroup_parent(cgrp);
if (parent) {
u16 ss_mask = parent->subtree_ss_mask;
+
+ if (show_bypass)
+ ss_mask |= parent->subtree_bypass;
+
/* threaded cgroups can only have threaded controllers */
if (cgroup_is_threaded(cgrp))
ss_mask &= cgrp_dfl_threaded_ss_mask;
@@ -515,7 +527,7 @@ static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
* This function is used while updating css associations and thus
* can't test the csses directly. Test ss_mask.
*/
- while (!(cgroup_ss_mask(cgrp) & (1 << ss->id))) {
+ while (!(cgroup_ss_mask(cgrp, false) & (1 << ss->id))) {
cgrp = cgroup_parent(cgrp);
if (!cgrp)
return NULL;
@@ -2414,7 +2426,7 @@ int cgroup_migrate_vet_dst(struct cgroup *dst_cgrp)
return 0;
/* apply no-internal-process constraint */
- if (dst_cgrp->subtree_control)
+ if (dst_cgrp->subtree_control|dst_cgrp->subtree_bypass)
return -EBUSY;
return 0;
@@ -2712,15 +2724,18 @@ void cgroup_procs_write_finish(struct task_struct *task)
ss->post_attach();
}
-static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
+static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask,
+ u16 bypass_mask)
{
struct cgroup_subsys *ss;
bool printed = false;
int ssid;
- do_each_subsys_mask(ss, ssid, ss_mask) {
+ do_each_subsys_mask(ss, ssid, ss_mask|bypass_mask) {
if (printed)
seq_putc(seq, ' ');
+ if (!(ss_mask & (1 << ssid)))
+ seq_putc(seq, '#');
seq_printf(seq, "%s", ss->name);
printed = true;
} while_each_subsys_mask();
@@ -2732,8 +2747,10 @@ static void cgroup_print_ss_mask(struct seq_file *seq, u16 ss_mask)
static int cgroup_controllers_show(struct seq_file *seq, void *v)
{
struct cgroup *cgrp = seq_css(seq)->cgroup;
+ struct cgroup *parent = cgroup_parent(cgrp);
+ u16 bypass = parent ? parent->subtree_bypass : 0;
- cgroup_print_ss_mask(seq, cgroup_control(cgrp));
+ cgroup_print_ss_mask(seq, cgroup_control(cgrp, false), bypass);
return 0;
}
@@ -2742,7 +2759,7 @@ static int cgroup_subtree_control_show(struct seq_file *seq, void *v)
{
struct cgroup *cgrp = seq_css(seq)->cgroup;
- cgroup_print_ss_mask(seq, cgrp->subtree_control);
+ cgroup_print_ss_mask(seq, cgrp->subtree_control, cgrp->subtree_bypass);
return 0;
}
@@ -2856,6 +2873,7 @@ static void cgroup_save_control(struct cgroup *cgrp)
cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
dsct->old_subtree_control = dsct->subtree_control;
dsct->old_subtree_ss_mask = dsct->subtree_ss_mask;
+ dsct->old_subtree_bypass = dsct->subtree_bypass;
dsct->old_dom_cgrp = dsct->dom_cgrp;
}
}
@@ -2874,10 +2892,13 @@ static void cgroup_propagate_control(struct cgroup *cgrp)
struct cgroup_subsys_state *d_css;
cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
- dsct->subtree_control &= cgroup_control(dsct);
+ u16 mask = cgroup_control(dsct, true);
+
+ dsct->subtree_control &= mask;
+ dsct->subtree_bypass &= mask;
dsct->subtree_ss_mask =
cgroup_calc_subtree_ss_mask(dsct->subtree_control,
- cgroup_ss_mask(dsct));
+ cgroup_ss_mask(dsct, true));
}
}
@@ -2897,6 +2918,7 @@ static void cgroup_restore_control(struct cgroup *cgrp)
cgroup_for_each_live_descendant_post(dsct, d_css, cgrp) {
dsct->subtree_control = dsct->old_subtree_control;
dsct->subtree_ss_mask = dsct->old_subtree_ss_mask;
+ dsct->subtree_bypass = dsct->old_subtree_bypass;
dsct->dom_cgrp = dsct->old_dom_cgrp;
}
}
@@ -2906,9 +2928,9 @@ static bool css_visible(struct cgroup_subsys_state *css)
struct cgroup_subsys *ss = css->ss;
struct cgroup *cgrp = css->cgroup;
- if (cgroup_control(cgrp) & (1 << ss->id))
+ if (cgroup_control(cgrp, false) & (1 << ss->id))
return true;
- if (!(cgroup_ss_mask(cgrp) & (1 << ss->id)))
+ if (!(cgroup_ss_mask(cgrp, false) & (1 << ss->id)))
return false;
return cgroup_on_dfl(cgrp) && ss->implicit_on_dfl;
}
@@ -2939,7 +2961,7 @@ static int cgroup_apply_control_enable(struct cgroup *cgrp)
WARN_ON_ONCE(css && percpu_ref_is_dying(&css->refcnt));
- if (!(cgroup_ss_mask(dsct) & (1 << ss->id)))
+ if (!(cgroup_ss_mask(dsct, false) & (1 << ss->id)))
continue;
if (!css) {
@@ -2989,7 +3011,7 @@ static void cgroup_apply_control_disable(struct cgroup *cgrp)
continue;
if (css->parent &&
- !(cgroup_ss_mask(dsct) & (1 << ss->id))) {
+ !(cgroup_ss_mask(dsct, false) & (1 << ss->id))) {
kill_css(css);
} else if (!css_visible(css)) {
css_clear_dir(css);
@@ -3101,7 +3123,8 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
char *buf, size_t nbytes,
loff_t off)
{
- u16 enable = 0, disable = 0;
+ u16 enable = 0, disable = 0, bypass = 0;
+ u16 child_enable = 0;
struct cgroup *cgrp, *child;
struct cgroup_subsys *ss;
char *tok;
@@ -3122,10 +3145,16 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
if (*tok == '+') {
enable |= 1 << ssid;
+ bypass &= ~(1 << ssid);
disable &= ~(1 << ssid);
} else if (*tok == '-') {
disable |= 1 << ssid;
enable &= ~(1 << ssid);
+ bypass &= ~(1 << ssid);
+ } else if (*tok == '#') {
+ bypass |= 1 << ssid;
+ enable &= ~(1 << ssid);
+ disable &= ~(1 << ssid);
} else {
return -EINVAL;
}
@@ -3139,35 +3168,42 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
if (!cgrp)
return -ENODEV;
- for_each_subsys(ss, ssid) {
- if (enable & (1 << ssid)) {
- if (cgrp->subtree_control & (1 << ssid)) {
- enable &= ~(1 << ssid);
- continue;
- }
+ /*
+ * Cannot use controllers that aren't allowed.
+ */
+ if (~cgroup_control(cgrp, true) & (enable|disable|bypass)) {
+ ret = -ENOENT;
+ goto out_unlock;
+ }
- if (!(cgroup_control(cgrp) & (1 << ssid))) {
- ret = -ENOENT;
- goto out_unlock;
- }
- } else if (disable & (1 << ssid)) {
- if (!(cgrp->subtree_control & (1 << ssid))) {
- disable &= ~(1 << ssid);
- continue;
- }
+ /*
+ * Strip out redundant bits.
+ */
+ enable &= ~cgrp->subtree_control;
+ bypass &= ~cgrp->subtree_bypass;
+ disable &= (cgrp->subtree_control|cgrp->subtree_bypass);
- /* a child has it enabled? */
- cgroup_for_each_live_child(child, cgrp) {
- if (child->subtree_control & (1 << ssid)) {
- ret = -EBUSY;
- goto out_unlock;
- }
- }
- }
+ if (!(enable|bypass|disable)) {
+ ret = 0;
+ goto out_unlock;
}
- if (!enable && !disable) {
- ret = 0;
+ /*
+ * Only bypassable controllers can be bypassed.
+ */
+ if (bypass & ~cgrp_dfl_bypass_ss_mask) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ cgroup_for_each_live_child(child, cgrp)
+ child_enable |= child->subtree_control|child->subtree_bypass;
+
+ /*
+ * Cannot change the state of a controller if enabled in children.
+ */
+ if ((enable|bypass|disable) & child_enable) {
+ ret = -EBUSY;
goto out_unlock;
}
@@ -3179,7 +3215,9 @@ static ssize_t cgroup_subtree_control_write(struct kernfs_open_file *of,
cgroup_save_control(cgrp);
cgrp->subtree_control |= enable;
- cgrp->subtree_control &= ~disable;
+ cgrp->subtree_control &= ~(bypass|disable);
+ cgrp->subtree_bypass |= bypass;
+ cgrp->subtree_bypass &= ~(enable|disable);
ret = cgroup_apply_control(cgrp);
cgroup_finalize_control(cgrp, ret);
@@ -4727,7 +4765,8 @@ static void css_release(struct percpu_ref *ref)
}
static void init_and_link_css(struct cgroup_subsys_state *css,
- struct cgroup_subsys *ss, struct cgroup *cgrp)
+ struct cgroup_subsys *ss, struct cgroup *cgrp,
+ struct cgroup_subsys_state *parent_css)
{
lockdep_assert_held(&cgroup_mutex);
@@ -4743,8 +4782,8 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
css->serial_nr = css_serial_nr_next++;
atomic_set(&css->online_cnt, 0);
- if (cgroup_parent(cgrp)) {
- css->parent = cgroup_css(cgroup_parent(cgrp), ss);
+ if (parent_css) {
+ css->parent = parent_css;
css_get(css->parent);
}
@@ -4807,19 +4846,26 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
struct cgroup_subsys *ss)
{
struct cgroup *parent = cgroup_parent(cgrp);
- struct cgroup_subsys_state *parent_css = cgroup_css(parent, ss);
+ struct cgroup_subsys_state *parent_css = NULL;
struct cgroup_subsys_state *css;
int err;
lockdep_assert_held(&cgroup_mutex);
+ /*
+ * As cgroup may be in bypass mode, need to skip over ancestor
+ * cgroups with NULL CSS.
+ */
+ for (; parent && !parent_css; parent = cgroup_parent(parent))
+ parent_css = cgroup_css(parent, ss);
+
css = ss->css_alloc(parent_css);
if (!css)
css = ERR_PTR(-ENOMEM);
if (IS_ERR(css))
return css;
- init_and_link_css(css, ss, cgrp);
+ init_and_link_css(css, ss, cgrp, parent_css);
err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL);
if (err)
@@ -4941,7 +4987,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
* subtree_control from the parent. Each is configured manually.
*/
if (!cgroup_on_dfl(cgrp))
- cgrp->subtree_control = cgroup_control(cgrp);
+ cgrp->subtree_control = cgroup_control(cgrp, false);
cgroup_propagate_control(cgrp);
@@ -5254,7 +5300,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
css = ss->css_alloc(cgroup_css(&cgrp_dfl_root.cgrp, ss));
/* We don't handle early failures gracefully */
BUG_ON(IS_ERR(css));
- init_and_link_css(css, ss, &cgrp_dfl_root.cgrp);
+ init_and_link_css(css, ss, &cgrp_dfl_root.cgrp, NULL);
/*
* Root csses are never destroyed and we can't initialize
@@ -5412,6 +5458,9 @@ int __init cgroup_init(void)
if (ss->threaded)
cgrp_dfl_threaded_ss_mask |= 1 << ss->id;
+ if (ss->bypassable)
+ cgrp_dfl_bypass_ss_mask |= 1 << ss->id;
+
if (ss->dfl_cftypes == ss->legacy_cftypes) {
WARN_ON(cgroup_add_cftypes(ss, ss->dfl_cftypes));
} else {
--
1.8.3.1
[CC += [email protected]]
Hi Waiman,
Since this is a kernel-user-space API change, for all future
iterations of this patch, please CC linux-api@.
The kernel source file Documentation/SubmitChecklist notes that all
Linux kernel patches that change userspace interfaces should be CCed
to [email protected], so that the various parties who are
interested in API changes are informed. For further information, see
https://www.kernel.org/doc/man-pages/linux-api-ml.html
Thanks,
Michael
On Tue, Nov 20, 2018 at 8:45 PM Waiman Long <[email protected]> wrote:
>
> v4:
> - Rebased to the latest for-4.21 branch of cgroup tree.
> - Make each controller explicitly opt in to become bypassable by
> setting the bypassable cgroup_subsys flag. Currently, only cpu
> controller is made bypassable.
> - Break out the cgroup_v2.rst documentation update as separate patch.
>
> v3:
> - Remove invalid cgroup subdirectory creation patch.
> - Add use cases for the bypass mode and removing statements about
> control files ownership in cgroup-v2.txt.
> - Restrict bypass mode to non-domain (threaded) controllers only.
>
> v3 patch - https://lkml.org/lkml/2017/8/9/604
>
> This patchset introduces a new bypass mode to the cgroup v2 core to
> give more freedom and flexibility to controllers which choose to become
> bypassable the freedom to shape their own unique views of the virtual
> cgroup hierarchies that can best suit thier own use cases.
>
> Because of the inherent performance overhead in enabling cpu controller,
> it is made bypassable so that the controller only needs to be enabled
> at those cgroups that really need it instead of in every cgroups at a
> given layer if at least one of them needs it.
>
> The cpu controller performance problem is one of the major issues
> in migrating from cgroup v1 to v2.
>
> For example,
>
> R - A(+) - B(#) - C(+)
> \ D(#)
>
> where "+" means the controller is enabled and "#" means the controller
> is bypassed. For this controller's perspective, the cgroups are
> equivalent to:
>
> R - A|B|D - C
>
> Underneath the root R, cgoups A, B and D are controlled by one set of
> knobs and cgroup C is controlled by another set of knobs as a child of
> cgroups A|B|C.
>
> This patchset is layered on top of the "for-4.21" branch of Tejun's
> cgroup git tree.
>
> Patch 1 introduces a new bypass mode that allows a bypassable
> controller to be disabled in a cgroup, but to be re-enabled again in its
> children. This is enabled by writing the controller name prefixed with
> '#' to the "cgroup.subtree_control" file. Then all its children will
> have this controller in bypass mode.
>
> Patch 2 extends the bypass mode mechanism to allow those child cgroups
> that are put into the bypass mode for a particular bypassable controller
> by their parent to be re-enabled again by writing the controller name
> with the '+' prefix to the "cgroup.controllers" file.
>
> Patch 3 extends the debug controller to expose additional controller
> masks introduced by this patchset.
>
> Patch 4 makes the cpu controller bypassable.
>
> Patch 5 documents the new bypass mode in cgroup-v2.rst file.
>
> Waiman Long (5):
> cgroup: subtree_control bypass mode for bypassable controllers
> cgroup: Allow reenabling of controller in bypass mode
> cgroup: Make debug controller report new controller masks
> sched/core: Make cpu cgroup controller bypassable
> cgroup: Document bypass mode
>
> Documentation/admin-guide/cgroup-v2.rst | 66 +++++---
> include/linux/cgroup-defs.h | 26 +++-
> kernel/cgroup/cgroup.c | 256 +++++++++++++++++++++++++-------
> kernel/cgroup/debug.c | 2 +
> kernel/sched/core.c | 1 +
> 5 files changed, 276 insertions(+), 75 deletions(-)
>
> --
> 1.8.3.1
>
--
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/
Hi Waiman,
Thank you for the patch! Perhaps something to improve:
url: https://github.com/0day-ci/linux/commits/Waiman-Long/cgroup-Introducing-bypass-mode/20181123-030552
base: https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
smatch warnings:
kernel/cgroup/cgroup.c:4893 css_create() error: we previously assumed 'parent' could be null (see line 4864)
# https://github.com/0day-ci/linux/commit/8b68fd4330e043645667a5d3306398f8f88f9ff2
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 8b68fd4330e043645667a5d3306398f8f88f9ff2
vim +/parent +4893 kernel/cgroup/cgroup.c
a31f2d3ff kernel/cgroup.c Tejun Heo 2012-11-19 4840
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4841 /**
6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4842 * css_create - create a cgroup_subsys_state
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4843 * @cgrp: the cgroup new css will be associated with
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4844 * @ss: the subsys of new css
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4845 *
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4846 * Create a new css associated with @cgrp - @ss pair. On success, the new
6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4847 * css is online and installed in @cgrp. This function doesn't create the
6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4848 * interface files. Returns 0 on success, -errno on failure.
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4849 */
6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4850 static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4851 struct cgroup_subsys *ss)
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4852 {
d51f39b05 kernel/cgroup.c Tejun Heo 2014-05-16 4853 struct cgroup *parent = cgroup_parent(cgrp);
8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4854 struct cgroup_subsys_state *parent_css = NULL;
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4855 struct cgroup_subsys_state *css;
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4856 int err;
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4857
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4858 lockdep_assert_held(&cgroup_mutex);
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4859
8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4860 /*
8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4861 * As cgroup may be in bypass mode, need to skip over ancestor
8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4862 * cgroups with NULL CSS.
8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4863 */
8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 @4864 for (; parent && !parent_css; parent = cgroup_parent(parent))
^^^^^^^^^^^^^^^^^^^^^
8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4865 parent_css = cgroup_css(parent, ss);
When we exit this loop it means either parent is NULL or parent_css
is non-NULL.
8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4866
1fed1b2e3 kernel/cgroup.c Tejun Heo 2014-05-16 4867 css = ss->css_alloc(parent_css);
e7e15b87f kernel/cgroup.c Tejun Heo 2016-06-21 4868 if (!css)
e7e15b87f kernel/cgroup.c Tejun Heo 2016-06-21 4869 css = ERR_PTR(-ENOMEM);
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4870 if (IS_ERR(css))
6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4871 return css;
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4872
8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4873 init_and_link_css(css, ss, cgrp, parent_css);
a2bed8209 kernel/cgroup.c Tejun Heo 2014-05-04 4874
2aad2a86f kernel/cgroup.c Tejun Heo 2014-09-24 4875 err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL);
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4876 if (err)
3eb59ec64 kernel/cgroup.c Li Zefan 2014-03-18 4877 goto err_free_css;
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4878
cf780b7dc kernel/cgroup.c Vladimir Davydov 2015-08-03 4879 err = cgroup_idr_alloc(&ss->css_idr, NULL, 2, 0, GFP_KERNEL);
15a4c835e kernel/cgroup.c Tejun Heo 2014-05-04 4880 if (err < 0)
b00c52dae kernel/cgroup.c Wenwei Tao 2016-05-13 4881 goto err_free_css;
15a4c835e kernel/cgroup.c Tejun Heo 2014-05-04 4882 css->id = err;
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4883
15a4c835e kernel/cgroup.c Tejun Heo 2014-05-04 4884 /* @css is ready to be brought online now, make it visible */
1fed1b2e3 kernel/cgroup.c Tejun Heo 2014-05-16 4885 list_add_tail_rcu(&css->sibling, &parent_css->children);
15a4c835e kernel/cgroup.c Tejun Heo 2014-05-04 4886 cgroup_idr_replace(&ss->css_idr, css, css->id);
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4887
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4888 err = online_css(css);
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4889 if (err)
1fed1b2e3 kernel/cgroup.c Tejun Heo 2014-05-16 4890 goto err_list_del;
944196278 kernel/cgroup.c Tejun Heo 2014-03-19 4891
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4892 if (ss->broken_hierarchy && !ss->warned_broken_hierarchy &&
d51f39b05 kernel/cgroup.c Tejun Heo 2014-05-16 @4893 cgroup_parent(parent)) {
^^^^^^
We dereference parent inside the function, but I don't know for sure
if this is reachable when "parent" is NULL.
ed3d261b5 kernel/cgroup.c Joe Perches 2014-04-25 4894 pr_warn("%s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.\n",
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4895 current->comm, current->pid, ss->name);
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4896 if (!strcmp(ss->name, "memory"))
ed3d261b5 kernel/cgroup.c Joe Perches 2014-04-25 4897 pr_warn("\"memory\" requires setting use_hierarchy to 1 on the root\n");
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4898 ss->warned_broken_hierarchy = true;
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4899 }
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4900
6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4901 return css;
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4902
1fed1b2e3 kernel/cgroup.c Tejun Heo 2014-05-16 4903 err_list_del:
1fed1b2e3 kernel/cgroup.c Tejun Heo 2014-05-16 4904 list_del_rcu(&css->sibling);
3eb59ec64 kernel/cgroup.c Li Zefan 2014-03-18 4905 err_free_css:
8f53470ba kernel/cgroup/cgroup.c Tejun Heo 2018-04-26 4906 list_del_rcu(&css->rstat_css_node);
8f36aaec9 kernel/cgroup/cgroup.c Tejun Heo 2018-03-14 4907 INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
8f36aaec9 kernel/cgroup/cgroup.c Tejun Heo 2018-03-14 4908 queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4909 return ERR_PTR(err);
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4910 }
c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4911
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On 11/29/2018 06:18 AM, Dan Carpenter wrote:
> Hi Waiman,
>
> Thank you for the patch! Perhaps something to improve:
>
> url: https://github.com/0day-ci/linux/commits/Waiman-Long/cgroup-Introducing-bypass-mode/20181123-030552
> base: https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
>
> smatch warnings:
> kernel/cgroup/cgroup.c:4893 css_create() error: we previously assumed 'parent' could be null (see line 4864)
>
> # https://github.com/0day-ci/linux/commit/8b68fd4330e043645667a5d3306398f8f88f9ff2
> git remote add linux-review https://github.com/0day-ci/linux
> git remote update linux-review
> git checkout 8b68fd4330e043645667a5d3306398f8f88f9ff2
> vim +/parent +4893 kernel/cgroup/cgroup.c
>
> a31f2d3ff kernel/cgroup.c Tejun Heo 2012-11-19 4840
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4841 /**
> 6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4842 * css_create - create a cgroup_subsys_state
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4843 * @cgrp: the cgroup new css will be associated with
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4844 * @ss: the subsys of new css
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4845 *
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4846 * Create a new css associated with @cgrp - @ss pair. On success, the new
> 6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4847 * css is online and installed in @cgrp. This function doesn't create the
> 6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4848 * interface files. Returns 0 on success, -errno on failure.
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4849 */
> 6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4850 static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
> 6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4851 struct cgroup_subsys *ss)
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4852 {
> d51f39b05 kernel/cgroup.c Tejun Heo 2014-05-16 4853 struct cgroup *parent = cgroup_parent(cgrp);
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4854 struct cgroup_subsys_state *parent_css = NULL;
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4855 struct cgroup_subsys_state *css;
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4856 int err;
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4857
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4858 lockdep_assert_held(&cgroup_mutex);
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4859
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4860 /*
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4861 * As cgroup may be in bypass mode, need to skip over ancestor
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4862 * cgroups with NULL CSS.
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4863 */
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 @4864 for (; parent && !parent_css; parent = cgroup_parent(parent))
> ^^^^^^^^^^^^^^^^^^^^^
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4865 parent_css = cgroup_css(parent, ss);
>
> When we exit this loop it means either parent is NULL or parent_css
> is non-NULL.
Sorry for the bug. I should have a temp variable to go up
cgroup_parent() iteration chain and leave parent alone. Thanks for
spotting that.
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4866
> 1fed1b2e3 kernel/cgroup.c Tejun Heo 2014-05-16 4867 css = ss->css_alloc(parent_css);
> e7e15b87f kernel/cgroup.c Tejun Heo 2016-06-21 4868 if (!css)
> e7e15b87f kernel/cgroup.c Tejun Heo 2016-06-21 4869 css = ERR_PTR(-ENOMEM);
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4870 if (IS_ERR(css))
> 6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4871 return css;
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4872
> 8b68fd433 kernel/cgroup/cgroup.c Waiman Long 2018-11-20 4873 init_and_link_css(css, ss, cgrp, parent_css);
> a2bed8209 kernel/cgroup.c Tejun Heo 2014-05-04 4874
> 2aad2a86f kernel/cgroup.c Tejun Heo 2014-09-24 4875 err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL);
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4876 if (err)
> 3eb59ec64 kernel/cgroup.c Li Zefan 2014-03-18 4877 goto err_free_css;
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4878
> cf780b7dc kernel/cgroup.c Vladimir Davydov 2015-08-03 4879 err = cgroup_idr_alloc(&ss->css_idr, NULL, 2, 0, GFP_KERNEL);
> 15a4c835e kernel/cgroup.c Tejun Heo 2014-05-04 4880 if (err < 0)
> b00c52dae kernel/cgroup.c Wenwei Tao 2016-05-13 4881 goto err_free_css;
> 15a4c835e kernel/cgroup.c Tejun Heo 2014-05-04 4882 css->id = err;
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4883
> 15a4c835e kernel/cgroup.c Tejun Heo 2014-05-04 4884 /* @css is ready to be brought online now, make it visible */
> 1fed1b2e3 kernel/cgroup.c Tejun Heo 2014-05-16 4885 list_add_tail_rcu(&css->sibling, &parent_css->children);
> 15a4c835e kernel/cgroup.c Tejun Heo 2014-05-04 4886 cgroup_idr_replace(&ss->css_idr, css, css->id);
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4887
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4888 err = online_css(css);
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4889 if (err)
> 1fed1b2e3 kernel/cgroup.c Tejun Heo 2014-05-16 4890 goto err_list_del;
> 944196278 kernel/cgroup.c Tejun Heo 2014-03-19 4891
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4892 if (ss->broken_hierarchy && !ss->warned_broken_hierarchy &&
> d51f39b05 kernel/cgroup.c Tejun Heo 2014-05-16 @4893 cgroup_parent(parent)) {
> ^^^^^^
> We dereference parent inside the function, but I don't know for sure
> if this is reachable when "parent" is NULL.
>
> ed3d261b5 kernel/cgroup.c Joe Perches 2014-04-25 4894 pr_warn("%s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.\n",
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4895 current->comm, current->pid, ss->name);
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4896 if (!strcmp(ss->name, "memory"))
> ed3d261b5 kernel/cgroup.c Joe Perches 2014-04-25 4897 pr_warn("\"memory\" requires setting use_hierarchy to 1 on the root\n");
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4898 ss->warned_broken_hierarchy = true;
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4899 }
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4900
> 6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4901 return css;
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4902
> 1fed1b2e3 kernel/cgroup.c Tejun Heo 2014-05-16 4903 err_list_del:
> 1fed1b2e3 kernel/cgroup.c Tejun Heo 2014-05-16 4904 list_del_rcu(&css->sibling);
> 3eb59ec64 kernel/cgroup.c Li Zefan 2014-03-18 4905 err_free_css:
> 8f53470ba kernel/cgroup/cgroup.c Tejun Heo 2018-04-26 4906 list_del_rcu(&css->rstat_css_node);
> 8f36aaec9 kernel/cgroup/cgroup.c Tejun Heo 2018-03-14 4907 INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
> 8f36aaec9 kernel/cgroup/cgroup.c Tejun Heo 2018-03-14 4908 queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
> 6cd0f5bba kernel/cgroup.c Tejun Heo 2016-03-03 4909 return ERR_PTR(err);
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4910 }
> c81c925ad kernel/cgroup.c Tejun Heo 2013-12-06 4911
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
Cheers,
Longman