perf_event cgroup controller is one of the remaining few with broken
hierarchy support. It turns out it's pretty easy to implement - the
only thing necessary is making perf_cgroup_match() return %true also
when the cgroup of the current task is a descendant of the event's
cgroup. This patchset implements cgroup_is_descendant() and uses it
to implement hierarchy support in perf_event controller.
This patchset contains the following three patches.
0001-cgroup-make-sure-parent-won-t-be-destroyed-before-it.patch
0002-cgroup-implement-cgroup_is_descendant.patch
0003-perf-make-perf_event-cgroup-hierarchical.patch
The patches are also available in the following git branch, which is
based on top of cgroup/for-3.10. It's currently based on top of
cgroup/for-3.10 as the first patch causes non-trivial conflict with it
otherwise, which is not difficult to resolve but still nice to avoid
anyway.
Li, Michal, I picked the first two patches from Li's memcg patchset.
Can we push the first two through cgroup/for-3.10 and put the rest in
-mm?
Ingo, how should these be routed?
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git perf_event-hierarchy-support
diffstat follows, thanks.
include/linux/cgroup.h | 1 +
kernel/cgroup.c | 30 ++++++++++++++++++++++++++++++
kernel/events/core.c | 24 ++++++++++++++++--------
3 files changed, 47 insertions(+), 8 deletions(-)
--
tejun
From: Li Zefan <[email protected]>
A couple controllers want to determine whether two cgroups are in
ancestor/descendant relationship. As it's more likely that the
descendant is the primary subject of interest and there are other
operations focusing on the descendants, let's ask is_descendent rather
than is_ancestor.
Implementation is trivial as the previous patch guarantees that all
ancestors of a cgroup stay accessible as long as the cgroup is
accessible.
tj: Removed depth optimization, renamed from cgroup_is_ancestor(),
rewrote descriptions.
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/cgroup.h | 1 +
kernel/cgroup.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 515927e..45aee0f 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -439,6 +439,7 @@ int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
int cgroup_is_removed(const struct cgroup *cgrp);
+bool cgroup_is_descendant(struct cgroup *cgrp, struct cgroup *ancestor);
int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index d7dfa64..678a22c 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -276,6 +276,26 @@ inline int cgroup_is_removed(const struct cgroup *cgrp)
return test_bit(CGRP_REMOVED, &cgrp->flags);
}
+/**
+ * cgroup_is_descendant - test ancestry
+ * @cgrp: the cgroup to be tested
+ * @ancestor: possible ancestor of @cgrp
+ *
+ * Test whether @cgrp is a descendant of @ancestor. It also returns %true
+ * if @cgrp == @ancestor. This function is safe to call as long as @cgrp
+ * and @ancestor are accessible.
+ */
+bool cgroup_is_descendant(struct cgroup *cgrp, struct cgroup *ancestor)
+{
+ while (cgrp) {
+ if (cgrp == ancestor)
+ return true;
+ cgrp = cgrp->parent;
+ }
+ return false;
+}
+EXPORT_SYMBOL_GPL(cgroup_is_descendant);
+
/* bits in struct cgroupfs_root flags field */
enum {
ROOT_NOPREFIX, /* mounted subsystems have no named prefix */
--
1.8.1.4
From: Li Zefan <[email protected]>
Suppose we rmdir a cgroup and there're still css refs, this cgroup won't
be freed. Then we rmdir the parent cgroup, and the parent is freed
immediately due to css ref draining to 0. Now it would be a disaster if
the still-alive child cgroup tries to access its parent.
Make sure this won't happen.
Signed-off-by: Li Zefan <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
---
kernel/cgroup.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ba3e24a..d7dfa64 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -876,6 +876,13 @@ static void cgroup_free_fn(struct work_struct *work)
mutex_unlock(&cgroup_mutex);
/*
+ * We get a ref to the parent's dentry, and put the ref when
+ * this cgroup is being freed, so it's guaranteed that the
+ * parent won't be destroyed before its children.
+ */
+ dput(cgrp->parent->dentry);
+
+ /*
* Drop the active superblock reference that we took when we
* created the cgroup
*/
@@ -4168,6 +4175,9 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
for_each_subsys(root, ss)
dget(dentry);
+ /* hold a ref to the parent's dentry */
+ dget(parent->dentry);
+
/* creation succeeded, notify subsystems */
for_each_subsys(root, ss) {
err = online_css(ss, cgrp);
--
1.8.1.4
perf_event is one of a couple remaining cgroup controllers with broken
hierarchy support. Converting it to support hierarchy is almost
trivial. The only thing necessary is to consider a task belonging to
a descendant cgroup as a match. IOW, if the cgroup of the currently
executing task (@cpuctx->cgrp) equals or is a descendant of the
event's cgroup (@event->cgrp), then the event should be enabled.
Implement hierarchy support and remove .broken_hierarchy tag along
with the incorrect comment on what needs to be done for hierarchy
support.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Namhyung Kim <[email protected]>
---
kernel/events/core.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b0cd865..310ec19 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -251,7 +251,22 @@ perf_cgroup_match(struct perf_event *event)
struct perf_event_context *ctx = event->ctx;
struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
- return !event->cgrp || event->cgrp == cpuctx->cgrp;
+ /* @event doesn't care about cgroup */
+ if (!event->cgrp)
+ return true;
+
+ /* wants specific cgroup scope but @cpuctx isn't associated with any */
+ if (!cpuctx->cgrp)
+ return false;
+
+ /*
+ * Cgroup scoping is recursive. An event enabled for a cgroup is
+ * also enabled for all its descendant cgroups. If @cpuctx's
+ * cgroup is a descendant of @event's (the test covers identity
+ * case), it's a match.
+ */
+ return cgroup_is_descendant(cpuctx->cgrp->css.cgroup,
+ event->cgrp->css.cgroup);
}
static inline bool perf_tryget_cgroup(struct perf_event *event)
@@ -7509,12 +7524,5 @@ struct cgroup_subsys perf_subsys = {
.css_free = perf_cgroup_css_free,
.exit = perf_cgroup_exit,
.attach = perf_cgroup_attach,
-
- /*
- * perf_event cgroup doesn't handle nesting correctly.
- * ctx->nr_cgroups adjustments should be propagated through the
- * cgroup hierarchy. Fix it and remove the following.
- */
- .broken_hierarchy = true,
};
#endif /* CONFIG_CGROUP_PERF */
--
1.8.1.4
On 2013/4/9 10:23, Tejun Heo wrote:
> perf_event cgroup controller is one of the remaining few with broken
> hierarchy support. It turns out it's pretty easy to implement - the
> only thing necessary is making perf_cgroup_match() return %true also
> when the cgroup of the current task is a descendant of the event's
> cgroup. This patchset implements cgroup_is_descendant() and uses it
> to implement hierarchy support in perf_event controller.
>
> This patchset contains the following three patches.
>
> 0001-cgroup-make-sure-parent-won-t-be-destroyed-before-it.patch
> 0002-cgroup-implement-cgroup_is_descendant.patch
> 0003-perf-make-perf_event-cgroup-hierarchical.patch
>
> The patches are also available in the following git branch, which is
> based on top of cgroup/for-3.10. It's currently based on top of
> cgroup/for-3.10 as the first patch causes non-trivial conflict with it
> otherwise, which is not difficult to resolve but still nice to avoid
> anyway.
>
> Li, Michal, I picked the first two patches from Li's memcg patchset.
> Can we push the first two through cgroup/for-3.10 and put the rest in
> -mm?
>
Sure. Andrew asked me to resend the memcg patchset when we are at 3.10-rc1,
because 3.9-rc6 is a bit too late and he's too busy.
> Ingo, how should these be routed?
On Mon, 2013-04-08 at 19:23 -0700, Tejun Heo wrote:
> perf_event cgroup controller is one of the remaining few with broken
> hierarchy support. It turns out it's pretty easy to implement - the
> only thing necessary is making perf_cgroup_match() return %true also
> when the cgroup of the current task is a descendant of the event's
> cgroup. This patchset implements cgroup_is_descendant() and uses it
> to implement hierarchy support in perf_event controller.
>
> This patchset contains the following three patches.
>
> 0001-cgroup-make-sure-parent-won-t-be-destroyed-before-it.patch
> 0002-cgroup-implement-cgroup_is_descendant.patch
> 0003-perf-make-perf_event-cgroup-hierarchical.patch
Thanks, looks simple and straight fwd.
> Ingo, how should these be routed?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git perf_event-hierarchy-support
Ingo typically likes multi-maintainer sets to be pulled from a common
tree into all relevant maintainer trees so that git merges afterwards
just-work (tm). But I'll let him expand.
* Peter Zijlstra <[email protected]> wrote:
> On Mon, 2013-04-08 at 19:23 -0700, Tejun Heo wrote:
> > perf_event cgroup controller is one of the remaining few with broken
> > hierarchy support. It turns out it's pretty easy to implement - the
> > only thing necessary is making perf_cgroup_match() return %true also
> > when the cgroup of the current task is a descendant of the event's
> > cgroup. This patchset implements cgroup_is_descendant() and uses it
> > to implement hierarchy support in perf_event controller.
> >
> > This patchset contains the following three patches.
> >
> > 0001-cgroup-make-sure-parent-won-t-be-destroyed-before-it.patch
> > 0002-cgroup-implement-cgroup_is_descendant.patch
> > 0003-perf-make-perf_event-cgroup-hierarchical.patch
>
> Thanks, looks simple and straight fwd.
>
> > Ingo, how should these be routed?
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git perf_event-hierarchy-support
>
> Ingo typically likes multi-maintainer sets to be pulled from a common
> tree into all relevant maintainer trees so that git merges afterwards
> just-work (tm). But I'll let him expand.
Yeah - at least for larger changes that's a good workflow.
For smaller changes we can pick one or the other tree. Tejun, do these changes
create any conflicts with the current tip:master tree? If not then you could carry
these changes in your tree. If there's significant conflicts then it might be
better to rebase this on top of perf/core and pull them into perf/core. (assuming
there's no other cgroups prereq patches beyond the ones in this series.)
Thanks,
Ingo
Hello,
On Wed, Apr 10, 2013 at 11:37:55AM +0200, Ingo Molnar wrote:
> Yeah - at least for larger changes that's a good workflow.
>
> For smaller changes we can pick one or the other tree. Tejun, do these changes
> create any conflicts with the current tip:master tree? If not then you could carry
> these changes in your tree. If there's significant conflicts then it might be
> better to rebase this on top of perf/core and pull them into perf/core. (assuming
> there's no other cgroups prereq patches beyond the ones in this series.)
The only conflict is a trivial #include one in
include/linux/res_counter.h which is unrelated to this series. I'll
commit these to cgroup/for-3.10.
Thanks!
--
tejun
On Mon, Apr 08, 2013 at 07:23:30PM -0700, Tejun Heo wrote:
> perf_event cgroup controller is one of the remaining few with broken
> hierarchy support. It turns out it's pretty easy to implement - the
> only thing necessary is making perf_cgroup_match() return %true also
> when the cgroup of the current task is a descendant of the event's
> cgroup. This patchset implements cgroup_is_descendant() and uses it
> to implement hierarchy support in perf_event controller.
Applied to cgroup/for-3.10.
Thanks.
--
tejun