2024-05-28 16:37:45

by T.J. Mercier

[permalink] [raw]
Subject: [PATCH 1/2] cgroup: Fix /proc/cgroups count for v2

The /proc/cgroups documentation says that the num_cgroups value is,
"the number of control groups in this hierarchy using this controller."

The value printed is simply the total number of cgroups in the hierarchy
which is correct for v1, but not for the shared v2 hierarchy.

Consider:
controllers="cpuset cpu io memory hugetlb pids rdma misc"
for c in $controllers
do
echo +$c > /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/$c
echo +$c > /sys/fs/cgroup/$c/cgroup.subtree_control
for i in `seq 100`; do mkdir /sys/fs/cgroup/$c/$i; done
done
cat /proc/cgroups

cpuset 0 809 1
cpu 0 809 1
cpuacct 0 809 1
blkio 0 809 1
memory 0 809 1
devices 0 809 1
freezer 0 809 1
net_cls 0 809 1
perf_event 0 809 1
net_prio 0 809 1
hugetlb 0 809 1
pids 0 809 1
rdma 0 809 1
misc 0 809 1
debug 0 809 1

A count of 809 is reported for each controller, but only 109 should be
reported for most of them since each controller is enabled in only part
of the hierarchy. (Note that io depends on memcg, so its count should be
209.)

The number of cgroups using a controller is an important metric since
kernel memory is used for each cgroup, and some kernel operations scale
with the number of cgroups for some controllers (memory, io). So users
have an interest in minimizing/tracking the number of them.

Signed-off-by: T.J. Mercier <[email protected]>

---
Changes from RFC:
Don't manually initialize the atomic counters to 0 since they are
kzalloced - Michal Koutny

Also return the CSS count for utility controllers instead of the cgroup
count - Michal Koutny

include/linux/cgroup-defs.h | 6 ++++++
kernel/cgroup/cgroup-v1.c | 8 ++++++--
kernel/cgroup/cgroup.c | 2 ++
3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index ea48c861cd36..bc1dbf7652c4 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -579,6 +579,12 @@ struct cgroup_root {
/* Number of cgroups in the hierarchy, used only for /proc/cgroups */
atomic_t nr_cgrps;

+ /*
+ * Number of cgroups using each controller. Includes online and zombies.
+ * Used only for /proc/cgroups.
+ */
+ atomic_t nr_css[CGROUP_SUBSYS_COUNT];
+
/* Hierarchy-specific flags */
unsigned int flags;

diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index b9dbf6bf2779..9bad59486c46 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -675,11 +675,15 @@ int proc_cgroupstats_show(struct seq_file *m, void *v)
* cgroup_mutex contention.
*/

- for_each_subsys(ss, i)
+ for_each_subsys(ss, i) {
+ int count = cgroup_on_dfl(&ss->root->cgrp) ?
+ atomic_read(&ss->root->nr_css[i]) : atomic_read(&ss->root->nr_cgrps);
+
seq_printf(m, "%s\t%d\t%d\t%d\n",
ss->legacy_name, ss->root->hierarchy_id,
- atomic_read(&ss->root->nr_cgrps),
+ count,
cgroup_ssid_enabled(i));
+ }

return 0;
}
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index e32b6972c478..1bacd7cf7551 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5362,6 +5362,7 @@ static void css_free_rwork_fn(struct work_struct *work)
ss->css_free(css);
cgroup_idr_remove(&ss->css_idr, id);
cgroup_put(cgrp);
+ atomic_dec(&ss->root->nr_css[ss->id]);

if (parent)
css_put(parent);
@@ -5504,6 +5505,7 @@ static int online_css(struct cgroup_subsys_state *css)
atomic_inc(&css->online_cnt);
if (css->parent)
atomic_inc(&css->parent->online_cnt);
+ atomic_inc(&ss->root->nr_css[ss->id]);
}
return ret;
}

base-commit: 6fbf71854e2ddea7c99397772fbbb3783bfe15b5
--
2.45.1.288.g0e0cd299f1-goog



2024-05-28 19:46:13

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix /proc/cgroups count for v2

Hello,

On Tue, May 28, 2024 at 04:37:12PM +0000, T.J. Mercier wrote:
> The number of cgroups using a controller is an important metric since
> kernel memory is used for each cgroup, and some kernel operations scale
> with the number of cgroups for some controllers (memory, io). So users
> have an interest in minimizing/tracking the number of them.

I agree that this can be a useful metric but am not sure /proc/cgroups is
the right place to put it. Its use of v1 controller names, listing of
controllers that don't exist in v2 and the unnecessary column are rather
ugly and unnecessary. Also, it isn't hierarchical which can make
understanding where the css's are staying allocated difficult.

In v2, cgroup.controllers and cgroup.subtree_control govern which
controllers are available and enabled in the subtree. I think it would make
sense to introduce something in a similar fashion. Can't think of a good
name off the top of my head but add a cgroup. file which lists the
controllers in the subtree along with the number of css's.

Thanks.

--
tejun

2024-05-28 21:42:40

by Michal Koutný

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix /proc/cgroups count for v2

On Tue, May 28, 2024 at 09:43:08AM GMT, Tejun Heo <[email protected]> wrote:
> I agree that this can be a useful metric but am not sure /proc/cgroups is
> the right place to put it. Its use of v1 controller names, listing of
> controllers that don't exist in v2 and the unnecessary column are rather
> ugly and unnecessary.

At the same time, the info provided currently is incorrect or at least
misleading (when only v2 hierarchy is mounted, it mixes the counts) --
that's what T.J.'s patch attempts to rectify in my understanding.

> In v2, cgroup.controllers and cgroup.subtree_control govern which
> controllers are available and enabled in the subtree.

Yes, users could sum up cgroup.controllers contents for true v2
fingerprint.

> I think it would make sense to introduce something in a similar
> fashion. Can't think of a good name off the top of my head but add a
> cgroup. file which lists the controllers in the subtree along with the
> number of css's.

BTW, there is the 'debug' subsys that has (almost) exactly that:
'debug.csses' -- it's in v1 fashion though so it won't show hierarchical
sums.

Michal

2024-05-28 22:30:30

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix /proc/cgroups count for v2

Hello,

On Tue, May 28, 2024 at 11:42:13PM +0200, Michal Koutn? wrote:
> On Tue, May 28, 2024 at 09:43:08AM GMT, Tejun Heo <[email protected]> wrote:
> > I agree that this can be a useful metric but am not sure /proc/cgroups is
> > the right place to put it. Its use of v1 controller names, listing of
> > controllers that don't exist in v2 and the unnecessary column are rather
> > ugly and unnecessary.
>
> At the same time, the info provided currently is incorrect or at least
> misleading (when only v2 hierarchy is mounted, it mixes the counts) --
> that's what T.J.'s patch attempts to rectify in my understanding.

Yeah, I was hoping to phase out that file once folks are all on v2.

Thanks.

--
tejun

2024-05-28 22:38:40

by T.J. Mercier

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix /proc/cgroups count for v2

On Tue, May 28, 2024 at 2:42 PM Michal Koutný <[email protected]> wrote:
>
> On Tue, May 28, 2024 at 09:43:08AM GMT, Tejun Heo <[email protected]> wrote:
> > I agree that this can be a useful metric but am not sure /proc/cgroups is
> > the right place to put it. Its use of v1 controller names, listing of
> > controllers that don't exist in v2 and the unnecessary column are rather
> > ugly and unnecessary.
>
> At the same time, the info provided currently is incorrect or at least
> misleading (when only v2 hierarchy is mounted, it mixes the counts) --
> that's what T.J.'s patch attempts to rectify in my understanding.
>
Yeah, I meant this more as a fix for the current interface which would
still be wrong if we added a new file. I intended to find and add a
Fixes: line too, but I forgot!

> > In v2, cgroup.controllers and cgroup.subtree_control govern which
> > controllers are available and enabled in the subtree.
>
> Yes, users could sum up cgroup.controllers contents for true v2
> fingerprint.
>
This usually gets close but doesn't capture zombies, where /proc/cgroups does.

memcg is enabled in every cgroup in this example, so just find all
dirs and count them without checking cgroup.subtree_control
# find /sys/fs/cgroup/ -type d|wc -l ; grep memory /proc/cgroups
440
memory 0 465 1

Last year I found a driver bug that was causing thousands of zombies
which I only noticed from the huge difference between these two
values.

> > I think it would make sense to introduce something in a similar
> > fashion. Can't think of a good name off the top of my head but add a
> > cgroup. file which lists the controllers in the subtree along with the
> > number of css's.
>
> BTW, there is the 'debug' subsys that has (almost) exactly that:
> 'debug.csses' -- it's in v1 fashion though so it won't show hierarchical
> sums.
>
> Michal

2024-05-28 22:38:53

by T.J. Mercier

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix /proc/cgroups count for v2

On Tue, May 28, 2024 at 3:30 PM Tejun Heo <[email protected]> wrote:
>
> Hello,
>
> On Tue, May 28, 2024 at 11:42:13PM +0200, Michal Koutnı wrote:
> > On Tue, May 28, 2024 at 09:43:08AM GMT, Tejun Heo <[email protected]> wrote:
> > > I agree that this can be a useful metric but am not sure /proc/cgroups is
> > > the right place to put it. Its use of v1 controller names, listing of
> > > controllers that don't exist in v2 and the unnecessary column are rather
> > > ugly and unnecessary.
> >
> > At the same time, the info provided currently is incorrect or at least
> > misleading (when only v2 hierarchy is mounted, it mixes the counts) --
> > that's what T.J.'s patch attempts to rectify in my understanding.
>
> Yeah, I was hoping to phase out that file once folks are all on v2.

I'll buy a round of drinks when that happens, but aren't we a few
years out from that? :)

>
> Thanks.
>
> --
> tejun

2024-05-28 22:41:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix /proc/cgroups count for v2

Hello,

On Tue, May 28, 2024 at 03:38:07PM -0700, T.J. Mercier wrote:
> On Tue, May 28, 2024 at 3:30 PM Tejun Heo <[email protected]> wrote:
> > > At the same time, the info provided currently is incorrect or at least
> > > misleading (when only v2 hierarchy is mounted, it mixes the counts) --
> > > that's what T.J.'s patch attempts to rectify in my understanding.
> >
> > Yeah, I was hoping to phase out that file once folks are all on v2.
>
> I'll buy a round of drinks when that happens, but aren't we a few
> years out from that? :)

I don't think we're too far off from at least making cgroup1 a CONFIG
option. As for the /proc/cgroups file, it's mostly useless on cgroup2, so I
was hoping that this could be put behind the same CONFIG option. I haven't
really thought through it tho, so it's not a hard set plan or anything.

Thanks.

--
tejun

2024-05-28 22:42:50

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix /proc/cgroups count for v2

On Tue, May 28, 2024 at 03:38:01PM -0700, T.J. Mercier wrote:
> > > I think it would make sense to introduce something in a similar
> > > fashion. Can't think of a good name off the top of my head but add a
> > > cgroup. file which lists the controllers in the subtree along with the
> > > number of css's.
> >
> > BTW, there is the 'debug' subsys that has (almost) exactly that:
> > 'debug.csses' -- it's in v1 fashion though so it won't show hierarchical
> > sums.

Yeah, something like that but hierarchical and built into cgroup2 interface.
Would that work for your use case?

Thanks.

--
tejun

2024-05-28 22:46:21

by T.J. Mercier

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix /proc/cgroups count for v2

On Tue, May 28, 2024 at 3:42 PM Tejun Heo <[email protected]> wrote:
>
> On Tue, May 28, 2024 at 03:38:01PM -0700, T.J. Mercier wrote:
> > > > I think it would make sense to introduce something in a similar
> > > > fashion. Can't think of a good name off the top of my head but add a
> > > > cgroup. file which lists the controllers in the subtree along with the
> > > > number of css's.
> > >
> > > BTW, there is the 'debug' subsys that has (almost) exactly that:
> > > 'debug.csses' -- it's in v1 fashion though so it won't show hierarchical
> > > sums.
>
> Yeah, something like that but hierarchical and built into cgroup2 interface.
> Would that work for your use case?
>
I think so, I'm checking this out now. debug as v1-only and non-stable
interface files doesn't work, but the same sort of thing with a stable
interface on v2 seems like it would.

> Thanks.
>
> --
> tejun

2024-05-28 23:06:03

by T.J. Mercier

[permalink] [raw]
Subject: Re: [PATCH 1/2] cgroup: Fix /proc/cgroups count for v2

On Tue, May 28, 2024 at 3:45 PM T.J. Mercier <[email protected]> wrote:
>
> On Tue, May 28, 2024 at 3:42 PM Tejun Heo <[email protected]> wrote:
> >
> > On Tue, May 28, 2024 at 03:38:01PM -0700, T.J. Mercier wrote:
> > > > > I think it would make sense to introduce something in a similar
> > > > > fashion. Can't think of a good name off the top of my head but add a
> > > > > cgroup. file which lists the controllers in the subtree along with the
> > > > > number of css's.
> > > >
> > > > BTW, there is the 'debug' subsys that has (almost) exactly that:
> > > > 'debug.csses' -- it's in v1 fashion though so it won't show hierarchical
> > > > sums.
> >
> > Yeah, something like that but hierarchical and built into cgroup2 interface.
> > Would that work for your use case?
> >
> I think so, I'm checking this out now. debug as v1-only and non-stable
> interface files doesn't work, but the same sort of thing with a stable
> interface on v2 seems like it would.
>
Ok debug.cgroup_subsys_states reports css->css->online_cnt, but I'd
also want to include the number that are dying. Either as a separate
count or combined with the online_cnt like in /proc/cgroups.