This makes pids.events:max affine to pids.max limit.
How are the new events supposed to be useful?
- pids.events.local:max
- tells that cgroup's limit is hit (too tight?)
- pids.events:*
- "only" directs top-down search to cgroups of interest
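As a rough illustration of how a monitoring tool might consume these counters, here is a minimal userspace sketch that extracts a value from the text of a flat-keyed file such as pids.events or pids.events.local. The "max <count>" line format follows the seq_printf() call in this series; the helper name flat_key_value() is hypothetical, and reading the file from the cgroup filesystem is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up "key" in a flat-keyed buffer such as the contents of pids.events
 * ("max 3\n"). Returns the value, or -1 if the key is absent or malformed.
 * Hypothetical helper for illustration, not part of the series.
 */
long long flat_key_value(const char *buf, const char *key)
{
	size_t klen;

	assert(buf && key);
	klen = strlen(key);

	while (*buf) {
		const char *eol = strchr(buf, '\n');

		/* A match needs the whole key followed by a space. */
		if (!strncmp(buf, key, klen) && buf[klen] == ' ') {
			const char *num = buf + klen + 1;
			char *end;
			long long val = strtoll(num, &end, 10);

			/* Require at least one digit and nothing after it. */
			if (end != num && (*end == '\n' || *end == '\0'))
				return val;
			return -1;
		}
		if (!eol)
			break;
		buf = eol + 1;
	}
	return -1;
}
```

A top-down search could call this on pids.events of each cgroup, descend only into children whose "max" value is non-zero, and then consult pids.events.local to see whether that cgroup's own limit was the one hit.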
Changes from v3 (https://lore.kernel.org/r/[email protected])
- use existing functions for TAP output in selftest (Muhammad)
- formatting in selftest (Muhammad)
- remove pids.events:max.imposed event, keep it internal (Johannes)
- allow legacy behavior with a mount option
- detach migration charging patches
- drop RFC prefix
Changes from v2 (https://lore.kernel.org/r/[email protected])
- implemented pids.events.local (Tejun)
- added migration charging
[1] https://lore.kernel.org/r/[email protected]/
Michal Koutný (6):
cgroup/pids: Remove superfluous zeroing
cgroup/pids: Separate semantics of pids.events related to pids.max
cgroup/pids: Make event counters hierarchical
cgroup/pids: Add pids.events.local
selftests: cgroup: Lexicographic order in Makefile
selftests: cgroup: Add basic tests for pids controller
Documentation/admin-guide/cgroup-v1/pids.rst | 3 +-
Documentation/admin-guide/cgroup-v2.rst | 12 ++
include/linux/cgroup-defs.h | 7 +-
kernel/cgroup/cgroup.c | 15 +-
kernel/cgroup/pids.c | 131 +++++++++++---
tools/testing/selftests/cgroup/.gitignore | 11 +-
tools/testing/selftests/cgroup/Makefile | 25 +--
tools/testing/selftests/cgroup/test_pids.c | 178 +++++++++++++++++++
8 files changed, 341 insertions(+), 41 deletions(-)
create mode 100644 tools/testing/selftests/cgroup/test_pids.c
base-commit: 026e680b0a08a62b1d948e5a8ca78700bfac0e6e
--
2.44.0
The pids.events file should honor the hierarchy, so make the events
propagate from their origin up to the root on the unified hierarchy. The
legacy behavior remains non-hierarchical.
Signed-off-by: Michal Koutný <[email protected]>
---
Documentation/admin-guide/cgroup-v2.rst | 2 +-
kernel/cgroup/pids.c | 46 ++++++++++++++++---------
2 files changed, 31 insertions(+), 17 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 108b03dfb26a..aa97e9f91c51 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -241,7 +241,7 @@ cgroup v2 currently supports the following mount options.
pids_localevents
Represent fork failures inside cgroup's pids.events:max (not its limit
- being hit).
+ being hit) and exclude subtree events from pids.events.
Organizing Processes and Threads
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index ea1fc6b37c0d..4ad28109c1c8 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -238,6 +238,34 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
}
}
+static void pids_event(struct pids_cgroup *pids_forking,
+ struct pids_cgroup *pids_over_limit)
+{
+ struct pids_cgroup *p = pids_forking;
+ bool limit = false;
+
+ for (; parent_pids(p); p = parent_pids(p)) {
+ /* Only log the first time limit is hit. */
+ if (atomic64_inc_return(&p->events[PIDCG_FORKFAIL]) == 1) {
+ pr_info("cgroup: fork rejected by pids controller in ");
+ pr_cont_cgroup_path(p->css.cgroup);
+ pr_cont("\n");
+ }
+ cgroup_file_notify(&p->events_file);
+
+ if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
+ cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+ break;
+
+ if (p == pids_over_limit)
+ limit = true;
+ if (limit)
+ atomic64_inc(&p->events[PIDCG_MAX]);
+
+ cgroup_file_notify(&p->events_file);
+ }
+}
+
/*
* task_css_check(true) in pids_can_fork() and pids_cancel_fork() relies
* on cgroup_threadgroup_change_begin() held by the copy_process().
@@ -254,23 +282,9 @@ static int pids_can_fork(struct task_struct *task, struct css_set *cset)
css = task_css_check(current, pids_cgrp_id, true);
pids = css_pids(css);
err = pids_try_charge(pids, 1, &pids_over_limit);
- if (err) {
- /* compatibility on v1 where events were notified in leaves. */
- if (!cgroup_subsys_on_dfl(pids_cgrp_subsys))
- pids_over_limit = pids;
-
- /* Only log the first time limit is hit. */
- if (atomic64_inc_return(&pids->events[PIDCG_FORKFAIL]) == 1) {
- pr_info("cgroup: fork rejected by pids controller in ");
- pr_cont_cgroup_path(pids->css.cgroup);
- pr_cont("\n");
- }
- atomic64_inc(&pids_over_limit->events[PIDCG_MAX]);
+ if (err)
+ pids_event(pids, pids_over_limit);
- cgroup_file_notify(&pids->events_file);
- if (pids_over_limit != pids)
- cgroup_file_notify(&pids_over_limit->events_file);
- }
return err;
}
--
2.44.0
Currently, when the pids.max limit is breached in the hierarchy, the event
is counted and reported in the cgroup where the forking task resides.
This decouples the limit from the notification it causes, making it hard
to detect which cgroup's limit actually took effect.
Redefine pids.events:max as the number of times the limit of the
cgroup was hit.
(The implementation also distinguishes a "forkfail" event, but this is
currently not exposed as it would better fit into pids.stat. It
differs from pids.events:max only when pids.max is configured on
non-leaf cgroups.)
Since it changes semantics of the original "max" event, introduce this
change only in the v2 API of the controller and add a cgroup2 mount
option to revert to the legacy behavior.
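To tell which semantics are in effect, a tool could look for the option among the mount options of the cgroup2 filesystem. The sketch below only does the string matching. The option name is spelled as in the fsparam_flag() entry and cgroup_show_options() of this patch. The helper name mount_opts_contain() is hypothetical, and obtaining the option string (for instance from /proc/mounts) is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the comma-separated option string "opts" contains "name"
 * as a whole option (no prefix or substring matches).
 * Hypothetical helper for illustration, not part of the series.
 */
bool mount_opts_contain(const char *opts, const char *name)
{
	size_t nlen;

	assert(opts && name);
	nlen = strlen(name);
	if (!nlen)
		return false;

	while (*opts) {
		/* Length of the current option, up to the next comma. */
		size_t len = strcspn(opts, ",");

		if (len == nlen && !strncmp(opts, name, nlen))
			return true;
		opts += len;
		if (*opts == ',')
			opts++;
	}
	return false;
}
```

With this, mount_opts_contain(opts, "pids_localevents") returning true would suggest that pids.events:max counts fork failures inside the cgroup rather than hits of its own limit.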
Signed-off-by: Michal Koutný <[email protected]>
---
Documentation/admin-guide/cgroup-v1/pids.rst | 3 +-
Documentation/admin-guide/cgroup-v2.rst | 12 ++++++
include/linux/cgroup-defs.h | 7 ++-
kernel/cgroup/cgroup.c | 15 ++++++-
kernel/cgroup/pids.c | 45 ++++++++++++++------
5 files changed, 67 insertions(+), 15 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v1/pids.rst b/Documentation/admin-guide/cgroup-v1/pids.rst
index 6acebd9e72c8..0f9f9a7b1f6c 100644
--- a/Documentation/admin-guide/cgroup-v1/pids.rst
+++ b/Documentation/admin-guide/cgroup-v1/pids.rst
@@ -36,7 +36,8 @@ superset of parent/child/pids.current.
The pids.events file contains event counters:
- - max: Number of times fork failed because limit was hit.
+ - max: Number of times fork failed in the cgroup because limit was hit in
+ self or ancestors.
Example
-------
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 17e6e9565156..108b03dfb26a 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -239,6 +239,10 @@ cgroup v2 currently supports the following mount options.
will not be tracked by the memory controller (even if cgroup
v2 is remounted later on).
+ pids_localevents
+ Represent fork failures inside cgroup's pids.events:max (not its limit
+ being hit).
+
Organizing Processes and Threads
--------------------------------
@@ -2186,6 +2190,14 @@ PID Interface Files
The number of processes currently in the cgroup and its
descendants.
+ pids.events
+ A read-only flat-keyed file which exists on non-root cgroups. Unless
+ specified otherwise, a value change in this file generates a file modified
+ event. The following entries are defined.
+
+ max
+ The number of times the limit of the cgroup was hit.
+
Organisational operations are not blocked by cgroup policies, so it is
possible to have pids.current > pids.max. This can be done by either
setting the limit to be smaller than pids.current, or attaching enough
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index ea48c861cd36..b36690ca0d3f 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -119,7 +119,12 @@ enum {
/*
* Enable hugetlb accounting for the memory controller.
*/
- CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING = (1 << 19),
+ CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING = (1 << 19),
+
+ /*
+ * Enable legacy local pids.events.
+ */
+ CGRP_ROOT_PIDS_LOCAL_EVENTS = (1 << 20),
};
/* cftype->flags */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index a66c088c851c..306af389a78a 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1922,6 +1922,7 @@ enum cgroup2_param {
Opt_memory_localevents,
Opt_memory_recursiveprot,
Opt_memory_hugetlb_accounting,
+ Opt_pids_localevents,
nr__cgroup2_params
};
@@ -1931,6 +1932,7 @@ static const struct fs_parameter_spec cgroup2_fs_parameters[] = {
fsparam_flag("memory_localevents", Opt_memory_localevents),
fsparam_flag("memory_recursiveprot", Opt_memory_recursiveprot),
fsparam_flag("memory_hugetlb_accounting", Opt_memory_hugetlb_accounting),
+ fsparam_flag("pids_localevents", Opt_pids_localevents),
{}
};
@@ -1960,6 +1962,9 @@ static int cgroup2_parse_param(struct fs_context *fc, struct fs_parameter *param
case Opt_memory_hugetlb_accounting:
ctx->flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
return 0;
+ case Opt_pids_localevents:
+ ctx->flags |= CGRP_ROOT_PIDS_LOCAL_EVENTS;
+ return 0;
}
return -EINVAL;
}
@@ -1989,6 +1994,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags)
cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
else
cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
+
+ if (root_flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+ cgrp_dfl_root.flags |= CGRP_ROOT_PIDS_LOCAL_EVENTS;
+ else
+ cgrp_dfl_root.flags &= ~CGRP_ROOT_PIDS_LOCAL_EVENTS;
}
}
@@ -2004,6 +2014,8 @@ static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root
seq_puts(seq, ",memory_recursiveprot");
if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING)
seq_puts(seq, ",memory_hugetlb_accounting");
+ if (cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+ seq_puts(seq, ",pids_localevents");
return 0;
}
@@ -7061,7 +7073,8 @@ static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr,
"favordynmods\n"
"memory_localevents\n"
"memory_recursiveprot\n"
- "memory_hugetlb_accounting\n");
+ "memory_hugetlb_accounting\n"
+ "pids_localevents\n");
}
static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features);
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 0e5ec7d59b4d..ea1fc6b37c0d 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -38,6 +38,14 @@
#define PIDS_MAX (PID_MAX_LIMIT + 1ULL)
#define PIDS_MAX_STR "max"
+enum pidcg_event {
+ /* Fork failed in subtree because this pids_cgroup limit was hit. */
+ PIDCG_MAX,
+ /* Fork failed in this pids_cgroup because ancestor limit was hit. */
+ PIDCG_FORKFAIL,
+ NR_PIDCG_EVENTS,
+};
+
struct pids_cgroup {
struct cgroup_subsys_state css;
@@ -52,8 +60,7 @@ struct pids_cgroup {
/* Handle for "pids.events" */
struct cgroup_file events_file;
- /* Number of times fork failed because limit was hit. */
- atomic64_t events_limit;
+ atomic64_t events[NR_PIDCG_EVENTS];
};
static struct pids_cgroup *css_pids(struct cgroup_subsys_state *css)
@@ -148,12 +155,13 @@ static void pids_charge(struct pids_cgroup *pids, int num)
* pids_try_charge - hierarchically try to charge the pid count
* @pids: the pid cgroup state
* @num: the number of pids to charge
+ * @fail: storage of pid cgroup causing the fail
*
* This function follows the set limit. It will fail if the charge would cause
* the new value to exceed the hierarchical limit. Returns 0 if the charge
* succeeded, otherwise -EAGAIN.
*/
-static int pids_try_charge(struct pids_cgroup *pids, int num)
+static int pids_try_charge(struct pids_cgroup *pids, int num, struct pids_cgroup **fail)
{
struct pids_cgroup *p, *q;
@@ -166,9 +174,10 @@ static int pids_try_charge(struct pids_cgroup *pids, int num)
* p->limit is %PIDS_MAX then we know that this test will never
* fail.
*/
- if (new > limit)
+ if (new > limit) {
+ *fail = p;
goto revert;
-
+ }
/*
* Not technically accurate if we go over limit somewhere up
* the hierarchy, but that's tolerable for the watermark.
@@ -236,7 +245,7 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
static int pids_can_fork(struct task_struct *task, struct css_set *cset)
{
struct cgroup_subsys_state *css;
- struct pids_cgroup *pids;
+ struct pids_cgroup *pids, *pids_over_limit;
int err;
if (cset)
@@ -244,15 +253,23 @@ static int pids_can_fork(struct task_struct *task, struct css_set *cset)
else
css = task_css_check(current, pids_cgrp_id, true);
pids = css_pids(css);
- err = pids_try_charge(pids, 1);
+ err = pids_try_charge(pids, 1, &pids_over_limit);
if (err) {
- /* Only log the first time events_limit is incremented. */
- if (atomic64_inc_return(&pids->events_limit) == 1) {
+ /* compatibility on v1 where events were notified in leaves. */
+ if (!cgroup_subsys_on_dfl(pids_cgrp_subsys))
+ pids_over_limit = pids;
+
+ /* Only log the first time limit is hit. */
+ if (atomic64_inc_return(&pids->events[PIDCG_FORKFAIL]) == 1) {
pr_info("cgroup: fork rejected by pids controller in ");
- pr_cont_cgroup_path(css->cgroup);
+ pr_cont_cgroup_path(pids->css.cgroup);
pr_cont("\n");
}
+ atomic64_inc(&pids_over_limit->events[PIDCG_MAX]);
+
cgroup_file_notify(&pids->events_file);
+ if (pids_over_limit != pids)
+ cgroup_file_notify(&pids_over_limit->events_file);
}
return err;
}
@@ -340,8 +357,13 @@ static s64 pids_peak_read(struct cgroup_subsys_state *css,
static int pids_events_show(struct seq_file *sf, void *v)
{
struct pids_cgroup *pids = css_pids(seq_css(sf));
+ enum pidcg_event pe = PIDCG_MAX;
+
+ if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
+ cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+ pe = PIDCG_FORKFAIL;
- seq_printf(sf, "max %lld\n", (s64)atomic64_read(&pids->events_limit));
+ seq_printf(sf, "max %lld\n", (s64)atomic64_read(&pids->events[pe]));
return 0;
}
@@ -379,7 +401,6 @@ struct cgroup_subsys pids_cgrp_subsys = {
.can_fork = pids_can_fork,
.cancel_fork = pids_cancel_fork,
.release = pids_release,
- .legacy_cftypes = pids_files,
.dfl_cftypes = pids_files,
.threaded = true,
};
--
2.44.0
Hierarchical counting of events is not practical for watching when a
particular pids.max is being hit. Therefore introduce a .local flavor of
the events file (akin to the memory controller) that collects only events
relevant to the given cgroup.
The file is only added to the default hierarchy.
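The difference between the two counters can be sketched with a small userspace model. This is a simplification under stated assumptions, not the kernel code: struct toy_pids and toy_fork() are hypothetical names, there is no locking or atomics, the limit is checked before charging instead of charging and reverting, and only the default-hierarchy behavior without the legacy mount option is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct pids_cgroup; all names here are illustrative. */
struct toy_pids {
	struct toy_pids *parent;	/* NULL for the root */
	long limit;			/* negative models "max" (unlimited) */
	long counter;			/* tasks charged to this subtree */
	long max_hier;			/* models pids.events:max */
	long max_local;			/* models pids.events.local:max */
};

/*
 * Model one fork in "forking". On success every non-root level is charged
 * and true is returned. On failure nothing is charged, the cgroup whose
 * limit was hit gets a local event, and that cgroup plus its non-root
 * ancestors get a hierarchical one.
 */
bool toy_fork(struct toy_pids *forking)
{
	struct toy_pids *p, *over = NULL;
	bool limit = false;

	assert(forking);

	/* Find the nearest cgroup whose limit the new task would exceed. */
	for (p = forking; p->parent; p = p->parent) {
		if (p->limit >= 0 && p->counter + 1 > p->limit) {
			over = p;
			break;
		}
	}

	if (!over) {
		for (p = forking; p->parent; p = p->parent)
			p->counter++;
		return true;
	}

	/* Walk up from the forking cgroup, as pids_event() does. */
	for (p = forking; p->parent; p = p->parent) {
		if (p == over) {
			limit = true;
			p->max_local++;
		}
		if (limit)
			p->max_hier++;
	}
	return false;
}
```

In this model, a cgroup below the one whose limit was hit sees neither counter change, the limiting cgroup sees both change, and its ancestors see only the hierarchical one, which is the same shape of result that test_pids_events checks for the child and the parent.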
Signed-off-by: Michal Koutný <[email protected]>
---
kernel/cgroup/pids.c | 88 +++++++++++++++++++++++++++++++++++---------
1 file changed, 71 insertions(+), 17 deletions(-)
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 4ad28109c1c8..6cd15c3785d4 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -57,10 +57,12 @@ struct pids_cgroup {
atomic64_t limit;
int64_t watermark;
- /* Handle for "pids.events" */
+ /* Handles for pids.events[.local] */
struct cgroup_file events_file;
+ struct cgroup_file events_local_file;
atomic64_t events[NR_PIDCG_EVENTS];
+ atomic64_t events_local[NR_PIDCG_EVENTS];
};
static struct pids_cgroup *css_pids(struct cgroup_subsys_state *css)
@@ -244,21 +246,23 @@ static void pids_event(struct pids_cgroup *pids_forking,
struct pids_cgroup *p = pids_forking;
bool limit = false;
- for (; parent_pids(p); p = parent_pids(p)) {
- /* Only log the first time limit is hit. */
- if (atomic64_inc_return(&p->events[PIDCG_FORKFAIL]) == 1) {
- pr_info("cgroup: fork rejected by pids controller in ");
- pr_cont_cgroup_path(p->css.cgroup);
- pr_cont("\n");
- }
- cgroup_file_notify(&p->events_file);
-
- if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
- cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
- break;
+ /* Only log the first time limit is hit. */
+ if (atomic64_inc_return(&p->events_local[PIDCG_FORKFAIL]) == 1) {
+ pr_info("cgroup: fork rejected by pids controller in ");
+ pr_cont_cgroup_path(p->css.cgroup);
+ pr_cont("\n");
+ }
+ cgroup_file_notify(&p->events_local_file);
+ if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
+ cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+ return;
- if (p == pids_over_limit)
+ for (; parent_pids(p); p = parent_pids(p)) {
+ if (p == pids_over_limit) {
limit = true;
+ atomic64_inc(&p->events_local[PIDCG_MAX]);
+ cgroup_file_notify(&p->events_local_file);
+ }
if (limit)
atomic64_inc(&p->events[PIDCG_MAX]);
@@ -368,20 +372,68 @@ static s64 pids_peak_read(struct cgroup_subsys_state *css,
return READ_ONCE(pids->watermark);
}
-static int pids_events_show(struct seq_file *sf, void *v)
+static int __pids_events_show(struct seq_file *sf, bool local)
{
struct pids_cgroup *pids = css_pids(seq_css(sf));
enum pidcg_event pe = PIDCG_MAX;
+ atomic64_t *events;
if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
- cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+ cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS) {
pe = PIDCG_FORKFAIL;
+ local = true;
+ }
+ events = local ? pids->events_local : pids->events;
- seq_printf(sf, "max %lld\n", (s64)atomic64_read(&pids->events[pe]));
+ seq_printf(sf, "max %lld\n", (s64)atomic64_read(&events[pe]));
+ return 0;
+}
+
+static int pids_events_show(struct seq_file *sf, void *v)
+{
+ __pids_events_show(sf, false);
+ return 0;
+}
+
+static int pids_events_local_show(struct seq_file *sf, void *v)
+{
+ __pids_events_show(sf, true);
return 0;
}
static struct cftype pids_files[] = {
+ {
+ .name = "max",
+ .write = pids_max_write,
+ .seq_show = pids_max_show,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
+ {
+ .name = "current",
+ .read_s64 = pids_current_read,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
+ {
+ .name = "peak",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .read_s64 = pids_peak_read,
+ },
+ {
+ .name = "events",
+ .seq_show = pids_events_show,
+ .file_offset = offsetof(struct pids_cgroup, events_file),
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
+ {
+ .name = "events.local",
+ .seq_show = pids_events_local_show,
+ .file_offset = offsetof(struct pids_cgroup, events_local_file),
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
+ { } /* terminate */
+};
+
+static struct cftype pids_files_legacy[] = {
{
.name = "max",
.write = pids_max_write,
@@ -407,6 +459,7 @@ static struct cftype pids_files[] = {
{ } /* terminate */
};
+
struct cgroup_subsys pids_cgrp_subsys = {
.css_alloc = pids_css_alloc,
.css_free = pids_css_free,
@@ -416,5 +469,6 @@ struct cgroup_subsys pids_cgrp_subsys = {
.cancel_fork = pids_cancel_fork,
.release = pids_release,
.dfl_cftypes = pids_files,
+ .legacy_cftypes = pids_files_legacy,
.threaded = true,
};
--
2.44.0
This will reduce the number of conflicts when modifying the lists.
Signed-off-by: Michal Koutný <[email protected]>
---
tools/testing/selftests/cgroup/.gitignore | 10 +++++-----
tools/testing/selftests/cgroup/Makefile | 23 ++++++++++++-----------
2 files changed, 17 insertions(+), 16 deletions(-)
diff --git a/tools/testing/selftests/cgroup/.gitignore b/tools/testing/selftests/cgroup/.gitignore
index 2732e0b29271..ec635a0ef488 100644
--- a/tools/testing/selftests/cgroup/.gitignore
+++ b/tools/testing/selftests/cgroup/.gitignore
@@ -1,11 +1,11 @@
# SPDX-License-Identifier: GPL-2.0-only
-test_memcontrol
test_core
-test_freezer
-test_kmem
-test_kill
test_cpu
test_cpuset
-test_zswap
+test_freezer
test_hugetlb_memcg
+test_kill
+test_kmem
+test_memcontrol
+test_zswap
wait_inotify
diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile
index 00b441928909..f3e1ef69e88d 100644
--- a/tools/testing/selftests/cgroup/Makefile
+++ b/tools/testing/selftests/cgroup/Makefile
@@ -6,26 +6,27 @@ all: ${HELPER_PROGS}
TEST_FILES := with_stress.sh
TEST_PROGS := test_stress.sh test_cpuset_prs.sh
TEST_GEN_FILES := wait_inotify
-TEST_GEN_PROGS = test_memcontrol
-TEST_GEN_PROGS += test_kmem
-TEST_GEN_PROGS += test_core
-TEST_GEN_PROGS += test_freezer
-TEST_GEN_PROGS += test_kill
+# Keep the lists lexicographically sorted
+TEST_GEN_PROGS = test_core
TEST_GEN_PROGS += test_cpu
TEST_GEN_PROGS += test_cpuset
-TEST_GEN_PROGS += test_zswap
+TEST_GEN_PROGS += test_freezer
TEST_GEN_PROGS += test_hugetlb_memcg
+TEST_GEN_PROGS += test_kill
+TEST_GEN_PROGS += test_kmem
+TEST_GEN_PROGS += test_memcontrol
+TEST_GEN_PROGS += test_zswap
LOCAL_HDRS += $(selfdir)/clone3/clone3_selftests.h $(selfdir)/pidfd/pidfd.h
include ../lib.mk
-$(OUTPUT)/test_memcontrol: cgroup_util.c
-$(OUTPUT)/test_kmem: cgroup_util.c
$(OUTPUT)/test_core: cgroup_util.c
-$(OUTPUT)/test_freezer: cgroup_util.c
-$(OUTPUT)/test_kill: cgroup_util.c
$(OUTPUT)/test_cpu: cgroup_util.c
$(OUTPUT)/test_cpuset: cgroup_util.c
-$(OUTPUT)/test_zswap: cgroup_util.c
+$(OUTPUT)/test_freezer: cgroup_util.c
$(OUTPUT)/test_hugetlb_memcg: cgroup_util.c
+$(OUTPUT)/test_kill: cgroup_util.c
+$(OUTPUT)/test_kmem: cgroup_util.c
+$(OUTPUT)/test_memcontrol: cgroup_util.c
+$(OUTPUT)/test_zswap: cgroup_util.c
--
2.44.0
This commit adds (and wires in) a new test program for checking basic pids
controller functionality -- restricting tasks in a cgroup and correct
event counting.
Signed-off-by: Michal Koutný <[email protected]>
---
tools/testing/selftests/cgroup/.gitignore | 1 +
tools/testing/selftests/cgroup/Makefile | 2 +
tools/testing/selftests/cgroup/test_pids.c | 178 +++++++++++++++++++++
3 files changed, 181 insertions(+)
create mode 100644 tools/testing/selftests/cgroup/test_pids.c
diff --git a/tools/testing/selftests/cgroup/.gitignore b/tools/testing/selftests/cgroup/.gitignore
index ec635a0ef488..952e4448bf07 100644
--- a/tools/testing/selftests/cgroup/.gitignore
+++ b/tools/testing/selftests/cgroup/.gitignore
@@ -7,5 +7,6 @@ test_hugetlb_memcg
test_kill
test_kmem
test_memcontrol
+test_pids
test_zswap
wait_inotify
diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile
index f3e1ef69e88d..f5f0886a2c4a 100644
--- a/tools/testing/selftests/cgroup/Makefile
+++ b/tools/testing/selftests/cgroup/Makefile
@@ -15,6 +15,7 @@ TEST_GEN_PROGS += test_hugetlb_memcg
TEST_GEN_PROGS += test_kill
TEST_GEN_PROGS += test_kmem
TEST_GEN_PROGS += test_memcontrol
+TEST_GEN_PROGS += test_pids
TEST_GEN_PROGS += test_zswap
LOCAL_HDRS += $(selfdir)/clone3/clone3_selftests.h $(selfdir)/pidfd/pidfd.h
@@ -29,4 +30,5 @@ $(OUTPUT)/test_hugetlb_memcg: cgroup_util.c
$(OUTPUT)/test_kill: cgroup_util.c
$(OUTPUT)/test_kmem: cgroup_util.c
$(OUTPUT)/test_memcontrol: cgroup_util.c
+$(OUTPUT)/test_pids: cgroup_util.c
$(OUTPUT)/test_zswap: cgroup_util.c
diff --git a/tools/testing/selftests/cgroup/test_pids.c b/tools/testing/selftests/cgroup/test_pids.c
new file mode 100644
index 000000000000..61f939c9bc24
--- /dev/null
+++ b/tools/testing/selftests/cgroup/test_pids.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <linux/limits.h>
+#include <signal.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+#include "cgroup_util.h"
+
+static int run_success(const char *cgroup, void *arg)
+{
+ return 0;
+}
+
+static int run_pause(const char *cgroup, void *arg)
+{
+ return pause();
+}
+
+/*
+ * This test checks that pids.max prevents forking new children above the
+ * specified limit in the cgroup.
+ */
+static int test_pids_max(const char *root)
+{
+ int ret = KSFT_FAIL;
+ char *cg_pids;
+ int pid;
+
+ cg_pids = cg_name(root, "pids_test");
+ if (!cg_pids)
+ goto cleanup;
+
+ if (cg_create(cg_pids))
+ goto cleanup;
+
+ if (cg_read_strcmp(cg_pids, "pids.max", "max\n"))
+ goto cleanup;
+
+ if (cg_write(cg_pids, "pids.max", "2"))
+ goto cleanup;
+
+ if (cg_enter_current(cg_pids))
+ goto cleanup;
+
+ pid = cg_run_nowait(cg_pids, run_pause, NULL);
+ if (pid < 0)
+ goto cleanup;
+
+ if (cg_run_nowait(cg_pids, run_success, NULL) != -1 || errno != EAGAIN)
+ goto cleanup;
+
+ if (kill(pid, SIGINT))
+ goto cleanup;
+
+ ret = KSFT_PASS;
+
+cleanup:
+ cg_enter_current(root);
+ cg_destroy(cg_pids);
+ free(cg_pids);
+
+ return ret;
+}
+
+/*
+ * This test checks that pids.events are counted in cgroup associated with pids.max
+ */
+static int test_pids_events(const char *root)
+{
+ int ret = KSFT_FAIL;
+ char *cg_parent = NULL, *cg_child = NULL;
+ int pid;
+
+ cg_parent = cg_name(root, "pids_parent");
+ cg_child = cg_name(cg_parent, "pids_child");
+ if (!cg_parent || !cg_child)
+ goto cleanup;
+
+ if (cg_create(cg_parent))
+ goto cleanup;
+ if (cg_write(cg_parent, "cgroup.subtree_control", "+pids"))
+ goto cleanup;
+ if (cg_create(cg_child))
+ goto cleanup;
+
+ if (cg_write(cg_parent, "pids.max", "2"))
+ goto cleanup;
+
+ if (cg_read_strcmp(cg_child, "pids.max", "max\n"))
+ goto cleanup;
+
+ if (cg_enter_current(cg_child))
+ goto cleanup;
+
+ pid = cg_run_nowait(cg_child, run_pause, NULL);
+ if (pid < 0)
+ goto cleanup;
+
+ if (cg_run_nowait(cg_child, run_success, NULL) != -1 || errno != EAGAIN)
+ goto cleanup;
+
+ if (kill(pid, SIGINT))
+ goto cleanup;
+
+ if (cg_read_key_long(cg_child, "pids.events", "max ") != 0)
+ goto cleanup;
+ if (cg_read_key_long(cg_parent, "pids.events", "max ") != 1)
+ goto cleanup;
+
+
+ ret = KSFT_PASS;
+
+cleanup:
+ cg_enter_current(root);
+ if (cg_child)
+ cg_destroy(cg_child);
+ if (cg_parent)
+ cg_destroy(cg_parent);
+ free(cg_child);
+ free(cg_parent);
+
+ return ret;
+}
+
+
+
+#define T(x) { x, #x }
+struct pids_test {
+ int (*fn)(const char *root);
+ const char *name;
+} tests[] = {
+ T(test_pids_max),
+ T(test_pids_events),
+};
+#undef T
+
+int main(int argc, char **argv)
+{
+ char root[PATH_MAX];
+
+ ksft_print_header();
+ ksft_set_plan(ARRAY_SIZE(tests));
+ if (cg_find_unified_root(root, sizeof(root)))
+ ksft_exit_skip("cgroup v2 isn't mounted\n");
+
+ /*
+ * Check that pids controller is available:
+ * pids is listed in cgroup.controllers
+ */
+ if (cg_read_strstr(root, "cgroup.controllers", "pids"))
+ ksft_exit_skip("pids controller isn't available\n");
+
+ if (cg_read_strstr(root, "cgroup.subtree_control", "pids"))
+ if (cg_write(root, "cgroup.subtree_control", "+pids"))
+ ksft_exit_skip("Failed to set pids controller\n");
+
+ for (int i = 0; i < ARRAY_SIZE(tests); i++) {
+ switch (tests[i].fn(root)) {
+ case KSFT_PASS:
+ ksft_test_result_pass("%s\n", tests[i].name);
+ break;
+ case KSFT_SKIP:
+ ksft_test_result_skip("%s\n", tests[i].name);
+ break;
+ default:
+ ksft_test_result_fail("%s\n", tests[i].name);
+ break;
+ }
+ }
+
+ ksft_finished();
+}
--
2.44.0
Atomic counters are in kzalloc'd struct. They are zeroed already and
atomic64_t does not need special initialization
(cf kernel/trace/trace_clock.c:trace_counter).
Signed-off-by: Michal Koutný <[email protected]>
---
kernel/cgroup/pids.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 7695e60bcb40..0e5ec7d59b4d 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -75,9 +75,7 @@ pids_css_alloc(struct cgroup_subsys_state *parent)
if (!pids)
return ERR_PTR(-ENOMEM);
- atomic64_set(&pids->counter, 0);
atomic64_set(&pids->limit, PIDS_MAX);
- atomic64_set(&pids->events_limit, 0);
return &pids->css;
}
--
2.44.0
On Tue, Apr 16, 2024 at 04:20:09PM +0200, Michal Koutný wrote:
> Atomic counters are in kzalloc'd struct. They are zeroed already and
> atomic64_t does not need special initialization
> (cf kernel/trace/trace_clock.c:trace_counter).
>
> Signed-off-by: Michal Koutný <[email protected]>
Applied to cgroup/for-6.10.
Thanks.
--
tejun
Hello,
On Tue, Apr 16, 2024 at 04:20:10PM +0200, Michal Koutný wrote:
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 17e6e9565156..108b03dfb26a 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -239,6 +239,10 @@ cgroup v2 currently supports the following mount options.
> will not be tracked by the memory controller (even if cgroup
> v2 is remounted later on).
>
> + pids_localevents
> + Represent fork failures inside cgroup's pids.events:max (not its limit
> + being hit).
It might be useful to be more verbose with the explanation. I'm afraid the
above may be a bit difficult to understand if one doesn't already know what
it's about.
> @@ -379,7 +401,6 @@ struct cgroup_subsys pids_cgrp_subsys = {
> .can_fork = pids_can_fork,
> .cancel_fork = pids_cancel_fork,
> .release = pids_release,
> - .legacy_cftypes = pids_files,
Hmmm.... doesn't this remove all pids files from cgroup1?
Thanks.
--
tejun
On Tue, Apr 16, 2024 at 04:20:12PM +0200, Michal Koutný wrote:
> struct cgroup_subsys pids_cgrp_subsys = {
> .css_alloc = pids_css_alloc,
> .css_free = pids_css_free,
> @@ -416,5 +469,6 @@ struct cgroup_subsys pids_cgrp_subsys = {
> .cancel_fork = pids_cancel_fork,
> .release = pids_release,
> .dfl_cftypes = pids_files,
> + .legacy_cftypes = pids_files_legacy,
Ah, you restore it here. I see what you're doing now. It may be better to
reorder patches so that .local is added first or just keep the legacy file
behavior temporarily altered than removing them altogether, but this isn't
the end of the world either. Can you please explicitly note what you're
doing in the commit message?
Thanks.
--
tejun
On Tue, Apr 16, 2024 at 04:20:08PM +0200, Michal Koutný wrote:
> This makes pids.events:max affine to pids.max limit.
>
> How are the new events supposed to be useful?
>
> - pids.events.local:max
> - tells that cgroup's limit is hit (too tight?)
> - pids.events:*
> - "only" directs top-down search to cgroups of interest
Generally look great to me. If you resend with the couple nits addressed,
I'll apply the rest of the series.
Thanks.
--
tejun