2024-05-21 09:22:14

by Michal Koutný

[permalink] [raw]
Subject: [PATCH v5 0/5] pids controller events rework

This makes pids.events:max affine to pids.max limit.

How are the new events supposed to be useful?

- pids.events.local:max
- tells that cgroup's limit is hit (too tight?)
- pids.events:*
- "only" directs top-down search to cgroups of interest

Changes from v4 (https://lore.kernel.org/r/[email protected])
- rebased on cgroup/for-6.10 (rather cgroup/for-next, there's no rush)
- introduce pids_files_legacy at one place (Tejun)
- more descriptive Documentation/ (Tejun)

Changes from v3 (https://lore.kernel.org/r/[email protected])
- use existing functions for TAP output in selftest (Muhammad)
- formatting in selftest (Muhammad)
- remove pids.events:max.imposed event, keep it internal (Johannes)
- allow legacy behavior with a mount option
- detach migration charging patches
- drop RFC prefix

Changes from v2 (https://lore.kernel.org/r/[email protected])
- implemented pids.events.local (Tejun)
- added migration charging

[1] https://lore.kernel.org/r/[email protected]/

Michal Koutný (5):
cgroup/pids: Separate semantics of pids.events related to pids.max
cgroup/pids: Make event counters hierarchical
cgroup/pids: Add pids.events.local
selftests: cgroup: Lexicographic order in Makefile
selftests: cgroup: Add basic tests for pids controller

Documentation/admin-guide/cgroup-v1/pids.rst | 3 +-
Documentation/admin-guide/cgroup-v2.rst | 21 ++-
include/linux/cgroup-defs.h | 7 +-
kernel/cgroup/cgroup.c | 15 +-
kernel/cgroup/pids.c | 129 +++++++++++---
tools/testing/selftests/cgroup/.gitignore | 11 +-
tools/testing/selftests/cgroup/Makefile | 25 +--
tools/testing/selftests/cgroup/test_pids.c | 178 +++++++++++++++++++
8 files changed, 346 insertions(+), 43 deletions(-)
create mode 100644 tools/testing/selftests/cgroup/test_pids.c


base-commit: 21c38a3bd4ee3fb7337d013a638302fb5e5f9dc2
--
2.44.0



2024-05-21 09:22:16

by Michal Koutný

[permalink] [raw]
Subject: [PATCH v5 2/5] cgroup/pids: Make event counters hierarchical

The pids.events file should honor the hierarchy, so make the events
propagate from their origin up to the root on the unified hierarchy. The
legacy behavior remains non-hierarchical.

Signed-off-by: Michal Koutný <[email protected]>
---
Documentation/admin-guide/cgroup-v2.rst | 9 +++--
kernel/cgroup/pids.c | 46 ++++++++++++++++---------
2 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 945ff743a3c9..0b5f77104e8b 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -240,8 +240,11 @@ cgroup v2 currently supports the following mount options.
v2 is remounted later on).

pids_localevents
- Represent fork failures inside cgroup's pids.events:max (v1 behavior),
- not its limit being hit (v2 behavior).
+ The option restores v1-like behavior of pids.events:max, that is only
+ local (inside cgroup proper) fork failures are counted. Without this
+ option pids.events.max represents any pids.max enforcemnt across
+ cgroup's subtree.
+


Organizing Processes and Threads
@@ -2205,7 +2208,7 @@ PID Interface Files
modified event. The following entries are defined.

max
- The number of times the cgroup's number of processes hit the
+ The number of times the cgroup's total number of processes hit the pids.max
limit (see also pids_localevents).

Organisational operations are not blocked by cgroup policies, so it is
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index a557f5c8300b..c09b744d548c 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -238,6 +238,34 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
}
}

+static void pids_event(struct pids_cgroup *pids_forking,
+ struct pids_cgroup *pids_over_limit)
+{
+ struct pids_cgroup *p = pids_forking;
+ bool limit = false;
+
+ for (; parent_pids(p); p = parent_pids(p)) {
+ /* Only log the first time limit is hit. */
+ if (atomic64_inc_return(&p->events[PIDCG_FORKFAIL]) == 1) {
+ pr_info("cgroup: fork rejected by pids controller in ");
+ pr_cont_cgroup_path(p->css.cgroup);
+ pr_cont("\n");
+ }
+ cgroup_file_notify(&p->events_file);
+
+ if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
+ cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+ break;
+
+ if (p == pids_over_limit)
+ limit = true;
+ if (limit)
+ atomic64_inc(&p->events[PIDCG_MAX]);
+
+ cgroup_file_notify(&p->events_file);
+ }
+}
+
/*
* task_css_check(true) in pids_can_fork() and pids_cancel_fork() relies
* on cgroup_threadgroup_change_begin() held by the copy_process().
@@ -254,23 +282,9 @@ static int pids_can_fork(struct task_struct *task, struct css_set *cset)
css = task_css_check(current, pids_cgrp_id, true);
pids = css_pids(css);
err = pids_try_charge(pids, 1, &pids_over_limit);
- if (err) {
- /* compatibility on v1 where events were notified in leaves. */
- if (!cgroup_subsys_on_dfl(pids_cgrp_subsys))
- pids_over_limit = pids;
-
- /* Only log the first time limit is hit. */
- if (atomic64_inc_return(&pids->events[PIDCG_FORKFAIL]) == 1) {
- pr_info("cgroup: fork rejected by pids controller in ");
- pr_cont_cgroup_path(pids->css.cgroup);
- pr_cont("\n");
- }
- atomic64_inc(&pids_over_limit->events[PIDCG_MAX]);
+ if (err)
+ pids_event(pids, pids_over_limit);

- cgroup_file_notify(&pids->events_file);
- if (pids_over_limit != pids)
- cgroup_file_notify(&pids_over_limit->events_file);
- }
return err;
}

--
2.44.0


2024-05-21 09:22:16

by Michal Koutný

[permalink] [raw]
Subject: [PATCH v5 4/5] selftests: cgroup: Lexicographic order in Makefile

This will reduce number of conflicts when modifying the lists.

Signed-off-by: Michal Koutný <[email protected]>
---
tools/testing/selftests/cgroup/.gitignore | 10 +++++-----
tools/testing/selftests/cgroup/Makefile | 23 ++++++++++++-----------
2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/cgroup/.gitignore b/tools/testing/selftests/cgroup/.gitignore
index 2732e0b29271..ec635a0ef488 100644
--- a/tools/testing/selftests/cgroup/.gitignore
+++ b/tools/testing/selftests/cgroup/.gitignore
@@ -1,11 +1,11 @@
# SPDX-License-Identifier: GPL-2.0-only
-test_memcontrol
test_core
-test_freezer
-test_kmem
-test_kill
test_cpu
test_cpuset
-test_zswap
+test_freezer
test_hugetlb_memcg
+test_kill
+test_kmem
+test_memcontrol
+test_zswap
wait_inotify
diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile
index 16461dc0ffdf..b91f60f3402c 100644
--- a/tools/testing/selftests/cgroup/Makefile
+++ b/tools/testing/selftests/cgroup/Makefile
@@ -6,26 +6,27 @@ all: ${HELPER_PROGS}
TEST_FILES := with_stress.sh
TEST_PROGS := test_stress.sh test_cpuset_prs.sh test_cpuset_v1_hp.sh
TEST_GEN_FILES := wait_inotify
-TEST_GEN_PROGS = test_memcontrol
-TEST_GEN_PROGS += test_kmem
-TEST_GEN_PROGS += test_core
-TEST_GEN_PROGS += test_freezer
-TEST_GEN_PROGS += test_kill
+# Keep the lists lexicographically sorted
+TEST_GEN_PROGS = test_core
TEST_GEN_PROGS += test_cpu
TEST_GEN_PROGS += test_cpuset
-TEST_GEN_PROGS += test_zswap
+TEST_GEN_PROGS += test_freezer
TEST_GEN_PROGS += test_hugetlb_memcg
+TEST_GEN_PROGS += test_kill
+TEST_GEN_PROGS += test_kmem
+TEST_GEN_PROGS += test_memcontrol
+TEST_GEN_PROGS += test_zswap

LOCAL_HDRS += $(selfdir)/clone3/clone3_selftests.h $(selfdir)/pidfd/pidfd.h

include ../lib.mk

-$(OUTPUT)/test_memcontrol: cgroup_util.c
-$(OUTPUT)/test_kmem: cgroup_util.c
$(OUTPUT)/test_core: cgroup_util.c
-$(OUTPUT)/test_freezer: cgroup_util.c
-$(OUTPUT)/test_kill: cgroup_util.c
$(OUTPUT)/test_cpu: cgroup_util.c
$(OUTPUT)/test_cpuset: cgroup_util.c
-$(OUTPUT)/test_zswap: cgroup_util.c
+$(OUTPUT)/test_freezer: cgroup_util.c
$(OUTPUT)/test_hugetlb_memcg: cgroup_util.c
+$(OUTPUT)/test_kill: cgroup_util.c
+$(OUTPUT)/test_kmem: cgroup_util.c
+$(OUTPUT)/test_memcontrol: cgroup_util.c
+$(OUTPUT)/test_zswap: cgroup_util.c
--
2.44.0


2024-05-21 09:22:56

by Michal Koutný

[permalink] [raw]
Subject: [PATCH v5 5/5] selftests: cgroup: Add basic tests for pids controller

This commit adds (and wires in) new test program for checking basic pids
controller functionality -- restricting tasks in a cgroup and correct
event counting.

Signed-off-by: Michal Koutný <[email protected]>
---
tools/testing/selftests/cgroup/.gitignore | 1 +
tools/testing/selftests/cgroup/Makefile | 2 +
tools/testing/selftests/cgroup/test_pids.c | 178 +++++++++++++++++++++
3 files changed, 181 insertions(+)
create mode 100644 tools/testing/selftests/cgroup/test_pids.c

diff --git a/tools/testing/selftests/cgroup/.gitignore b/tools/testing/selftests/cgroup/.gitignore
index ec635a0ef488..952e4448bf07 100644
--- a/tools/testing/selftests/cgroup/.gitignore
+++ b/tools/testing/selftests/cgroup/.gitignore
@@ -7,5 +7,6 @@ test_hugetlb_memcg
test_kill
test_kmem
test_memcontrol
+test_pids
test_zswap
wait_inotify
diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile
index b91f60f3402c..1b897152bab6 100644
--- a/tools/testing/selftests/cgroup/Makefile
+++ b/tools/testing/selftests/cgroup/Makefile
@@ -15,6 +15,7 @@ TEST_GEN_PROGS += test_hugetlb_memcg
TEST_GEN_PROGS += test_kill
TEST_GEN_PROGS += test_kmem
TEST_GEN_PROGS += test_memcontrol
+TEST_GEN_PROGS += test_pids
TEST_GEN_PROGS += test_zswap

LOCAL_HDRS += $(selfdir)/clone3/clone3_selftests.h $(selfdir)/pidfd/pidfd.h
@@ -29,4 +30,5 @@ $(OUTPUT)/test_hugetlb_memcg: cgroup_util.c
$(OUTPUT)/test_kill: cgroup_util.c
$(OUTPUT)/test_kmem: cgroup_util.c
$(OUTPUT)/test_memcontrol: cgroup_util.c
+$(OUTPUT)/test_pids: cgroup_util.c
$(OUTPUT)/test_zswap: cgroup_util.c
diff --git a/tools/testing/selftests/cgroup/test_pids.c b/tools/testing/selftests/cgroup/test_pids.c
new file mode 100644
index 000000000000..9ecb83c6cc5c
--- /dev/null
+++ b/tools/testing/selftests/cgroup/test_pids.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <linux/limits.h>
+#include <signal.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+#include "cgroup_util.h"
+
+static int run_success(const char *cgroup, void *arg)
+{
+ return 0;
+}
+
+static int run_pause(const char *cgroup, void *arg)
+{
+ return pause();
+}
+
+/*
+ * This test checks that pids.max prevents forking new children above the
+ * specified limit in the cgroup.
+ */
+static int test_pids_max(const char *root)
+{
+ int ret = KSFT_FAIL;
+ char *cg_pids;
+ int pid;
+
+ cg_pids = cg_name(root, "pids_test");
+ if (!cg_pids)
+ goto cleanup;
+
+ if (cg_create(cg_pids))
+ goto cleanup;
+
+ if (cg_read_strcmp(cg_pids, "pids.max", "max\n"))
+ goto cleanup;
+
+ if (cg_write(cg_pids, "pids.max", "2"))
+ goto cleanup;
+
+ if (cg_enter_current(cg_pids))
+ goto cleanup;
+
+ pid = cg_run_nowait(cg_pids, run_pause, NULL);
+ if (pid < 0)
+ goto cleanup;
+
+ if (cg_run_nowait(cg_pids, run_success, NULL) != -1 || errno != EAGAIN)
+ goto cleanup;
+
+ if (kill(pid, SIGINT))
+ goto cleanup;
+
+ ret = KSFT_PASS;
+
+cleanup:
+ cg_enter_current(root);
+ cg_destroy(cg_pids);
+ free(cg_pids);
+
+ return ret;
+}
+
+/*
+ * This test checks that pids.events are counted in cgroup associated with pids.max
+ */
+static int test_pids_events(const char *root)
+{
+ int ret = KSFT_FAIL;
+ char *cg_parent = NULL, *cg_child = NULL;
+ int pid;
+
+ cg_parent = cg_name(root, "pids_parent");
+ cg_child = cg_name(cg_parent, "pids_child");
+ if (!cg_parent || !cg_child)
+ goto cleanup;
+
+ if (cg_create(cg_parent))
+ goto cleanup;
+ if (cg_write(cg_parent, "cgroup.subtree_control", "+pids"))
+ goto cleanup;
+ if (cg_create(cg_child))
+ goto cleanup;
+
+ if (cg_write(cg_parent, "pids.max", "2"))
+ goto cleanup;
+
+ if (cg_read_strcmp(cg_child, "pids.max", "max\n"))
+ goto cleanup;
+
+ if (cg_enter_current(cg_child))
+ goto cleanup;
+
+ pid = cg_run_nowait(cg_child, run_pause, NULL);
+ if (pid < 0)
+ goto cleanup;
+
+ if (cg_run_nowait(cg_child, run_success, NULL) != -1 || errno != EAGAIN)
+ goto cleanup;
+
+ if (kill(pid, SIGINT))
+ goto cleanup;
+
+ if (cg_read_key_long(cg_child, "pids.events", "max ") != 0)
+ goto cleanup;
+ if (cg_read_key_long(cg_parent, "pids.events", "max ") != 1)
+ goto cleanup;
+
+
+ ret = KSFT_PASS;
+
+cleanup:
+ cg_enter_current(root);
+ if (cg_child)
+ cg_destroy(cg_child);
+ if (cg_parent)
+ cg_destroy(cg_parent);
+ free(cg_child);
+ free(cg_parent);
+
+ return ret;
+}
+
+
+
+#define T(x) { x, #x }
+struct pids_test {
+ int (*fn)(const char *root);
+ const char *name;
+} tests[] = {
+ T(test_pids_max),
+ T(test_pids_events),
+};
+#undef T
+
+int main(int argc, char **argv)
+{
+ char root[PATH_MAX];
+
+ ksft_print_header();
+ ksft_set_plan(ARRAY_SIZE(tests));
+ if (cg_find_unified_root(root, sizeof(root), NULL))
+ ksft_exit_skip("cgroup v2 isn't mounted\n");
+
+ /*
+ * Check that pids controller is available:
+ * pids is listed in cgroup.controllers
+ */
+ if (cg_read_strstr(root, "cgroup.controllers", "pids"))
+ ksft_exit_skip("pids controller isn't available\n");
+
+ if (cg_read_strstr(root, "cgroup.subtree_control", "pids"))
+ if (cg_write(root, "cgroup.subtree_control", "+pids"))
+ ksft_exit_skip("Failed to set pids controller\n");
+
+ for (int i = 0; i < ARRAY_SIZE(tests); i++) {
+ switch (tests[i].fn(root)) {
+ case KSFT_PASS:
+ ksft_test_result_pass("%s\n", tests[i].name);
+ break;
+ case KSFT_SKIP:
+ ksft_test_result_skip("%s\n", tests[i].name);
+ break;
+ default:
+ ksft_test_result_fail("%s\n", tests[i].name);
+ break;
+ }
+ }
+
+ ksft_finished();
+}
--
2.44.0


2024-05-26 18:47:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 0/5] pids controller events rework

On Tue, May 21, 2024 at 11:21:25AM +0200, Michal Koutn? wrote:
> Michal Koutn? (5):
> cgroup/pids: Separate semantics of pids.events related to pids.max
> cgroup/pids: Make event counters hierarchical
> cgroup/pids: Add pids.events.local
> selftests: cgroup: Lexicographic order in Makefile
> selftests: cgroup: Add basic tests for pids controller

Applied 1-5 to cgroup/for-6.11.

Shuah, I applied the two selftests patches to the cgroup tree as the new
tests are dependent on the preceding changes. Please let me know if you wish
them to be routed differently.

Thanks.

--
tejun