This patchset implements cgroup bpf auto-detachment functionality:
bpf programs are detached as soon as possible after removal of the
cgroup, without waiting for the release of all associated resources.
Patches 2 and 3 are required to implement a corresponding kselftest
in patch 4.
v2:
1) removed a bogus check in patch 4
2) moved buf[len] = 0 in patch 2
Roman Gushchin (4):
bpf: decouple the lifetime of cgroup_bpf from cgroup itself
selftests/bpf: convert test_cgrp2_attach2 example into kselftest
selftests/bpf: enable all available cgroup v2 controllers
selftests/bpf: add auto-detach test
include/linux/bpf-cgroup.h | 8 +-
include/linux/cgroup.h | 18 +++
kernel/bpf/cgroup.c | 25 ++-
kernel/cgroup/cgroup.c | 11 +-
samples/bpf/Makefile | 2 -
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/cgroup_helpers.c | 57 +++++++
.../selftests/bpf/test_cgroup_attach.c | 145 ++++++++++++++++--
8 files changed, 243 insertions(+), 27 deletions(-)
rename samples/bpf/test_cgrp2_attach2.c => tools/testing/selftests/bpf/test_cgroup_attach.c (79%)
--
2.20.1
Add a kselftest to cover bpf auto-detachment functionality.
The test creates a cgroup, associates some resources with it,
attaches a couple of bpf programs and deletes the cgroup.
Then it checks that the bpf programs go away within 5 seconds.
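For reference, the check boils down to polling bpf_prog_get_fd_by_id()
on the remembered program ids until the lookup fails. A condensed
sketch of the loop added in the patch below (the helper name is
illustrative, not part of the patch):

	#include <stdbool.h>
	#include <unistd.h>
	#include <bpf/bpf.h>	/* bpf_prog_get_fd_by_id() */

	/* A detached program with no remaining references disappears,
	 * so looking it up by id eventually starts failing. Waits up
	 * to ~timeout_sec seconds.
	 */
	static bool prog_gone(__u32 prog_id, int timeout_sec)
	{
		while (timeout_sec-- > 0) {
			int fd = bpf_prog_get_fd_by_id(prog_id);

			if (fd < 0)
				return true;
			close(fd);	/* still alive; don't leak the fd */
			sleep(1);
		}
		return false;
	}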
Expected output:
$ ./test_cgroup_attach
#override:PASS
#multi:PASS
#autodetach:PASS
test_cgroup_attach:PASS
On a kernel without auto-detaching:
$ ./test_cgroup_attach
#override:PASS
#multi:PASS
#autodetach:FAIL
test_cgroup_attach:FAIL
Signed-off-by: Roman Gushchin <[email protected]>
---
.../selftests/bpf/test_cgroup_attach.c | 99 ++++++++++++++++++-
1 file changed, 98 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_cgroup_attach.c b/tools/testing/selftests/bpf/test_cgroup_attach.c
index 93d4fe295e7d..bc5bd0f1728e 100644
--- a/tools/testing/selftests/bpf/test_cgroup_attach.c
+++ b/tools/testing/selftests/bpf/test_cgroup_attach.c
@@ -456,9 +456,106 @@ static int test_multiprog(void)
return rc;
}
+static int test_autodetach(void)
+{
+ __u32 prog_cnt = 4, attach_flags;
+ int allow_prog[2] = {0};
+ __u32 prog_ids[2] = {0};
+ int cg = 0, i, rc = -1;
+ void *ptr = NULL;
+ int attempts;
+
+
+ for (i = 0; i < ARRAY_SIZE(allow_prog); i++) {
+ allow_prog[i] = prog_load_cnt(1, 1 << i);
+ if (!allow_prog[i])
+ goto err;
+ }
+
+ if (setup_cgroup_environment())
+ goto err;
+
+ /* create a cgroup, attach two programs and remember their ids */
+ cg = create_and_get_cgroup("/cg_autodetach");
+ if (cg < 0)
+ goto err;
+
+ if (join_cgroup("/cg_autodetach"))
+ goto err;
+
+ for (i = 0; i < ARRAY_SIZE(allow_prog); i++) {
+ if (bpf_prog_attach(allow_prog[i], cg, BPF_CGROUP_INET_EGRESS,
+ BPF_F_ALLOW_MULTI)) {
+ log_err("Attaching prog[%d] to cg:egress", i);
+ goto err;
+ }
+ }
+
+ /* make sure that programs are attached and run some traffic */
+ assert(bpf_prog_query(cg, BPF_CGROUP_INET_EGRESS, 0, &attach_flags,
+ prog_ids, &prog_cnt) == 0);
+ assert(system(PING_CMD) == 0);
+
+ /* allocate some memory (4Mb) to pin the original cgroup */
+ ptr = malloc(4 * (1 << 20));
+ if (!ptr)
+ goto err;
+
+ /* close programs and cgroup fd */
+ for (i = 0; i < ARRAY_SIZE(allow_prog); i++) {
+ close(allow_prog[i]);
+ allow_prog[i] = 0;
+ }
+
+ close(cg);
+ cg = 0;
+
+ /* leave the cgroup and remove it. don't detach programs */
+ cleanup_cgroup_environment();
+
+ /* wait for the asynchronous auto-detachment.
+ * wait for no more than 5 sec and give up.
+ */
+ for (i = 0; i < ARRAY_SIZE(prog_ids); i++) {
+ for (attempts = 5; attempts >= 0; attempts--) {
+ int fd = bpf_prog_get_fd_by_id(prog_ids[i]);
+
+ if (fd < 0)
+ break;
+
+ /* don't leave the fd open */
+ close(fd);
+
+ if (!attempts)
+ goto err;
+
+ sleep(1);
+ }
+ }
+
+ rc = 0;
+err:
+ for (i = 0; i < ARRAY_SIZE(allow_prog); i++)
+ if (allow_prog[i] > 0)
+ close(allow_prog[i]);
+ if (cg)
+ close(cg);
+ free(ptr);
+ cleanup_cgroup_environment();
+ if (!rc)
+ printf("#autodetach:PASS\n");
+ else
+ printf("#autodetach:FAIL\n");
+ return rc;
+}
+
int main(int argc, char **argv)
{
- int (*tests[])(void) = {test_foo_bar, test_multiprog};
+ int (*tests[])(void) = {
+ test_foo_bar,
+ test_multiprog,
+ test_autodetach,
+ };
int errors = 0;
int i;
--
2.20.1
Enable all available cgroup v2 controllers when setting up
the environment for the bpf kselftests. This is required to properly
test the bpf prog auto-detach feature, and it also generally increases
code coverage.
Signed-off-by: Roman Gushchin <[email protected]>
---
tools/testing/selftests/bpf/cgroup_helpers.c | 57 ++++++++++++++++++++
1 file changed, 57 insertions(+)
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index 6692a40a6979..4efe57c171cd 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -33,6 +33,60 @@
snprintf(buf, sizeof(buf), "%s%s%s", CGROUP_MOUNT_PATH, \
CGROUP_WORK_DIR, path)
+/**
+ * enable_all_controllers() - Enable all available cgroup v2 controllers
+ *
+ * Enable all available cgroup v2 controllers in order to increase
+ * the code coverage.
+ *
+ * If successful, 0 is returned.
+ */
+int enable_all_controllers(char *cgroup_path)
+{
+ char path[PATH_MAX + 1];
+ char buf[PATH_MAX];
+ char *c, *c2;
+ int fd, cfd;
+ ssize_t len;
+
+ snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
+ fd = open(path, O_RDONLY);
+ if (fd < 0) {
+ log_err("Opening cgroup.controllers: %s", path);
+ return -1;
+ }
+
+ len = read(fd, buf, sizeof(buf) - 1);
+ if (len < 0) {
+ close(fd);
+ log_err("Reading cgroup.controllers: %s", path);
+ return -1;
+ }
+ buf[len] = 0;
+ close(fd);
+
+ /* No controllers available? We're probably on cgroup v1. */
+ if (len == 0)
+ return 0;
+
+ snprintf(path, sizeof(path), "%s/cgroup.subtree_control", cgroup_path);
+ cfd = open(path, O_RDWR);
+ if (cfd < 0) {
+ log_err("Opening cgroup.subtree_control: %s", path);
+ return -1;
+ }
+
+ for (c = strtok_r(buf, " ", &c2); c; c = strtok_r(NULL, " ", &c2)) {
+ if (dprintf(cfd, "+%s\n", c) <= 0) {
+ log_err("Enabling controller %s: %s", c, path);
+ close(cfd);
+ return -1;
+ }
+ }
+ close(cfd);
+ return 0;
+}
+
/**
* setup_cgroup_environment() - Setup the cgroup environment
*
@@ -71,6 +125,9 @@ int setup_cgroup_environment(void)
return 1;
}
+ if (enable_all_controllers(cgroup_workdir))
+ return 1;
+
return 0;
}
--
2.20.1
Currently the lifetime of bpf programs attached to a cgroup is bound
to the lifetime of the cgroup itself. This means that if a user
forgets (or intentionally avoids) detaching a bpf program before
removing the cgroup, it will stay attached until the release of the
cgroup. Since the cgroup can stay in the dying state (the state
between being rmdir()'ed and being released) for a very long time,
this leads to a waste of memory. It also blocks the possibility of
implementing memcg-based memory accounting for bpf objects, because
a circular reference dependency would occur: charged memory pages pin
the corresponding memory cgroup, and if the memory cgroup pins the
attached bpf program, nothing will ever be released.

A dying cgroup cannot contain any processes, so the only chance for
an attached bpf program to be executed is a live socket associated
with the cgroup. So in order to release all bpf data early, let's
count associated sockets using a new percpu refcounter. On cgroup
removal the counter is switched to atomic mode, and as soon as it
reaches 0, all bpf programs are detached.

The reference counter is not socket specific, and could be used for
any other type of program that can be executed from a cgroup-bpf hook
outside of process context, should such a need arise in the future.
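For reference, the percpu_ref lifecycle this relies on, reduced to the
bare pattern (a sketch of the calls used in the patch below, not a
complete implementation):

	/* Release callback: invoked once the ref has been killed and
	 * the count has dropped to zero; this is where all bpf
	 * programs can finally be detached.
	 */
	static void cgroup_bpf_release(struct percpu_ref *ref)
	{
		struct cgroup *cgrp = container_of(ref, struct cgroup,
						   bpf.refcnt);
		/* detach programs, free effective arrays, ... */
	}

	/* cgroup creation: the ref starts in fast percpu mode */
	percpu_ref_init(&cgrp->bpf.refcnt, cgroup_bpf_release, 0,
			GFP_KERNEL);

	/* one reference per associated socket */
	percpu_ref_get(&cgrp->bpf.refcnt);	/* cgroup_sk_alloc() */
	percpu_ref_put(&cgrp->bpf.refcnt);	/* cgroup_sk_free() */

	/* cgroup rmdir: switch to atomic mode and drop the initial
	 * reference; cgroup_bpf_release() runs when the last socket
	 * reference is gone.
	 */
	percpu_ref_kill(&cgrp->bpf.refcnt);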
Signed-off-by: Roman Gushchin <[email protected]>
Cc: [email protected]
---
include/linux/bpf-cgroup.h | 8 ++++++--
include/linux/cgroup.h | 18 ++++++++++++++++++
kernel/bpf/cgroup.c | 25 ++++++++++++++++++++++---
kernel/cgroup/cgroup.c | 11 ++++++++---
4 files changed, 54 insertions(+), 8 deletions(-)
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index cb3c6b3b89c8..a0945de9ba5f 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -6,6 +6,7 @@
#include <linux/errno.h>
#include <linux/jump_label.h>
#include <linux/percpu.h>
+#include <linux/percpu-refcount.h>
#include <linux/rbtree.h>
#include <uapi/linux/bpf.h>
@@ -72,10 +73,13 @@ struct cgroup_bpf {
/* temp storage for effective prog array used by prog_attach/detach */
struct bpf_prog_array __rcu *inactive;
+
+ /* reference counter used to detach bpf programs after cgroup removal */
+ struct percpu_ref refcnt;
};
-void cgroup_bpf_put(struct cgroup *cgrp);
int cgroup_bpf_inherit(struct cgroup *cgrp);
+void cgroup_bpf_offline(struct cgroup *cgrp);
int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
enum bpf_attach_type type, u32 flags);
@@ -283,8 +287,8 @@ int cgroup_bpf_prog_query(const union bpf_attr *attr,
struct bpf_prog;
struct cgroup_bpf {};
-static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
+static inline void cgroup_bpf_offline(struct cgroup *cgrp) {}
static inline int cgroup_bpf_prog_attach(const union bpf_attr *attr,
enum bpf_prog_type ptype,
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c0077adeea83..49e8facf7c4a 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -924,4 +924,22 @@ static inline bool cgroup_task_frozen(struct task_struct *task)
#endif /* !CONFIG_CGROUPS */
+#ifdef CONFIG_CGROUP_BPF
+static inline void cgroup_bpf_get(struct cgroup *cgrp)
+{
+ percpu_ref_get(&cgrp->bpf.refcnt);
+}
+
+static inline void cgroup_bpf_put(struct cgroup *cgrp)
+{
+ percpu_ref_put(&cgrp->bpf.refcnt);
+}
+
+#else /* CONFIG_CGROUP_BPF */
+
+static inline void cgroup_bpf_get(struct cgroup *cgrp) {}
+static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
+
+#endif /* CONFIG_CGROUP_BPF */
+
#endif /* _LINUX_CGROUP_H */
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index fcde0f7b2585..65f5c482ed9d 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -22,12 +22,20 @@
DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
EXPORT_SYMBOL(cgroup_bpf_enabled_key);
+void cgroup_bpf_offline(struct cgroup *cgrp)
+{
+ cgroup_get(cgrp);
+ percpu_ref_kill(&cgrp->bpf.refcnt);
+}
+
/**
- * cgroup_bpf_put() - put references of all bpf programs
+ * cgroup_bpf_release() - put references of all bpf programs and
+ * release all cgroup bpf data
* @cgrp: the cgroup to modify
*/
-void cgroup_bpf_put(struct cgroup *cgrp)
+static void cgroup_bpf_release(struct percpu_ref *ref)
{
+ struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt);
enum bpf_cgroup_storage_type stype;
unsigned int type;
@@ -47,6 +55,9 @@ void cgroup_bpf_put(struct cgroup *cgrp)
}
bpf_prog_array_free(cgrp->bpf.effective[type]);
}
+
+ percpu_ref_exit(&cgrp->bpf.refcnt);
+ cgroup_put(cgrp);
}
/* count number of elements in the list.
@@ -167,7 +178,12 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
*/
#define NR ARRAY_SIZE(cgrp->bpf.effective)
struct bpf_prog_array __rcu *arrays[NR] = {};
- int i;
+ int ret, i;
+
+ ret = percpu_ref_init(&cgrp->bpf.refcnt, cgroup_bpf_release, 0,
+ GFP_KERNEL);
+ if (ret)
+ return -ENOMEM;
for (i = 0; i < NR; i++)
INIT_LIST_HEAD(&cgrp->bpf.progs[i]);
@@ -183,6 +199,9 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
cleanup:
for (i = 0; i < NR; i++)
bpf_prog_array_free(arrays[i]);
+
+ percpu_ref_exit(&cgrp->bpf.refcnt);
+
return -ENOMEM;
}
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 217cec4e22c6..ef9cfbfc82a9 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4955,8 +4955,6 @@ static void css_release_work_fn(struct work_struct *work)
if (cgrp->kn)
RCU_INIT_POINTER(*(void __rcu __force **)&cgrp->kn->priv,
NULL);
-
- cgroup_bpf_put(cgrp);
}
mutex_unlock(&cgroup_mutex);
@@ -5482,6 +5480,8 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
cgroup1_check_for_release(parent);
+ cgroup_bpf_offline(cgrp);
+
/* put the base reference */
percpu_ref_kill(&cgrp->self.refcnt);
@@ -6221,6 +6221,7 @@ void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
* Don't use cgroup_get_live().
*/
cgroup_get(sock_cgroup_ptr(skcd));
+ cgroup_bpf_get(sock_cgroup_ptr(skcd));
return;
}
@@ -6232,6 +6233,7 @@ void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
cset = task_css_set(current);
if (likely(cgroup_tryget(cset->dfl_cgrp))) {
skcd->val = (unsigned long)cset->dfl_cgrp;
+ cgroup_bpf_get(cset->dfl_cgrp);
break;
}
cpu_relax();
@@ -6242,7 +6244,10 @@ void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
void cgroup_sk_free(struct sock_cgroup_data *skcd)
{
- cgroup_put(sock_cgroup_ptr(skcd));
+ struct cgroup *cgrp = sock_cgroup_ptr(skcd);
+
+ cgroup_bpf_put(cgrp);
+ cgroup_put(cgrp);
}
#endif /* CONFIG_SOCK_CGROUP_DATA */
--
2.20.1
On 5/22/19 4:20 PM, Roman Gushchin wrote:
> This patchset implements a cgroup bpf auto-detachment functionality:
> bpf programs are attached as soon as possible after removal of the
typo here "attached" => "detached"?
> cgroup, without waiting for the release of all associated resources.
>
> Patches 2 and 3 are required to implement a corresponding kselftest
> in patch 4.
>
> v2:
> 1) removed a bogus check in patch 4
> 2) moved buf[len] = 0 in patch 2
>
>
> Roman Gushchin (4):
> bpf: decouple the lifetime of cgroup_bpf from cgroup itself
> selftests/bpf: convert test_cgrp2_attach2 example into kselftest
> selftests/bpf: enable all available cgroup v2 controllers
> selftests/bpf: add auto-detach test
>
> include/linux/bpf-cgroup.h | 8 +-
> include/linux/cgroup.h | 18 +++
> kernel/bpf/cgroup.c | 25 ++-
> kernel/cgroup/cgroup.c | 11 +-
> samples/bpf/Makefile | 2 -
> tools/testing/selftests/bpf/Makefile | 4 +-
> tools/testing/selftests/bpf/cgroup_helpers.c | 57 +++++++
> .../selftests/bpf/test_cgroup_attach.c | 145 ++++++++++++++++--
> 8 files changed, 243 insertions(+), 27 deletions(-)
> rename samples/bpf/test_cgrp2_attach2.c => tools/testing/selftests/bpf/test_cgroup_attach.c (79%)
>
On 5/22/19 4:20 PM, Roman Gushchin wrote:
> Currently the lifetime of bpf programs attached to a cgroup is bound
> to the lifetime of the cgroup itself. This means that if a user
> forgets (or intentionally avoids) detaching a bpf program before
> removing the cgroup, it will stay attached until the release of the
> cgroup. Since the cgroup can stay in the dying state (the state
> between being rmdir()'ed and being released) for a very long time,
> this leads to a waste of memory. It also blocks the possibility of
> implementing memcg-based memory accounting for bpf objects, because
> a circular reference dependency would occur: charged memory pages pin
> the corresponding memory cgroup, and if the memory cgroup pins the
> attached bpf program, nothing will ever be released.
>
> A dying cgroup cannot contain any processes, so the only chance for
> an attached bpf program to be executed is a live socket associated
> with the cgroup. So in order to release all bpf data early, let's
> count associated sockets using a new percpu refcounter. On cgroup
> removal the counter is switched to atomic mode, and as soon as it
> reaches 0, all bpf programs are detached.
>
> The reference counter is not socket specific, and could be used for
> any other type of program that can be executed from a cgroup-bpf hook
> outside of process context, should such a need arise in the future.
>
> Signed-off-by: Roman Gushchin <[email protected]>
> Cc: [email protected]
The logic looks sound to me. With one nit below,
Acked-by: Yonghong Song <[email protected]>
> ---
> include/linux/bpf-cgroup.h | 8 ++++++--
> include/linux/cgroup.h | 18 ++++++++++++++++++
> kernel/bpf/cgroup.c | 25 ++++++++++++++++++++++---
> kernel/cgroup/cgroup.c | 11 ++++++++---
> 4 files changed, 54 insertions(+), 8 deletions(-)
>
[...]
> @@ -167,7 +178,12 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
> */
> #define NR ARRAY_SIZE(cgrp->bpf.effective)
> struct bpf_prog_array __rcu *arrays[NR] = {};
> - int i;
> + int ret, i;
> +
> + ret = percpu_ref_init(&cgrp->bpf.refcnt, cgroup_bpf_release, 0,
> + GFP_KERNEL);
> + if (ret)
> + return -ENOMEM;
Maybe return "ret" here instead of -ENOMEM. Currently, percpu_ref_init
only return error code is -ENOMEM. But in the future, it could
change?
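I.e., something like:

	ret = percpu_ref_init(&cgrp->bpf.refcnt, cgroup_bpf_release, 0,
			      GFP_KERNEL);
	if (ret)
		return ret;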
>
> for (i = 0; i < NR; i++)
> INIT_LIST_HEAD(&cgrp->bpf.progs[i]);
> @@ -183,6 +199,9 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
> cleanup:
> for (i = 0; i < NR; i++)
> bpf_prog_array_free(arrays[i]);
> +
> + percpu_ref_exit(&cgrp->bpf.refcnt);
> +
> return -ENOMEM;
> }
>
[...]
On 5/22/19 4:20 PM, Roman Gushchin wrote:
> Enable all available cgroup v2 controllers when setting up
> the environment for the bpf kselftests. This is required to properly
> test the bpf prog auto-detach feature, and it also generally increases
> code coverage.
>
> Signed-off-by: Roman Gushchin <[email protected]>
Looks good to me. Ack with one nit below.
Acked-by: Yonghong Song <[email protected]>
> ---
> tools/testing/selftests/bpf/cgroup_helpers.c | 57 ++++++++++++++++++++
> 1 file changed, 57 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
> index 6692a40a6979..4efe57c171cd 100644
> --- a/tools/testing/selftests/bpf/cgroup_helpers.c
> +++ b/tools/testing/selftests/bpf/cgroup_helpers.c
> @@ -33,6 +33,60 @@
> snprintf(buf, sizeof(buf), "%s%s%s", CGROUP_MOUNT_PATH, \
> CGROUP_WORK_DIR, path)
>
> +/**
> + * enable_all_controllers() - Enable all available cgroup v2 controllers
> + *
> + * Enable all available cgroup v2 controllers in order to increase
> + * the code coverage.
> + *
> + * If successful, 0 is returned.
> + */
> +int enable_all_controllers(char *cgroup_path)
> +{
> + char path[PATH_MAX + 1];
> + char buf[PATH_MAX];
> + char *c, *c2;
> + int fd, cfd;
> + ssize_t len;
> +
> + snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
> + fd = open(path, O_RDONLY);
> + if (fd < 0) {
> + log_err("Opening cgroup.controllers: %s", path);
> + return -1;
It looks like either -1 or 1 can be returned to indicate an error
in this file. For consistency within this file, always returning 1
might be preferable, since setup_cgroup_environment() has the
following comment:
* This function will print an error to stderr and return 1 if it is unable
* to setup the cgroup environment. If setup is successful, 0 is returned.
> + }
> +
> + len = read(fd, buf, sizeof(buf) - 1);
> + if (len < 0) {
> + close(fd);
> + log_err("Reading cgroup.controllers: %s", path);
> + return -1;
> + }
> + buf[len] = 0;
> + close(fd);
> +
> + /* No controllers available? We're probably on cgroup v1. */
> + if (len == 0)
> + return 0;
> +
> + snprintf(path, sizeof(path), "%s/cgroup.subtree_control", cgroup_path);
> + cfd = open(path, O_RDWR);
> + if (cfd < 0) {
> + log_err("Opening cgroup.subtree_control: %s", path);
> + return -1;
> + }
> +
> + for (c = strtok_r(buf, " ", &c2); c; c = strtok_r(NULL, " ", &c2)) {
> + if (dprintf(cfd, "+%s\n", c) <= 0) {
> + log_err("Enabling controller %s: %s", c, path);
> + close(cfd);
> + return -1;
> + }
> + }
> + close(cfd);
> + return 0;
> +}
> +
> /**
> * setup_cgroup_environment() - Setup the cgroup environment
> *
> @@ -71,6 +125,9 @@ int setup_cgroup_environment(void)
> return 1;
> }
>
> + if (enable_all_controllers(cgroup_workdir))
> + return 1;
> +
> return 0;
> }
>
>
On 5/22/19 4:20 PM, Roman Gushchin wrote:
> Add a kselftest to cover bpf auto-detachment functionality.
> The test creates a cgroup, associates some resources with it,
> attaches a couple of bpf programs and deletes the cgroup.
>
> Then it checks that the bpf programs go away within 5 seconds.
>
> Expected output:
> $ ./test_cgroup_attach
> #override:PASS
> #multi:PASS
> #autodetach:PASS
> test_cgroup_attach:PASS
>
> On a kernel without auto-detaching:
> $ ./test_cgroup_attach
> #override:PASS
> #multi:PASS
> #autodetach:FAIL
> test_cgroup_attach:FAIL
I ran this program on both old and new kernels and
both got all PASSes. My testing environment is a VM.
Could you specify how to trigger the above failure?
>
> Signed-off-by: Roman Gushchin <[email protected]>
> ---
> .../selftests/bpf/test_cgroup_attach.c | 99 ++++++++++++++++++-
> 1 file changed, 98 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/test_cgroup_attach.c b/tools/testing/selftests/bpf/test_cgroup_attach.c
> index 93d4fe295e7d..bc5bd0f1728e 100644
> --- a/tools/testing/selftests/bpf/test_cgroup_attach.c
> +++ b/tools/testing/selftests/bpf/test_cgroup_attach.c
> @@ -456,9 +456,106 @@ static int test_multiprog(void)
> return rc;
> }
>
> +static int test_autodetach(void)
> +{
> + __u32 prog_cnt = 4, attach_flags;
> + int allow_prog[2] = {0};
> + __u32 prog_ids[2] = {0};
> + int cg = 0, i, rc = -1;
> + void *ptr = NULL;
> + int attempts;
> +
> +
> + for (i = 0; i < ARRAY_SIZE(allow_prog); i++) {
> + allow_prog[i] = prog_load_cnt(1, 1 << i);
> + if (!allow_prog[i])
> + goto err;
> + }
> +
> + if (setup_cgroup_environment())
> + goto err;
> +
> + /* create a cgroup, attach two programs and remember their ids */
> + cg = create_and_get_cgroup("/cg_autodetach");
> + if (cg < 0)
> + goto err;
> +
> + if (join_cgroup("/cg_autodetach"))
> + goto err;
> +
> + for (i = 0; i < ARRAY_SIZE(allow_prog); i++) {
> + if (bpf_prog_attach(allow_prog[i], cg, BPF_CGROUP_INET_EGRESS,
> + BPF_F_ALLOW_MULTI)) {
> + log_err("Attaching prog[%d] to cg:egress", i);
> + goto err;
> + }
> + }
> +
> + /* make sure that programs are attached and run some traffic */
> + assert(bpf_prog_query(cg, BPF_CGROUP_INET_EGRESS, 0, &attach_flags,
> + prog_ids, &prog_cnt) == 0);
> + assert(system(PING_CMD) == 0);
> +
> + /* allocate some memory (4Mb) to pin the original cgroup */
> + ptr = malloc(4 * (1 << 20));
> + if (!ptr)
> + goto err;
> +
> + /* close programs and cgroup fd */
> + for (i = 0; i < ARRAY_SIZE(allow_prog); i++) {
> + close(allow_prog[i]);
> + allow_prog[i] = 0;
> + }
> +
> + close(cg);
> + cg = 0;
> +
> + /* leave the cgroup and remove it. don't detach programs */
> + cleanup_cgroup_environment();
> +
> + /* wait for the asynchronous auto-detachment.
> + * wait for no more than 5 sec and give up.
> + */
> + for (i = 0; i < ARRAY_SIZE(prog_ids); i++) {
> + for (attempts = 5; attempts >= 0; attempts--) {
> + int fd = bpf_prog_get_fd_by_id(prog_ids[i]);
> +
> + if (fd < 0)
> + break;
> +
> + /* don't leave the fd open */
> + close(fd);
> +
> + if (!attempts)
> + goto err;
> +
> + sleep(1);
> + }
> + }
> +
> + rc = 0;
> +err:
> + for (i = 0; i < ARRAY_SIZE(allow_prog); i++)
> + if (allow_prog[i] > 0)
> + close(allow_prog[i]);
> + if (cg)
> + close(cg);
> + free(ptr);
> + cleanup_cgroup_environment();
> + if (!rc)
> + printf("#autodetach:PASS\n");
> + else
> + printf("#autodetach:FAIL\n");
> + return rc;
> +}
> +
> int main(int argc, char **argv)
> {
> - int (*tests[])(void) = {test_foo_bar, test_multiprog};
> + int (*tests[])(void) = {
> + test_foo_bar,
> + test_multiprog,
> + test_autodetach,
> + };
> int errors = 0;
> int i;
>
>
On 5/22/19 4:20 PM, Roman Gushchin wrote:
> Add a kselftest to cover bpf auto-detachment functionality.
> The test creates a cgroup, associates some resources with it,
> attaches a couple of bpf programs and deletes the cgroup.
>
> Then it checks that the bpf programs go away within 5 seconds.
>
> Expected output:
> $ ./test_cgroup_attach
> #override:PASS
> #multi:PASS
> #autodetach:PASS
> test_cgroup_attach:PASS
>
> On a kernel without auto-detaching:
> $ ./test_cgroup_attach
> #override:PASS
> #multi:PASS
> #autodetach:FAIL
> test_cgroup_attach:FAIL
>
> Signed-off-by: Roman Gushchin <[email protected]>
> ---
> .../selftests/bpf/test_cgroup_attach.c | 99 ++++++++++++++++++-
> 1 file changed, 98 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/test_cgroup_attach.c b/tools/testing/selftests/bpf/test_cgroup_attach.c
> index 93d4fe295e7d..bc5bd0f1728e 100644
> --- a/tools/testing/selftests/bpf/test_cgroup_attach.c
> +++ b/tools/testing/selftests/bpf/test_cgroup_attach.c
> @@ -456,9 +456,106 @@ static int test_multiprog(void)
> return rc;
> }
>
> +static int test_autodetach(void)
> +{
> + __u32 prog_cnt = 4, attach_flags;
> + int allow_prog[2] = {0};
> + __u32 prog_ids[2] = {0};
> + int cg = 0, i, rc = -1;
> + void *ptr = NULL;
> + int attempts;
> +
> +
Also extra line here.
> + for (i = 0; i < ARRAY_SIZE(allow_prog); i++) {
> + allow_prog[i] = prog_load_cnt(1, 1 << i);
> + if (!allow_prog[i])
> + goto err;
> + }
> +
> + if (setup_cgroup_environment())
> + goto err;
> +
> + /* create a cgroup, attach two programs and remember their ids */
> + cg = create_and_get_cgroup("/cg_autodetach");
[...]
> +
> int main(int argc, char **argv)
> {
> - int (*tests[])(void) = {test_foo_bar, test_multiprog};
> + int (*tests[])(void) = {
> + test_foo_bar,
> + test_multiprog,
> + test_autodetach,
> + };
> int errors = 0;
> int i;
>
>
On Wed, May 22, 2019 at 10:47:24PM -0700, Yonghong Song wrote:
>
>
> On 5/22/19 4:20 PM, Roman Gushchin wrote:
> > Add a kselftest to cover bpf auto-detachment functionality.
> > The test creates a cgroup, associates some resources with it,
> > attaches a couple of bpf programs and deletes the cgroup.
> >
> > Then it checks that the bpf programs go away within 5 seconds.
> >
> > Expected output:
> > $ ./test_cgroup_attach
> > #override:PASS
> > #multi:PASS
> > #autodetach:PASS
> > test_cgroup_attach:PASS
> >
> > On a kernel without auto-detaching:
> > $ ./test_cgroup_attach
> > #override:PASS
> > #multi:PASS
> > #autodetach:FAIL
> > test_cgroup_attach:FAIL
>
> I ran this program on both old and new kernels and
> both got all PASSes. My testing environment is a VM.
> Could you specify how to trigger the above failure?
Most likely you're running cgroup v1, so the memory controller
is not enabled on unified hierarchy. You need to pass
"cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1"
as boot time options to run fully on cgroup v2.
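An easy way to check whether a machine is running with a unified (v2)
hierarchy is to look at the filesystem magic of the cgroup mount; a
small sketch, assuming the default /sys/fs/cgroup mount point:

	#include <stdio.h>
	#include <sys/vfs.h>
	#include <linux/magic.h>

	int main(void)
	{
		struct statfs fs;

		if (statfs("/sys/fs/cgroup", &fs))
			return 1;
		/* CGROUP2_SUPER_MAGIC identifies a cgroup v2 mount */
		printf("cgroup v2: %s\n",
		       fs.f_type == CGROUP2_SUPER_MAGIC ? "yes" : "no");
		return 0;
	}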
But generally speaking, the lifecycle of a dying cgroup is
completely implementation-defined. No guarantees are provided.
So false positives are fine here, and shouldn't be considered as
something bad.
In the end, all we want is to detach programs within a reasonable time
after rmdir.
Btw, thank you for the careful review of the patchset. I'll
address your comments, add acks and will send out v3.
Thanks!
On 5/23/19 10:58 AM, Roman Gushchin wrote:
> On Wed, May 22, 2019 at 10:47:24PM -0700, Yonghong Song wrote:
>>
>>
>> On 5/22/19 4:20 PM, Roman Gushchin wrote:
>>> Add a kselftest to cover bpf auto-detachment functionality.
>>> The test creates a cgroup, associates some resources with it,
>>> attaches a couple of bpf programs and deletes the cgroup.
>>>
>>> Then it checks that the bpf programs go away within 5 seconds.
>>>
>>> Expected output:
>>> $ ./test_cgroup_attach
>>> #override:PASS
>>> #multi:PASS
>>> #autodetach:PASS
>>> test_cgroup_attach:PASS
>>>
>>> On a kernel without auto-detaching:
>>> $ ./test_cgroup_attach
>>> #override:PASS
>>> #multi:PASS
>>> #autodetach:FAIL
>>> test_cgroup_attach:FAIL
>>
>> I ran this program on both old and new kernels and
>> both got all PASSes. My testing environment is a VM.
>> Could you specify how to trigger the above failure?
>
> Most likely you're running cgroup v1, so the memory controller
> is not enabled on unified hierarchy. You need to pass
> "cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1"
> as boot time options to run fully on cgroup v2.
I tested on a cgroup v2 machine and it indeed failed without
the core patch. Thanks!
>
> But generally speaking, the lifecycle of a dying cgroup is
> completely implementation-defined. No guarantees are provided.
> So false positives are fine here, and shouldn't be considered as
> something bad.
>
> In the end, all we want is to detach programs within a reasonable time
> after rmdir.
>
> Btw, thank you for the careful review of the patchset. I'll
> address your comments, add acks and will send out v3.
>
> Thanks!
>