This is version 5 of task, css_task and css iters support.
--- Changelog ---
v4 -> v5:
Patch 3~4:
* Change the BUILD_BUG_ON check in bpf_iter_task_new and bpf_iter_css_new to avoid
netdev/build_32bit CI error
(https://netdev.bots.linux.dev/static/nipa/790929/13412333/build_32bit/stderr)
Patch 8:
* Initialize skel pointer to fix the LLVM-16 build CI error
(https://github.com/kernel-patches/bpf/actions/runs/6462875618/job/17545170863)
v3 -> v4:https://lore.kernel.org/all/[email protected]/
* Address all the comments from Andrii in patch-3 ~ patch-6
* Collect Tejun's ack
* Add an extra patch to rename bpf_iter_task.c to bpf_iter_tasks.c
* Separate the selftests into three BPF program files (iters_task.c iters_css_task.c iters_css.c)
v2 -> v3:https://lore.kernel.org/lkml/[email protected]/
Patch 1 (cgroup: Prepare for using css_task_iter_*() in BPF)
* Add tj's ack and Alexei's Suggested-by.
Patch 2 (bpf: Introduce css_task open-coded iterator kfuncs)
* Use bpf_mem_alloc/bpf_mem_free rather than kzalloc()
* Add KF_TRUSTED_ARGS for bpf_iter_css_task_new (Alexei)
* Move bpf_iter_css_task's definition from uapi/linux/bpf.h to
kernel/bpf/task_iter.c and we can use it from vmlinux.h
* Move bpf_iter_css_task_XXX's declaration from bpf_helpers.h to
bpf_experimental.h
Patch 3 (Introduce task open coded iterator kfuncs)
* Change the API design to keep it consistent with SEC("iter/task"): support
iterating all threads (BPF_TASK_ITERATE_ALL) and the threads of a
specific task (BPF_TASK_ITERATE_THREAD). (Andrii)
* Move bpf_iter_task's definition from uapi/linux/bpf.h to
kernel/bpf/task_iter.c and we can use it from vmlinux.h
* Move bpf_iter_task_XXX's declaration from bpf_helpers.h to
bpf_experimental.h
Patch 4 (Introduce css open-coded iterator kfuncs)
* Change the API design to keep it consistent with cgroup_iters; reuse
BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST
/BPF_CGROUP_ITER_ANCESTORS_UP (Andrii)
* Add KF_TRUSTED_ARGS for bpf_iter_css_new
* Move bpf_iter_css's definition from uapi/linux/bpf.h to
kernel/bpf/task_iter.c and we can use it from vmlinux.h
* Move bpf_iter_css_XXX's declaration from bpf_helpers.h to
bpf_experimental.h
Patch 5 (teach the verifier to enforce css_iter and task_iter in RCU CS)
* Add KF flag KF_RCU_PROTECTED to mark kfuncs which need RCU CS. (Andrii)
* Consider STACK_ITER when using bpf_for_each_spilled_reg.
Patch 6 (Let bpf_iter_task_new accept null task ptr)
* Add this extra patch to let bpf_iter_task_new accept a 'nullable'
task pointer (Andrii)
Patch 7 (selftests/bpf: Add tests for open-coded task and css iter)
* Add failure testcase(Alexei)
Changes from v1(https://lore.kernel.org/lkml/[email protected]/):
- Add a pre-patch to make some preparations before supporting css_task
iters.(Alexei)
- Add an allowlist for css_task iters(Alexei)
- Let bpf progs do explicit bpf_rcu_read_lock() when using process
iters and css_descendant iters.(Alexei)
---------------------
In some BPF usage scenarios, it is useful to iterate over processes and
css directly in a BPF program. One expected scenario is
customizable OOM victim selection via BPF[1].
Inspired by Dave's task_vma iter[2], this patchset adds three types of
open-coded iterator kfuncs:
1. bpf_task_iters. It can be used to
1) iterate all processes in the system, like for_each_process() in the kernel.
2) iterate all threads in the system.
3) iterate all threads of a specific task.
2. css_task_iters. It works like css_task_iter_{start, next, end} and is
used to iterate tasks/threads under a css.
3. css_iters. It works like css_next_descendant_{pre, post} and iterates all
descendant css.
BPF programs can use these kfuncs directly or through the bpf_for_each macro.
link[1]: https://lore.kernel.org/lkml/[email protected]/
link[2]: https://lore.kernel.org/all/[email protected]/
Chuyi Zhou (8):
cgroup: Prepare for using css_task_iter_*() in BPF
bpf: Introduce css_task open-coded iterator kfuncs
bpf: Introduce task open coded iterator kfuncs
bpf: Introduce css open-coded iterator kfuncs
bpf: teach the verifier to enforce css_iter and task_iter in RCU CS
bpf: Let bpf_iter_task_new accept null task ptr
selftests/bpf: rename bpf_iter_task.c to bpf_iter_tasks.c
selftests/bpf: Add tests for open-coded task and css iter
include/linux/bpf_verifier.h | 19 ++-
include/linux/btf.h | 1 +
include/linux/cgroup.h | 12 +-
kernel/bpf/cgroup_iter.c | 59 +++++++
kernel/bpf/helpers.c | 9 +
kernel/bpf/task_iter.c | 138 +++++++++++++++
kernel/bpf/verifier.c | 86 ++++++++--
kernel/cgroup/cgroup.c | 18 +-
.../testing/selftests/bpf/bpf_experimental.h | 19 +++
.../selftests/bpf/prog_tests/bpf_iter.c | 18 +-
.../testing/selftests/bpf/prog_tests/iters.c | 161 ++++++++++++++++++
.../{bpf_iter_task.c => bpf_iter_tasks.c} | 0
tools/testing/selftests/bpf/progs/iters_css.c | 74 ++++++++
.../selftests/bpf/progs/iters_css_task.c | 42 +++++
.../testing/selftests/bpf/progs/iters_task.c | 41 +++++
.../selftests/bpf/progs/iters_task_failure.c | 105 ++++++++++++
16 files changed, 760 insertions(+), 42 deletions(-)
rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)
create mode 100644 tools/testing/selftests/bpf/progs/iters_css.c
create mode 100644 tools/testing/selftests/bpf/progs/iters_css_task.c
create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c
create mode 100644 tools/testing/selftests/bpf/progs/iters_task_failure.c
--
2.20.1
This patch adds kfuncs bpf_iter_css_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_css in open-coded iterator
style. These kfuncs wrap css_next_descendant_{pre, post}.
css_iter can be used to:
1) iterate a specific cgroup tree in pre/post/up order
2) iterate a cgroup_subsystem in a BPF prog, like
for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in the kernel.
The API design is consistent with cgroup_iter. bpf_iter_css_new accepts
parameters defining the iteration order and the starting css. Here we also reuse
the BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST and
BPF_CGROUP_ITER_ANCESTORS_UP enums.
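As an illustration only, the ancestors-up order can be modelled in plain
userspace C. This is a toy sketch, not the kernel code: struct toy_css,
struct toy_css_iter, TOY_MAX_DEPTH and the toy_* helpers are hypothetical
names, and only the parent link of a css is modelled. The first call to
next yields the starting node, each later call yields the parent, and the
walk ends with NULL once the root has been returned.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEPTH 8

/* Toy stand-in for struct cgroup_subsys_state: only the parent link. */
struct toy_css {
	struct toy_css *parent;
};

/* Toy stand-in for the iterator state, ancestors-up order only. */
struct toy_css_iter {
	struct toy_css *start;
	struct toy_css *pos;
};

static void toy_css_iter_new(struct toy_css_iter *it, struct toy_css *start)
{
	it->start = start;
	it->pos = NULL;
}

/* First call returns start, later calls return the parent of the last node. */
static struct toy_css *toy_css_iter_next(struct toy_css_iter *it)
{
	if (!it->start)
		return NULL;
	it->pos = it->pos ? it->pos->parent : it->start;
	return it->pos;
}

/* Build a chain of 'depth' nodes and count the nodes seen from the leaf up. */
static int toy_walk_chain(int depth)
{
	struct toy_css nodes[TOY_MAX_DEPTH];
	struct toy_css_iter it;
	int i, n = 0;

	if (depth < 1 || depth > TOY_MAX_DEPTH)
		return -1;

	nodes[0].parent = NULL;		/* nodes[0] plays the root */
	for (i = 1; i < depth; i++)
		nodes[i].parent = &nodes[i - 1];

	toy_css_iter_new(&it, &nodes[depth - 1]);
	while (toy_css_iter_next(&it))
		n++;
	return n;
}
```

Walking from a leaf at depth d visits d nodes, while walking from the root
visits one, so the two counts differ by d - 1.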
Signed-off-by: Chuyi Zhou <[email protected]>
Acked-by: Tejun Heo <[email protected]>
---
kernel/bpf/cgroup_iter.c | 59 +++++++++++++++++++
kernel/bpf/helpers.c | 3 +
.../testing/selftests/bpf/bpf_experimental.h | 6 ++
3 files changed, 68 insertions(+)
diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
index 810378f04fbc..df2d6b6a5dd8 100644
--- a/kernel/bpf/cgroup_iter.c
+++ b/kernel/bpf/cgroup_iter.c
@@ -294,3 +294,62 @@ static int __init bpf_cgroup_iter_init(void)
}
late_initcall(bpf_cgroup_iter_init);
+
+struct bpf_iter_css {
+ __u64 __opaque[3];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_css_kern {
+ struct cgroup_subsys_state *start;
+ struct cgroup_subsys_state *pos;
+ unsigned int flags;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it,
+ struct cgroup_subsys_state *start, unsigned int flags)
+{
+ struct bpf_iter_css_kern *kit = (void *)it;
+
+ BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) > sizeof(struct bpf_iter_css));
+ BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css));
+
+ kit->start = NULL;
+ switch (flags) {
+ case BPF_CGROUP_ITER_DESCENDANTS_PRE:
+ case BPF_CGROUP_ITER_DESCENDANTS_POST:
+ case BPF_CGROUP_ITER_ANCESTORS_UP:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ kit->start = start;
+ kit->pos = NULL;
+ kit->flags = flags;
+ return 0;
+}
+
+__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it)
+{
+ struct bpf_iter_css_kern *kit = (void *)it;
+
+ if (!kit->start)
+ return NULL;
+
+ switch (kit->flags) {
+ case BPF_CGROUP_ITER_DESCENDANTS_PRE:
+ kit->pos = css_next_descendant_pre(kit->pos, kit->start);
+ break;
+ case BPF_CGROUP_ITER_DESCENDANTS_POST:
+ kit->pos = css_next_descendant_post(kit->pos, kit->start);
+ break;
+ case BPF_CGROUP_ITER_ANCESTORS_UP:
+ kit->pos = kit->pos ? kit->pos->parent : kit->start;
+ }
+
+ return kit->pos;
+}
+
+__bpf_kfunc void bpf_iter_css_destroy(struct bpf_iter_css *it)
+{
+}
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 690763751f6e..6330e37ca8d5 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2558,6 +2558,9 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_dynptr_adjust)
BTF_ID_FLAGS(func, bpf_dynptr_is_null)
BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 1ec82997cce7..9aab609f6edd 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -463,4 +463,10 @@ extern int bpf_iter_task_new(struct bpf_iter_task *it,
extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym;
extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym;
+struct bpf_iter_css;
+extern int bpf_iter_css_new(struct bpf_iter_css *it,
+ struct cgroup_subsys_state *start, unsigned int flags) __weak __ksym;
+extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
+extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
+
#endif
--
2.20.1
This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_task in open-coded iterator
style. BPF programs can use these kfuncs directly or through the
bpf_for_each macro to iterate all processes in the system.
The API design is kept consistent with SEC("iter/task"). bpf_iter_task_new()
accepts a specific task and an iteration type, which allows:
1. iterating all processes in the system (BPF_TASK_ITER_ALL_PROCS)
2. iterating all threads in the system (BPF_TASK_ITER_ALL_THREADS)
3. iterating all threads of a specific task (BPF_TASK_ITER_PROC_THREADS)
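The three modes can be illustrated with a userspace toy model. This is a
sketch under stated assumptions, not the kernel implementation: struct
toy_task, struct toy_task_iter, the TOY_* constants and the toy_* helpers
are hypothetical, the two circular lists stand in for next_thread() and
next_task(), and the 'init' member stands in for &init_task.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_ALL_PROCS, TOY_ALL_THREADS, TOY_PROC_THREADS };

struct toy_task {
	struct toy_task *next_thread;	/* circular list within a thread group */
	struct toy_task *next_task;	/* circular list of group leaders */
};

struct toy_task_iter {
	struct toy_task *init;	/* stands in for &init_task */
	struct toy_task *task;	/* leader of the group being walked */
	struct toy_task *pos;	/* next task to hand out */
	unsigned int flags;
};

static int toy_task_iter_new(struct toy_task_iter *it, struct toy_task *init,
			     struct toy_task *task, unsigned int flags)
{
	it->init = init;
	it->task = it->pos = NULL;
	if (flags > TOY_PROC_THREADS)
		return -1;
	it->task = flags == TOY_PROC_THREADS ? task : init;
	it->pos = it->task;
	it->flags = flags;
	return 0;
}

static struct toy_task *toy_task_iter_next(struct toy_task_iter *it)
{
	struct toy_task *pos = it->pos;

	if (!pos)
		return NULL;
	if (it->flags != TOY_ALL_PROCS) {
		it->pos = it->pos->next_thread;
		if (it->pos != it->task)
			return pos;	/* more threads left in this group */
		if (it->flags == TOY_PROC_THREADS) {
			it->pos = NULL;	/* wrapped around to the leader: done */
			return pos;
		}
	}
	/* Advance to the next process; stop when the list wraps to init. */
	it->pos = it->pos->next_task;
	it->task = it->pos;
	if (it->pos == it->init)
		it->pos = NULL;
	return pos;
}

/* Two processes: init with one thread, and a second one with three threads. */
static int toy_count(unsigned int flags)
{
	struct toy_task init, lead, t1, t2;
	struct toy_task_iter it;
	int n = 0;

	init.next_thread = &init;	init.next_task = &lead;
	lead.next_thread = &t1;		lead.next_task = &init;
	t1.next_thread = &t2;		t1.next_task = NULL;
	t2.next_thread = &lead;		t2.next_task = NULL;

	if (toy_task_iter_new(&it, &init, &lead, flags))
		return -1;
	while (toy_task_iter_next(&it))
		n++;
	return n;
}
```

In this model the ALL_PROCS mode visits only the two group leaders, the
ALL_THREADS mode visits every thread of every process, and the
PROC_THREADS mode visits only the threads of the task passed in.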
Signed-off-by: Chuyi Zhou <[email protected]>
---
kernel/bpf/helpers.c | 3 +
kernel/bpf/task_iter.c | 82 +++++++++++++++++++
.../testing/selftests/bpf/bpf_experimental.h | 5 ++
3 files changed, 90 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index cb24c4a916df..690763751f6e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2555,6 +2555,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_dynptr_adjust)
BTF_ID_FLAGS(func, bpf_dynptr_is_null)
BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 2cfcb4dd8a37..caeddad3d2f1 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -856,6 +856,88 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
bpf_mem_free(&bpf_global_ma, kit->css_it);
}
+struct bpf_iter_task {
+ __u64 __opaque[3];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_task_kern {
+ struct task_struct *task;
+ struct task_struct *pos;
+ unsigned int flags;
+} __attribute__((aligned(8)));
+
+enum {
+ BPF_TASK_ITER_ALL_PROCS,
+ BPF_TASK_ITER_ALL_THREADS,
+ BPF_TASK_ITER_PROC_THREADS
+};
+
+__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it,
+ struct task_struct *task, unsigned int flags)
+{
+ struct bpf_iter_task_kern *kit = (void *)it;
+
+ BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) > sizeof(struct bpf_iter_task));
+ BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
+ __alignof__(struct bpf_iter_task));
+
+ kit->task = kit->pos = NULL;
+ switch (flags) {
+ case BPF_TASK_ITER_ALL_THREADS:
+ case BPF_TASK_ITER_ALL_PROCS:
+ case BPF_TASK_ITER_PROC_THREADS:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (flags == BPF_TASK_ITER_PROC_THREADS)
+ kit->task = task;
+ else
+ kit->task = &init_task;
+ kit->pos = kit->task;
+ kit->flags = flags;
+ return 0;
+}
+
+__bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
+{
+ struct bpf_iter_task_kern *kit = (void *)it;
+ struct task_struct *pos;
+ unsigned int flags;
+
+ flags = kit->flags;
+ pos = kit->pos;
+
+ if (!pos)
+ goto out;
+
+ if (flags == BPF_TASK_ITER_ALL_PROCS)
+ goto get_next_task;
+
+ kit->pos = next_thread(kit->pos);
+ if (kit->pos == kit->task) {
+ if (flags == BPF_TASK_ITER_PROC_THREADS) {
+ kit->pos = NULL;
+ goto out;
+ }
+ } else
+ goto out;
+
+get_next_task:
+ kit->pos = next_task(kit->pos);
+ kit->task = kit->pos;
+ if (kit->pos == &init_task)
+ kit->pos = NULL;
+
+out:
+ return pos;
+}
+
+__bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it)
+{
+}
+
DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
static void do_mmap_read_unlock(struct irq_work *entry)
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 8b53537e0f27..1ec82997cce7 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -457,5 +457,10 @@ extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
+struct bpf_iter_task;
+extern int bpf_iter_task_new(struct bpf_iter_task *it,
+ struct task_struct *task, unsigned int flags) __weak __ksym;
+extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym;
+extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym;
#endif
--
2.20.1
When using task_iter to iterate all threads of a specific task, we enforce
that the user must pass a valid task pointer to ensure safety. However,
when iterating all threads/processes in the system, the BPF verifier still
requires a valid pointer instead of a "nullable" pointer, even though the
pointer is unused, which is surprising from a usability standpoint. It
would be nice if we could let that kfunc accept an explicit null pointer
when we are using BPF_TASK_ITER_ALL_{PROCS, THREADS} and a valid pointer
when using BPF_TASK_ITER_PROC_THREADS.
Given a trivial kfunc:
__bpf_kfunc void FN(struct TYPE_A *obj);
The BPF verifier would reject a null pointer for obj. The error info is:
"arg#x pointer type xx xx must point to scalar, or struct with scalar"
reported by get_kfunc_ptr_arg_type(). The reg->type is SCALAR_VALUE and
the btf type of ref_t is not scalar or scalar_struct, which leads to the
rejection in get_kfunc_ptr_arg_type().
This patch adds the "__nullable" annotation:
__bpf_kfunc void FN(struct TYPE_A *obj__nullable);
Here __nullable indicates that obj is optional: the user can pass an explicit
null pointer or a normal TYPE_A pointer. In get_kfunc_ptr_arg_type(), we
detect whether the current arg is optional and the register is null. If so,
we return a new kfunc_ptr_arg_type, KF_ARG_PTR_TO_NULL, and skip to the next
arg in check_kfunc_args().
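The decision described above can be sketched in userspace C. This is an
illustration only and not the verifier code: toy_param_has_suffix(),
toy_arg_is_nullable(), toy_classify() and the TOY_ARG_* values are
hypothetical names, the parameter name is passed in as a plain string
rather than looked up in BTF, and the kfunc is assumed to be flagged with
KF_TRUSTED_ARGS so that a null register is otherwise rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if param_name ends with suffix. */
static bool toy_param_has_suffix(const char *param_name, const char *suffix)
{
	size_t len, suffix_len;

	if (!param_name || !suffix)
		return false;
	len = strlen(param_name);
	suffix_len = strlen(suffix);
	if (len < suffix_len)
		return false;
	return strcmp(param_name + len - suffix_len, suffix) == 0;
}

static bool toy_arg_is_nullable(const char *param_name)
{
	return toy_param_has_suffix(param_name, "__nullable");
}

enum toy_arg_verdict {
	TOY_ARG_SKIP_NULL,	/* null passed to a __nullable arg: skip checks */
	TOY_ARG_REJECT_NULL,	/* null passed to an ordinary trusted arg */
	TOY_ARG_CHECK_NORMALLY,	/* non-null register: usual type checks apply */
};

/* Classify one argument, assuming the kfunc takes trusted args. */
static enum toy_arg_verdict toy_classify(const char *param_name, bool reg_is_null)
{
	if (!reg_is_null)
		return TOY_ARG_CHECK_NORMALLY;
	if (toy_arg_is_nullable(param_name))
		return TOY_ARG_SKIP_NULL;
	return TOY_ARG_REJECT_NULL;
}
```

Note that a non-null pointer passed to a __nullable argument still goes
through the normal checks; only an explicit null is skipped.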
Signed-off-by: Chuyi Zhou <[email protected]>
---
kernel/bpf/task_iter.c | 7 +++++--
kernel/bpf/verifier.c | 13 ++++++++++++-
2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index caeddad3d2f1..0772545568f1 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -873,7 +873,7 @@ enum {
};
__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it,
- struct task_struct *task, unsigned int flags)
+ struct task_struct *task__nullable, unsigned int flags)
{
struct bpf_iter_task_kern *kit = (void *)it;
@@ -885,14 +885,17 @@ __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it,
switch (flags) {
case BPF_TASK_ITER_ALL_THREADS:
case BPF_TASK_ITER_ALL_PROCS:
+ break;
case BPF_TASK_ITER_PROC_THREADS:
+ if (!task__nullable)
+ return -EINVAL;
break;
default:
return -EINVAL;
}
if (flags == BPF_TASK_ITER_PROC_THREADS)
- kit->task = task;
+ kit->task = task__nullable;
else
kit->task = &init_task;
kit->pos = kit->task;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3a60cc87520e..d09697dbfd9c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10310,6 +10310,11 @@ static bool is_kfunc_arg_refcounted_kptr(const struct btf *btf, const struct btf
return __kfunc_param_match_suffix(btf, arg, "__refcounted_kptr");
}
+static bool is_kfunc_arg_nullable(const struct btf *btf, const struct btf_param *arg)
+{
+ return __kfunc_param_match_suffix(btf, arg, "__nullable");
+}
+
static bool is_kfunc_arg_scalar_with_name(const struct btf *btf,
const struct btf_param *arg,
const char *name)
@@ -10452,6 +10457,7 @@ enum kfunc_ptr_arg_type {
KF_ARG_PTR_TO_CALLBACK,
KF_ARG_PTR_TO_RB_ROOT,
KF_ARG_PTR_TO_RB_NODE,
+ KF_ARG_PTR_TO_NULL,
};
enum special_kfunc_type {
@@ -10608,6 +10614,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
if (is_kfunc_arg_callback(env, meta->btf, &args[argno]))
return KF_ARG_PTR_TO_CALLBACK;
+ if (is_kfunc_arg_nullable(meta->btf, &args[argno]) && register_is_null(reg))
+ return KF_ARG_PTR_TO_NULL;
if (argno + 1 < nargs &&
(is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], &regs[regno + 1]) ||
@@ -11158,7 +11166,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
}
if ((is_kfunc_trusted_args(meta) || is_kfunc_rcu(meta)) &&
- (register_is_null(reg) || type_may_be_null(reg->type))) {
+ (register_is_null(reg) || type_may_be_null(reg->type)) &&
+ !is_kfunc_arg_nullable(meta->btf, &args[i])) {
verbose(env, "Possibly NULL pointer passed to trusted arg%d\n", i);
return -EACCES;
}
@@ -11183,6 +11192,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
return kf_arg_type;
switch (kf_arg_type) {
+ case KF_ARG_PTR_TO_NULL:
+ continue;
case KF_ARG_PTR_TO_ALLOC_BTF_ID:
case KF_ARG_PTR_TO_BTF_ID:
if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
--
2.20.1
css_iter and task_iter should be used in an RCU section. Specifically, in
sleepable progs an explicit bpf_rcu_read_lock() is needed before using these
iters. In normal bpf progs that have an implicit rcu_read_lock(), it's OK to
use them directly.
This patch adds a new KF flag KF_RCU_PROTECTED for bpf_iter_task_new and
bpf_iter_css_new. It means the kfunc should be used in an RCU CS. We check
whether we are in an RCU CS before invoking this kfunc. If the RCU
protection is guaranteed, we set st->type = PTR_TO_STACK | MEM_RCU.
Once the user does rcu_unlock during the iteration, the MEM_RCU state of the
regs is cleared and they are marked PTR_UNTRUSTED. is_iter_reg_valid_init()
will reject the iter if reg->type is UNTRUSTED.
It is worth noting that currently, bpf_rcu_read_unlock does not
clear the state of the STACK_ITER reg, since bpf_for_each_spilled_reg
only considers STACK_SPILL. This patch also lets bpf_for_each_spilled_reg
search STACK_ITER.
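The slot-type mask can be illustrated with a small userspace sketch. The
names below are hypothetical, and the numeric values of the TOY_STACK_*
enum are an assumption that need not match the kernel's slot-type enum;
only the "(1 << type) & mask" test mirrors the macro change in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical slot types; the values are illustrative only. */
enum toy_slot_type {
	TOY_STACK_INVALID,
	TOY_STACK_SPILL,
	TOY_STACK_MISC,
	TOY_STACK_ZERO,
	TOY_STACK_DYNPTR,
	TOY_STACK_ITER,
};

/* A slot is visited only if the bit for its type is set in the mask. */
static bool toy_slot_selected(enum toy_slot_type type, unsigned int mask)
{
	return ((1u << type) & mask) != 0;
}

/* Default walk: spilled registers only, as before this patch. */
static unsigned int toy_default_mask(void)
{
	return 1u << TOY_STACK_SPILL;
}

/* Walk used on rcu_unlock: spilled registers and iterator slots. */
static unsigned int toy_rcu_unlock_mask(void)
{
	return (1u << TOY_STACK_SPILL) | (1u << TOY_STACK_ITER);
}
```

With the default mask an iterator slot is not visited, so its MEM_RCU state
would survive an unlock; the wider mask is what lets the unlock path reach it.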
Signed-off-by: Chuyi Zhou <[email protected]>
---
include/linux/bpf_verifier.h | 19 ++++++++------
include/linux/btf.h | 1 +
kernel/bpf/helpers.c | 4 +--
kernel/bpf/verifier.c | 50 ++++++++++++++++++++++++++++--------
4 files changed, 53 insertions(+), 21 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 94ec766432f5..e67cd45a85be 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -386,19 +386,18 @@ struct bpf_verifier_state {
u32 jmp_history_cnt;
};
-#define bpf_get_spilled_reg(slot, frame) \
+#define bpf_get_spilled_reg(slot, frame, mask) \
(((slot < frame->allocated_stack / BPF_REG_SIZE) && \
- (frame->stack[slot].slot_type[0] == STACK_SPILL)) \
+ ((1 << frame->stack[slot].slot_type[0]) & (mask))) \
? &frame->stack[slot].spilled_ptr : NULL)
/* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */
-#define bpf_for_each_spilled_reg(iter, frame, reg) \
- for (iter = 0, reg = bpf_get_spilled_reg(iter, frame); \
+#define bpf_for_each_spilled_reg(iter, frame, reg, mask) \
+ for (iter = 0, reg = bpf_get_spilled_reg(iter, frame, mask); \
iter < frame->allocated_stack / BPF_REG_SIZE; \
- iter++, reg = bpf_get_spilled_reg(iter, frame))
+ iter++, reg = bpf_get_spilled_reg(iter, frame, mask))
-/* Invoke __expr over regsiters in __vst, setting __state and __reg */
-#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \
+#define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr) \
({ \
struct bpf_verifier_state *___vstate = __vst; \
int ___i, ___j; \
@@ -410,7 +409,7 @@ struct bpf_verifier_state {
__reg = &___regs[___j]; \
(void)(__expr); \
} \
- bpf_for_each_spilled_reg(___j, __state, __reg) { \
+ bpf_for_each_spilled_reg(___j, __state, __reg, __mask) { \
if (!__reg) \
continue; \
(void)(__expr); \
@@ -418,6 +417,10 @@ struct bpf_verifier_state {
} \
})
+/* Invoke __expr over registers in __vst, setting __state and __reg */
+#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \
+ bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, 1 << STACK_SPILL, __expr)
+
/* linked list of verifier states used to prune search */
struct bpf_verifier_state_list {
struct bpf_verifier_state state;
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 928113a80a95..c2231c64d60b 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -74,6 +74,7 @@
#define KF_ITER_NEW (1 << 8) /* kfunc implements BPF iter constructor */
#define KF_ITER_NEXT (1 << 9) /* kfunc implements BPF iter next method */
#define KF_ITER_DESTROY (1 << 10) /* kfunc implements BPF iter destructor */
+#define KF_RCU_PROTECTED (1 << 11) /* kfunc should be protected by rcu cs when they are invoked */
/*
* Tag marking a kernel function as a kfunc. This is meant to minimize the
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 6330e37ca8d5..5081a4ac5b6d 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2555,10 +2555,10 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
-BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
-BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_dynptr_adjust)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 528d375c17ee..3a60cc87520e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1173,7 +1173,12 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg
static void __mark_reg_known_zero(struct bpf_reg_state *reg);
+static bool in_rcu_cs(struct bpf_verifier_env *env);
+
+static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta);
+
static int mark_stack_slots_iter(struct bpf_verifier_env *env,
+ struct bpf_kfunc_call_arg_meta *meta,
struct bpf_reg_state *reg, int insn_idx,
struct btf *btf, u32 btf_id, int nr_slots)
{
@@ -1194,6 +1199,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
__mark_reg_known_zero(st);
st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
+ if (is_kfunc_rcu_protected(meta)) {
+ if (in_rcu_cs(env))
+ st->type |= MEM_RCU;
+ else
+ st->type |= PTR_UNTRUSTED;
+ }
st->live |= REG_LIVE_WRITTEN;
st->ref_obj_id = i == 0 ? id : 0;
st->iter.btf = btf;
@@ -1268,7 +1279,7 @@ static bool is_iter_reg_valid_uninit(struct bpf_verifier_env *env,
return true;
}
-static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+static int is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
struct btf *btf, u32 btf_id, int nr_slots)
{
struct bpf_func_state *state = func(env, reg);
@@ -1276,26 +1287,28 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_
spi = iter_get_spi(env, reg, nr_slots);
if (spi < 0)
- return false;
+ return -EINVAL;
for (i = 0; i < nr_slots; i++) {
struct bpf_stack_state *slot = &state->stack[spi - i];
struct bpf_reg_state *st = &slot->spilled_ptr;
+ if (st->type & PTR_UNTRUSTED)
+ return -EPROTO;
/* only main (first) slot has ref_obj_id set */
if (i == 0 && !st->ref_obj_id)
- return false;
+ return -EINVAL;
if (i != 0 && st->ref_obj_id)
- return false;
+ return -EINVAL;
if (st->iter.btf != btf || st->iter.btf_id != btf_id)
- return false;
+ return -EINVAL;
for (j = 0; j < BPF_REG_SIZE; j++)
if (slot->slot_type[j] != STACK_ITER)
- return false;
+ return -EINVAL;
}
- return true;
+ return 0;
}
/* Check if given stack slot is "special":
@@ -7618,15 +7631,24 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
return err;
}
- err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots);
+ err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots);
if (err)
return err;
} else {
/* iter_next() or iter_destroy() expect initialized iter state*/
- if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) {
+ err = is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots);
+ switch (err) {
+ case 0:
+ break;
+ case -EINVAL:
verbose(env, "expected an initialized iter_%s as arg #%d\n",
iter_type_str(meta->btf, btf_id), regno);
- return -EINVAL;
+ return err;
+ case -EPROTO:
+ verbose(env, "expected an RCU CS when using %s\n", meta->func_name);
+ return err;
+ default:
+ return err;
}
spi = iter_get_spi(env, reg, nr_slots);
@@ -10209,6 +10231,11 @@ static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta)
return meta->kfunc_flags & KF_RCU;
}
+static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta)
+{
+ return meta->kfunc_flags & KF_RCU_PROTECTED;
+}
+
static bool __kfunc_param_match_suffix(const struct btf *btf,
const struct btf_param *arg,
const char *suffix)
@@ -11560,6 +11587,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
if (env->cur_state->active_rcu_lock) {
struct bpf_func_state *state;
struct bpf_reg_state *reg;
+ u32 clear_mask = (1 << STACK_SPILL) | (1 << STACK_ITER);
if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
@@ -11570,7 +11598,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
return -EINVAL;
} else if (rcu_unlock) {
- bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
+ bpf_for_each_reg_in_vstate_mask(env->cur_state, state, reg, clear_mask, ({
if (reg->type & MEM_RCU) {
reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
reg->type |= PTR_UNTRUSTED;
--
2.20.1
The newly-added struct bpf_iter_task has a name collision with a selftest
for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
renamed in order to avoid the collision.
Signed-off-by: Chuyi Zhou <[email protected]>
---
.../selftests/bpf/prog_tests/bpf_iter.c | 18 +++++++++---------
.../{bpf_iter_task.c => bpf_iter_tasks.c} | 0
2 files changed, 9 insertions(+), 9 deletions(-)
rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 1f02168103dd..dc60e8e125cd 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -7,7 +7,7 @@
#include "bpf_iter_ipv6_route.skel.h"
#include "bpf_iter_netlink.skel.h"
#include "bpf_iter_bpf_map.skel.h"
-#include "bpf_iter_task.skel.h"
+#include "bpf_iter_tasks.skel.h"
#include "bpf_iter_task_stack.skel.h"
#include "bpf_iter_task_file.skel.h"
#include "bpf_iter_task_vma.skel.h"
@@ -215,12 +215,12 @@ static void *do_nothing_wait(void *arg)
static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts,
int *num_unknown, int *num_known)
{
- struct bpf_iter_task *skel;
+ struct bpf_iter_tasks *skel;
pthread_t thread_id;
void *ret;
- skel = bpf_iter_task__open_and_load();
- if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
+ skel = bpf_iter_tasks__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "bpf_iter_tasks__open_and_load"))
return;
ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock");
@@ -239,7 +239,7 @@ static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts,
ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
"pthread_join");
- bpf_iter_task__destroy(skel);
+ bpf_iter_tasks__destroy(skel);
}
static void test_task_common(struct bpf_iter_attach_opts *opts, int num_unknown, int num_known)
@@ -307,10 +307,10 @@ static void test_task_pidfd(void)
static void test_task_sleepable(void)
{
- struct bpf_iter_task *skel;
+ struct bpf_iter_tasks *skel;
- skel = bpf_iter_task__open_and_load();
- if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
+ skel = bpf_iter_tasks__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "bpf_iter_tasks__open_and_load"))
return;
do_dummy_read(skel->progs.dump_task_sleepable);
@@ -320,7 +320,7 @@ static void test_task_sleepable(void)
ASSERT_GT(skel->bss->num_success_copy_from_user_task, 0,
"num_success_copy_from_user_task");
- bpf_iter_task__destroy(skel);
+ bpf_iter_tasks__destroy(skel);
}
static void test_task_stack(void)
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task.c b/tools/testing/selftests/bpf/progs/bpf_iter_tasks.c
similarity index 100%
rename from tools/testing/selftests/bpf/progs/bpf_iter_task.c
rename to tools/testing/selftests/bpf/progs/bpf_iter_tasks.c
--
2.20.1
This patch adds three subtests to demonstrate these patterns and validate
correctness.
subtest1:
1) We use task_iter to iterate all processes in the system and search for the
current process with a given pid.
2) We create some threads in the current process context, and use
BPF_TASK_ITER_PROC_THREADS to iterate all threads of the current process. As
expected, we would find all the threads of the current process.
3) We create some threads and use BPF_TASK_ITER_ALL_THREADS to iterate all
threads in the system. As expected, we would find all the threads which were
created.
subtest2: We create a cgroup and add the current task to the cgroup. In the
BPF program, we use bpf_for_each(css_task, task, css) to iterate all
tasks under the cgroup. As expected, we would find the current process.
subtest3:
1) We create a cgroup tree. In the BPF program, we use
bpf_for_each(css, pos, root, XXX) to iterate all descendants under the root
in pre and post order. As expected, we would find all descendants; the
last cgroup visited in post-order is the root cgroup, and the first cgroup
visited in pre-order is the root cgroup.
2) We use BPF_CGROUP_ITER_ANCESTORS_UP to traverse the cgroup tree starting
from the leaf and from the root separately, and record the height. The
difference of the heights would be the total tree height - 1.
Signed-off-by: Chuyi Zhou <[email protected]>
---
.../testing/selftests/bpf/prog_tests/iters.c | 161 ++++++++++++++++++
tools/testing/selftests/bpf/progs/iters_css.c | 74 ++++++++
.../selftests/bpf/progs/iters_css_task.c | 42 +++++
.../testing/selftests/bpf/progs/iters_task.c | 41 +++++
.../selftests/bpf/progs/iters_task_failure.c | 105 ++++++++++++
5 files changed, 423 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/iters_css.c
create mode 100644 tools/testing/selftests/bpf/progs/iters_css_task.c
create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c
create mode 100644 tools/testing/selftests/bpf/progs/iters_task_failure.c
diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
index 10804ae5ae97..8d7a7bef5c73 100644
--- a/tools/testing/selftests/bpf/prog_tests/iters.c
+++ b/tools/testing/selftests/bpf/prog_tests/iters.c
@@ -1,13 +1,24 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+#include <sys/syscall.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <malloc.h>
+#include <stdlib.h>
#include <test_progs.h>
+#include "cgroup_helpers.h"
#include "iters.skel.h"
#include "iters_state_safety.skel.h"
#include "iters_looping.skel.h"
#include "iters_num.skel.h"
#include "iters_testmod_seq.skel.h"
+#include "iters_task.skel.h"
+#include "iters_css_task.skel.h"
+#include "iters_css.skel.h"
+#include "iters_task_failure.skel.h"
static void subtest_num_iters(void)
{
@@ -90,6 +101,149 @@ static void subtest_testmod_seq_iters(void)
iters_testmod_seq__destroy(skel);
}
+static pthread_mutex_t do_nothing_mutex;
+
+static void *do_nothing_wait(void *arg)
+{
+ pthread_mutex_lock(&do_nothing_mutex);
+ pthread_mutex_unlock(&do_nothing_mutex);
+
+ pthread_exit(arg);
+}
+
+#define thread_num 2
+
+static void subtest_task_iters(void)
+{
+ struct iters_task *skel = NULL;
+ pthread_t thread_ids[thread_num];
+ void *ret;
+ int err;
+
+ skel = iters_task__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+ err = iters_task__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+ skel->bss->target_pid = getpid();
+ err = iters_task__attach(skel);
+ if (!ASSERT_OK(err, "iters_task__attach"))
+ goto cleanup;
+ pthread_mutex_lock(&do_nothing_mutex);
+ for (int i = 0; i < thread_num; i++)
+ ASSERT_OK(pthread_create(&thread_ids[i], NULL, &do_nothing_wait, NULL),
+ "pthread_create");
+
+ syscall(SYS_getpgid);
+ iters_task__detach(skel);
+ ASSERT_EQ(skel->bss->process_cnt, 1, "process_cnt");
+ ASSERT_EQ(skel->bss->thread_cnt, thread_num + 1, "thread_cnt");
+ ASSERT_EQ(skel->bss->all_thread_cnt, thread_num + 1, "all_thread_cnt");
+ pthread_mutex_unlock(&do_nothing_mutex);
+ for (int i = 0; i < thread_num; i++)
+ pthread_join(thread_ids[i], &ret);
+cleanup:
+ iters_task__destroy(skel);
+}
+
+extern int stack_mprotect(void);
+
+static void subtest_css_task_iters(void)
+{
+ struct iters_css_task *skel = NULL;
+ int err, cg_fd, cg_id;
+ const char *cgrp_path = "/cg1";
+
+ err = setup_cgroup_environment();
+ if (!ASSERT_OK(err, "setup_cgroup_environment"))
+ goto cleanup;
+ cg_fd = create_and_get_cgroup(cgrp_path);
+ if (!ASSERT_GE(cg_fd, 0, "cg_create"))
+ goto cleanup;
+ cg_id = get_cgroup_id(cgrp_path);
+ err = join_cgroup(cgrp_path);
+ if (!ASSERT_OK(err, "setup_cgroup_environment"))
+ goto cleanup;
+
+ skel = iters_css_task__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+
+ err = iters_css_task__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ skel->bss->target_pid = getpid();
+ skel->bss->cg_id = cg_id;
+ err = iters_css_task__attach(skel);
+
+ err = stack_mprotect();
+ if (!ASSERT_OK(err, "iters_task__attach"))
+ goto cleanup;
+
+ iters_css_task__detach(skel);
+ ASSERT_EQ(skel->bss->css_task_cnt, 1, "css_task_cnt");
+
+cleanup:
+ cleanup_cgroup_environment();
+ iters_css_task__destroy(skel);
+}
+
+static void subtest_css_iters(void)
+{
+ struct iters_css *skel = NULL;
+ struct {
+ const char *path;
+ int fd;
+ } cgs[] = {
+ { "/cg1" },
+ { "/cg1/cg2" },
+ { "/cg1/cg2/cg3" },
+ { "/cg1/cg2/cg3/cg4" },
+ };
+ int err, cg_nr = ARRAY_SIZE(cgs);
+ int i;
+
+ err = setup_cgroup_environment();
+ if (!ASSERT_OK(err, "setup_cgroup_environment"))
+ goto cleanup;
+ for (i = 0; i < cg_nr; i++) {
+ cgs[i].fd = create_and_get_cgroup(cgs[i].path);
+ if (!ASSERT_GE(cgs[i].fd, 0, "cg_create"))
+ goto cleanup;
+ }
+
+ skel = iters_css__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ goto cleanup;
+ err = iters_css__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ skel->bss->target_pid = getpid();
+ skel->bss->root_cg_id = get_cgroup_id(cgs[0].path);
+ skel->bss->leaf_cg_id = get_cgroup_id(cgs[cg_nr - 1].path);
+ err = iters_css__attach(skel);
+
+ if (!ASSERT_OK(err, "iters_task__attach"))
+ goto cleanup;
+
+ syscall(SYS_getpgid);
+ ASSERT_EQ(skel->bss->pre_css_dec_cnt, cg_nr, "pre order search dec count");
+ ASSERT_EQ(skel->bss->first_cg_id, get_cgroup_id(cgs[0].path),
+ "pre order search first cgroup id");
+
+ ASSERT_EQ(skel->bss->post_css_dec_cnt, cg_nr, "post order search dec count");
+ ASSERT_EQ(skel->bss->last_cg_id, get_cgroup_id(cgs[0].path),
+ "post order search last cgroup id");
+ ASSERT_EQ(skel->bss->tree_high, cg_nr - 1, "tree high");
+ iters_css__detach(skel);
+cleanup:
+ cleanup_cgroup_environment();
+ iters_css__destroy(skel);
+}
+
void test_iters(void)
{
RUN_TESTS(iters_state_safety);
@@ -103,4 +257,11 @@ void test_iters(void)
subtest_num_iters();
if (test__start_subtest("testmod_seq"))
subtest_testmod_seq_iters();
+ if (test__start_subtest("task"))
+ subtest_task_iters();
+ if (test__start_subtest("css_task"))
+ subtest_css_task_iters();
+ if (test__start_subtest("css"))
+ subtest_css_iters();
+ RUN_TESTS(iters_task_failure);
}
diff --git a/tools/testing/selftests/bpf/progs/iters_css.c b/tools/testing/selftests/bpf/progs/iters_css.c
new file mode 100644
index 000000000000..1422a7956c44
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_css.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Chuyi Zhou <[email protected]> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+pid_t target_pid = 0;
+u64 root_cg_id;
+u64 leaf_cg_id;
+
+u64 last_cg_id = 0;
+u64 first_cg_id = 0;
+
+int post_css_dec_cnt = 0;
+int pre_css_dec_cnt = 0;
+int tree_high = 0;
+
+struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
+void bpf_cgroup_release(struct cgroup *p) __ksym;
+void bpf_rcu_read_lock(void) __ksym;
+void bpf_rcu_read_unlock(void) __ksym;
+
+SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
+int iter_css_for_each(const void *ctx)
+{
+ struct task_struct *cur_task = bpf_get_current_task_btf();
+ struct cgroup_subsys_state *root_css, *leaf_css, *pos;
+ struct cgroup *root_cgrp, *leaf_cgrp, *cur_cgrp;
+
+ if (cur_task->pid != target_pid)
+ return 0;
+
+ root_cgrp = bpf_cgroup_from_id(root_cg_id);
+
+ if (!root_cgrp)
+ return 0;
+
+ leaf_cgrp = bpf_cgroup_from_id(leaf_cg_id);
+
+ if (!leaf_cgrp) {
+ bpf_cgroup_release(root_cgrp);
+ return 0;
+ }
+ root_css = &root_cgrp->self;
+ leaf_css = &leaf_cgrp->self;
+ bpf_rcu_read_lock();
+ bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_POST) {
+ cur_cgrp = pos->cgroup;
+ post_css_dec_cnt += 1;
+ last_cg_id = cur_cgrp->kn->id;
+ }
+
+ bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_PRE) {
+ cur_cgrp = pos->cgroup;
+ pre_css_dec_cnt += 1;
+ if (!first_cg_id)
+ first_cg_id = cur_cgrp->kn->id;
+ }
+
+ bpf_for_each(css, pos, leaf_css, BPF_CGROUP_ITER_ANCESTORS_UP)
+ tree_high += 1;
+
+ bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_ANCESTORS_UP)
+ tree_high -= 1;
+ bpf_rcu_read_unlock();
+ bpf_cgroup_release(root_cgrp);
+ bpf_cgroup_release(leaf_cgrp);
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/iters_css_task.c b/tools/testing/selftests/bpf/progs/iters_css_task.c
new file mode 100644
index 000000000000..506a2755234e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_css_task.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Chuyi Zhou <[email protected]> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
+void bpf_cgroup_release(struct cgroup *p) __ksym;
+
+pid_t target_pid = 0;
+int css_task_cnt = 0;
+u64 cg_id;
+
+SEC("lsm/file_mprotect")
+int BPF_PROG(iter_css_task_for_each)
+{
+ struct task_struct *cur_task = bpf_get_current_task_btf();
+ struct cgroup_subsys_state *css;
+ struct task_struct *task;
+ struct cgroup *cgrp;
+
+ if (cur_task->pid != target_pid)
+ return 0;
+
+ cgrp = bpf_cgroup_from_id(cg_id);
+
+ if (cgrp == NULL)
+ return 0;
+ css = &cgrp->self;
+
+ bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS)
+ if (task->pid == target_pid)
+ css_task_cnt += 1;
+
+ bpf_cgroup_release(cgrp);
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/iters_task.c b/tools/testing/selftests/bpf/progs/iters_task.c
new file mode 100644
index 000000000000..bd6d4f7b5e59
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Chuyi Zhou <[email protected]> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+pid_t target_pid = 0;
+int process_cnt = 0;
+int thread_cnt = 0;
+int all_thread_cnt = 0;
+
+void bpf_rcu_read_lock(void) __ksym;
+void bpf_rcu_read_unlock(void) __ksym;
+
+SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
+int iter_task_for_each_sleep(void *ctx)
+{
+ struct task_struct *cur_task = bpf_get_current_task_btf();
+ struct task_struct *pos;
+
+ if (cur_task->pid != target_pid)
+ return 0;
+ bpf_rcu_read_lock();
+ bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL_PROCS)
+ if (pos->pid == target_pid)
+ process_cnt += 1;
+
+ bpf_for_each(task, pos, cur_task, BPF_TASK_ITER_PROC_THREADS)
+ thread_cnt += 1;
+
+ bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL_THREADS)
+ if (pos->tgid == target_pid)
+ all_thread_cnt += 1;
+ bpf_rcu_read_unlock();
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/iters_task_failure.c b/tools/testing/selftests/bpf/progs/iters_task_failure.c
new file mode 100644
index 000000000000..c3bf96a67dba
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/iters_task_failure.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Chuyi Zhou <[email protected]> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
+void bpf_cgroup_release(struct cgroup *p) __ksym;
+void bpf_rcu_read_lock(void) __ksym;
+void bpf_rcu_read_unlock(void) __ksym;
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("expected an RCU CS when using bpf_iter_task_next")
+int BPF_PROG(iter_tasks_without_lock)
+{
+ struct task_struct *pos;
+
+ bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL_PROCS) {
+
+ }
+ return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("expected an RCU CS when using bpf_iter_css_next")
+int BPF_PROG(iter_css_without_lock)
+{
+ u64 cg_id = bpf_get_current_cgroup_id();
+ struct cgroup *cgrp = bpf_cgroup_from_id(cg_id);
+ struct cgroup_subsys_state *root_css, *pos;
+
+ if (!cgrp)
+ return 0;
+ root_css = &cgrp->self;
+
+ bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_POST) {
+
+ }
+ bpf_cgroup_release(cgrp);
+ return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("expected an RCU CS when using bpf_iter_task_next")
+int BPF_PROG(iter_tasks_lock_and_unlock)
+{
+ struct task_struct *pos;
+
+ bpf_rcu_read_lock();
+ bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL_PROCS) {
+ bpf_rcu_read_unlock();
+
+ bpf_rcu_read_lock();
+ }
+ bpf_rcu_read_unlock();
+ return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("expected an RCU CS when using bpf_iter_css_next")
+int BPF_PROG(iter_css_lock_and_unlock)
+{
+ u64 cg_id = bpf_get_current_cgroup_id();
+ struct cgroup *cgrp = bpf_cgroup_from_id(cg_id);
+ struct cgroup_subsys_state *root_css, *pos;
+
+ if (!cgrp)
+ return 0;
+ root_css = &cgrp->self;
+
+ bpf_rcu_read_lock();
+ bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_POST) {
+ bpf_rcu_read_unlock();
+
+ bpf_rcu_read_lock();
+ }
+ bpf_rcu_read_unlock();
+ bpf_cgroup_release(cgrp);
+ return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+__failure __msg("css_task_iter is only allowed in bpf_lsm and bpf iter-s")
+int BPF_PROG(iter_css_task_for_each)
+{
+ u64 cg_id = bpf_get_current_cgroup_id();
+ struct cgroup *cgrp = bpf_cgroup_from_id(cg_id);
+ struct cgroup_subsys_state *css;
+ struct task_struct *task;
+
+ if (cgrp == NULL)
+ return 0;
+ css = &cgrp->self;
+
+ bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) {
+
+ }
+ bpf_cgroup_release(cgrp);
+ return 0;
+}
--
2.20.1
On 2023/10/11 20:08, Chuyi Zhou wrote:
> This patch adds three subtests to demonstrate these patterns and validate
> their correctness.
>
> subtest1:
>
> 1) We use task_iter to iterate over all processes in the system and search
> for the current process with a given pid.
>
> 2) We create some threads in the current process context, and use
> BPF_TASK_ITER_PROC_THREADS to iterate over all threads of the current
> process. As expected, we find all the threads of the current process.
>
> 3) We create some threads and use BPF_TASK_ITER_ALL_THREADS to iterate over
> all threads in the system. As expected, we find all the threads that were
> created.
>
> subtest2: We create a cgroup and add the current task to the cgroup. In the
> BPF program, we use bpf_for_each(css_task, task, css) to iterate over all
> tasks under the cgroup. As expected, we find the current process.
>
> subtest3:
>
> 1) We create a cgroup tree. In the BPF program, we use
> bpf_for_each(css, pos, root, XXX) to iterate over all descendants under the
> root in pre-order and post-order. As expected, we find all descendants; the
> last cgroup visited in post-order is the root cgroup, and the first cgroup
> visited in pre-order is the root cgroup.
>
> 2) We use BPF_CGROUP_ITER_ANCESTORS_UP to traverse the cgroup tree starting
> from the leaf and from the root separately, and record the height. The
> difference between the two heights is the total tree height - 1.
>
> Signed-off-by: Chuyi Zhou <[email protected]>
> ---
> .../testing/selftests/bpf/prog_tests/iters.c | 161 ++++++++++++++++++
> tools/testing/selftests/bpf/progs/iters_css.c | 74 ++++++++
> .../selftests/bpf/progs/iters_css_task.c | 42 +++++
> .../testing/selftests/bpf/progs/iters_task.c | 41 +++++
> .../selftests/bpf/progs/iters_task_failure.c | 105 ++++++++++++
> 5 files changed, 423 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/progs/iters_css.c
> create mode 100644 tools/testing/selftests/bpf/progs/iters_css_task.c
> create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c
> create mode 100644 tools/testing/selftests/bpf/progs/iters_task_failure.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
> index 10804ae5ae97..8d7a7bef5c73 100644
> --- a/tools/testing/selftests/bpf/prog_tests/iters.c
> +++ b/tools/testing/selftests/bpf/prog_tests/iters.c
> @@ -1,13 +1,24 @@
> // SPDX-License-Identifier: GPL-2.0
> /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
>
> +#include <sys/syscall.h>
> +#include <sys/mman.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +#include <malloc.h>
> +#include <stdlib.h>
> #include <test_progs.h>
> +#include "cgroup_helpers.h"
>
> #include "iters.skel.h"
> #include "iters_state_safety.skel.h"
> #include "iters_looping.skel.h"
> #include "iters_num.skel.h"
> #include "iters_testmod_seq.skel.h"
> +#include "iters_task.skel.h"
> +#include "iters_css_task.skel.h"
> +#include "iters_css.skel.h"
> +#include "iters_task_failure.skel.h"
>
> static void subtest_num_iters(void)
> {
> @@ -90,6 +101,149 @@ static void subtest_testmod_seq_iters(void)
> iters_testmod_seq__destroy(skel);
> }
>
> +static pthread_mutex_t do_nothing_mutex;
> +
> +static void *do_nothing_wait(void *arg)
> +{
> + pthread_mutex_lock(&do_nothing_mutex);
> + pthread_mutex_unlock(&do_nothing_mutex);
> +
> + pthread_exit(arg);
> +}
> +
> +#define thread_num 2
> +
> +static void subtest_task_iters(void)
> +{
> + struct iters_task *skel = NULL;
> + pthread_t thread_ids[thread_num];
> + void *ret;
> + int err;
> +
> + skel = iters_task__open();
> + if (!ASSERT_OK_PTR(skel, "skel_open"))
> + goto cleanup;
> + err = iters_task__load(skel);
> + if (!ASSERT_OK(err, "skel_load"))
> + goto cleanup;
> + skel->bss->target_pid = getpid();
> + err = iters_task__attach(skel);
> + if (!ASSERT_OK(err, "iters_task__attach"))
> + goto cleanup;
> + pthread_mutex_lock(&do_nothing_mutex);
> + for (int i = 0; i < thread_num; i++)
> + ASSERT_OK(pthread_create(&thread_ids[i], NULL, &do_nothing_wait, NULL),
> + "pthread_create");
> +
> + syscall(SYS_getpgid);
> + iters_task__detach(skel);
> + ASSERT_EQ(skel->bss->process_cnt, 1, "process_cnt");
> + ASSERT_EQ(skel->bss->thread_cnt, thread_num + 1, "thread_cnt");
> + ASSERT_EQ(skel->bss->all_thread_cnt, thread_num + 1, "all_thread_cnt");
> + pthread_mutex_unlock(&do_nothing_mutex);
> + for (int i = 0; i < thread_num; i++)
> + pthread_join(thread_ids[i], &ret);
> +cleanup:
> + iters_task__destroy(skel);
> +}
> +
> +extern int stack_mprotect(void);
> +
> +static void subtest_css_task_iters(void)
> +{
> + struct iters_css_task *skel = NULL;
> + int err, cg_fd, cg_id;
> + const char *cgrp_path = "/cg1";
> +
> + err = setup_cgroup_environment();
> + if (!ASSERT_OK(err, "setup_cgroup_environment"))
> + goto cleanup;
> + cg_fd = create_and_get_cgroup(cgrp_path);
> + if (!ASSERT_GE(cg_fd, 0, "cg_create"))
> + goto cleanup;
> + cg_id = get_cgroup_id(cgrp_path);
> + err = join_cgroup(cgrp_path);
> + if (!ASSERT_OK(err, "setup_cgroup_environment"))
> + goto cleanup;
> +
> + skel = iters_css_task__open();
> + if (!ASSERT_OK_PTR(skel, "skel_open"))
> + goto cleanup;
> +
> + err = iters_css_task__load(skel);
> + if (!ASSERT_OK(err, "skel_load"))
> + goto cleanup;
> +
> + skel->bss->target_pid = getpid();
> + skel->bss->cg_id = cg_id;
> + err = iters_css_task__attach(skel);
> +
> + err = stack_mprotect();
> + if (!ASSERT_OK(err, "iters_task__attach"))
> + goto cleanup;
This is incorrect and would cause lsm_test in prog_tests to fail.
Here we should check that stack_mprotect() returns -1 with errno set to
EPERM. In the BPF program iter_css_task_for_each, we need to return -EPERM.
The whole logic should stay the same as in lsm_test.c/bpf_cookie.c.
With the following fix, CI passes:
(https://github.com/kernel-patches/bpf/actions/runs/6484774470/job/17609349165?pr=5808)
@@ -177,11 +177,12 @@ static void subtest_css_task_iters(void)
skel->bss->target_pid = getpid();
skel->bss->cg_id = cg_id;
err = iters_css_task__attach(skel);
-
- err = stack_mprotect();
if (!ASSERT_OK(err, "iters_task__attach"))
goto cleanup;
-
+ err = stack_mprotect();
+ if (!ASSERT_EQ(err, -1, "stack_mprotect") ||
+ !ASSERT_EQ(errno, EPERM, "stack_mprotect"))
+ goto cleanup;
iters_css_task__detach(skel);
ASSERT_EQ(skel->bss->css_task_cnt, 1, "css_task_cnt");
diff --git a/tools/testing/selftests/bpf/progs/iters_css_task.c
b/tools/testing/selftests/bpf/progs/iters_css_task.c
index 506a2755234e..9f79a57fde8e 100644
--- a/tools/testing/selftests/bpf/progs/iters_css_task.c
+++ b/tools/testing/selftests/bpf/progs/iters_css_task.c
@@ -2,6 +2,7 @@
/* Copyright (C) 2023 Chuyi Zhou <[email protected]> */
#include "vmlinux.h"
+#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "bpf_misc.h"
@@ -17,7 +18,8 @@ int css_task_cnt = 0;
u64 cg_id;
SEC("lsm/file_mprotect")
-int BPF_PROG(iter_css_task_for_each)
+int BPF_PROG(iter_css_task_for_each, struct vm_area_struct *vma,
+ unsigned long reqprot, unsigned long prot, int ret)
{
struct task_struct *cur_task = bpf_get_current_task_btf();
struct cgroup_subsys_state *css;
@@ -25,12 +27,12 @@ int BPF_PROG(iter_css_task_for_each)
struct cgroup *cgrp;
if (cur_task->pid != target_pid)
- return 0;
+ return ret;
cgrp = bpf_cgroup_from_id(cg_id);
- if (cgrp == NULL)
- return 0;
+ if (!cgrp)
+ goto out;
css = &cgrp->self;
bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS)
@@ -38,5 +40,6 @@ int BPF_PROG(iter_css_task_for_each)
css_task_cnt += 1;
bpf_cgroup_release(cgrp);
- return 0;
+out:
+ return -EPERM;
}
On Wed, Oct 11, 2023 at 5:09 AM Chuyi Zhou <[email protected]> wrote:
>
> This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_task in open-coded iterator
> style. BPF programs can use these kfuncs directly or through the
> bpf_for_each macro to iterate over all processes in the system.
>
> The API design is kept consistent with SEC("iter/task"). bpf_iter_task_new()
> accepts a specific task and an iteration type, which allows:
>
> 1. iterating over all processes in the system (BPF_TASK_ITER_ALL_PROCS)
>
> 2. iterating over all threads in the system (BPF_TASK_ITER_ALL_THREADS)
>
> 3. iterating over all threads of a specific task (BPF_TASK_ITER_PROC_THREADS)
>
> Signed-off-by: Chuyi Zhou <[email protected]>
> ---
> kernel/bpf/helpers.c | 3 +
> kernel/bpf/task_iter.c | 82 +++++++++++++++++++
> .../testing/selftests/bpf/bpf_experimental.h | 5 ++
> 3 files changed, 90 insertions(+)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index cb24c4a916df..690763751f6e 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2555,6 +2555,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 2cfcb4dd8a37..caeddad3d2f1 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -856,6 +856,88 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> bpf_mem_free(&bpf_global_ma, kit->css_it);
> }
>
> +struct bpf_iter_task {
> + __u64 __opaque[3];
> +} __attribute__((aligned(8)));
> +
> +struct bpf_iter_task_kern {
> + struct task_struct *task;
> + struct task_struct *pos;
> + unsigned int flags;
> +} __attribute__((aligned(8)));
> +
> +enum {
> + BPF_TASK_ITER_ALL_PROCS,
> + BPF_TASK_ITER_ALL_THREADS,
> + BPF_TASK_ITER_PROC_THREADS
> +};
> +
> +__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it,
> + struct task_struct *task, unsigned int flags)
> +{
> + struct bpf_iter_task_kern *kit = (void *)it;
> +
> + BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) > sizeof(struct bpf_iter_task));
> + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
> + __alignof__(struct bpf_iter_task));
> +
> + kit->task = kit->pos = NULL;
> + switch (flags) {
> + case BPF_TASK_ITER_ALL_THREADS:
> + case BPF_TASK_ITER_ALL_PROCS:
> + case BPF_TASK_ITER_PROC_THREADS:
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + if (flags == BPF_TASK_ITER_PROC_THREADS)
> + kit->task = task;
> + else
> + kit->task = &init_task;
> + kit->pos = kit->task;
> + kit->flags = flags;
> + return 0;
> +}
> +
> +__bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
> +{
> + struct bpf_iter_task_kern *kit = (void *)it;
> + struct task_struct *pos;
> + unsigned int flags;
> +
> + flags = kit->flags;
> + pos = kit->pos;
> +
> + if (!pos)
> + goto out;
> +
> + if (flags == BPF_TASK_ITER_ALL_PROCS)
> + goto get_next_task;
> +
> + kit->pos = next_thread(kit->pos);
> + if (kit->pos == kit->task) {
> + if (flags == BPF_TASK_ITER_PROC_THREADS) {
> + kit->pos = NULL;
> + goto out;
> + }
> + } else
> + goto out;
nit: this should have {} around it to match the other if branch
but actually, why goto out instead of return pos? same above, return
pos instead of goto out?
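
For illustration, here is a minimal userspace sketch of the early-return
shape suggested above. It assumes a simplified model and is not the kernel
implementation: struct task, task_iter_new(), task_iter_next() and
demo_count() are hypothetical stand-ins, and the next_thread/next_task
fields replace the kernel's next_thread()/next_task() helpers.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for task_struct: only the two links
 * the iterator follows. Not kernel code. */
struct task {
	struct task *next_thread; /* circular list of threads in one process */
	struct task *next_task;   /* circular list of thread-group leaders */
};

enum { ITER_ALL_PROCS, ITER_ALL_THREADS, ITER_PROC_THREADS };

struct task_iter {
	struct task *init; /* stand-in for &init_task */
	struct task *task; /* first thread of the group being walked */
	struct task *pos;  /* task handed out by the next call */
	unsigned int flags;
};

static void task_iter_new(struct task_iter *it, struct task *init,
			  struct task *task, unsigned int flags)
{
	it->init = init;
	it->task = (flags == ITER_PROC_THREADS) ? task : init;
	it->pos = it->task;
	it->flags = flags;
}

/* Same stepping as the quoted bpf_iter_task_next(), with direct returns. */
static struct task *task_iter_next(struct task_iter *it)
{
	struct task *pos = it->pos;

	if (!pos)
		return NULL;

	if (it->flags != ITER_ALL_PROCS) {
		it->pos = it->pos->next_thread;
		if (it->pos != it->task)
			return pos; /* more threads left in this group */
		if (it->flags == ITER_PROC_THREADS) {
			it->pos = NULL; /* wrapped around: iteration is done */
			return pos;
		}
	}

	/* Move on to the next process; stop once we are back at init. */
	it->pos = it->pos->next_task;
	it->task = it->pos;
	if (it->pos == it->init)
		it->pos = NULL;
	return pos;
}

/* Fixture: init (one thread) plus one process with three threads.
 * Returns how many tasks the iterator visits for the given mode. */
int demo_count(unsigned int flags)
{
	struct task t[4] = { 0 };
	struct task_iter it;
	int n = 0;

	t[0].next_thread = &t[0];
	t[0].next_task = &t[1];
	t[1].next_thread = &t[2];
	t[2].next_thread = &t[3];
	t[3].next_thread = &t[1];
	t[1].next_task = &t[0];

	task_iter_new(&it, &t[0], &t[1], flags);
	while (task_iter_next(&it))
		n++;
	return n;
}
```

In this model every exit path returns pos directly, so the out: label and
the brace-less else branch both go away while the visiting order stays the
same as in the quoted code.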
> +
> +get_next_task:
> + kit->pos = next_task(kit->pos);
> + kit->task = kit->pos;
> + if (kit->pos == &init_task)
> + kit->pos = NULL;
> +
> +out:
> + return pos;
> +}
> +
> +__bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it)
> +{
> +}
> +
> DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>
> static void do_mmap_read_unlock(struct irq_work *entry)
> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> index 8b53537e0f27..1ec82997cce7 100644
> --- a/tools/testing/selftests/bpf/bpf_experimental.h
> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> @@ -457,5 +457,10 @@ extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
> extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
> extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
>
> +struct bpf_iter_task;
> +extern int bpf_iter_task_new(struct bpf_iter_task *it,
> + struct task_struct *task, unsigned int flags) __weak __ksym;
> +extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym;
> +extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym;
>
> #endif
> --
> 2.20.1
>
On Wed, Oct 11, 2023 at 5:09 AM Chuyi Zhou <[email protected]> wrote:
>
> When using task_iter to iterate over all threads of a specific task, we
> enforce that the user must pass a valid task pointer to ensure safety.
> However, when iterating over all threads/processes in the system, the BPF
> verifier still requires a valid pointer instead of a "nullable" pointer,
> even though it's pointless, which is somewhat surprising from a usability
> standpoint. It would be nice if we could let that kfunc accept an explicit
> null pointer when we are using BPF_TASK_ITER_ALL_{PROCS, THREADS} and a
> valid pointer when using BPF_TASK_ITER_PROC_THREADS.
>
> Given a trivial kfunc:
> __bpf_kfunc void FN(struct TYPE_A *obj);
>
> The BPF verifier would reject a program passing a null pointer for obj.
> The error message is:
> "arg#x pointer type xx xx must point to scalar, or struct with scalar"
> reported by get_kfunc_ptr_arg_type(). The reg->type is SCALAR_VALUE and
> the btf type of ref_t is not scalar or scalar_struct, which leads to the
> rejection in get_kfunc_ptr_arg_type().
>
> This patch adds the "__nullable" annotation:
> __bpf_kfunc void FN(struct TYPE_A *obj__nullable);
> Here __nullable indicates that obj is optional: the user can pass an
> explicit null pointer or a normal TYPE_A pointer. In
> get_kfunc_ptr_arg_type(), we detect whether the current arg is optional
> and the register is null. If so, we return a new kfunc_ptr_arg_type,
> KF_ARG_PTR_TO_NULL, and skip to the next arg in check_kfunc_args().
>
> Signed-off-by: Chuyi Zhou <[email protected]>
> ---
> kernel/bpf/task_iter.c | 7 +++++--
> kernel/bpf/verifier.c | 13 ++++++++++++-
> 2 files changed, 17 insertions(+), 3 deletions(-)
>
Looks good to me, but someone better versed in kfunc internals should
double-check.
Acked-by: Andrii Nakryiko <[email protected]>
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index caeddad3d2f1..0772545568f1 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -873,7 +873,7 @@ enum {
> };
>
[...]
On Wed, Oct 11, 2023 at 5:09 AM Chuyi Zhou <[email protected]> wrote:
>
> css_iter and task_iter should be used in an RCU section. Specifically, in
> sleepable progs an explicit bpf_rcu_read_lock() is needed before using
> these iters. In normal bpf progs that have an implicit rcu_read_lock(),
> it's OK to use them directly.
>
> This patch adds a new KF flag KF_RCU_PROTECTED for bpf_iter_task_new and
> bpf_iter_css_new. It means the kfunc should be used in an RCU CS. We check
> whether we are in an RCU CS before we invoke this kfunc. If the RCU
> protection is guaranteed, we let st->type = PTR_TO_STACK | MEM_RCU.
> Once the user does rcu_unlock during the iteration, the MEM_RCU state of
> the regs is cleared. is_iter_reg_valid_init() will reject if reg->type is
> UNTRUSTED.
>
> It is worth noting that currently, bpf_rcu_read_unlock does not
> clear the state of the STACK_ITER reg, since bpf_for_each_spilled_reg
> only considers STACK_SPILL. This patch also lets bpf_for_each_spilled_reg
> search STACK_ITER.
>
> Signed-off-by: Chuyi Zhou <[email protected]>
> ---
> include/linux/bpf_verifier.h | 19 ++++++++------
> include/linux/btf.h | 1 +
> kernel/bpf/helpers.c | 4 +--
> kernel/bpf/verifier.c | 50 ++++++++++++++++++++++++++++--------
> 4 files changed, 53 insertions(+), 21 deletions(-)
>
LGTM.
Acked-by: Andrii Nakryiko <[email protected]>
On Wed, Oct 11, 2023 at 5:09 AM Chuyi Zhou <[email protected]> wrote:
>
> The newly-added struct bpf_iter_task has a name collision with a selftest
> for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
> renamed in order to avoid the collision.
>
> Signed-off-by: Chuyi Zhou <[email protected]>
> ---
Acked-by: Andrii Nakryiko <[email protected]>
> .../selftests/bpf/prog_tests/bpf_iter.c | 18 +++++++++---------
> .../{bpf_iter_task.c => bpf_iter_tasks.c} | 0
> 2 files changed, 9 insertions(+), 9 deletions(-)
> rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> index 1f02168103dd..dc60e8e125cd 100644
> --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> @@ -7,7 +7,7 @@
> #include "bpf_iter_ipv6_route.skel.h"
> #include "bpf_iter_netlink.skel.h"
> #include "bpf_iter_bpf_map.skel.h"
> -#include "bpf_iter_task.skel.h"
> +#include "bpf_iter_tasks.skel.h"
> #include "bpf_iter_task_stack.skel.h"
> #include "bpf_iter_task_file.skel.h"
> #include "bpf_iter_task_vma.skel.h"
> @@ -215,12 +215,12 @@ static void *do_nothing_wait(void *arg)
> static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts,
> int *num_unknown, int *num_known)
> {
> - struct bpf_iter_task *skel;
> + struct bpf_iter_tasks *skel;
> pthread_t thread_id;
> void *ret;
>
> - skel = bpf_iter_task__open_and_load();
> - if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
> + skel = bpf_iter_tasks__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "bpf_iter_tasks__open_and_load"))
> return;
>
> ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock");
> @@ -239,7 +239,7 @@ static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts,
> ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL,
> "pthread_join");
>
> - bpf_iter_task__destroy(skel);
> + bpf_iter_tasks__destroy(skel);
> }
>
> static void test_task_common(struct bpf_iter_attach_opts *opts, int num_unknown, int num_known)
> @@ -307,10 +307,10 @@ static void test_task_pidfd(void)
>
> static void test_task_sleepable(void)
> {
> - struct bpf_iter_task *skel;
> + struct bpf_iter_tasks *skel;
>
> - skel = bpf_iter_task__open_and_load();
> - if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load"))
> + skel = bpf_iter_tasks__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "bpf_iter_tasks__open_and_load"))
> return;
>
> do_dummy_read(skel->progs.dump_task_sleepable);
> @@ -320,7 +320,7 @@ static void test_task_sleepable(void)
> ASSERT_GT(skel->bss->num_success_copy_from_user_task, 0,
> "num_success_copy_from_user_task");
>
> - bpf_iter_task__destroy(skel);
> + bpf_iter_tasks__destroy(skel);
> }
>
> static void test_task_stack(void)
> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task.c b/tools/testing/selftests/bpf/progs/bpf_iter_tasks.c
> similarity index 100%
> rename from tools/testing/selftests/bpf/progs/bpf_iter_task.c
> rename to tools/testing/selftests/bpf/progs/bpf_iter_tasks.c
> --
> 2.20.1
>
Hello,
On 2023/10/14 05:27, Andrii Nakryiko wrote:
> On Wed, Oct 11, 2023 at 5:09 AM Chuyi Zhou <[email protected]> wrote:
>>
>> This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
>> creation and manipulation of struct bpf_iter_task in open-coded iterator
>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
>> iterate all processes in the system.
>>
>> The API design keep consistent with SEC("iter/task"). bpf_iter_task_new()
>> accepts a specific task and iterating type which allows:
>>
>> 1. iterating all process in the system(BPF_TASK_ITER_ALL_PROCS)
>>
>> 2. iterating all threads in the system(BPF_TASK_ITER_ALL_THREADS)
>>
>> 3. iterating all threads of a specific task(BPF_TASK_ITER_PROC_THREADS)
>>
>> Signed-off-by: Chuyi Zhou <[email protected]>
>> ---
>> kernel/bpf/helpers.c | 3 +
>> kernel/bpf/task_iter.c | 82 +++++++++++++++++++
>> .../testing/selftests/bpf/bpf_experimental.h | 5 ++
>> 3 files changed, 90 insertions(+)
>>
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index cb24c4a916df..690763751f6e 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2555,6 +2555,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
>> BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
>> BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
>> BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
>> +BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
>> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index 2cfcb4dd8a37..caeddad3d2f1 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -856,6 +856,88 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
>> bpf_mem_free(&bpf_global_ma, kit->css_it);
>> }
>>
>> +struct bpf_iter_task {
>> + __u64 __opaque[3];
>> +} __attribute__((aligned(8)));
>> +
>> +struct bpf_iter_task_kern {
>> + struct task_struct *task;
>> + struct task_struct *pos;
>> + unsigned int flags;
>> +} __attribute__((aligned(8)));
>> +
>> +enum {
>> + BPF_TASK_ITER_ALL_PROCS,
>> + BPF_TASK_ITER_ALL_THREADS,
>> + BPF_TASK_ITER_PROC_THREADS
>> +};
>> +
>> +__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it,
>> + struct task_struct *task, unsigned int flags)
>> +{
>> + struct bpf_iter_task_kern *kit = (void *)it;
>> +
>> + BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) > sizeof(struct bpf_iter_task));
>> + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
>> + __alignof__(struct bpf_iter_task));
>> +
>> + kit->task = kit->pos = NULL;
>> + switch (flags) {
>> + case BPF_TASK_ITER_ALL_THREADS:
>> + case BPF_TASK_ITER_ALL_PROCS:
>> + case BPF_TASK_ITER_PROC_THREADS:
>> + break;
>> + default:
>> + return -EINVAL;
>> + }
>> +
>> + if (flags == BPF_TASK_ITER_PROC_THREADS)
>> + kit->task = task;
>> + else
>> + kit->task = &init_task;
>> + kit->pos = kit->task;
>> + kit->flags = flags;
>> + return 0;
>> +}
>> +
>> +__bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
>> +{
>> + struct bpf_iter_task_kern *kit = (void *)it;
>> + struct task_struct *pos;
>> + unsigned int flags;
>> +
>> + flags = kit->flags;
>> + pos = kit->pos;
>> +
>> + if (!pos)
>> + goto out;
>> +
>> + if (flags == BPF_TASK_ITER_ALL_PROCS)
>> + goto get_next_task;
>> +
>> + kit->pos = next_thread(kit->pos);
>> + if (kit->pos == kit->task) {
>> + if (flags == BPF_TASK_ITER_PROC_THREADS) {
>> + kit->pos = NULL;
>> + goto out;
>> + }
>> + } else
>> + goto out;
>
> nit: this should have {} around it to match the other if branch
>
> but actually, why goto out instead of return pos? same above, return
> pos instead of goto out?
>
Thanks for the review.
IIUC, do you mean:
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 0772545568f1..b35debf19edb 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -913,7 +913,7 @@ __bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
pos = kit->pos;
if (!pos)
- goto out;
+ return pos;
if (flags == BPF_TASK_ITER_ALL_PROCS)
goto get_next_task;
@@ -922,18 +922,22 @@ __bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
if (kit->pos == kit->task) {
if (flags == BPF_TASK_ITER_PROC_THREADS) {
kit->pos = NULL;
- goto out;
+ return pos;
}
} else
- goto out;
+ return pos;
+ /*
+ * goto get_next_task means:
+ * case 1: flags == BPF_TASK_ITER_ALL_PROCS
+ * case 2: kit->pos == kit->task && flags == BPF_TASK_ITER_ALL_THREADS
+ */
get_next_task:
kit->pos = next_task(kit->pos);
kit->task = kit->pos;
if (kit->pos == &init_task)
kit->pos = NULL;
-out:
return pos;
BTW, do you have any comments on patch 8? Or should I send the next
version and get all of CI passing first?
Thanks.
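For readers following the control flow, here is a minimal userspace sketch of the iteration logic with the early returns applied (and braces on the else branch, per the nit). It is not the kernel code: struct task, struct task_iter, the ITER_* names, and the explicit init parameter are hypothetical stand-ins for struct task_struct, struct bpf_iter_task_kern, the BPF_TASK_ITER_* flags, and &init_task, and next_task()/next_thread() are modeled as plain circular-list pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct task_struct. */
struct task {
	int pid;
	struct task *next_task;   /* circular list of thread-group leaders */
	struct task *next_thread; /* circular list of threads in one group */
};

enum { ITER_ALL_PROCS, ITER_ALL_THREADS, ITER_PROC_THREADS };

struct task_iter {
	struct task *init; /* stand-in for &init_task (assumption) */
	struct task *task; /* leader of the group currently being walked */
	struct task *pos;  /* next task to hand out, NULL when finished */
	unsigned int flags;
};

static int task_iter_new(struct task_iter *it, struct task *init,
			 struct task *task, unsigned int flags)
{
	it->init = init;
	it->task = it->pos = NULL;
	it->flags = flags;
	if (flags > ITER_PROC_THREADS)
		return -1; /* the patch returns -EINVAL here */

	it->task = (flags == ITER_PROC_THREADS) ? task : init;
	it->pos = it->task;
	return 0;
}

static struct task *task_iter_next(struct task_iter *it)
{
	struct task *pos = it->pos;

	if (!pos)
		return pos;

	if (it->flags == ITER_ALL_PROCS)
		goto get_next_task;

	it->pos = it->pos->next_thread;
	if (it->pos == it->task) {
		/* Wrapped around to the group leader. */
		if (it->flags == ITER_PROC_THREADS) {
			it->pos = NULL;
			return pos;
		}
	} else {
		return pos;
	}

get_next_task:
	/* Reached for ALL_PROCS, or for ALL_THREADS after a group is done. */
	it->pos = it->pos->next_task;
	it->task = it->pos;
	if (it->pos == it->init)
		it->pos = NULL;

	return pos;
}

/* Helper for experiments: how many tasks does one full walk visit? */
static int count_visited(struct task *init, struct task *task,
			 unsigned int flags)
{
	struct task_iter it;
	int n = 0;

	if (task_iter_new(&it, init, task, flags))
		return -1;
	while (task_iter_next(&it))
		n++;
	return n;
}
```

With an init task that has one thread and a second process that has three threads, count_visited() should report 2 for ITER_ALL_PROCS, 4 for ITER_ALL_THREADS, and 3 for ITER_PROC_THREADS on the second process.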
>
>> +
>> +get_next_task:
>> + kit->pos = next_task(kit->pos);
>> + kit->task = kit->pos;
>> + if (kit->pos == &init_task)
>> + kit->pos = NULL;
>> +
>> +out:
>> + return pos;
>> +}
>> +
>> +__bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it)
>> +{
>> +}
>> +
>> DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>>
>> static void do_mmap_read_unlock(struct irq_work *entry)
>> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
>> index 8b53537e0f27..1ec82997cce7 100644
>> --- a/tools/testing/selftests/bpf/bpf_experimental.h
>> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
>> @@ -457,5 +457,10 @@ extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
>> extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
>> extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
>>
>> +struct bpf_iter_task;
>> +extern int bpf_iter_task_new(struct bpf_iter_task *it,
>> + struct task_struct *task, unsigned int flags) __weak __ksym;
>> +extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym;
>> +extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym;
>>
>> #endif
>> --
>> 2.20.1
>>
On 2023/10/11 20:08, Chuyi Zhou wrote:
> This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_task in open-coded iterator
> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> iterate all processes in the system.
>
> The API design keep consistent with SEC("iter/task"). bpf_iter_task_new()
> accepts a specific task and iterating type which allows:
>
> 1. iterating all process in the system(BPF_TASK_ITER_ALL_PROCS)
>
> 2. iterating all threads in the system(BPF_TASK_ITER_ALL_THREADS)
>
> 3. iterating all threads of a specific task(BPF_TASK_ITER_PROC_THREADS)
>
> Signed-off-by: Chuyi Zhou <[email protected]>
> ---
> kernel/bpf/helpers.c | 3 +
> kernel/bpf/task_iter.c | 82 +++++++++++++++++++
> .../testing/selftests/bpf/bpf_experimental.h | 5 ++
> 3 files changed, 90 insertions(+)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index cb24c4a916df..690763751f6e 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2555,6 +2555,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 2cfcb4dd8a37..caeddad3d2f1 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -856,6 +856,88 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> bpf_mem_free(&bpf_global_ma, kit->css_it);
> }
>
> +struct bpf_iter_task {
> + __u64 __opaque[3];
> +} __attribute__((aligned(8)));
> +
> +struct bpf_iter_task_kern {
> + struct task_struct *task;
> + struct task_struct *pos;
> + unsigned int flags;
> +} __attribute__((aligned(8)));
> +
> +enum {
> + BPF_TASK_ITER_ALL_PROCS,
> + BPF_TASK_ITER_ALL_THREADS,
> + BPF_TASK_ITER_PROC_THREADS
> +};
> +
In the next version, I will add the missing __diag_ignore_all for
-Wmissing-prototypes in patches 2-4 to avoid the kernel build warning.
Thanks.
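As a rough illustration of what that suppression does: the kernel's __diag_push() / __diag_ignore_all() / __diag_pop() helpers expand, roughly, to compiler diagnostic pragmas. The sketch below uses the raw pragma form in userspace; demo_kfunc_like() is a hypothetical name, and the in-kernel version would use the __diag_* macros around the kfunc definitions instead.

```c
#include <assert.h>

/*
 * A non-static function defined without a prior prototype normally
 * triggers -Wmissing-prototypes. kfuncs are in that position on
 * purpose, so the warning is silenced only around their definitions.
 */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmissing-prototypes"

int demo_kfunc_like(int x)
{
	return x + 1;
}

#pragma GCC diagnostic pop
/* From here on, -Wmissing-prototypes is back to its previous state. */
```

Building this with -Wmissing-prototypes should produce no warning for demo_kfunc_like(), while functions defined after the pop are still checked.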
> +__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it,
> + struct task_struct *task, unsigned int flags)
> +{
> + struct bpf_iter_task_kern *kit = (void *)it;
> +
> + BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) > sizeof(struct bpf_iter_task));
> + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
> + __alignof__(struct bpf_iter_task));
> +
> + kit->task = kit->pos = NULL;
> + switch (flags) {
> + case BPF_TASK_ITER_ALL_THREADS:
> + case BPF_TASK_ITER_ALL_PROCS:
> + case BPF_TASK_ITER_PROC_THREADS:
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + if (flags == BPF_TASK_ITER_PROC_THREADS)
> + kit->task = task;
> + else
> + kit->task = &init_task;
> + kit->pos = kit->task;
> + kit->flags = flags;
> + return 0;
> +}
> +
> +__bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
> +{
> + struct bpf_iter_task_kern *kit = (void *)it;
> + struct task_struct *pos;
> + unsigned int flags;
> +
> + flags = kit->flags;
> + pos = kit->pos;
> +
> + if (!pos)
> + goto out;
> +
> + if (flags == BPF_TASK_ITER_ALL_PROCS)
> + goto get_next_task;
> +
> + kit->pos = next_thread(kit->pos);
> + if (kit->pos == kit->task) {
> + if (flags == BPF_TASK_ITER_PROC_THREADS) {
> + kit->pos = NULL;
> + goto out;
> + }
> + } else
> + goto out;
> +
> +get_next_task:
> + kit->pos = next_task(kit->pos);
> + kit->task = kit->pos;
> + if (kit->pos == &init_task)
> + kit->pos = NULL;
> +
> +out:
> + return pos;
> +}
> +
> +__bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it)
> +{
> +}
> +
> DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>
On Fri, Oct 13, 2023 at 7:02 PM Chuyi Zhou <[email protected]> wrote:
>
> Hello,
>
> On 2023/10/14 05:27, Andrii Nakryiko wrote:
> > On Wed, Oct 11, 2023 at 5:09 AM Chuyi Zhou <[email protected]> wrote:
> >>
> >> This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
> >> creation and manipulation of struct bpf_iter_task in open-coded iterator
> >> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> >> iterate all processes in the system.
> >>
> >> The API design keep consistent with SEC("iter/task"). bpf_iter_task_new()
> >> accepts a specific task and iterating type which allows:
> >>
> >> 1. iterating all process in the system(BPF_TASK_ITER_ALL_PROCS)
> >>
> >> 2. iterating all threads in the system(BPF_TASK_ITER_ALL_THREADS)
> >>
> >> 3. iterating all threads of a specific task(BPF_TASK_ITER_PROC_THREADS)
> >>
> >> Signed-off-by: Chuyi Zhou <[email protected]>
> >> ---
> >> kernel/bpf/helpers.c | 3 +
> >> kernel/bpf/task_iter.c | 82 +++++++++++++++++++
> >> .../testing/selftests/bpf/bpf_experimental.h | 5 ++
> >> 3 files changed, 90 insertions(+)
> >>
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index cb24c4a916df..690763751f6e 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -2555,6 +2555,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> >> BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> >> BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> >> BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> >> +BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
> >> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> >> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> >> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> >> index 2cfcb4dd8a37..caeddad3d2f1 100644
> >> --- a/kernel/bpf/task_iter.c
> >> +++ b/kernel/bpf/task_iter.c
> >> @@ -856,6 +856,88 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> >> bpf_mem_free(&bpf_global_ma, kit->css_it);
> >> }
> >>
> >> +struct bpf_iter_task {
> >> + __u64 __opaque[3];
> >> +} __attribute__((aligned(8)));
> >> +
> >> +struct bpf_iter_task_kern {
> >> + struct task_struct *task;
> >> + struct task_struct *pos;
> >> + unsigned int flags;
> >> +} __attribute__((aligned(8)));
> >> +
> >> +enum {
> >> + BPF_TASK_ITER_ALL_PROCS,
> >> + BPF_TASK_ITER_ALL_THREADS,
> >> + BPF_TASK_ITER_PROC_THREADS
> >> +};
> >> +
> >> +__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it,
> >> + struct task_struct *task, unsigned int flags)
> >> +{
> >> + struct bpf_iter_task_kern *kit = (void *)it;
> >> +
> >> + BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) > sizeof(struct bpf_iter_task));
> >> + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
> >> + __alignof__(struct bpf_iter_task));
> >> +
> >> + kit->task = kit->pos = NULL;
> >> + switch (flags) {
> >> + case BPF_TASK_ITER_ALL_THREADS:
> >> + case BPF_TASK_ITER_ALL_PROCS:
> >> + case BPF_TASK_ITER_PROC_THREADS:
> >> + break;
> >> + default:
> >> + return -EINVAL;
> >> + }
> >> +
> >> + if (flags == BPF_TASK_ITER_PROC_THREADS)
> >> + kit->task = task;
> >> + else
> >> + kit->task = &init_task;
> >> + kit->pos = kit->task;
> >> + kit->flags = flags;
> >> + return 0;
> >> +}
> >> +
> >> +__bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
> >> +{
> >> + struct bpf_iter_task_kern *kit = (void *)it;
> >> + struct task_struct *pos;
> >> + unsigned int flags;
> >> +
> >> + flags = kit->flags;
> >> + pos = kit->pos;
> >> +
> >> + if (!pos)
> >> + goto out;
> >> +
> >> + if (flags == BPF_TASK_ITER_ALL_PROCS)
> >> + goto get_next_task;
> >> +
> >> + kit->pos = next_thread(kit->pos);
> >> + if (kit->pos == kit->task) {
> >> + if (flags == BPF_TASK_ITER_PROC_THREADS) {
> >> + kit->pos = NULL;
> >> + goto out;
> >> + }
> >> + } else
> >> + goto out;
> >
> > nit: this should have {} around it to match the other if branch
> >
> > but actually, why goto out instead of return pos? same above, return
> > pos instead of goto out?
> >
>
> Thanks for the review.
>
>
> IIUC, do you mean:
>
Yes, goto only makes sense when there is some common cleanup or error
handling logic. In this case it's a plain return of the result, so
there is no point.
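To make the distinction concrete, here is a small userspace sketch with two hypothetical functions (dup_pair() and sign_of() are not from the patch): the first has shared cleanup, so a goto label is justified; the second has none, so it returns directly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Shared cleanup on every failure path: a goto label earns its keep. */
static int dup_pair(const char *src, char **a_out, char **b_out)
{
	size_t n = strlen(src) + 1;
	char *a = NULL, *b = NULL;

	a = malloc(n);
	if (!a)
		goto err_free;
	b = malloc(n);
	if (!b)
		goto err_free;

	memcpy(a, src, n);
	memcpy(b, src, n);
	*a_out = a;
	*b_out = b;
	return 0;

err_free:
	free(b); /* free(NULL) is a no-op */
	free(a);
	return -1;
}

/* Nothing to clean up: return the result directly, no label needed. */
static int sign_of(int v)
{
	if (v < 0)
		return -1;
	if (v > 0)
		return 1;
	return 0;
}
```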
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 0772545568f1..b35debf19edb 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -913,7 +913,7 @@ __bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
> pos = kit->pos;
>
> if (!pos)
> - goto out;
> + return pos;
>
> if (flags == BPF_TASK_ITER_ALL_PROCS)
> goto get_next_task;
> @@ -922,18 +922,22 @@ __bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
> if (kit->pos == kit->task) {
> if (flags == BPF_TASK_ITER_PROC_THREADS) {
> kit->pos = NULL;
> - goto out;
> + return pos;
> }
> } else
> - goto out;
> + return pos;
>
> + /*
> + * goto get_next_task means:
> + * case 1: flags == BPF_TASK_ITER_ALL_PROCS
> + * case 2: kit->pos == kit->task && flags == BPF_TASK_ITER_ALL_THREADS
> + */
> get_next_task:
> kit->pos = next_task(kit->pos);
> kit->task = kit->pos;
> if (kit->pos == &init_task)
> kit->pos = NULL;
>
> -out:
> return pos;
>
>
>
> BTW, do you have any comments on patch 8? Or should I send the next
> version and get all of CI passing first?
>
I didn't think too hard about the changes you are proposing, but yes, CI
should be green on submission, of course.
> Thanks.
>
> >
> >> +
> >> +get_next_task:
> >> + kit->pos = next_task(kit->pos);
> >> + kit->task = kit->pos;
> >> + if (kit->pos == &init_task)
> >> + kit->pos = NULL;
> >> +
> >> +out:
> >> + return pos;
> >> +}
> >> +
> >> +__bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it)
> >> +{
> >> +}
> >> +
> >> DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
> >>
> >> static void do_mmap_read_unlock(struct irq_work *entry)
> >> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> >> index 8b53537e0f27..1ec82997cce7 100644
> >> --- a/tools/testing/selftests/bpf/bpf_experimental.h
> >> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> >> @@ -457,5 +457,10 @@ extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
> >> extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
> >> extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
> >>
> >> +struct bpf_iter_task;
> >> +extern int bpf_iter_task_new(struct bpf_iter_task *it,
> >> + struct task_struct *task, unsigned int flags) __weak __ksym;
> >> +extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym;
> >> +extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym;
> >>
> >> #endif
> >> --
> >> 2.20.1
> >>