2023-09-12 08:14:02

by Chuyi Zhou

[permalink] [raw]
Subject: [PATCH bpf-next v2 0/6] Add Open-coded process and css iters

Hi,

This is version 2 of process and css iters support. All the changes were
suggested by Alexei.

Thanks for your review!

--- Changelog ---
Changes from v1:
- Add a pre-patch to make some preparations before supporting css_task
iters.
- Add an allowlist for css_task iters
- Let bpf progs do explicit bpf_rcu_read_lock() when using process iters
and css_descendant iters.
---------------------

In some BPF usage scenarios, it will be useful to iterate the process and
css directly in the BPF program. One of the expected scenarios is
customizable OOM victim selection via BPF[1].

Inspired by Dave's task_vma iter[2], this patchset adds three types of
open-coded iterator kfuncs:

1. bpf_for_each(process, p). Just like for_each_process(p) in kernel to
itearing all tasks in the system.

2. bpf_for_each(css_task, task, css). It works like
css_task_iter_{start, next, end} and would be used to iterating
tasks/threads under a css.

3. bpf_for_each(css_{post, pre}, pos, root_css). It works like
css_next_descendant_{pre, post} to iterating all descendant css.

BPF programs can use these kfuncs directly or through bpf_for_each macro.

link[1]: https://lore.kernel.org/lkml/[email protected]/
link[2]: https://lore.kernel.org/all/[email protected]/

Chuyi Zhou (6):
cgroup: Prepare for using css_task_iter_*() in BPF
bpf: Introduce css_task open-coded iterator kfuncs
bpf: Introduce process open coded iterator kfuncs
bpf: Introduce css_descendant open-coded iterator kfuncs
bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
selftests/bpf: Add tests for open-coded task and css iter

include/linux/cgroup.h | 12 +-
include/uapi/linux/bpf.h | 16 ++
kernel/bpf/helpers.c | 12 ++
kernel/bpf/task_iter.c | 130 +++++++++++++++++
kernel/bpf/verifier.c | 53 ++++++-
kernel/cgroup/cgroup.c | 18 ++-
tools/include/uapi/linux/bpf.h | 16 ++
tools/lib/bpf/bpf_helpers.h | 24 +++
.../testing/selftests/bpf/prog_tests/iters.c | 138 ++++++++++++++++++
.../testing/selftests/bpf/progs/iters_task.c | 104 +++++++++++++
10 files changed, 508 insertions(+), 15 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c

--
2.20.1


2023-09-12 08:51:24

by Chuyi Zhou

[permalink] [raw]
Subject: [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs

This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_css_task in open-coded
iterator style. These kfuncs actually wrapps css_task_iter_{start,next,
end}. BPF programs can use these kfuncs through bpf_for_each macro for
iteration of all tasks under a css.

css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so
the bpf side has to be careful in where it allows to use this iter.
Currently we only allow it in bpf_lsm and bpf iter-s.

Signed-off-by: Chuyi Zhou <[email protected]>
---
include/uapi/linux/bpf.h | 4 +++
kernel/bpf/helpers.c | 3 +++
kernel/bpf/task_iter.c | 48 ++++++++++++++++++++++++++++++++++
kernel/bpf/verifier.c | 23 ++++++++++++++++
tools/include/uapi/linux/bpf.h | 4 +++
tools/lib/bpf/bpf_helpers.h | 7 +++++
6 files changed, 89 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 73b155e52204..de02c0971428 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7318,4 +7318,8 @@ struct bpf_iter_num {
__u64 __opaque[1];
} __attribute__((aligned(8)));

+struct bpf_iter_css_task {
+ __u64 __opaque[1];
+} __attribute__((aligned(8)));
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index b0a9834f1051..d6a16becfbb9 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2504,6 +2504,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_dynptr_adjust)
BTF_ID_FLAGS(func, bpf_dynptr_is_null)
BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 7473068ed313..d8539cc05ffd 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -803,6 +803,54 @@ const struct bpf_func_proto bpf_find_vma_proto = {
.arg5_type = ARG_ANYTHING,
};

+struct bpf_iter_css_task_kern {
+ struct css_task_iter *css_it;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
+ struct cgroup_subsys_state *css, unsigned int flags)
+{
+ struct bpf_iter_css_task_kern *kit = (void *)it;
+
+ BUILD_BUG_ON(sizeof(struct bpf_iter_css_task_kern) != sizeof(struct bpf_iter_css_task));
+ BUILD_BUG_ON(__alignof__(struct bpf_iter_css_task_kern) !=
+ __alignof__(struct bpf_iter_css_task));
+
+ switch (flags) {
+ case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
+ case CSS_TASK_ITER_PROCS:
+ case 0:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ kit->css_it = kzalloc(sizeof(struct css_task_iter), GFP_KERNEL);
+ if (!kit->css_it)
+ return -ENOMEM;
+ css_task_iter_start(css, flags, kit->css_it);
+ return 0;
+}
+
+__bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
+{
+ struct bpf_iter_css_task_kern *kit = (void *)it;
+
+ if (!kit->css_it)
+ return NULL;
+ return css_task_iter_next(kit->css_it);
+}
+
+__bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
+{
+ struct bpf_iter_css_task_kern *kit = (void *)it;
+
+ if (!kit->css_it)
+ return;
+ css_task_iter_end(kit->css_it);
+ kfree(kit->css_it);
+}
+
DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);

static void do_mmap_read_unlock(struct irq_work *entry)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dbba2b806017..2367483bf4c2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10332,6 +10332,7 @@ enum special_kfunc_type {
KF_bpf_dynptr_clone,
KF_bpf_percpu_obj_new_impl,
KF_bpf_percpu_obj_drop_impl,
+ KF_bpf_iter_css_task_new,
};

BTF_SET_START(special_kfunc_set)
@@ -10354,6 +10355,7 @@ BTF_ID(func, bpf_dynptr_slice_rdwr)
BTF_ID(func, bpf_dynptr_clone)
BTF_ID(func, bpf_percpu_obj_new_impl)
BTF_ID(func, bpf_percpu_obj_drop_impl)
+BTF_ID(func, bpf_iter_css_task_new)
BTF_SET_END(special_kfunc_set)

BTF_ID_LIST(special_kfunc_list)
@@ -10378,6 +10380,7 @@ BTF_ID(func, bpf_dynptr_slice_rdwr)
BTF_ID(func, bpf_dynptr_clone)
BTF_ID(func, bpf_percpu_obj_new_impl)
BTF_ID(func, bpf_percpu_obj_drop_impl)
+BTF_ID(func, bpf_iter_css_task_new)

static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
{
@@ -10902,6 +10905,20 @@ static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
&meta->arg_rbtree_root.field);
}

+static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
+{
+ enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
+
+ switch (prog_type) {
+ case BPF_PROG_TYPE_LSM:
+ return true;
+ case BPF_TRACE_ITER:
+ return env->prog->aux->sleepable;
+ default:
+ return false;
+ }
+}
+
static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta,
int insn_idx)
{
@@ -11152,6 +11169,12 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
break;
}
case KF_ARG_PTR_TO_ITER:
+ if (meta->func_id == special_kfunc_list[KF_bpf_iter_css_task_new]) {
+ if (!check_css_task_iter_allowlist(env)) {
+ verbose(env, "css_task_iter is only allowed in bpf_lsm and bpf iter-s\n");
+ return -EINVAL;
+ }
+ }
ret = process_iter_arg(env, regno, insn_idx, meta);
if (ret < 0)
return ret;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 73b155e52204..de02c0971428 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7318,4 +7318,8 @@ struct bpf_iter_num {
__u64 __opaque[1];
} __attribute__((aligned(8)));

+struct bpf_iter_css_task {
+ __u64 __opaque[1];
+} __attribute__((aligned(8)));
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 77ceea575dc7..f48723c6c593 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -303,6 +303,13 @@ extern int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end) __weak
extern int *bpf_iter_num_next(struct bpf_iter_num *it) __weak __ksym;
extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym;

+struct bpf_iter_css_task;
+struct cgroup_subsys_state;
+extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
+ struct cgroup_subsys_state *css, unsigned int flags) __weak __ksym;
+extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
+extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
+
#ifndef bpf_for_each
/* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
* using BPF open-coded iterators without having to write mundane explicit
--
2.20.1

2023-09-12 13:13:21

by Chuyi Zhou

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 0/6] Add Open-coded process and css iters



在 2023/9/12 15:01, Chuyi Zhou 写道:
> Hi,
>
> This is version 2 of process and css iters support. All the changes were
> suggested by Alexei.
>
> Thanks for your review!
>
> --- Changelog ---
> Changes from v1:
> - Add a pre-patch to make some preparations before supporting css_task
> iters.
> - Add an allowlist for css_task iters
> - Let bpf progs do explicit bpf_rcu_read_lock() when using process iters
> and css_descendant iters.

Sorry for missing the link to v1
(https://lore.kernel.org/lkml/[email protected]/).


> ---------------------
>
> In some BPF usage scenarios, it will be useful to iterate the process and
> css directly in the BPF program. One of the expected scenarios is
> customizable OOM victim selection via BPF[1].
>
> Inspired by Dave's task_vma iter[2], this patchset adds three types of
> open-coded iterator kfuncs:
>
> 1. bpf_for_each(process, p). Just like for_each_process(p) in kernel to
> itearing all tasks in the system.
>
> 2. bpf_for_each(css_task, task, css). It works like
> css_task_iter_{start, next, end} and would be used to iterating
> tasks/threads under a css.
>
> 3. bpf_for_each(css_{post, pre}, pos, root_css). It works like
> css_next_descendant_{pre, post} to iterating all descendant css.
>
> BPF programs can use these kfuncs directly or through bpf_for_each macro.
>
> link[1]: https://lore.kernel.org/lkml/[email protected]/
> link[2]: https://lore.kernel.org/all/[email protected]/
>
> Chuyi Zhou (6):
> cgroup: Prepare for using css_task_iter_*() in BPF
> bpf: Introduce css_task open-coded iterator kfuncs
> bpf: Introduce process open coded iterator kfuncs
> bpf: Introduce css_descendant open-coded iterator kfuncs
> bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
> selftests/bpf: Add tests for open-coded task and css iter
>
> include/linux/cgroup.h | 12 +-
> include/uapi/linux/bpf.h | 16 ++
> kernel/bpf/helpers.c | 12 ++
> kernel/bpf/task_iter.c | 130 +++++++++++++++++
> kernel/bpf/verifier.c | 53 ++++++-
> kernel/cgroup/cgroup.c | 18 ++-
> tools/include/uapi/linux/bpf.h | 16 ++
> tools/lib/bpf/bpf_helpers.h | 24 +++
> .../testing/selftests/bpf/prog_tests/iters.c | 138 ++++++++++++++++++
> .../testing/selftests/bpf/progs/iters_task.c | 104 +++++++++++++
> 10 files changed, 508 insertions(+), 15 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c
>

2023-09-12 16:45:20

by Chuyi Zhou

[permalink] [raw]
Subject: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs

This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_process in open-coded iterator
style. BPF programs can use these kfuncs or through bpf_for_each macro to
iterate all processes in the system.

Signed-off-by: Chuyi Zhou <[email protected]>
---
include/uapi/linux/bpf.h | 4 ++++
kernel/bpf/helpers.c | 3 +++
kernel/bpf/task_iter.c | 29 +++++++++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 4 ++++
tools/lib/bpf/bpf_helpers.h | 5 +++++
5 files changed, 45 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index de02c0971428..befa55b52e29 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
__u64 __opaque[1];
} __attribute__((aligned(8)));

+struct bpf_iter_process {
+ __u64 __opaque[1];
+} __attribute__((aligned(8)));
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index d6a16becfbb9..9b7d2c6f99d1 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_dynptr_adjust)
BTF_ID_FLAGS(func, bpf_dynptr_is_null)
BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index d8539cc05ffd..9d1927dc3a06 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -851,6 +851,35 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
kfree(kit->css_it);
}

+struct bpf_iter_process_kern {
+ struct task_struct *tsk;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
+{
+ struct bpf_iter_process_kern *kit = (void *)it;
+
+ BUILD_BUG_ON(sizeof(struct bpf_iter_process_kern) != sizeof(struct bpf_iter_process));
+ BUILD_BUG_ON(__alignof__(struct bpf_iter_process_kern) !=
+ __alignof__(struct bpf_iter_process));
+
+ kit->tsk = &init_task;
+ return 0;
+}
+
+__bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
+{
+ struct bpf_iter_process_kern *kit = (void *)it;
+
+ kit->tsk = next_task(kit->tsk);
+
+ return kit->tsk == &init_task ? NULL : kit->tsk;
+}
+
+__bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
+{
+}
+
DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);

static void do_mmap_read_unlock(struct irq_work *entry)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index de02c0971428..befa55b52e29 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
__u64 __opaque[1];
} __attribute__((aligned(8)));

+struct bpf_iter_process {
+ __u64 __opaque[1];
+} __attribute__((aligned(8)));
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index f48723c6c593..858252c2641c 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -310,6 +310,11 @@ extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;

+struct bpf_iter_process;
+extern int bpf_iter_process_new(struct bpf_iter_process *it) __weak __ksym;
+extern struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it) __weak __ksym;
+extern void bpf_iter_process_destroy(struct bpf_iter_process *it) __weak __ksym;
+
#ifndef bpf_for_each
/* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
* using BPF open-coded iterators without having to write mundane explicit
--
2.20.1

2023-09-12 16:47:25

by Chuyi Zhou

[permalink] [raw]
Subject: [PATCH bpf-next v2 1/6] cgroup: Prepare for using css_task_iter_*() in BPF

This patch makes some preparations for using css_task_iter_*() in BPF
Program.

1. Flags CSS_TASK_ITER_* are #define-s and it's not easy for bpf prog to
use them. Convert them to enum so bpf prog can take them from vmlinux.h.

2. In the next patch we will add css_task_iter_*() in common kfuncs which
is not safe. Since css_task_iter_*() does spin_unlock_irq() which might
screw up irq flags depending on the context where bpf prog is running.
So we should use irqsave/irqrestore here and the switching is harmless.

Signed-off-by: Chuyi Zhou <[email protected]>
---
include/linux/cgroup.h | 12 +++++-------
kernel/cgroup/cgroup.c | 18 ++++++++++++------
2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b307013b9c6c..0ef0af66080e 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -40,13 +40,11 @@ struct kernel_clone_args;
#define CGROUP_WEIGHT_DFL 100
#define CGROUP_WEIGHT_MAX 10000

-/* walk only threadgroup leaders */
-#define CSS_TASK_ITER_PROCS (1U << 0)
-/* walk all threaded css_sets in the domain */
-#define CSS_TASK_ITER_THREADED (1U << 1)
-
-/* internal flags */
-#define CSS_TASK_ITER_SKIPPED (1U << 16)
+enum {
+ CSS_TASK_ITER_PROCS = (1U << 0), /* walk only threadgroup leaders */
+ CSS_TASK_ITER_THREADED = (1U << 1), /* walk all threaded css_sets in the domain */
+ CSS_TASK_ITER_SKIPPED = (1U << 16), /* internal flags */
+};

/* a css_task_iter should be treated as an opaque object */
struct css_task_iter {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1fb7f562289d..b6d64f3b8888 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4917,9 +4917,11 @@ static void css_task_iter_advance(struct css_task_iter *it)
void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
struct css_task_iter *it)
{
+ unsigned long irqflags;
+
memset(it, 0, sizeof(*it));

- spin_lock_irq(&css_set_lock);
+ spin_lock_irqsave(&css_set_lock, irqflags);

it->ss = css->ss;
it->flags = flags;
@@ -4933,7 +4935,7 @@ void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,

css_task_iter_advance(it);

- spin_unlock_irq(&css_set_lock);
+ spin_unlock_irqrestore(&css_set_lock, irqflags);
}

/**
@@ -4946,12 +4948,14 @@ void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
*/
struct task_struct *css_task_iter_next(struct css_task_iter *it)
{
+ unsigned long irqflags;
+
if (it->cur_task) {
put_task_struct(it->cur_task);
it->cur_task = NULL;
}

- spin_lock_irq(&css_set_lock);
+ spin_lock_irqsave(&css_set_lock, irqflags);

/* @it may be half-advanced by skips, finish advancing */
if (it->flags & CSS_TASK_ITER_SKIPPED)
@@ -4964,7 +4968,7 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
css_task_iter_advance(it);
}

- spin_unlock_irq(&css_set_lock);
+ spin_unlock_irqrestore(&css_set_lock, irqflags);

return it->cur_task;
}
@@ -4977,11 +4981,13 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
*/
void css_task_iter_end(struct css_task_iter *it)
{
+ unsigned long irqflags;
+
if (it->cur_cset) {
- spin_lock_irq(&css_set_lock);
+ spin_lock_irqsave(&css_set_lock, irqflags);
list_del(&it->iters_node);
put_css_set_locked(it->cur_cset);
- spin_unlock_irq(&css_set_lock);
+ spin_unlock_irqrestore(&css_set_lock, irqflags);
}

if (it->cur_dcset)
--
2.20.1

2023-09-12 16:51:11

by Chuyi Zhou

[permalink] [raw]
Subject: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs

This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which
allow creation and manipulation of struct bpf_iter_css in open-coded
iterator style. These kfuncs actually wrapps css_next_descendant_{pre,
post}. BPF programs can use these kfuncs through bpf_for_each macro for
iteration of all descendant css under a root css.

Signed-off-by: Chuyi Zhou <[email protected]>
---
include/uapi/linux/bpf.h | 8 +++++
kernel/bpf/helpers.c | 6 ++++
kernel/bpf/task_iter.c | 53 ++++++++++++++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 8 +++++
tools/lib/bpf/bpf_helpers.h | 12 ++++++++
5 files changed, 87 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index befa55b52e29..57760afc13d0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7326,4 +7326,12 @@ struct bpf_iter_process {
__u64 __opaque[1];
} __attribute__((aligned(8)));

+struct bpf_iter_css_pre {
+ __u64 __opaque[2];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_css_post {
+ __u64 __opaque[2];
+} __attribute__((aligned(8)));
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 9b7d2c6f99d1..ca1f6404af9e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_dynptr_adjust)
BTF_ID_FLAGS(func, bpf_dynptr_is_null)
BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 9d1927dc3a06..8963fc779b87 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
{
}

+struct bpf_iter_css_kern {
+ struct cgroup_subsys_state *root;
+ struct cgroup_subsys_state *pos;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
+ struct cgroup_subsys_state *root)
+{
+ struct bpf_iter_css_kern *kit = (void *)it;
+
+ BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
+ BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_pre));
+ kit->root = root;
+ kit->pos = NULL;
+ return 0;
+}
+
+__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
+{
+ struct bpf_iter_css_kern *kit = (void *)it;
+
+ kit->pos = css_next_descendant_pre(kit->pos, kit->root);
+ return kit->pos;
+}
+
+__bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
+{
+}
+
+__bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
+ struct cgroup_subsys_state *root)
+{
+ struct bpf_iter_css_kern *kit = (void *)it;
+
+ BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
+ BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_post));
+ kit->root = root;
+ kit->pos = NULL;
+ return 0;
+}
+
+__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
+{
+ struct bpf_iter_css_kern *kit = (void *)it;
+
+ kit->pos = css_next_descendant_post(kit->pos, kit->root);
+ return kit->pos;
+}
+
+__bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
+{
+}
+
DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);

static void do_mmap_read_unlock(struct irq_work *entry)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index befa55b52e29..57760afc13d0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7326,4 +7326,12 @@ struct bpf_iter_process {
__u64 __opaque[1];
} __attribute__((aligned(8)));

+struct bpf_iter_css_pre {
+ __u64 __opaque[2];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_css_post {
+ __u64 __opaque[2];
+} __attribute__((aligned(8)));
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 858252c2641c..6e5bd9ef14d6 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -315,6 +315,18 @@ extern int bpf_iter_process_new(struct bpf_iter_process *it) __weak __ksym;
extern struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it) __weak __ksym;
extern void bpf_iter_process_destroy(struct bpf_iter_process *it) __weak __ksym;

+struct bpf_iter_css_pre;
+extern int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
+ struct cgroup_subsys_state *root) __weak __ksym;
+extern struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it) __weak __ksym;
+extern void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it) __weak __ksym;
+
+struct bpf_iter_css_post;
+extern int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
+ struct cgroup_subsys_state *root) __weak __ksym;
+extern struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it) __weak __ksym;
+extern void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it) __weak __ksym;
+
#ifndef bpf_for_each
/* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
* using BPF open-coded iterators without having to write mundane explicit
--
2.20.1

2023-09-12 16:54:50

by Chuyi Zhou

[permalink] [raw]
Subject: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS

css_iter and process_iter should be used in rcu section. Specifically, in
sleepable progs explicit bpf_rcu_read_lock() is needed before use these
iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
use them directly.

This patch checks whether we are in rcu cs before we want to invoke
bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
reject if reg->type is UNTRUSTED.

Signed-off-by: Chuyi Zhou <[email protected]>
---
kernel/bpf/verifier.c | 30 ++++++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2367483bf4c2..6a6827ba7a18 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1172,7 +1172,13 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg

static void __mark_reg_known_zero(struct bpf_reg_state *reg);

+static bool in_rcu_cs(struct bpf_verifier_env *env);
+
+/* check whether we are using bpf_iter_process_*() or bpf_iter_css_*() */
+static bool is_iter_need_rcu(struct bpf_kfunc_call_arg_meta *meta);
+
static int mark_stack_slots_iter(struct bpf_verifier_env *env,
+ struct bpf_kfunc_call_arg_meta *meta,
struct bpf_reg_state *reg, int insn_idx,
struct btf *btf, u32 btf_id, int nr_slots)
{
@@ -1193,6 +1199,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,

__mark_reg_known_zero(st);
st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
+ if (is_iter_need_rcu(meta)) {
+ if (in_rcu_cs(env))
+ st->type |= MEM_RCU;
+ else
+ st->type |= PTR_UNTRUSTED;
+ }
st->live |= REG_LIVE_WRITTEN;
st->ref_obj_id = i == 0 ? id : 0;
st->iter.btf = btf;
@@ -1281,6 +1293,8 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_
struct bpf_stack_state *slot = &state->stack[spi - i];
struct bpf_reg_state *st = &slot->spilled_ptr;

+ if (st->type & PTR_UNTRUSTED)
+ return false;
/* only main (first) slot has ref_obj_id set */
if (i == 0 && !st->ref_obj_id)
return false;
@@ -7503,13 +7517,13 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
return err;
}

- err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots);
+ err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots);
if (err)
return err;
} else {
/* iter_next() or iter_destroy() expect initialized iter state*/
if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) {
- verbose(env, "expected an initialized iter_%s as arg #%d\n",
+ verbose(env, "expected an initialized iter_%s as arg #%d or without bpf_rcu_read_lock()\n",
iter_type_str(meta->btf, btf_id), regno);
return -EINVAL;
}
@@ -10382,6 +10396,18 @@ BTF_ID(func, bpf_percpu_obj_new_impl)
BTF_ID(func, bpf_percpu_obj_drop_impl)
BTF_ID(func, bpf_iter_css_task_new)

+BTF_SET_START(rcu_protect_kfuns_set)
+BTF_ID(func, bpf_iter_process_new)
+BTF_ID(func, bpf_iter_css_pre_new)
+BTF_ID(func, bpf_iter_css_post_new)
+BTF_SET_END(rcu_protect_kfuns_set)
+
+static inline bool is_iter_need_rcu(struct bpf_kfunc_call_arg_meta *meta)
+{
+ return btf_id_set_contains(&rcu_protect_kfuns_set, meta->func_id);
+}
+
+
static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
{
if (meta->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
--
2.20.1

2023-09-12 22:18:44

by Martin KaFai Lau

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs

On 9/12/23 12:01 AM, Chuyi Zhou wrote:
> +__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
> + struct cgroup_subsys_state *css, unsigned int flags)
> +{
> + struct bpf_iter_css_task_kern *kit = (void *)it;
> +
> + BUILD_BUG_ON(sizeof(struct bpf_iter_css_task_kern) != sizeof(struct bpf_iter_css_task));
> + BUILD_BUG_ON(__alignof__(struct bpf_iter_css_task_kern) !=
> + __alignof__(struct bpf_iter_css_task));
> +
> + switch (flags) {
> + case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
> + case CSS_TASK_ITER_PROCS:
> + case 0:
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + kit->css_it = kzalloc(sizeof(struct css_task_iter), GFP_KERNEL);
> + if (!kit->css_it)
> + return -ENOMEM;
> + css_task_iter_start(css, flags, kit->css_it);
> + return 0;
> +}
> +

> +static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
> +{
> + enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
> +
> + switch (prog_type) {
> + case BPF_PROG_TYPE_LSM:

This will allow the non-sleepable lsm prog to call bpf_iter_css_task_new. The
above kzalloc(GFP_KERNEL) in bpf_iter_css_task_new should trigger a lockdep
error in the lsm selftest in patch 6.

> + return true;
> + case BPF_TRACE_ITER:
> + return env->prog->aux->sleepable;
> + default:
> + return false;
> + }
> +}

2023-09-12 22:32:30

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs

On Tue, Sep 12, 2023 at 03:01:45PM +0800, Chuyi Zhou wrote:
> This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_css_task in open-coded
> iterator style. These kfuncs actually wrapps css_task_iter_{start,next,
> end}. BPF programs can use these kfuncs through bpf_for_each macro for
> iteration of all tasks under a css.
>
> css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so
> the bpf side has to be careful in where it allows to use this iter.
> Currently we only allow it in bpf_lsm and bpf iter-s.
>
> Signed-off-by: Chuyi Zhou <[email protected]>
> ---
> include/uapi/linux/bpf.h | 4 +++
> kernel/bpf/helpers.c | 3 +++
> kernel/bpf/task_iter.c | 48 ++++++++++++++++++++++++++++++++++
> kernel/bpf/verifier.c | 23 ++++++++++++++++
> tools/include/uapi/linux/bpf.h | 4 +++
> tools/lib/bpf/bpf_helpers.h | 7 +++++
> 6 files changed, 89 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 73b155e52204..de02c0971428 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7318,4 +7318,8 @@ struct bpf_iter_num {
> __u64 __opaque[1];
> } __attribute__((aligned(8)));
>
> +struct bpf_iter_css_task {
> + __u64 __opaque[1];
> +} __attribute__((aligned(8)));
> +
> #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index b0a9834f1051..d6a16becfbb9 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2504,6 +2504,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
> BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)

Looking at selftest:
struct cgroup_subsys_state *css = &cgrp->self;

realized that we're missing KF_TRUSTED_ARGS here.

> +BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 7473068ed313..d8539cc05ffd 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -803,6 +803,54 @@ const struct bpf_func_proto bpf_find_vma_proto = {
> .arg5_type = ARG_ANYTHING,
> };
>
> +struct bpf_iter_css_task_kern {
> + struct css_task_iter *css_it;
> +} __attribute__((aligned(8)));
> +
> +__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
> + struct cgroup_subsys_state *css, unsigned int flags)

the verifier does a type check, but it's not strong enough.
We need KF_TRUSTED_ARGS to make sure the pointer is valid.
The BTF_TYPE_SAFE_RCU(struct cgroup) {
probably doesn't need to change, since '&cgrp->self' is not a pointer deref.
The verifier should understand that cgroup_subsys_state is also PTR_TRUSTED
just like 'cgrp' pointer.

Also please add negative tests in patch 6.
Like doing bpf_rcu_read_unlock() in the middle and check that the verifier
catches such mistake.

2023-09-13 07:06:34

by Chuyi Zhou

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs

Hello Martin.

在 2023/9/13 03:57, Martin KaFai Lau 写道:
> On 9/12/23 12:01 AM, Chuyi Zhou wrote:
>> +__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
>> +        struct cgroup_subsys_state *css, unsigned int flags)
>> +{
>> +    struct bpf_iter_css_task_kern *kit = (void *)it;
>> +
>> +    BUILD_BUG_ON(sizeof(struct bpf_iter_css_task_kern) !=
>> sizeof(struct bpf_iter_css_task));
>> +    BUILD_BUG_ON(__alignof__(struct bpf_iter_css_task_kern) !=
>> +                    __alignof__(struct bpf_iter_css_task));
>> +
>> +    switch (flags) {
>> +    case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
>> +    case CSS_TASK_ITER_PROCS:
>> +    case 0:
>> +        break;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +
>> +    kit->css_it = kzalloc(sizeof(struct css_task_iter), GFP_KERNEL);
>> +    if (!kit->css_it)
>> +        return -ENOMEM;
>> +    css_task_iter_start(css, flags, kit->css_it);
>> +    return 0;
>> +}
>> +
>
>> +static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
>> +{
>> +    enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
>> +
>> +    switch (prog_type) {
>> +    case BPF_PROG_TYPE_LSM:
>
> This will allow the non-sleepable lsm prog to call
> bpf_iter_css_task_new. The above kzalloc(GFP_KERNEL) in
> bpf_iter_css_task_new should trigger a lockdep error in the lsm selftest
> in patch 6.

Thanks for your test. I missed the dmesg error since I did not set
CONFIG_LOCKDEP.

I think here we can use kzalloc(GFP_ATOMIC) to eliminate this error if
we want to use this iter in non-sleepable lsm prog. I just tested, it
works well.

>
>> +        return true;
>> +    case BPF_TRACE_ITER:
>> +        return env->prog->aux->sleepable;
>> +    default:
>> +        return false;
>> +    }
>> +}
>

2023-09-13 12:06:17

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs

Hi Chuyi,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url: https://github.com/intel-lab-lkp/linux/commits/Chuyi-Zhou/cgroup-Prepare-for-using-css_task_iter_-in-BPF/20230912-150454
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20230912070149.969939-5-zhouchuyi%40bytedance.com
patch subject: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
config: arm-randconfig-r032-20230913 (https://download.01.org/0day-ci/archive/20230913/[email protected]/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230913/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

kernel/bpf/task_iter.c:810:17: warning: no previous prototype for function 'bpf_iter_css_task_new' [-Wmissing-prototypes]
810 | __bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
| ^
kernel/bpf/task_iter.c:810:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
810 | __bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
| ^
| static
kernel/bpf/task_iter.c:835:33: warning: no previous prototype for function 'bpf_iter_css_task_next' [-Wmissing-prototypes]
835 | __bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
| ^
kernel/bpf/task_iter.c:835:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
835 | __bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
| ^
| static
kernel/bpf/task_iter.c:844:18: warning: no previous prototype for function 'bpf_iter_css_task_destroy' [-Wmissing-prototypes]
844 | __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
| ^
kernel/bpf/task_iter.c:844:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
844 | __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
| ^
| static
kernel/bpf/task_iter.c:858:17: warning: no previous prototype for function 'bpf_iter_process_new' [-Wmissing-prototypes]
858 | __bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
| ^
kernel/bpf/task_iter.c:858:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
858 | __bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
| ^
| static
kernel/bpf/task_iter.c:870:33: warning: no previous prototype for function 'bpf_iter_process_next' [-Wmissing-prototypes]
870 | __bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
| ^
kernel/bpf/task_iter.c:870:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
870 | __bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
| ^
| static
kernel/bpf/task_iter.c:879:18: warning: no previous prototype for function 'bpf_iter_process_destroy' [-Wmissing-prototypes]
879 | __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
| ^
kernel/bpf/task_iter.c:879:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
879 | __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
| ^
| static
kernel/bpf/task_iter.c:888:17: warning: no previous prototype for function 'bpf_iter_css_pre_new' [-Wmissing-prototypes]
888 | __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
| ^
kernel/bpf/task_iter.c:888:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
888 | __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
| ^
| static
kernel/bpf/task_iter.c:900:41: warning: no previous prototype for function 'bpf_iter_css_pre_next' [-Wmissing-prototypes]
900 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
| ^
kernel/bpf/task_iter.c:900:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
900 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
| ^
| static
kernel/bpf/task_iter.c:908:18: warning: no previous prototype for function 'bpf_iter_css_pre_destroy' [-Wmissing-prototypes]
908 | __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
| ^
kernel/bpf/task_iter.c:908:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
908 | __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
| ^
| static
kernel/bpf/task_iter.c:912:17: warning: no previous prototype for function 'bpf_iter_css_post_new' [-Wmissing-prototypes]
912 | __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
| ^
kernel/bpf/task_iter.c:912:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
912 | __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
| ^
| static
kernel/bpf/task_iter.c:924:41: warning: no previous prototype for function 'bpf_iter_css_post_next' [-Wmissing-prototypes]
924 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
| ^
kernel/bpf/task_iter.c:924:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
924 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
| ^
| static
kernel/bpf/task_iter.c:932:18: warning: no previous prototype for function 'bpf_iter_css_post_destroy' [-Wmissing-prototypes]
932 | __bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
| ^
kernel/bpf/task_iter.c:932:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
932 | __bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
| ^
| static
>> kernel/bpf/task_iter.c:893:2: error: call to '__compiletime_assert_389' declared with 'error' attribute: BUILD_BUG_ON failed: sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre)
893 | BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
| ^
include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^
include/linux/compiler_types.h:425:2: note: expanded from macro 'compiletime_assert'
425 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:413:2: note: expanded from macro '_compiletime_assert'
413 | __compiletime_assert(condition, msg, prefix, suffix)
| ^
include/linux/compiler_types.h:406:4: note: expanded from macro '__compiletime_assert'
406 | prefix ## suffix(); \
| ^
<scratch space>:6:1: note: expanded from here
6 | __compiletime_assert_389
| ^
>> kernel/bpf/task_iter.c:917:2: error: call to '__compiletime_assert_391' declared with 'error' attribute: BUILD_BUG_ON failed: sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post)
917 | BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
| ^
include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^
include/linux/compiler_types.h:425:2: note: expanded from macro 'compiletime_assert'
425 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:413:2: note: expanded from macro '_compiletime_assert'
413 | __compiletime_assert(condition, msg, prefix, suffix)
| ^
include/linux/compiler_types.h:406:4: note: expanded from macro '__compiletime_assert'
406 | prefix ## suffix(); \
| ^
<scratch space>:14:1: note: expanded from here
14 | __compiletime_assert_391
| ^
12 warnings and 2 errors generated.


vim +893 kernel/bpf/task_iter.c

887
888 __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
889 struct cgroup_subsys_state *root)
890 {
891 struct bpf_iter_css_kern *kit = (void *)it;
892
> 893 BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
894 BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_pre));
895 kit->root = root;
896 kit->pos = NULL;
897 return 0;
898 }
899
900 __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
901 {
902 struct bpf_iter_css_kern *kit = (void *)it;
903
904 kit->pos = css_next_descendant_pre(kit->pos, kit->root);
905 return kit->pos;
906 }
907
908 __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
909 {
910 }
911
912 __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
913 struct cgroup_subsys_state *root)
914 {
915 struct bpf_iter_css_kern *kit = (void *)it;
916
> 917 BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
918 BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_post));
919 kit->root = root;
920 kit->pos = NULL;
921 return 0;
922 }
923

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-09-13 12:20:43

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs

Hi Chuyi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf-next/master]

url: https://github.com/intel-lab-lkp/linux/commits/Chuyi-Zhou/cgroup-Prepare-for-using-css_task_iter_-in-BPF/20230912-150454
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20230912070149.969939-5-zhouchuyi%40bytedance.com
patch subject: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
config: hexagon-randconfig-r032-20230913 (https://download.01.org/0day-ci/archive/20230913/[email protected]/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230913/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

573 | val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
| ~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
| ^
In file included from kernel/bpf/task_iter.c:9:
In file included from include/linux/filter.h:9:
In file included from include/linux/bpf.h:31:
In file included from include/linux/memcontrol.h:13:
In file included from include/linux/cgroup.h:26:
In file included from include/linux/kernel_stat.h:9:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:337:
include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
584 | __raw_writeb(value, PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
594 | __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
604 | __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
kernel/bpf/task_iter.c:820:7: error: use of undeclared identifier 'CSS_TASK_ITER_PROCS'
820 | case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
| ^
kernel/bpf/task_iter.c:820:29: error: use of undeclared identifier 'CSS_TASK_ITER_THREADED'
820 | case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
| ^
kernel/bpf/task_iter.c:821:7: error: use of undeclared identifier 'CSS_TASK_ITER_PROCS'
821 | case CSS_TASK_ITER_PROCS:
| ^
kernel/bpf/task_iter.c:828:24: error: invalid application of 'sizeof' to an incomplete type 'struct css_task_iter'
828 | kit->css_it = kzalloc(sizeof(struct css_task_iter), GFP_KERNEL);
| ^ ~~~~~~~~~~~~~~~~~~~~~~
kernel/bpf/task_iter.c:807:9: note: forward declaration of 'struct css_task_iter'
807 | struct css_task_iter *css_it;
| ^
kernel/bpf/task_iter.c:831:2: error: call to undeclared function 'css_task_iter_start'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
831 | css_task_iter_start(css, flags, kit->css_it);
| ^
kernel/bpf/task_iter.c:831:2: note: did you mean '__sg_page_iter_start'?
include/linux/scatterlist.h:573:6: note: '__sg_page_iter_start' declared here
573 | void __sg_page_iter_start(struct sg_page_iter *piter,
| ^
kernel/bpf/task_iter.c:810:17: warning: no previous prototype for function 'bpf_iter_css_task_new' [-Wmissing-prototypes]
810 | __bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
| ^
kernel/bpf/task_iter.c:810:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
810 | __bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
| ^
| static
kernel/bpf/task_iter.c:841:9: error: call to undeclared function 'css_task_iter_next'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
841 | return css_task_iter_next(kit->css_it);
| ^
kernel/bpf/task_iter.c:841:9: error: incompatible integer to pointer conversion returning 'int' from a function with result type 'struct task_struct *' [-Wint-conversion]
841 | return css_task_iter_next(kit->css_it);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/bpf/task_iter.c:835:33: warning: no previous prototype for function 'bpf_iter_css_task_next' [-Wmissing-prototypes]
835 | __bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
| ^
kernel/bpf/task_iter.c:835:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
835 | __bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
| ^
| static
kernel/bpf/task_iter.c:850:2: error: call to undeclared function 'css_task_iter_end'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
850 | css_task_iter_end(kit->css_it);
| ^
kernel/bpf/task_iter.c:844:18: warning: no previous prototype for function 'bpf_iter_css_task_destroy' [-Wmissing-prototypes]
844 | __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
| ^
kernel/bpf/task_iter.c:844:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
844 | __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
| ^
| static
kernel/bpf/task_iter.c:858:17: warning: no previous prototype for function 'bpf_iter_process_new' [-Wmissing-prototypes]
858 | __bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
| ^
kernel/bpf/task_iter.c:858:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
858 | __bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
| ^
| static
kernel/bpf/task_iter.c:870:33: warning: no previous prototype for function 'bpf_iter_process_next' [-Wmissing-prototypes]
870 | __bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
| ^
kernel/bpf/task_iter.c:870:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
870 | __bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
| ^
| static
kernel/bpf/task_iter.c:879:18: warning: no previous prototype for function 'bpf_iter_process_destroy' [-Wmissing-prototypes]
879 | __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
| ^
kernel/bpf/task_iter.c:879:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
879 | __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
| ^
| static
>> kernel/bpf/task_iter.c:888:17: warning: no previous prototype for function 'bpf_iter_css_pre_new' [-Wmissing-prototypes]
888 | __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
| ^
kernel/bpf/task_iter.c:888:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
888 | __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
| ^
| static
kernel/bpf/task_iter.c:904:13: error: call to undeclared function 'css_next_descendant_pre'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
904 | kit->pos = css_next_descendant_pre(kit->pos, kit->root);
| ^
kernel/bpf/task_iter.c:904:11: error: incompatible integer to pointer conversion assigning to 'struct cgroup_subsys_state *' from 'int' [-Wint-conversion]
904 | kit->pos = css_next_descendant_pre(kit->pos, kit->root);
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:900:41: warning: no previous prototype for function 'bpf_iter_css_pre_next' [-Wmissing-prototypes]
900 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
| ^
kernel/bpf/task_iter.c:900:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
900 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
| ^
| static
>> kernel/bpf/task_iter.c:908:18: warning: no previous prototype for function 'bpf_iter_css_pre_destroy' [-Wmissing-prototypes]
908 | __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
| ^
kernel/bpf/task_iter.c:908:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
908 | __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
| ^
| static
>> kernel/bpf/task_iter.c:912:17: warning: no previous prototype for function 'bpf_iter_css_post_new' [-Wmissing-prototypes]
912 | __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
| ^
kernel/bpf/task_iter.c:912:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
912 | __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
| ^
| static
kernel/bpf/task_iter.c:928:13: error: call to undeclared function 'css_next_descendant_post'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
928 | kit->pos = css_next_descendant_post(kit->pos, kit->root);
| ^
kernel/bpf/task_iter.c:928:11: error: incompatible integer to pointer conversion assigning to 'struct cgroup_subsys_state *' from 'int' [-Wint-conversion]
928 | kit->pos = css_next_descendant_post(kit->pos, kit->root);
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:924:41: warning: no previous prototype for function 'bpf_iter_css_post_next' [-Wmissing-prototypes]
924 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
| ^
kernel/bpf/task_iter.c:924:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
924 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
| ^
| static
>> kernel/bpf/task_iter.c:932:18: warning: no previous prototype for function 'bpf_iter_css_post_destroy' [-Wmissing-prototypes]
932 | __bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
| ^
kernel/bpf/task_iter.c:932:13: note: declare 'static' if the function is not intended to be used outside of this translation unit
932 | __bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
| ^
| static
18 warnings and 12 errors generated.


vim +/bpf_iter_css_pre_new +888 kernel/bpf/task_iter.c

878
> 879 __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
880 {
881 }
882
883 struct bpf_iter_css_kern {
884 struct cgroup_subsys_state *root;
885 struct cgroup_subsys_state *pos;
886 } __attribute__((aligned(8)));
887
> 888 __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
889 struct cgroup_subsys_state *root)
890 {
891 struct bpf_iter_css_kern *kit = (void *)it;
892
893 BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
894 BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_pre));
895 kit->root = root;
896 kit->pos = NULL;
897 return 0;
898 }
899
> 900 __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
901 {
902 struct bpf_iter_css_kern *kit = (void *)it;
903
> 904 kit->pos = css_next_descendant_pre(kit->pos, kit->root);
905 return kit->pos;
906 }
907
> 908 __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
909 {
910 }
911
> 912 __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
913 struct cgroup_subsys_state *root)
914 {
915 struct bpf_iter_css_kern *kit = (void *)it;
916
917 BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
918 BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_post));
919 kit->root = root;
920 kit->pos = NULL;
921 return 0;
922 }
923
> 924 __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
925 {
926 struct bpf_iter_css_kern *kit = (void *)it;
927
> 928 kit->pos = css_next_descendant_post(kit->pos, kit->root);
929 return kit->pos;
930 }
931
> 932 __bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
933 {
934 }
935

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-09-13 16:18:24

by Chuyi Zhou

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs

Hello, Alexei.

在 2023/9/13 01:13, Alexei Starovoitov 写道:
> On Tue, Sep 12, 2023 at 03:01:45PM +0800, Chuyi Zhou wrote:
>> This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow
>> creation and manipulation of struct bpf_iter_css_task in open-coded
>> iterator style. These kfuncs actually wrapps css_task_iter_{start,next,
>> end}. BPF programs can use these kfuncs through bpf_for_each macro for
>> iteration of all tasks under a css.
>>
>> css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so
>> the bpf side has to be careful in where it allows to use this iter.
>> Currently we only allow it in bpf_lsm and bpf iter-s.
>>
>> Signed-off-by: Chuyi Zhou <[email protected]>
>> ---
>> include/uapi/linux/bpf.h | 4 +++
>> kernel/bpf/helpers.c | 3 +++
>> kernel/bpf/task_iter.c | 48 ++++++++++++++++++++++++++++++++++
>> kernel/bpf/verifier.c | 23 ++++++++++++++++
>> tools/include/uapi/linux/bpf.h | 4 +++
>> tools/lib/bpf/bpf_helpers.h | 7 +++++
>> 6 files changed, 89 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 73b155e52204..de02c0971428 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -7318,4 +7318,8 @@ struct bpf_iter_num {
>> __u64 __opaque[1];
>> } __attribute__((aligned(8)));
>>
>> +struct bpf_iter_css_task {
>> + __u64 __opaque[1];
>> +} __attribute__((aligned(8)));
>> +
>> #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index b0a9834f1051..d6a16becfbb9 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2504,6 +2504,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
>> BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
>> BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
>> BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
>
> Looking at selftest:
> struct cgroup_subsys_state *css = &cgrp->self;
>
> realized that we're missing KF_TRUSTED_ARGS here.
>
>> +BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index 7473068ed313..d8539cc05ffd 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -803,6 +803,54 @@ const struct bpf_func_proto bpf_find_vma_proto = {
>> .arg5_type = ARG_ANYTHING,
>> };
>>
>> +struct bpf_iter_css_task_kern {
>> + struct css_task_iter *css_it;
>> +} __attribute__((aligned(8)));
>> +
>> +__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
>> + struct cgroup_subsys_state *css, unsigned int flags)
>
> the verifier does a type check, but it's not strong enough.
> We need KF_TRUSTED_ARGS to make sure the pointer is valid.
> The BTF_TYPE_SAFE_RCU(struct cgroup) {
> probably doesn't need to change, since '&cgrp->self' is not a pointer deref.
> The verifier should understand that cgroup_subsys_state is also PTR_TRUSTED
> just like 'cgrp' pointer.

Got it. It seems we should also apply this to bpf_iter_css_{pre,post}_new.

>
> Also please add negative tests in patch 6.
> Like doing bpf_rcu_read_unlock() in the middle and check that the verifier
> catches such mistake.

I will do it in next version.

Thanks.

2023-09-13 16:43:38

by Chuyi Zhou

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS

Hello.

在 2023/9/12 15:01, Chuyi Zhou 写道:
> css_iter and process_iter should be used in rcu section. Specifically, in
> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
> use them directly.
>
> This patch checks whether we are in rcu cs before we want to invoke
> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
> reject if reg->type is UNTRUSTED.

I use the following BPF Prog to test this patch:

SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int iter_task_for_each_sleep(void *ctx)
{
struct task_struct *task;
struct task_struct *cur_task = bpf_get_current_task_btf();

if (cur_task->pid != target_pid)
return 0;
bpf_rcu_read_lock();
bpf_for_each(process, task) {
bpf_rcu_read_unlock();
if (task->pid == target_pid)
process_cnt += 1;
bpf_rcu_read_lock();
}
bpf_rcu_read_unlock();
return 0;
}

Unfortunately, we can pass the verifier.

Then I add some printk-messages before setting/clearing state to help debug:

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d151e6b43a5f..35f3fa9471a9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct
bpf_verifier_env *env,
__mark_reg_known_zero(st);
st->type = PTR_TO_STACK; /* we don't have dedicated reg
type */
if (is_iter_need_rcu(meta)) {
+ printk("mark reg_addr : %px", st);
if (in_rcu_cs(env))
st->type |= MEM_RCU;
else
@@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct
bpf_verifier_env *env, struct bpf_insn *insn,
return -EINVAL;
} else if (rcu_unlock) {
bpf_for_each_reg_in_vstate(env->cur_state,
state, reg, ({
+ printk("clear reg_addr : %px MEM_RCU :
%d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type &
PTR_UNTRUSTED);
if (reg->type & MEM_RCU) {
- printk("clear reg addr : %lld",
reg);
reg->type &= ~(MEM_RCU |
PTR_MAYBE_NULL);
reg->type |= PTR_UNTRUSTED;
}


The demsg log:

[ 393.705324] mark reg_addr : ffff88814e40e200

[ 393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0
PTR_UNTRUSTED : 0

[ 393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0
PTR_UNTRUSTED : 0

[ 393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0
PTR_UNTRUSTED : 0
....
....

I didn't see ffff88814e40e200 is cleared as expected because
bpf_for_each_reg_in_vstate didn't find it.

It seems when we are doing bpf_read_unlock() in the middle of iteration
and want to clearing state through bpf_for_each_reg_in_vstate, we can
not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in
mark_stack_slots_iter().

I thought maybe the correct answer here is operating the *iter_reg*
parameter in mark_stack_slots_iter() direcly so we can find it in
bpf_for_each_reg_in_vstate.

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6a6827ba7a18..53330ddf2b3c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1218,6 +1218,12 @@ static int mark_stack_slots_iter(struct
bpf_verifier_env *env,
mark_stack_slot_scratched(env, spi - i);
}

+ if (is_iter_need_rcu(meta)) {
+ if (in_rcu_cs(env))
+ reg->type |= MEM_RCU;
+ else
+ reg->type |= PTR_UNTRUSTED;
+ }
return 0;
}

@@ -1307,7 +1315,8 @@ static bool is_iter_reg_valid_init(struct
bpf_verifier_env *env, struct bpf_reg_
if (slot->slot_type[j] != STACK_ITER)
Kumarreturn false;
}
-
+ if (reg->type & PTR_UNTRUSTED)
+ return false;
return true;
}

However, it did not work either. The reason it didn't work is the state
of iter_reg will be cleared implicitly before the
is_iter_reg_valid_init() even we don't call bpf_rcu_unlock.

It would be appreciate if you could give some suggestion. Maby it worthy
to try the solution proposed by Kumar?[1]

[1]
https://lore.kernel.org/lkml/CAP01T77cWxWNwq5HLr+Woiu7k4-P3QQfJWX1OeQJUkxW3=P4bA@mail.gmail.com/#t



2023-09-14 10:15:48

by Chuyi Zhou

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS



在 2023/9/13 21:53, Chuyi Zhou 写道:
> Hello.
>
> 在 2023/9/12 15:01, Chuyi Zhou 写道:
>> css_iter and process_iter should be used in rcu section. Specifically, in
>> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
>> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
>> use them directly.
>>
>> This patch checks whether we are in rcu cs before we want to invoke
>> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
>> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
>> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
>> reject if reg->type is UNTRUSTED.
>
> I use the following BPF Prog to test this patch:
>
> SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
> int iter_task_for_each_sleep(void *ctx)
> {
>     struct task_struct *task;
>     struct task_struct *cur_task = bpf_get_current_task_btf();
>
>     if (cur_task->pid != target_pid)
>         return 0;
>     bpf_rcu_read_lock();
>     bpf_for_each(process, task) {
>         bpf_rcu_read_unlock();
>         if (task->pid == target_pid)
>             process_cnt += 1;
>         bpf_rcu_read_lock();
>     }
>     bpf_rcu_read_unlock();
>     return 0;
> }
>
> Unfortunately, we can pass the verifier.
>
> Then I add some printk-messages before setting/clearing state to help
> debug:
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d151e6b43a5f..35f3fa9471a9 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct
> bpf_verifier_env *env,
>                 __mark_reg_known_zero(st);
>                 st->type = PTR_TO_STACK; /* we don't have dedicated reg
> type */
>                 if (is_iter_need_rcu(meta)) {
> +                       printk("mark reg_addr : %px", st);
>                         if (in_rcu_cs(env))
>                                 st->type |= MEM_RCU;
>                         else
> @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct
> bpf_verifier_env *env, struct bpf_insn *insn,
>                         return -EINVAL;
>                 } else if (rcu_unlock) {
>                         bpf_for_each_reg_in_vstate(env->cur_state,
> state, reg, ({
> +                               printk("clear reg_addr : %px MEM_RCU :
> %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type &
> PTR_UNTRUSTED);
>                                 if (reg->type & MEM_RCU) {
> -                                       printk("clear reg addr : %lld",
> reg);
>                                         reg->type &= ~(MEM_RCU |
> PTR_MAYBE_NULL);
>                                         reg->type |= PTR_UNTRUSTED;
>                                 }
>
>
> The demsg log:
>
> [  393.705324] mark reg_addr : ffff88814e40e200
>
> [  393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0
> PTR_UNTRUSTED : 0
>
> [  393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0
> PTR_UNTRUSTED : 0
>
> [  393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0
> PTR_UNTRUSTED : 0
> ....
> ....
>
> I didn't see ffff88814e40e200 is cleared as expected because
> bpf_for_each_reg_in_vstate didn't find it.
>
> It seems when we are doing bpf_read_unlock() in the middle of iteration
> and want to clearing state through bpf_for_each_reg_in_vstate, we can
> not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in
> mark_stack_slots_iter().
>

bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in
mark_stack_slots_iter() we has marked the slots *STACK_ITER*

With the following change, everything seems work OK.

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index a3236651ec64..83c5ecccadb4 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -387,7 +387,7 @@ struct bpf_verifier_state {

#define bpf_get_spilled_reg(slot, frame) \
(((slot < frame->allocated_stack / BPF_REG_SIZE) && \
- (frame->stack[slot].slot_type[0] == STACK_SPILL)) \
+ (frame->stack[slot].slot_type[0] == STACK_SPILL ||
frame->stack[slot].slot_type[0] == STACK_ITER)) \
? &frame->stack[slot].spilled_ptr : NULL)

I am not sure whether this would harm some logic implicitly when using
bpf_get_spilled_reg/bpf_for_each_spilled_reg in other place. If so,
maybe we should add a extra parameter to control the picking behaviour.

#define bpf_get_spilled_reg(slot, frame, stack_type)
\
(((slot < frame->allocated_stack / BPF_REG_SIZE) && \
(frame->stack[slot].slot_type[0] == stack_type)) \
? &frame->stack[slot].spilled_ptr : NULL)

Thanks.

2023-09-14 20:17:57

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs

Hi Chuyi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf-next/master]

url: https://github.com/intel-lab-lkp/linux/commits/Chuyi-Zhou/cgroup-Prepare-for-using-css_task_iter_-in-BPF/20230912-150454
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20230912070149.969939-5-zhouchuyi%40bytedance.com
patch subject: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs
config: s390-defconfig (https://download.01.org/0day-ci/archive/20230913/[email protected]/config)
compiler: s390-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230913/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

kernel/bpf/task_iter.c:810:17: warning: no previous prototype for 'bpf_iter_css_task_new' [-Wmissing-prototypes]
810 | __bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
| ^~~~~~~~~~~~~~~~~~~~~
kernel/bpf/task_iter.c:835:33: warning: no previous prototype for 'bpf_iter_css_task_next' [-Wmissing-prototypes]
835 | __bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
| ^~~~~~~~~~~~~~~~~~~~~~
kernel/bpf/task_iter.c:844:18: warning: no previous prototype for 'bpf_iter_css_task_destroy' [-Wmissing-prototypes]
844 | __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
kernel/bpf/task_iter.c:858:17: warning: no previous prototype for 'bpf_iter_process_new' [-Wmissing-prototypes]
858 | __bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
| ^~~~~~~~~~~~~~~~~~~~
kernel/bpf/task_iter.c:870:33: warning: no previous prototype for 'bpf_iter_process_next' [-Wmissing-prototypes]
870 | __bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
| ^~~~~~~~~~~~~~~~~~~~~
kernel/bpf/task_iter.c:879:18: warning: no previous prototype for 'bpf_iter_process_destroy' [-Wmissing-prototypes]
879 | __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
| ^~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:888:17: warning: no previous prototype for 'bpf_iter_css_pre_new' [-Wmissing-prototypes]
888 | __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
| ^~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:900:41: warning: no previous prototype for 'bpf_iter_css_pre_next' [-Wmissing-prototypes]
900 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
| ^~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:908:18: warning: no previous prototype for 'bpf_iter_css_pre_destroy' [-Wmissing-prototypes]
908 | __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
| ^~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:912:17: warning: no previous prototype for 'bpf_iter_css_post_new' [-Wmissing-prototypes]
912 | __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
| ^~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:924:41: warning: no previous prototype for 'bpf_iter_css_post_next' [-Wmissing-prototypes]
924 | __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
| ^~~~~~~~~~~~~~~~~~~~~~
>> kernel/bpf/task_iter.c:932:18: warning: no previous prototype for 'bpf_iter_css_post_destroy' [-Wmissing-prototypes]
932 | __bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
| ^~~~~~~~~~~~~~~~~~~~~~~~~


vim +/bpf_iter_css_pre_new +888 kernel/bpf/task_iter.c

887
> 888 __bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
889 struct cgroup_subsys_state *root)
890 {
891 struct bpf_iter_css_kern *kit = (void *)it;
892
893 BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
894 BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_pre));
895 kit->root = root;
896 kit->pos = NULL;
897 return 0;
898 }
899
> 900 __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
901 {
902 struct bpf_iter_css_kern *kit = (void *)it;
903
904 kit->pos = css_next_descendant_pre(kit->pos, kit->root);
905 return kit->pos;
906 }
907
> 908 __bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
909 {
910 }
911
> 912 __bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
913 struct cgroup_subsys_state *root)
914 {
915 struct bpf_iter_css_kern *kit = (void *)it;
916
917 BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
918 BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_post));
919 kit->root = root;
920 kit->pos = NULL;
921 return 0;
922 }
923
> 924 __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
925 {
926 struct bpf_iter_css_kern *kit = (void *)it;
927
928 kit->pos = css_next_descendant_post(kit->pos, kit->root);
929 return kit->pos;
930 }
931
> 932 __bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
933 {
934 }
935

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-09-15 00:31:50

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs

On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <[email protected]> wrote:
>
> This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which
> allow creation and manipulation of struct bpf_iter_css in open-coded
> iterator style. These kfuncs actually wrapps css_next_descendant_{pre,
> post}. BPF programs can use these kfuncs through bpf_for_each macro for
> iteration of all descendant css under a root css.
>
> Signed-off-by: Chuyi Zhou <[email protected]>
> ---
> include/uapi/linux/bpf.h | 8 +++++
> kernel/bpf/helpers.c | 6 ++++
> kernel/bpf/task_iter.c | 53 ++++++++++++++++++++++++++++++++++
> tools/include/uapi/linux/bpf.h | 8 +++++
> tools/lib/bpf/bpf_helpers.h | 12 ++++++++
> 5 files changed, 87 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index befa55b52e29..57760afc13d0 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7326,4 +7326,12 @@ struct bpf_iter_process {
> __u64 __opaque[1];
> } __attribute__((aligned(8)));
>
> +struct bpf_iter_css_pre {
> + __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
> +struct bpf_iter_css_post {
> + __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
> #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 9b7d2c6f99d1..ca1f6404af9e 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
> BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 9d1927dc3a06..8963fc779b87 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
> {
> }
>
> +struct bpf_iter_css_kern {
> + struct cgroup_subsys_state *root;
> + struct cgroup_subsys_state *pos;
> +} __attribute__((aligned(8)));
> +
> +__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
> + struct cgroup_subsys_state *root)

similar to my comment on previous patches, please see
kernel/bpf/cgroup_iter.c for iter/cgroup iterator program. Let's stay
consistent. We have one iterator that accepts parameters defining
iteration order and starting cgroup. Unless there are some technical
reasons we can't follow similar approach with this open-coded iter,
let's use the same approach. We can even reuse
BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
BPF_CGROUP_ITER_ANCESTORS_UP enums.


> +{
> + struct bpf_iter_css_kern *kit = (void *)it;
> +
> + BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
> + BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_pre));
> + kit->root = root;
> + kit->pos = NULL;
> + return 0;
> +}
> +
> +__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
> +{
> + struct bpf_iter_css_kern *kit = (void *)it;
> +
> + kit->pos = css_next_descendant_pre(kit->pos, kit->root);
> + return kit->pos;
> +}
> +
> +__bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
> +{
> +}
> +
> +__bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
> + struct cgroup_subsys_state *root)
> +{
> + struct bpf_iter_css_kern *kit = (void *)it;
> +
> + BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
> + BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_post));
> + kit->root = root;
> + kit->pos = NULL;
> + return 0;
> +}
> +
> +__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
> +{
> + struct bpf_iter_css_kern *kit = (void *)it;
> +
> + kit->pos = css_next_descendant_post(kit->pos, kit->root);
> + return kit->pos;
> +}
> +
> +__bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
> +{
> +}
> +
> DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>
> static void do_mmap_read_unlock(struct irq_work *entry)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index befa55b52e29..57760afc13d0 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7326,4 +7326,12 @@ struct bpf_iter_process {
> __u64 __opaque[1];
> } __attribute__((aligned(8)));
>
> +struct bpf_iter_css_pre {
> + __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
> +struct bpf_iter_css_post {
> + __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
> #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index 858252c2641c..6e5bd9ef14d6 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -315,6 +315,18 @@ extern int bpf_iter_process_new(struct bpf_iter_process *it) __weak __ksym;
> extern struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it) __weak __ksym;
> extern void bpf_iter_process_destroy(struct bpf_iter_process *it) __weak __ksym;
>
> +struct bpf_iter_css_pre;
> +extern int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
> + struct cgroup_subsys_state *root) __weak __ksym;
> +extern struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it) __weak __ksym;
> +extern void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it) __weak __ksym;
> +
> +struct bpf_iter_css_post;
> +extern int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
> + struct cgroup_subsys_state *root) __weak __ksym;
> +extern struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it) __weak __ksym;
> +extern void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it) __weak __ksym;
> +
> #ifndef bpf_for_each
> /* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
> * using BPF open-coded iterators without having to write mundane explicit
> --
> 2.20.1
>

2023-09-15 03:00:34

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS

On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <[email protected]> wrote:
>
> css_iter and process_iter should be used in rcu section. Specifically, in
> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
> use them directly.
>
> This patch checks whether we are in rcu cs before we want to invoke
> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
> reject if reg->type is UNTRUSTED.

it would be nice to mention where this MEM_RCU is turned into
UNTRUSTED when we do rcu_read_unlock(). For someone unfamiliar with
these parts of verifier (like me) this is completely unobvious.

>
> Signed-off-by: Chuyi Zhou <[email protected]>
> ---
> kernel/bpf/verifier.c | 30 ++++++++++++++++++++++++++++--
> 1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 2367483bf4c2..6a6827ba7a18 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1172,7 +1172,13 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg
>
> static void __mark_reg_known_zero(struct bpf_reg_state *reg);
>
> +static bool in_rcu_cs(struct bpf_verifier_env *env);
> +
> +/* check whether we are using bpf_iter_process_*() or bpf_iter_css_*() */
> +static bool is_iter_need_rcu(struct bpf_kfunc_call_arg_meta *meta);
> +
> static int mark_stack_slots_iter(struct bpf_verifier_env *env,
> + struct bpf_kfunc_call_arg_meta *meta,
> struct bpf_reg_state *reg, int insn_idx,
> struct btf *btf, u32 btf_id, int nr_slots)
> {
> @@ -1193,6 +1199,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
>
> __mark_reg_known_zero(st);
> st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
> + if (is_iter_need_rcu(meta)) {
> + if (in_rcu_cs(env))
> + st->type |= MEM_RCU;
> + else
> + st->type |= PTR_UNTRUSTED;
> + }
> st->live |= REG_LIVE_WRITTEN;
> st->ref_obj_id = i == 0 ? id : 0;
> st->iter.btf = btf;
> @@ -1281,6 +1293,8 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_
> struct bpf_stack_state *slot = &state->stack[spi - i];
> struct bpf_reg_state *st = &slot->spilled_ptr;
>
> + if (st->type & PTR_UNTRUSTED)
> + return false;
> /* only main (first) slot has ref_obj_id set */
> if (i == 0 && !st->ref_obj_id)
> return false;
> @@ -7503,13 +7517,13 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
> return err;
> }
>
> - err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots);
> + err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots);
> if (err)
> return err;
> } else {
> /* iter_next() or iter_destroy() expect initialized iter state*/
> if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) {
> - verbose(env, "expected an initialized iter_%s as arg #%d\n",
> + verbose(env, "expected an initialized iter_%s as arg #%d or without bpf_rcu_read_lock()\n",
> iter_type_str(meta->btf, btf_id), regno);

this message makes no sense, but even if reworded it would be
confusing for users. So maybe do the RCU check separately and report a
clear message that this iterator is expected to be within a single
continuous rcu_read_{lock+unlock} region.

I do think tracking RCU regions explicitly would make for much easier
to follow code, better messages, etc. Probably would be beneficial for
some other RCU-protected features. But that's a separate topic.

> return -EINVAL;
> }
> @@ -10382,6 +10396,18 @@ BTF_ID(func, bpf_percpu_obj_new_impl)
> BTF_ID(func, bpf_percpu_obj_drop_impl)
> BTF_ID(func, bpf_iter_css_task_new)
>
> +BTF_SET_START(rcu_protect_kfuns_set)
> +BTF_ID(func, bpf_iter_process_new)
> +BTF_ID(func, bpf_iter_css_pre_new)
> +BTF_ID(func, bpf_iter_css_post_new)
> +BTF_SET_END(rcu_protect_kfuns_set)
> +

instead of maintaining these extra special sets, why not add a KF
flag, like KF_RCU_PROTECTED?


> +static inline bool is_iter_need_rcu(struct bpf_kfunc_call_arg_meta *meta)
> +{
> + return btf_id_set_contains(&rcu_protect_kfuns_set, meta->func_id);
> +}
> +
> +
> static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
> {
> if (meta->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
> --
> 2.20.1
>

2023-09-15 06:40:43

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs

On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <[email protected]> wrote:
>
> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_process in open-coded iterator
> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> iterate all processes in the system.
>
> Signed-off-by: Chuyi Zhou <[email protected]>
> ---
> include/uapi/linux/bpf.h | 4 ++++
> kernel/bpf/helpers.c | 3 +++
> kernel/bpf/task_iter.c | 29 +++++++++++++++++++++++++++++
> tools/include/uapi/linux/bpf.h | 4 ++++
> tools/lib/bpf/bpf_helpers.h | 5 +++++
> 5 files changed, 45 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index de02c0971428..befa55b52e29 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
> __u64 __opaque[1];
> } __attribute__((aligned(8)));
>
> +struct bpf_iter_process {
> + __u64 __opaque[1];
> +} __attribute__((aligned(8)));
> +
> #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index d6a16becfbb9..9b7d2c6f99d1 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
> BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index d8539cc05ffd..9d1927dc3a06 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -851,6 +851,35 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> kfree(kit->css_it);
> }
>
> +struct bpf_iter_process_kern {
> + struct task_struct *tsk;
> +} __attribute__((aligned(8)));
> +

Few high level thoughts. I think it would be good to follow
SEC("iter/task") naming and approach. Open-coded iterators in many
ways are in-kernel counterpart to iterator programs, so keeping them
close enough within reason is useful for knowledge transfer.

SEC("iter/task") allows to:
a) iterate all threads in the system
b) iterate all threads for a given TGID
c) it also allows to "iterate" a single thread or process, but that's
a bit less relevant for in-kernel iterator, but we can still support
them, why not?

I'm not sure if it supports iterating all processes (as in group
leaders of each task group) in the system, but if it's possible I
think we should support it at least for open-coded iterator, seems
like a very useful functionality.

So to that end, let's design a small set of input arguments for
bpf_iter_process_new() that would allow to specify this as flags +
either (optional) struct task_struct * pointer to represent
task/process or PID/TGID.


> +__bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)

Also, given iterator in the previous is called css_task, and that we
have iter/task, iter/task_vma, etc iterator programs, shouldn't this
one be called bpf_iter_task_new(), which also will be consistent with
Dave's task_vma open-coded iterator?

> +{
> + struct bpf_iter_process_kern *kit = (void *)it;
> +
> + BUILD_BUG_ON(sizeof(struct bpf_iter_process_kern) != sizeof(struct bpf_iter_process));
> + BUILD_BUG_ON(__alignof__(struct bpf_iter_process_kern) !=
> + __alignof__(struct bpf_iter_process));
> +
> + kit->tsk = &init_task;
> + return 0;
> +}
> +
> +__bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
> +{
> + struct bpf_iter_process_kern *kit = (void *)it;
> +
> + kit->tsk = next_task(kit->tsk);
> +
> + return kit->tsk == &init_task ? NULL : kit->tsk;
> +}
> +
> +__bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
> +{
> +}
> +
> DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>
> static void do_mmap_read_unlock(struct irq_work *entry)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index de02c0971428..befa55b52e29 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
> __u64 __opaque[1];
> } __attribute__((aligned(8)));
>
> +struct bpf_iter_process {
> + __u64 __opaque[1];
> +} __attribute__((aligned(8)));
> +
> #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index f48723c6c593..858252c2641c 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -310,6 +310,11 @@ extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
> extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
> extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
>
> +struct bpf_iter_process;
> +extern int bpf_iter_process_new(struct bpf_iter_process *it) __weak __ksym;
> +extern struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it) __weak __ksym;
> +extern void bpf_iter_process_destroy(struct bpf_iter_process *it) __weak __ksym;
> +

same, please add this to bpf_experimental, not bpf_helpers.h


> #ifndef bpf_for_each
> /* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
> * using BPF open-coded iterators without having to write mundane explicit
> --
> 2.20.1
>

2023-09-15 07:13:59

by Chuyi Zhou

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs

Hello.

在 2023/9/15 07:26, Andrii Nakryiko 写道:
> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <[email protected]> wrote:
>>
>> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
>> creation and manipulation of struct bpf_iter_process in open-coded iterator
>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
>> iterate all processes in the system.
>>
>> Signed-off-by: Chuyi Zhou <[email protected]>
>> ---
>> include/uapi/linux/bpf.h | 4 ++++
>> kernel/bpf/helpers.c | 3 +++
>> kernel/bpf/task_iter.c | 29 +++++++++++++++++++++++++++++
>> tools/include/uapi/linux/bpf.h | 4 ++++
>> tools/lib/bpf/bpf_helpers.h | 5 +++++
>> 5 files changed, 45 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index de02c0971428..befa55b52e29 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
>> __u64 __opaque[1];
>> } __attribute__((aligned(8)));
>>
>> +struct bpf_iter_process {
>> + __u64 __opaque[1];
>> +} __attribute__((aligned(8)));
>> +
>> #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index d6a16becfbb9..9b7d2c6f99d1 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
>> BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
>> BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
>> BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
>> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
>> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index d8539cc05ffd..9d1927dc3a06 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -851,6 +851,35 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
>> kfree(kit->css_it);
>> }
>>
>> +struct bpf_iter_process_kern {
>> + struct task_struct *tsk;
>> +} __attribute__((aligned(8)));
>> +
>
> Few high level thoughts. I think it would be good to follow
> SEC("iter/task") naming and approach. Open-coded iterators in many
> ways are in-kernel counterpart to iterator programs, so keeping them
> close enough within reason is useful for knowledge transfer.
>
> SEC("iter/task") allows to:
> a) iterate all threads in the system
> b) iterate all threads for a given TGID
> c) it also allows to "iterate" a single thread or process, but that's
> a bit less relevant for in-kernel iterator, but we can still support
> them, why not?
>
> I'm not sure if it supports iterating all processes (as in group
> leaders of each task group) in the system, but if it's possible I
> think we should support it at least for open-coded iterator, seems
> like a very useful functionality.
>
> So to that end, let's design a small set of input arguments for
> bpf_iter_process_new() that would allow to specify this as flags +
> either (optional) struct task_struct * pointer to represent
> task/process or PID/TGID.
>

IIUC, we should define the following task_new kfunc

struct bpf_iter_task_kern {
struct task_struct *start;
struct task_struct *cur;
unsigned int flag;
} __attribute__((aligned(8)));

__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_css_task *it,
struct task_struct *start, unsigned int flags)

If we want to iterate all threads of a task, just pass it to *start*,
and if we want to iterating all process in the system, users may need to
pass a nullptr to the *start*. But it seems current BPF verifier will
reject the nullptr to task_struct. The error message meybe:
"Possibly NULL pointer passed to trusted argx"

I noticed the __opt annotation in kfunc document. It seems with
following we can pass the nullptr to *start*

__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_css_task *it,
void *start__opt, unsigned int flags__szk)

However, in this way, user can pass any invalid pointer to the kfunc
without verifying. Besides, it seems __opt is only allowed to use with
__szk together and flags__szk is ambiguous in semantics.

Do you have better ideas? Or I missing something ?

Thanks

2023-09-15 08:29:20

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS

On Thu, Sep 14, 2023 at 1:56 AM Chuyi Zhou <[email protected]> wrote:
>
>
>
> 在 2023/9/13 21:53, Chuyi Zhou 写道:
> > Hello.
> >
> > 在 2023/9/12 15:01, Chuyi Zhou 写道:
> >> css_iter and process_iter should be used in rcu section. Specifically, in
> >> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
> >> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
> >> use them directly.
> >>
> >> This patch checks whether we are in rcu cs before we want to invoke
> >> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
> >> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
> >> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
> >> reject if reg->type is UNTRUSTED.
> >
> > I use the following BPF Prog to test this patch:
> >
> > SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
> > int iter_task_for_each_sleep(void *ctx)
> > {
> > struct task_struct *task;
> > struct task_struct *cur_task = bpf_get_current_task_btf();
> >
> > if (cur_task->pid != target_pid)
> > return 0;
> > bpf_rcu_read_lock();
> > bpf_for_each(process, task) {
> > bpf_rcu_read_unlock();
> > if (task->pid == target_pid)
> > process_cnt += 1;
> > bpf_rcu_read_lock();
> > }
> > bpf_rcu_read_unlock();
> > return 0;
> > }
> >
> > Unfortunately, we can pass the verifier.
> >
> > Then I add some printk-messages before setting/clearing state to help
> > debug:
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index d151e6b43a5f..35f3fa9471a9 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct
> > bpf_verifier_env *env,
> > __mark_reg_known_zero(st);
> > st->type = PTR_TO_STACK; /* we don't have dedicated reg
> > type */
> > if (is_iter_need_rcu(meta)) {
> > + printk("mark reg_addr : %px", st);
> > if (in_rcu_cs(env))
> > st->type |= MEM_RCU;
> > else
> > @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct
> > bpf_verifier_env *env, struct bpf_insn *insn,
> > return -EINVAL;
> > } else if (rcu_unlock) {
> > bpf_for_each_reg_in_vstate(env->cur_state,
> > state, reg, ({
> > + printk("clear reg_addr : %px MEM_RCU :
> > %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type &
> > PTR_UNTRUSTED);
> > if (reg->type & MEM_RCU) {
> > - printk("clear reg addr : %lld",
> > reg);
> > reg->type &= ~(MEM_RCU |
> > PTR_MAYBE_NULL);
> > reg->type |= PTR_UNTRUSTED;
> > }
> >
> >
> > The demsg log:
> >
> > [ 393.705324] mark reg_addr : ffff88814e40e200
> >
> > [ 393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0
> > PTR_UNTRUSTED : 0
> >
> > [ 393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0
> > PTR_UNTRUSTED : 0
> >
> > [ 393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0
> > PTR_UNTRUSTED : 0
> > ....
> > ....
> >
> > I didn't see ffff88814e40e200 is cleared as expected because
> > bpf_for_each_reg_in_vstate didn't find it.
> >
> > It seems when we are doing bpf_read_unlock() in the middle of iteration
> > and want to clearing state through bpf_for_each_reg_in_vstate, we can
> > not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in
> > mark_stack_slots_iter().
> >
>
> bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in
> mark_stack_slots_iter() we has marked the slots *STACK_ITER*
>
> With the following change, everything seems work OK.
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index a3236651ec64..83c5ecccadb4 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -387,7 +387,7 @@ struct bpf_verifier_state {
>
> #define bpf_get_spilled_reg(slot, frame) \
> (((slot < frame->allocated_stack / BPF_REG_SIZE) && \
> - (frame->stack[slot].slot_type[0] == STACK_SPILL)) \
> + (frame->stack[slot].slot_type[0] == STACK_SPILL ||
> frame->stack[slot].slot_type[0] == STACK_ITER)) \
> ? &frame->stack[slot].spilled_ptr : NULL)
>
> I am not sure whether this would harm some logic implicitly when using
> bpf_get_spilled_reg/bpf_for_each_spilled_reg in other place. If so,
> maybe we should add a extra parameter to control the picking behaviour.
>
> #define bpf_get_spilled_reg(slot, frame, stack_type)
> \
> (((slot < frame->allocated_stack / BPF_REG_SIZE) && \
> (frame->stack[slot].slot_type[0] == stack_type)) \
> ? &frame->stack[slot].spilled_ptr : NULL)
>
> Thanks.

I don't think it's safe to just make bpf_get_spilled_reg, and
subsequently bpf_for_each_reg_in_vstate and bpf_for_each_spilled_reg
just suddenly start iterating iterator states and/or dynptrs. At least
some of existing uses of those assume they are really working just
with registers.

2023-09-15 10:40:42

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 2/6] bpf: Introduce css_task open-coded iterator kfuncs

On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <[email protected]> wrote:
>
> This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow
> creation and manipulation of struct bpf_iter_css_task in open-coded
> iterator style. These kfuncs actually wrapps css_task_iter_{start,next,
> end}. BPF programs can use these kfuncs through bpf_for_each macro for
> iteration of all tasks under a css.
>
> css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so
> the bpf side has to be careful in where it allows to use this iter.
> Currently we only allow it in bpf_lsm and bpf iter-s.
>
> Signed-off-by: Chuyi Zhou <[email protected]>
> ---
> include/uapi/linux/bpf.h | 4 +++
> kernel/bpf/helpers.c | 3 +++
> kernel/bpf/task_iter.c | 48 ++++++++++++++++++++++++++++++++++
> kernel/bpf/verifier.c | 23 ++++++++++++++++
> tools/include/uapi/linux/bpf.h | 4 +++
> tools/lib/bpf/bpf_helpers.h | 7 +++++
> 6 files changed, 89 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 73b155e52204..de02c0971428 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7318,4 +7318,8 @@ struct bpf_iter_num {
> __u64 __opaque[1];
> } __attribute__((aligned(8)));
>
> +struct bpf_iter_css_task {
> + __u64 __opaque[1];
> +} __attribute__((aligned(8)));
> +
> #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index b0a9834f1051..d6a16becfbb9 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2504,6 +2504,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
> BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 7473068ed313..d8539cc05ffd 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -803,6 +803,54 @@ const struct bpf_func_proto bpf_find_vma_proto = {
> .arg5_type = ARG_ANYTHING,
> };
>
> +struct bpf_iter_css_task_kern {
> + struct css_task_iter *css_it;
> +} __attribute__((aligned(8)));
> +
> +__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
> + struct cgroup_subsys_state *css, unsigned int flags)
> +{
> + struct bpf_iter_css_task_kern *kit = (void *)it;
> +
> + BUILD_BUG_ON(sizeof(struct bpf_iter_css_task_kern) != sizeof(struct bpf_iter_css_task));
> + BUILD_BUG_ON(__alignof__(struct bpf_iter_css_task_kern) !=
> + __alignof__(struct bpf_iter_css_task));
> +
> + switch (flags) {
> + case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
> + case CSS_TASK_ITER_PROCS:
> + case 0:
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + kit->css_it = kzalloc(sizeof(struct css_task_iter), GFP_KERNEL);

Dave used bpf_mem_alloc() inside his iterator, any reason to not use it here?


> + if (!kit->css_it)
> + return -ENOMEM;
> + css_task_iter_start(css, flags, kit->css_it);
> + return 0;
> +}
> +
> +__bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
> +{
> + struct bpf_iter_css_task_kern *kit = (void *)it;
> +
> + if (!kit->css_it)
> + return NULL;
> + return css_task_iter_next(kit->css_it);
> +}
> +
> +__bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> +{
> + struct bpf_iter_css_task_kern *kit = (void *)it;
> +
> + if (!kit->css_it)
> + return;
> + css_task_iter_end(kit->css_it);
> + kfree(kit->css_it);
> +}
> +
> DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>
> static void do_mmap_read_unlock(struct irq_work *entry)
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index dbba2b806017..2367483bf4c2 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -10332,6 +10332,7 @@ enum special_kfunc_type {
> KF_bpf_dynptr_clone,
> KF_bpf_percpu_obj_new_impl,
> KF_bpf_percpu_obj_drop_impl,
> + KF_bpf_iter_css_task_new,
> };
>
> BTF_SET_START(special_kfunc_set)
> @@ -10354,6 +10355,7 @@ BTF_ID(func, bpf_dynptr_slice_rdwr)
> BTF_ID(func, bpf_dynptr_clone)
> BTF_ID(func, bpf_percpu_obj_new_impl)
> BTF_ID(func, bpf_percpu_obj_drop_impl)
> +BTF_ID(func, bpf_iter_css_task_new)
> BTF_SET_END(special_kfunc_set)
>
> BTF_ID_LIST(special_kfunc_list)
> @@ -10378,6 +10380,7 @@ BTF_ID(func, bpf_dynptr_slice_rdwr)
> BTF_ID(func, bpf_dynptr_clone)
> BTF_ID(func, bpf_percpu_obj_new_impl)
> BTF_ID(func, bpf_percpu_obj_drop_impl)
> +BTF_ID(func, bpf_iter_css_task_new)
>
> static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
> {
> @@ -10902,6 +10905,20 @@ static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
> &meta->arg_rbtree_root.field);
> }
>
> +static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
> +{
> + enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
> +
> + switch (prog_type) {
> + case BPF_PROG_TYPE_LSM:
> + return true;
> + case BPF_TRACE_ITER:
> + return env->prog->aux->sleepable;
> + default:
> + return false;
> + }
> +}
> +
> static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta,
> int insn_idx)
> {
> @@ -11152,6 +11169,12 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> break;
> }
> case KF_ARG_PTR_TO_ITER:
> + if (meta->func_id == special_kfunc_list[KF_bpf_iter_css_task_new]) {
> + if (!check_css_task_iter_allowlist(env)) {
> + verbose(env, "css_task_iter is only allowed in bpf_lsm and bpf iter-s\n");
> + return -EINVAL;
> + }
> + }
> ret = process_iter_arg(env, regno, insn_idx, meta);
> if (ret < 0)
> return ret;
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 73b155e52204..de02c0971428 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7318,4 +7318,8 @@ struct bpf_iter_num {
> __u64 __opaque[1];
> } __attribute__((aligned(8)));
>
> +struct bpf_iter_css_task {
> + __u64 __opaque[1];
> +} __attribute__((aligned(8)));
> +
> #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index 77ceea575dc7..f48723c6c593 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -303,6 +303,13 @@ extern int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end) __weak
> extern int *bpf_iter_num_next(struct bpf_iter_num *it) __weak __ksym;
> extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym;
>
> +struct bpf_iter_css_task;
> +struct cgroup_subsys_state;
> +extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
> + struct cgroup_subsys_state *css, unsigned int flags) __weak __ksym;
> +extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
> +extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;

please move this into bpf_experimental.h under selftests, this
shouldn't be in libbpf's stable API headers


> +
> #ifndef bpf_for_each
> /* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
> * using BPF open-coded iterators without having to write mundane explicit
> --
> 2.20.1
>

2023-09-15 14:01:27

by Chuyi Zhou

[permalink] [raw]
Subject: Re: [External] Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS

Hello.

在 2023/9/15 07:26, Andrii Nakryiko 写道:
> On Thu, Sep 14, 2023 at 1:56 AM Chuyi Zhou <[email protected]> wrote:
>>
>>
>>
>> 在 2023/9/13 21:53, Chuyi Zhou 写道:
>>> Hello.
>>>
>>> 在 2023/9/12 15:01, Chuyi Zhou 写道:
>>>> css_iter and process_iter should be used in rcu section. Specifically, in
>>>> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
>>>> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
>>>> use them directly.
>>>>
>>>> This patch checks whether we are in rcu cs before we want to invoke
>>>> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
>>>> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
>>>> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
>>>> reject if reg->type is UNTRUSTED.
>>>
>>> I use the following BPF Prog to test this patch:
>>>
>>> SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
>>> int iter_task_for_each_sleep(void *ctx)
>>> {
>>> struct task_struct *task;
>>> struct task_struct *cur_task = bpf_get_current_task_btf();
>>>
>>> if (cur_task->pid != target_pid)
>>> return 0;
>>> bpf_rcu_read_lock();
>>> bpf_for_each(process, task) {
>>> bpf_rcu_read_unlock();
>>> if (task->pid == target_pid)
>>> process_cnt += 1;
>>> bpf_rcu_read_lock();
>>> }
>>> bpf_rcu_read_unlock();
>>> return 0;
>>> }
>>>
>>> Unfortunately, we can pass the verifier.
>>>
>>> Then I add some printk-messages before setting/clearing state to help
>>> debug:
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index d151e6b43a5f..35f3fa9471a9 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct
>>> bpf_verifier_env *env,
>>> __mark_reg_known_zero(st);
>>> st->type = PTR_TO_STACK; /* we don't have dedicated reg
>>> type */
>>> if (is_iter_need_rcu(meta)) {
>>> + printk("mark reg_addr : %px", st);
>>> if (in_rcu_cs(env))
>>> st->type |= MEM_RCU;
>>> else
>>> @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct
>>> bpf_verifier_env *env, struct bpf_insn *insn,
>>> return -EINVAL;
>>> } else if (rcu_unlock) {
>>> bpf_for_each_reg_in_vstate(env->cur_state,
>>> state, reg, ({
>>> + printk("clear reg_addr : %px MEM_RCU :
>>> %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type &
>>> PTR_UNTRUSTED);
>>> if (reg->type & MEM_RCU) {
>>> - printk("clear reg addr : %lld",
>>> reg);
>>> reg->type &= ~(MEM_RCU |
>>> PTR_MAYBE_NULL);
>>> reg->type |= PTR_UNTRUSTED;
>>> }
>>>
>>>
>>> The demsg log:
>>>
>>> [ 393.705324] mark reg_addr : ffff88814e40e200
>>>
>>> [ 393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0
>>> PTR_UNTRUSTED : 0
>>>
>>> [ 393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0
>>> PTR_UNTRUSTED : 0
>>>
>>> [ 393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0
>>> PTR_UNTRUSTED : 0
>>> ....
>>> ....
>>>
>>> I didn't see ffff88814e40e200 is cleared as expected because
>>> bpf_for_each_reg_in_vstate didn't find it.
>>>
>>> It seems when we are doing bpf_read_unlock() in the middle of iteration
>>> and want to clearing state through bpf_for_each_reg_in_vstate, we can
>>> not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in
>>> mark_stack_slots_iter().
>>>
>>
>> bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in
>> mark_stack_slots_iter() we has marked the slots *STACK_ITER*
>>
>> With the following change, everything seems work OK.
>>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index a3236651ec64..83c5ecccadb4 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -387,7 +387,7 @@ struct bpf_verifier_state {
>>
>> #define bpf_get_spilled_reg(slot, frame) \
>> (((slot < frame->allocated_stack / BPF_REG_SIZE) && \
>> - (frame->stack[slot].slot_type[0] == STACK_SPILL)) \
>> + (frame->stack[slot].slot_type[0] == STACK_SPILL ||
>> frame->stack[slot].slot_type[0] == STACK_ITER)) \
>> ? &frame->stack[slot].spilled_ptr : NULL)
>>
>> I am not sure whether this would harm some logic implicitly when using
>> bpf_get_spilled_reg/bpf_for_each_spilled_reg in other place. If so,
>> maybe we should add a extra parameter to control the picking behaviour.
>>
>> #define bpf_get_spilled_reg(slot, frame, stack_type)
>> \
>> (((slot < frame->allocated_stack / BPF_REG_SIZE) && \
>> (frame->stack[slot].slot_type[0] == stack_type)) \
>> ? &frame->stack[slot].spilled_ptr : NULL)
>>
>> Thanks.
>
> I don't think it's safe to just make bpf_get_spilled_reg, and
> subsequently bpf_for_each_reg_in_vstate and bpf_for_each_spilled_reg
> just suddenly start iterating iterator states and/or dynptrs. At least
> some of existing uses of those assume they are really working just
> with registers.

IIUC, when we are doing bpf_rcu_unlock, we do need to clear the state of
reg including STACK_ITER.

Maybe here we only need change the logic when using
bpf_for_each_reg_in_vstate to clear state in bpf_rcu_unlock and keep
everything else unchanged ?

Thanks.

2023-09-15 16:15:12

by Chuyi Zhou

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs



在 2023/9/15 07:26, Andrii Nakryiko 写道:
> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <[email protected]> wrote:
>>
>> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
>> creation and manipulation of struct bpf_iter_process in open-coded iterator
>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
>> iterate all processes in the system.
>>
>> Signed-off-by: Chuyi Zhou <[email protected]>
>> ---
>> include/uapi/linux/bpf.h | 4 ++++
>> kernel/bpf/helpers.c | 3 +++
>> kernel/bpf/task_iter.c | 29 +++++++++++++++++++++++++++++
>> tools/include/uapi/linux/bpf.h | 4 ++++
>> tools/lib/bpf/bpf_helpers.h | 5 +++++
>> 5 files changed, 45 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index de02c0971428..befa55b52e29 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
>> __u64 __opaque[1];
>> } __attribute__((aligned(8)));
>>
>> +struct bpf_iter_process {
>> + __u64 __opaque[1];
>> +} __attribute__((aligned(8)));
>> +
>> #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index d6a16becfbb9..9b7d2c6f99d1 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
>> BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
>> BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
>> BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
>> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
>> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index d8539cc05ffd..9d1927dc3a06 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -851,6 +851,35 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
>> kfree(kit->css_it);
>> }
>>
>> +struct bpf_iter_process_kern {
>> + struct task_struct *tsk;
>> +} __attribute__((aligned(8)));
>> +
>
> Few high level thoughts. I think it would be good to follow
> SEC("iter/task") naming and approach. Open-coded iterators in many
> ways are in-kernel counterpart to iterator programs, so keeping them
> close enough within reason is useful for knowledge transfer.
>
> SEC("iter/task") allows to:
> a) iterate all threads in the system
> b) iterate all threads for a given TGID
> c) it also allows to "iterate" a single thread or process, but that's
> a bit less relevant for in-kernel iterator, but we can still support
> them, why not?
>
> I'm not sure if it supports iterating all processes (as in group
> leaders of each task group) in the system, but if it's possible I
> think we should support it at least for open-coded iterator, seems
> like a very useful functionality.
>
> So to that end, let's design a small set of input arguments for
> bpf_iter_process_new() that would allow to specify this as flags +
> either (optional) struct task_struct * pointer to represent
> task/process or PID/TGID.
>

Another concern from Alexei was the readability of the API of open-coded
in BPF Program[1].

bpf_for_each(task, curr) is straightforward. Users can easily understand
that this API does the same thing as 'for_each_process' in kernel.

However, if we keep the approach of SEC("iter/task")

enum ITER_ITEM {
ITER_TASK,
ITER_THREAD,
}

__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct
task_struct *group_task, enum ITER_ITEM type)

the API have to chang:


bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in
the system
bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all
thread of group_leader
bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of
all the process in the system

Useres may guess what are this API actually doing....

So, I'm thinking if we can add a layer of abstraction to hide the
details from the users:

#define bpf_for_each_process(task) \
bpf_for_each(task, curr, NULL, ITERATE_TASK)


It would be nice if you could give me some better suggestions.

Thanks!

[1]
https://lore.kernel.org/lkml/CAADnVQLbDWUxFen-RS67C86sOE5DykEPD8xyihJ2RnG1WEnTQg@mail.gmail.com/

2023-09-15 19:27:58

by Chuyi Zhou

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs

Hello.

在 2023/9/15 07:26, Andrii Nakryiko 写道:
> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <[email protected]> wrote:
>>
>> This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which
>> allow creation and manipulation of struct bpf_iter_css in open-coded
>> iterator style. These kfuncs actually wrapps css_next_descendant_{pre,
>> post}. BPF programs can use these kfuncs through bpf_for_each macro for
>> iteration of all descendant css under a root css.
>>
>> Signed-off-by: Chuyi Zhou <[email protected]>
>> ---
>> include/uapi/linux/bpf.h | 8 +++++
>> kernel/bpf/helpers.c | 6 ++++
>> kernel/bpf/task_iter.c | 53 ++++++++++++++++++++++++++++++++++
>> tools/include/uapi/linux/bpf.h | 8 +++++
>> tools/lib/bpf/bpf_helpers.h | 12 ++++++++
>> 5 files changed, 87 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index befa55b52e29..57760afc13d0 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -7326,4 +7326,12 @@ struct bpf_iter_process {
>> __u64 __opaque[1];
>> } __attribute__((aligned(8)));
>>
>> +struct bpf_iter_css_pre {
>> + __u64 __opaque[2];
>> +} __attribute__((aligned(8)));
>> +
>> +struct bpf_iter_css_post {
>> + __u64 __opaque[2];
>> +} __attribute__((aligned(8)));
>> +
>> #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 9b7d2c6f99d1..ca1f6404af9e 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>> BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
>> BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
>> BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW)
>> +BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW)
>> +BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY)
>> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index 9d1927dc3a06..8963fc779b87 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
>> {
>> }
>>
>> +struct bpf_iter_css_kern {
>> + struct cgroup_subsys_state *root;
>> + struct cgroup_subsys_state *pos;
>> +} __attribute__((aligned(8)));
>> +
>> +__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
>> + struct cgroup_subsys_state *root)
>
> similar to my comment on previous patches, please see
> kernel/bpf/cgroup_iter.c for iter/cgroup iterator program. Let's stay
> consistent. We have one iterator that accepts parameters defining
> iteration order and starting cgroup. Unless there are some technical
> reasons we can't follow similar approach with this open-coded iter,
> let's use the same approach. We can even reuse
> BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
> BPF_CGROUP_ITER_ANCESTORS_UP enums.
>

I know your concern. It would be nice if we keep consistent with
kernel/bpf/cgroup_iter.c

But this patch actually want to support iterating css
(cgroup_subsys_state) not cgroup (css is more low lever).
With css_iter we can do something like
"for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre"
in BPF Progs which is hard for cgroup_iter. In the future we can use
this iterator to plug some customizable policy in other resource control
system.

BTW, what I did in RFC actually very similar with the approach of
cgroup_iter.
(https://lore.kernel.org/all/[email protected]/).

Thanks.

2023-09-15 22:43:02

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [External] Re: [PATCH bpf-next v2 5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS

On Thu, Sep 14, 2023 at 10:46 PM Chuyi Zhou <[email protected]> wrote:
>
> Hello.
>
> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
> > On Thu, Sep 14, 2023 at 1:56 AM Chuyi Zhou <[email protected]> wrote:
> >>
> >>
> >>
> >> 在 2023/9/13 21:53, Chuyi Zhou 写道:
> >>> Hello.
> >>>
> >>> 在 2023/9/12 15:01, Chuyi Zhou 写道:
> >>>> css_iter and process_iter should be used in rcu section. Specifically, in
> >>>> sleepable progs explicit bpf_rcu_read_lock() is needed before use these
> >>>> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
> >>>> use them directly.
> >>>>
> >>>> This patch checks whether we are in rcu cs before we want to invoke
> >>>> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in
> >>>> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would
> >>>> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will
> >>>> reject if reg->type is UNTRUSTED.
> >>>
> >>> I use the following BPF Prog to test this patch:
> >>>
> >>> SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
> >>> int iter_task_for_each_sleep(void *ctx)
> >>> {
> >>> struct task_struct *task;
> >>> struct task_struct *cur_task = bpf_get_current_task_btf();
> >>>
> >>> if (cur_task->pid != target_pid)
> >>> return 0;
> >>> bpf_rcu_read_lock();
> >>> bpf_for_each(process, task) {
> >>> bpf_rcu_read_unlock();
> >>> if (task->pid == target_pid)
> >>> process_cnt += 1;
> >>> bpf_rcu_read_lock();
> >>> }
> >>> bpf_rcu_read_unlock();
> >>> return 0;
> >>> }
> >>>
> >>> Unfortunately, we can pass the verifier.
> >>>
> >>> Then I add some printk-messages before setting/clearing state to help
> >>> debug:
> >>>
> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>> index d151e6b43a5f..35f3fa9471a9 100644
> >>> --- a/kernel/bpf/verifier.c
> >>> +++ b/kernel/bpf/verifier.c
> >>> @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct
> >>> bpf_verifier_env *env,
> >>> __mark_reg_known_zero(st);
> >>> st->type = PTR_TO_STACK; /* we don't have dedicated reg
> >>> type */
> >>> if (is_iter_need_rcu(meta)) {
> >>> + printk("mark reg_addr : %px", st);
> >>> if (in_rcu_cs(env))
> >>> st->type |= MEM_RCU;
> >>> else
> >>> @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct
> >>> bpf_verifier_env *env, struct bpf_insn *insn,
> >>> return -EINVAL;
> >>> } else if (rcu_unlock) {
> >>> bpf_for_each_reg_in_vstate(env->cur_state,
> >>> state, reg, ({
> >>> + printk("clear reg_addr : %px MEM_RCU :
> >>> %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type &
> >>> PTR_UNTRUSTED);
> >>> if (reg->type & MEM_RCU) {
> >>> - printk("clear reg addr : %lld",
> >>> reg);
> >>> reg->type &= ~(MEM_RCU |
> >>> PTR_MAYBE_NULL);
> >>> reg->type |= PTR_UNTRUSTED;
> >>> }
> >>>
> >>>
> >>> The demsg log:
> >>>
> >>> [ 393.705324] mark reg_addr : ffff88814e40e200
> >>>
> >>> [ 393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0
> >>> PTR_UNTRUSTED : 0
> >>>
> >>> [ 393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0
> >>> PTR_UNTRUSTED : 0
> >>>
> >>> [ 393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0
> >>> PTR_UNTRUSTED : 0
> >>> ....
> >>> ....
> >>>
> >>> I didn't see ffff88814e40e200 is cleared as expected because
> >>> bpf_for_each_reg_in_vstate didn't find it.
> >>>
> >>> It seems when we are doing bpf_read_unlock() in the middle of iteration
> >>> and want to clearing state through bpf_for_each_reg_in_vstate, we can
> >>> not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in
> >>> mark_stack_slots_iter().
> >>>
> >>
> >> bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in
> >> mark_stack_slots_iter() we has marked the slots *STACK_ITER*
> >>
> >> With the following change, everything seems work OK.
> >>
> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >> index a3236651ec64..83c5ecccadb4 100644
> >> --- a/include/linux/bpf_verifier.h
> >> +++ b/include/linux/bpf_verifier.h
> >> @@ -387,7 +387,7 @@ struct bpf_verifier_state {
> >>
> >> #define bpf_get_spilled_reg(slot, frame) \
> >> (((slot < frame->allocated_stack / BPF_REG_SIZE) && \
> >> - (frame->stack[slot].slot_type[0] == STACK_SPILL)) \
> >> + (frame->stack[slot].slot_type[0] == STACK_SPILL ||
> >> frame->stack[slot].slot_type[0] == STACK_ITER)) \
> >> ? &frame->stack[slot].spilled_ptr : NULL)
> >>
> >> I am not sure whether this would harm some logic implicitly when using
> >> bpf_get_spilled_reg/bpf_for_each_spilled_reg in other place. If so,
> >> maybe we should add a extra parameter to control the picking behaviour.
> >>
> >> #define bpf_get_spilled_reg(slot, frame, stack_type)
> >> \
> >> (((slot < frame->allocated_stack / BPF_REG_SIZE) && \
> >> (frame->stack[slot].slot_type[0] == stack_type)) \
> >> ? &frame->stack[slot].spilled_ptr : NULL)
> >>
> >> Thanks.
> >
> > I don't think it's safe to just make bpf_get_spilled_reg, and
> > subsequently bpf_for_each_reg_in_vstate and bpf_for_each_spilled_reg
> > just suddenly start iterating iterator states and/or dynptrs. At least
> > some of existing uses of those assume they are really working just
> > with registers.
>
> IIUC, when we are doing bpf_rcu_unlock, we do need to clear the state of
> reg including STACK_ITER.
>
> Maybe here we only need change the logic when using
> bpf_for_each_reg_in_vstate to clear state in bpf_rcu_unlock and keep
> everything else unchanged ?

Right, maybe. I see 10 uses of bpf_for_each_reg_in_vstate() in
kernel/bpf/verifier.c. Before we change the definition of
bpf_for_each_reg_in_vstate() we should validate that iterating dynptr
and iter states doesn't break any of them, that's all.

>
> Thanks.

2023-09-16 01:39:46

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs

On Fri, Sep 15, 2023 at 8:03 AM Chuyi Zhou <[email protected]> wrote:
>
>
>
> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
> > On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <[email protected]> wrote:
> >>
> >> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
> >> creation and manipulation of struct bpf_iter_process in open-coded iterator
> >> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> >> iterate all processes in the system.
> >>
> >> Signed-off-by: Chuyi Zhou <[email protected]>
> >> ---
> >> include/uapi/linux/bpf.h | 4 ++++
> >> kernel/bpf/helpers.c | 3 +++
> >> kernel/bpf/task_iter.c | 29 +++++++++++++++++++++++++++++
> >> tools/include/uapi/linux/bpf.h | 4 ++++
> >> tools/lib/bpf/bpf_helpers.h | 5 +++++
> >> 5 files changed, 45 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index de02c0971428..befa55b52e29 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
> >> __u64 __opaque[1];
> >> } __attribute__((aligned(8)));
> >>
> >> +struct bpf_iter_process {
> >> + __u64 __opaque[1];
> >> +} __attribute__((aligned(8)));
> >> +
> >> #endif /* _UAPI__LINUX_BPF_H__ */
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index d6a16becfbb9..9b7d2c6f99d1 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> >> BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
> >> BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> >> BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> >> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> >> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> >> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> >> index d8539cc05ffd..9d1927dc3a06 100644
> >> --- a/kernel/bpf/task_iter.c
> >> +++ b/kernel/bpf/task_iter.c
> >> @@ -851,6 +851,35 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> >> kfree(kit->css_it);
> >> }
> >>
> >> +struct bpf_iter_process_kern {
> >> + struct task_struct *tsk;
> >> +} __attribute__((aligned(8)));
> >> +
> >
> > Few high level thoughts. I think it would be good to follow
> > SEC("iter/task") naming and approach. Open-coded iterators in many
> > ways are in-kernel counterpart to iterator programs, so keeping them
> > close enough within reason is useful for knowledge transfer.
> >
> > SEC("iter/task") allows to:
> > a) iterate all threads in the system
> > b) iterate all threads for a given TGID
> > c) it also allows to "iterate" a single thread or process, but that's
> > a bit less relevant for in-kernel iterator, but we can still support
> > them, why not?
> >
> > I'm not sure if it supports iterating all processes (as in group
> > leaders of each task group) in the system, but if it's possible I
> > think we should support it at least for open-coded iterator, seems
> > like a very useful functionality.
> >
> > So to that end, let's design a small set of input arguments for
> > bpf_iter_process_new() that would allow to specify this as flags +
> > either (optional) struct task_struct * pointer to represent
> > task/process or PID/TGID.
> >
>
> Another concern from Alexei was the readability of the API of open-coded
> in BPF Program[1].
>
> bpf_for_each(task, curr) is straightforward. Users can easily understand
> that this API does the same thing as 'for_each_process' in kernel.

In general, users might have no idea about for_each_process macro in
the kernel, so I don't find this particular argument very convincing.

We can add a separate set of iterator kfuncs for every useful
combination of conditions, of course, but it's a double-edged sword.
Needing to use a different iterator just to specify a different
direction of cgroup iteration (from the example you referred in [1])
also means that it's now harder to write some generic function that
needs to do something for all cgroups matching some criteria where the
order might be coming as an argument.

Similarly for task iterators. It's not hard to imagine some processing
that can be equivalently done per thread or per process in the system,
or on each thread of the process, depending on some conditions or
external configuration. Having to do three different
bpf_for_each(task_xxx, task, ...) for this seems suboptimal. If the
nature of the thing that is iterated over is the same, and it's just a
different set of filters to specify which subset of those items should
be iterated, I think it's better to try to stick to the same iterator
with few simple arguments. IMO, of course, there is no objectively
best approach.

>
> However, if we keep the approach of SEC("iter/task")
>
> enum ITER_ITEM {
> ITER_TASK,
> ITER_THREAD,
> }
>
> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct
> task_struct *group_task, enum ITER_ITEM type)
>
> the API have to chang:
>
>
> bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in
> the system
> bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all
> thread of group_leader
> bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of
> all the process in the system
>
> Useres may guess what are this API actually doing....

I'd expect users to consult documentation before trying to use an
unfamiliar cutting-edge functionality. So let's try to keep
documentation clear and up to the point. Extra flag argument doesn't
seem to be a big deal.


>
> So, I'm thinking if we can add a layer of abstraction to hide the
> details from the users:
>
> #define bpf_for_each_process(task) \
> bpf_for_each(task, curr, NULL, ITERATE_TASK)
>
>
> It would be nice if you could give me some better suggestions.

No, please no. This macro wrapper is useless and just obfuscates what
is going on. If users think it's helpful for them, they can trivially
define it for their own code.

>
> Thanks!
>
> [1]
> https://lore.kernel.org/lkml/CAADnVQLbDWUxFen-RS67C86sOE5DykEPD8xyihJ2RnG1WEnTQg@mail.gmail.com/

2023-09-16 02:50:30

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs

On Fri, Sep 15, 2023 at 4:57 AM Chuyi Zhou <[email protected]> wrote:
>
> Hello.
>
> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
> > On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <[email protected]> wrote:
> >>
> >> This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which
> >> allow creation and manipulation of struct bpf_iter_css in open-coded
> >> iterator style. These kfuncs actually wrapps css_next_descendant_{pre,
> >> post}. BPF programs can use these kfuncs through bpf_for_each macro for
> >> iteration of all descendant css under a root css.
> >>
> >> Signed-off-by: Chuyi Zhou <[email protected]>
> >> ---
> >> include/uapi/linux/bpf.h | 8 +++++
> >> kernel/bpf/helpers.c | 6 ++++
> >> kernel/bpf/task_iter.c | 53 ++++++++++++++++++++++++++++++++++
> >> tools/include/uapi/linux/bpf.h | 8 +++++
> >> tools/lib/bpf/bpf_helpers.h | 12 ++++++++
> >> 5 files changed, 87 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index befa55b52e29..57760afc13d0 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -7326,4 +7326,12 @@ struct bpf_iter_process {
> >> __u64 __opaque[1];
> >> } __attribute__((aligned(8)));
> >>
> >> +struct bpf_iter_css_pre {
> >> + __u64 __opaque[2];
> >> +} __attribute__((aligned(8)));
> >> +
> >> +struct bpf_iter_css_post {
> >> + __u64 __opaque[2];
> >> +} __attribute__((aligned(8)));
> >> +
> >> #endif /* _UAPI__LINUX_BPF_H__ */
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index 9b7d2c6f99d1..ca1f6404af9e 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> >> BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
> >> BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
> >> BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY)
> >> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> >> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> >> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> >> index 9d1927dc3a06..8963fc779b87 100644
> >> --- a/kernel/bpf/task_iter.c
> >> +++ b/kernel/bpf/task_iter.c
> >> @@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
> >> {
> >> }
> >>
> >> +struct bpf_iter_css_kern {
> >> + struct cgroup_subsys_state *root;
> >> + struct cgroup_subsys_state *pos;
> >> +} __attribute__((aligned(8)));
> >> +
> >> +__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
> >> + struct cgroup_subsys_state *root)
> >
> > similar to my comment on previous patches, please see
> > kernel/bpf/cgroup_iter.c for iter/cgroup iterator program. Let's stay
> > consistent. We have one iterator that accepts parameters defining
> > iteration order and starting cgroup. Unless there are some technical
> > reasons we can't follow similar approach with this open-coded iter,
> > let's use the same approach. We can even reuse
> > BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
> > BPF_CGROUP_ITER_ANCESTORS_UP enums.
> >
>
> I know your concern. It would be nice if we keep consistent with
> kernel/bpf/cgroup_iter.c
>
> But this patch actually want to support iterating css
> (cgroup_subsys_state) not cgroup (css is more low lever).
> With css_iter we can do something like
> "for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre"
> in BPF Progs which is hard for cgroup_iter. In the future we can use
> this iterator to plug some customizable policy in other resource control
> system.

That's fine if it's not exactly cgroup iter and returns a different
kernel object. But let's at least consistently use
BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST/BPF_CGROUP_ITER_ANCESTORS_UP
approach as a way to specify iteration order?

>
> BTW, what I did in RFC actually very similar with the approach of
> cgroup_iter.
> (https://lore.kernel.org/all/[email protected]/).
>
> Thanks.

2023-09-16 16:39:40

by Chuyi Zhou

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs

Hello.

在 2023/9/16 04:37, Andrii Nakryiko 写道:
> On Fri, Sep 15, 2023 at 8:03 AM Chuyi Zhou <[email protected]> wrote:
>> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
>>> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <[email protected]> wrote:
>>>>
>>>> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
>>>> creation and manipulation of struct bpf_iter_process in open-coded iterator
>>>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
>>>> iterate all processes in the system.
>>>>
[...cut...]
>>>
>>> Few high level thoughts. I think it would be good to follow
>>> SEC("iter/task") naming and approach. Open-coded iterators in many
>>> ways are in-kernel counterpart to iterator programs, so keeping them
>>> close enough within reason is useful for knowledge transfer.
>>>
>>> SEC("iter/task") allows to:
>>> a) iterate all threads in the system
>>> b) iterate all threads for a given TGID
>>> c) it also allows to "iterate" a single thread or process, but that's
>>> a bit less relevant for in-kernel iterator, but we can still support
>>> them, why not?
>>>
>>> I'm not sure if it supports iterating all processes (as in group
>>> leaders of each task group) in the system, but if it's possible I
>>> think we should support it at least for open-coded iterator, seems
>>> like a very useful functionality.
>>>
>>> So to that end, let's design a small set of input arguments for
>>> bpf_iter_process_new() that would allow to specify this as flags +
>>> either (optional) struct task_struct * pointer to represent
>>> task/process or PID/TGID.
>>>
>>
>> Another concern from Alexei was the readability of the API of open-coded
>> in BPF Program[1].
>>
>> bpf_for_each(task, curr) is straightforward. Users can easily understand
>> that this API does the same thing as 'for_each_process' in kernel.
>
> In general, users might have no idea about for_each_process macro in
> the kernel, so I don't find this particular argument very convincing.
>
> We can add a separate set of iterator kfuncs for every useful
> combination of conditions, of course, but it's a double-edged sword.
> Needing to use a different iterator just to specify a different
> direction of cgroup iteration (from the example you referred in [1])
> also means that it's now harder to write some generic function that
> needs to do something for all cgroups matching some criteria where the
> order might be coming as an argument.
>
> Similarly for task iterators. It's not hard to imagine some processing
> that can be equivalently done per thread or per process in the system,
> or on each thread of the process, depending on some conditions or
> external configuration. Having to do three different
> bpf_for_each(task_xxx, task, ...) for this seems suboptimal. If the
> nature of the thing that is iterated over is the same, and it's just a
> different set of filters to specify which subset of those items should
> be iterated, I think it's better to try to stick to the same iterator
> with few simple arguments. IMO, of course, there is no objectively
> best approach.
>
>>
>> However, if we keep the approach of SEC("iter/task")
>>
>> enum ITER_ITEM {
>> ITER_TASK,
>> ITER_THREAD,
>> }
>>
>> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct
>> task_struct *group_task, enum ITER_ITEM type)
>>
>> the API have to chang:
>>
>>
>> bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in
>> the system
>> bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all
>> thread of group_leader
>> bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of
>> all the process in the system
>>
>> Useres may guess what are this API actually doing....
>
> I'd expect users to consult documentation before trying to use an
> unfamiliar cutting-edge functionality. So let's try to keep
> documentation clear and up to the point. Extra flag argument doesn't
> seem to be a big deal.

Thanks for your suggestion!

Before we begin working on the next version, I have outlined a detailed
API design here:

1.task_iter

It will be used to iterate process/threads like SEC("iter/task"). Here
we should better to follow the naming and approach SEC("iter/task"):

enum {
ITERATE_PROCESS,
ITERATE_THREAD,
}

__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct
task_struct *task, int flag);

If we want to iterate all processes in the system, the iteration will
start from the *task* which is passed from user.(since process in the
system are connected through a linked list)

Additionally, the *task* can allow users to specify iterating all
threads within a task group.

SEC("xxx")
int xxxx(void *ctx)
{
struct task_struct *pos;
struct task_struct *cur_task = bpf_get_current_task_btf();

bpf_rcu_read_lock();

// iterating all process in the system start from cur_task
bpf_for_each(task, pos, cur_task, ITERATE_PROCESS) {

}

// iterate all thread belongs to cur_task group.
bpf_for_each(task, pos, cur_task, ITERATE_THREAD) {

}

bpf_rcu_read_unlock();
return 0;
}

Iterating all thread of each process is great(ITERATE_ALL). But maybe
let's break it down step by step and implement
ITERATE_PROCESS/ITERATE_THREAD first? (I'm little worried about the cpu
overhead of ITERATE_ALL, since we are doing a heavy job in BPF Prog)

I wanted to reuse BPF_TASK_ITER_ALL/BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID
insted of new enums like ITERATE_PROCESS/ITERATE_THREAD. But it seems
necessary. In BPF Prog, we usually operate task_struct directly instead
of pid/tgid. It's a little weird to use
BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID here:

bpf_for_each(task, pos, cur_task, BPF_TASK_ITER_TID) {
}

On the other hand,
BPF_TASK_ITER_ALL/BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID are inner flags
that are hidden from the users.
Exposing ITERATE_PROCESS/ITERATE_THREAD will not cause confusion to user.


2. css_iter.

css_iter will be used to:
(1) iterating subsystem, like
for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in kernel.
(2) iterating cgroup. (patch-6's selfetest has a basic example)

css(cgroup_subsys_state) is more fundamental than struct cgroup. I think
we'd better operating css rather than cgroup, since it's can be hard for
cgroup_iter to achive (2). So here we keep the name of "css_iter",
BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST/BPF_CGROUP_ITER_ANCESTORS_UP
can be reused.


__bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it,
struct cgroup_subsys_state *root, unsigned int flag)

bpf_for_each(css, root, BPF_CGROUP_ITER_DESCENDANTS_PRE)

Thanks.




2023-09-18 21:42:45

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 1/6] cgroup: Prepare for using css_task_iter_*() in BPF

On Tue, Sep 12, 2023 at 03:01:44PM +0800, Chuyi Zhou wrote:
> This patch makes some preparations for using css_task_iter_*() in BPF
> Program.
>
> 1. Flags CSS_TASK_ITER_* are #define-s and it's not easy for bpf prog to
> use them. Convert them to enum so bpf prog can take them from vmlinux.h.
>
> 2. In the next patch we will add css_task_iter_*() in common kfuncs which
> is not safe. Since css_task_iter_*() does spin_unlock_irq() which might
> screw up irq flags depending on the context where bpf prog is running.
> So we should use irqsave/irqrestore here and the switching is harmless.
>
> Signed-off-by: Chuyi Zhou <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2023-09-20 01:36:59

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs

On Sat, Sep 16, 2023 at 7:03 AM Chuyi Zhou <[email protected]> wrote:
>
> Hello.
>
> 在 2023/9/16 04:37, Andrii Nakryiko 写道:
> > On Fri, Sep 15, 2023 at 8:03 AM Chuyi Zhou <[email protected]> wrote:
> >> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
> >>> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <[email protected]> wrote:
> >>>>
> >>>> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
> >>>> creation and manipulation of struct bpf_iter_process in open-coded iterator
> >>>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> >>>> iterate all processes in the system.
> >>>>
> [...cut...]
> >>>
> >>> Few high level thoughts. I think it would be good to follow
> >>> SEC("iter/task") naming and approach. Open-coded iterators in many
> >>> ways are in-kernel counterpart to iterator programs, so keeping them
> >>> close enough within reason is useful for knowledge transfer.
> >>>
> >>> SEC("iter/task") allows to:
> >>> a) iterate all threads in the system
> >>> b) iterate all threads for a given TGID
> >>> c) it also allows to "iterate" a single thread or process, but that's
> >>> a bit less relevant for in-kernel iterator, but we can still support
> >>> them, why not?
> >>>
> >>> I'm not sure if it supports iterating all processes (as in group
> >>> leaders of each task group) in the system, but if it's possible I
> >>> think we should support it at least for open-coded iterator, seems
> >>> like a very useful functionality.
> >>>
> >>> So to that end, let's design a small set of input arguments for
> >>> bpf_iter_process_new() that would allow to specify this as flags +
> >>> either (optional) struct task_struct * pointer to represent
> >>> task/process or PID/TGID.
> >>>
> >>
> >> Another concern from Alexei was the readability of the API of open-coded
> >> in BPF Program[1].
> >>
> >> bpf_for_each(task, curr) is straightforward. Users can easily understand
> >> that this API does the same thing as 'for_each_process' in kernel.
> >
> > In general, users might have no idea about for_each_process macro in
> > the kernel, so I don't find this particular argument very convincing.
> >
> > We can add a separate set of iterator kfuncs for every useful
> > combination of conditions, of course, but it's a double-edged sword.
> > Needing to use a different iterator just to specify a different
> > direction of cgroup iteration (from the example you referred in [1])
> > also means that it's now harder to write some generic function that
> > needs to do something for all cgroups matching some criteria where the
> > order might be coming as an argument.
> >
> > Similarly for task iterators. It's not hard to imagine some processing
> > that can be equivalently done per thread or per process in the system,
> > or on each thread of the process, depending on some conditions or
> > external configuration. Having to do three different
> > bpf_for_each(task_xxx, task, ...) for this seems suboptimal. If the
> > nature of the thing that is iterated over is the same, and it's just a
> > different set of filters to specify which subset of those items should
> > be iterated, I think it's better to try to stick to the same iterator
> > with few simple arguments. IMO, of course, there is no objectively
> > best approach.
> >
> >>
> >> However, if we keep the approach of SEC("iter/task")
> >>
> >> enum ITER_ITEM {
> >> ITER_TASK,
> >> ITER_THREAD,
> >> }
> >>
> >> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct
> >> task_struct *group_task, enum ITER_ITEM type)
> >>
> >> the API have to chang:
> >>
> >>
> >> bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in
> >> the system
> >> bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all
> >> thread of group_leader
> >> bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of
> >> all the process in the system
> >>
> >> Useres may guess what are this API actually doing....
> >
> > I'd expect users to consult documentation before trying to use an
> > unfamiliar cutting-edge functionality. So let's try to keep
> > documentation clear and up to the point. Extra flag argument doesn't
> > seem to be a big deal.
>
> Thanks for your suggestion!
>
> Before we begin working on the next version, I have outlined a detailed
> API design here:
>
> 1.task_iter
>
> It will be used to iterate process/threads like SEC("iter/task"). Here
> we should better to follow the naming and approach SEC("iter/task"):
>
> enum {
> ITERATE_PROCESS,
> ITERATE_THREAD,
> }
>
> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct
> task_struct *task, int flag);
>
> If we want to iterate all processes in the system, the iteration will
> start from the *task* which is passed from user.(since process in the
> system are connected through a linked list)

but will go through all of them anyways, right? it's kind of
surprising from usability standpoint to have to pass some task_struct
to iterate all of them, tbh. I wonder if it's hard to adjust kfunc
validation to allow "nullable" pointers? We can look at that
separately, of course.

>
> Additionally, the *task* can allow users to specify iterating all
> threads within a task group.
>
> SEC("xxx")
> int xxxx(void *ctx)
> {
> struct task_struct *pos;
> struct task_struct *cur_task = bpf_get_current_task_btf();
>
> bpf_rcu_read_lock();
>
> // iterating all process in the system start from cur_task
> bpf_for_each(task, pos, cur_task, ITERATE_PROCESS) {
>
> }
>
> // iterate all thread belongs to cur_task group.
> bpf_for_each(task, pos, cur_task, ITERATE_THREAD) {
>
> }
>
> bpf_rcu_read_unlock();
> return 0;
> }
>
> Iterating all thread of each process is great(ITERATE_ALL). But maybe
> let's break it down step by step and implement
> ITERATE_PROCESS/ITERATE_THREAD first? (I'm little worried about the cpu
> overhead of ITERATE_ALL, since we are doing a heavy job in BPF Prog)
>

Hm... but if it was a sleepable BPF program and
bpf_rcu_read_{lock,unlock}() was only per task, then it shouldn't be
the problem? See enum bpf_cgroup_iter_order.


> I wanted to reuse BPF_TASK_ITER_ALL/BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID
> insted of new enums like ITERATE_PROCESS/ITERATE_THREAD. But it seems
> necessary. In BPF Prog, we usually operate task_struct directly instead
> of pid/tgid. It's a little weird to use
> BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID here:

enum bpf_iter_task_type is internal type, so we can rename
BPF_TASK_ITER_TID to BPF_TASK_ITER_THREAD and BPF_TASK_ITER_PROC (or
add them as aliases). At the very least, we should use consistent
BPF_TASK_ITER_xxx naming, instead of just ITERATE_PROCESS. See

>
> bpf_for_each(task, pos, cur_task, BPF_TASK_ITER_TID) {
> }
>
> On the other hand,
> BPF_TASK_ITER_ALL/BPF_TASK_ITER_TID/BPF_TASK_ITER_TGID are inner flags
> that are hidden from the users.
> Exposing ITERATE_PROCESS/ITERATE_THREAD will not cause confusion to user.
>

inner types are not a problem when used with vmlinux.h


>
> 2. css_iter.
>
> css_iter will be used to:
> (1) iterating subsystem, like
> for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in kernel.
> (2) iterating cgroup. (patch-6's selfetest has a basic example)
>
> css(cgroup_subsys_state) is more fundamental than struct cgroup. I think
> we'd better operating css rather than cgroup, since it's can be hard for
> cgroup_iter to achive (2). So here we keep the name of "css_iter",
> BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST/BPF_CGROUP_ITER_ANCESTORS_UP
> can be reused.
>
>
> __bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it,
> struct cgroup_subsys_state *root, unsigned int flag)
>
> bpf_for_each(css, root, BPF_CGROUP_ITER_DESCENDANTS_PRE)
>

Makes sense, yep, thanks.

> Thanks.
>
>
>
>

2023-09-20 21:43:05

by Chuyi Zhou

[permalink] [raw]
Subject: Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs



在 2023/9/20 07:30, Andrii Nakryiko 写道:
> On Sat, Sep 16, 2023 at 7:03 AM Chuyi Zhou <[email protected]> wrote:
>>
>> Hello.
>>
>> 在 2023/9/16 04:37, Andrii Nakryiko 写道:
>>> On Fri, Sep 15, 2023 at 8:03 AM Chuyi Zhou <[email protected]> wrote:
>>>> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
>>>>> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <[email protected]> wrote:
>>>>>>
>>>>>> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
>>>>>> creation and manipulation of struct bpf_iter_process in open-coded iterator
>>>>>> style. BPF programs can use these kfuncs or through bpf_for_each macro to
>>>>>> iterate all processes in the system.
>>>>>>
>> [...cut...]
>>>>>
>>>>> Few high level thoughts. I think it would be good to follow
>>>>> SEC("iter/task") naming and approach. Open-coded iterators in many
>>>>> ways are in-kernel counterpart to iterator programs, so keeping them
>>>>> close enough within reason is useful for knowledge transfer.
>>>>>
>>>>> SEC("iter/task") allows to:
>>>>> a) iterate all threads in the system
>>>>> b) iterate all threads for a given TGID
>>>>> c) it also allows to "iterate" a single thread or process, but that's
>>>>> a bit less relevant for in-kernel iterator, but we can still support
>>>>> them, why not?
>>>>>
>>>>> I'm not sure if it supports iterating all processes (as in group
>>>>> leaders of each task group) in the system, but if it's possible I
>>>>> think we should support it at least for open-coded iterator, seems
>>>>> like a very useful functionality.
>>>>>
>>>>> So to that end, let's design a small set of input arguments for
>>>>> bpf_iter_process_new() that would allow to specify this as flags +
>>>>> either (optional) struct task_struct * pointer to represent
>>>>> task/process or PID/TGID.
>>>>>
>>>>
>>>> Another concern from Alexei was the readability of the API of open-coded
>>>> in BPF Program[1].
>>>>
>>>> bpf_for_each(task, curr) is straightforward. Users can easily understand
>>>> that this API does the same thing as 'for_each_process' in kernel.
>>>
>>> In general, users might have no idea about for_each_process macro in
>>> the kernel, so I don't find this particular argument very convincing.
>>>
>>> We can add a separate set of iterator kfuncs for every useful
>>> combination of conditions, of course, but it's a double-edged sword.
>>> Needing to use a different iterator just to specify a different
>>> direction of cgroup iteration (from the example you referred in [1])
>>> also means that it's now harder to write some generic function that
>>> needs to do something for all cgroups matching some criteria where the
>>> order might be coming as an argument.
>>>
>>> Similarly for task iterators. It's not hard to imagine some processing
>>> that can be equivalently done per thread or per process in the system,
>>> or on each thread of the process, depending on some conditions or
>>> external configuration. Having to do three different
>>> bpf_for_each(task_xxx, task, ...) for this seems suboptimal. If the
>>> nature of the thing that is iterated over is the same, and it's just a
>>> different set of filters to specify which subset of those items should
>>> be iterated, I think it's better to try to stick to the same iterator
>>> with few simple arguments. IMO, of course, there is no objectively
>>> best approach.
>>>
>>>>
>>>> However, if we keep the approach of SEC("iter/task")
>>>>
>>>> enum ITER_ITEM {
>>>> ITER_TASK,
>>>> ITER_THREAD,
>>>> }
>>>>
>>>> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct
>>>> task_struct *group_task, enum ITER_ITEM type)
>>>>
>>>> the API have to chang:
>>>>
>>>>
>>>> bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in
>>>> the system
>>>> bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all
>>>> thread of group_leader
>>>> bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of
>>>> all the process in the system
>>>>
>>>> Useres may guess what are this API actually doing....
>>>
>>> I'd expect users to consult documentation before trying to use an
>>> unfamiliar cutting-edge functionality. So let's try to keep
>>> documentation clear and up to the point. Extra flag argument doesn't
>>> seem to be a big deal.
>>
>> Thanks for your suggestion!
>>
>> Before we begin working on the next version, I have outlined a detailed
>> API design here:
>>
>> 1.task_iter
>>
>> It will be used to iterate process/threads like SEC("iter/task"). Here
>> we should better to follow the naming and approach SEC("iter/task"):
>>
>> enum {
>> ITERATE_PROCESS,
>> ITERATE_THREAD,
>> }
>>
>> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct
>> task_struct *task, int flag);
>>
>> If we want to iterate all processes in the system, the iteration will
>> start from the *task* which is passed from user.(since process in the
>> system are connected through a linked list)
>
> but will go through all of them anyways, right? it's kind of
> surprising from usability standpoint to have to pass some task_struct
> to iterate all of them, tbh. I wonder if it's hard to adjust kfunc
> validation to allow "nullable" pointers? We can look at that
> separately, of course.
>

Yes, little subtle here. When we want to iterate threads of a task, we
may want the verifier to ensure the *task* is a valid pointer, which
would require KF_TRUSTED_ARGS. However, when we want to iterate all
process/threads in the system, we want to accept a null pointer.

Anyway, I will try to explore a workaround in the verifier here.

Thanks.