by kernel test robot

On Wed, Mar 13, 2024 at 12:42 AM Alexei Starovoitov
<[email protected]> wrote:
>
> On Mon, Mar 11, 2024 at 7:42 PM 梦龙董 <[email protected]> wrote:
> >
[......]
>
> I see.
> I thought you're sharing the trampoline across attachments.
> (since bpf prog is the same).

That seems like a good idea, which I hadn't thought of before.

> But above approach cannot possibly work with a shared trampoline.
> You need to create an individual trampoline for each attachment
> and point them all to a single bpf prog.
>
> tbh I'm less excited about this feature now, since sharing
> the prog across different attachments is nice, but it won't scale
> to thousands of attachments.
> I assumed that there will be a single trampoline with max(argno)
> across attachments and attach/detach will scale to thousands.
>
> With individual trampolines this will work for up to a hundred
> attachments max.

What does "a hundred attachments max" mean? Can't I
trace thousands of kernel functions with a single BPF program
through a tracing multi-link?

>
> Let's step back.
> What is the exact use case you're trying to solve?
> Not an artificial one as selftest in patch 9, but the real use case?

I have a tool for diagnosing network problems called "nettrace".
It traces many kernel functions whose arguments include an skb,
like this:

/nettrace -p icmp
begin trace...
***************** ffff889be8fbd500,ffff889be8fbcd00 ***************
[1272349.614564] [dev_gro_receive ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614579] [__netif_receive_skb_core] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614585] [ip_rcv ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614592] [ip_rcv_core ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614599] [skb_clone ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614616] [nf_hook_slow ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614629] [nft_do_chain ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614635] [ip_rcv_finish ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614643] [ip_route_input_slow ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614647] [fib_validate_source ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614652] [ip_local_deliver ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614658] [nf_hook_slow ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614663] [ip_local_deliver_finish] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614666] [icmp_rcv ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614671] [icmp_echo ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614675] [icmp_reply ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614715] [consume_skb ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614722] [packet_rcv ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614725] [consume_skb ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220

Currently, I have to create a separate BPF program for every kernel
function I want to trace, which means up to 200 programs.

With this multi-link, I only need to create 5 BPF programs,
like this:

int BPF_PROG(trace_skb_1, struct sk_buff *skb);
int BPF_PROG(trace_skb_2, u64 arg0, struct sk_buff *skb);
int BPF_PROG(trace_skb_3, u64 arg0, u64 arg1, struct sk_buff *skb);
int BPF_PROG(trace_skb_4, u64 arg0, u64 arg1, u64 arg2, struct sk_buff *skb);
int BPF_PROG(trace_skb_5, u64 arg0, u64 arg1, u64 arg2, u64 arg3, struct sk_buff *skb);

Then I can attach trace_skb_1 to all the kernel functions I want
to trace whose first argument is the skb, attach trace_skb_2 to
those whose second argument is the skb, and so on.
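Below is a user-space sketch of the per-position scheme. It assumes each target is described by its name and the 0-based index of its struct sk_buff * argument. struct skb_target, skb_prog_for() and count_attachments() are hypothetical helpers, not nettrace or libbpf code. A real tool would read the argument positions from BTF.

```c
#include <stddef.h>

/* Hypothetical description of a traced kernel function: its name and the
 * 0-based position of its struct sk_buff * argument. */
struct skb_target {
	const char *name;
	int skb_idx;
};

/* Number of per-position programs (trace_skb_1 .. trace_skb_5). */
#define NR_SKB_PROGS 5

/* Return which trace_skb_N program (1-based N) would be attached to @t,
 * or -1 if the skb sits past the positions the 5 programs cover. */
static int skb_prog_for(const struct skb_target *t)
{
	if (t->skb_idx < 0 || t->skb_idx >= NR_SKB_PROGS)
		return -1;
	return t->skb_idx + 1;
}

/* Count how many targets each program would be attached to;
 * counts[0] is unused so that counts[N] matches trace_skb_N. */
static void count_attachments(const struct skb_target *t, size_t n,
			      int counts[NR_SKB_PROGS + 1])
{
	for (int i = 0; i <= NR_SKB_PROGS; i++)
		counts[i] = 0;
	for (size_t i = 0; i < n; i++) {
		int prog = skb_prog_for(&t[i]);

		if (prog > 0)
			counts[prog]++;
	}
}
```

Consider three targets: ip_rcv (skb first), dev_gro_receive (skb second) and ip_rcv_finish (skb third). For these, skb_prog_for() selects trace_skb_1, trace_skb_2 and trace_skb_3 respectively. Any function whose skb comes after the fifth argument falls outside the five programs.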

Alternatively, I can create a single BPF program, store the index
of the skb argument in each attachment's cookie, and attach that
program to all the kernel functions I want to trace.
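The cookie variant can be modeled in user space as follows. This is a sketch under stated assumptions:

- args stands in for the u64 argument array a tracing program sees.
- cookie holds the skb's argument index, stored at attach time.
- struct fake_skb and skb_from_args() are hypothetical.

In an actual BPF program, the index would presumably come from bpf_get_attach_cookie() and the argument from bpf_get_func_arg().

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct sk_buff in this user-space model. */
struct fake_skb {
	uint32_t len;
};

/* Models the single shared handler: @args mimics the u64 argument array of
 * a tracing program, and @cookie is the skb's argument index recorded when
 * this particular target was attached. */
static const struct fake_skb *skb_from_args(const uint64_t *args,
					    size_t nr_args, uint64_t cookie)
{
	if (cookie >= nr_args)
		return NULL; /* bad cookie: index past this function's arguments */
	return (const struct fake_skb *)(uintptr_t)args[cookie];
}
```

The bounds check on the cookie matters because one program now serves functions with different numbers of arguments.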

That's my use case. With the multi-link, I only need
1 BPF program, 1 BPF link and 200 trampolines, instead of 200
BPF programs, 200 BPF links and 200 trampolines.

The shared trampoline you mentioned seems like a
wonderful idea, as it would reduce the 200 trampolines
to one. Let me check my understanding: we create a single
trampoline and record the maximum argument count across all
the target functions; call it arg_count.

When generating the trampoline, we assume the function
has arg_count arguments. When attaching, we check the
consistency of all the target functions, just as we do
now.

Am I right?
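Here is a rough model of the shared-trampoline plan described above. It assumes each target is summarized by its argument count and its skb position. struct target, shared_arg_count() and check_target() are hypothetical. One further assumption of this sketch: a target with more arguments than arg_count is rejected rather than triggering a trampoline regeneration.

```c
#include <errno.h>

/* Hypothetical summary of one attach target. */
struct target {
	const char *name;
	int nr_args;	/* number of arguments the function takes */
	int skb_idx;	/* 0-based position of the skb argument */
};

/* arg_count for a shared trampoline: the largest argument count of all targets. */
static int shared_arg_count(const struct target *t, int n)
{
	int max = 0;

	for (int i = 0; i < n; i++) {
		if (t[i].nr_args > max)
			max = t[i].nr_args;
	}
	return max;
}

/* Attach-time consistency check (assumed semantics): the target must fit in
 * the arg_count slots the trampoline saves, and the skb position recorded
 * for it must name one of its real arguments. */
static int check_target(const struct target *t, int arg_count)
{
	if (t->nr_args > arg_count)
		return -EINVAL;
	if (t->skb_idx < 0 || t->skb_idx >= t->nr_args)
		return -EINVAL;
	return 0;
}
```

Take ip_rcv (4 arguments), dev_gro_receive (2) and ip_rcv_finish (3). arg_count comes out as 4, and all three targets pass the check.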

Thanks!
Menglong Dong

2024-03-12 01:49:29

by Alexei Starovoitov

Subject: Re: [PATCH bpf-next v2 2/9] bpf: refactor the modules_array to ptr_array

On Mon, Mar 11, 2024 at 2:34 AM Menglong Dong
<[email protected]> wrote:
>
> Refactor struct modules_array into the more general struct ptr_array, which
> is used to store pointers.
>
> Meanwhile, introduce bpf_try_add_ptr(), which checks whether the ptr
> already exists before adding it to the array.
>
> It seems this should be moved to a file under "lib", but I'm not sure where
> to put it, so let's move it to kernel/bpf/syscall.c for now.
>
> Signed-off-by: Menglong Dong <[email protected]>
> ---
> include/linux/bpf.h | 10 +++++++++
> kernel/bpf/syscall.c | 37 +++++++++++++++++++++++++++++++
> kernel/trace/bpf_trace.c | 48 ++++++----------------------------------
> 3 files changed, 54 insertions(+), 41 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 0f677fdcfcc7..997765cdf474 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -304,6 +304,16 @@ struct bpf_map {
> s64 __percpu *elem_count;
> };
>
> +struct ptr_array {
> + void **ptrs;
> + int cnt;
> + int cap;
> +};
> +
> +int bpf_add_ptr(struct ptr_array *arr, void *ptr);
> +bool bpf_has_ptr(struct ptr_array *arr, struct module *mod);
> +int bpf_try_add_ptr(struct ptr_array *arr, void *ptr);
> +
> static inline const char *btf_field_type_name(enum btf_field_type type)
> {
> switch (type) {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index f63f4da4db5e..4f230fd1f8e4 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -479,6 +479,43 @@ static void bpf_map_release_memcg(struct bpf_map *map)
> }
> #endif
>
> +int bpf_add_ptr(struct ptr_array *arr, void *ptr)
> +{
> + void **ptrs;
> +
> + if (arr->cnt == arr->cap) {
> + arr->cap = max(16, arr->cap * 3 / 2);
> + ptrs = krealloc_array(arr->ptrs, arr->cap, sizeof(*ptrs), GFP_KERNEL);
> + if (!ptrs)
> + return -ENOMEM;
> + arr->ptrs = ptrs;
> + }
> +
> + arr->ptrs[arr->cnt] = ptr;
> + arr->cnt++;
> + return 0;
> +}
> +
> +bool bpf_has_ptr(struct ptr_array *arr, struct module *mod)

Don't you need 'void *mod' here?

> +{
> + int i;
> +
> + for (i = arr->cnt - 1; i >= 0; i--) {
> + if (arr->ptrs[i] == mod)
> + return true;
> + }
> + return false;
> +}

..

> - kprobe_multi_put_modules(arr.mods, arr.mods_cnt);
> - kfree(arr.mods);
> + kprobe_multi_put_modules((struct module **)arr.ptrs, arr.cnt);

Do you really need the type cast? Doesn't the compiler convert void**
automatically?
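For reference on the cast question, here is a small sketch of the C conversion rules involved. Only void * converts implicitly to and from other object pointer types. A void ** points to a void *, so it has no implicit conversion to struct module **. GCC and Clang diagnose the uncast call under -Wincompatible-pointer-types. as_module(), count_modules() and count_ptr_array() are hypothetical stand-ins, not kernel functions.

```c
#include <stddef.h>

/* Opaque stand-in for the kernel's struct module. */
struct module;

/* void * converts implicitly to any object pointer type: no cast needed. */
static struct module *as_module(void *p)
{
	return p;
}

/* Hypothetical consumer shaped like kprobe_multi_put_modules(): it takes
 * struct module ** and here just counts the non-NULL entries. */
static size_t count_modules(struct module **mods, size_t cnt)
{
	size_t n = 0;

	for (size_t i = 0; i < cnt; i++) {
		if (mods[i])
			n++;
	}
	return n;
}

/* void ** does not convert implicitly to struct module **, so passing ptrs
 * without the cast would be diagnosed as incompatible pointer types. */
static size_t count_ptr_array(void **ptrs, size_t cnt)
{
	return count_modules((struct module **)ptrs, cnt);
}
```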