2024-04-22 07:22:33

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

Arrays of progs are underlying using regular arrays, but they can only
be updated from a syscall.
Therefore, they should be safe to use while in a sleepable context.

This is required to be able to call bpf_tail_call() from a sleepable
tracing bpf program.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
Hi,

a small patch to allow to have:

```
SEC("fmod_ret.s/__hid_bpf_tail_call_sleepable")
int BPF_PROG(hid_tail_call_sleepable, struct hid_bpf_ctx *hctx)
{
bpf_tail_call(ctx, &hid_jmp_table, hctx->index);

return 0;
}
```

This should allow me to add bpf hooks to functions that communicate with
the hardware.

Cheers,
Benjamin
---
kernel/bpf/verifier.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 68cfd6fc6ad4..880b32795136 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18171,6 +18171,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
case BPF_MAP_TYPE_QUEUE:
case BPF_MAP_TYPE_STACK:
case BPF_MAP_TYPE_ARENA:
+ case BPF_MAP_TYPE_PROG_ARRAY:
break;
default:
verbose(env,

---
base-commit: 735f5b8a7ccf383e50d76f7d1c25769eee474812
change-id: 20240422-sleepable_array_progs-e0c07b17cabb

Best regards,
--
Benjamin Tissoires <[email protected]>



2024-04-22 15:44:52

by Daniel Borkmann

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On 4/22/24 9:16 AM, Benjamin Tissoires wrote:
> Arrays of progs are underlying using regular arrays, but they can only
> be updated from a syscall.
> Therefore, they should be safe to use while in a sleepable context.
>
> This is required to be able to call bpf_tail_call() from a sleepable
> tracing bpf program.
>
> Signed-off-by: Benjamin Tissoires <[email protected]>
> ---
> Hi,
>
> a small patch to allow to have:
>
> ```
> SEC("fmod_ret.s/__hid_bpf_tail_call_sleepable")
> int BPF_PROG(hid_tail_call_sleepable, struct hid_bpf_ctx *hctx)
> {
> bpf_tail_call(ctx, &hid_jmp_table, hctx->index);
>
> return 0;
> }
> ```
>
> This should allow me to add bpf hooks to functions that communicate with
> the hardware.

Could you also add selftests to it? In particular, I'm thinking that this is not
sufficient given also bpf_prog_map_compatible() needs to be extended to check on
prog->sleepable. For example we would need to disallow calling sleepable programs
in that map from non-sleepable context.

> kernel/bpf/verifier.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 68cfd6fc6ad4..880b32795136 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -18171,6 +18171,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
> case BPF_MAP_TYPE_QUEUE:
> case BPF_MAP_TYPE_STACK:
> case BPF_MAP_TYPE_ARENA:
> + case BPF_MAP_TYPE_PROG_ARRAY:
> break;
> default:
> verbose(env,
>
> ---
> base-commit: 735f5b8a7ccf383e50d76f7d1c25769eee474812
> change-id: 20240422-sleepable_array_progs-e0c07b17cabb
>
> Best regards,
>


2024-04-22 17:08:16

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On Apr 22 2024, Daniel Borkmann wrote:
> On 4/22/24 9:16 AM, Benjamin Tissoires wrote:
> > Arrays of progs are underlying using regular arrays, but they can only
> > be updated from a syscall.
> > Therefore, they should be safe to use while in a sleepable context.
> >
> > This is required to be able to call bpf_tail_call() from a sleepable
> > tracing bpf program.
> >
> > Signed-off-by: Benjamin Tissoires <[email protected]>
> > ---
> > Hi,
> >
> > a small patch to allow to have:
> >
> > ```
> > SEC("fmod_ret.s/__hid_bpf_tail_call_sleepable")
> > int BPF_PROG(hid_tail_call_sleepable, struct hid_bpf_ctx *hctx)
> > {
> > bpf_tail_call(ctx, &hid_jmp_table, hctx->index);
> >
> > return 0;
> > }
> > ```
> >
> > This should allow me to add bpf hooks to functions that communicate with
> > the hardware.
>
> Could you also add selftests to it? In particular, I'm thinking that this is not
> sufficient given also bpf_prog_map_compatible() needs to be extended to check on
> prog->sleepable. For example we would need to disallow calling sleepable programs
> in that map from non-sleepable context.

Just to be sure, if I have to change bpf_prog_map_compatible(), that
means that a prog array map can only have sleepable or non-sleepable
programs, but not both at the same time?

FWIW, indeed, I just tested and the BPF verifier/core is happy with this
patch only if the bpf_tail_call is issued from a non-sleepable context
(and crashes as expected).

But that seems to be a different issue TBH: I can store a sleepable BPF
program in a prog array and run it from a non sleepable context. I don't
need the patch at all as bpf_tail_call() is normally declared. I assume
your suggestion to change bpf_prog_map_compatible() will fix that part.

I'll digg some more tomorrow.

Cheers,
Benjamin

>
> > kernel/bpf/verifier.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 68cfd6fc6ad4..880b32795136 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -18171,6 +18171,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
> > case BPF_MAP_TYPE_QUEUE:
> > case BPF_MAP_TYPE_STACK:
> > case BPF_MAP_TYPE_ARENA:
> > + case BPF_MAP_TYPE_PROG_ARRAY:
> > break;
> > default:
> > verbose(env,
> >
> > ---
> > base-commit: 735f5b8a7ccf383e50d76f7d1c25769eee474812
> > change-id: 20240422-sleepable_array_progs-e0c07b17cabb
> >
> > Best regards,
> >
>

2024-04-24 14:28:13

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On Apr 22 2024, Benjamin Tissoires wrote:
> On Apr 22 2024, Daniel Borkmann wrote:
> > On 4/22/24 9:16 AM, Benjamin Tissoires wrote:
> > > Arrays of progs are underlying using regular arrays, but they can only
> > > be updated from a syscall.
> > > Therefore, they should be safe to use while in a sleepable context.
> > >
> > > This is required to be able to call bpf_tail_call() from a sleepable
> > > tracing bpf program.
> > >
> > > Signed-off-by: Benjamin Tissoires <[email protected]>
> > > ---
> > > Hi,
> > >
> > > a small patch to allow to have:
> > >
> > > ```
> > > SEC("fmod_ret.s/__hid_bpf_tail_call_sleepable")
> > > int BPF_PROG(hid_tail_call_sleepable, struct hid_bpf_ctx *hctx)
> > > {
> > > bpf_tail_call(ctx, &hid_jmp_table, hctx->index);
> > >
> > > return 0;
> > > }
> > > ```
> > >
> > > This should allow me to add bpf hooks to functions that communicate with
> > > the hardware.
> >
> > Could you also add selftests to it? In particular, I'm thinking that this is not
> > sufficient given also bpf_prog_map_compatible() needs to be extended to check on
> > prog->sleepable. For example we would need to disallow calling sleepable programs
> > in that map from non-sleepable context.
>
> Just to be sure, if I have to change bpf_prog_map_compatible(), that
> means that a prog array map can only have sleepable or non-sleepable
> programs, but not both at the same time?
>
> FWIW, indeed, I just tested and the BPF verifier/core is happy with this
> patch only if the bpf_tail_call is issued from a non-sleepable context
> (and crashes as expected).
>
> But that seems to be a different issue TBH: I can store a sleepable BPF
> program in a prog array and run it from a non sleepable context. I don't
> need the patch at all as bpf_tail_call() is normally declared. I assume
> your suggestion to change bpf_prog_map_compatible() will fix that part.
>
> I'll digg some more tomorrow.
>

Quick update:
forcing the prog array to only contain sleepable programs or not seems
to do the trick, but I'm down a rabbit hole as when I return from my
trampoline, I get an invalid page fault, trying to execute NX-protected
page.

I'll report if it's because of HID-BPF or if there are more work to be
doing for bpf_tail_call (which I suspect).

Cheers,
Benjamin

2024-04-24 19:45:29

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On Wed, Apr 24, 2024 at 7:17 AM Benjamin Tissoires <[email protected]> wrote:
>
> On Apr 22 2024, Benjamin Tissoires wrote:
> > On Apr 22 2024, Daniel Borkmann wrote:
> > > On 4/22/24 9:16 AM, Benjamin Tissoires wrote:
> > > > Arrays of progs are underlying using regular arrays, but they can only
> > > > be updated from a syscall.
> > > > Therefore, they should be safe to use while in a sleepable context.
> > > >
> > > > This is required to be able to call bpf_tail_call() from a sleepable
> > > > tracing bpf program.
> > > >
> > > > Signed-off-by: Benjamin Tissoires <[email protected]>
> > > > ---
> > > > Hi,
> > > >
> > > > a small patch to allow to have:
> > > >
> > > > ```
> > > > SEC("fmod_ret.s/__hid_bpf_tail_call_sleepable")
> > > > int BPF_PROG(hid_tail_call_sleepable, struct hid_bpf_ctx *hctx)
> > > > {
> > > > bpf_tail_call(ctx, &hid_jmp_table, hctx->index);
> > > >
> > > > return 0;
> > > > }
> > > > ```
> > > >
> > > > This should allow me to add bpf hooks to functions that communicate with
> > > > the hardware.
> > >
> > > Could you also add selftests to it? In particular, I'm thinking that this is not
> > > sufficient given also bpf_prog_map_compatible() needs to be extended to check on
> > > prog->sleepable. For example we would need to disallow calling sleepable programs
> > > in that map from non-sleepable context.
> >
> > Just to be sure, if I have to change bpf_prog_map_compatible(), that
> > means that a prog array map can only have sleepable or non-sleepable
> > programs, but not both at the same time?
> >
> > FWIW, indeed, I just tested and the BPF verifier/core is happy with this
> > patch only if the bpf_tail_call is issued from a non-sleepable context
> > (and crashes as expected).
> >
> > But that seems to be a different issue TBH: I can store a sleepable BPF
> > program in a prog array and run it from a non sleepable context. I don't
> > need the patch at all as bpf_tail_call() is normally declared. I assume
> > your suggestion to change bpf_prog_map_compatible() will fix that part.
> >
> > I'll digg some more tomorrow.
> >
>
> Quick update:
> forcing the prog array to only contain sleepable programs or not seems
> to do the trick, but I'm down a rabbit hole as when I return from my
> trampoline, I get an invalid page fault, trying to execute NX-protected
> page.
>
> I'll report if it's because of HID-BPF or if there are more work to be
> doing for bpf_tail_call (which I suspect).

bpf_tail_call is an old mechanism.
Instead of making it work for sleepable (which is ok to do)
have you considered using "freplace" logic to "add bpf hooks to functions" ?
You can have a global noinline function and replace it at run-time
with another bpf program.
Like:
__attribute__ ((noinline))
int get_constant(long val)
{
return val - 122;
}

in progs/test_pkt_access.c

is replaced with progs/freplace_get_constant.c

With freplace you can pass normal arguments, do the call and get
return value, while with bpf_tail_call it's ctx only and no return.

2024-04-30 10:05:37

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On Apr 24 2024, Alexei Starovoitov wrote:
> On Wed, Apr 24, 2024 at 7:17 AM Benjamin Tissoires <[email protected]> wrote:
> >
> > On Apr 22 2024, Benjamin Tissoires wrote:
> > > On Apr 22 2024, Daniel Borkmann wrote:
> > > > On 4/22/24 9:16 AM, Benjamin Tissoires wrote:
> > > > > Arrays of progs are underlying using regular arrays, but they can only
> > > > > be updated from a syscall.
> > > > > Therefore, they should be safe to use while in a sleepable context.
> > > > >
> > > > > This is required to be able to call bpf_tail_call() from a sleepable
> > > > > tracing bpf program.
> > > > >
> > > > > Signed-off-by: Benjamin Tissoires <[email protected]>
> > > > > ---
> > > > > Hi,
> > > > >
> > > > > a small patch to allow to have:
> > > > >
> > > > > ```
> > > > > SEC("fmod_ret.s/__hid_bpf_tail_call_sleepable")
> > > > > int BPF_PROG(hid_tail_call_sleepable, struct hid_bpf_ctx *hctx)
> > > > > {
> > > > > bpf_tail_call(ctx, &hid_jmp_table, hctx->index);
> > > > >
> > > > > return 0;
> > > > > }
> > > > > ```
> > > > >
> > > > > This should allow me to add bpf hooks to functions that communicate with
> > > > > the hardware.
> > > >
> > > > Could you also add selftests to it? In particular, I'm thinking that this is not
> > > > sufficient given also bpf_prog_map_compatible() needs to be extended to check on
> > > > prog->sleepable. For example we would need to disallow calling sleepable programs
> > > > in that map from non-sleepable context.
> > >
> > > Just to be sure, if I have to change bpf_prog_map_compatible(), that
> > > means that a prog array map can only have sleepable or non-sleepable
> > > programs, but not both at the same time?
> > >
> > > FWIW, indeed, I just tested and the BPF verifier/core is happy with this
> > > patch only if the bpf_tail_call is issued from a non-sleepable context
> > > (and crashes as expected).
> > >
> > > But that seems to be a different issue TBH: I can store a sleepable BPF
> > > program in a prog array and run it from a non sleepable context. I don't
> > > need the patch at all as bpf_tail_call() is normally declared. I assume
> > > your suggestion to change bpf_prog_map_compatible() will fix that part.
> > >
> > > I'll digg some more tomorrow.
> > >
> >
> > Quick update:
> > forcing the prog array to only contain sleepable programs or not seems
> > to do the trick, but I'm down a rabbit hole as when I return from my
> > trampoline, I get an invalid page fault, trying to execute NX-protected
> > page.
> >
> > I'll report if it's because of HID-BPF or if there are more work to be
> > doing for bpf_tail_call (which I suspect).
>
> bpf_tail_call is an old mechanism.
> Instead of making it work for sleepable (which is ok to do)
> have you considered using "freplace" logic to "add bpf hooks to functions" ?
> You can have a global noinline function and replace it at run-time
> with another bpf program.
> Like:
> __attribute__ ((noinline))
> int get_constant(long val)
> {
> return val - 122;
> }
>
> in progs/test_pkt_access.c
>
> is replaced with progs/freplace_get_constant.c
>
> With freplace you can pass normal arguments, do the call and get
> return value, while with bpf_tail_call it's ctx only and no return.

This is interesting. Thanks!

However, I'm not sure that this would fit for my use case.

Basically, what I am doing is storing a list of bpf program I want to
run on a particular device for a given function.

Right now, what I am doing is (in simplified pseudo code):
- in a bpf program, the user calls hid_bpf_attach_prog(hid_device, program_fd)
where program fd is a tracing program on a never executed function
but this allows to know the type of program to run
- the kernel stores that program into a dedicated prog array bpf_map
pre-loaded at boot time
- when a event comes in, the kernel walks through the list of attached
programs, calls __hid_bpf_tail_call() and there is a tracing program
attached to it that just do the bpf_tail_call.

This works and is simple enough from the user point of view, but is
rather inefficient and clunky from the kernel point of view IMO.

The freplace mechnism would definitely work if I had a tracing-like
function to call, where I need to run the program any time the function
gets called. But given that I want per-device filtering, I'm not sure
how I could make this work. But given that I need to enable or not the
bpf_program, I'm not sure how I could make it work from the kernel point
of view.

I tried using a simple bpf_prog_run() (which is exactly what I need in
the end) but I couldn't really convince the bpf verifier that the
provided context is a struct hid_bpf_ctx kernel pointer, and it felt not
quite right.

So after seeing how the bpf_wq worked internally, and how simple it is
now to call a bpf program from the kernel as a simple function call, I
played around with allowing kfunc to declare async callback functions.

I have a working prototype (probably not fully functional for all of the
cases), but I would like to know if you think it would be interesting to
have 3 new suffixes:
- "__async" for declaring an static bpf program that can be stored in
the kernel and which would be non sleepable
- "__s_async" same as before, but for sleepable operations
- "__aux" (or "__prog_aux") for that extra parameter to
bpf_wq_set_callback_impl() which contains the struct bpf_prog*.

(I still don't have the __aux yet FWIW)

The way I'm doing it is looking at the btf information to fetch the
signature of the parameters of the callback, this way we can declare any
callback without having to teach the verifier of is arguments (5 max).

Is this something you would be comfortable with or is there a simpler
mechanism already in place to call the bpf programs from the kernel
without the ctx limitations?

I can also easily switch the bpf_wq specific cases in the verifier with
those suffixes. There are still one or two wq specifics I haven't
implemented through __s_async, but that would still makes things more
generic.

Cheers,
Benjamin

2024-05-07 00:27:32

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On Tue, Apr 30, 2024 at 3:03 AM Benjamin Tissoires <[email protected]> wrote:
>
>
> Right now, what I am doing is (in simplified pseudo code):
> - in a bpf program, the user calls hid_bpf_attach_prog(hid_device, program_fd)
> where program fd is a tracing program on a never executed function
> but this allows to know the type of program to run
> - the kernel stores that program into a dedicated prog array bpf_map
> pre-loaded at boot time
> - when a event comes in, the kernel walks through the list of attached
> programs, calls __hid_bpf_tail_call() and there is a tracing program
> attached to it that just do the bpf_tail_call.
>
> This works and is simple enough from the user point of view, but is
> rather inefficient and clunky from the kernel point of view IMO.
>
> The freplace mechnism would definitely work if I had a tracing-like
> function to call, where I need to run the program any time the function
> gets called. But given that I want per-device filtering, I'm not sure
> how I could make this work. But given that I need to enable or not the
> bpf_program, I'm not sure how I could make it work from the kernel point
> of view.
>
> I tried using a simple bpf_prog_run() (which is exactly what I need in
> the end) but I couldn't really convince the bpf verifier that the
> provided context is a struct hid_bpf_ctx kernel pointer, and it felt not
> quite right.
>
> So after seeing how the bpf_wq worked internally, and how simple it is
> now to call a bpf program from the kernel as a simple function call, I
> played around with allowing kfunc to declare async callback functions.
>
> I have a working prototype (probably not fully functional for all of the
> cases), but I would like to know if you think it would be interesting to
> have 3 new suffixes:
> - "__async" for declaring an static bpf program that can be stored in
> the kernel and which would be non sleepable
> - "__s_async" same as before, but for sleepable operations
> - "__aux" (or "__prog_aux") for that extra parameter to
> bpf_wq_set_callback_impl() which contains the struct bpf_prog*.

Sorry for the delay. I've been traveling.

I don't quite understand how these suffixes will work.
You mean arguments of kfuncs to tell kfunc that an argument
is a pointer to async callback?
Sort-of generalization of is_async_callback_calling_kfunc() ?
I cannot connect the dots.

Feels dangerous to open up bpf prog calling to arbitrary kfuncs.
wq/timer/others are calling progs with:
callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0);
I feel we'll struggle to code review kfuncs that do such things.
Plus all prog life time management concerns.
wq/timer do careful bpf_prog_put/get dance.

> (I still don't have the __aux yet FWIW)
>
> The way I'm doing it is looking at the btf information to fetch the
> signature of the parameters of the callback, this way we can declare any
> callback without having to teach the verifier of is arguments (5 max).
>
> Is this something you would be comfortable with or is there a simpler
> mechanism already in place to call the bpf programs from the kernel
> without the ctx limitations?

"ctx limitations" you mean 5 args?
Have you looked at struct_ops ?
It can have any number of args.
Maybe declare struct_ops in hid and let user space provide struct_ops progs?

> I can also easily switch the bpf_wq specific cases in the verifier with
> those suffixes. There are still one or two wq specifics I haven't
> implemented through __s_async, but that would still makes things more
> generic.
>
> Cheers,
> Benjamin

2024-05-07 13:45:28

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On May 06 2024, Alexei Starovoitov wrote:
> On Tue, Apr 30, 2024 at 3:03 AM Benjamin Tissoires <[email protected]> wrote:
> >
> >
> > Right now, what I am doing is (in simplified pseudo code):
> > - in a bpf program, the user calls hid_bpf_attach_prog(hid_device, program_fd)
> > where program fd is a tracing program on a never executed function
> > but this allows to know the type of program to run
> > - the kernel stores that program into a dedicated prog array bpf_map
> > pre-loaded at boot time
> > - when a event comes in, the kernel walks through the list of attached
> > programs, calls __hid_bpf_tail_call() and there is a tracing program
> > attached to it that just do the bpf_tail_call.
> >
> > This works and is simple enough from the user point of view, but is
> > rather inefficient and clunky from the kernel point of view IMO.
> >
> > The freplace mechnism would definitely work if I had a tracing-like
> > function to call, where I need to run the program any time the function
> > gets called. But given that I want per-device filtering, I'm not sure
> > how I could make this work. But given that I need to enable or not the
> > bpf_program, I'm not sure how I could make it work from the kernel point
> > of view.
> >
> > I tried using a simple bpf_prog_run() (which is exactly what I need in
> > the end) but I couldn't really convince the bpf verifier that the
> > provided context is a struct hid_bpf_ctx kernel pointer, and it felt not
> > quite right.
> >
> > So after seeing how the bpf_wq worked internally, and how simple it is
> > now to call a bpf program from the kernel as a simple function call, I
> > played around with allowing kfunc to declare async callback functions.
> >
> > I have a working prototype (probably not fully functional for all of the
> > cases), but I would like to know if you think it would be interesting to
> > have 3 new suffixes:
> > - "__async" for declaring an static bpf program that can be stored in
> > the kernel and which would be non sleepable
> > - "__s_async" same as before, but for sleepable operations
> > - "__aux" (or "__prog_aux") for that extra parameter to
> > bpf_wq_set_callback_impl() which contains the struct bpf_prog*.
>
> Sorry for the delay. I've been traveling.

no worries. I'll be off myself the next few days too :)

>
> I don't quite understand how these suffixes will work.
> You mean arguments of kfuncs to tell kfunc that an argument
> is a pointer to async callback?
> Sort-of generalization of is_async_callback_calling_kfunc() ?
> I cannot connect the dots.

Yes, exactly that. See [0] for my current WIP. I've just sent it, not
for reviews, but so you see what I meant here.

>
> Feels dangerous to open up bpf prog calling to arbitrary kfuncs.
> wq/timer/others are calling progs with:
> callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0);
> I feel we'll struggle to code review kfuncs that do such things.
> Plus all prog life time management concerns.
> wq/timer do careful bpf_prog_put/get dance.

That part is indeed painful.

>
> > (I still don't have the __aux yet FWIW)
> >
> > The way I'm doing it is looking at the btf information to fetch the
> > signature of the parameters of the callback, this way we can declare any
> > callback without having to teach the verifier of is arguments (5 max).
> >
> > Is this something you would be comfortable with or is there a simpler
> > mechanism already in place to call the bpf programs from the kernel
> > without the ctx limitations?
>
> "ctx limitations" you mean 5 args?

No, I was meaning the special context argument for bpf_prog_run. But
anyway, this is relatively moot if struct_ops that you mention below is
working :)

> Have you looked at struct_ops ?
> It can have any number of args.
> Maybe declare struct_ops in hid and let user space provide struct_ops progs?

Last time I checked, I thought struct_ops were only for defining one set
of operations. And you could overwrite them exactly once.
But after reading more carefully how it was used in tcp_cong.c, it seems
we can have multiple programs which define the same struct_ops, and then
it's the kernel which will choose which one needs to be run.

AFAICT there is no sleepable context concerns as well, it depends on how
the kernel starts the various ops.

Last, I'm not entirely sure how I can specify which struct_ops needs to be
attached to which device, but it's worth a shot. I've already realized
that I would probably have to drop the current way of HID-BPF is running,
so now it's just technical bits to assemble :)

>
> > I can also easily switch the bpf_wq specific cases in the verifier with
> > those suffixes. There are still one or two wq specifics I haven't
> > implemented through __s_async, but that would still makes things more
> > generic.
> >

Cheers,
Benjamin

[0] https://lore.kernel.org/bpf/[email protected]/T/#t

2024-05-08 02:03:37

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On Tue, May 7, 2024 at 6:32 AM Benjamin Tissoires <[email protected]> wrote:
>
> Yes, exactly that. See [0] for my current WIP. I've just sent it, not
> for reviews, but so you see what I meant here.

The patches helped to understand, for sure, and on surface
they kind of make sense, but without seeing what is that
hid specific kfunc that will use it
it's hard to make a call.
The (u64)(long) casting concerns and prog lifetime are
difficult to get right. The verifier won't help and it all
will fall on code reviews.
So I'd rather not go this route.
Let's explore first what exactly the goal here.
We've talked about sleepable tail_calls, this async callbacks
from hid kfuncs, and struct-ops.
Maybe none of them fit well and we need something else.
Could you please explain (maybe once again) what is the end goal?

> Last time I checked, I thought struct_ops were only for defining one set
> of operations. And you could overwrite them exactly once.
> But after reading more carefully how it was used in tcp_cong.c, it seems
> we can have multiple programs which define the same struct_ops, and then
> it's the kernel which will choose which one needs to be run.

struct-ops is pretty much a mechanism for kernel to define
a set of callbacks and bpf prog to provide implementation for
these callbacks. The kernel choses when to call them.
tcp-bpf is one such user. sched_ext is another and more advanced.
Currently struct-ops bpf prog loading/attaching mechanism
only specifies the struct-ops. There is no device-id argument,
but that can be extended and kernel can keep per-device a set
of bpf progs.
struct-ops is a bit of overkill if you have only one callback.
It's typically for a set of callbacks.

> Last, I'm not entirely sure how I can specify which struct_ops needs to be
> attached to which device, but it's worth a shot. I've already realized
> that I would probably have to drop the current way of HID-BPF is running,
> so now it's just technical bits to assemble :)

You need to call different bpf progs per device, right?
If indirect call is fine from performance pov,
then tailcall or struct_ops+device_argument might fit.

If you want max perf with direct calls then
we'd need to generalize xdp dispatcher.

So far it sounds that tailcalls might be the best actually,
since prog lifetime is handled by prog array map.
Maybe instead of bpf_tail_call helper we should add a kfunc that
will operate on prog array differently?
(if current bpf_tail_call semantics don't fit).

2024-05-08 12:03:55

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On May 07 2024, Alexei Starovoitov wrote:
> On Tue, May 7, 2024 at 6:32 AM Benjamin Tissoires <[email protected]> wrote:
> >
> > Yes, exactly that. See [0] for my current WIP. I've just sent it, not
> > for reviews, but so you see what I meant here.
>
> The patches helped to understand, for sure, and on surface
> they kind of make sense, but without seeing what is that
> hid specific kfunc that will use it
> it's hard to make a call.

I've posted my HID WIP on [1]. It probably won't compile as my local
original branch was having a merge of HID and bpf trees.

> The (u64)(long) casting concerns and prog lifetime are
> difficult to get right. The verifier won't help and it all
> will fall on code reviews.

yeah, this is a concern.

> So I'd rather not go this route.
> Let's explore first what exactly the goal here.
> We've talked about sleepable tail_calls, this async callbacks
> from hid kfuncs, and struct-ops.
> Maybe none of them fit well and we need something else.
> Could you please explain (maybe once again) what is the end goal?

right now I need 4 hooks in HID, the first 2 are already upstream:
- whenever I need to retrieve the report descriptor (this happens in a
sleepable context, but non sleepable is fine)
- whenever I receive an event from a device (non sleepable context, this
is coming from a hard IRQ context)
- whenever someone tries to write to the device through
hid_hw_raw_request (original context is sleepable, and for being able
to communicate with the device we need sleepable context in bpf)
- same but from hid_hw_output_report

Again, the first 2 are working just fine.

Implementing the latter twos requires sleepable context because we
might:

1. a request is made from user-space
2. we jump into hid-bpf
3. the bpf program "converts" the request from report ID 1 to 2 (because
we export a slightly different API)
4. the bpf program directly emits hid_bpf_raw_request (sleepable
operation)
5. the bpf program returns the correct value
6. hid-core doesn't attempt to communicate with the device as bpf
already did.

In the series, I also realized that I need sleepable and non sleepable
contexts for this kind of situation, because I want tracing and
firewalling available (non sleepable context), while still allowing to
communicate with the device. But when you communicate with the device
from bpf, the sleepable bpf program is not invoked or this allows
infinite loops.

>
> > Last time I checked, I thought struct_ops were only for defining one set
> > of operations. And you could overwrite them exactly once.
> > But after reading more carefully how it was used in tcp_cong.c, it seems
> > we can have multiple programs which define the same struct_ops, and then
> > it's the kernel which will choose which one needs to be run.
>
> struct-ops is pretty much a mechanism for kernel to define
> a set of callbacks and bpf prog to provide implementation for
> these callbacks. The kernel choses when to call them.
> tcp-bpf is one such user. sched_ext is another and more advanced.
> Currently struct-ops bpf prog loading/attaching mechanism
> only specifies the struct-ops. There is no device-id argument,
> but that can be extended and kernel can keep per-device a set
> of bpf progs.
> struct-ops is a bit of overkill if you have only one callback.
> It's typically for a set of callbacks.

In the end I have 4. However, I might have programs that overwrite twice
the same callback (see the 2 SEC("fmod_ret/hid_bpf_device_event") in
[2]).

>
> > Last, I'm not entirely sure how I can specify which struct_ops needs to be
> > attached to which device, but it's worth a shot. I've already realized
> > that I would probably have to drop the current way of HID-BPF is running,
> > so now it's just technical bits to assemble :)
>
> You need to call different bpf progs per device, right?

yes

> If indirect call is fine from performance pov,
> then tailcall or struct_ops+device_argument might fit.

performance is not a requirement. It's better if we have low latency but
we are not talking the same requirements than network.

>
> If you want max perf with direct calls then
> we'd need to generalize xdp dispatcher.

I'll need to have a deeper look at it, yeah.

>
> So far it sounds that tailcalls might be the best actually,
> since prog lifetime is handled by prog array map.
> Maybe instead of bpf_tail_call helper we should add a kfunc that
> will operate on prog array differently?
> (if current bpf_tail_call semantics don't fit).

Actually I'd like to remove bpf_tail_call entirely, because it requires
to pre-load a BPF program at boot, and in some situations (RHEL) this
creates issues. I haven't been able to debug what was happening, I
couldn't reproduce it myself, but removing that bit would be nice :)

Cheers,
Benjamin

[1] https://lore.kernel.org/bpf/[email protected]/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git/tree/drivers/hid/bpf/progs/XPPen__ArtistPro16Gen2.bpf.c?h=for-next

2024-05-08 22:25:57

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On Wed, May 8, 2024 at 4:53 AM Benjamin Tissoires <[email protected]> wrote:
>
> On May 07 2024, Alexei Starovoitov wrote:
> > On Tue, May 7, 2024 at 6:32 AM Benjamin Tissoires <[email protected]> wrote:
> > >
> > > Yes, exactly that. See [0] for my current WIP. I've just sent it, not
> > > for reviews, but so you see what I meant here.
> >
> > The patches helped to understand, for sure, and on surface
> > they kind of make sense, but without seeing what is that
> > hid specific kfunc that will use it
> > it's hard to make a call.
>
> I've posted my HID WIP on [1]. It probably won't compile as my local
> original branch was having a merge of HID and bpf trees.

Thanks for this context.
Now it makes a lot more sense.
And the first patches look fine and simplification is impressive,
but this part:
+ SEC("syscall")
+ int name(struct attach_prog_args *ctx)
+ {
+ ctx->retval = hid_bpf_attach_prog_impl(ctx->hid,
+ type,
+ __##name,
+ ctx->insert_head ? HID_BPF_FLAG_INSERT_HEAD :
+ HID_BPF_FLAG_NONE,
+ NULL);

is too odd.
Essentially you're adding a kfunc on hid side just to call it
from a temporary "syscall" program to register another prog.
A fake prog just to call a kfunc is a bit too much.

The overall mechanism is pretty much a customized struct-ops.

I think struct-ops infra provides better api-s, safety guarantees,
bpf_link support, prog management including reentrance check, etc.
It needs to be parametrized, so it's not just
SEC("struct_ops/kern_callback_name")
so that the skeleton loading phase can pass device id or something.

> > The (u64)(long) casting concerns and prog lifetime are
> > difficult to get right. The verifier won't help and it all
> > will fall on code reviews.
>
> yeah, this is a concern.

Not only that. The special kfunc does migrate_disable
before calling callback, but it needs rcu_lock or tracing lock,
plus reentrance checks.

>
> > So I'd rather not go this route.
> > Let's explore first what exactly the goal here.
> > We've talked about sleepable tail_calls, this async callbacks
> > from hid kfuncs, and struct-ops.
> > Maybe none of them fit well and we need something else.
> > Could you please explain (maybe once again) what is the end goal?
>
> right now I need 4 hooks in HID, the first 2 are already upstream:
> - whenever I need to retrieve the report descriptor (this happens in a
> sleepable context, but non sleepable is fine)
> - whenever I receive an event from a device (non sleepable context, this
> is coming from a hard IRQ context)
> - whenever someone tries to write to the device through
> hid_hw_raw_request (original context is sleepable, and for being able
> to communicate with the device we need sleepable context in bpf)
> - same but from hid_hw_output_report
>
> Again, the first 2 are working just fine.
>
> Implementing the latter twos requires sleepable context because we
> might:
>
> 1. a request is made from user-space
> 2. we jump into hid-bpf
> 3. the bpf program "converts" the request from report ID 1 to 2 (because
> we export a slightly different API)
> 4. the bpf program directly emits hid_bpf_raw_request (sleepable
> operation)
> 5. the bpf program returns the correct value
> 6. hid-core doesn't attempt to communicate with the device as bpf
> already did.
>
> In the series, I also realized that I need sleepable and non sleepable
> contexts for this kind of situation, because I want tracing and
> firewalling available (non sleepable context), while still allowing to
> communicate with the device. But when you communicate with the device
> from bpf, the sleepable bpf program is not invoked or this allows
> infinite loops.

I don't get the point about infinite loops.
fyi struct_ops already supports sleepable and non-sleepable callbacks.
See progs/dummy_st_ops_success.c
SEC(".struct_ops")
struct bpf_dummy_ops dummy_1 = {
.test_1 = (void *)test_1,
.test_2 = (void *)test_2,
.test_sleepable = (void *)test_sleepable,
};

two callbacks are normal and another one is sleepable.

The generated bpf trampoline will have the right
__bpf_prog_enter* wrappers for all 3 progs,
so the kernel code will be just do ops->callback_name().

> >
> > > Last time I checked, I thought struct_ops were only for defining one set
> > > of operations. And you could overwrite them exactly once.
> > > But after reading more carefully how it was used in tcp_cong.c, it seems
> > > we can have multiple programs which define the same struct_ops, and then
> > > it's the kernel which will choose which one needs to be run.
> >
> > struct-ops is pretty much a mechanism for kernel to define
> > a set of callbacks and bpf prog to provide implementation for
> > these callbacks. The kernel choses when to call them.
> > tcp-bpf is one such user. sched_ext is another and more advanced.
> > Currently struct-ops bpf prog loading/attaching mechanism
> > only specifies the struct-ops. There is no device-id argument,
> > but that can be extended and kernel can keep per-device a set
> > of bpf progs.
> > struct-ops is a bit of overkill if you have only one callback.
> > It's typically for a set of callbacks.
>
> In the end I have 4. However, I might have programs that overwrite twice
> the same callback (see the 2 SEC("fmod_ret/hid_bpf_device_event") in
> [2]).
>
> >
> > > Last, I'm not entirely sure how I can specify which struct_ops needs to be
> > > attached to which device, but it's worth a shot. I've already realized
> > > that I would probably have to drop the current way of HID-BPF is running,
> > > so now it's just technical bits to assemble :)
> >
> > You need to call different bpf progs per device, right?
>
> yes
>
> > If indirect call is fine from performance pov,
> > then tailcall or struct_ops+device_argument might fit.
>
> performance is not a requirement. It's better if we have low latency but
> we are not talking the same requirements than network.
>
> >
> > If you want max perf with direct calls then
> > we'd need to generalize xdp dispatcher.
>
> I'll need to have a deeper look at it, yeah.
>
> >
> > So far it sounds that tailcalls might be the best actually,
> > since prog lifetime is handled by prog array map.
> > Maybe instead of bpf_tail_call helper we should add a kfunc that
> > will operate on prog array differently?
> > (if current bpf_tail_call semantics don't fit).
>
> Actually I'd like to remove bpf_tail_call entirely, because it requires
> to pre-load a BPF program at boot, and in some situations (RHEL) this
> creates issues. I haven't been able to debug what was happening, I
> couldn't reproduce it myself, but removing that bit would be nice :)

We probably need to debug it anyway, since it sounds that it's
related to preloaded bpf skeleton and not tail_call logic itself.

After looking through all that it seems to me that
parametrized struct-ops is the way to go.

2024-05-10 10:04:31

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On May 08 2024, Alexei Starovoitov wrote:
> On Wed, May 8, 2024 at 4:53 AM Benjamin Tissoires <[email protected]> wrote:
> >
> > On May 07 2024, Alexei Starovoitov wrote:
> > > On Tue, May 7, 2024 at 6:32 AM Benjamin Tissoires <[email protected]> wrote:
> > > >
> > > > Yes, exactly that. See [0] for my current WIP. I've just sent it, not
> > > > for reviews, but so you see what I meant here.
> > >
> > > The patches helped to understand, for sure, and on surface
> > > they kind of make sense, but without seeing what is that
> > > hid specific kfunc that will use it
> > > it's hard to make a call.
> >
> > I've posted my HID WIP on [1]. It probably won't compile as my local
> > original branch was having a merge of HID and bpf trees.
>
> Thanks for this context.
> Now it makes a lot more sense.
> And the first patches look fine and simplification is impressive,
> but this part:
> + SEC("syscall")
> + int name(struct attach_prog_args *ctx)
> + {
> + ctx->retval = hid_bpf_attach_prog_impl(ctx->hid,
> + type,
> + __##name,
> + ctx->insert_head ? HID_BPF_FLAG_INSERT_HEAD :
> + HID_BPF_FLAG_NONE,
> + NULL);
>
> is too odd.
> Essentially you're adding a kfunc on hid side just to call it
> from a temporary "syscall" program to register another prog.
> A fake prog just to call a kfunc is a bit too much.
>
> The overall mechanism is pretty much a customized struct-ops.
>
> I think struct-ops infra provides better api-s, safety guarantees,
> bpf_link support, prog management including reentrance check, etc.

Ack!

> It needs to be parametrized, so it's not just
> SEC("struct_ops/kern_callback_name")
> so that the skeleton loading phase can pass device id or something.

I'm not sure how to parametrize staticaly. I can rely on the modalias,
but then that might not be ideal. Right now my loader gets called by a
udev rule, and then call a .probe syscall. If this returns success, then
the bpf programs are attached to the given hid device.

I saw that the struct_ops can have "data" fields. If we can change the
static value before calling attach, I should have a register callback
being able to retrieve that hid id, and then attach the struct_ops to
the correct hid device in the jump table. Anyway, I'll test this later.

Something along:

SEC(".struct_ops")
struct hid_bpf_ops dummy_1 = {
.input_event = (void *)test_1,
.rdesc_fixup = (void *)test_2,
.hw_raw_request = (void *)test_sleepable,
.hid_id = 0,
};

And in the loader, I call __load(), change the value ->hid_id, and then
__attach().

>
> > > The (u64)(long) casting concerns and prog lifetime are
> > > difficult to get right. The verifier won't help and it all
> > > will fall on code reviews.
> >
> > yeah, this is a concern.
>
> Not only that. The special kfunc does migrate_disable
> before calling callback, but it needs rcu_lock or tracing lock,
> plus reentrance checks.
>
> >
> > > So I'd rather not go this route.
> > > Let's explore first what exactly the goal here.
> > > We've talked about sleepable tail_calls, this async callbacks
> > > from hid kfuncs, and struct-ops.
> > > Maybe none of them fit well and we need something else.
> > > Could you please explain (maybe once again) what is the end goal?
> >
> > right now I need 4 hooks in HID, the first 2 are already upstream:
> > - whenever I need to retrieve the report descriptor (this happens in a
> > sleepable context, but non sleepable is fine)
> > - whenever I receive an event from a device (non sleepable context, this
> > is coming from a hard IRQ context)
> > - whenever someone tries to write to the device through
> > hid_hw_raw_request (original context is sleepable, and for being able
> > to communicate with the device we need sleepable context in bpf)
> > - same but from hid_hw_output_report
> >
> > Again, the first 2 are working just fine.
> >
> > Implementing the latter twos requires sleepable context because we
> > might:
> >
> > 1. a request is made from user-space
> > 2. we jump into hid-bpf
> > 3. the bpf program "converts" the request from report ID 1 to 2 (because
> > we export a slightly different API)
> > 4. the bpf program directly emits hid_bpf_raw_request (sleepable
> > operation)
> > 5. the bpf program returns the correct value
> > 6. hid-core doesn't attempt to communicate with the device as bpf
> > already did.
> >
> > In the series, I also realized that I need sleepable and non sleepable
> > contexts for this kind of situation, because I want tracing and
> > firewalling available (non sleepable context), while still allowing to
> > communicate with the device. But when you communicate with the device
> > from bpf, the sleepable bpf program is not invoked or this allows
> > infinite loops.
>
> I don't get the point about infinite loops.

If I don´t put restrictions on how the bpf program communicate with the
device I might have:

1. someone calls hid_hw_raw_request from hidraw
2. bpf jumps into filter for hid_hw_raw_request
3. the bpf program calls hid_bpf_raw_request (which internally calls
hid_hw_raw_request)
4. go back to 2.

But again, not a big deal: if I do not allow entering a sleepable bpf
program from hid_bpf_raw_request (so from a bpf program), instead of 4.
above, we prevent entering the same bpf program as the program in 2.
needs to be sleepable.

> fyi struct_ops already supports sleepable and non-sleepable callbacks.
> See progs/dummy_st_ops_success.c
> SEC(".struct_ops")
> struct bpf_dummy_ops dummy_1 = {
> .test_1 = (void *)test_1,
> .test_2 = (void *)test_2,
> .test_sleepable = (void *)test_sleepable,
> };
>
> two callbacks are normal and another one is sleepable.
>
> The generated bpf trampoline will have the right
> __bpf_prog_enter* wrappers for all 3 progs,
> so the kernel code will be just do ops->callback_name().

Great!

So I think I have most of the pieces available... I just need to write
the code :)

>
> > >
> > > > Last time I checked, I thought struct_ops were only for defining one set
> > > > of operations. And you could overwrite them exactly once.
> > > > But after reading more carefully how it was used in tcp_cong.c, it seems
> > > > we can have multiple programs which define the same struct_ops, and then
> > > > it's the kernel which will choose which one needs to be run.
> > >
> > > struct-ops is pretty much a mechanism for kernel to define
> > > a set of callbacks and bpf prog to provide implementation for
> > > these callbacks. The kernel choses when to call them.
> > > tcp-bpf is one such user. sched_ext is another and more advanced.
> > > Currently struct-ops bpf prog loading/attaching mechanism
> > > only specifies the struct-ops. There is no device-id argument,
> > > but that can be extended and kernel can keep per-device a set
> > > of bpf progs.
> > > struct-ops is a bit of overkill if you have only one callback.
> > > It's typically for a set of callbacks.
> >
> > In the end I have 4. However, I might have programs that overwrite twice
> > the same callback (see the 2 SEC("fmod_ret/hid_bpf_device_event") in
> > [2]).
> >
> > >
> > > > Last, I'm not entirely sure how I can specify which struct_ops needs to be
> > > > attached to which device, but it's worth a shot. I've already realized
> > > > that I would probably have to drop the current way of HID-BPF is running,
> > > > so now it's just technical bits to assemble :)
> > >
> > > You need to call different bpf progs per device, right?
> >
> > yes
> >
> > > If indirect call is fine from performance pov,
> > > then tailcall or struct_ops+device_argument might fit.
> >
> > performance is not a requirement. It's better if we have low latency but
> > we are not talking the same requirements than network.
> >
> > >
> > > If you want max perf with direct calls then
> > > we'd need to generalize xdp dispatcher.
> >
> > I'll need to have a deeper look at it, yeah.
> >
> > >
> > > So far it sounds that tailcalls might be the best actually,
> > > since prog lifetime is handled by prog array map.
> > > Maybe instead of bpf_tail_call helper we should add a kfunc that
> > > will operate on prog array differently?
> > > (if current bpf_tail_call semantics don't fit).
> >
> > Actually I'd like to remove bpf_tail_call entirely, because it requires
> > to pre-load a BPF program at boot, and in some situations (RHEL) this
> > creates issues. I haven't been able to debug what was happening, I
> > couldn't reproduce it myself, but removing that bit would be nice :)
>
> We probably need to debug it anyway, since it sounds that it's
> related to preloaded bpf skeleton and not tail_call logic itself.

I really think this happens with RHEL because some part are backported and
some are not. So unless we get upstream reports, it's more likely a RHEL
only issue (which makes things even harder to debug for me).

>
> After looking through all that it seems to me that
> parametrized struct-ops is the way to go.

Yeah, I think so too. I'll proabbly start working on it next Monday.

Thanks a lot for all of your feedback :)

Cheers,
Benjamin

2024-05-10 14:23:39

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On Fri, May 10, 2024 at 3:04 AM Benjamin Tissoires <[email protected]> wrote:
>
> On May 08 2024, Alexei Starovoitov wrote:
> > On Wed, May 8, 2024 at 4:53 AM Benjamin Tissoires <[email protected]> wrote:
> > >
> > > On May 07 2024, Alexei Starovoitov wrote:
> > > > On Tue, May 7, 2024 at 6:32 AM Benjamin Tissoires <[email protected]> wrote:
> > > > >
> > > > > Yes, exactly that. See [0] for my current WIP. I've just sent it, not
> > > > > for reviews, but so you see what I meant here.
> > > >
> > > > The patches helped to understand, for sure, and on surface
> > > > they kind of make sense, but without seeing what is that
> > > > hid specific kfunc that will use it
> > > > it's hard to make a call.
> > >
> > > I've posted my HID WIP on [1]. It probably won't compile as my local
> > > original branch was having a merge of HID and bpf trees.
> >
> > Thanks for this context.
> > Now it makes a lot more sense.
> > And the first patches look fine and simplification is impressive,
> > but this part:
> > + SEC("syscall")
> > + int name(struct attach_prog_args *ctx)
> > + {
> > + ctx->retval = hid_bpf_attach_prog_impl(ctx->hid,
> > + type,
> > + __##name,
> > + ctx->insert_head ? HID_BPF_FLAG_INSERT_HEAD :
> > + HID_BPF_FLAG_NONE,
> > + NULL);
> >
> > is too odd.
> > Essentially you're adding a kfunc on hid side just to call it
> > from a temporary "syscall" program to register another prog.
> > A fake prog just to call a kfunc is a bit too much.
> >
> > The overall mechanism is pretty much a customized struct-ops.
> >
> > I think struct-ops infra provides better api-s, safety guarantees,
> > bpf_link support, prog management including reentrance check, etc.
>
> Ack!
>
> > It needs to be parametrized, so it's not just
> > SEC("struct_ops/kern_callback_name")
> > so that the skeleton loading phase can pass device id or something.
>
> I'm not sure how to parametrize staticaly. I can rely on the modalias,
> but then that might not be ideal. Right now my loader gets called by a
> udev rule, and then call a .probe syscall. If this returns success, then
> the bpf programs are attached to the given hid device.
>
> I saw that the struct_ops can have "data" fields. If we can change the
> static value before calling attach, I should have a register callback
> being able to retrieve that hid id, and then attach the struct_ops to
> the correct hid device in the jump table. Anyway, I'll test this later.

yep.

> Something along:
>
> SEC(".struct_ops")
> struct hid_bpf_ops dummy_1 = {
> .input_event = (void *)test_1,
> .rdesc_fixup = (void *)test_2,
> .hw_raw_request = (void *)test_sleepable,
> .hid_id = 0,
> };
>
> And in the loader, I call __load(), change the value ->hid_id, and then
> __attach().

__open() then change hid_id then __load().

See prog_tests/test_struct_ops_module.c.

>
> >
> > > > The (u64)(long) casting concerns and prog lifetime are
> > > > difficult to get right. The verifier won't help and it all
> > > > will fall on code reviews.
> > >
> > > yeah, this is a concern.
> >
> > Not only that. The special kfunc does migrate_disable
> > before calling callback, but it needs rcu_lock or tracing lock,
> > plus reentrance checks.
> >
> > >
> > > > So I'd rather not go this route.
> > > > Let's explore first what exactly the goal here.
> > > > We've talked about sleepable tail_calls, this async callbacks
> > > > from hid kfuncs, and struct-ops.
> > > > Maybe none of them fit well and we need something else.
> > > > Could you please explain (maybe once again) what is the end goal?
> > >
> > > right now I need 4 hooks in HID, the first 2 are already upstream:
> > > - whenever I need to retrieve the report descriptor (this happens in a
> > > sleepable context, but non sleepable is fine)
> > > - whenever I receive an event from a device (non sleepable context, this
> > > is coming from a hard IRQ context)
> > > - whenever someone tries to write to the device through
> > > hid_hw_raw_request (original context is sleepable, and for being able
> > > to communicate with the device we need sleepable context in bpf)
> > > - same but from hid_hw_output_report
> > >
> > > Again, the first 2 are working just fine.
> > >
> > > Implementing the latter twos requires sleepable context because we
> > > might:
> > >
> > > 1. a request is made from user-space
> > > 2. we jump into hid-bpf
> > > 3. the bpf program "converts" the request from report ID 1 to 2 (because
> > > we export a slightly different API)
> > > 4. the bpf program directly emits hid_bpf_raw_request (sleepable
> > > operation)
> > > 5. the bpf program returns the correct value
> > > 6. hid-core doesn't attempt to communicate with the device as bpf
> > > already did.
> > >
> > > In the series, I also realized that I need sleepable and non sleepable
> > > contexts for this kind of situation, because I want tracing and
> > > firewalling available (non sleepable context), while still allowing to
> > > communicate with the device. But when you communicate with the device
> > > from bpf, the sleepable bpf program is not invoked or this allows
> > > infinite loops.
> >
> > I don't get the point about infinite loops.
>
> If I don´t put restrictions on how the bpf program communicate with the
> device I might have:
>
> 1. someone calls hid_hw_raw_request from hidraw
> 2. bpf jumps into filter for hid_hw_raw_request
> 3. the bpf program calls hid_bpf_raw_request (which internally calls
> hid_hw_raw_request)
> 4. go back to 2.

sleepable progs also have recursion checks.
So if it recurses into the same prog it will catch it.
Ideally, of course, such loops are prevented statically.

> But again, not a big deal: if I do not allow entering a sleepable bpf
> program from hid_bpf_raw_request (so from a bpf program), instead of 4.
> above, we prevent entering the same bpf program as the program in 2.
> needs to be sleepable.
>
> > fyi struct_ops already supports sleepable and non-sleepable callbacks.
> > See progs/dummy_st_ops_success.c
> > SEC(".struct_ops")
> > struct bpf_dummy_ops dummy_1 = {
> > .test_1 = (void *)test_1,
> > .test_2 = (void *)test_2,
> > .test_sleepable = (void *)test_sleepable,
> > };
> >
> > two callbacks are normal and another one is sleepable.
> >
> > The generated bpf trampoline will have the right
> > __bpf_prog_enter* wrappers for all 3 progs,
> > so the kernel code will be just do ops->callback_name().
>
> Great!
>
> So I think I have most of the pieces available... I just need to write
> the code :)
>
> >
> > > >
> > > > > Last time I checked, I thought struct_ops were only for defining one set
> > > > > of operations. And you could overwrite them exactly once.
> > > > > But after reading more carefully how it was used in tcp_cong.c, it seems
> > > > > we can have multiple programs which define the same struct_ops, and then
> > > > > it's the kernel which will choose which one needs to be run.
> > > >
> > > > struct-ops is pretty much a mechanism for kernel to define
> > > > a set of callbacks and bpf prog to provide implementation for
> > > > these callbacks. The kernel choses when to call them.
> > > > tcp-bpf is one such user. sched_ext is another and more advanced.
> > > > Currently struct-ops bpf prog loading/attaching mechanism
> > > > only specifies the struct-ops. There is no device-id argument,
> > > > but that can be extended and kernel can keep per-device a set
> > > > of bpf progs.
> > > > struct-ops is a bit of overkill if you have only one callback.
> > > > It's typically for a set of callbacks.
> > >
> > > In the end I have 4. However, I might have programs that overwrite twice
> > > the same callback (see the 2 SEC("fmod_ret/hid_bpf_device_event") in
> > > [2]).
> > >
> > > >
> > > > > Last, I'm not entirely sure how I can specify which struct_ops needs to be
> > > > > attached to which device, but it's worth a shot. I've already realized
> > > > > that I would probably have to drop the current way of HID-BPF is running,
> > > > > so now it's just technical bits to assemble :)
> > > >
> > > > You need to call different bpf progs per device, right?
> > >
> > > yes
> > >
> > > > If indirect call is fine from performance pov,
> > > > then tailcall or struct_ops+device_argument might fit.
> > >
> > > performance is not a requirement. It's better if we have low latency but
> > > we are not talking the same requirements than network.
> > >
> > > >
> > > > If you want max perf with direct calls then
> > > > we'd need to generalize xdp dispatcher.
> > >
> > > I'll need to have a deeper look at it, yeah.
> > >
> > > >
> > > > So far it sounds that tailcalls might be the best actually,
> > > > since prog lifetime is handled by prog array map.
> > > > Maybe instead of bpf_tail_call helper we should add a kfunc that
> > > > will operate on prog array differently?
> > > > (if current bpf_tail_call semantics don't fit).
> > >
> > > Actually I'd like to remove bpf_tail_call entirely, because it requires
> > > to pre-load a BPF program at boot, and in some situations (RHEL) this
> > > creates issues. I haven't been able to debug what was happening, I
> > > couldn't reproduce it myself, but removing that bit would be nice :)
> >
> > We probably need to debug it anyway, since it sounds that it's
> > related to preloaded bpf skeleton and not tail_call logic itself.
>
> I really think this happens with RHEL because some part are backported and
> some are not. So unless we get upstream reports, it's more likely a RHEL
> only issue (which makes things even harder to debug for me).
>
> >
> > After looking through all that it seems to me that
> > parametrized struct-ops is the way to go.
>
> Yeah, I think so too. I'll proabbly start working on it next Monday.

A bunch of us will be at lsfmmbpf next week.
So email traffic will be slow.

2024-05-28 14:26:53

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH] bpf: verifier: allow arrays of progs to be used in sleepable context

On May 10 2024, Alexei Starovoitov wrote:
> On Fri, May 10, 2024 at 3:04 AM Benjamin Tissoires <[email protected]> wrote:
> >
> > On May 08 2024, Alexei Starovoitov wrote:
> > > On Wed, May 8, 2024 at 4:53 AM Benjamin Tissoires <[email protected]> wrote:
> > > >
> > > > On May 07 2024, Alexei Starovoitov wrote:
> > > > > On Tue, May 7, 2024 at 6:32 AM Benjamin Tissoires <[email protected]> wrote:
> > > > > >
> > > > > > Yes, exactly that. See [0] for my current WIP. I've just sent it, not
> > > > > > for reviews, but so you see what I meant here.
> > > > >
> > > > > The patches helped to understand, for sure, and on surface
> > > > > they kind of make sense, but without seeing what is that
> > > > > hid specific kfunc that will use it
> > > > > it's hard to make a call.
> > > >
> > > > I've posted my HID WIP on [1]. It probably won't compile as my local
> > > > original branch was having a merge of HID and bpf trees.
> > >
> > > Thanks for this context.
> > > Now it makes a lot more sense.
> > > And the first patches look fine and simplification is impressive,
> > > but this part:
> > > + SEC("syscall")
> > > + int name(struct attach_prog_args *ctx)
> > > + {
> > > + ctx->retval = hid_bpf_attach_prog_impl(ctx->hid,
> > > + type,
> > > + __##name,
> > > + ctx->insert_head ? HID_BPF_FLAG_INSERT_HEAD :
> > > + HID_BPF_FLAG_NONE,
> > > + NULL);
> > >
> > > is too odd.
> > > Essentially you're adding a kfunc on hid side just to call it
> > > from a temporary "syscall" program to register another prog.
> > > A fake prog just to call a kfunc is a bit too much.
> > >
> > > The overall mechanism is pretty much a customized struct-ops.
> > >
> > > I think struct-ops infra provides better api-s, safety guarantees,
> > > bpf_link support, prog management including reentrance check, etc.
> >
> > Ack!
> >
> > > It needs to be parametrized, so it's not just
> > > SEC("struct_ops/kern_callback_name")
> > > so that the skeleton loading phase can pass device id or something.
> >
> > I'm not sure how to parametrize staticaly. I can rely on the modalias,
> > but then that might not be ideal. Right now my loader gets called by a
> > udev rule, and then call a .probe syscall. If this returns success, then
> > the bpf programs are attached to the given hid device.
> >
> > I saw that the struct_ops can have "data" fields. If we can change the
> > static value before calling attach, I should have a register callback
> > being able to retrieve that hid id, and then attach the struct_ops to
> > the correct hid device in the jump table. Anyway, I'll test this later.
>
> yep.
>
> > Something along:
> >
> > SEC(".struct_ops")
> > struct hid_bpf_ops dummy_1 = {
> > .input_event = (void *)test_1,
> > .rdesc_fixup = (void *)test_2,
> > .hw_raw_request = (void *)test_sleepable,
> > .hid_id = 0,
> > };
> >
> > And in the loader, I call __load(), change the value ->hid_id, and then
> > __attach().
>
> __open() then change hid_id then __load().
>
> See prog_tests/test_struct_ops_module.c.
>
> >
> > >
> > > > > The (u64)(long) casting concerns and prog lifetime are
> > > > > difficult to get right. The verifier won't help and it all
> > > > > will fall on code reviews.
> > > >
> > > > yeah, this is a concern.
> > >
> > > Not only that. The special kfunc does migrate_disable
> > > before calling callback, but it needs rcu_lock or tracing lock,
> > > plus reentrance checks.
> > >
> > > >
> > > > > So I'd rather not go this route.
> > > > > Let's explore first what exactly the goal here.
> > > > > We've talked about sleepable tail_calls, this async callbacks
> > > > > from hid kfuncs, and struct-ops.
> > > > > Maybe none of them fit well and we need something else.
> > > > > Could you please explain (maybe once again) what is the end goal?
> > > >
> > > > right now I need 4 hooks in HID, the first 2 are already upstream:
> > > > - whenever I need to retrieve the report descriptor (this happens in a
> > > > sleepable context, but non sleepable is fine)
> > > > - whenever I receive an event from a device (non sleepable context, this
> > > > is coming from a hard IRQ context)
> > > > - whenever someone tries to write to the device through
> > > > hid_hw_raw_request (original context is sleepable, and for being able
> > > > to communicate with the device we need sleepable context in bpf)
> > > > - same but from hid_hw_output_report
> > > >
> > > > Again, the first 2 are working just fine.
> > > >
> > > > Implementing the latter twos requires sleepable context because we
> > > > might:
> > > >
> > > > 1. a request is made from user-space
> > > > 2. we jump into hid-bpf
> > > > 3. the bpf program "converts" the request from report ID 1 to 2 (because
> > > > we export a slightly different API)
> > > > 4. the bpf program directly emits hid_bpf_raw_request (sleepable
> > > > operation)
> > > > 5. the bpf program returns the correct value
> > > > 6. hid-core doesn't attempt to communicate with the device as bpf
> > > > already did.
> > > >
> > > > In the series, I also realized that I need sleepable and non sleepable
> > > > contexts for this kind of situation, because I want tracing and
> > > > firewalling available (non sleepable context), while still allowing to
> > > > communicate with the device. But when you communicate with the device
> > > > from bpf, the sleepable bpf program is not invoked or this allows
> > > > infinite loops.
> > >
> > > I don't get the point about infinite loops.
> >
> > If I don´t put restrictions on how the bpf program communicate with the
> > device I might have:
> >
> > 1. someone calls hid_hw_raw_request from hidraw
> > 2. bpf jumps into filter for hid_hw_raw_request
> > 3. the bpf program calls hid_bpf_raw_request (which internally calls
> > hid_hw_raw_request)
> > 4. go back to 2.
>
> sleepable progs also have recursion checks.
> So if it recurses into the same prog it will catch it.
> Ideally, of course, such loops are prevented statically.
>
> > But again, not a big deal: if I do not allow entering a sleepable bpf
> > program from hid_bpf_raw_request (so from a bpf program), instead of 4.
> > above, we prevent entering the same bpf program as the program in 2.
> > needs to be sleepable.
> >
> > > fyi struct_ops already supports sleepable and non-sleepable callbacks.
> > > See progs/dummy_st_ops_success.c
> > > SEC(".struct_ops")
> > > struct bpf_dummy_ops dummy_1 = {
> > > .test_1 = (void *)test_1,
> > > .test_2 = (void *)test_2,
> > > .test_sleepable = (void *)test_sleepable,
> > > };
> > >
> > > two callbacks are normal and another one is sleepable.
> > >
> > > The generated bpf trampoline will have the right
> > > __bpf_prog_enter* wrappers for all 3 progs,
> > > so the kernel code will be just do ops->callback_name().
> >
> > Great!
> >
> > So I think I have most of the pieces available... I just need to write
> > the code :)
> >
> > >
> > > > >
> > > > > > Last time I checked, I thought struct_ops were only for defining one set
> > > > > > of operations. And you could overwrite them exactly once.
> > > > > > But after reading more carefully how it was used in tcp_cong.c, it seems
> > > > > > we can have multiple programs which define the same struct_ops, and then
> > > > > > it's the kernel which will choose which one needs to be run.
> > > > >
> > > > > struct-ops is pretty much a mechanism for kernel to define
> > > > > a set of callbacks and bpf prog to provide implementation for
> > > > > these callbacks. The kernel choses when to call them.
> > > > > tcp-bpf is one such user. sched_ext is another and more advanced.
> > > > > Currently struct-ops bpf prog loading/attaching mechanism
> > > > > only specifies the struct-ops. There is no device-id argument,
> > > > > but that can be extended and kernel can keep per-device a set
> > > > > of bpf progs.
> > > > > struct-ops is a bit of overkill if you have only one callback.
> > > > > It's typically for a set of callbacks.
> > > >
> > > > In the end I have 4. However, I might have programs that overwrite twice
> > > > the same callback (see the 2 SEC("fmod_ret/hid_bpf_device_event") in
> > > > [2]).
> > > >
> > > > >
> > > > > > Last, I'm not entirely sure how I can specify which struct_ops needs to be
> > > > > > attached to which device, but it's worth a shot. I've already realized
> > > > > > that I would probably have to drop the current way of HID-BPF is running,
> > > > > > so now it's just technical bits to assemble :)
> > > > >
> > > > > You need to call different bpf progs per device, right?
> > > >
> > > > yes
> > > >
> > > > > If indirect call is fine from performance pov,
> > > > > then tailcall or struct_ops+device_argument might fit.
> > > >
> > > > performance is not a requirement. It's better if we have low latency but
> > > > we are not talking the same requirements than network.
> > > >
> > > > >
> > > > > If you want max perf with direct calls then
> > > > > we'd need to generalize xdp dispatcher.
> > > >
> > > > I'll need to have a deeper look at it, yeah.
> > > >
> > > > >
> > > > > So far it sounds that tailcalls might be the best actually,
> > > > > since prog lifetime is handled by prog array map.
> > > > > Maybe instead of bpf_tail_call helper we should add a kfunc that
> > > > > will operate on prog array differently?
> > > > > (if current bpf_tail_call semantics don't fit).
> > > >
> > > > Actually I'd like to remove bpf_tail_call entirely, because it requires
> > > > to pre-load a BPF program at boot, and in some situations (RHEL) this
> > > > creates issues. I haven't been able to debug what was happening, I
> > > > couldn't reproduce it myself, but removing that bit would be nice :)
> > >
> > > We probably need to debug it anyway, since it sounds that it's
> > > related to preloaded bpf skeleton and not tail_call logic itself.
> >
> > I really think this happens with RHEL because some part are backported and
> > some are not. So unless we get upstream reports, it's more likely a RHEL
> > only issue (which makes things even harder to debug for me).
> >
> > >
> > > After looking through all that it seems to me that
> > > parametrized struct-ops is the way to go.
> >
> > Yeah, I think so too. I'll proabbly start working on it next Monday.
>
> A bunch of us will be at lsfmmbpf next week.
> So email traffic will be slow.

Patch series is finally out [0].

Writting the kernel code was actually much easier than expected, but I
took some more time to also work on the user space loader to have
compatibility with pre/post struct_ops.

Anyway, thanks a lot for the help here. I would appreciate any review
from the bpf folks on patch 3/13, as the rest is mostly HID plumbing,
conversion from the old tracing approach and docs.

Cheers,
Benjamin


[0] https://lore.kernel.org/linux-input/[email protected]/T/#t