by Toke Høiland-Jørgensen

[permalink] [raw]

Subject: Re: [PATCH RFC bpf-next 0/9] allow HID-BPF to do device IOs

Alexei Starovoitov <[email protected]> writes:

> On Tue, Feb 13, 2024 at 08:51:26PM +0100, Toke Høiland-Jørgensen wrote:
>> Kumar Kartikeya Dwivedi <[email protected]> writes:
>>
>> > On Tue, 13 Feb 2024 at 18:46, Benjamin Tissoires <[email protected]> wrote:
>> >>
>> >> On Feb 12 2024, Alexei Starovoitov wrote:
>> >> > On Mon, Feb 12, 2024 at 10:21 AM Benjamin Tissoires
>> >> > <[email protected]> wrote:
>> >> > >
>> >> > > On Mon, Feb 12, 2024 at 6:46 PM Toke Høiland-Jørgensen <[email protected]> wrote:
>> >> > > >
>> >> > > > Benjamin Tissoires <[email protected]> writes:
>> >> > > >
>> >> [...]
>> >> > I agree that workqueue delegation fits into the bpf_timer concept and
>> >> > a lot of code can and should be shared.
>> >>
>> >> Thanks Alexei for the detailed answer. I've given it an attempt but still can not
>> >> figure it out entirely.
>> >>
>> >> > All the lessons(bugs) learned with bpf_timer don't need to be re-discovered :)
>> >> > Too bad, bpf_timer_set_callback() doesn't have a flag argument,
>> >> > so we need a new kfunc to set a sleepable callback.
>> >> > Maybe
>> >> > bpf_timer_set_sleepable_cb() ?
>> >>
>> >> OK. So I guess I should drop Toke's suggestion with the bpf_timer_ini() flag?
>> >>
>> >> > The verifier will set is_async_cb = true for it (like it does for regular cb-s).
>> >> > And since prog->aux->sleepable is kinda "global" we need another
>> >> > per subprog flag:
>> >> > bool is_sleepable: 1;
>> >>
>> >> done (in push_callback_call())
>> >>
>> >> >
>> >> > We can factor out a check "if (prog->aux->sleepable)" into a helper
>> >> > that will check that "global" flag and another env->cur_state->in_sleepable
>> >> > flag that will work similar to active_rcu_lock.
>> >>
>> >> done (I think), cf patch 2 below
>> >>
>> >> > Once the verifier starts processing subprog->is_sleepable
>> >> > it will set cur_state->in_sleepable = true;
>> >> > to make all subprogs called from that cb to be recognized as sleepable too.
>> >>
>> >> That's the point I don't know where to put the new code.
>> >>
>> >
>> > I think that would go in the already existing special case for
>> > push_async_cb where you get the verifier state of the async callback.
>> > You can make setting the boolean in that verifier state conditional on
>> > whether it's your kfunc/helper you're processing taking a sleepable
>> > callback.
>> >
>> >> It seems the best place would be in do_check(), but I am under the impression
>> >> that the code of the callback is added at the end of the instruction list, meaning
>> >> that I do not know where it starts, and which subprog index it corresponds to.
>> >>
>> >> >
>> >> > A bit of a challenge is what to do with global subprogs,
>> >> > since they're verified lazily. They can be called from
>> >> > sleepable and non-sleepable contex. Should be solvable.
>> >>
>> >> I must confess this is way over me (and given that I didn't even managed to make
>> >> the "easy" case working, that might explain things a little :-P )
>> >>
>> >
>> > I think it will be solvable but made somewhat difficult by the fact
>> > that even if we mark subprog_info of some global_func A as
>> > in_sleepable, so that we explore it as sleepable during its
>> > verification, we might encounter later another global_func that calls
>> > a global func, already explored as non-sleepable, in sleepable
>> > context. In this case I think we need to redo the verification of that
>> > global func as sleepable once again. It could be that it is called
>> > from both non-sleepable and sleepable contexts, so both paths
>> > (in_sleepable = true, and in_sleepable = false) need to be explored,
>> > or we could reject such cases, but it might be a little restrictive.
>> >
>> > Some common helper global func unrelated to caller context doing some
>> > auxiliary work, called from sleepable timer callback and normal main
>> > subprog might be an example where rejection will be prohibitive.
>> >
>> > An approach might be to explore main and global subprogs once as we do
>> > now, and then keep a list of global subprogs that need to be revisited
>> > as in_sleepable (due to being called from a sleepable context) and
>> > trigger do_check_common for them again, this might have to be repeated
>> > as the list grows on each iteration, but eventually we will have
>> > explored all of them as in_sleepable if need be, and the loop will
>> > end. Surely, this trades off logical simplicity of verifier code with
>> > redoing verification of global subprogs again.
>> >
>> > To add items to such a list, for each global subprog we encounter that
>> > needs to be analyzed as in_sleepable, we will also collect all its
>> > callee global subprogs by walking its instructions (a bit like
>> > check_max_stack_depth does).
>>
>> Sorry if I'm being dense, but why is all this needed if it's already
>> possible to just define the timer callback from a program type that
>> allows sleeping, and then set the actual timeout from a different
>> program that is not sleepable? Isn't the set_sleepable_cb() kfunc just a
>> convenience then? Or did I misunderstand and it's not actually possible
>> to mix callback/timer arming from different program types?
>
> More than just convience.
> bpf_set_sleepable_cb() might need to be called from non-sleepable and
> there could be no way to hack it around with fake sleepable entry.
> bpf_timer_cancel() clears callback_fn.
> So if prog wants to bpf_timer_start() and later bpf_timer_cancel()
> it would need to bpf_set_sleepable_cb() every time before bpf_timer_start().
> And at that time it might be in non-sleepable ctx.

Ah, right, makes sense; didn't think about bpf_timer_cancel(). Thanks
for the explanation :)

-Toke

2024-02-14 17:12:31

by Benjamin Tissoires

[permalink] [raw]

Subject: Re: [PATCH RFC bpf-next 0/9] allow HID-BPF to do device IOs

On Feb 13 2024, Kumar Kartikeya Dwivedi wrote:
> On Tue, 13 Feb 2024 at 18:46, Benjamin Tissoires <[email protected]> wrote:
> >
> > On Feb 12 2024, Alexei Starovoitov wrote:
> > > On Mon, Feb 12, 2024 at 10:21 AM Benjamin Tissoires
> > > <[email protected]> wrote:
> > > >
> > > > On Mon, Feb 12, 2024 at 6:46 PM Toke Høiland-Jørgensen <[email protected]> wrote:
> > > > >
> > > > > Benjamin Tissoires <[email protected]> writes:
> > > > >
> > [...]
> > > I agree that workqueue delegation fits into the bpf_timer concept and
> > > a lot of code can and should be shared.
> >
> > Thanks Alexei for the detailed answer. I've given it an attempt but still can not
> > figure it out entirely.
> >
> > > All the lessons(bugs) learned with bpf_timer don't need to be re-discovered :)
> > > Too bad, bpf_timer_set_callback() doesn't have a flag argument,
> > > so we need a new kfunc to set a sleepable callback.
> > > Maybe
> > > bpf_timer_set_sleepable_cb() ?
> >
> > OK. So I guess I should drop Toke's suggestion with the bpf_timer_ini() flag?
> >
> > > The verifier will set is_async_cb = true for it (like it does for regular cb-s).
> > > And since prog->aux->sleepable is kinda "global" we need another
> > > per subprog flag:
> > > bool is_sleepable: 1;
> >
> > done (in push_callback_call())
> >
> > >
> > > We can factor out a check "if (prog->aux->sleepable)" into a helper
> > > that will check that "global" flag and another env->cur_state->in_sleepable
> > > flag that will work similar to active_rcu_lock.
> >
> > done (I think), cf patch 2 below
> >
> > > Once the verifier starts processing subprog->is_sleepable
> > > it will set cur_state->in_sleepable = true;
> > > to make all subprogs called from that cb to be recognized as sleepable too.
> >
> > That's the point I don't know where to put the new code.
> >
>
> I think that would go in the already existing special case for
> push_async_cb where you get the verifier state of the async callback.
> You can make setting the boolean in that verifier state conditional on
> whether it's your kfunc/helper you're processing taking a sleepable
> callback.

Hehe, thanks a lot. Indeed, it was a simple fix. I tried to put this
everywhere but here, and with your help got it working in 2 mins :)

>
> > It seems the best place would be in do_check(), but I am under the impression
> > that the code of the callback is added at the end of the instruction list, meaning
> > that I do not know where it starts, and which subprog index it corresponds to.
> >
> > >
> > > A bit of a challenge is what to do with global subprogs,
> > > since they're verified lazily. They can be called from
> > > sleepable and non-sleepable contex. Should be solvable.
> >
> > I must confess this is way over me (and given that I didn't even managed to make
> > the "easy" case working, that might explain things a little :-P )
> >
>
> I think it will be solvable but made somewhat difficult by the fact
> that even if we mark subprog_info of some global_func A as
> in_sleepable, so that we explore it as sleepable during its
> verification, we might encounter later another global_func that calls
> a global func, already explored as non-sleepable, in sleepable
> context. In this case I think we need to redo the verification of that
> global func as sleepable once again. It could be that it is called
> from both non-sleepable and sleepable contexts, so both paths
> (in_sleepable = true, and in_sleepable = false) need to be explored,
> or we could reject such cases, but it might be a little restrictive.
>
> Some common helper global func unrelated to caller context doing some
> auxiliary work, called from sleepable timer callback and normal main
> subprog might be an example where rejection will be prohibitive.
>
> An approach might be to explore main and global subprogs once as we do
> now, and then keep a list of global subprogs that need to be revisited
> as in_sleepable (due to being called from a sleepable context) and
> trigger do_check_common for them again, this might have to be repeated
> as the list grows on each iteration, but eventually we will have
> explored all of them as in_sleepable if need be, and the loop will
> end. Surely, this trades off logical simplicity of verifier code with
> redoing verification of global subprogs again.
>
> To add items to such a list, for each global subprog we encounter that
> needs to be analyzed as in_sleepable, we will also collect all its
> callee global subprogs by walking its instructions (a bit like
> check_max_stack_depth does).

FWIW, this (or Alexei's suggestion) is still not implemented in v2

>
> > >
> > > Overall I think this feature is needed urgently,
> > > so if you don't have cycles to work on this soon,
> > > I can prioritize it right after bpf_arena work.
> >
> > I can try to spare a few cycles on it. Even if your instructions were on
> > spot, I still can't make the subprogs recognized as sleepable.
> >
> > For reference, this is where I am (probably bogus, but seems to be
> > working when timer_set_sleepable_cb() is called from a sleepable context
> > as mentioned by Toke):
> >
>
> I just skimmed the patch but I think it's already 90% there. The only
> other change I would suggest is switching from helper to kfunc, as
> originally proposed by Alexei.

the kfunc was a rabbit hole:
- I needed to teach the verifier about BPF_TIMER in kfunc
- I needed to teach the verifier about the kfunc itself
- I'm failing at calling the callback :(

Anyway, I'm about to send a second RFC so we can discuss on the code and
see where my monkey patching capabilities are reaching their limits.

Cheers,
Benjamin