LinuxLists.cc - Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf

2021-05-05 18:56:49

Subject: Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare

On Wed, May 5, 2021 at 9:23 AM Florent Revest <[email protected]> wrote:
>
> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> per-cpu buffer that they use to store temporary data (arguments to
> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> by the end of their scope with bpf_bprintf_cleanup.
>
> If one of these helpers gets called within the scope of one of these
> helpers, for example: a first bpf program gets called, uses

Can we afford having few struct bpf_printf_bufs? They are just 512
bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
only situation where this can occur, right? If someone is doing
bpf_snprintf() and interrupt occurs and we run another BPF program, it
will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
second BPF program, etc. We can't eliminate the probability, but
having a small stack of buffers would make the probability so
miniscule as to not worry about it at all.

Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
the changes are minimal. Nestedness property is preserved for
non-sleepable BPF programs, right? If we want this to work for
sleepable we'd need to either: 1) disable migration or 2) instead of
assuming a stack of buffers, do a loop to find unused one. Should be
acceptable performance-wise, as it's not the fastest code anyway
(printf'ing in general).

In any case, re-using the same buffer for sort-of-optional-to-work
bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
suboptimal, so seems worth fixing this.

Thoughts?

> bpf_trace_printk which calls raw_spin_lock_irqsave which is traced by
> another bpf program that calls bpf_trace_printk again, then the second
> "get" fails. Essentially, these helpers are not re-entrant, and it's not
> that bad because they would simply return -EBUSY and recover gracefully.
>
> However, when this happens, the code hits a WARN_ON_ONCE. The guidelines
> in include/asm-generic/bug.h say "Do not use these macros [...] on
> transient conditions like ENOMEM or EAGAIN."
>
> This condition qualifies as transient, for example, the next
> raw_spin_lock_irqsave probe is likely to succeed, so it does not deserve
> a WARN_ON_ONCE.
>
> The guidelines also say "Do not use these macros when checking for
> invalid external inputs (e.g. invalid system call arguments" and, in a
> way, this can be seen as an invalid input because syzkaller triggered
> it.
>
> Signed-off-by: Florent Revest <[email protected]>
> Reported-by: syzbot <[email protected]>
> Fixes: d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf")
> ---
> kernel/bpf/helpers.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 544773970dbc..007fa26eb3f5 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -709,7 +709,7 @@ static int try_get_fmt_tmp_buf(char **tmp_buf)
>
> preempt_disable();
> used = this_cpu_inc_return(bpf_printf_buf_used);
> - if (WARN_ON_ONCE(used > 1)) {
> + if (used > 1) {
> this_cpu_dec(bpf_printf_buf_used);
> preempt_enable();
> return -EBUSY;
> --
> 2.31.1.527.g47e6f16901-goog
>

2021-05-05 21:07:00

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare

On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> On Wed, May 5, 2021 at 9:23 AM Florent Revest <[email protected]> wrote:
>>
>> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
>> per-cpu buffer that they use to store temporary data (arguments to
>> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
>> by the end of their scope with bpf_bprintf_cleanup.
>>
>> If one of these helpers gets called within the scope of one of these
>> helpers, for example: a first bpf program gets called, uses
>
> Can we afford having few struct bpf_printf_bufs? They are just 512
> bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> only situation where this can occur, right? If someone is doing
> bpf_snprintf() and interrupt occurs and we run another BPF program, it
> will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> second BPF program, etc. We can't eliminate the probability, but
> having a small stack of buffers would make the probability so
> miniscule as to not worry about it at all.
>
> Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> the changes are minimal. Nestedness property is preserved for
> non-sleepable BPF programs, right? If we want this to work for
> sleepable we'd need to either: 1) disable migration or 2) instead of
> assuming a stack of buffers, do a loop to find unused one. Should be
> acceptable performance-wise, as it's not the fastest code anyway
> (printf'ing in general).
>
> In any case, re-using the same buffer for sort-of-optional-to-work
> bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> suboptimal, so seems worth fixing this.
>
> Thoughts?

Yes, agree, it would otherwise be really hard to debug. I had the same
thought on why not allowing nesting here given users very likely expect
these helpers to just work for all the contexts.

Thanks,
Daniel

2021-05-05 21:15:24

by Andrii Nakryiko

[permalink] [raw]

Subject: Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare

On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <[email protected]> wrote:
>
> On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> > On Wed, May 5, 2021 at 9:23 AM Florent Revest <[email protected]> wrote:
> >>
> >> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> >> per-cpu buffer that they use to store temporary data (arguments to
> >> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> >> by the end of their scope with bpf_bprintf_cleanup.
> >>
> >> If one of these helpers gets called within the scope of one of these
> >> helpers, for example: a first bpf program gets called, uses
> >
> > Can we afford having few struct bpf_printf_bufs? They are just 512
> > bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> > only situation where this can occur, right? If someone is doing
> > bpf_snprintf() and interrupt occurs and we run another BPF program, it
> > will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> > second BPF program, etc. We can't eliminate the probability, but
> > having a small stack of buffers would make the probability so
> > miniscule as to not worry about it at all.
> >
> > Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> > the changes are minimal. Nestedness property is preserved for
> > non-sleepable BPF programs, right? If we want this to work for
> > sleepable we'd need to either: 1) disable migration or 2) instead of

oh wait, we already disable migration for sleepable BPF progs, so it
should be good to do nestedness level only

> > assuming a stack of buffers, do a loop to find unused one. Should be
> > acceptable performance-wise, as it's not the fastest code anyway
> > (printf'ing in general).
> >
> > In any case, re-using the same buffer for sort-of-optional-to-work
> > bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> > suboptimal, so seems worth fixing this.
> >
> > Thoughts?
>
> Yes, agree, it would otherwise be really hard to debug. I had the same
> thought on why not allowing nesting here given users very likely expect
> these helpers to just work for all the contexts.
>
> Thanks,
> Daniel

2021-05-05 21:22:24

by Andrii Nakryiko

[permalink] [raw]

Subject: Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare

On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
<[email protected]> wrote:
>
> On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <[email protected]> wrote:
> >
> > On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> > > On Wed, May 5, 2021 at 9:23 AM Florent Revest <[email protected]> wrote:
> > >>
> > >> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> > >> per-cpu buffer that they use to store temporary data (arguments to
> > >> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> > >> by the end of their scope with bpf_bprintf_cleanup.
> > >>
> > >> If one of these helpers gets called within the scope of one of these
> > >> helpers, for example: a first bpf program gets called, uses
> > >
> > > Can we afford having few struct bpf_printf_bufs? They are just 512
> > > bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> > > only situation where this can occur, right? If someone is doing
> > > bpf_snprintf() and interrupt occurs and we run another BPF program, it
> > > will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> > > second BPF program, etc. We can't eliminate the probability, but
> > > having a small stack of buffers would make the probability so
> > > miniscule as to not worry about it at all.
> > >
> > > Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> > > the changes are minimal. Nestedness property is preserved for
> > > non-sleepable BPF programs, right? If we want this to work for
> > > sleepable we'd need to either: 1) disable migration or 2) instead of
>
> oh wait, we already disable migration for sleepable BPF progs, so it
> should be good to do nestedness level only

actually, migrate_disable() might not be enough. Unless it is
impossible for some reason I miss, worst case it could be that two
sleepable programs (A and B) can be intermixed on the same CPU: A
starts&sleeps - B starts&sleeps - A continues&returns - B continues
and nestedness doesn't work anymore. So something like "reserving a
slot" would work better.

>
> > > assuming a stack of buffers, do a loop to find unused one. Should be
> > > acceptable performance-wise, as it's not the fastest code anyway
> > > (printf'ing in general).
> > >
> > > In any case, re-using the same buffer for sort-of-optional-to-work
> > > bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> > > suboptimal, so seems worth fixing this.
> > >
> > > Thoughts?
> >
> > Yes, agree, it would otherwise be really hard to debug. I had the same
> > thought on why not allowing nesting here given users very likely expect
> > these helpers to just work for all the contexts.
> >
> > Thanks,
> > Daniel

2021-05-05 22:43:20

by Florent Revest

[permalink] [raw]

Subject: Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare

On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
<[email protected]> wrote:
>
> On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
> <[email protected]> wrote:
> >
> > On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <[email protected]> wrote:
> > >
> > > On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> > > > On Wed, May 5, 2021 at 9:23 AM Florent Revest <[email protected]> wrote:
> > > >>
> > > >> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> > > >> per-cpu buffer that they use to store temporary data (arguments to
> > > >> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> > > >> by the end of their scope with bpf_bprintf_cleanup.
> > > >>
> > > >> If one of these helpers gets called within the scope of one of these
> > > >> helpers, for example: a first bpf program gets called, uses
> > > >
> > > > Can we afford having few struct bpf_printf_bufs? They are just 512
> > > > bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> > > > only situation where this can occur, right? If someone is doing
> > > > bpf_snprintf() and interrupt occurs and we run another BPF program, it
> > > > will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> > > > second BPF program, etc. We can't eliminate the probability, but
> > > > having a small stack of buffers would make the probability so
> > > > miniscule as to not worry about it at all.
> > > >
> > > > Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> > > > the changes are minimal. Nestedness property is preserved for
> > > > non-sleepable BPF programs, right? If we want this to work for
> > > > sleepable we'd need to either: 1) disable migration or 2) instead of
> >
> > oh wait, we already disable migration for sleepable BPF progs, so it
> > should be good to do nestedness level only
>
> actually, migrate_disable() might not be enough. Unless it is
> impossible for some reason I miss, worst case it could be that two
> sleepable programs (A and B) can be intermixed on the same CPU: A
> starts&sleeps - B starts&sleeps - A continues&returns - B continues
> and nestedness doesn't work anymore. So something like "reserving a
> slot" would work better.

Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?

> >
> > > > assuming a stack of buffers, do a loop to find unused one. Should be
> > > > acceptable performance-wise, as it's not the fastest code anyway
> > > > (printf'ing in general).
> > > >
> > > > In any case, re-using the same buffer for sort-of-optional-to-work
> > > > bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> > > > suboptimal, so seems worth fixing this.
> > > >
> > > > Thoughts?
> > >
> > > Yes, agree, it would otherwise be really hard to debug. I had the same
> > > thought on why not allowing nesting here given users very likely expect
> > > these helpers to just work for all the contexts.
> > >
> > > Thanks,
> > > Daniel

What would you think of just letting the helpers own these 512 bytes
buffers as local variables on their stacks ? Then bpf_prepare_bprintf
would only need to write there, there would be no acquire semantic
(like try_get_fmt_tmp_buf) and the stack frame would just be freed on
the helper return so there would be no bpf_printf_cleanup either. We
would also not pre-reserve static memory for all CPUs and it becomes
trivial to handle re-entrant helper calls.

I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
code but I've not been convinced of its necessity.

2021-05-06 19:55:11

by Andrii Nakryiko

[permalink] [raw]

Subject: Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare

On Wed, May 5, 2021 at 3:29 PM Florent Revest <[email protected]> wrote:
>
> On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
> <[email protected]> wrote:
> >
> > On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
> > <[email protected]> wrote:
> > >
> > > On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <[email protected]> wrote:
> > > >
> > > > On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> > > > > On Wed, May 5, 2021 at 9:23 AM Florent Revest <[email protected]> wrote:
> > > > >>
> > > > >> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> > > > >> per-cpu buffer that they use to store temporary data (arguments to
> > > > >> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> > > > >> by the end of their scope with bpf_bprintf_cleanup.
> > > > >>
> > > > >> If one of these helpers gets called within the scope of one of these
> > > > >> helpers, for example: a first bpf program gets called, uses
> > > > >
> > > > > Can we afford having few struct bpf_printf_bufs? They are just 512
> > > > > bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> > > > > only situation where this can occur, right? If someone is doing
> > > > > bpf_snprintf() and interrupt occurs and we run another BPF program, it
> > > > > will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> > > > > second BPF program, etc. We can't eliminate the probability, but
> > > > > having a small stack of buffers would make the probability so
> > > > > miniscule as to not worry about it at all.
> > > > >
> > > > > Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> > > > > the changes are minimal. Nestedness property is preserved for
> > > > > non-sleepable BPF programs, right? If we want this to work for
> > > > > sleepable we'd need to either: 1) disable migration or 2) instead of
> > >
> > > oh wait, we already disable migration for sleepable BPF progs, so it
> > > should be good to do nestedness level only
> >
> > actually, migrate_disable() might not be enough. Unless it is
> > impossible for some reason I miss, worst case it could be that two
> > sleepable programs (A and B) can be intermixed on the same CPU: A
> > starts&sleeps - B starts&sleeps - A continues&returns - B continues
> > and nestedness doesn't work anymore. So something like "reserving a
> > slot" would work better.
>
> Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?
>
> > >
> > > > > assuming a stack of buffers, do a loop to find unused one. Should be
> > > > > acceptable performance-wise, as it's not the fastest code anyway
> > > > > (printf'ing in general).
> > > > >
> > > > > In any case, re-using the same buffer for sort-of-optional-to-work
> > > > > bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> > > > > suboptimal, so seems worth fixing this.
> > > > >
> > > > > Thoughts?
> > > >
> > > > Yes, agree, it would otherwise be really hard to debug. I had the same
> > > > thought on why not allowing nesting here given users very likely expect
> > > > these helpers to just work for all the contexts.
> > > >
> > > > Thanks,
> > > > Daniel
>
> What would you think of just letting the helpers own these 512 bytes
> buffers as local variables on their stacks ? Then bpf_prepare_bprintf
> would only need to write there, there would be no acquire semantic
> (like try_get_fmt_tmp_buf) and the stack frame would just be freed on
> the helper return so there would be no bpf_printf_cleanup either. We
> would also not pre-reserve static memory for all CPUs and it becomes
> trivial to handle re-entrant helper calls.
>
> I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
> code but I've not been convinced of its necessity.

I got the impression that extra 512 bytes on the kernel stack is quite
a lot and that's why we have per-cpu buffers. Especially that
bpf_trace_printk() can be called from any context, including NMI.

2021-05-06 20:32:04

by Florent Revest

[permalink] [raw]

Subject: Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare

On Thu, May 6, 2021 at 8:52 PM Andrii Nakryiko
<[email protected]> wrote:
>
> On Wed, May 5, 2021 at 3:29 PM Florent Revest <[email protected]> wrote:
> >
> > On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
> > <[email protected]> wrote:
> > >
> > > On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
> > > <[email protected]> wrote:
> > > >
> > > > On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <[email protected]> wrote:
> > > > >
> > > > > On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> > > > > > On Wed, May 5, 2021 at 9:23 AM Florent Revest <[email protected]> wrote:
> > > > > >>
> > > > > >> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> > > > > >> per-cpu buffer that they use to store temporary data (arguments to
> > > > > >> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> > > > > >> by the end of their scope with bpf_bprintf_cleanup.
> > > > > >>
> > > > > >> If one of these helpers gets called within the scope of one of these
> > > > > >> helpers, for example: a first bpf program gets called, uses
> > > > > >
> > > > > > Can we afford having few struct bpf_printf_bufs? They are just 512
> > > > > > bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> > > > > > only situation where this can occur, right? If someone is doing
> > > > > > bpf_snprintf() and interrupt occurs and we run another BPF program, it
> > > > > > will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> > > > > > second BPF program, etc. We can't eliminate the probability, but
> > > > > > having a small stack of buffers would make the probability so
> > > > > > miniscule as to not worry about it at all.
> > > > > >
> > > > > > Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> > > > > > the changes are minimal. Nestedness property is preserved for
> > > > > > non-sleepable BPF programs, right? If we want this to work for
> > > > > > sleepable we'd need to either: 1) disable migration or 2) instead of
> > > >
> > > > oh wait, we already disable migration for sleepable BPF progs, so it
> > > > should be good to do nestedness level only
> > >
> > > actually, migrate_disable() might not be enough. Unless it is
> > > impossible for some reason I miss, worst case it could be that two
> > > sleepable programs (A and B) can be intermixed on the same CPU: A
> > > starts&sleeps - B starts&sleeps - A continues&returns - B continues
> > > and nestedness doesn't work anymore. So something like "reserving a
> > > slot" would work better.
> >
> > Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?
> >
> > > >
> > > > > > assuming a stack of buffers, do a loop to find unused one. Should be
> > > > > > acceptable performance-wise, as it's not the fastest code anyway
> > > > > > (printf'ing in general).
> > > > > >
> > > > > > In any case, re-using the same buffer for sort-of-optional-to-work
> > > > > > bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> > > > > > suboptimal, so seems worth fixing this.
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > Yes, agree, it would otherwise be really hard to debug. I had the same
> > > > > thought on why not allowing nesting here given users very likely expect
> > > > > these helpers to just work for all the contexts.
> > > > >
> > > > > Thanks,
> > > > > Daniel
> >
> > What would you think of just letting the helpers own these 512 bytes
> > buffers as local variables on their stacks ? Then bpf_prepare_bprintf
> > would only need to write there, there would be no acquire semantic
> > (like try_get_fmt_tmp_buf) and the stack frame would just be freed on
> > the helper return so there would be no bpf_printf_cleanup either. We
> > would also not pre-reserve static memory for all CPUs and it becomes
> > trivial to handle re-entrant helper calls.
> >
> > I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
> > code but I've not been convinced of its necessity.
>
> I got the impression that extra 512 bytes on the kernel stack is quite
> a lot and that's why we have per-cpu buffers. Especially that
> bpf_trace_printk() can be called from any context, including NMI.

Ok, I understand.

What about having one buffer per helper, synchronized with a spinlock?
Actually, bpf_trace_printk already has that, not for the bprintf
arguments but for the bprintf output so this wouldn't change much to
the performance of the helpers anyway:
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/trace/bpf_trace.c?id=9d31d2338950293ec19d9b095fbaa9030899dcb4#n385

These helpers are not performance sensitive so a per-cpu stack of
buffers feels over-engineered to me (and is also complexity I feel a
bit uncomfortable with).

2021-05-06 21:40:47

by Daniel Borkmann

[permalink] [raw]

Subject: Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare

On 5/6/21 10:17 PM, Florent Revest wrote:
> On Thu, May 6, 2021 at 8:52 PM Andrii Nakryiko
> <[email protected]> wrote:
>> On Wed, May 5, 2021 at 3:29 PM Florent Revest <[email protected]> wrote:
>>> On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
>>> <[email protected]> wrote:
>>>> On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
>>>> <[email protected]> wrote:
>>>>> On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <[email protected]> wrote:
>>>>>> On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
>>>>>>> On Wed, May 5, 2021 at 9:23 AM Florent Revest <[email protected]> wrote:
>>>>>>>>
>>>>>>>> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
>>>>>>>> per-cpu buffer that they use to store temporary data (arguments to
>>>>>>>> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
>>>>>>>> by the end of their scope with bpf_bprintf_cleanup.
>>>>>>>>
>>>>>>>> If one of these helpers gets called within the scope of one of these
>>>>>>>> helpers, for example: a first bpf program gets called, uses
>>>>>>>
>>>>>>> Can we afford having few struct bpf_printf_bufs? They are just 512
>>>>>>> bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
>>>>>>> only situation where this can occur, right? If someone is doing
>>>>>>> bpf_snprintf() and interrupt occurs and we run another BPF program, it
>>>>>>> will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
>>>>>>> second BPF program, etc. We can't eliminate the probability, but
>>>>>>> having a small stack of buffers would make the probability so
>>>>>>> miniscule as to not worry about it at all.
>>>>>>>
>>>>>>> Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
>>>>>>> the changes are minimal. Nestedness property is preserved for
>>>>>>> non-sleepable BPF programs, right? If we want this to work for
>>>>>>> sleepable we'd need to either: 1) disable migration or 2) instead of
>>>>>
>>>>> oh wait, we already disable migration for sleepable BPF progs, so it
>>>>> should be good to do nestedness level only
>>>>
>>>> actually, migrate_disable() might not be enough. Unless it is
>>>> impossible for some reason I miss, worst case it could be that two
>>>> sleepable programs (A and B) can be intermixed on the same CPU: A
>>>> starts&sleeps - B starts&sleeps - A continues&returns - B continues
>>>> and nestedness doesn't work anymore. So something like "reserving a
>>>> slot" would work better.
>>>
>>> Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?
>>>
>>>>>>> assuming a stack of buffers, do a loop to find unused one. Should be
>>>>>>> acceptable performance-wise, as it's not the fastest code anyway
>>>>>>> (printf'ing in general).
>>>>>>>
>>>>>>> In any case, re-using the same buffer for sort-of-optional-to-work
>>>>>>> bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
>>>>>>> suboptimal, so seems worth fixing this.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>
>>>>>> Yes, agree, it would otherwise be really hard to debug. I had the same
>>>>>> thought on why not allowing nesting here given users very likely expect
>>>>>> these helpers to just work for all the contexts.
>>>>>>
>>>>>> Thanks,
>>>>>> Daniel
>>>
>>> What would you think of just letting the helpers own these 512 bytes
>>> buffers as local variables on their stacks ? Then bpf_prepare_bprintf
>>> would only need to write there, there would be no acquire semantic
>>> (like try_get_fmt_tmp_buf) and the stack frame would just be freed on
>>> the helper return so there would be no bpf_printf_cleanup either. We
>>> would also not pre-reserve static memory for all CPUs and it becomes
>>> trivial to handle re-entrant helper calls.
>>>
>>> I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
>>> code but I've not been convinced of its necessity.
>>
>> I got the impression that extra 512 bytes on the kernel stack is quite
>> a lot and that's why we have per-cpu buffers. Especially that
>> bpf_trace_printk() can be called from any context, including NMI.
>
> Ok, I understand.
>
> What about having one buffer per helper, synchronized with a spinlock?
> Actually, bpf_trace_printk already has that, not for the bprintf
> arguments but for the bprintf output so this wouldn't change much to
> the performance of the helpers anyway:
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/trace/bpf_trace.c?id=9d31d2338950293ec19d9b095fbaa9030899dcb4#n385
>
> These helpers are not performance sensitive so a per-cpu stack of
> buffers feels over-engineered to me (and is also complexity I feel a
> bit uncomfortable with).

But wouldn't this have same potential of causing a deadlock? Simple example
would be if you have a tracing prog attached to bstr_printf(), and one of
the other helpers using the same lock called from a non-tracing prog. If
it can be avoided fairly easily, I'd also opt for per-cpu buffers as Andrii
mentioned earlier. We've had few prior examples with similar issues [0].

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9594dc3c7e71b9f52bee1d7852eb3d4e3aea9e99

2021-05-07 15:21:21

by Florent Revest

[permalink] [raw]

Subject: Re: [PATCH bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare

On Thu, May 6, 2021 at 11:38 PM Daniel Borkmann <[email protected]> wrote:
>
> On 5/6/21 10:17 PM, Florent Revest wrote:
> > On Thu, May 6, 2021 at 8:52 PM Andrii Nakryiko
> > <[email protected]> wrote:
> >> On Wed, May 5, 2021 at 3:29 PM Florent Revest <[email protected]> wrote:
> >>> On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
> >>> <[email protected]> wrote:
> >>>> On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
> >>>> <[email protected]> wrote:
> >>>>> On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <[email protected]> wrote:
> >>>>>> On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> >>>>>>> On Wed, May 5, 2021 at 9:23 AM Florent Revest <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> >>>>>>>> per-cpu buffer that they use to store temporary data (arguments to
> >>>>>>>> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> >>>>>>>> by the end of their scope with bpf_bprintf_cleanup.
> >>>>>>>>
> >>>>>>>> If one of these helpers gets called within the scope of one of these
> >>>>>>>> helpers, for example: a first bpf program gets called, uses
> >>>>>>>
> >>>>>>> Can we afford having few struct bpf_printf_bufs? They are just 512
> >>>>>>> bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> >>>>>>> only situation where this can occur, right? If someone is doing
> >>>>>>> bpf_snprintf() and interrupt occurs and we run another BPF program, it
> >>>>>>> will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> >>>>>>> second BPF program, etc. We can't eliminate the probability, but
> >>>>>>> having a small stack of buffers would make the probability so
> >>>>>>> miniscule as to not worry about it at all.
> >>>>>>>
> >>>>>>> Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> >>>>>>> the changes are minimal. Nestedness property is preserved for
> >>>>>>> non-sleepable BPF programs, right? If we want this to work for
> >>>>>>> sleepable we'd need to either: 1) disable migration or 2) instead of
> >>>>>
> >>>>> oh wait, we already disable migration for sleepable BPF progs, so it
> >>>>> should be good to do nestedness level only
> >>>>
> >>>> actually, migrate_disable() might not be enough. Unless it is
> >>>> impossible for some reason I miss, worst case it could be that two
> >>>> sleepable programs (A and B) can be intermixed on the same CPU: A
> >>>> starts&sleeps - B starts&sleeps - A continues&returns - B continues
> >>>> and nestedness doesn't work anymore. So something like "reserving a
> >>>> slot" would work better.
> >>>
> >>> Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?
> >>>
> >>>>>>> assuming a stack of buffers, do a loop to find unused one. Should be
> >>>>>>> acceptable performance-wise, as it's not the fastest code anyway
> >>>>>>> (printf'ing in general).
> >>>>>>>
> >>>>>>> In any case, re-using the same buffer for sort-of-optional-to-work
> >>>>>>> bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> >>>>>>> suboptimal, so seems worth fixing this.
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>
> >>>>>> Yes, agree, it would otherwise be really hard to debug. I had the same
> >>>>>> thought on why not allowing nesting here given users very likely expect
> >>>>>> these helpers to just work for all the contexts.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Daniel
> >>>
> >>> What would you think of just letting the helpers own these 512 bytes
> >>> buffers as local variables on their stacks ? Then bpf_prepare_bprintf
> >>> would only need to write there, there would be no acquire semantic
> >>> (like try_get_fmt_tmp_buf) and the stack frame would just be freed on
> >>> the helper return so there would be no bpf_printf_cleanup either. We
> >>> would also not pre-reserve static memory for all CPUs and it becomes
> >>> trivial to handle re-entrant helper calls.
> >>>
> >>> I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
> >>> code but I've not been convinced of its necessity.
> >>
> >> I got the impression that extra 512 bytes on the kernel stack is quite
> >> a lot and that's why we have per-cpu buffers. Especially that
> >> bpf_trace_printk() can be called from any context, including NMI.
> >
> > Ok, I understand.
> >
> > What about having one buffer per helper, synchronized with a spinlock?
> > Actually, bpf_trace_printk already has that, not for the bprintf
> > arguments but for the bprintf output so this wouldn't change much to
> > the performance of the helpers anyway:
> > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/trace/bpf_trace.c?id=9d31d2338950293ec19d9b095fbaa9030899dcb4#n385
> >
> > These helpers are not performance sensitive so a per-cpu stack of
> > buffers feels over-engineered to me (and is also complexity I feel a
> > bit uncomfortable with).
>
> But wouldn't this have same potential of causing a deadlock? Simple example
> would be if you have a tracing prog attached to bstr_printf(), and one of
> the other helpers using the same lock called from a non-tracing prog. If

Ah, right, I see :/

> it can be avoided fairly easily, I'd also opt for per-cpu buffers as Andrii
> mentioned earlier. We've had few prior examples with similar issues [0].
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9594dc3c7e71b9f52bee1d7852eb3d4e3aea9e99

Ok it's not as bad as I imagined, thank you Daniel :) I'll look into
it beginning of next week.