The error indicates that the verifier is letting through a program with
a stack depth bigger than 512.
This is due to the verifier not checking the stack depth after
instruction rewrites are perfomed. For example, the MAY_GOTO instruction
adds 8 bytes to the stack, which means that if the stack at the moment
was already 512 bytes it would overflow after rewriting the instruction.
The fix involves adding a stack depth check after all instruction
rewrites are performed.
Reported-by: [email protected]
Signed-off-by: Camila Alvarez <[email protected]>
---
kernel/bpf/verifier.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 63749ad5ac6b..a9e23b6b8e8f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -21285,6 +21285,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
if (ret == 0)
ret = do_misc_fixups(env);
+ /* max stack depth verification must be done after rewrites as well */
+ if (ret == 0)
+ ret = check_max_stack_depth(env);
+
/* do 32-bit optimization after insn patching has done so those patched
* insns could be handled correctly.
*/
--
2.34.1
On Sat, May 4, 2024 at 6:49 PM Camila Alvarez <[email protected]> wrote:
>
> The error indicates that the verifier is letting through a program with
> a stack depth bigger than 512.
>
> This is due to the verifier not checking the stack depth after
> instruction rewrites are perfomed. For example, the MAY_GOTO instruction
> adds 8 bytes to the stack, which means that if the stack at the moment
> was already 512 bytes it would overflow after rewriting the instruction.
This is by design. may_goto and other constructs like bpf_loop
inlining can consume a few words above 512 limit.
> The fix involves adding a stack depth check after all instruction
> rewrites are performed.
>
> Reported-by: [email protected]
This syzbot report is likely unrelated.
It says that it bisected it to may_goto, but it has this report
before may_goto was introduced, so bisection is incorrect.
pw-bot: cr
> Signed-off-by: Camila Alvarez <[email protected]>
> ---
> kernel/bpf/verifier.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 63749ad5ac6b..a9e23b6b8e8f 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -21285,6 +21285,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
> if (ret == 0)
> ret = do_misc_fixups(env);
>
> + /* max stack depth verification must be done after rewrites as well */
> + if (ret == 0)
> + ret = check_max_stack_depth(env);
> +
> /* do 32-bit optimization after insn patching has done so those patched
> * insns could be handled correctly.
> */
> --
> 2.34.1
>
On Sun, 5 May 2024, Alexei Starovoitov wrote:
> On Sat, May 4, 2024 at 6:49 PM Camila Alvarez <[email protected]> wrote:
>>
>> The error indicates that the verifier is letting through a program with
>> a stack depth bigger than 512.
>>
>> This is due to the verifier not checking the stack depth after
>> instruction rewrites are perfomed. For example, the MAY_GOTO instruction
>> adds 8 bytes to the stack, which means that if the stack at the moment
>> was already 512 bytes it would overflow after rewriting the instruction.
>
> This is by design. may_goto and other constructs like bpf_loop
> inlining can consume a few words above 512 limit.
>
Is this the only case where the verifier should allow the stack to go over
the 512 limit? If that's the case, maybe we could use the extra stack
depth to store how much the rewrites affect the stack depth? This would
only be used to obtain the correct interpreter when
CONFIG_BPF_JIT_ALWAYS_ON is not set.
That would allow choosing the interpreter by considering the stack depth
before the rewrites.
>> The fix involves adding a stack depth check after all instruction
>> rewrites are performed.
>>
>> Reported-by: [email protected]
>
> This syzbot report is likely unrelated.
> It says that it bisected it to may_goto, but it has this report
> before may_goto was introduced, so bisection is incorrect.
>
> pw-bot: cr
I can see that may_goto was introduced on march 6th, and the first report
was on march 13th. Is there any report I'm missing?
>
>> Signed-off-by: Camila Alvarez <[email protected]>
>> ---
>> kernel/bpf/verifier.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 63749ad5ac6b..a9e23b6b8e8f 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -21285,6 +21285,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
>> if (ret == 0)
>> ret = do_misc_fixups(env);
>>
>> + /* max stack depth verification must be done after rewrites as well */
>> + if (ret == 0)
>> + ret = check_max_stack_depth(env);
>> +
>> /* do 32-bit optimization after insn patching has done so those patched
>> * insns could be handled correctly.
>> */
>> --
>> 2.34.1
>>
>
On Sun, May 5, 2024 at 4:18 PM Camila Alvarez Inostroza
<[email protected]> wrote:
>
>
>
> On Sun, 5 May 2024, Alexei Starovoitov wrote:
>
> > On Sat, May 4, 2024 at 6:49 PM Camila Alvarez <[email protected]> wrote:
> >>
> >> The error indicates that the verifier is letting through a program with
> >> a stack depth bigger than 512.
> >>
> >> This is due to the verifier not checking the stack depth after
> >> instruction rewrites are perfomed. For example, the MAY_GOTO instruction
> >> adds 8 bytes to the stack, which means that if the stack at the moment
> >> was already 512 bytes it would overflow after rewriting the instruction.
> >
> > This is by design. may_goto and other constructs like bpf_loop
> > inlining can consume a few words above 512 limit.
> >
>
> Is this the only case where the verifier should allow the stack to go over
> the 512 limit? If that's the case, maybe we could use the extra stack
> depth to store how much the rewrites affect the stack depth? This would
> only be used to obtain the correct interpreter when
> CONFIG_BPF_JIT_ALWAYS_ON is not set.
> That would allow choosing the interpreter by considering the stack depth
> before the rewrites.
>
> >> The fix involves adding a stack depth check after all instruction
> >> rewrites are performed.
> >>
> >> Reported-by: [email protected]
> >
> > This syzbot report is likely unrelated.
> > It says that it bisected it to may_goto, but it has this report
> > before may_goto was introduced, so bisection is incorrect.
> >
> > pw-bot: cr
>
> I can see that may_goto was introduced on march 6th, and the first report
> was on march 13th. Is there any report I'm missing?
Could you please craft a selftest for this issue then?
It will be much easier to reason about the fix.
We can either add another interpreter to interpreters_args[]
or just gate may_goto with prog->jit_requested.