Subject: Re: [Bug Report] bpf: incorrectly pruning runtime execution path

On Thu, Dec 14, 2023 at 6:28 PM Eduard Zingerman <[email protected]> wrote:
>
> On Thu, 2023-12-14 at 18:16 -0800, Alexei Starovoitov wrote:
> [...]
> > > E.g. for the test-case at hand:
> > >
> > > 0: (85) call bpf_get_prandom_u32#7 ; R0=scalar()
> > > 1: (bf) r7 = r0 ; R0=scalar(id=1) R7_w=scalar(id=1)
> > > 2: (bf) r8 = r0 ; R0=scalar(id=1) R8_w=scalar(id=1)
> > > 3: (85) call bpf_get_prandom_u32#7 ; R0=scalar()
> > > --- checkpoint #1 r7.id = 1, r8.id = 1 ---
> > > 4: (25) if r0 > 0x1 goto pc+0 ; R0=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,...)
> > > --- checkpoint #2 r7.id = 1, r8.id = 1 ---
> > > 5: (3d) if r8 >= r0 goto pc+3 ; R0=1 R8=0 | record r8.id=1 in jump history
> > > 6: (0f) r8 += r8 ; R8=0
> >
> > can we detect that any register link is broken and force checkpoint here?
>
> Should be possible. I'll try this in the morning and check veristat results.
>
> By the way, I added some stats collection for find_equal_scalars() and see
> the following results when run on ./test_progs:
> - maximal number of registers with same id per call: 3
> - average number of registers with same id per call: 1.4

What if we keep 8 extra bytes in jump/instruction history and encode
up to 8 linked registers/slots:

1. 1 bit to mark whether it's a src_reg set, or dst_reg set
2. 1 bit to mark whether it's a stack slot or register
3. 6 bits (0..63 values) to record register or slot number

If we ever need more than 8 linked registers, we can just forcefully
some "links" by resetting some IDs?
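
A minimal sketch of the one-byte-per-entry layout described above, not code from any patch. The bit positions, the names linked_entry, entry_pack(), linked_pack() and linked_get(), and the separately passed entry count are all assumptions made for illustration; the proposal only fixes the field widths (1 + 1 + 6 bits) and the limit of 8 entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINKED_MAX 8 /* one byte per entry, 8 bytes in total */

/* Hypothetical layout of one byte:
 *   bit 7     src_reg set (1) or dst_reg set (0)
 *   bit 6     stack slot (1) or register (0)
 *   bits 5..0 register or slot number, 0..63
 */
struct linked_entry {
	bool is_src;
	bool is_stack;
	uint8_t num;
};

static uint8_t entry_pack(struct linked_entry e)
{
	return (uint8_t)((e.is_src << 7) | (e.is_stack << 6) | (e.num & 0x3f));
}

static struct linked_entry entry_unpack(uint8_t b)
{
	struct linked_entry e = {
		.is_src = (b >> 7) & 1,
		.is_stack = (b >> 6) & 1,
		.num = b & 0x3f,
	};
	return e;
}

/* Pack cnt entries into 8 bytes. Returns false when cnt exceeds
 * LINKED_MAX, i.e. the case where some links would have to be dropped.
 * The count is kept outside the 8 bytes (an assumption of this sketch).
 */
static bool linked_pack(const struct linked_entry *es, int cnt, uint64_t *out)
{
	uint64_t v = 0;
	int i;

	if (cnt > LINKED_MAX)
		return false;
	for (i = 0; i < cnt; i++)
		v |= (uint64_t)entry_pack(es[i]) << (8 * i);
	*out = v;
	return true;
}

/* Read back entry i (0-based) from the packed value. */
static struct linked_entry linked_get(uint64_t v, int i)
{
	return entry_unpack((uint8_t)(v >> (8 * i)));
}
```

An all-zero byte is a valid entry here (dst set, register, number 0), so the sketch cannot tell an empty slot from r0. That is why the count is passed alongside the 8 bytes rather than inferred from them.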

BTW, is it only conditional jumps that need to record these linked
register sets? Did we previously discuss why we don't need this for
any other operation?

2023-12-15 16:23:00

by Eduard Zingerman

Subject: Re: [Bug Report] bpf: incorrectly pruning runtime execution path

On Thu, 2023-12-14 at 21:20 -0800, Andrii Nakryiko wrote:
[...]
> > > can we detect that any register link is broken and force checkpoint here?
> >
> > Should be possible. I'll try this in the morning and check veristat results.

{still working on this}

> > By the way, I added some stats collection for find_equal_scalars() and see
> > the following results when run on ./test_progs:
> > - maximal number of registers with same id per call: 3
> > - average number of registers with same id per call: 1.4
>
> What if we keep 8 extra bytes in jump/instruction history and encode
> up to 8 linked registers/slots:
>
> 1. 1 bit to mark whether it's a src_reg set, or dst_reg set
> 2. 1 bit to mark whether it's a stack slot or register
> 3. 6 bits (0..63 values) to record register or slot number
>
> If we ever need more than 8 linked registers, we can just forcefully
> some "links" by resetting some IDs?

That should work as well.
Probably don't need src/dst bit, as backtracker marks both as precise
when processing conditional jump.

You mean "just forcefully [breaking] some "links" by resetting ...", right?

> BTW, is it only conditional jumps that need to record these linked
> register sets? Did we previously discuss why we don't need this for
> any other operation?

Don't think that we discussed it.
Here is my reasoning: the range transfer happens at find_equal_scalars()
which is called only from check_cond_jmp_op().
I think there are no other effects IDs have for scalar values.
Thus, covering conditional jumps seems sufficient.
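
The range transfer mentioned above can be pictured with a toy model. This is not the verifier's find_equal_scalars(); struct reg, NREGS and transfer_range() are hypothetical names for this sketch, and only an unsigned range is modeled.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 11 /* r0..r10 */

/* Simplified stand-in for a register state: an ID plus an unsigned range. */
struct reg {
	uint32_t id;         /* 0 means "not linked to any other register" */
	uint64_t umin, umax;
};

/* After a conditional jump has narrowed 'known', copy its range to every
 * other register that shares the same non-zero ID.
 * Returns the number of registers updated.
 */
static int transfer_range(struct reg *regs, const struct reg *known)
{
	int i, n = 0;

	if (!known->id)
		return 0;
	for (i = 0; i < NREGS; i++) {
		if (&regs[i] == known || regs[i].id != known->id)
			continue;
		regs[i].umin = known->umin;
		regs[i].umax = known->umax;
		n++;
	}
	return n;
}
```

In this model, if r7 and r8 share id=1 and a jump narrows r8 to [0, 0], calling transfer_range(regs, &regs[8]) narrows r7 as well. That hidden effect on r7 is what the jump history would have to record.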

2023-12-15 17:01:50

by Andrii Nakryiko

Subject: Re: [Bug Report] bpf: incorrectly pruning runtime execution path

On Fri, Dec 15, 2023 at 8:22 AM Eduard Zingerman <[email protected]> wrote:
>
> On Thu, 2023-12-14 at 21:20 -0800, Andrii Nakryiko wrote:
> [...]
> > > > can we detect that any register link is broken and force checkpoint here?
> > >
> > > Should be possible. I'll try this in the morning and check veristat results.
>
> {still working on this}
>
> > > By the way, I added some stats collection for find_equal_scalars() and see
> > > the following results when run on ./test_progs:
> > > - maximal number of registers with same id per call: 3
> > > - average number of registers with same id per call: 1.4
> >
> > What if we keep 8 extra bytes in jump/instruction history and encode
> > up to 8 linked registers/slots:
> >
> > 1. 1 bit to mark whether it's a src_reg set, or dst_reg set
> > 2. 1 bit to mark whether it's a stack slot or register
> > 3. 6 bits (0..63 values) to record register or slot number
> >
> > If we ever need more than 8 linked registers, we can just forcefully
> > some "links" by resetting some IDs?
>
> That should work as well.
> Probably don't need src/dst bit, as backtracker marks both as precise
> when processing conditional jump.

yeah, probably

>
> You mean "just forcefully [breaking] some "links" by resetting ...", right?

yeah, breaking, sorry, inattentive brain :)

>
> > BTW, is it only conditional jumps that need to record these linked
> > register sets? Did we previously discuss why we don't need this for
> > any other operation?
>
> Don't think that we discussed it.
> Here is my reasoning: the range transfer happens at find_equal_scalars()
> which is called only from check_cond_jmp_op().
> I think there are no other effects IDs have for scalar values.
> Thus, covering conditional jumps seems sufficient.
>
>

yep, makes sense, any other operation is breaking the link
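
A toy illustration of that statement, under the assumption that a register copy shares the ID while an arithmetic result drops it. The names struct reg, next_id, mov() and add() are made up for this sketch and are not verifier functions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified register: a link ID and a concrete value. */
struct reg {
	uint32_t id;  /* 0 means "not linked" */
	uint64_t val;
};

static uint32_t next_id = 1;

/* r_dst = r_src: both registers end up with the same non-zero ID. */
static void mov(struct reg *dst, struct reg *src)
{
	if (!src->id)
		src->id = next_id++;
	*dst = *src;
}

/* r_dst += r_src: dst now holds a different value, so its link to the
 * other registers no longer holds and the ID is cleared.
 */
static void add(struct reg *dst, const struct reg *src)
{
	dst->val += src->val;
	dst->id = 0;
}
```

Mirroring instructions 1, 2 and 6 of the test case: after mov(&r7, &r0) and mov(&r8, &r0) all three registers share one ID, and after add(&r8, &r8) only r0 and r7 remain linked.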

2023-12-15 20:56:01

by Eduard Zingerman

Subject: Re: [Bug Report] bpf: incorrectly pruning runtime execution path

On Thu, 2023-12-14 at 18:16 -0800, Alexei Starovoitov wrote:
[...]
> > E.g. for the test-case at hand:
> >
> > 0: (85) call bpf_get_prandom_u32#7 ; R0=scalar()
> > 1: (bf) r7 = r0 ; R0=scalar(id=1) R7_w=scalar(id=1)
> > 2: (bf) r8 = r0 ; R0=scalar(id=1) R8_w=scalar(id=1)
> > 3: (85) call bpf_get_prandom_u32#7 ; R0=scalar()
> > --- checkpoint #1 r7.id = 1, r8.id = 1 ---
> > 4: (25) if r0 > 0x1 goto pc+0 ; R0=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,...)
> > --- checkpoint #2 r7.id = 1, r8.id = 1 ---
> > 5: (3d) if r8 >= r0 goto pc+3 ; R0=1 R8=0 | record r8.id=1 in jump history
> > 6: (0f) r8 += r8 ; R8=0
>
> can we detect that any register link is broken and force checkpoint here?

I implemented this and pushed to github [0] for the moment.
The minimized test case and original reproducer are both passing.
About 15 self-tests are failing. I looked through each one, and the
failures seem to be caused by changes in the log.
I might have missed something, though.

Veristat results are "not great, not terrible", the full table
comparing this patch to master is at [1], the summary is as follows:
- average increase in number of processed instructions: 3%
- max increase in number of processed instructions: 81%
- average increase in number of processed states : 76%
- max increase in number of processed states : 482%

The hack of adding the BPF_ID_TRANSFERED_RANGE bit to a scalar id, if that
id was used for range transfer, is ugly but necessary.
Without it, the number of processed states might increase 10x for some selftests.

I will now implement 8-byte jump history modification suggested by
Andrii in order to compare patches.

[0] https://github.com/eddyz87/bpf/tree/find-equal-scalars-and-precision-fix-new-states
[1] https://gist.github.com/eddyz87/73e3c6df31a80ad8660ae079e16ae365