2022-04-22 21:04:43

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: invalidate unused part of bpf_prog_pack

Hi Linus,

> On Apr 21, 2022, at 2:28 PM, Linus Torvalds <[email protected]> wrote:
>
> On Thu, Apr 21, 2022 at 12:41 PM Song Liu <[email protected]> wrote:
>>
>> The extra logic I had in the original patch was to erase the memory
>> when a BPF program is freed. In this case, the memory will be
>> returned to the bpf_prog_pack, and stays as RO+X. Actually, I
>> am not quite sure whether we need this logic. If not, we only need
>> the much simpler version.
>
> Oh, I think it would be good to do at free time too.
>
> I just would want that to use the same function we already have for
> the allocation-time thing, instead of introducing completely new
> infrastructure. That was what looked very odd to me.
>
> Now, the _smallest_ patch would likely be to just save away that
> 'bpf_fill_ill_insns' function pointer in the 'struct bpf_prog_pack'
> thing.

[...]
>
> Why not just agree on a name - I suggest 'bpf_jit_fill_hole()' - and
> just get rid of that stupid 'bpf_jit_fill_hole_t' type name that only
> exists because of this thing?

Last night, I had a version which is about 90% same as this idea.

However, we cannot really use the same function at free time. The
huge page is RO+X at free time, but we are only zeroing out a chunk
of it. So regular memset/memcpy won’t work. Instead, we will need
something like bpf_arch_text_copy().

Thanks,
Song


2022-04-22 22:08:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH bpf] bpf: invalidate unused part of bpf_prog_pack

On Thu, Apr 21, 2022 at 2:53 PM Song Liu <[email protected]> wrote:
>
> However, we cannot really use the same function at free time. The
> huge page is RO+X at free time, but we are only zeroing out a chunk
> of it. So regular memset/memcpy won’t work. Instead, we will need
> something like bpf_arch_text_copy().

I actually think bpf_arch_text_copy() is another horribly badly done thing.

It seems only implemented on x86 (I'm not sure how anything else is
supposed to work, I didn't go look), and there it is horribly badly
done, using __text_poke() that does all these magical things just to
make it atomic wrt concurrent code execution.

None of which is *AT*ALL* relevant for this case, since concurrent
code execution simply isn't a thing (and if it were, you would already
have lost).

And if that wasn't pointless enough, it does all that magic "map the
page writably at a different virtual address using poking_addr in
poking_mm" and a different address space entirely.

All of that is required for REAL KERNEL CODE.

But the thing is, for bpf_prog_pack, all of that is just completely
pointless and stupid complexity.

We already *have* the other non-executable address that is writable:
it's the actual pages that got vmalloc'ed. Just use vmalloc_to_page()
and it's RIGHT THERE.

At which point you just use the same bpf_jit_fill_hole() function, and
you're done.

In other words, all of this seems excessively stupidly done, for no
good reason. It's only making it much too complicated, and just not
doing the right thing at all.

Linus