2022-04-21 16:25:33

by Linus Torvalds

Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP

On Tue, Apr 19, 2022 at 7:03 PM Alexei Starovoitov
<[email protected]> wrote:
>
> Here is the quote from Song's cover letter for bpf_prog_pack series:

I care about performance as much as the next person, but I care about
correctness too.

That large-page code was a disaster, and was buggy and broken.

And even with those four patches, it's still broken.

End result: there's no way that thing gets re-enabled without the
correctness being in place.

At a minimum, to re-enable it, it needs (a) that zeroing and (b)
actual numbers on real loads (not some artificial benchmark).

Because without (a) there's no way in hell I'll enable it.

And without (b), "performance" isn't actually an argument.

Linus


2022-04-22 17:40:13

by Song Liu

Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP

Hi Linus,

> On Apr 19, 2022, at 7:18 PM, Linus Torvalds <[email protected]> wrote:
>
> On Tue, Apr 19, 2022 at 7:03 PM Alexei Starovoitov
> <[email protected]> wrote:
>>
>> Here is the quote from Song's cover letter for bpf_prog_pack series:
>
> I care about performance as much as the next person, but I care about
> correctness too.
>
> That large-page code was a disaster, and was buggy and broken.
>
> And even with those four patches, it's still broken.
>
> End result: there's no way that thing gets re-enabled without the
> correctness being in place.
>
> At a minimum, to re-enable it, it needs (a) that zeroing and (b)
> actual numbers on real loads. (not some artificial benchmark).
>
> Because without (a) there's no way in hell I'll enable it.
>
> And without (b), "performance" isn't actually an argument.

I will send a patch to do (a) later this week.

For (b), we have seen direct map fragmentation cause a visible
performance drop for our major services. This is a shadow
production benchmark, so it is not possible to run it outside of
our data centers. Tracing showed that BPF programs were the top
trigger of these direct map splits.

Thanks,
Song

2022-04-22 18:54:02

by Luis Chamberlain

Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP

On Wed, Apr 20, 2022 at 02:42:37PM +0000, Song Liu wrote:
> For (b), we have seen direct map fragmentation cause a visible
> performance drop for our major services. This is a shadow
> production benchmark, so it is not possible to run it outside of
> our data centers. Tracing showed that BPF programs were the top
> trigger of these direct map splits.

It's often not easy to reproduce issues like these, but I've
run into that before with other proof-of-concept issues,
and the solution has been a Linux selftest. For instance, a
"multithreaded" bombing of kmod can be triggered with
lib/test_kmod.c and tools/testing/selftests/kmod/kmod.sh.

Would designing a selftest to abuse the eBPF JIT be a possible
way to reproduce the issue?

Luis