2021-10-26 15:50:09

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic

Hi,

These patches rewrite the way retpolines are rewritten. Currently objtool emits
alternative entries for most retpoline calls. However trying to extend that led
to trouble (ELF files are horrid).

Therefore completely overhaul this and have objtool emit a .retpoline_sites
section that lists all compiler generated retpoline thunk calls. Then the
kernel can do with them as it pleases.

Notably it will:

- rewrite them to indirect instructions for !RETPOLINE
- rewrite them to lfence; indirect; for RETPOLINE_AMD,
where size allows (boo clang!)

Specifically, the !RETPOLINE case can now also deal with the clang-special
conditional-indirect-tail-call:

Jcc __x86_indirect_thunk_\reg.

Finally, also update the x86 BPF jit to catch up to recent times and do these
same things.

All this should help improve performance by removing an indirection.

Patches can (soon) be found here:

git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core

Changes since v2:

- rewrite the __x86_indirect_thunk_array[] stuff again
- rewrite the retpoline,amd rewrite logic, it now also supports
rewriting the Jcc case, if the original instruction is long enough, but
more importantly, it's simpler code.
- bpf label simplification patch
- random assorted cleanups
- actually managed to get bpf selftests working

---
arch/um/kernel/um_arch.c | 4 +
arch/x86/include/asm/GEN-for-each-reg.h | 14 ++-
arch/x86/include/asm/alternative.h | 1 +
arch/x86/include/asm/asm-prototypes.h | 18 ---
arch/x86/include/asm/nospec-branch.h | 72 ++---------
arch/x86/kernel/alternative.c | 189 ++++++++++++++++++++++++++++-
arch/x86/kernel/cpu/bugs.c | 7 --
arch/x86/kernel/module.c | 9 +-
arch/x86/kernel/vmlinux.lds.S | 14 +++
arch/x86/lib/retpoline.S | 56 ++-------
arch/x86/net/bpf_jit_comp.c | 160 +++++++++---------------
arch/x86/net/bpf_jit_comp32.c | 22 +++-
tools/objtool/arch/x86/decode.c | 120 ------------------
tools/objtool/check.c | 208 ++++++++++++++++++++++----------
tools/objtool/elf.c | 84 -------------
tools/objtool/include/objtool/check.h | 1 -
tools/objtool/include/objtool/elf.h | 6 +-
tools/objtool/special.c | 8 --
18 files changed, 472 insertions(+), 521 deletions(-)


2021-10-27 07:07:17

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic

On Tue, Oct 26, 2021 at 5:05 AM Peter Zijlstra <[email protected]> wrote:
>
> Hi,
>
> These patches rewrite the way retpolines are rewritten. Currently objtool emits
> alternative entries for most retpoline calls. However trying to extend that led
> to trouble (ELF files are horrid).
>
> Therefore completely overhaul this and have objtool emit a .retpoline_sites
> section that lists all compiler generated retpoline thunk calls. Then the
> kernel can do with them as it pleases.
>
> Notably it will:
>
> - rewrite them to indirect instructions for !RETPOLINE
> - rewrite them to lfence; indirect; for RETPOLINE_AMD,
> where size allows (boo clang!)
>
> Specifically, the !RETPOLINE case can now also deal with the clang-special
> conditional-indirect-tail-call:
>
> Jcc __x86_indirect_thunk_\reg.
>
> Finally, also update the x86 BPF jit to catch up to recent times and do these
> same things.
>
> All this should help improve performance by removing an indirection.
>
> Patches can (soon) be found here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core
>
> Changes since v2:
>
> - rewrite the __x86_indirect_thunk_array[] stuff again
> - rewrite the retpoline,amd rewrite logic, it now also supports
> rewriting the Jcc case, if the original instruction is long enough, but
> more importantly, it's simpler code.
> - bpf label simplification patch
> - random assorted cleanups
> - actually managed to get bpf selftests working

Great.
The patchset didn't go through BPF CI though.
See
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

It's a merge conflict. The patchset failed to apply to both bpf and
bpf-next trees:

Cmd('git') failed due to: exit code(128)
cmdline: git am -3
stdout: 'Applying: objtool: Classify symbols
Patch failed at 0001 objtool: Classify symbols
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".'
stderr: 'error: sha1 information is lacking or useless
(tools/objtool/check.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch'

2021-10-27 07:08:37

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic

On Tue, Oct 26, 2021 at 11:26:57AM -0700, Alexei Starovoitov wrote:

> It's a merge conflict. The patchset failed to apply to both bpf and
> bpf-next trees:

Figures :/ I suspect it relies on tip/objtool/core at the very least and
possibly some of the x86 trees as well.

I can locally merge tip/master with bpf, but getting a CI to do that
might be tricky.

2021-10-27 10:15:11

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic

On Tue, Oct 26, 2021 at 11:45 AM Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Oct 26, 2021 at 11:26:57AM -0700, Alexei Starovoitov wrote:
>
> > It's a merge conflict. The patchset failed to apply to both bpf and
> > bpf-next trees:
>
> Figures :/ I suspect it relies on tip/objtool/core at the very least and
> possibly some of the x86 trees as well.
>
> I can locally merge tip/master with bpf, but getting a CI to do that
> might be tricky.

We have an ability in CI to supply few additional patches on top bpf/bpf-next
trees, but that's usually done for the cases where we've merged a fix into
one tree, but it's needed in both while bpf->net->linus->net-next->bpf-next
circle is still pending.

Does tip/objtool/core dependency relevant for this set?
Can you rebase the current set on top of bpf-next and send it to the list
just to get CI to run it? We won't be merging it into bpf-next, of course.
I'm mainly interested in seeing all that additional tests passing that
we have in bpf-next.

2021-10-27 10:31:18

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic

On Tue, Oct 26, 2021 at 2:05 PM Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Oct 26, 2021 at 01:00:04PM -0700, Alexei Starovoitov wrote:
> > On Tue, Oct 26, 2021 at 11:45 AM Peter Zijlstra <[email protected]> wrote:
> > >
> > > On Tue, Oct 26, 2021 at 11:26:57AM -0700, Alexei Starovoitov wrote:
> > >
> > > > It's a merge conflict. The patchset failed to apply to both bpf and
> > > > bpf-next trees:
> > >
> > > Figures :/ I suspect it relies on tip/objtool/core at the very least and
> > > possibly some of the x86 trees as well.
> > >
> > > I can locally merge tip/master with bpf, but getting a CI to do that
> > > might be tricky.
> >
> > We have an ability in CI to supply few additional patches on top bpf/bpf-next
> > trees, but that's usually done for the cases where we've merged a fix into
> > one tree, but it's needed in both while bpf->net->linus->net-next->bpf-next
> > circle is still pending.
> >
> > Does tip/objtool/core dependency relevant for this set?
> > Can you rebase the current set on top of bpf-next and send it to the list
> > just to get CI to run it? We won't be merging it into bpf-next, of course.
> > I'm mainly interested in seeing all that additional tests passing that
> > we have in bpf-next.
>
> I should be able to rebase it just to that, let me try that in the am
> though, brain is fairly fried atm. Do you really want me to post it to
> the list, or is a git repo good enough?

Please post it. CI cannot pull it from the repo.

2021-10-27 10:31:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic

On Tue, Oct 26, 2021 at 01:00:04PM -0700, Alexei Starovoitov wrote:
> On Tue, Oct 26, 2021 at 11:45 AM Peter Zijlstra <[email protected]> wrote:
> >
> > On Tue, Oct 26, 2021 at 11:26:57AM -0700, Alexei Starovoitov wrote:
> >
> > > It's a merge conflict. The patchset failed to apply to both bpf and
> > > bpf-next trees:
> >
> > Figures :/ I suspect it relies on tip/objtool/core at the very least and
> > possibly some of the x86 trees as well.
> >
> > I can locally merge tip/master with bpf, but getting a CI to do that
> > might be tricky.
>
> We have an ability in CI to supply few additional patches on top bpf/bpf-next
> trees, but that's usually done for the cases where we've merged a fix into
> one tree, but it's needed in both while bpf->net->linus->net-next->bpf-next
> circle is still pending.
>
> Does tip/objtool/core dependency relevant for this set?
> Can you rebase the current set on top of bpf-next and send it to the list
> just to get CI to run it? We won't be merging it into bpf-next, of course.
> I'm mainly interested in seeing all that additional tests passing that
> we have in bpf-next.

I should be able to rebase it just to that, let me try that in the am
though, brain is fairly fried atm. Do you really want me to post it to
the list, or is a git repo good enough?

2021-10-27 21:20:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic

On Tue, Oct 26, 2021 at 02:05:55PM -0700, Alexei Starovoitov wrote:

> Please post it. CI cannot pull it from the repo.

Done:

https://lore.kernel.org/bpf/[email protected]/T/#t

2021-10-27 21:33:58

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic

On Wed, Oct 27, 2021 at 2:02 AM Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Oct 26, 2021 at 02:05:55PM -0700, Alexei Starovoitov wrote:
>
> > Please post it. CI cannot pull it from the repo.
>
> Done:
>
> https://lore.kernel.org/bpf/[email protected]/T/#t

Perfect. Thanks!
Looks like vger got into 'delete all users' mode.
Looks like my and many other emails were automatically unsubscribed
from many mailing lists.
bpf@vger, netdev@vger, linux-kernel@vger user count looks very low
compared to a few weeks ago.
I doubt all these users decided to unsubscribe themselves.

Anyway looks like BPF CI doesn't see the patchset yet for some reason
(we're debugging),
but I've tested your patchset manually the same way CI would do
and reviewed most of the patches.
Please add:
Acked-by: Alexei Starovoitov <[email protected]>
to bpf patches 16 and 17
and
Tested-by: Alexei Starovoitov <[email protected]>
for the whole set.

Thanks!

2021-10-28 05:19:25

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic

On Tue, Oct 26, 2021 at 02:01:32PM +0200, Peter Zijlstra wrote:
> Hi,
>
> These patches rewrite the way retpolines are rewritten. Currently objtool emits
> alternative entries for most retpoline calls. However trying to extend that led
> to trouble (ELF files are horrid).
>
> Therefore completely overhaul this and have objtool emit a .retpoline_sites
> section that lists all compiler generated retpoline thunk calls. Then the
> kernel can do with them as it pleases.
>
> Notably it will:
>
> - rewrite them to indirect instructions for !RETPOLINE
> - rewrite them to lfence; indirect; for RETPOLINE_AMD,
> where size allows (boo clang!)
>
> Specifically, the !RETPOLINE case can now also deal with the clang-special
> conditional-indirect-tail-call:
>
> Jcc __x86_indirect_thunk_\reg.
>
> Finally, also update the x86 BPF jit to catch up to recent times and do these
> same things.
>
> All this should help improve performance by removing an indirection.
>
> Patches can (soon) be found here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core
>
> Changes since v2:
>
> - rewrite the __x86_indirect_thunk_array[] stuff again
> - rewrite the retpoline,amd rewrite logic, it now also supports
> rewriting the Jcc case, if the original instruction is long enough, but
> more importantly, it's simpler code.
> - bpf label simplification patch
> - random assorted cleanups
> - actually managed to get bpf selftests working

Good stuff!

Acked-by: Josh Poimboeuf <[email protected]>

--
Josh

2021-10-28 17:47:31

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic

On Tue, Oct 26, 2021 at 02:01:32PM +0200, Peter Zijlstra wrote:
> Hi,
>
> These patches rewrite the way retpolines are rewritten. Currently objtool emits
> alternative entries for most retpoline calls. However trying to extend that led
> to trouble (ELF files are horrid).
>
> Therefore completely overhaul this and have objtool emit a .retpoline_sites
> section that lists all compiler generated retpoline thunk calls. Then the
> kernel can do with them as it pleases.
>
> Notably it will:
>
> - rewrite them to indirect instructions for !RETPOLINE
> - rewrite them to lfence; indirect; for RETPOLINE_AMD,
> where size allows (boo clang!)
>
> Specifically, the !RETPOLINE case can now also deal with the clang-special
> conditional-indirect-tail-call:
>
> Jcc __x86_indirect_thunk_\reg.
>
> Finally, also update the x86 BPF jit to catch up to recent times and do these
> same things.
>
> All this should help improve performance by removing an indirection.
>
> Patches can (soon) be found here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core
>
> Changes since v2:
>
> - rewrite the __x86_indirect_thunk_array[] stuff again
> - rewrite the retpoline,amd rewrite logic, it now also supports
> rewriting the Jcc case, if the original instruction is long enough, but
> more importantly, it's simpler code.
> - bpf label simplification patch
> - random assorted cleanups
> - actually managed to get bpf selftests working
>
> ---
> arch/um/kernel/um_arch.c | 4 +
> arch/x86/include/asm/GEN-for-each-reg.h | 14 ++-
> arch/x86/include/asm/alternative.h | 1 +
> arch/x86/include/asm/asm-prototypes.h | 18 ---
> arch/x86/include/asm/nospec-branch.h | 72 ++---------
> arch/x86/kernel/alternative.c | 189 ++++++++++++++++++++++++++++-
> arch/x86/kernel/cpu/bugs.c | 7 --
> arch/x86/kernel/module.c | 9 +-
> arch/x86/kernel/vmlinux.lds.S | 14 +++
> arch/x86/lib/retpoline.S | 56 ++-------
> arch/x86/net/bpf_jit_comp.c | 160 +++++++++---------------
> arch/x86/net/bpf_jit_comp32.c | 22 +++-
> tools/objtool/arch/x86/decode.c | 120 ------------------
> tools/objtool/check.c | 208 ++++++++++++++++++++++----------
> tools/objtool/elf.c | 84 -------------
> tools/objtool/include/objtool/check.h | 1 -
> tools/objtool/include/objtool/elf.h | 6 +-
> tools/objtool/special.c | 8 --
> 18 files changed, 472 insertions(+), 521 deletions(-)

Ok, this all looks real nice, thx!

Reviewed-by: Borislav Petkov <[email protected]>

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-10-29 12:45:52

by Miroslav Benes

[permalink] [raw]
Subject: Re: [PATCH v3 00/16] x86: Rewrite the retpoline rewrite logic

On Tue, 26 Oct 2021, Peter Zijlstra wrote:

> Hi,
>
> These patches rewrite the way retpolines are rewritten. Currently objtool emits
> alternative entries for most retpoline calls. However trying to extend that led
> to trouble (ELF files are horrid).
>
> Therefore completely overhaul this and have objtool emit a .retpoline_sites
> section that lists all compiler generated retpoline thunk calls. Then the
> kernel can do with them as it pleases.
>
> Notably it will:
>
> - rewrite them to indirect instructions for !RETPOLINE
> - rewrite them to lfence; indirect; for RETPOLINE_AMD,
> where size allows (boo clang!)
>
> Specifically, the !RETPOLINE case can now also deal with the clang-special
> conditional-indirect-tail-call:
>
> Jcc __x86_indirect_thunk_\reg.
>
> Finally, also update the x86 BPF jit to catch up to recent times and do these
> same things.
>
> All this should help improve performance by removing an indirection.
>
> Patches can (soon) be found here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core
>
> Changes since v2:
>
> - rewrite the __x86_indirect_thunk_array[] stuff again
> - rewrite the retpoline,amd rewrite logic, it now also supports
> rewriting the Jcc case, if the original instruction is long enough, but
> more importantly, it's simpler code.
> - bpf label simplification patch
> - random assorted cleanups
> - actually managed to get bpf selftests working

It is already in tip, but FWIW the objtool changes look good to me

Reviewed-by: Miroslav Benes <[email protected]>

M