2021-02-18 19:09:22

by Peter Zijlstra

Subject: [RFC][PATCH 2/2] x86/retpoline: Compress retpolines

By using int3 as a speculation fence instead of lfence, we can shrink
the longest alternative to just 15 bytes:

0: e8 05 00 00 00 callq a <.altinstr_replacement+0xa>
5: f3 90 pause
7: cc int3
8: eb fb jmp 5 <.altinstr_replacement+0x5>
a: 48 89 04 24 mov %rax,(%rsp)
e: c3 retq

This means we can change the alignment from 32 to 16 bytes and get 4
retpolines per cacheline, $I win.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/lib/retpoline.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -16,7 +16,7 @@
.Lspec_trap_\@:
UNWIND_HINT_EMPTY
pause
- lfence
+ int3
jmp .Lspec_trap_\@
.Ldo_rop_\@:
mov %\reg, (%_ASM_SP)
@@ -27,7 +27,7 @@
.macro THUNK reg
.section .text.__x86.indirect_thunk

- .align 32
+ .align 16
SYM_FUNC_START(__x86_indirect_thunk_\reg)

ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \
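
The size arithmetic in the patch description above can be checked with a short Python sketch. It is not part of the patch. It sums the encoded bytes from the listing, rounds up to the next power-of-two alignment, and counts how many thunks fit in a cacheline. The 64-byte cacheline size and the helper names are assumptions made for illustration. The 3-byte lfence encoding (0f ae e8) comes from the x86 instruction set, not from this thread.

```python
# Encoded bytes copied from the disassembly listing in the patch description.
INT3_LISTING = [
    ("callq", "e8 05 00 00 00"),
    ("pause", "f3 90"),
    ("int3", "cc"),
    ("jmp", "eb fb"),
    ("mov %rax,(%rsp)", "48 89 04 24"),
    ("retq", "c3"),
]

# Same sequence with the old fence; assumption: lfence encodes as 0f ae e8.
LFENCE_LISTING = [
    (name, "0f ae e8") if name == "int3" else (name, hexbytes)
    for name, hexbytes in INT3_LISTING
]

CACHELINE_BYTES = 64  # assumption: typical x86 cacheline size


def thunk_size(listing):
    """Total encoded length in bytes of all instructions in the listing."""
    return sum(len(hexbytes.split()) for _, hexbytes in listing)


def round_up_pow2(n):
    """Smallest power of two that is >= n (the alignment a thunk needs)."""
    p = 1
    while p < n:
        p *= 2
    return p


def thunks_per_cacheline(align, cacheline=CACHELINE_BYTES):
    """How many thunks start within one cacheline at the given alignment."""
    return cacheline // align


for label, listing in (("int3", INT3_LISTING), ("lfence", LFENCE_LISTING)):
    size = thunk_size(listing)
    align = round_up_pow2(size)
    print(label, size, align, thunks_per_cacheline(align))
```

Under these assumptions the int3 variant comes out at 15 bytes and fits a 16-byte slot, while the lfence variant is 17 bytes and spills into a 32-byte slot.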



2021-02-19 07:16:50

by Borislav Petkov

Subject: Re: [RFC][PATCH 2/2] x86/retpoline: Compress retpolines

On Thu, Feb 18, 2021 at 05:59:40PM +0100, Peter Zijlstra wrote:
> By using int3 as a speculation fence instead of lfence, we can shrink
> the longest alternative to just 15 bytes:
>
> 0: e8 05 00 00 00 callq a <.altinstr_replacement+0xa>
> 5: f3 90 pause
> 7: cc int3
> 8: eb fb jmp 5 <.altinstr_replacement+0x5>
> a: 48 89 04 24 mov %rax,(%rsp)
> e: c3 retq
>
> This means we can change the alignment from 32 to 16 bytes and get 4
> retpolines per cacheline, $I win.

You mean I$ :)

In any case, for both:

Reviewed-by: Borislav Petkov <[email protected]>

and it looks real nice here, the size:

readelf -s vmlinux | grep __x86_indirect
78966: ffffffff81c023e0 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
81653: ffffffff81c02390 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
82338: ffffffff81c02430 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
82955: ffffffff81c02380 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
85057: ffffffff81c023f0 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
89996: ffffffff81c023a0 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
91094: ffffffff81c02400 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
91278: ffffffff81c023b0 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
92015: ffffffff81c02360 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
92722: ffffffff81c023c0 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
97062: ffffffff81c02410 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
98687: ffffffff81c023d0 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
99076: ffffffff81c02350 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
99500: ffffffff81c02370 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]
100579: ffffffff81c02420 15 FUNC GLOBAL DEFAULT 1 __x86_indirect_t[...]

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
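
The readelf output above can be checked mechanically. The following Python sketch (an illustration, not something posted in the thread) takes the symbol addresses as printed, confirms each one is 16-byte aligned, and confirms the thunks sit back to back at 16-byte strides. The helper name and the 15-byte size passed in are taken from the output above or made up for this example.

```python
# Symbol addresses copied from the readelf output above (hex, as printed).
ADDRESSES = """
ffffffff81c023e0 ffffffff81c02390 ffffffff81c02430 ffffffff81c02380
ffffffff81c023f0 ffffffff81c023a0 ffffffff81c02400 ffffffff81c023b0
ffffffff81c02360 ffffffff81c023c0 ffffffff81c02410 ffffffff81c023d0
ffffffff81c02350 ffffffff81c02370 ffffffff81c02420
""".split()

ALIGN = 16  # alignment requested by the patch
SIZE = 15   # symbol size reported by readelf


def check_layout(hex_addrs, align=ALIGN, size=SIZE):
    """Return (all_aligned, packed, padding_bytes) for the given symbols."""
    addrs = sorted(int(a, 16) for a in hex_addrs)
    # Every thunk must start on an `align`-byte boundary.
    all_aligned = all(a % align == 0 for a in addrs)
    # Packed means each thunk starts exactly `align` bytes after the previous.
    packed = all(b - a == align for a, b in zip(addrs, addrs[1:]))
    # Bytes lost to padding between the end of a thunk and the next slot.
    padding = (align - size) * len(addrs)
    return all_aligned, packed, padding


print(check_layout(ADDRESSES))
```

With the 15 symbols listed, each thunk wastes a single padding byte per 16-byte slot.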

2021-02-22 11:32:45

by Peter Zijlstra

Subject: Re: [RFC][PATCH 2/2] x86/retpoline: Compress retpolines

On Fri, Feb 19, 2021 at 08:14:39AM +0100, Borislav Petkov wrote:
> On Thu, Feb 18, 2021 at 05:59:40PM +0100, Peter Zijlstra wrote:
> > By using int3 as a speculation fence instead of lfence, we can shrink
> > the longest alternative to just 15 bytes:
> >
> > 0: e8 05 00 00 00 callq a <.altinstr_replacement+0xa>
> > 5: f3 90 pause
> > 7: cc int3
> > 8: eb fb jmp 5 <.altinstr_replacement+0x5>
> > a: 48 89 04 24 mov %rax,(%rsp)
> > e: c3 retq
> >
> > This means we can change the alignment from 32 to 16 bytes and get 4
> > retpolines per cacheline, $I win.
>
> You mean I$ :)

Typin' so hard.

> In any case, for both:
>
> Reviewed-by: Borislav Petkov <[email protected]>

Thanks, except I've been told there is a performance implication. But
since all that happened in sekrit, none of that is recorded :/

I was hoping for some people (Tony, Paul) to respond with more data.
Also, Andrew said that if we ditch the lfence we could also ditch the
pause.

So people, please speak up, and if possible share any data you still
might have from back when retpolines were developed such that we can
have it on record.