2021-02-18 19:06:58

by Peter Zijlstra

[permalink] [raw]
Subject: [RFC][PATCH 0/2] x86/retpoline: Retpoline on a diet

Hi!

The first patch rearranges the implementation and consolidates unused bytes.
The second patch uses INT3 over LFENCE to shrink the retpoline to 15 bytes, by
which 4 can live in a cacheline.

Patches have been boot tested on my IVB.


2021-02-18 19:32:09

by Borislav Petkov

[permalink] [raw]
Subject: [RFC PATCH] x86/retpolines: Prevent speculation after RET

On Thu, Feb 18, 2021 at 05:59:38PM +0100, Peter Zijlstra wrote:
> Hi!
>
> The first patch rearranges the implementation and consolidates unused bytes.
> The second patch uses INT3 over LFENCE to shrink the retpoline to 15 bytes, by
> which 4 can live in a cacheline.
>
> Patches have been boot tested on my IVB.

And here's the patch that prompted all this:

---
From: Borislav Petkov <[email protected]>
Date: Thu, 18 Feb 2021 17:21:24 +0100

Both vendors speculate after a near RET in some way:

Intel:

"Unlike near indirect CALL and near indirect JMP, the processor will not
speculatively execute the next sequential instruction after a near RET
unless that instruction is also the target of a jump or is a target in a
branch predictor."

AMD:

"Some AMD processors when they first encounter a branch do not stall
dispatch and use the branches dynamic execution to determine the target.
Therefore, they will speculatively dispatch the sequential instructions
after the branch. This happens for near return instructions where it is
not clear what code may exist sequentially after the return instruction.
This behavior also occurs with jmp/call instructions with indirect
targets. Software should place a LFENCE or another dispatch serializing
instruction after the return or jmp/call indirect instruction to prevent
this sequential speculation."

The AMD side doesn't really need the LFENCE because it'll do LFENCE;
JMP/CALL <target> due to X86_FEATURE_RETPOLINE_AMD, before it reaches
the RET.

Objtool bits provided by Peter Zijlstra (Intel) <[email protected]>

Reported-by: Rudolf Marek <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Co-developed-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/lib/retpoline.S | 1 +
tools/objtool/arch/x86/decode.c | 5 +++++
tools/objtool/check.c | 6 ++++++
tools/objtool/include/objtool/arch.h | 1 +
4 files changed, 13 insertions(+)

diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 397d408e8244..3f8652aaf84d 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -31,6 +31,7 @@ SYM_FUNC_START_NOALIGN(__x86_retpoline_\reg)
mov %\reg, (%_ASM_SP)
UNWIND_HINT_FUNC
ret
+ lfence
SYM_FUNC_END(__x86_retpoline_\reg)

.endm
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 549813cff8ab..84a5e3cfa72d 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -464,6 +464,11 @@ int arch_decode_instruction(const struct elf *elf, const struct section *sec,
op->src.type = OP_SRC_POP;
op->dest.type = OP_DEST_MEM;
}
+
+ } else if (op2 == 0xae && modrm == 0xe8) {
+
+ /* lfence */
+ *type = INSN_NOSPEC;
}

break;
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 331a763d8775..9ab84f0c4032 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2868,6 +2868,12 @@ static bool ignore_unreachable_insn(struct objtool_file *file, struct instructio
insn->jump_dest && insn->jump_dest->type == INSN_BUG)))
return true;

+ /*
+ * We allow speculation traps after RETURN instructions.
+ */
+ if (prev_insn->type == INSN_RETURN && insn->type == INSN_NOSPEC)
+ return true;
+
/*
* Check if this (or a subsequent) instruction is related to
* CONFIG_UBSAN or CONFIG_KASAN.
diff --git a/tools/objtool/include/objtool/arch.h b/tools/objtool/include/objtool/arch.h
index 6ff0685f5cc5..faf0c0afd938 100644
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -26,6 +26,7 @@ enum insn_type {
INSN_CLAC,
INSN_STD,
INSN_CLD,
+ INSN_NOSPEC,
INSN_OTHER,
};

--
2.29.2

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-02-18 19:45:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/retpolines: Prevent speculation after RET

On Thu, Feb 18, 2021 at 07:46:39PM +0100, Borislav Petkov wrote:
> Both vendors speculate after a near RET in some way:
>
> Intel:
>
> "Unlike near indirect CALL and near indirect JMP, the processor will not
> speculatively execute the next sequential instruction after a near RET
> unless that instruction is also the target of a jump or is a target in a
> branch predictor."

Right, the way I read that means it's not a problem for us here.

> AMD:
>
> "Some AMD processors when they first encounter a branch do not stall
> dispatch and use the branches dynamic execution to determine the target.
> Therefore, they will speculatively dispatch the sequential instructions
> after the branch. This happens for near return instructions where it is
> not clear what code may exist sequentially after the return instruction.
> This behavior also occurs with jmp/call instructions with indirect
> targets. Software should place a LFENCE or another dispatch serializing
> instruction after the return or jmp/call indirect instruction to prevent
> this sequential speculation."
>
> The AMD side doesn't really need the LFENCE because it'll do LFENCE;
> JMP/CALL <target> due to X86_FEATURE_RETPOLINE_AMD, before it reaches
> the RET.

It never reached the RET.

So all in all, I really don't see why we'd need this.

Now, if AMD were to say something like: hey, that retpoline is pretty
awesome, we ought to use that instead of an uconditional LFENCE, then
sure, but as is, I don't think so.

2021-02-18 19:45:23

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/retpolines: Prevent speculation after RET

On Thu, Feb 18, 2021 at 08:02:31PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 18, 2021 at 07:46:39PM +0100, Borislav Petkov wrote:
> > Both vendors speculate after a near RET in some way:
> >
> > Intel:
> >
> > "Unlike near indirect CALL and near indirect JMP, the processor will not
> > speculatively execute the next sequential instruction after a near RET
> > unless that instruction is also the target of a jump or is a target in a
> > branch predictor."
>
> Right, the way I read that means it's not a problem for us here.

Look at that other thread: the instruction *after* the RET can be
speculatively executed if that instruction is the target of a jump or it
is in a branch predictor.

And yes, the text is confusing and no one from Intel has clarified
definitively yet what that text means exactly.

> Now, if AMD were to say something like: hey, that retpoline is pretty
> awesome, we ought to use that instead of an uconditional LFENCE, then
> sure, but as is, I don't think so.

AMD prefers the LFENCE instead of the ratpoline sequence.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-02-19 08:17:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/retpolines: Prevent speculation after RET

On Thu, Feb 18, 2021 at 08:11:38PM +0100, Borislav Petkov wrote:
> On Thu, Feb 18, 2021 at 08:02:31PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 18, 2021 at 07:46:39PM +0100, Borislav Petkov wrote:
> > > Both vendors speculate after a near RET in some way:
> > >
> > > Intel:
> > >
> > > "Unlike near indirect CALL and near indirect JMP, the processor will not
> > > speculatively execute the next sequential instruction after a near RET
> > > unless that instruction is also the target of a jump or is a target in a
> > > branch predictor."
> >
> > Right, the way I read that means it's not a problem for us here.
>
> Look at that other thread: the instruction *after* the RET can be
> speculatively executed if that instruction is the target of a jump or it
> is in a branch predictor.

Right, but that has nothing to do with the RET instruction itself. You
can speculatively execute any random instruction by training the BTB,
which is I suppose the entire point of things :-)

So the way I read it is that: RET does not 'leak' speculation, but if
you target the instruction after RET with any other speculation crud,
ofcourse you can get it to 'run'.

And until further clarified, I'll stick with that :-)

2021-02-19 09:33:56

by David Laight

[permalink] [raw]
Subject: RE: [RFC PATCH] x86/retpolines: Prevent speculation after RET

From: Peter Zijlstra
> Sent: 18 February 2021 19:03
>
> On Thu, Feb 18, 2021 at 07:46:39PM +0100, Borislav Petkov wrote:
> > Both vendors speculate after a near RET in some way:
> >
> > Intel:
> >
> > "Unlike near indirect CALL and near indirect JMP, the processor will not
> > speculatively execute the next sequential instruction after a near RET
> > unless that instruction is also the target of a jump or is a target in a
> > branch predictor."
>
> Right, the way I read that means it's not a problem for us here.

They got a lawyer to write that sentence :-)
What on earth is that 'unless' clause about?
Either:
1) The instructions might be speculatively executed for some entirely
different reason.
or:
2) The cpu might use the BTB to determine the instruction that follows the
RET - and so might happen to execute the instruction that follows it.

I can't manage to read it in any way that suggests that the cpu will
ignore the fact it is a RET and start executing the instruction that
follows.
(Unlike some ARM cpus which do seem to do that.)

> > AMD:
> >
> > "Some AMD processors when they first encounter a branch do not stall
> > dispatch and use the branches dynamic execution to determine the target.
> > Therefore, they will speculatively dispatch the sequential instructions
> > after the branch. This happens for near return instructions where it is
> > not clear what code may exist sequentially after the return instruction.

Sounds like the conditional branch prediction (and the BTB?) get used for RET
instructions when the 'return address stack' is invalid.

> > This behavior also occurs with jmp/call instructions with indirect
> > targets. Software should place a LFENCE or another dispatch serializing
> > instruction after the return or jmp/call indirect instruction to prevent
> > this sequential speculation."
> >
> > The AMD side doesn't really need the LFENCE because it'll do LFENCE;
> > JMP/CALL <target> due to X86_FEATURE_RETPOLINE_AMD, before it reaches
> > the RET.
>
> It never reached the RET.
>
> So all in all, I really don't see why we'd need this.

I read that as implying that some AMD cpu can sometimes treat the RET as
a conditional branch and so speculatively assume it isn't taken.
So you need an LFENCE (or ???) following the RET at the end of every function.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)