[This is an alternative to David's earlier patch, which only
handled the context switch case. This series handles more cases.]
Skylake needs some additional protections over plain RETPOLINE
for Spectre_v2.
The CPU can fall back to the potentially poisoned indirect branch
predictor when the return buffer underflows.
This patch kit extends RETPOLINE to guard against many (but not
all) such cases by filling the return buffer:
- Context switches from a shallower to a deeper call chain
- Idle, which can clear the return buffer
- Interrupts, which cause deep call chains
This is done with a new SPECTRE_V2 defense mode and feature flag.
The mitigations are enabled only on Skylake and are patched out
on other CPUs.
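For background, a short illustration of the underflow itself (a sketch
only; deep_chain() is made up for this example, and the ~16-entry return
buffer depth is the commonly cited figure for these parts, not something
this series defines):

/*
 * Illustration only: a call chain deeper than the return buffer.
 * On the way back up, the most recently pushed ~16 RETs are predicted
 * from the return buffer; the remaining, outermost ones find it empty
 * and, on Skylake, fall back to the indirect branch predictor, which
 * ring 3 may have trained.
 */
static int deep_chain(int depth)
{
	if (depth == 0)
		return 0;
	return deep_chain(depth - 1) + 1;
}
/* e.g. deep_chain(32): the outermost returns have no RSB entries left */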
Git tree at
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git spec/skl-rsb-3
Against current tip/x86/pti
v1: Initial post; contains some patches posted earlier, now
reworked.
From: Andi Kleen <[email protected]>
CPUs have return buffers which store return addresses used by
RET to predict function returns. Some CPUs (Skylake, some Broadwells)
can fall back to indirect branch prediction on return buffer underflow.
With retpoline we want to avoid uncontrolled indirect branches,
which could be poisoned by ring 3, so we need to avoid uncontrolled
return buffer underflows in the kernel.
This can happen when we're context switching from a shallower to a
deeper kernel stack. The deeper kernel stack would eventually underflow
the return buffer, at which point the CPU would again fall back to the
indirect branch predictor.
The other thread could be running a system call triggered
by an attacker, so the context switch would cause the attacked
thread to fall back to an uncontrolled indirect branch,
which would then use the values passed in by the attacker.
To guard against this, fill the return buffer with controlled
content during the context switch. This prevents any underflows.
This is only enabled on Skylake.
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/entry/entry_32.S | 14 ++++++++++++++
arch/x86/entry/entry_64.S | 14 ++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index a1f28a54f23a..bbecb7c2f6cb 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -250,6 +250,20 @@ ENTRY(__switch_to_asm)
popl %ebx
popl %ebp
+ /*
+ * When we switch from a shallower to a deeper call stack
+ * the return stack buffer can underflow in the kernel in the next task.
+ * This could cause the CPU to fall back to indirect branch
+ * prediction, which may be poisoned.
+ *
+ * To guard against that always fill the return stack with
+ * known values.
+ *
+ * We do this in assembler because it needs to be before
+ * any calls on the new stack, and this can be difficult to
+ * ensure in a complex C function like __switch_to.
+ */
+ FILL_RETURN_BUFFER %ecx, RSB_FILL_LOOPS, X86_FEATURE_RETURN_UNDERFLOW
jmp __switch_to
END(__switch_to_asm)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 59874bc1aed2..3caac129cd07 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -495,6 +495,20 @@ ENTRY(__switch_to_asm)
popq %rbx
popq %rbp
+ /*
+ * When we switch from a shallower to a deeper call stack
+ * the return stack buffer can underflow in the kernel in the next task.
+ * This could cause the CPU to fall back to indirect branch
+ * prediction, which may be poisoned.
+ *
+ * To guard against that always fill the return stack with
+ * known values.
+ *
+ * We do this in assembler because it needs to be before
+ * any calls on the new stack, and this can be difficult to
+ * ensure in a complex C function like __switch_to.
+ */
+ FILL_RETURN_BUFFER %r8, RSB_FILL_LOOPS, X86_FEATURE_RETURN_UNDERFLOW
jmp __switch_to
END(__switch_to_asm)
--
2.14.3
From: Andi Kleen <[email protected]>
When we go into deeper idle states the return buffer could be cleared
in MWAIT, but another thread which wakes up earlier might meanwhile
be poisoning the indirect branch predictor. Then when the return
buffer underflows there might be an uncontrolled indirect branch.
To guard against this always fill the return buffer when exiting idle.
This is only enabled on Skylake.
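As a sketch of the ordering requirement (hypothetical code, not part of
this series; the real changes below touch mwait_idle_with_hints() and
mwait_idle()), the fill has to run after MWAIT wakes up and before the
first RET:

/*
 * Hypothetical mwait-based idle loop, for illustration only.  The
 * point being made: fill_return_buffer() must run after waking from
 * MWAIT (deep C-states may have cleared the return buffer) and before
 * this function, or anything it returns into, executes a RET.
 */
static void example_mwait_idle(void)
{
	__monitor((void *)&current_thread_info()->flags, 0, 0);
	if (!need_resched())
		__mwait(0, 0);
	fill_return_buffer();
}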
v2:
Switch to using inline fill_return_buffer macro
Also handle mwait_idle
Port to new fill_return_buffer infrastructure
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/mwait.h | 10 +++++++++-
arch/x86/kernel/process.c | 1 +
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 39a2fb29378a..632b6b39fe01 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -6,6 +6,7 @@
#include <linux/sched/idle.h>
#include <asm/cpufeature.h>
+#include <asm/nospec-branch.h>
#define MWAIT_SUBSTATE_MASK 0xf
#define MWAIT_CSTATE_MASK 0xf
@@ -107,8 +108,15 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
}
__monitor((void *)&current_thread_info()->flags, 0, 0);
- if (!need_resched())
+ if (!need_resched()) {
__mwait(eax, ecx);
+ /*
+ * idle could have cleared the return buffer,
+ * so fill it to prevent uncontrolled
+ * speculation.
+ */
+ fill_return_buffer();
+ }
}
current_clr_polling();
}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3cb2486c47e4..9a7c1bb0e001 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -466,6 +466,7 @@ static __cpuidle void mwait_idle(void)
__sti_mwait(0, 0);
else
local_irq_enable();
+ fill_return_buffer();
trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
} else {
local_irq_enable();
--
2.14.3
From: Andi Kleen <[email protected]>
Interrupts can have rather deep call chains on top of the original
call chain. Fill the return buffer on Skylake when returning from
an interrupt to the kernel, to avoid return buffer underflows
later.
This only needs to be done when returning to the kernel,
so interrupts interrupting user space are not impacted.
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/entry/entry_32.S | 16 +++++++++++++---
arch/x86/entry/entry_64.S | 22 ++++++++++++++++++++++
2 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index bbecb7c2f6cb..a58b0ae7121c 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -65,7 +65,6 @@
# define preempt_stop(clobbers) DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
#else
# define preempt_stop(clobbers)
-# define resume_kernel restore_all
#endif
.macro TRACE_IRQS_IRET
@@ -349,8 +348,17 @@ ENTRY(resume_userspace)
jmp restore_all
END(ret_from_exception)
-#ifdef CONFIG_PREEMPT
ENTRY(resume_kernel)
+ /*
+ * Interrupts/faults could cause the return buffer of the CPU
+ * to overflow, which would lead to an underflow later,
+ * which may lead to an uncontrolled indirect branch.
+ * Fill the return buffer when returning to the kernel.
+ */
+ FILL_RETURN_BUFFER %eax, RSB_FILL_LOOPS, X86_FEATURE_RETURN_UNDERFLOW
+5:
+
+#ifdef CONFIG_PREEMPT
DISABLE_INTERRUPTS(CLBR_ANY)
.Lneed_resched:
cmpl $0, PER_CPU_VAR(__preempt_count)
@@ -359,8 +367,10 @@ ENTRY(resume_kernel)
jz restore_all
call preempt_schedule_irq
jmp .Lneed_resched
-END(resume_kernel)
+#else
+ jmp restore_all
#endif
+END(resume_kernel)
GLOBAL(__begin_SYSENTER_singlestep_region)
/*
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3caac129cd07..36ee97fac6af 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -792,6 +792,15 @@ retint_kernel:
TRACE_IRQS_IRETQ
GLOBAL(restore_regs_and_return_to_kernel)
+ /*
+ * Interrupts/faults could cause the return buffer of the CPU
+ * to overflow, which would lead to an underflow later,
+ * which may lead to an uncontrolled indirect branch.
+ * Fill the return buffer when returning to the kernel.
+ */
+ FILL_RETURN_BUFFER %rax, RSB_FILL_LOOPS, X86_FEATURE_RETURN_UNDERFLOW
+4:
+
#ifdef CONFIG_DEBUG_ENTRY
/* Assert that pt_regs indicates kernel mode. */
testb $3, CS(%rsp)
@@ -1660,6 +1669,10 @@ nested_nmi:
nested_nmi_out:
popq %rdx
+ /*
+ * No need to fill the return buffer here because the outer NMI will do it,
+ * and we assume two NMIs will not overflow the return buffer.
+ */
/* We are returning to kernel mode, so this cannot result in a fault. */
iretq
@@ -1757,6 +1770,15 @@ end_repeat_nmi:
nmi_swapgs:
SWAPGS_UNSAFE_STACK
nmi_restore:
+ /*
+ * NMI could cause the return buffer of the CPU
+ * to overflow, which would lead to an underflow later,
+ * which may lead to an uncontrolled indirect branch.
+ * Fill the return buffer when returning to the kernel.
+ */
+
+ FILL_RETURN_BUFFER %rax, RSB_FILL_LOOPS, X86_FEATURE_RETURN_UNDERFLOW
+5:
POP_EXTRA_REGS
POP_C_REGS
--
2.14.3
From: Andi Kleen <[email protected]>
On Skylake we want additional protection against Spectre_v2,
beyond the normal RETPOLINE, for underflowing return buffers.
On return buffer underflow the CPU could fall back to
the poisoned indirect branch predictor.
This is not needed on CPUs before Skylake.
Add a new RETPOLINE_UNDERFLOW mode that enables the additional
protections.
This is RETPOLINE_GENERIC, but with additional mitigations
for return buffer overflow/underflow. It sets a new feature
bit that enables the underflow protection code.
Automatically select this mode on Skylake.
Add a new fill_return_buffer() C function that depends on this
mode. It will be used in follow-on patches.
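Code that wants to know whether the extra protection is active can test
the new feature bit like any other CPU feature; a minimal sketch
(illustrative only, the helper name is made up):

#include <asm/cpufeature.h>

/* Illustration: the new mode is carried by a forced CPU feature bit. */
static bool underflow_protection_active(void)
{
	return boot_cpu_has(X86_FEATURE_RETURN_UNDERFLOW);
}

The mode can also be requested explicitly with
spectre_v2=retpoline,underflow on the kernel command line, as the
bugs.c hunk below parses.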
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/nospec-branch.h | 25 +++++++++++++++++++++++++
arch/x86/kernel/cpu/bugs.c | 30 ++++++++++++++++++++++++++++++
3 files changed, 56 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f275447862f4..80610bcbce12 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -211,6 +211,7 @@
#define X86_FEATURE_AVX512_4FMAPS ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */
#define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
+#define X86_FEATURE_RETURN_UNDERFLOW ( 7*32+19) /* Avoid return stack underflows */
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 402a11c803c3..780999a45b8c 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -186,6 +186,7 @@ enum spectre_v2_mitigation {
SPECTRE_V2_RETPOLINE_MINIMAL,
SPECTRE_V2_RETPOLINE_MINIMAL_AMD,
SPECTRE_V2_RETPOLINE_GENERIC,
+ SPECTRE_V2_RETPOLINE_UNDERFLOW,
SPECTRE_V2_RETPOLINE_AMD,
SPECTRE_V2_IBRS,
};
@@ -210,5 +211,29 @@ static inline void vmexit_fill_RSB(void)
: "r" (loops) : "memory" );
#endif
}
+
+/*
+ * Fill the return buffer to avoid return buffer underflow.
+ *
+ * This is different from the one above because it is controlled
+ * by a different feature bit. It should be used for any fixes
+ * for call chains deeper than 16, or the context switch.
+ */
+static inline void fill_return_buffer(void)
+{
+#ifdef CONFIG_RETPOLINE
+ unsigned long loops = RSB_CLEAR_LOOPS / 2;
+
+ asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
+ ALTERNATIVE("jmp 910f",
+ __stringify(__FILL_RETURN_BUFFER(%0, RSB_CLEAR_LOOPS, %1)),
+ X86_FEATURE_RETURN_UNDERFLOW)
+ "910:"
+ : "=&r" (loops), ASM_CALL_CONSTRAINT
+ : "r" (loops) : "memory" );
+#endif
+}
+
+
#endif /* __ASSEMBLY__ */
#endif /* __NOSPEC_BRANCH_H__ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index e4dc26185aa7..75addb8ef4a5 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -23,6 +23,7 @@
#include <asm/alternative.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
+#include <asm/intel-family.h>
static void __init spectre_v2_select_mitigation(void);
@@ -77,6 +78,7 @@ enum spectre_v2_mitigation_cmd {
SPECTRE_V2_CMD_FORCE,
SPECTRE_V2_CMD_RETPOLINE,
SPECTRE_V2_CMD_RETPOLINE_GENERIC,
+ SPECTRE_V2_CMD_RETPOLINE_UNDERFLOW,
SPECTRE_V2_CMD_RETPOLINE_AMD,
};
@@ -86,6 +88,7 @@ static const char *spectre_v2_strings[] = {
[SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline",
[SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline",
[SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline",
+ [SPECTRE_V2_RETPOLINE_UNDERFLOW] = "Mitigation: Full retpoline with underflow protection",
};
#undef pr_fmt
@@ -117,6 +120,22 @@ static inline bool match_option(const char *arg, int arglen, const char *opt)
return len == arglen && !strncmp(arg, opt, len);
}
+static bool cpu_needs_underflow_protection(void)
+{
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+ boot_cpu_data.x86 != 6)
+ return false;
+ switch (boot_cpu_data.x86_model) {
+ case INTEL_FAM6_SKYLAKE_MOBILE:
+ case INTEL_FAM6_SKYLAKE_DESKTOP:
+ case INTEL_FAM6_SKYLAKE_X:
+ case INTEL_FAM6_KABYLAKE_MOBILE:
+ case INTEL_FAM6_KABYLAKE_DESKTOP:
+ return true;
+ }
+ return false;
+}
+
static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
{
char arg[20];
@@ -143,6 +162,9 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
} else if (match_option(arg, ret, "retpoline,generic")) {
spec2_print_if_insecure("generic retpoline selected on command line.");
return SPECTRE_V2_CMD_RETPOLINE_GENERIC;
+ } else if (match_option(arg, ret, "retpoline,underflow")) {
+ spec2_print_if_insecure("generic retpoline with underflow protection selected on command line.");
+ return SPECTRE_V2_CMD_RETPOLINE_UNDERFLOW;
} else if (match_option(arg, ret, "auto")) {
return SPECTRE_V2_CMD_AUTO;
}
@@ -189,6 +211,9 @@ static void __init spectre_v2_select_mitigation(void)
if (IS_ENABLED(CONFIG_RETPOLINE))
goto retpoline_auto;
break;
+ case SPECTRE_V2_CMD_RETPOLINE_UNDERFLOW:
+ if (IS_ENABLED(CONFIG_RETPOLINE))
+ goto retpoline_generic;
}
pr_err("kernel not compiled with retpoline; no mitigation available!");
return;
@@ -209,6 +234,11 @@ static void __init spectre_v2_select_mitigation(void)
mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
SPECTRE_V2_RETPOLINE_MINIMAL;
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ if (mode == SPECTRE_V2_RETPOLINE_GENERIC &&
+ cpu_needs_underflow_protection()) {
+ mode = SPECTRE_V2_RETPOLINE_UNDERFLOW;
+ setup_force_cpu_cap(X86_FEATURE_RETURN_UNDERFLOW);
+ }
}
spectre_v2_enabled = mode;
--
2.14.3
On Fri, 2018-01-12 at 10:45 -0800, Andi Kleen wrote:
> [This is an alternative to David's earlier patch to only
> handle context switch. It handles more cases.]
>
> Skylake needs some additional protections over plain RETPOLINE
> for Spectre_v2.
>
> The CPU can fall back to the potentially poisioned indirect branch
> predictor when the return buffer underflows.
>
> This patch kit extends RETPOLINE to guard against many (but not
> all) of such cases by filling the return buffer.
>
> - Context switch when switching from shallow to deeper call chain
> - Idle which clears the return buffer
> - Interrupts which cause deep call chains
>
> This is done with a new SPECTRE_V2 defense mode and feature flag.
>
> The mitigations are only enabled on Skylake, patched out
> on other CPUs.
Thanks for exploring what it would take to do this.
I admit I'm still not convinced. I think Skylake should probably just
default to IBRS (since the performance doesn't suck *quite* as much
there as it does on earlier CPUs) or give the user a command-line
option to use retpoline with the RSB-stuffing that is already
implemented.
Skylake still loses if it takes an SMI, right? Or encounters a call
stack of more than 16 in depth?
>
> Skylake still loses if it takes an SMI, right?
SMIs are usually rare, especially on servers, and are usually not very
predictable. Even if you have one it would need to hit a quite narrow
window, which is difficult to arrange. I don't worry about SMM.
> Or encounters a call
> stack of more than 16 in depth?
Yes, that's it. This could be mitigated over time.
-Andi
On Fri, 12 Jan 2018, Andi Kleen wrote:
> > Skylake still loses if it takes an SMI, right?
>
> SMMs are usually rare, especially on servers, and are usually
> not very predictible, and even if you have
FWIW, a data point: SMIs can be generated on demand by userspace on
thinkpad laptops, but they will be triggered from within a kernel
context. I very much doubt this is a rare pattern...
However, since in this vendor-firmware-nastiness pattern the SMIs are
fired off from inside the kernel (inside the ACPI subsystem), it would
be covered by the mitigations done when entering the kernel from
syscalls and hardware interrupts. Hopefully.
--
Henrique Holschuh
> @@ -209,6 +234,11 @@ static void __init spectre_v2_select_mitigation(void)
> mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
> SPECTRE_V2_RETPOLINE_MINIMAL;
> setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> + if (mode == SPECTRE_V2_RETPOLINE_GENERIC &&
> + cpu_needs_underflow_protection()) {
> + mode = SPECTRE_V2_RETPOLINE_UNDERFLOW;
> + setup_force_cpu_cap(X86_FEATURE_RETURN_UNDERFLOW);
> + }
> }
Why does this require a retp-compatible compiler to be enabled?
Thanks,
Dominik
On Fri, Jan 12, 2018 at 11:13:13PM +0100, Dominik Brodowski wrote:
> > @@ -209,6 +234,11 @@ static void __init spectre_v2_select_mitigation(void)
> > mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
> > SPECTRE_V2_RETPOLINE_MINIMAL;
> > setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> > + if (mode == SPECTRE_V2_RETPOLINE_GENERIC &&
> > + cpu_needs_underflow_protection()) {
> > + mode = SPECTRE_V2_RETPOLINE_UNDERFLOW;
> > + setup_force_cpu_cap(X86_FEATURE_RETURN_UNDERFLOW);
> > + }
> > }
>
> Why does this require a retp-compatible compiler to be enabled?
Without the compiler you have a lot of other problems already,
so it's fairly pointless.
-Andi
On 01/12/2018 05:03 PM, Henrique de Moraes Holschuh wrote:
> On Fri, 12 Jan 2018, Andi Kleen wrote:
>>> Skylake still loses if it takes an SMI, right?
>>
>> SMMs are usually rare, especially on servers, and are usually
>> not very predictible, and even if you have
>
> FWIW, a data point: SMIs can be generated on demand by userspace on
> thinkpad laptops, but they will be triggered from within a kernel
> context. I very much doubt this is a rare pattern...
Sure. Just touch some "legacy" hardware that the vendor emulates in a
nasty SMI handler. It's definitely not acceptable to assume that SMIs
can't be generated under the control of some malicious user code.
Our numbers on Skylake weren't bad, and there seem to be all kinds of
corner cases, so again, it seems as if IBRS is the safest choice.
Jon.
On Mon, 2018-01-15 at 03:26 -0500, Jon Masters wrote:
>
> Our numbers on Skylake weren't bad, and there seem to be all kinds of
> corner cases, so again, it seems as if IBRS is the safest choice.
If only someone were rapidly iterating the IBRS patch set on top of the
latest tip, fixing the issues and making it consistent with everything
else that's being done...
On Mon, 15 Jan 2018, Jon Masters wrote:
> On 01/12/2018 05:03 PM, Henrique de Moraes Holschuh wrote:
> > On Fri, 12 Jan 2018, Andi Kleen wrote:
> >>> Skylake still loses if it takes an SMI, right?
> >>
> >> SMMs are usually rare, especially on servers, and are usually
> >> not very predictible, and even if you have
> >
> > FWIW, a data point: SMIs can be generated on demand by userspace on
> > thinkpad laptops, but they will be triggered from within a kernel
> > context. I very much doubt this is a rare pattern...
>
> Sure. Just touch some "legacy" hardware that the vendor emulates in a
> nasty SMI handler. It's definitely not acceptable to assume that SMIs
> can't be generated under the control of some malicious user code.
We all know that there are holes, but can we finally sit down and do a
proper analysis of whether they are practically exploitable or not.
A laptop is single user, i.e. the most likely attack vector is
JavaScript. So please elaborate on how you abuse that from JS.
If the laptop is compromised in a way that malicious code is executed on it
outside JS, then the SMI hole is the least of your worries, really.
> Our numbers on Skylake weren't bad, and there seem to be all kinds of
> corner cases, so again, it seems as if IBRS is the safest choice.
Talk is cheap. Show numbers comparing the full retpoline/RSB mitigation
to IBRS.
Thanks,
tglx
On Mon, 2018-01-15 at 11:03 +0100, Thomas Gleixner wrote:
>
> > Our numbers on Skylake weren't bad, and there seem to be all kinds of
> > corner cases, so again, it seems as if IBRS is the safest choice.
>
> Talk is cheap. Show numbers comparing the full retpoline/RBS mitigation
> compared to IBRS.
Right now my intent is to add IBRS support and use it by default on
Skylake instead of retpoline. Users will be able to add
'spectre_v2=retpoline' to use retpoline on Skylake instead, if they
want to. But we'll default to the safest option.
Once that is done, we can do proper comparisons, and also look at
Andi's patches to attempt to make retpoline completely safe on Skylake.
> On Jan 15, 2018, at 12:26 AM, Jon Masters <[email protected]> wrote:
>
>> On 01/12/2018 05:03 PM, Henrique de Moraes Holschuh wrote:
>> On Fri, 12 Jan 2018, Andi Kleen wrote:
>>>> Skylake still loses if it takes an SMI, right?
>>>
>>> SMMs are usually rare, especially on servers, and are usually
>>> not very predictible, and even if you have
>>
>> FWIW, a data point: SMIs can be generated on demand by userspace on
>> thinkpad laptops, but they will be triggered from within a kernel
>> context. I very much doubt this is a rare pattern...
>
> Sure. Just touch some "legacy" hardware that the vendor emulates in a
> nasty SMI handler. It's definitely not acceptable to assume that SMIs
> can't be generated under the control of some malicious user code.
>
> Our numbers on Skylake weren't bad, and there seem to be all kinds of
> corner cases, so again, it seems as if IBRS is the safest choice.
>
And keep in mind that SMIs generally hit all CPUs at once, making them extra nasty.
Can we get firmware vendors to refill the return buffer just before RSM?
On 15/01/18 16:57, Andy Lutomirski wrote:
>
>> On Jan 15, 2018, at 12:26 AM, Jon Masters <[email protected]> wrote:
>>
>>> On 01/12/2018 05:03 PM, Henrique de Moraes Holschuh wrote:
>>> On Fri, 12 Jan 2018, Andi Kleen wrote:
>>>>> Skylake still loses if it takes an SMI, right?
>>>> SMMs are usually rare, especially on servers, and are usually
>>>> not very predictible, and even if you have
>>> FWIW, a data point: SMIs can be generated on demand by userspace on
>>> thinkpad laptops, but they will be triggered from within a kernel
>>> context. I very much doubt this is a rare pattern...
>> Sure. Just touch some "legacy" hardware that the vendor emulates in a
>> nasty SMI handler. It's definitely not acceptable to assume that SMIs
>> can't be generated under the control of some malicious user code.
>>
>> Our numbers on Skylake weren't bad, and there seem to be all kinds of
>> corner cases, so again, it seems as if IBRS is the safest choice.
>>
> And keep in mind that SMIs generally hit all CPUs at once, making them extra nasty.
>
> Can we get firmware vendors to refill the return buffer just before RSM?
Refill or not, you are aware that a correctly timed SMI in a leaf
function will cause the next ret to speculate into userspace, because
there is guaranteed perturbation of the RSB? (On the expectation that the
SMM handler isn't entirely devoid of function calls).
Having firmware refill the RSB only makes a difference if you are on
Skylake+, where RSB underflows are bad, and you're not using IBRS to
protect your indirect predictions.
~Andrew
> Having firmware refill the RSB only makes a difference if you are on
> Skylake+ were RSB underflows are bad, and you're not using IBRS to
> protect your indirect predictions.
... and before that you don't need it.
> On Jan 15, 2018, at 9:38 AM, Andrew Cooper <[email protected]> wrote:
>
>> On 15/01/18 16:57, Andy Lutomirski wrote:
>>
>>>> On Jan 15, 2018, at 12:26 AM, Jon Masters <[email protected]> wrote:
>>>>
>>>> On 01/12/2018 05:03 PM, Henrique de Moraes Holschuh wrote:
>>>> On Fri, 12 Jan 2018, Andi Kleen wrote:
>>>>>> Skylake still loses if it takes an SMI, right?
>>>>> SMMs are usually rare, especially on servers, and are usually
>>>>> not very predictible, and even if you have
>>>> FWIW, a data point: SMIs can be generated on demand by userspace on
>>>> thinkpad laptops, but they will be triggered from within a kernel
>>>> context. I very much doubt this is a rare pattern...
>>> Sure. Just touch some "legacy" hardware that the vendor emulates in a
>>> nasty SMI handler. It's definitely not acceptable to assume that SMIs
>>> can't be generated under the control of some malicious user code.
>>>
>>> Our numbers on Skylake weren't bad, and there seem to be all kinds of
>>> corner cases, so again, it seems as if IBRS is the safest choice.
>>>
>> And keep in mind that SMIs generally hit all CPUs at once, making them extra nasty.
>>
>> Can we get firmware vendors to refill the return buffer just before RSM?
>
> Refill or not, you are aware that a correctly timed SMI in a leaf
> function will cause the next ret to speculate into userspace, because
> there is guaranteed peturbance in the RSB? (On the expectation that the
> SMM handler isn't entirely devoid of function calls).
Couldn't firmware fill the RSB with some known safe address, maybe even 0, and then immediately do RSM?
>
> Having firmware refill the RSB only makes a difference if you are on
> Skylake+ were RSB underflows are bad, and you're not using IBRS to
> protect your indirect predictions.
>
> ~Andrew
On Mon, 2018-01-15 at 10:06 -0800, Andy Lutomirski wrote:
>
> > Refill or not, you are aware that a correctly timed SMI in a leaf
> > function will cause the next ret to speculate into userspace, because
> > there is guaranteed peturbance in the RSB? (On the expectation that the
> > SMM handler isn't entirely devoid of function calls).
>
> Couldn't firmware fill the RSB with a some known safe address, maybe
> even 0, and then immediately do RSM?
Why don't we just unconditionally declare that *all* firmware that uses
SMI for anything at all is broken?
> On Jan 15, 2018, at 10:07 AM, David Woodhouse <[email protected]> wrote:
>
>> On Mon, 2018-01-15 at 10:06 -0800, Andy Lutomirski wrote:
>>
>>> Refill or not, you are aware that a correctly timed SMI in a leaf
>>> function will cause the next ret to speculate into userspace, because
>>> there is guaranteed peturbance in the RSB? (On the expectation that the
>>> SMM handler isn't entirely devoid of function calls).
>>
>> Couldn't firmware fill the RSB with a some known safe address, maybe
>> even 0, and then immediately do RSM?
>
> Why don't we just unconditionally declare that *all* firmware that uses
> SMI for anything at all is broken?
>
Because then Intel would need a new reference design for EFI authenticated variables. Of course, this is totally doable and should be done anyway.
Also, Paolo reports that some misdesign or other makes it important to use SMI on CPU hotplug.