This is a mitigation for the 'variant 2' attack described in
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
Using GCC patches available from the hjl/indirect/gcc-7-branch/master
branch of https://github.com/hjl-tools/gcc/commits/hjl and by manually
patching assembler code, all vulnerable indirect branches (that occur
after userspace first runs) are eliminated from the kernel.
They are replaced with a 'retpoline' call sequence which deliberately
prevents speculation.
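For illustration only (the real thunk bodies and invocation macros are in
the first patch below), an indirect branch such as 'call *%rax' ends up
looking roughly like this; the local label names here are hypothetical:

        call    __x86_indirect_thunk_rax

__x86_indirect_thunk_rax:
        call    .Lset_target       # pushes .Lcapture onto the stack and RSB
.Lcapture:
        pause                       # any speculation from the RSB spins here
        jmp     .Lcapture
.Lset_target:
        mov     %rax, (%rsp)        # replace the return address with the real target
        ret                         # architecturally jumps to *%rax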
Fedora 27 packages of the updated compiler are available at
https://koji.fedoraproject.org/koji/taskinfo?taskID=24065739
v1: Initial post.
v2: Add CONFIG_RETPOLINE to build kernel without it.
Change warning messages.
Hide modpost warning message
v3: Update to the latest CET-capable retpoline version
Reinstate ALTERNATIVE support
v4: Finish reconciling Andi's and my patch sets, bug fixes.
Exclude objtool support for now
Add 'noretpoline' boot option
Add AMD retpoline alternative
v5: Silence MODVERSIONS warnings
Use pause;jmp loop instead of lfence;jmp
Switch to X86_FEATURE_RETPOLINE positive feature logic
Emit thunks inline from assembler macros
Merge AMD support into initial patch
v6: Update to latest GCC patches with no dots in symbols
Fix MODVERSIONS properly(ish)
Fix typo breaking 32-bit, introduced in V5
Never set X86_FEATURE_RETPOLINE_AMD yet, pending confirmation
Andi Kleen (3):
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline: Add boot time option to disable retpoline
x86/retpoline: Exclude objtool with retpoline
David Woodhouse (7):
x86/retpoline: Add initial retpoline support
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
Documentation/admin-guide/kernel-parameters.txt | 3 +
arch/x86/Kconfig | 17 ++++-
arch/x86/Kconfig.debug | 6 +-
arch/x86/Makefile | 10 +++
arch/x86/crypto/aesni-intel_asm.S | 5 +-
arch/x86/crypto/camellia-aesni-avx-asm_64.S | 3 +-
arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 3 +-
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 +-
arch/x86/entry/entry_32.S | 5 +-
arch/x86/entry/entry_64.S | 12 +++-
arch/x86/include/asm/asm-prototypes.h | 25 +++++++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/mshyperv.h | 18 ++---
arch/x86/include/asm/nospec-branch.h | 92 +++++++++++++++++++++++++
arch/x86/include/asm/xen/hypercall.h | 5 +-
arch/x86/kernel/cpu/common.c | 3 +
arch/x86/kernel/cpu/intel.c | 11 +++
arch/x86/kernel/ftrace_32.S | 6 +-
arch/x86/kernel/ftrace_64.S | 8 +--
arch/x86/kernel/irq_32.c | 9 +--
arch/x86/lib/Makefile | 1 +
arch/x86/lib/checksum_32.S | 7 +-
arch/x86/lib/retpoline.S | 48 +++++++++++++
23 files changed, 264 insertions(+), 38 deletions(-)
create mode 100644 arch/x86/include/asm/nospec-branch.h
create mode 100644 arch/x86/lib/retpoline.S
--
2.7.4
Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
the corresponding thunks. Provide assembler macros for invoking the thunks
in the same way that GCC does, from native and inline assembler.
This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
some circumstances, IBRS microcode features may be used instead, and the
retpoline can be disabled.
On AMD CPUs if lfence is serialising, the retpoline can be dramatically
simplified to a simple "lfence; jmp *\reg". A future patch, after it has
been verified that lfence really is serialising in all circumstances, can
enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
to X86_FEATURE_RETPOLINE.
[Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
Signed-off-by: David Woodhouse <[email protected]>
Acked-By: Arjan van de Ven <[email protected]>
---
arch/x86/Kconfig | 13 +++++
arch/x86/Makefile | 10 ++++
arch/x86/include/asm/asm-prototypes.h | 25 ++++++++++
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/nospec-branch.h | 92 +++++++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/common.c | 3 ++
arch/x86/lib/Makefile | 1 +
arch/x86/lib/retpoline.S | 48 ++++++++++++++++++
8 files changed, 194 insertions(+)
create mode 100644 arch/x86/include/asm/nospec-branch.h
create mode 100644 arch/x86/lib/retpoline.S
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cd5199d..77c58ae 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -428,6 +428,19 @@ config GOLDFISH
def_bool y
depends on X86_GOLDFISH
+config RETPOLINE
+ bool "Avoid speculative indirect branches in kernel"
+ default y
+ help
+ Compile kernel with the retpoline compiler options to guard against
+ kernel-to-user data leaks by avoiding speculative indirect
+ branches. Requires a compiler with -mindirect-branch=thunk-extern
+ support for full protection. The kernel may run slower.
+
+ Without compiler support, at least indirect branches in assembler
+ code are eliminated. Since this includes the syscall entry path,
+ it is not entirely pointless.
+
config INTEL_RDT
bool "Intel Resource Director Technology support"
default n
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index a20eacd..918e550 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -235,6 +235,16 @@ KBUILD_CFLAGS += -Wno-sign-compare
#
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
+# Avoid indirect branches in kernel to deal with Spectre
+ifdef CONFIG_RETPOLINE
+ RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register)
+ ifneq ($(RETPOLINE_CFLAGS),)
+ KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
+ else
+ $(warning Retpoline not supported in compiler. System may be insecure.)
+ endif
+endif
+
archscripts: scripts_basic
$(Q)$(MAKE) $(build)=arch/x86/tools relocs
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index ff700d8..0927cdc 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -11,7 +11,32 @@
#include <asm/pgtable.h>
#include <asm/special_insns.h>
#include <asm/preempt.h>
+#include <asm/asm.h>
#ifndef CONFIG_X86_CMPXCHG64
extern void cmpxchg8b_emu(void);
#endif
+
+#ifdef CONFIG_RETPOLINE
+#ifdef CONFIG_X86_32
+#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void);
+#else
+#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void);
+INDIRECT_THUNK(8)
+INDIRECT_THUNK(9)
+INDIRECT_THUNK(10)
+INDIRECT_THUNK(11)
+INDIRECT_THUNK(12)
+INDIRECT_THUNK(13)
+INDIRECT_THUNK(14)
+INDIRECT_THUNK(15)
+#endif
+INDIRECT_THUNK(ax)
+INDIRECT_THUNK(bx)
+INDIRECT_THUNK(cx)
+INDIRECT_THUNK(dx)
+INDIRECT_THUNK(si)
+INDIRECT_THUNK(di)
+INDIRECT_THUNK(bp)
+INDIRECT_THUNK(sp)
+#endif /* CONFIG_RETPOLINE */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 511d909..2d65438 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -203,6 +203,8 @@
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */
#define X86_FEATURE_PTI ( 7*32+11) /* Kernel Page Table Isolation enabled */
+#define X86_FEATURE_RETPOLINE ( 7*32+12) /* Intel Retpoline mitigation for Spectre variant 2 */
+#define X86_FEATURE_RETPOLINE_AMD ( 7*32+13) /* AMD Retpoline mitigation for Spectre variant 2 */
#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */
#define X86_FEATURE_AVX512_4VNNIW ( 7*32+16) /* AVX-512 Neural Network Instructions */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
new file mode 100644
index 0000000..b8c8eea
--- /dev/null
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __NOSPEC_BRANCH_H__
+#define __NOSPEC_BRANCH_H__
+
+#include <asm/alternative.h>
+#include <asm/alternative-asm.h>
+#include <asm/cpufeatures.h>
+
+#ifdef __ASSEMBLY__
+
+/*
+ * These are the bare retpoline primitives for indirect jmp and call.
+ * Do not use these directly; they only exist to make the ALTERNATIVE
+ * invocation below less ugly.
+ */
+.macro RETPOLINE_JMP reg:req
+ call 1112f
+1111: pause
+ jmp 1111b
+1112: mov \reg, (%_ASM_SP)
+ ret
+.endm
+
+.macro RETPOLINE_CALL reg:req
+ jmp 1113f
+1110: RETPOLINE_JMP \reg
+1113: call 1110b
+.endm
+
+/*
+ * NOSPEC_JMP and NOSPEC_CALL macros can be used instead of a simple
+ * indirect jmp/call which may be susceptible to the Spectre variant 2
+ * attack.
+ */
+.macro NOSPEC_JMP reg:req
+#ifdef CONFIG_RETPOLINE
+ ALTERNATIVE_2 __stringify(jmp *\reg), \
+ __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \
+ __stringify(lfence; jmp *\reg), X86_FEATURE_RETPOLINE_AMD
+#else
+ jmp *\reg
+#endif
+.endm
+
+.macro NOSPEC_CALL reg:req
+#ifdef CONFIG_RETPOLINE
+ ALTERNATIVE_2 __stringify(call *\reg), \
+ __stringify(RETPOLINE_CALL \reg), X86_FEATURE_RETPOLINE,\
+ __stringify(lfence; call *\reg), X86_FEATURE_RETPOLINE_AMD
+#else
+ call *\reg
+#endif
+.endm
+
+#else /* __ASSEMBLY__ */
+
+#if defined(CONFIG_X86_64) && defined(RETPOLINE)
+/*
+ * Since the inline asm uses the %V modifier which is only in newer GCC,
+ * the 64-bit one is dependent on RETPOLINE not CONFIG_RETPOLINE.
+ */
+# define NOSPEC_CALL ALTERNATIVE( \
+ "call *%[thunk_target]\n", \
+ "call __x86_indirect_thunk_%V[thunk_target]\n", \
+ X86_FEATURE_RETPOLINE)
+# define THUNK_TARGET(addr) [thunk_target] "r" (addr)
+#elif defined(CONFIG_X86_32) && defined(CONFIG_RETPOLINE)
+/*
+ * For i386 we use the original ret-equivalent retpoline, because
+ * otherwise we'll run out of registers. We don't care about CET
+ * here, anyway.
+ */
+# define NOSPEC_CALL ALTERNATIVE( \
+ "call *%[thunk_target]\n", \
+ " jmp 1113f; " \
+ "1110: call 1112f; " \
+ "1111: pause; " \
+ " jmp 1111b; " \
+ "1112: addl $4, %%esp; " \
+ " pushl %[thunk_target]; " \
+ " ret; " \
+ "1113: call 1110b;\n", \
+ X86_FEATURE_RETPOLINE)
+# define THUNK_TARGET(addr) [thunk_target] "rm" (addr)
+#else /* No retpoline */
+# define NOSPEC_CALL "call *%[thunk_target]\n"
+# define THUNK_TARGET(addr) [thunk_target] "rm" (addr)
+#endif
+
+#endif /* __ASSEMBLY__ */
+#endif /* __NOSPEC_BRANCH_H__ */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 372ba3f..cfa5042 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -904,6 +904,9 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
setup_force_cpu_bug(X86_BUG_SPECTRE_V1);
setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
+#ifdef CONFIG_RETPOLINE
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+#endif
fpu__init_system(c);
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 457f681..d435c89 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -26,6 +26,7 @@ lib-y += memcpy_$(BITS).o
lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
+lib-$(CONFIG_RETPOLINE) += retpoline.o
obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
new file mode 100644
index 0000000..74d0a1d
--- /dev/null
+++ b/arch/x86/lib/retpoline.S
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/stringify.h>
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+#include <asm/cpufeatures.h>
+#include <asm/alternative-asm.h>
+#include <asm/export.h>
+#include <asm/nospec-branch.h>
+
+.macro THUNK reg
+ .section .text.__x86.indirect_thunk.\reg
+
+ENTRY(__x86_indirect_thunk_\reg)
+ CFI_STARTPROC
+ NOSPEC_JMP %\reg
+ CFI_ENDPROC
+ENDPROC(__x86_indirect_thunk_\reg)
+.endm
+
+/*
+ * Despite being an assembler file we can't just use .irp here
+ * because __KSYM_DEPS__ only uses the C preprocessor and would
+ * only see one instance of "__x86_indirect_thunk_\reg" rather
+ * than one per register with the correct names. So we do it
+ * the simple and nasty way...
+ */
+#define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
+#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
+
+GENERATE_THUNK(_ASM_AX)
+GENERATE_THUNK(_ASM_BX)
+GENERATE_THUNK(_ASM_CX)
+GENERATE_THUNK(_ASM_DX)
+GENERATE_THUNK(_ASM_SI)
+GENERATE_THUNK(_ASM_DI)
+GENERATE_THUNK(_ASM_BP)
+GENERATE_THUNK(_ASM_SP)
+#ifdef CONFIG_64BIT
+GENERATE_THUNK(r8)
+GENERATE_THUNK(r9)
+GENERATE_THUNK(r10)
+GENERATE_THUNK(r11)
+GENERATE_THUNK(r12)
+GENERATE_THUNK(r13)
+GENERATE_THUNK(r14)
+GENERATE_THUNK(r15)
+#endif
--
2.7.4
Convert all indirect jumps in ftrace assembler code to use non-speculative
sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse <[email protected]>
Acked-By: Arjan van de Ven <[email protected]>
---
arch/x86/kernel/ftrace_32.S | 6 ++++--
arch/x86/kernel/ftrace_64.S | 8 ++++----
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/ftrace_32.S b/arch/x86/kernel/ftrace_32.S
index b6c6468..c3842c9 100644
--- a/arch/x86/kernel/ftrace_32.S
+++ b/arch/x86/kernel/ftrace_32.S
@@ -8,6 +8,7 @@
#include <asm/segment.h>
#include <asm/export.h>
#include <asm/ftrace.h>
+#include <asm/nospec-branch.h>
#ifdef CC_USING_FENTRY
# define function_hook __fentry__
@@ -197,7 +198,8 @@ ftrace_stub:
movl 0x4(%ebp), %edx
subl $MCOUNT_INSN_SIZE, %eax
- call *ftrace_trace_function
+ movl ftrace_trace_function, %ecx
+ NOSPEC_CALL %ecx
popl %edx
popl %ecx
@@ -241,5 +243,5 @@ return_to_handler:
movl %eax, %ecx
popl %edx
popl %eax
- jmp *%ecx
+ NOSPEC_JMP %ecx
#endif
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index c832291..0893068 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -7,7 +7,7 @@
#include <asm/ptrace.h>
#include <asm/ftrace.h>
#include <asm/export.h>
-
+#include <asm/nospec-branch.h>
.code64
.section .entry.text, "ax"
@@ -286,8 +286,8 @@ trace:
* ip and parent ip are used and the list function is called when
* function tracing is enabled.
*/
- call *ftrace_trace_function
-
+ movq ftrace_trace_function, %r8
+ NOSPEC_CALL %r8
restore_mcount_regs
jmp fgraph_trace
@@ -329,5 +329,5 @@ GLOBAL(return_to_handler)
movq 8(%rsp), %rdx
movq (%rsp), %rax
addq $24, %rsp
- jmp *%rdi
+ NOSPEC_JMP %rdi
#endif
--
2.7.4
Convert indirect jumps in core 32/64bit entry assembler code to use
non-speculative sequences when CONFIG_RETPOLINE is enabled.
Don't use NOSPEC_CALL in entry_SYSCALL_64_fastpath because the return
address after the 'call' instruction must be *precisely* at the
.Lentry_SYSCALL_64_after_fastpath label for stub_ptregs_64 to work,
and the use of alternatives will mess that up unless we play horrid
games to prepend with NOPs and make the variants the same length. It's
not worth it; in the case where we ALTERNATIVE out the retpoline, the
first instruction at __x86.indirect_thunk.rax is going to be a bare
jmp *%rax anyway.
Signed-off-by: David Woodhouse <[email protected]>
Acked-By: Arjan van de Ven <[email protected]>
---
arch/x86/entry/entry_32.S | 5 +++--
arch/x86/entry/entry_64.S | 12 +++++++++---
2 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index ace8f32..cf9ef33 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -44,6 +44,7 @@
#include <asm/asm.h>
#include <asm/smap.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>
.section .entry.text, "ax"
@@ -290,7 +291,7 @@ ENTRY(ret_from_fork)
/* kernel thread */
1: movl %edi, %eax
- call *%ebx
+ NOSPEC_CALL %ebx
/*
* A kernel thread is allowed to return here after successfully
* calling do_execve(). Exit to userspace to complete the execve()
@@ -919,7 +920,7 @@ common_exception:
movl %ecx, %es
TRACE_IRQS_OFF
movl %esp, %eax # pt_regs pointer
- call *%edi
+ NOSPEC_CALL %edi
jmp ret_from_exception
END(common_exception)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ed31d00..9bce6ed 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -37,6 +37,7 @@
#include <asm/pgtable_types.h>
#include <asm/export.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>
#include <linux/err.h>
#include "calling.h"
@@ -187,7 +188,7 @@ ENTRY(entry_SYSCALL_64_trampoline)
*/
pushq %rdi
movq $entry_SYSCALL_64_stage2, %rdi
- jmp *%rdi
+ NOSPEC_JMP %rdi
END(entry_SYSCALL_64_trampoline)
.popsection
@@ -266,7 +267,12 @@ entry_SYSCALL_64_fastpath:
* It might end up jumping to the slow path. If it jumps, RAX
* and all argument registers are clobbered.
*/
+#ifdef CONFIG_RETPOLINE
+ movq sys_call_table(, %rax, 8), %rax
+ call __x86_indirect_thunk_rax
+#else
call *sys_call_table(, %rax, 8)
+#endif
.Lentry_SYSCALL_64_after_fastpath_call:
movq %rax, RAX(%rsp)
@@ -438,7 +444,7 @@ ENTRY(stub_ptregs_64)
jmp entry_SYSCALL64_slow_path
1:
- jmp *%rax /* Called from C */
+ NOSPEC_JMP %rax /* Called from C */
END(stub_ptregs_64)
.macro ptregs_stub func
@@ -517,7 +523,7 @@ ENTRY(ret_from_fork)
1:
/* kernel thread */
movq %r12, %rdi
- call *%rbx
+ NOSPEC_CALL %rbx
/*
* A kernel thread is allowed to return here after successfully
* calling do_execve(). Exit to userspace to complete the execve()
--
2.7.4
Convert all indirect jumps in hyperv inline asm code to use non-speculative
sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse <[email protected]>
Acked-By: Arjan van de Ven <[email protected]>
---
arch/x86/include/asm/mshyperv.h | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 581bb54..6534e57 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -7,6 +7,7 @@
#include <linux/nmi.h>
#include <asm/io.h>
#include <asm/hyperv.h>
+#include <asm/nospec-branch.h>
/*
* The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
@@ -186,10 +187,11 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
return U64_MAX;
__asm__ __volatile__("mov %4, %%r8\n"
- "call *%5"
+ NOSPEC_CALL
: "=a" (hv_status), ASM_CALL_CONSTRAINT,
"+c" (control), "+d" (input_address)
- : "r" (output_address), "m" (hv_hypercall_pg)
+ : "r" (output_address),
+ THUNK_TARGET(hv_hypercall_pg)
: "cc", "memory", "r8", "r9", "r10", "r11");
#else
u32 input_address_hi = upper_32_bits(input_address);
@@ -200,13 +202,13 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
if (!hv_hypercall_pg)
return U64_MAX;
- __asm__ __volatile__("call *%7"
+ __asm__ __volatile__(NOSPEC_CALL
: "=A" (hv_status),
"+c" (input_address_lo), ASM_CALL_CONSTRAINT
: "A" (control),
"b" (input_address_hi),
"D"(output_address_hi), "S"(output_address_lo),
- "m" (hv_hypercall_pg)
+ THUNK_TARGET(hv_hypercall_pg)
: "cc", "memory");
#endif /* !x86_64 */
return hv_status;
@@ -227,10 +229,10 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
#ifdef CONFIG_X86_64
{
- __asm__ __volatile__("call *%4"
+ __asm__ __volatile__(NOSPEC_CALL
: "=a" (hv_status), ASM_CALL_CONSTRAINT,
"+c" (control), "+d" (input1)
- : "m" (hv_hypercall_pg)
+ : THUNK_TARGET(hv_hypercall_pg)
: "cc", "r8", "r9", "r10", "r11");
}
#else
@@ -238,13 +240,13 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
u32 input1_hi = upper_32_bits(input1);
u32 input1_lo = lower_32_bits(input1);
- __asm__ __volatile__ ("call *%5"
+ __asm__ __volatile__ (NOSPEC_CALL
: "=A"(hv_status),
"+c"(input1_lo),
ASM_CALL_CONSTRAINT
: "A" (control),
"b" (input1_hi),
- "m" (hv_hypercall_pg)
+ THUNK_TARGET(hv_hypercall_pg)
: "cc", "edi", "esi");
}
#endif
--
2.7.4
Convert all indirect jumps in 32bit checksum assembler code to use
non-speculative sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse <[email protected]>
Acked-By: Arjan van de Ven <[email protected]>
---
arch/x86/lib/checksum_32.S | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 4d34bb5..98cf15d 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -29,7 +29,8 @@
#include <asm/errno.h>
#include <asm/asm.h>
#include <asm/export.h>
-
+#include <asm/nospec-branch.h>
+
/*
* computes a partial checksum, e.g. for TCP/UDP fragments
*/
@@ -156,7 +157,7 @@ ENTRY(csum_partial)
negl %ebx
lea 45f(%ebx,%ebx,2), %ebx
testl %esi, %esi
- jmp *%ebx
+ NOSPEC_JMP %ebx
# Handle 2-byte-aligned regions
20: addw (%esi), %ax
@@ -439,7 +440,7 @@ ENTRY(csum_partial_copy_generic)
andl $-32,%edx
lea 3f(%ebx,%ebx), %ebx
testl %esi, %esi
- jmp *%ebx
+ NOSPEC_JMP %ebx
1: addl $64,%esi
addl $64,%edi
SRC(movb -32(%edx),%bl) ; SRC(movb (%edx),%bl)
--
2.7.4
From: Andi Kleen <[email protected]>
Convert all indirect jumps in 32bit irq inline asm code to use
non-speculative sequences.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Acked-By: Arjan van de Ven <[email protected]>
---
arch/x86/kernel/irq_32.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index a83b334..e1e58f7 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -20,6 +20,7 @@
#include <linux/mm.h>
#include <asm/apic.h>
+#include <asm/nospec-branch.h>
#ifdef CONFIG_DEBUG_STACKOVERFLOW
@@ -55,11 +56,11 @@ DEFINE_PER_CPU(struct irq_stack *, softirq_stack);
static void call_on_stack(void *func, void *stack)
{
asm volatile("xchgl %%ebx,%%esp \n"
- "call *%%edi \n"
+ NOSPEC_CALL
"movl %%ebx,%%esp \n"
: "=b" (stack)
: "0" (stack),
- "D"(func)
+ [thunk_target] "D"(func)
: "memory", "cc", "edx", "ecx", "eax");
}
@@ -95,11 +96,11 @@ static inline int execute_on_irq_stack(int overflow, struct irq_desc *desc)
call_on_stack(print_stack_overflow, isp);
asm volatile("xchgl %%ebx,%%esp \n"
- "call *%%edi \n"
+ NOSPEC_CALL
"movl %%ebx,%%esp \n"
: "=a" (arg1), "=b" (isp)
: "0" (desc), "1" (isp),
- "D" (desc->handle_irq)
+ [thunk_target] "D" (desc->handle_irq)
: "memory", "cc", "ecx");
return 1;
}
--
2.7.4
Convert all indirect jumps in crypto assembler code to use non-speculative
sequences when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse <[email protected]>
Acked-By: Arjan van de Ven <[email protected]>
---
arch/x86/crypto/aesni-intel_asm.S | 5 +++--
arch/x86/crypto/camellia-aesni-avx-asm_64.S | 3 ++-
arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 3 ++-
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 ++-
4 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 16627fe..f128680 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -32,6 +32,7 @@
#include <linux/linkage.h>
#include <asm/inst.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>
/*
* The following macros are used to move an (un)aligned 16 byte value to/from
@@ -2884,7 +2885,7 @@ ENTRY(aesni_xts_crypt8)
pxor INC, STATE4
movdqu IV, 0x30(OUTP)
- call *%r11
+ NOSPEC_CALL %r11
movdqu 0x00(OUTP), INC
pxor INC, STATE1
@@ -2929,7 +2930,7 @@ ENTRY(aesni_xts_crypt8)
_aesni_gf128mul_x_ble()
movups IV, (IVP)
- call *%r11
+ NOSPEC_CALL %r11
movdqu 0x40(OUTP), INC
pxor INC, STATE1
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index f7c495e..ba3f075 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -17,6 +17,7 @@
#include <linux/linkage.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>
#define CAMELLIA_TABLE_BYTE_LEN 272
@@ -1227,7 +1228,7 @@ camellia_xts_crypt_16way:
vpxor 14 * 16(%rax), %xmm15, %xmm14;
vpxor 15 * 16(%rax), %xmm15, %xmm15;
- call *%r9;
+ NOSPEC_CALL %r9;
addq $(16 * 16), %rsp;
diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index eee5b39..9b0a88a 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -12,6 +12,7 @@
#include <linux/linkage.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>
#define CAMELLIA_TABLE_BYTE_LEN 272
@@ -1343,7 +1344,7 @@ camellia_xts_crypt_32way:
vpxor 14 * 32(%rax), %ymm15, %ymm14;
vpxor 15 * 32(%rax), %ymm15, %ymm15;
- call *%r9;
+ NOSPEC_CALL %r9;
addq $(16 * 32), %rsp;
diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
index 7a7de27..05178b44 100644
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -45,6 +45,7 @@
#include <asm/inst.h>
#include <linux/linkage.h>
+#include <asm/nospec-branch.h>
## ISCSI CRC 32 Implementation with crc32 and pclmulqdq Instruction
@@ -172,7 +173,7 @@ continue_block:
movzxw (bufp, %rax, 2), len
lea crc_array(%rip), bufp
lea (bufp, len, 1), bufp
- jmp *bufp
+ NOSPEC_JMP bufp
################################################################
## 2a) PROCESS FULL BLOCKS:
--
2.7.4
Convert indirect call in Xen hypercall to use non-speculative sequence,
when CONFIG_RETPOLINE is enabled.
Signed-off-by: David Woodhouse <[email protected]>
Acked-By: Arjan van de Ven <[email protected]>
---
arch/x86/include/asm/xen/hypercall.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 7cb282e..393c004 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -44,6 +44,7 @@
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/smap.h>
+#include <asm/nospec-branch.h>
#include <xen/interface/xen.h>
#include <xen/interface/sched.h>
@@ -217,9 +218,9 @@ privcmd_call(unsigned call,
__HYPERCALL_5ARG(a1, a2, a3, a4, a5);
stac();
- asm volatile("call *%[call]"
+ asm volatile(NOSPEC_CALL
: __HYPERCALL_5PARAM
- : [call] "a" (&hypercall_page[call])
+ : [thunk_target] "a" (&hypercall_page[call])
: __HYPERCALL_CLOBBER5);
clac();
--
2.7.4
From: Andi Kleen <[email protected]>
objtool's assembler nanny currently cannot deal with the code generated
by the retpoline compiler and throws hundreds of warnings, mostly
because it sees calls that don't have a symbolic target.
Exclude all the options that rely on objtool when RETPOLINE is active.
This mainly means that we use the frame pointer unwinder and livepatch
is not supported.
Eventually objtool can be fixed to handle this.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Acked-By: Arjan van de Ven <[email protected]>
---
arch/x86/Kconfig | 4 ++--
arch/x86/Kconfig.debug | 6 +++---
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 77c58ae..651d25f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -171,8 +171,8 @@ config X86
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
- select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION
- select HAVE_STACK_VALIDATION if X86_64
+ select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION && !RETPOLINE
+ select HAVE_STACK_VALIDATION if X86_64 && !RETPOLINE
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_USER_RETURN_NOTIFIER
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 6293a87..9f3928d 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -359,8 +359,8 @@ config PUNIT_ATOM_DEBUG
choice
prompt "Choose kernel unwinder"
- default UNWINDER_ORC if X86_64
- default UNWINDER_FRAME_POINTER if X86_32
+ default UNWINDER_ORC if X86_64 && !RETPOLINE
+ default UNWINDER_FRAME_POINTER if X86_32 || RETPOLINE
---help---
This determines which method will be used for unwinding kernel stack
traces for panics, oopses, bugs, warnings, perf, /proc/<pid>/stack,
@@ -368,7 +368,7 @@ choice
config UNWINDER_ORC
bool "ORC unwinder"
- depends on X86_64
+ depends on X86_64 && !RETPOLINE
select STACK_VALIDATION
---help---
This option enables the ORC (Oops Rewind Capability) unwinder for
--
2.7.4
From: Andi Kleen <[email protected]>
Add a noretpoline boot option to disable retpoline and patch out the
extra sequences. It cannot patch out the jumps to the thunk functions
from code generated by the compiler, but those thunks turn into a single
indirect branch now.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Acked-By: Arjan van de Ven <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 3 +++
arch/x86/kernel/cpu/intel.c | 11 +++++++++++
2 files changed, 14 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9059917..d443141 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2596,6 +2596,9 @@
nohugeiomap [KNL,x86] Disable kernel huge I/O mappings.
+ noretpoline [X86] Disable the retpoline kernel indirect branch speculation
+ workarounds. System may allow data leaks with this option.
+
nosmt [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index b720dac..35e123e 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -31,6 +31,17 @@
#include <asm/apic.h>
#endif
+#ifdef RETPOLINE
+static int __init noretpoline_setup(char *__unused)
+{
+ pr_info("Retpoline runtime disabled\n");
+ setup_clear_cpu_cap(X86_FEATURE_RETPOLINE);
+ setup_clear_cpu_cap(X86_FEATURE_RETPOLINE_AMD);
+ return 1;
+}
+__setup("noretpoline", noretpoline_setup);
+#endif
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
--
2.7.4
On Sun, Jan 7, 2018 at 2:11 PM, David Woodhouse <[email protected]> wrote:
> This is a mitigation for the 'variant 2' attack described in
> https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
Ok, I don't love the patches, but I see nothing horribly wrong here
either, and I assume the performance impact of this is pretty minimal.
Thomas? I'm obviously doing rc7 today without these, but I assume the
x86 maintainers are resigned to this all. And yes, we'll have at least
an rc8 this release..
Linus
On Sun, 7 Jan 2018, Linus Torvalds wrote:
> On Sun, Jan 7, 2018 at 2:11 PM, David Woodhouse <[email protected]> wrote:
> > This is a mitigation for the 'variant 2' attack described in
> > https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
>
> Ok, I don't love the patches, but I see nothing horribly wrong here
> either, and I assume the performance impact of this is pretty minimal.
>
> Thomas? I'm obviously doing rc7 today without these, but I assume the
> x86 maintainers are resigned to this all.
That seems to be the general mental state for lots of involved people.
Thanks,
tglx
On Sun, 7 Jan 2018, David Woodhouse wrote:
Cc+ Josh Poimboeuf <[email protected]>
Sigh....
> From: Andi Kleen <[email protected]>
>
> objtool's assembler nanny currently cannot deal with the code generated
> by the retpoline compiler and throws hundreds of warnings, mostly
> because it sees calls that don't have a symbolic target.
>
> Exclude all the options that rely on objtool when RETPOLINE is active.
>
> This mainly means that we use the frame pointer unwinder and livepatch
> is not supported.
>
> Eventually objtool can be fixed to handle this.
>
> Signed-off-by: Andi Kleen <[email protected]>
> Signed-off-by: David Woodhouse <[email protected]>
> Acked-By: Arjan van de Ven <[email protected]>
> ---
> arch/x86/Kconfig | 4 ++--
> arch/x86/Kconfig.debug | 6 +++---
> 2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 77c58ae..651d25f 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -171,8 +171,8 @@ config X86
> select HAVE_PERF_USER_STACK_DUMP
> select HAVE_RCU_TABLE_FREE
> select HAVE_REGS_AND_STACK_ACCESS_API
> - select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION
> - select HAVE_STACK_VALIDATION if X86_64
> + select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION && !RETPOLINE
> + select HAVE_STACK_VALIDATION if X86_64 && !RETPOLINE
> select HAVE_SYSCALL_TRACEPOINTS
> select HAVE_UNSTABLE_SCHED_CLOCK
> select HAVE_USER_RETURN_NOTIFIER
> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
> index 6293a87..9f3928d 100644
> --- a/arch/x86/Kconfig.debug
> +++ b/arch/x86/Kconfig.debug
> @@ -359,8 +359,8 @@ config PUNIT_ATOM_DEBUG
>
> choice
> prompt "Choose kernel unwinder"
> - default UNWINDER_ORC if X86_64
> - default UNWINDER_FRAME_POINTER if X86_32
> + default UNWINDER_ORC if X86_64 && !RETPOLINE
> + default UNWINDER_FRAME_POINTER if X86_32 || RETPOLINE
> ---help---
> This determines which method will be used for unwinding kernel stack
> traces for panics, oopses, bugs, warnings, perf, /proc/<pid>/stack,
> @@ -368,7 +368,7 @@ choice
>
> config UNWINDER_ORC
> bool "ORC unwinder"
> - depends on X86_64
> + depends on X86_64 && !RETPOLINE
> select STACK_VALIDATION
> ---help---
> This option enables the ORC (Oops Rewind Capability) unwinder for
> --
> 2.7.4
>
>
On Mon, 2018-01-08 at 11:25 +0100, Thomas Gleixner wrote:
> On Sun, 7 Jan 2018, David Woodhouse wrote:
>
> Cc+ Josh Poimboeuf <[email protected]>
>
> Sigh....
>
> > From: Andi Kleen <[email protected]>
> >
> > objtool's assembler nanny currently cannot deal with the code generated
> > by the retpoline compiler and throws hundreds of warnings, mostly
> > because it sees calls that don't have a symbolic target.
> >
> > Exclude all the options that rely on objtool when RETPOLINE is active.
> >
> > This mainly means that we use the frame pointer unwinder and livepatch
> > is not supported.
> >
> > Eventually objtool can be fixed to handle this.
I believe Josh is on vacation and I've asked Amit to take a look at
this, at least as far as ensuring that it doesn't actually disable
kpatch.
On Mon, 8 Jan 2018, Paul Turner wrote:
> user->kernel in the absence of SMEP:
> In the absence of SMEP, we must worry about user-generated RSB entries
> being consumable by kernel execution.
> Generally speaking, for synchronous execution this will not occur (e.g.
> syscall, interrupt), however, one important case remains.
> When we context switch between two threads, we should flush the RSB so that
> execution generated from the unbalanced return path on the thread that we
> just scheduled into, cannot consume RSB entries potentially installed by
> the prior thread.
I am still unclear whether this closes it completely, as when HT is on,
the RSB is shared between the threads, right? Therefore one thread can
poison it for the other without even a context switch happening.
--
Jiri Kosina
SUSE Labs
[ First send did not make list because gmail ate its plain-text force
when I pasted content. ]
One detail that is missing is that we still need RSB refill in some cases.
This is not because the retpoline sequence itself will underflow (it
is actually guaranteed not to, since it consumes only RSB entries that
it generates).
But either to avoid poisoning of the RSB entries themselves, or to
avoid the hardware turning to alternate predictors on RSB underflow.
Enumerating the cases we care about:
user->kernel in the absence of SMEP:
In the absence of SMEP, we must worry about user-generated RSB entries
being consumable by kernel execution.
Generally speaking, for synchronous execution this will not occur
(e.g. syscall, interrupt), however, one important case remains.
When we context switch between two threads, we should flush the RSB so
that execution generated from the unbalanced return path on the thread
that we just scheduled into, cannot consume RSB entries potentially
installed by the prior thread.
kernel->kernel independent of SMEP:
While much harder to coordinate, facilities such as eBPF potentially
allow exploitable return targets to be created.
Generally speaking (particularly if eBPF has been disabled) the risk
is _much_ lower here, since we can only return into kernel execution
that was already occurring on another thread (which could e.g. likely
be attacked there directly independent of RSB poisoning.)
guest->hypervisor, independent of SMEP:
For guest ring0 -> host ring0 transitions, it is possible that the
tagging only records that the entry was generated in a ring0
context, meaning that a guest-generated entry may be consumed by the
host. This admits:
hypervisor_run_vcpu_implementation() {
<enter hardware virtualization context>
… run virtualized work (1)
<leave hardware virtualization context>
< update vmcs state, prior to any function return > (2)
< return from hypervisor_run_vcpu_implementation() to handle VMEXIT > (3)
}
A guest is able to craft poisoned entries at (1) which, if not flushed at (2),
may immediately be eligible for consumption at (3).
While the cases above involve the crafting and use of poisoned
entries, recall also that one of the initial conditions was that we
should avoid RSB underflow as some CPUs may try to use other indirect
predictors when this occurs.
The cases we care about here are:
- When we return _into_ protected execution. For the kernel, this
means when we exit interrupt context into kernel context, since we may
have emptied or reduced the number of RSB entries while in interrupt
context.
- Context switch (even if we are returning to user code, we need to at
least unwind the scheduler/triggering frames that preempted it
previously; considering that detail, this is a subset of the above,
but listed for completeness)
- On VMEXIT (it turns out we need to worry about both poisoned
entries and no entries; the solution is a single refill nonetheless).
- Leaving deeper (>C1) c-states, which may have flushed hardware state
- Where we are unwinding call-chains of >16 entries[*]
[*] This is obviously the trickiest case. Fortunately, it is tough to
exploit since such call-chains are reasonably rare, and action must
typically be predicted at a considerable distance from where current
execution lies. Both dramatically increasing the feasibility of an
attack and lowering the bit-rate (number of ops per attempt is
necessarily increased). For our systems, since we control the binary
image we can determine this through aggregate profiling of every
machine in the fleet. I'm happy to provide those symbols; but it's
obviously restricted from complete coverage due to code differences.
Generally, this is a level of paranoia no typical user will likely
care about and only applies to a subset of CPUs.
A sequence for efficiently refilling the RSB is:
mov $8, %rax;
.align 16;
3: call 4f;
3p: pause; call 3p;
.align 16;
4: call 5f;
4p: pause; call 4p;
.align 16;
5: dec %rax;
jnz 3b;
add $(16*8), %rsp;
This implementation uses 8 loops, with 2 calls per iteration. This is
marginally faster than a single call per iteration. We did not
observe useful benefit (particularly relative to text size) from
further unrolling. This may also be usefully split into smaller (e.g.
4 or 8 call) segments where we can usefully pipeline/intermix with
other operations. It includes retpoline type traps so that if an
entry is consumed, it cannot lead to controlled speculation. On my
test system it took ~43 cycles on average. Note that non-zero
displacement calls should be used, as zero-displacement calls may be
optimized to not interact with the RSB due to their use in fetching
RIP for 32-bit relocations.
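For reference, here is the same sequence sketched as a GAS macro with
per-instance local labels (illustrative only; the macro name, label names,
register parameter and the 16-entry/8-iteration assumption are mine):

.macro FILL_RSB_SKETCH reg:req
        mov     $8, \reg                /* 8 iterations x 2 calls = 16 RSB entries */
        .align 16
.Lrsb_call1_\@:
        call    .Lrsb_call2_\@          /* push a fake return address -> RSB entry */
.Lrsb_trap1_\@:
        pause                           /* speculation trap, reached only via the RSB */
        call    .Lrsb_trap1_\@
        .align 16
.Lrsb_call2_\@:
        call    .Lrsb_loop_\@
.Lrsb_trap2_\@:
        pause
        call    .Lrsb_trap2_\@
        .align 16
.Lrsb_loop_\@:
        dec     \reg
        jnz     .Lrsb_call1_\@
        add     $(16*8), %rsp           /* drop the 16 fake return addresses (x86_64) */
.endm

A CPU with a 32-entry return stack would need 16 iterations and a
$(32*8) stack adjustment instead.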
On Mon, Jan 8, 2018 at 2:34 AM, Paul Turner <[email protected]> wrote:
> One detail that is missing is that we still need RSB refill in some cases.
> This is not because the retpoline sequence itself will underflow (it is
> actually guaranteed not to, since it consumes only RSB entries that it
> generates.
> But either to avoid poisoning of the RSB entries themselves, or to avoid the
> hardware turning to alternate predictors on RSB underflow.
>
> Enumerating the cases we care about:
>
> user->kernel in the absence of SMEP:
> In the absence of SMEP, we must worry about user-generated RSB entries being
> consumable by kernel execution.
> Generally speaking, for synchronous execution this will not occur (e.g.
> syscall, interrupt), however, one important case remains.
> When we context switch between two threads, we should flush the RSB so that
> execution generated from the unbalanced return path on the thread that we
> just scheduled into, cannot consume RSB entries potentially installed by the
> prior thread.
>
> kernel->kernel independent of SMEP:
> While much harder to coordinate, facilities such as eBPF potentially allow
> exploitable return targets to be created.
> Generally speaking (particularly if eBPF has been disabled) the risk is
> _much_ lower here, since we can only return into kernel execution that was
> already occurring on another thread (which could e.g. likely be attacked
> there directly independent of RSB poisoning.)
>
> guest->hypervisor, independent of SMEP:
> For guest ring0 -> host ring0 transitions, it is possible that the tagging
> only includes that the entry was only generated in a ring0 context. Meaning
> that a guest generated entry may be consumed by the host. This admits:
>
> hypervisor_run_vcpu_implementation() {
> <enter hardware virtualization context>
> … run virtualized work (1)
> <leave hardware virtualization context>
> < update vmcs state, prior to any function return > (2)
> < return from hypervisor_run_vcpu_implementation() to handle VMEXIT > (3)
> }
>
> A guest to craft poisoned entries at (1) which, if not flushed at (2), may
> immediately be eligible for consumption at (3).
>
> While the cases above involve the crafting and use of poisoned entries.
> Recall also that one of the initial conditions was that we should avoid RSB
> underflow as some CPUs may try to use other indirect predictors when this
> occurs.
>
> The cases we care about here are:
> - When we return _into_ protected execution. For the kernel, this means
> when we exit interrupt context into kernel context, since may have emptied
> or reduced the number of RSB entries while in iinterrupt context.
> - Context switch (even if we are returning to user code, we need to at
> unwind the scheduler/triggering frames that preempted it previously,
> considering that detail, this is a subset of the above, but listed for
> completeness)
> - On VMEXIT (it turns out we need to worry about both poisoned entries, and
> no entries, the solution is a single refill nonetheless).
> - Leaving deeper (>C1) c-states, which may have flushed hardware state
> - Where we are unwinding call-chains of >16 entries[*]
>
> [*] This is obviously the trickiest case. Fortunately, it is tough to
> exploit since such call-chains are reasonably rare, and action must
> typically be predicted at a considerable distance from where current
> execution lies. Both dramatically increasing the feasibility of an attack
> and lowering the bit-rate (number of ops per attempt is necessarily
> increased). For our systems, since we control the binary image we can
> determine this through aggregate profiling of every machine in the fleet.
> I'm happy to provide those symbols; but it's obviously restricted from
> complete coverage due to code differences. Generally, this is a level of
> paranoia no typical user will likely care about and only applies to a subset
> of CPUs.
>
>
>
>
>
> On Sun, Jan 7, 2018 at 2:11 PM, David Woodhouse <[email protected]> wrote:
>>
>> This is a mitigation for the 'variant 2' attack described in
>>
>> https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
>>
>> Using GCC patches available from the hjl/indirect/gcc-7-branch/master
>> branch of https://github.com/hjl-tools/gcc/commits/hjl and by manually
>> patching assembler code, all vulnerable indirect branches (that occur
>> after userspace first runs) are eliminated from the kernel.
>>
>> They are replaced with a 'retpoline' call sequence which deliberately
>> prevents speculation.
>>
>> Fedora 27 packages of the updated compiler are available at
>> https://koji.fedoraproject.org/koji/taskinfo?taskID=24065739
>>
>>
>> v1: Initial post.
>> v2: Add CONFIG_RETPOLINE to build kernel without it.
>> Change warning messages.
>> Hide modpost warning message
>> v3: Update to the latest CET-capable retpoline version
>> Reinstate ALTERNATIVE support
>> v4: Finish reconciling Andi's and my patch sets, bug fixes.
>> Exclude objtool support for now
>> Add 'noretpoline' boot option
>> Add AMD retpoline alternative
>> v5: Silence MODVERSIONS warnings
>> Use pause;jmp loop instead of lfence;jmp
>> Switch to X86_FEATURE_RETPOLINE positive feature logic
>> Emit thunks inline from assembler macros
>> Merge AMD support into initial patch
>> v6: Update to latest GCC patches with no dots in symbols
>> Fix MODVERSIONS properly(ish)
>> Fix typo breaking 32-bit, introduced in V5
>> Never set X86_FEATURE_RETPOLINE_AMD yet, pending confirmation
>>
>> Andi Kleen (3):
>> x86/retpoline/irq32: Convert assembler indirect jumps
>> x86/retpoline: Add boot time option to disable retpoline
>> x86/retpoline: Exclude objtool with retpoline
>>
>> David Woodhouse (7):
>> x86/retpoline: Add initial retpoline support
>> x86/retpoline/crypto: Convert crypto assembler indirect jumps
>> x86/retpoline/entry: Convert entry assembler indirect jumps
>> x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
>> x86/retpoline/hyperv: Convert assembler indirect jumps
>> x86/retpoline/xen: Convert Xen hypercall indirect jumps
>> x86/retpoline/checksum32: Convert assembler indirect jumps
>>
>> Documentation/admin-guide/kernel-parameters.txt | 3 +
>> arch/x86/Kconfig | 17 ++++-
>> arch/x86/Kconfig.debug | 6 +-
>> arch/x86/Makefile | 10 +++
>> arch/x86/crypto/aesni-intel_asm.S | 5 +-
>> arch/x86/crypto/camellia-aesni-avx-asm_64.S | 3 +-
>> arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 3 +-
>> arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 +-
>> arch/x86/entry/entry_32.S | 5 +-
>> arch/x86/entry/entry_64.S | 12 +++-
>> arch/x86/include/asm/asm-prototypes.h | 25 +++++++
>> arch/x86/include/asm/cpufeatures.h | 2 +
>> arch/x86/include/asm/mshyperv.h | 18 ++---
>> arch/x86/include/asm/nospec-branch.h | 92
>> +++++++++++++++++++++++++
>> arch/x86/include/asm/xen/hypercall.h | 5 +-
>> arch/x86/kernel/cpu/common.c | 3 +
>> arch/x86/kernel/cpu/intel.c | 11 +++
>> arch/x86/kernel/ftrace_32.S | 6 +-
>> arch/x86/kernel/ftrace_64.S | 8 +--
>> arch/x86/kernel/irq_32.c | 9 +--
>> arch/x86/lib/Makefile | 1 +
>> arch/x86/lib/checksum_32.S | 7 +-
>> arch/x86/lib/retpoline.S | 48 +++++++++++++
>> 23 files changed, 264 insertions(+), 38 deletions(-)
>> create mode 100644 arch/x86/include/asm/nospec-branch.h
>> create mode 100644 arch/x86/lib/retpoline.S
>>
>> --
>> 2.7.4
>>
>
On Mon, 2018-01-08 at 02:34 -0800, Paul Turner wrote:
> One detail that is missing is that we still need RSB refill in some
> cases.
> This is not because the retpoline sequence itself will underflow (it
> is actually guaranteed not to, since it consumes only RSB entries
> that it generates.
> But either to avoid poisoning of the RSB entries themselves, or to
> avoid the hardware turning to alternate predictors on RSB underflow.
>
> Enumerating the cases we care about:
>
> • user->kernel in the absence of SMEP:
> In the absence of SMEP, we must worry about user-generated RSB
> entries being consumable by kernel execution.
> Generally speaking, for synchronous execution this will not occur
> (e.g. syscall, interrupt), however, one important case remains.
> When we context switch between two threads, we should flush the RSB
> so that execution generated from the unbalanced return path on the
> thread that we just scheduled into, cannot consume RSB entries
> potentially installed by the prior thread.
Or IBPB here, yes? That's what we had in the original patch set when
retpoline came last, and what I assume will be put back again once we
*finally* get our act together and reinstate the full set of microcode
patches.
> kernel->kernel independent of SMEP:
> While much harder to coordinate, facilities such as eBPF potentially
> allow exploitable return targets to be created.
> Generally speaking (particularly if eBPF has been disabled) the risk
> is _much_ lower here, since we can only return into kernel execution
> that was already occurring on another thread (which could e.g. likely
> be attacked there directly independent of RSB poisoning.)
>
> guest->hypervisor, independent of SMEP:
> For guest ring0 -> host ring0 transitions, it is possible that the
> tagging only includes that the entry was only generated in a ring0
> context. Meaning that a guest generated entry may be consumed by the
> host. This admits:
We are also stuffing the RSB on vmexit in the IBRS/IBPB patch set,
aren't we?
On Sun, Jan 07, 2018 at 10:11:16PM +0000, David Woodhouse wrote:
> +#ifdef __ASSEMBLY__
> +
> +/*
> + * These are the bare retpoline primitives for indirect jmp and call.
> + * Do not use these directly; they only exist to make the ALTERNATIVE
> + * invocation below less ugly.
> + */
> +.macro RETPOLINE_JMP reg:req
> + call 1112f
> +1111: pause
> + jmp 1111b
> +1112: mov \reg, (%_ASM_SP)
> + ret
> +.endm
Should this not use local name labels instead?
.macro RETPOLINE_JMP reg:req
call .Ldo_rop_\@
.Lspec_trap_\@:
pause
jmp .Lspec_trap_\@
.Ldo_rop_\@:
mov \reg, (%_ASM_SP)
ret
.endm
And I suppose it might be nice to put a little comment with them
explaining how they work.
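Say, something along these lines (sketch only, not the actual patch; the
comment wording is mine):

/*
 * The call pushes a return address pointing at the speculation trap
 * below, and creates a matching RSB entry, so the only thing the CPU
 * can speculate into on the 'ret' is the pause loop.  The mov then
 * overwrites that return address on the stack with the real branch
 * target, so the architectural 'ret' goes to *\reg without ever
 * letting the BTB steer speculation for this indirect branch.
 */
.macro RETPOLINE_JMP reg:req
	call	.Ldo_rop_\@
.Lspec_trap_\@:
	pause
	jmp	.Lspec_trap_\@
.Ldo_rop_\@:
	mov	\reg, (%_ASM_SP)
	ret
.endm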
> +/*
> + * For i386 we use the original ret-equivalent retpoline, because
> + * otherwise we'll run out of registers. We don't care about CET
> + * here, anyway.
> + */
> +# define NOSPEC_CALL ALTERNATIVE( \
> + "call *%[thunk_target]\n", \
> + " jmp 1113f; " \
> + "1110: call 1112f; " \
> + "1111: pause; " \
> + " jmp 1111b; " \
> + "1112: addl $4, %%esp; " \
> + " pushl %[thunk_target]; " \
> + " ret; " \
> + "1113: call 1110b;\n", \
> + X86_FEATURE_RETPOLINE)
Ideally this would too, just not sure that works in inline asm.
On Mon, Jan 8, 2018 at 2:38 AM, Jiri Kosina <[email protected]> wrote:
> On Mon, 8 Jan 2018, Paul Turner wrote:
>
>> user->kernel in the absence of SMEP:
>> In the absence of SMEP, we must worry about user-generated RSB entries
>> being consumable by kernel execution.
>> Generally speaking, for synchronous execution this will not occur (e.g.
>> syscall, interrupt), however, one important case remains.
>> When we context switch between two threads, we should flush the RSB so that
>> execution generated from the unbalanced return path on the thread that we
>> just scheduled into, cannot consume RSB entries potentially installed by
>> the prior thread.
>
> I am still unclear whether this closes it completely, as when HT is on,
> the RSB is shared between the threads, right? Therefore one thread can
> poision it for the other without even context switch happening.
>
See 2.6.1.1 [Replicated resources]:
"The return stack predictor is replicated to improve branch
prediction of return instructions"
(This is part of the reason that the sequence is attractive; its use
of the RSB to control prediction naturally prevents cross-sibling
attack.)
> --
> Jiri Kosina
> SUSE Labs
>
On Mon, 2018-01-08 at 11:45 +0100, Peter Zijlstra wrote:
>
>
> Should this not use local name labels instead?
>
> .macro RETPOLINE_JMP reg:req
> call .Ldo_rop_\@
> .Lspec_trap_\@:
> pause
> jmp .Lspec_trap_\@
> .Ldo_rop_\@:
> mov \reg, (%_ASM_SP)
> ret
> .endm
Not if you want to be able to use them twice in the same .S file.
On Mon, Jan 8, 2018 at 2:45 AM, David Woodhouse <[email protected]> wrote:
> On Mon, 2018-01-08 at 02:34 -0800, Paul Turner wrote:
>> One detail that is missing is that we still need RSB refill in some
>> cases.
>> This is not because the retpoline sequence itself will underflow (it
>> is actually guaranteed not to, since it consumes only RSB entries
>> that it generates.
>> But either to avoid poisoning of the RSB entries themselves, or to
>> avoid the hardware turning to alternate predictors on RSB underflow.
>>
>> Enumerating the cases we care about:
>>
>> • user->kernel in the absence of SMEP:
>> In the absence of SMEP, we must worry about user-generated RSB
>> entries being consumable by kernel execution.
>> Generally speaking, for synchronous execution this will not occur
>> (e.g. syscall, interrupt), however, one important case remains.
>> When we context switch between two threads, we should flush the RSB
>> so that execution generated from the unbalanced return path on the
>> thread that we just scheduled into, cannot consume RSB entries
>> potentially installed by the prior thread.
>
> Or IBPB here, yes? That's what we had in the original patch set when
> retpoline came last, and what I assume will be put back again once we
> *finally* get our act together and reinstate the full set of microcode
> patches.
IBPB is *much* more expensive than the sequence I suggested.
If the kernel has been protected with a retpoline compilation, it is
much faster to not use IBPB here; we only need to prevent
ret-poisoning in this case.
>
>> kernel->kernel independent of SMEP:
>> While much harder to coordinate, facilities such as eBPF potentially
>> allow exploitable return targets to be created.
>> Generally speaking (particularly if eBPF has been disabled) the risk
>> is _much_ lower here, since we can only return into kernel execution
>> that was already occurring on another thread (which could e.g. likely
>> be attacked there directly independent of RSB poisoning.)
>>
>> guest->hypervisor, independent of SMEP:
>> For guest ring0 -> host ring0 transitions, it is possible that the
>> tagging only includes that the entry was only generated in a ring0
>> context. Meaning that a guest generated entry may be consumed by the
>> host. This admits:
>
> We are also stuffing the RSB on vmexit in the IBRS/IBPB patch set,
> aren't we?
A) I am enumerating all of the cases for completeness. It was missed
by many that this detail was necessary on this patch, independently of
IBRS.
B) On the parts duplicated in (A), for specifics that are contributory to
correctness in both cases, we should not hand-wave over the fact that
they may or may not be covered by another patch-set. Users need to
understand what's required for complete protection. Particularly if they
are backporting.
On Mon, Jan 08, 2018 at 10:53:02AM +0000, David Woodhouse wrote:
> On Mon, 2018-01-08 at 11:45 +0100, Peter Zijlstra wrote:
> >
> >
> > Should this not use local name labels instead?
> >
> > .macro RETPOLINE_JMP reg:req
> > 	call	.Ldo_rop_\@
> > .Lspec_trap_\@:
> > 	pause
> > 	jmp .Lspec_trap_\@
> > .Ldo_rop_\@:
> > 	mov	\reg, (%_ASM_SP)
> > 	ret
> > .endm
>
> Not if you want to be able to use them twice in the same .S file.
Should work fine, the \@ expands to a per-instance magic thing IIRC. All
the PTI helpers do this too and that works just fine, see
arch/x86/entry/calling.h
On 08/01/18 10:42, Paul Turner wrote:
> A sequence for efficiently refilling the RSB is:
> mov $8, %rax;
> .align 16;
> 3: call 4f;
> 3p: pause; call 3p;
> .align 16;
> 4: call 5f;
> 4p: pause; call 4p;
> .align 16;
> 5: dec %rax;
> jnz 3b;
> add $(16*8), %rsp;
> This implementation uses 8 loops, with 2 calls per iteration. This is
> marginally faster than a single call per iteration. We did not
> observe useful benefit (particularly relative to text size) from
> further unrolling. This may also be usefully split into smaller (e.g.
> 4 or 8 call) segments where we can usefully pipeline/intermix with
> other operations. It includes retpoline type traps so that if an
> entry is consumed, it cannot lead to controlled speculation. On my
> test system it took ~43 cycles on average. Note that non-zero
> displacement calls should be used as these may be optimized to not
> interact with the RSB due to their use in fetching RIP for 32-bit
> relocations.
Guidance from both Intel and AMD still states that 32 calls are required
in general. Is your above code optimised for a specific processor which
you know the RSB to be smaller on?
~Andrew
For Intel the manuals state that it's 16 entries -- 2.5.2.1
Agner also reports 16 (presumably experimentally measured) e.g.
http://www.agner.org/optimize/microarchitecture.pdf [3.8]
For AMD it can be larger, for example 32 entries on Fam17h (but 16
entries on Fam16h).
For future proofing a binary, or a new AMD processor, 32 calls are
required. I would suggest tuning this based on the current CPU (which
also covers the future case while saving cycles now) to save overhead.
On Mon, Jan 8, 2018 at 3:16 AM, Andrew Cooper <[email protected]> wrote:
> On 08/01/18 10:42, Paul Turner wrote:
>> A sequence for efficiently refilling the RSB is:
>> mov $8, %rax;
>> .align 16;
>> 3: call 4f;
>> 3p: pause; call 3p;
>> .align 16;
>> 4: call 5f;
>> 4p: pause; call 4p;
>> .align 16;
>> 5: dec %rax;
>> jnz 3b;
>> add $(16*8), %rsp;
>> This implementation uses 8 loops, with 2 calls per iteration. This is
>> marginally faster than a single call per iteration. We did not
>> observe useful benefit (particularly relative to text size) from
>> further unrolling. This may also be usefully split into smaller (e.g.
>> 4 or 8 call) segments where we can usefully pipeline/intermix with
>> other operations. It includes retpoline type traps so that if an
>> entry is consumed, it cannot lead to controlled speculation. On my
>> test system it took ~43 cycles on average. Note that calls with a
>> non-zero displacement should be used, since zero-displacement calls
>> may be optimized to not interact with the RSB, due to their use in
>> fetching RIP for 32-bit relocations.
>
> Guidance from both Intel and AMD still states that 32 calls are required
> in general. Is your above code optimised for a specific processor which
> you know the RSB to be smaller on?
>
> ~Andrew
> On Mon, Jan 08, 2018 at 10:53:02AM +0000, David Woodhouse wrote:
>> On Mon, 2018-01-08 at 11:45 +0100, Peter Zijlstra wrote:
>> >
>> >
>> > Should this not use local name labels instead?
>> >
>> > .macro RETPOLINE_JMP reg:req
>> > call .Ldo_rop_\@
>> > .Lspec_trap_\@:
>> > pause
>> > jmp .Lspec_trap_\@
>> > .Ldo_rop_\@:
>> > mov \reg, (%_ASM_SP)
>> > ret
>> > .endm
>>
>> Not if you want to be able to use them twice in the same .S file.
>
> Should work fine, the \@ expands to a per-instance magic thing IIRC. All
> the PTI helpers do this too and that works just fine, see
> arch/x86/entry/calling.h
>
Ah, OK. Will do that when home in a few hours then.
--
dwmw2
> On Mon, Jan 8, 2018 at 2:45 AM, David Woodhouse <[email protected]>
> wrote:
>> On Mon, 2018-01-08 at 02:34 -0800, Paul Turner wrote:
>>> One detail that is missing is that we still need RSB refill in some
>>> cases.
>>> This is not because the retpoline sequence itself will underflow (it
>>> is actually guaranteed not to, since it consumes only RSB entries
>>> that it generates).
>>> But either to avoid poisoning of the RSB entries themselves, or to
>>> avoid the hardware turning to alternate predictors on RSB underflow.
>>>
>>> Enumerating the cases we care about:
>>>
>>> • user->kernel in the absence of SMEP:
>>> In the absence of SMEP, we must worry about user-generated RSB
>>> entries being consumable by kernel execution.
>>> Generally speaking, for synchronous execution this will not occur
>>> (e.g. syscall, interrupt), however, one important case remains.
>>> When we context switch between two threads, we should flush the RSB
>>> so that execution generated from the unbalanced return path on the
>>> thread that we just scheduled into, cannot consume RSB entries
>>> potentially installed by the prior thread.
>>
>> Or IBPB here, yes? That's what we had in the original patch set when
>> retpoline came last, and what I assume will be put back again once we
>> *finally* get our act together and reinstate the full set of microcode
>> patches.
>
> IBPB is *much* more expensive than the sequence I suggested.
> If the kernel has been protected with a retpoline compilation, it is
> much faster to not use IBPB here; we only need to prevent
> ret-poisoning in this case.
Retpoline protects the kernel but IBPB is needed on context switch anyway
to protect userspace processes from each other.
But...
> A) I am enumerating all of the cases for completeness. It was missed
> by many that this detail was necessary on this patch, independently of
> IBRS.
> B) On the parts duplicated in (A), for specifics that are contributory to
> correctness in both cases, we should not hand-wave over the fact that
> they may or may not be covered by another patch-set. Users need to
> understand what's required for complete protection. Particularly if they
> are backporting.
... yes, agreed. Now that we are putting retpoline first we shouldn't miss
things that we *were* doing anyway. TBH I really don't think we should
have split the patch sets apart; we'll work on getting the rest on top
ASAP.
--
dwmw2
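For reference, the IBPB in question is a single MSR write on CPUs whose microcode advertises the new capability; the barrier itself looks roughly like this (a sketch, outside of any particular entry path):

	/*
	 * Indirect Branch Prediction Barrier: writing PRED_CMD_IBPB (bit 0)
	 * to MSR_IA32_PRED_CMD (0x49) discards earlier indirect branch
	 * predictions, so the incoming task cannot be steered by branch
	 * history trained by the outgoing one.
	 */
	movl	$0x49, %ecx		/* MSR_IA32_PRED_CMD */
	movl	$1, %eax		/* PRED_CMD_IBPB */
	xorl	%edx, %edx
	wrmsr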
On Mon, Jan 08, 2018 at 10:34:00AM +0000, Woodhouse, David wrote:
> On Mon, 2018-01-08 at 11:25 +0100, Thomas Gleixner wrote:
> > On Sun, 7 Jan 2018, David Woodhouse wrote:
> >
> > Cc+ Josh Poimboeuf <[email protected]>
> >
> > Sigh....
> >
> > > From: Andi Kleen <[email protected]>
> > >
> > > objtool's assembler nanny currently cannot deal with the code generated
> > > by the retpoline compiler and throws hundreds of warnings, mostly
> > > because it sees calls that don't have a symbolic target.
> > >
> > > Exclude all the options that rely on objtool when RETPOLINE is active.
> > >
> > > This mainly means that we use the frame pointer unwinder and livepatch
> > > is not supported.
> > >
> > > Eventually objtool can be fixed to handle this.
>
> I believe Josh is on vacation and I've asked Amit to take a look at
> this, at least as far as ensuring that it doesn't actually disable
> kpatch.
I'm back now and will start looking at it this week.
For future revisions of the patch set, can you add me to CC? And also
it would be good to add [email protected] as I'm sure the other x86
maintainers (mingo and hpa) want to see these patches.
--
Josh
On Sun, Jan 07, 2018 at 10:11:16PM +0000, David Woodhouse wrote:
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index a20eacd..918e550 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -235,6 +235,16 @@ KBUILD_CFLAGS += -Wno-sign-compare
> #
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>
> +# Avoid indirect branches in kernel to deal with Spectre
> +ifdef CONFIG_RETPOLINE
> + RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register)
> + ifneq ($(RETPOLINE_CFLAGS),)
> + KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
> + else
> + $(warning Retpoline not supported in compiler. System may be insecure.)
> + endif
> +endif
I wonder if an error might be more appropriate than a warning. I
learned from experience that a lot of people don't see these Makefile
warnings, and this would be a dangerous one to miss.
Also if this were an error, you could get rid of the RETPOLINE define,
and that would be one less define cluttering up the already way-too-long
GCC arg list.
--
Josh
On Mon, 8 Jan 2018, Josh Poimboeuf wrote:
> On Sun, Jan 07, 2018 at 10:11:16PM +0000, David Woodhouse wrote:
> > diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> > index a20eacd..918e550 100644
> > --- a/arch/x86/Makefile
> > +++ b/arch/x86/Makefile
> > @@ -235,6 +235,16 @@ KBUILD_CFLAGS += -Wno-sign-compare
> > #
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> >
> > +# Avoid indirect branches in kernel to deal with Spectre
> > +ifdef CONFIG_RETPOLINE
> > + RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register)
> > + ifneq ($(RETPOLINE_CFLAGS),)
> > + KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
> > + else
> > + $(warning Retpoline not supported in compiler. System may be insecure.)
> > + endif
> > +endif
>
> I wonder if an error might be more appropriate than a warning. I
> learned from experience that a lot of people don't see these Makefile
> warnings, and this would be a dangerous one to miss.
>
> Also if this were an error, you could get rid of the RETPOLINE define,
> and that would be one less define cluttering up the already way-too-long
> GCC arg list.
It still allows the ASM part to be covered. Whether that's worth it I can't tell.
Thanks,
tglx
On Sun, Jan 07, 2018 at 10:11:17PM +0000, David Woodhouse wrote:
> Convert all indirect jumps in crypto assembler code to use non-speculative
> sequences when CONFIG_RETPOLINE is enabled.
>
> Signed-off-by: David Woodhouse <[email protected]>
> Acked-By: Arjan van de Ven <[email protected]>
> ---
> arch/x86/crypto/aesni-intel_asm.S | 5 +++--
> arch/x86/crypto/camellia-aesni-avx-asm_64.S | 3 ++-
> arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 3 ++-
> arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 ++-
> 4 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
> index 16627fe..f128680 100644
> --- a/arch/x86/crypto/aesni-intel_asm.S
> +++ b/arch/x86/crypto/aesni-intel_asm.S
> @@ -32,6 +32,7 @@
> #include <linux/linkage.h>
> #include <asm/inst.h>
> #include <asm/frame.h>
> +#include <asm/nospec-branch.h>
>
> /*
> * The following macros are used to move an (un)aligned 16 byte value to/from
> @@ -2884,7 +2885,7 @@ ENTRY(aesni_xts_crypt8)
> pxor INC, STATE4
> movdqu IV, 0x30(OUTP)
>
> - call *%r11
> + NOSPEC_CALL %r11
Personally I find
CALL_NOSPEC %r11
to be more readable. The call is the action and the nospec is a detail.
And similarly for NOSPEC_JMP -> JMP_NOSPEC.
--
Josh
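Whatever the final spelling, the call-site macro under discussion has roughly this shape; the sketch below is simplified (it takes the bare register name and uses a plain #ifdef, where the real macro in the series differs in detail), with the extern thunk name following the -mindirect-branch=thunk-extern convention:

/* Sketch only: call through a retpoline thunk when CONFIG_RETPOLINE is
 * defined, plain indirect call otherwise. */
.macro CALL_NOSPEC_SKETCH reg:req
#ifdef CONFIG_RETPOLINE
	call	__x86_indirect_thunk_\reg
#else
	call	*%\reg
#endif
.endm

/* One per-register thunk, using the same rop trick quoted earlier: */
	.text
	.globl	__x86_indirect_thunk_r11
__x86_indirect_thunk_r11:
	call	.Lthunk_rop
.Lthunk_trap:
	pause
	jmp	.Lthunk_trap
.Lthunk_rop:
	mov	%r11, (%rsp)		/* redirect the pending 'ret' to the real target */
	ret

/* Usage at a call site, e.g.:  CALL_NOSPEC_SKETCH r11 */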
On Mon, Jan 08, 2018 at 02:46:32PM +0100, Thomas Gleixner wrote:
> On Mon, 8 Jan 2018, Josh Poimboeuf wrote:
> > On Sun, Jan 07, 2018 at 10:11:16PM +0000, David Woodhouse wrote:
> > > diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> > > index a20eacd..918e550 100644
> > > --- a/arch/x86/Makefile
> > > +++ b/arch/x86/Makefile
> > > @@ -235,6 +235,16 @@ KBUILD_CFLAGS += -Wno-sign-compare
> > > #
> > > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > >
> > > +# Avoid indirect branches in kernel to deal with Spectre
> > > +ifdef CONFIG_RETPOLINE
> > > + RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register)
> > > + ifneq ($(RETPOLINE_CFLAGS),)
> > > + KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
> > > + else
> > > + $(warning Retpoline not supported in compiler. System may be insecure.)
> > > + endif
> > > +endif
> >
> > I wonder if an error might be more appropriate than a warning. I
> > learned from experience that a lot of people don't see these Makefile
> > warnings, and this would be a dangerous one to miss.
> >
> > Also if this were an error, you could get rid of the RETPOLINE define,
> > and that would be one less define cluttering up the already way-too-long
> > GCC arg list.
>
> It still allows to get the ASM part covered. If that's worth it I can't tell.
If there's a makefile error above, then CONFIG_RETPOLINE would already
imply compiler support, so the ASM code with the new '%V' option could
just do 'ifdef CONFIG_RETPOLINE'.
--
Josh
> On Mon, Jan 08, 2018 at 02:46:32PM +0100, Thomas Gleixner wrote:
>> On Mon, 8 Jan 2018, Josh Poimboeuf wrote:
>> > On Sun, Jan 07, 2018 at 10:11:16PM +0000, David Woodhouse wrote:
>> > > diff --git a/arch/x86/Makefile b/arch/x86/Makefile
>> > > index a20eacd..918e550 100644
>> > > --- a/arch/x86/Makefile
>> > > +++ b/arch/x86/Makefile
>> > > @@ -235,6 +235,16 @@ KBUILD_CFLAGS += -Wno-sign-compare
>> > > #
>> > > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>> > >
>> > > +# Avoid indirect branches in kernel to deal with Spectre
>> > > +ifdef CONFIG_RETPOLINE
>> > > + RETPOLINE_CFLAGS += $(call
>> cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register)
>> > > + ifneq ($(RETPOLINE_CFLAGS),)
>> > > + KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
>> > > + else
>> > > + $(warning Retpoline not supported in compiler. System may
>> be insecure.)
>> > > + endif
>> > > +endif
>> >
>> > I wonder if an error might be more appropriate than a warning. I
>> > learned from experience that a lot of people don't see these Makefile
>> > warnings, and this would be a dangerous one to miss.
>> >
>> > Also if this were an error, you could get rid of the RETPOLINE define,
>> > and that would be one less define cluttering up the already
>> way-too-long
>> > GCC arg list.
>>
>> It still allows to get the ASM part covered. If that's worth it I can't
>> tell.
>
> If there's a makefile error above, then CONFIG_RETPOLINE would already
> imply compiler support, so the ASM code with the new '%V' option could
> just do 'ifdef CONFIG_RETPOLINE'.
I did look at ditching the -DRETPOLINE but there is benefit in doing the
sys_call_table jump even when GCC isn't updated. So I put it back.
--
dwmw2
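To make that benefit concrete: the syscall dispatch in entry_64.S is an indirect call reachable almost directly from userspace-controlled state, and the asm conversion covers it whenever CONFIG_RETPOLINE is set, with or without an updated compiler. It is roughly of this shape (register choice and exact macro use illustrative, not the verbatim hunk from the entry patch):

	/* Syscall dispatch: with CONFIG_RETPOLINE the table entry is loaded
	 * into a register and called through the non-speculative macro;
	 * otherwise the historical memory-indirect call is kept. */
#ifdef CONFIG_RETPOLINE
	movq	sys_call_table(, %rax, 8), %r10
	NOSPEC_CALL %r10
#else
	call	*sys_call_table(, %rax, 8)
#endif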
On Mon, Jan 08, 2018 at 02:42:13AM -0800, Paul Turner wrote:
>
> kernel->kernel independent of SMEP:
> While much harder to coordinate, facilities such as eBPF potentially
> allow exploitable return targets to be created.
> Generally speaking (particularly if eBPF has been disabled) the risk
> is _much_ lower here, since we can only return into kernel execution
> that was already occurring on another thread (which could e.g. likely
> be attacked there directly independent of RSB poisoning.)
we can remove the bpf interpreter without losing features:
https://patchwork.ozlabs.org/patch/856694/
Ironically, the JIT is more secure than the interpreter.
* Linus Torvalds <[email protected]> wrote:
> On Sun, Jan 7, 2018 at 2:11 PM, David Woodhouse <[email protected]> wrote:
> > This is a mitigation for the 'variant 2' attack described in
> > https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
>
> Ok, I don't love the patches, but I see nothing horribly wrong here
> either, and I assume the performance impact of this is pretty minimal.
>
> Thomas? I'm obviously doing rc7 today without these, but I assume the
> x86 maintainers are resigned to this all. And yes, we'll have at least
> an rc8 this release..
I'm definitely resigned to them, and with these patches being disclosed so late
we don't have any good choices left, so a tentative:
Acked-by: Ingo Molnar <[email protected]>
Thanks,
Ingo
On Mon, 8 Jan 2018, Ingo Molnar wrote:
> * Linus Torvalds <[email protected]> wrote:
>
> > On Sun, Jan 7, 2018 at 2:11 PM, David Woodhouse <[email protected]> wrote:
> > > This is a mitigation for the 'variant 2' attack described in
> > > https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
> >
> > Ok, I don't love the patches, but I see nothing horribly wrong here
> > either, and I assume the performance impact of this is pretty minimal.
> >
> > Thomas? I'm obviously doing rc7 today without these, but I assume the
> > x86 maintainers are resigned to this all. And yes, we'll have at least
> > an rc8 this release..
>
> I'm definitely resigned to them, and with these patches being disclosed so late
> we don't have any good choices left, so a tentative:
>
> Acked-by: Ingo Molnar <[email protected]>
Just doing the last polishing on that. I'll add your ack.
On Mon, Jan 08, 2018 at 02:26:11PM -0000, David Woodhouse wrote:
>
> > On Mon, Jan 08, 2018 at 02:46:32PM +0100, Thomas Gleixner wrote:
> >> On Mon, 8 Jan 2018, Josh Poimboeuf wrote:
> >> > On Sun, Jan 07, 2018 at 10:11:16PM +0000, David Woodhouse wrote:
> >> > > diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> >> > > index a20eacd..918e550 100644
> >> > > --- a/arch/x86/Makefile
> >> > > +++ b/arch/x86/Makefile
> >> > > @@ -235,6 +235,16 @@ KBUILD_CFLAGS += -Wno-sign-compare
> >> > > #
> >> > > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> >> > >
> >> > > +# Avoid indirect branches in kernel to deal with Spectre
> >> > > +ifdef CONFIG_RETPOLINE
> >> > > + RETPOLINE_CFLAGS += $(call
> >> cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register)
> >> > > + ifneq ($(RETPOLINE_CFLAGS),)
> >> > > + KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
> >> > > + else
> >> > > + $(warning Retpoline not supported in compiler. System may
> >> be insecure.)
> >> > > + endif
> >> > > +endif
> >> >
> >> > I wonder if an error might be more appropriate than a warning. I
> >> > learned from experience that a lot of people don't see these Makefile
> >> > warnings, and this would be a dangerous one to miss.
> >> >
> >> > Also if this were an error, you could get rid of the RETPOLINE define,
> >> > and that would be one less define cluttering up the already
> >> way-too-long
> >> > GCC arg list.
> >>
> >> It still allows to get the ASM part covered. If that's worth it I can't
> >> tell.
> >
> > If there's a makefile error above, then CONFIG_RETPOLINE would already
> > imply compiler support, so the ASM code with the new '%V' option could
> > just do 'ifdef CONFIG_RETPOLINE'.
>
> I did look at ditching the -DRETPOLINE but there is benefit in doing the
> sys_call_table jump even when GCC isn't updated. So I put it back.
What benefit is that? Doesn't it give the user a false sense of
security, since there's no shortage of other indirect branches to
attack?
--
Josh
This patch further hardens retpoline.
CPUs have return buffers which store the return address for
RET to predict function returns. Some CPUs (Skylake, some Broadwells)
can fall back to indirect branch prediction on return buffer underflow.
With retpoline we want to avoid uncontrolled indirect branches,
which could be poisoned by ring 3, so we need to avoid uncontrolled
return buffer underflows in the kernel.
This can happen when we're context switching from a shallower to a
deeper kernel stack. The deeper kernel stack would eventually underflow
the return buffer, which again would fall back to the indirect branch predictor.
To guard against this fill the return buffer with controlled
content during context switch. This prevents any underflows.
We always fill the buffer with 30 entries: 32 minus 2 for at
least one call from entry_{64,32}.S to C code and another into
the function doing the filling.
That's pessimistic because we likely did more controlled kernel calls.
So in principle we could do less. However it's hard to maintain such an
invariant, and it may be broken with more aggressive compilers.
So err on the side of safety and always fill 30.
[dwmw2: Fix comments about nop between calls,
Move #ifdef CONFIG_RETPOLINE to call sites not macro]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/entry/entry_32.S | 17 +++++++++++++++++
arch/x86/entry/entry_64.S | 17 +++++++++++++++++
arch/x86/include/asm/nospec-branch.h | 30 ++++++++++++++++++++++++++++++
3 files changed, 64 insertions(+)
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index cf9ef33d299b..b6b83b9d3a0b 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -250,6 +250,23 @@ ENTRY(__switch_to_asm)
popl %ebx
popl %ebp
+#ifdef CONFIG_RETPOLINE
+ /*
+ * When we switch from a shallower to a deeper call stack
+ * the call stack will underflow in the kernel in the next task.
+ * This could cause the CPU to fall back to indirect branch
+ * prediction, which may be poisoned.
+ *
+ * To guard against that always fill the return stack with
+ * known values.
+ *
+ * We do this in assembler because it needs to be before
+ * any calls on the new stack, and this can be difficult to
+ * ensure in a complex C function like __switch_to.
+ */
+ ALTERNATIVE "jmp __switch_to", "", X86_FEATURE_RETPOLINE
+ FILL_RETURN_BUFFER
+#endif
jmp __switch_to
END(__switch_to_asm)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9bce6ed03353..1622e07c5ae8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -495,6 +495,23 @@ ENTRY(__switch_to_asm)
popq %rbx
popq %rbp
+#ifdef CONFIG_RETPOLINE
+ /*
+ * When we switch from a shallower to a deeper call stack
+ * the call stack will underflow in the kernel in the next task.
+ * This could cause the CPU to fall back to indirect branch
+ * prediction, which may be poisoned.
+ *
+ * To guard against that always fill the return stack with
+ * known values.
+ *
+ * We do this in assembler because it needs to be before
+ * any calls on the new stack, and this can be difficult to
+ * ensure in a complex C function like __switch_to.
+ */
+ ALTERNATIVE "jmp __switch_to", "", X86_FEATURE_RETPOLINE
+ FILL_RETURN_BUFFER
+#endif
jmp __switch_to
END(__switch_to_asm)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index b8c8eeacb4be..3022b1a4de17 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -53,6 +53,36 @@
#endif
.endm
+/*
+ * We use 32-N: 32 is the max return buffer size, but there should
+ * have been at a minimum two controlled calls already: one into the
+ * kernel from entry*.S and another into the function containing this
+ * macro. So N=2, thus 30.
+ */
+#define NUM_BRANCHES_TO_FILL 30
+
+/*
+ * Fill the CPU return stack buffer to prevent indirect branch
+ * prediction on underflow. We need a 'nop' after each call so it
+ * isn't interpreted by the CPU as a simple 'push %eip', which would
+ * be handled specially and not put anything in the RSB.
+ *
+ * Required in various cases for retpoline and IBRS-based mitigations
+ * for Spectre variant 2 vulnerability.
+ */
+.macro FILL_RETURN_BUFFER
+ .rept NUM_BRANCHES_TO_FILL
+ call 1221f
+ nop
+1221:
+ .endr
+#ifdef CONFIG_64BIT
+ addq $8*NUM_BRANCHES_TO_FILL, %rsp
+#else
+ addl $4*NUM_BRANCHES_TO_FILL, %esp
+#endif
+.endm
+
#else /* __ASSEMBLY__ */
#if defined(CONFIG_X86_64) && defined(RETPOLINE)
--
2.14.3
On Mon, Jan 8, 2018 at 3:44 PM, David Woodhouse <[email protected]> wrote:
>
> To guard against this fill the return buffer with controlled
> content during context switch. This prevents any underflows.
Ugh. I really dislike this patch. Everything else in the retpoline
patches makes me go "ok, that's reasonable". This one makes me go
"Eww".
It's hacky, it's ugly, and it looks pretty expensive too.
Is there really nothing more clever we can do?
Linus
On Mon, 2018-01-08 at 15:56 -0800, Linus Torvalds wrote:
> On Mon, Jan 8, 2018 at 3:44 PM, David Woodhouse <[email protected]> wrote:
> >
> > To guard against this fill the return buffer with controlled
> > content during context switch. This prevents any underflows.
>
> Ugh. I really dislike this patch. Everything else in the retpoline
> patches makes me go "ok, that's reasonable". This one makes me go
> "Eww".
>
> It's hacky, it's ugly, and it looks pretty expensive too.
>
> Is there really nothing more clever we can do?
You get this part in the IBRS/microcode solution too. The IBRS MSR
doesn't catch everything; you still need to stuff the RSB in very
similar places (and/or use the IBPB MSR in some).
On Mon, Jan 08, 2018 at 03:56:30PM -0800, Linus Torvalds wrote:
> On Mon, Jan 8, 2018 at 3:44 PM, David Woodhouse <[email protected]> wrote:
> >
> > To guard against this fill the return buffer with controlled
> > content during context switch. This prevents any underflows.
>
> Ugh. I really dislike this patch. Everything else in the retpoline
> patches makes me go "ok, that's reasonable". This one makes me go
> "Eww".
>
> It's hacky, it's ugly, and it looks pretty expensive too.
Modern cores are quite fast at executing calls.
>
> Is there really nothing more clever we can do?
We could be cleverer in selecting how many dummy calls to do.
But that would likely be fragile, hard to maintain,
and more complicated, and I doubt it would buy that much.
Don't really have a better proposal, sorry.
-Andi
On Mon, Jan 8, 2018 at 3:58 PM, Woodhouse, David <[email protected]> wrote:
>>
>> Is there really nothing more clever we can do?
>
> You get this part in the IBRS/microcode solution too. The IBRS MSR
> doesn't catch everything; you still need to stuff the RSB in very
> similar places (and/or use the IBPB MSR in some).
So I was really hoping that in places like context switching etc, we'd
be able to instead effectively kill off any exploits by clearing
registers.
That should make it pretty damn hard to then find a matching "gadget"
that actually does anything interesting/powerful.
Together with Spectre already being pretty hard to take advantage of,
and the eBPF people making those user-provided gadgets inaccessible,
it really should be a pretty powerful fix.
Hmm?
Linus
On Mon, 2018-01-08 at 16:35 -0800, Linus Torvalds wrote:
> On Mon, Jan 8, 2018 at 3:58 PM, Woodhouse, David <[email protected]> wrote:
> >>
> >> Is there really nothing more clever we can do?
> >
> > You get this part in the IBRS/microcode solution too. The IBRS MSR
> > doesn't catch everything; you still need to stuff the RSB in very
> > similar places (and/or use the IBPB MSR in some).
>
> So I was really hoping that in places like context switching etc, we'd
> be able to instead effectively kill off any exploits by clearing
> registers.
>
> That should make it pretty damn hard to then find a matching "gadget"
> that actually does anything interesting/powerful.
>
> Together with Spectre already being pretty hard to take advantage of,
> and the eBPF people making those user-provided gadgets inaccessible,
> it really should be a pretty powerful fix.
Hm... on a context switch you're reloading the registers that were in
the other saved context. If the attacker does something they know will
go into a deep call stack and then sleep, and they pollute the BTB for
every 'ret' instruction in that stack¹, then the registers at the time
a 'ret' goes AWOL are not necessarily something you can 'clear' at the
time of the context switch.
And yes, a lot of this v2 stuff down past the sys_call_table branch
really *is* pretty damn hard already, especially the 'ret' based ones.
¹ perhaps they do the BTB-pollution in their other CPU-hogging thread
that runs on the same CPU when the attack thread is sleeping? I'm not
really trying to actually write an attack here, just thinking out
loud and being uncertain that you've actually proved there *isn't*
one :)
> So I was really hoping that in places like context switching etc, we'd
> be able to instead effectively kill off any exploits by clearing
> registers.
>
> That should make it pretty damn hard to then find a matching "gadget"
> that actually does anything interesting/powerful.
>
> Together with Spectre already being pretty hard to take advantage of,
> and the eBPF people making those user-provided gadgets inaccessible,
> it really should be a pretty powerful fix.
>
> Hmm?
Essentially the RSB are hidden registers, and the only way to clear them
is the FILL_RETURN_BUFFER sequence. I don't see how clearing anything else
would help?
-Andi
On Mon, Jan 8, 2018 at 4:42 PM, David Woodhouse <[email protected]> wrote:
>
> Hm... on a context switch you're reloading the registers that were in
> the other saved context.
Actually, iirc we used to very actively try to minimize that by having
the inline asm mark a lot of registers as clobbered.
We moved away from that and now have that "switch_to_asm()" call
instead, but that was for unrelated reasons.
If I remember our old inline asm, we actually had *very* little real
data that was actually live on context switch, particularly that last
"branch to new EIP" point.
Partly because we had different targets, one of which was that "return
from fork" case.
But maybe I mis-remember. Wouldn't be the first time. This is code I
used to know well, but that was many many moons ago, now there are
other suckers^W maintainers who actually work with it.
Linus
On Mon, 2018-01-08 at 16:48 -0800, Linus Torvalds wrote:
> On Mon, Jan 8, 2018 at 4:42 PM, David Woodhouse <[email protected]> wrote:
> >
> >
> > Hm... on a context switch you're reloading the registers that were in
> > the other saved context.
>
> Actually, iirc we used to very actively try to minimize that by having
> the inline asm mark a lot of registers as clobbered.
Sure, in that case it makes sense purely as a matter of hygiene to
explicitly clear the registers which were marked as clobbered.
In the original patch set that Intel were collecting — before they
threw it all out the fucking window and started again from scratch on
the day the embargo broke — there were patches to do the same thing on
syscall entry too, for precisely the same reason.
On Mon, Jan 8, 2018 at 4:44 PM, Andi Kleen <[email protected]> wrote:
>
> Essentially the RSB are hidden registers, and the only way to clear them
> is the FILL_RETURN_BUFFER sequence. I don't see how clearing anything else
> would help?
Forget theory. Look at practice.
Let's just assume that the attacker can write arbitrarily to the RSB
state. Just accept it.
If you accept that, then you turn the question instead into: are there
things we can do to make that useless to an attacker.
And there the point is that even if you control the RSB contents, you
need to find something relevant to *put* in the RSB. You need to find
the gadget that makes that control of the RSB useful.
That's where clearing the other registers comes in. Particularly for
the shallow cases (maybe the attack involves calling a shallow system
call that goes to the scheduler immediately, like "pause()").
Finding those gadgets is already hard. And then you have to find them
in such a way that you can control some state that you can start
reading out arbitrary memory. So you need to not just find the gadget,
you need to pass in an interesting pointer to it.
Preferably a pointer you control easily, like one of the registers
that nobody used on the way from the attacking user space to the
scheduler() call. That's already going to be damn hard, but with the C
compiler saving many registers by default, it might not be impossible.
And THAT is where "clear the registers" comes in. It adds _another_
huge barrier to an attack that was already pretty hard to begin with.
If we clear the registers, what the hell are you going to put in the
RSB that helps you?
I really think that people need to think about the actual _practical_
side here. We're never ever going to solve all theoretical timing
attacks. But when it comes to something like Spectre, it's already
very much non-trivial to attack. When then looking at something like
"a few call chains deep in the scheduler", it got harder still. If we
clear registers and don't give the attacker a way to leak data, that's
yet another big barrier.
So instead of saying "we have to flush the return stack", I'm saying
that we should look at things that make flushing the return stack
_unnecessary_, simply because even if the attacker were to control it
entirely, they'd still be up shit creek without a paddle.
Linus
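Concretely, the register clearing being argued for is a handful of xors at the entry or switch point, along these lines (a hypothetical sketch rather than a patch from this series; which registers are actually dead at a given point depends on the calling convention there):

	/* Zero the registers that are not live here, so a speculatively
	 * executed gadget finds no attacker-supplied pointer or constant
	 * to work with.  32-bit xor also clears the upper halves of the
	 * 64-bit registers. */
	xorl	%edx, %edx
	xorl	%ecx, %ecx
	xorl	%esi, %esi
	xorl	%edi, %edi
	xorl	%r8d, %r8d
	xorl	%r9d, %r9d
	xorl	%r10d, %r10d
	xorl	%r11d, %r11d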
> If we clear the registers, what the hell are you going to put in the
> RSB that helps you?
RSB allows you to control chains of gadgets.
You can likely find some chain of gadgets that sets up constants in registers in a
lot of useful ways. Perhaps not in every way (so it may be hard to scan through all of
memory), but it's likely you could find gadgets that result in a lot of useful
direct-mapped addresses, which the next gadget can then reference.
RAX especially is quite vulnerable to this, because there will be a lot
of code that does "modify RAX in interesting ways ; RET".
> So instead of saying "we have to flush the return stack", I'm saying
> that we should look at things that make flushing the return stack
> _unnecessary_, simply because even if the attacker were to control it
> entirely, they'd still be up shit creek without a paddle.
I agree that clearing registers is useful (was just hacking on that patch).
-Andi
On 09/01/2018 00:58, Linus Torvalds wrote:
> On Mon, Jan 8, 2018 at 4:44 PM, Andi Kleen <[email protected]> wrote:
>> Essentially the RSB are hidden registers, and the only way to clear them
>> is the FILL_RETURN_BUFFER sequence. I don't see how clearing anything else
>> would help?
> Forget theory. Look at practice.
>
> Let's just assume that the attacker can write arbitrarily to the RSB
> state. Just accept it.
>
> If you accept that, then you turn the question instead into: are there
> things we can do to make that useless to an attacker.
>
> And there the point is that even if you control the RSB contents, you
> need to find something relevant to *put* in the RSB. You need to find
> the gadget that makes that control of the RSB useful.
This is where the problem lies. An attacker with arbitrary control of
the RSB can redirect speculation arbitrarily. (And I agree, this is a
good default assumption to take).
If SMEP is not active, speculation can go anywhere, including to a user
controlled gadget which can reload any registers it needs, including
with immediate constants.
If SMEP is active, the attackers control of speculation is restricted to
supervisor executable mappings.
The real question is whether it is worth special casing the SMEP-active
case given that for the SMEP-inactive case, your only viable option is
to refill the RSB and discard any potentially poisoned mappings.
~Andrew
On Mon, Jan 08, 2018 at 05:16:02PM -0800, Andi Kleen wrote:
> > If we clear the registers, what the hell are you going to put in the
> > RSB that helps you?
>
> RSB allows you to control chains of gadgets.
I admit the gadget thing is a bit obscure.
There's another case we were actually more worried about:
On Skylake and Broadwell when the RSB underflows it will fall back to the
indirect branch predictor, which can be poisoned and we try to avoid
using with retpoline. So we try to avoid underflows, and this filling
helps us with that.
Does that make more sense?
-Andi
On Mon, 2018-01-08 at 17:21 -0800, Andi Kleen wrote:
> On Mon, Jan 08, 2018 at 05:16:02PM -0800, Andi Kleen wrote:
> > > If we clear the registers, what the hell are you going to put in the
> > > RSB that helps you?
> >
> > RSB allows you to control chains of gadgets.
>
> I admit the gadget thing is a bit obscure.
>
> There's another case we were actually more worried about:
>
> On Skylake and Broadwell when the RSB underflows it will fall back to the
> indirect branch predictor, which can be poisoned and we try to avoid
> using with retpoline. So we try to avoid underflows, and this filling
> helps us with that.
That's no longer true for Broadwell with the latest microcode, right?
And is why we *were* talking about using retpoline only for pre-
Skylake, and using IBRS on Skylake onwards (where IBRS isn't quite so
horribly slow anyway).
> > On Skylake and Broadwell when the RSB underflows it will fall back to the
> > indirect branch predictor, which can be poisoned and we try to avoid
> > using with retpoline. So we try to avoid underflows, and this filling
> > helps us with that.
>
> That's no longer true for Broadwell with the latest microcode, right?
That's right.
-Andi
On Mon, Jan 8, 2018 at 5:21 PM, Andi Kleen <[email protected]> wrote:
> On Mon, Jan 08, 2018 at 05:16:02PM -0800, Andi Kleen wrote:
>> > If we clear the registers, what the hell are you going to put in the
>> > RSB that helps you?
>>
>> RSB allows you to control chains of gadgets.
>
> I admit the gadget thing is a bit obscure.
>
> There's another case we were actually more worried about:
>
> On Skylake and Broadwell when the RSB underflows it will fall back to the
(Broadwell without Microcode)
> indirect branch predictor, which can be poisoned and we try to avoid
> using with retpoline. So we try to avoid underflows, and this filling
> helps us with that.
>
> Does that make more sense?
The majority of the confusion does not stem from this being complicated.
It's that there's been great reluctance to document the details which
are different -- or changed by the microcode -- even at a high level
such as this.
Because of this, some of the details instead include vague "things are
different on Skylake" notes, with no exposition.
>
> -Andi
> On Jan 8, 2018, at 5:15 PM, Andrew Cooper <[email protected]> wrote:
>
>> On 09/01/2018 00:58, Linus Torvalds wrote:
>>> On Mon, Jan 8, 2018 at 4:44 PM, Andi Kleen <[email protected]> wrote:
>>> Essentially the RSB are hidden registers, and the only way to clear them
>>> is the FILL_RETURN_BUFFER sequence. I don't see how clearing anything else
>>> would help?
>> Forget theory. Look at practice.
>>
>> Let's just assume that the attacker can write arbitrarily to the RSB
>> state. Just accept it.
>>
>> If you accept that, then you turn the question instead into: are there
>> things we can do to make that useless to an attacker.
>>
>> And there the point is that even if you control the RSB contents, you
>> need to find something relevant to *put* in the RSB. You need to find
>> the gadget that makes that control of the RSB useful.
>
> This is where the problem lies. An attacker with arbitrary control of
> the RSB can redirect speculation arbitrarily. (And I agree, this is a
> good default assumption to take).
>
> If SMEP is not active, speculation can go anywhere, including to a user
> controlled gadget which can reload any registers it needs, including
> with immediate constants.
I thought that, even on pre-SMEP hardware, the CPU wouldn't speculatively execute from NX pages. And PTI marks user memory NX in kernel mode.
>
> If SMEP is active, the attackers control of speculation is restricted to
> supervisor executable mappings.
>
> The real question is whether it is worth special casing the SMEP-active
> case given that for the SMEP-inactive case, your only viable option is
> to refill the RSB and discard any potentially poisoned mappings.
>
> ~Andrew
On Mon, Jan 08, 2018 at 02:46:32PM +0100, Thomas Gleixner wrote:
> On Mon, 8 Jan 2018, Josh Poimboeuf wrote:
> > I wonder if an error might be more appropriate than a warning. I
> > learned from experience that a lot of people don't see these Makefile
> > warnings, and this would be a dangerous one to miss.
> >
> > Also if this were an error, you could get rid of the RETPOLINE define,
> > and that would be one less define cluttering up the already way-too-long
> > GCC arg list.
>
> It still allows to get the ASM part covered. If that's worth it I can't tell.
So elsewhere you stated we're dropping support for GCC without asm-goto
(<4.5), does it then make sense to make one more step and mandate a
retpoline capable compiler, which would put us at >=4.9 (for x86).
That would get rid of this weird case as well.
On Mon, 2018-01-08 at 19:27 -0800, Andy Lutomirski wrote:
> >
> > If SMEP is not active, speculation can go anywhere, including to a user
> > controlled gadget which can reload any registers it needs, including
> > with immediate constants.
>
> I thought that, even on pre-SMEP hardware, the CPU wouldn't
> speculatively execute from NX pages. And PTI marks user memory NX
> in kernel mode.
Hm, now that could be useful.
Do *all* the KPTI backports (some of which are reimplementations rather
than strictly backports) mark user memory NX?
On Tue, Jan 09, 2018 at 01:04:20PM +0000, David Woodhouse wrote:
> On Mon, 2018-01-08 at 19:27 -0800, Andy Lutomirski wrote:
> > >
> > > If SMEP is not active, speculation can go anywhere, including to a user
> > > controlled gadget which can reload any registers it needs, including
> > > with immediate constants.
> >
> > I thought that, even on pre-SMEP hardware, the CPU wouldn't
> > speculatively execute from NX pages. And PTI marks user memory NX
> > in kernel mode.
>
> Hm, now that could be useful.
>
> Do *all* the KPTI backports (some of which are reimplementations rather
> than strictly backports) mark user memory NX?
That doesn't really matter for upstream though; but see here:
https://lkml.kernel.org/r/[email protected]
On Tue, 9 Jan 2018, Peter Zijlstra wrote:
> On Mon, Jan 08, 2018 at 02:46:32PM +0100, Thomas Gleixner wrote:
> > On Mon, 8 Jan 2018, Josh Poimboeuf wrote:
>
> > > I wonder if an error might be more appropriate than a warning. I
> > > learned from experience that a lot of people don't see these Makefile
> > > warnings, and this would be a dangerous one to miss.
> > >
> > > Also if this were an error, you could get rid of the RETPOLINE define,
> > > and that would be one less define cluttering up the already way-too-long
> > > GCC arg list.
> >
> > It still allows to get the ASM part covered. If that's worth it I can't tell.
>
> So elsewhere you stated we're dropping support for GCC without asm-goto
> (<4.5), does it then make sense to make one more step and mandate a
> retpoline capable compiler, which would put us at >=4.9 (for x86).
>
> That would get rid of this weird case as well.
I agree in principle, though the difference is that retpoline-capable compilers
are not available today, while gcc versions with asm goto are.
The reasoning for the minimal approach was to cover at least the obvious easy
targets, e.g. sys_call_table, as the deeper ones are harder.
Thanks,
tglx
On Tue, 2018-01-09 at 13:36 +0100, Peter Zijlstra wrote:
> On Mon, Jan 08, 2018 at 02:46:32PM +0100, Thomas Gleixner wrote:
> > On Mon, 8 Jan 2018, Josh Poimboeuf wrote:
>
> > > I wonder if an error might be more appropriate than a warning. I
> > > learned from experience that a lot of people don't see these Makefile
> > > warnings, and this would be a dangerous one to miss.
> > >
> > > Also if this were an error, you could get rid of the RETPOLINE define,
> > > and that would be one less define cluttering up the already way-too-long
> > > GCC arg list.
> >
> > It still allows to get the ASM part covered. If that's worth it I can't tell.
>
> So elsewhere you stated we're dropping support for GCC without asm-goto
> (<4.5), does it then make sense to make one more step and mandate a
> retpoline capable compiler, which would put us at >=4.9 (for x86).
>
> That would get rid of this weird case as well.
Yeah... I don't have strong feelings there.
Arjan (IIRC) had asked me to keep it this way.
The idea was that those were the *easy* targets for an attacker to
find; especially in entry_64.S it's asm all the way to the indirect
branch. A rootkit might be targeted at entirely unpatched systems which
leave that vulnerable, and *even* though there are other targets which
could be found with more work, doing just the asm code might well end
up protecting from such an attack in practice.
The CONFIG_RETPOLINE/!RETPOLINE case isn't really *that* much
complexity; I don't really care about that either.
On the whole, I'm inclined to leave it as it is without further
bikeshedding for now. We can change it later once all those GCC
releases have been made with the backports, perhaps.
On Tue, Jan 9, 2018 at 5:04 AM, David Woodhouse <[email protected]> wrote:
> On Mon, 2018-01-08 at 19:27 -0800, Andy Lutomirski wrote:
>> >
>> > If SMEP is not active, speculation can go anywhere, including to a user
>> > controlled gadget which can reload any registers it needs, including
>> > with immediate constants.
>>
>> I thought that, even on pre-SMEP hardware, the CPU wouldn't
>> speculatively execute from NX pages. And PTI marks user memory NX
>> in kernel mode.
>
> Hm, now that could be useful.
>
> Do *all* the KPTI backports (some of which are reimplementations rather
> than strictly backports) mark user memory NX?
Yup. The KAISERish ports (4.9 and 4.4) have the same feature.
-Kees
--
Kees Cook
Pixel Security
On Tue, Jan 9, 2018 at 5:04 AM, David Woodhouse <[email protected]> wrote:
> On Mon, 2018-01-08 at 19:27 -0800, Andy Lutomirski wrote:
>>
>> I thought that, even on pre-SMEP hardware, the CPU wouldn't
>> speculatively execute from NX pages. And PTI marks user memory NX
>> in kernel mode.
>
> Hm, now that could be useful.
>
> Do *all* the KPTI backports (some of which are reimplementations rather
> than strictly backports) mark user memory NX?
As Kees said - yes. We're looking at patches to mark some processes
trusted and turn off KPTI for those, and then they'll obviously not
have that, but the whole point is that you'd trust them.
Also notice that the fundamental statement of "If SMEP is not active,
speculation can go anywhere" is not necessarily true either.
One of the reasons why the BTB can be poisoned from user space in the
first place is because the BTB doesn't have the full range of virtual
address bits for BTB lookup.
But at least on Intel, I think they don't have the full range of
virtual address bits for the _result_ either. The upper bits are
simply taken from the target. So the prediction only looks at - and
only affects - the lower bits of the indirect branch.
Which means that the whole "speculation can go anywhere" is garbage,
at least in general. You can't force the kernel to speculatively jump
to a user space address, because even if you have complete control of
the BTB, you only end up controlling the low bits of the predicted
address, because those are the only ones that exist in the BTB.
That all is obviously very implementation-dependent. You could have a
BTB that uses a small set of bits for the index checking, but the full
set of bits for target.
But since the problem (for the kernel) mostly arises from exactly the
fact that the BTB doesn't have the full bits for indexing, I suspect
the "not full bits in the target" is common too.
Linus
On Mon, 2018-01-08 at 02:42 -0800, Paul Turner wrote:
>
> While the cases above involve the crafting and use of poisoned
> entries. Recall also that one of the initial conditions was that we
> should avoid RSB underflow as some CPUs may try to use other indirect
> predictors when this occurs.
I think we should start by deliberately ignoring the CPUs which use the
other indirect predictors on RSB underflow. Those CPUs don't perform
*quite* so badly with IBRS anyway.
Let's get the minimum amount of RSB handling in to cope with the pre-
SKL CPUs, and then see if we really do want to extend it to make SKL
100% secure in retpoline mode or not.
So let's go through your list of cases and attempt to distinguish the
underflow concerns (which I declare we don't care about for now) from
the pollution (which we care about especially for non-SMEP) concerns...
> The cases we care about here are:
> - When we return _into_ protected execution. For the kernel, this
> means when we exit interrupt context into kernel context, since we may
> have emptied or reduced the number of RSB entries while in interrupt
> context.
Don't care about that particular example. That's underflow-only.
However, we *do* care about entry to kernel code from userspace, for
interrupts and system calls etc. Basically everywhere that the IBRS
code would be setting IBRS, we need to flush the RSB (if !SMEP, I
think).
> - Context switch (even if we are returning to user code, we need to
> at least unwind the scheduler/triggering frames that preempted it
> previously, considering that detail, this is a subset of the above,
> but listed for completeness)
Don't care. This is underflow-only. (Which means I think we want to
drop Andi's patch?)
> - On VMEXIT (it turns out we need to worry about both poisoned
> entries, and no entries, the solution is a single refill
> nonetheless).
Do care. This fixes pollution from the guest, and even SMEP isn't
enough to make us not care.
> - Leaving deeper (>C1) c-states, which may have flushed hardware
> state
Don't care.
> - Where we are unwinding call-chains of >16 entries[*]
Don't care.
Overall, I think the RSB-stuffing is needed in all the same places that
it's needed with IBRS.
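For the one case above that clearly matters, the fill would sit at the very top of the vmexit path, before any host 'ret' can consume a guest-planted entry. A sketch, reusing the FILL_RETURN_BUFFER macro and the jump-over pattern from the context-switch patch earlier in the thread (placement and feature gating are illustrative, not the actual KVM change):

#ifdef CONFIG_RETPOLINE
	/* First thing after VMEXIT, before any function return in host
	 * context: overwrite whatever RSB entries the guest left behind
	 * with benign, trap-protected ones. */
	ALTERNATIVE "jmp 910f", "", X86_FEATURE_RETPOLINE
	FILL_RETURN_BUFFER
910:
#endif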
* Woodhouse, David ([email protected]) wrote:
> On Mon, 2018-01-08 at 02:42 -0800, Paul Turner wrote:
> >
> > While the cases above involve the crafting and use of poisoned
> > entries. Recall also that one of the initial conditions was that we
> > should avoid RSB underflow as some CPUs may try to use other indirect
> > predictors when this occurs.
>
> I think we should start by deliberately ignoring the CPUs which use the
> other indirect predictors on RSB underflow. Those CPUs don't perform
> *quite* so badly with IBRS anyway.
>
> Let's get the minimum amount of RSB handling in to cope with the pre-
> SKL CPUs, and then see if we really do want to extend it to make SKL
> 100% secure in retpoline mode or not.
How do you make decisions on which CPU you're running on?
I'm worried about the case of a VM that starts off on an older host
and then gets live migrated to a new Skylake.
For Intel CPUs we've historically been safe to live migrate
to any newer host based on having all the features that the old one had;
with the guest still seeing the flags etc for the old CPU.
Dave
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK
On Tue 2018-01-09 13:04:20, David Woodhouse wrote:
> On Mon, 2018-01-08 at 19:27 -0800, Andy Lutomirski wrote:
> > >
> > > If SMEP is not active, speculation can go anywhere, including to a user
> > > controlled gadget which can reload any registers it needs, including
> > > with immediate constants.
> >
> > I thought that, even on pre-SMEP hardware, the CPU wouldn't
> > speculatively execute from NX pages. And PTI marks user memory NX
> > in kernel mode.
>
> Hm, now that could be useful.
>
> Do *all* the KPTI backports (some of which are reimplementations rather
> than strictly backports) mark user memory NX?
Hmm. We'd still want to do something on 32-bit, and those might not
even have NX support in hardware.
Pentium 4 (and such) is probably advanced enough to be vulnerable to
Spectre, but not new enough to support NX...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html