2018-01-05 02:01:28

by Woodhouse, David

Subject: [PATCH v4 00/13] Retpoline: Avoid speculative indirect calls in kernel

This is a fix for the 'variant 2' attack described in
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

With the GCC patches available from the gcc-7_2_0-retpoline branch of
http://git.infradead.org/users/dwmw2/gcc-retpoline.git, plus manual
patching of assembler code, all indirect branches that occur after
userspace first runs are eliminated from the kernel.

They are replaced with a 'retpoline' call sequence which deliberately
prevents speculation.
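
For reference, the heart of the sequence looks like this (an illustrative
sketch of the 64-bit %rax thunk, shown without the ALTERNATIVE runtime
patching that the actual retpoline.S in this series adds):

    __x86.indirect_thunk.rax:
            call 2f                 # push the address of 1: and jump to 2:
    1:      lfence                  # speculation trap: a predicted return
            jmp 1b                  #  lands here and spins harmlessly
    2:      mov %rax, (%rsp)        # overwrite the pushed return address
                                    #  with the real branch target
            ret                     # architecturally "returns" to *%rax

The return predictor sends any speculative execution into the lfence/jmp
loop at 1:, while the architectural path ends up at the original branch
target.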

Now that the thunks are exported, we need to fix MODVERSIONS support,
because genksyms can't generate the CRCs for the symbols. Still working
on that...

v1: Initial post.
v2: Add CONFIG_RETPOLINE to build kernel without it.
    Change warning messages.
    Hide modpost warning message.
v3: Update to the latest CET-capable retpoline version.
    Reinstate ALTERNATIVE support.
v4: Finish reconciling Andi's and my patch sets, bug fixes.
    Exclude objtool support for now.
    Add 'noretpoline' boot option.
    Add AMD retpoline alternative.

Andi Kleen (4):
x86/retpoline/irq32: Convert assembler indirect jumps
retpoline/taint: Taint kernel for missing retpoline in compiler
x86/retpoline: Add boot time option to disable retpoline
x86/retpoline: Exclude objtool with retpoline

David Woodhouse (9):
x86/retpoline: Add initial retpoline support
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/alternatives: Add missing \n at end of ALTERNATIVE inline asm
x86/retpoline: Simplify AMD variant of retpoline thunk

Documentation/admin-guide/kernel-parameters.txt | 3 ++
Documentation/admin-guide/tainted-kernels.rst | 3 ++
arch/x86/Kconfig | 17 +++++++-
arch/x86/Kconfig.debug | 6 +--
arch/x86/Makefile | 10 +++++
arch/x86/crypto/aesni-intel_asm.S | 5 ++-
arch/x86/crypto/camellia-aesni-avx-asm_64.S | 3 +-
arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 3 +-
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 4 +-
arch/x86/entry/entry_32.S | 5 ++-
arch/x86/entry/entry_64.S | 22 ++++++++--
arch/x86/include/asm/alternative.h | 4 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/mshyperv.h | 18 ++++----
arch/x86/include/asm/nospec-branch.h | 58 +++++++++++++++++++++++++
arch/x86/include/asm/xen/hypercall.h | 5 ++-
arch/x86/kernel/cpu/intel.c | 10 +++++
arch/x86/kernel/ftrace_32.S | 6 ++-
arch/x86/kernel/ftrace_64.S | 8 ++--
arch/x86/kernel/irq_32.c | 9 ++--
arch/x86/kernel/setup.c | 6 +++
arch/x86/lib/Makefile | 1 +
arch/x86/lib/checksum_32.S | 7 +--
arch/x86/lib/retpoline.S | 53 ++++++++++++++++++++++
include/linux/kernel.h | 4 +-
kernel/module.c | 11 ++++-
kernel/panic.c | 1 +
scripts/mod/modpost.c | 9 ++++
28 files changed, 250 insertions(+), 42 deletions(-)
create mode 100644 arch/x86/include/asm/nospec-branch.h
create mode 100644 arch/x86/lib/retpoline.S

--
2.7.4


2018-01-05 02:01:36

by Woodhouse, David

Subject: [PATCH v4 01/13] x86/retpoline: Add initial retpoline support

Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
the corresponding thunks. Provide assembler macros for invoking the thunks
in the same way that GCC does, from native and inline assembler.

This adds an X86_BUG_NO_RETPOLINE "feature" to allow the thunks to be
patched out at runtime. This is a placeholder for now; the patches which
support the new Intel/AMD microcode features will flesh out the precise
conditions under which we disable the retpoline and do other things
instead.
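
To illustrate, with these options the compiler stops emitting the
indirect branch itself and calls the matching external thunk instead
(a sketch; the thunk symbols are the ones added by this patch):

    # without retpoline: a conventional indirect call
            call    *%rax

    # with -mindirect-branch=thunk-extern -mindirect-branch-register
            call    __x86.indirect_thunk.rax

-mindirect-branch-register forces the branch target into a register
first, so one thunk per register suffices for every call site.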

[Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]

Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/Kconfig | 13 ++++++++
arch/x86/Makefile | 10 +++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/nospec-branch.h | 58 ++++++++++++++++++++++++++++++++++++
arch/x86/lib/Makefile | 1 +
arch/x86/lib/retpoline.S | 53 ++++++++++++++++++++++++++++++++
6 files changed, 136 insertions(+)
create mode 100644 arch/x86/include/asm/nospec-branch.h
create mode 100644 arch/x86/lib/retpoline.S

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d4fc98c..1009d1a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -429,6 +429,19 @@ config GOLDFISH
def_bool y
depends on X86_GOLDFISH

+config RETPOLINE
+ bool "Avoid speculative indirect branches in kernel"
+ default y
+ help
+ Compile kernel with the retpoline compiler options to guard against
+ kernel to user data leaks by avoiding speculative indirect
+ branches. Requires a compiler with -mindirect-branch=thunk-extern
+ support for full protection. The kernel may run slower.
+
+ Without compiler support, at least indirect branches in assembler
+ code are eliminated. Since this includes the syscall entry path,
+ it is not entirely pointless.
+
config INTEL_RDT
bool "Intel Resource Director Technology support"
default n
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 3e73bc2..8fc45ec 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -230,6 +230,16 @@ KBUILD_CFLAGS += -Wno-sign-compare
#
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables

+# Avoid indirect branches in kernel to deal with Spectre
+ifdef CONFIG_RETPOLINE
+ RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register)
+ ifneq ($(RETPOLINE_CFLAGS),)
+ KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
+ else
+ $(warning Retpoline not supported in compiler. System may be insecure.)
+ endif
+endif
+
archscripts: scripts_basic
$(Q)$(MAKE) $(build)=arch/x86/tools relocs

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 07cdd17..900fa70 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -342,5 +342,6 @@
#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */
#define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */
#define X86_BUG_CPU_INSECURE X86_BUG(14) /* CPU is insecure and needs kernel page table isolation */
+#define X86_BUG_NO_RETPOLINE X86_BUG(15) /* Placeholder: disable retpoline branch thunks */

#endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
new file mode 100644
index 0000000..573e9a6
--- /dev/null
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __NOSPEC_BRANCH_H__
+#define __NOSPEC_BRANCH_H__
+
+#include <asm/alternative.h>
+#include <asm/alternative-asm.h>
+#include <asm/cpufeatures.h>
+
+#ifdef __ASSEMBLY__
+/*
+ * The asm code uses CONFIG_RETPOLINE; this part will happen even if
+ * the toolchain isn't retpoline-capable.
+ */
+.macro NOSPEC_JMP reg:req
+#ifdef CONFIG_RETPOLINE
+ ALTERNATIVE __stringify(jmp __x86.indirect_thunk.\reg), __stringify(jmp *%\reg), X86_BUG_NO_RETPOLINE
+#else
+ jmp *%\reg
+#endif
+.endm
+
+.macro NOSPEC_CALL reg:req
+#ifdef CONFIG_RETPOLINE
+ ALTERNATIVE __stringify(call __x86.indirect_thunk.\reg), __stringify(call *%\reg), X86_BUG_NO_RETPOLINE
+#else
+ call *%\reg
+#endif
+.endm
+
+#else /* __ASSEMBLY__ */
+
+/*
+ * Since the inline asm uses the %V modifier which is only in newer
+ * toolchains, this is dependent on RETPOLINE not CONFIG_RETPOLINE.
+ */
+#ifdef RETPOLINE
+# ifdef CONFIG_64BIT
+# define NOSPEC_CALL ALTERNATIVE( \
+ "call __x86.indirect_thunk.%V[thunk_target]\n", \
+ "call *%[thunk_target]\n", X86_BUG_NO_RETPOLINE)
+# define THUNK_TARGET(addr) [thunk_target] "r" (addr)
+# else /* i386 is going to run out of registers if we do that */
+# define NOSPEC_CALL ALTERNATIVE( \
+ " jmp 1221f; " \
+ "1222: push %[thunk_target];" \
+ " jmp __x86.indirect_thunk;" \
+ "1221: call 1222b;\n", \
+ "call *%[thunk_target]\n", X86_BUG_NO_RETPOLINE)
+# define THUNK_TARGET(addr) [thunk_target] "rm" (addr)
+# endif /* !CONFIG_64BIT */
+#else
+# define NOSPEC_CALL "call *%[thunk_target]\n"
+# define THUNK_TARGET(addr) [thunk_target] "rm" (addr)
+#endif
+
+#endif /* __ASSEMBLY__ */
+#endif /* __NOSPEC_BRANCH_H__ */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 7b181b6..f23934b 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -26,6 +26,7 @@ lib-y += memcpy_$(BITS).o
lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
+lib-$(CONFIG_RETPOLINE) += retpoline.o

obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o

diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
new file mode 100644
index 0000000..958d56f
--- /dev/null
+++ b/arch/x86/lib/retpoline.S
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/stringify.h>
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+#include <asm/cpufeatures.h>
+#include <asm/alternative-asm.h>
+#include <asm/export.h>
+
+.macro THUNK sp reg
+ .section .text.__x86.indirect_thunk.\reg
+
+ENTRY(__x86.indirect_thunk.\reg)
+ CFI_STARTPROC
+ ALTERNATIVE "call 2f", __stringify(jmp *%\reg), X86_BUG_NO_RETPOLINE
+1:
+ lfence
+ jmp 1b
+2:
+ mov %\reg, (%\sp)
+ ret
+ CFI_ENDPROC
+ENDPROC(__x86.indirect_thunk.\reg)
+EXPORT_SYMBOL(__x86.indirect_thunk.\reg)
+.endm
+
+#ifdef CONFIG_64BIT
+.irp reg rax rbx rcx rdx rsi rdi rbp r8 r9 r10 r11 r12 r13 r14 r15
+ THUNK rsp \reg
+.endr
+#else
+.irp reg eax ebx ecx edx esi edi ebp
+ THUNK esp \reg
+.endr
+
+/*
+ * Also provide the original ret-equivalent retpoline for i386 because it's
+ * so register-starved, and we don't care about CET compatibility here.
+ */
+ENTRY(__x86.indirect_thunk)
+ CFI_STARTPROC
+ ALTERNATIVE "call 2f", "ret", X86_BUG_NO_RETPOLINE
+1:
+ lfence
+ jmp 1b
+2:
+ lea 4(%esp), %esp
+ ret
+ CFI_ENDPROC
+ENDPROC(__x86.indirect_thunk)
+EXPORT_SYMBOL(__x86.indirect_thunk)
+
+#endif
--
2.7.4

2018-01-05 02:01:47

by Woodhouse, David

Subject: [PATCH v4 04/13] x86/retpoline/ftrace: Convert ftrace assembler indirect jumps

Convert all indirect jumps in ftrace assembler code to use non-speculative
sequences when CONFIG_RETPOLINE is enabled.

Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/kernel/ftrace_32.S | 6 ++++--
arch/x86/kernel/ftrace_64.S | 8 ++++----
2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/ftrace_32.S b/arch/x86/kernel/ftrace_32.S
index b6c6468..eb9a56f 100644
--- a/arch/x86/kernel/ftrace_32.S
+++ b/arch/x86/kernel/ftrace_32.S
@@ -8,6 +8,7 @@
#include <asm/segment.h>
#include <asm/export.h>
#include <asm/ftrace.h>
+#include <asm/nospec-branch.h>

#ifdef CC_USING_FENTRY
# define function_hook __fentry__
@@ -197,7 +198,8 @@ ftrace_stub:
movl 0x4(%ebp), %edx
subl $MCOUNT_INSN_SIZE, %eax

- call *ftrace_trace_function
+ movl ftrace_trace_function, %ecx
+ NOSPEC_CALL ecx

popl %edx
popl %ecx
@@ -241,5 +243,5 @@ return_to_handler:
movl %eax, %ecx
popl %edx
popl %eax
- jmp *%ecx
+ NOSPEC_JMP ecx
#endif
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index c832291..d4611d8 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -7,7 +7,7 @@
#include <asm/ptrace.h>
#include <asm/ftrace.h>
#include <asm/export.h>
-
+#include <asm/nospec-branch.h>

.code64
.section .entry.text, "ax"
@@ -286,8 +286,8 @@ trace:
* ip and parent ip are used and the list function is called when
* function tracing is enabled.
*/
- call *ftrace_trace_function
-
+ movq ftrace_trace_function, %r8
+ NOSPEC_CALL r8
restore_mcount_regs

jmp fgraph_trace
@@ -329,5 +329,5 @@ GLOBAL(return_to_handler)
movq 8(%rsp), %rdx
movq (%rsp), %rax
addq $24, %rsp
- jmp *%rdi
+ NOSPEC_JMP rdi
#endif
--
2.7.4

2018-01-05 02:02:06

by Woodhouse, David

Subject: [PATCH v4 11/13] x86/retpoline: Simplify AMD variant of retpoline thunk

On AMD (matched here by X86_FEATURE_K8), an lfence before the indirect
jump is sufficient.

Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/lib/retpoline.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 958d56f..d58f8f3 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -12,7 +12,7 @@

ENTRY(__x86.indirect_thunk.\reg)
CFI_STARTPROC
- ALTERNATIVE "call 2f", __stringify(jmp *%\reg), X86_BUG_NO_RETPOLINE
+ ALTERNATIVE_2 "call 2f", __stringify(lfence;jmp *%\reg), X86_FEATURE_K8, __stringify(jmp *%\reg), X86_BUG_NO_RETPOLINE
1:
lfence
jmp 1b
--
2.7.4

2018-01-05 02:02:11

by Woodhouse, David

Subject: [PATCH v4 13/13] x86/retpoline: Exclude objtool with retpoline

From: Andi Kleen <[email protected]>

objtool's assembler nanny currently cannot deal with the code generated
by the retpoline compiler and throws hundreds of warnings, mostly
because it sees calls that don't have a symbolic target.

Exclude all the options that rely on objtool when RETPOLINE is active.

This mainly means that we use the frame-pointer unwinder and live
patching is not supported.

Eventually objtool can be fixed to handle this.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/Kconfig | 4 ++--
arch/x86/Kconfig.debug | 6 +++---
2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1009d1a..439633a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -172,8 +172,8 @@ config X86
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
- select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION
- select HAVE_STACK_VALIDATION if X86_64
+ select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION && !RETPOLINE
+ select HAVE_STACK_VALIDATION if X86_64 && !RETPOLINE
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_USER_RETURN_NOTIFIER
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 672441c..f169d88 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -359,8 +359,8 @@ config PUNIT_ATOM_DEBUG

choice
prompt "Choose kernel unwinder"
- default UNWINDER_ORC if X86_64
- default UNWINDER_FRAME_POINTER if X86_32
+ default UNWINDER_ORC if X86_64 && !RETPOLINE
+ default UNWINDER_FRAME_POINTER if X86_32 || RETPOLINE
---help---
This determines which method will be used for unwinding kernel stack
traces for panics, oopses, bugs, warnings, perf, /proc/<pid>/stack,
@@ -368,7 +368,7 @@ choice

config UNWINDER_ORC
bool "ORC unwinder"
- depends on X86_64
+ depends on X86_64 && !RETPOLINE
select STACK_VALIDATION
---help---
This option enables the ORC (Oops Rewind Capability) unwinder for
--
2.7.4

2018-01-05 02:02:33

by Woodhouse, David

Subject: [PATCH v4 10/13] retpoline/taint: Taint kernel for missing retpoline in compiler

From: Andi Kleen <[email protected]>

When the kernel or a module hasn't been compiled with a retpoline-aware
compiler, print a warning and set a taint flag.

For modules this is checked at build time; the check cannot, however,
cover assembler files or other non-compiled objects linked into the
module.

For lack of a better letter, it uses taint flag 'Z'.
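
Concretely, modpost makes each module's generated *.mod.c carry the tag
(a sketch of the lines emitted by add_retpoline() in the diff below):

    #ifdef RETPOLINE
    MODULE_INFO(retpoline, "Y");
    #endif

A module built by a non-retpoline compiler won't define RETPOLINE, so
the tag is absent and the loader's get_modinfo() check taints the
kernel.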

v2: Change warning message
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
Documentation/admin-guide/tainted-kernels.rst | 3 +++
arch/x86/kernel/setup.c | 6 ++++++
include/linux/kernel.h | 4 +++-
kernel/module.c | 11 ++++++++++-
kernel/panic.c | 1 +
scripts/mod/modpost.c | 9 +++++++++
6 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index 1df03b5..800261b 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -52,6 +52,9 @@ characters, each representing a particular tainted value.

16) ``K`` if the kernel has been live patched.

+ 17) ``Z`` if the x86 kernel or a module hasn't been compiled with
+ a retpoline aware compiler and may be vulnerable to data leaks.
+
The primary reason for the **'Tainted: '** string is to tell kernel
debuggers if this is a clean kernel or if anything unusual has
occurred. Tainting is permanent: even if an offending module is
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 8af2e8d..cc880b4 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1296,6 +1296,12 @@ void __init setup_arch(char **cmdline_p)
#endif

unwind_init();
+
+#ifndef RETPOLINE
+ add_taint(TAINT_NO_RETPOLINE, LOCKDEP_STILL_OK);
+ pr_warn("No support for retpoline in kernel compiler\n");
+ pr_warn("System may be vulnerable to data leaks.\n");
+#endif
}

#ifdef CONFIG_X86_32
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index ce51455..fbb4d3b 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -550,7 +550,9 @@ extern enum system_states {
#define TAINT_SOFTLOCKUP 14
#define TAINT_LIVEPATCH 15
#define TAINT_AUX 16
-#define TAINT_FLAGS_COUNT 17
+#define TAINT_NO_RETPOLINE 17
+
+#define TAINT_FLAGS_COUNT 18

struct taint_flag {
char c_true; /* character printed when tainted */
diff --git a/kernel/module.c b/kernel/module.c
index dea01ac..92db3f5 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3028,7 +3028,16 @@ static int check_modinfo(struct module *mod, struct load_info *info, int flags)
mod->name);
add_taint_module(mod, TAINT_OOT_MODULE, LOCKDEP_STILL_OK);
}
-
+#ifdef RETPOLINE
+ if (!get_modinfo(info, "retpoline")) {
+ if (!test_taint(TAINT_NO_RETPOLINE)) {
+ pr_warn("%s: loading module not compiled with retpoline compiler.\n",
+ mod->name);
+ pr_warn("Kernel may be vulnerable to data leaks.\n");
+ }
+ add_taint_module(mod, TAINT_NO_RETPOLINE, LOCKDEP_STILL_OK);
+ }
+#endif
if (get_modinfo(info, "staging")) {
add_taint_module(mod, TAINT_CRAP, LOCKDEP_STILL_OK);
pr_warn("%s: module is from the staging directory, the quality "
diff --git a/kernel/panic.c b/kernel/panic.c
index 2cfef40..6686c67 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -325,6 +325,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
{ 'L', ' ', false }, /* TAINT_SOFTLOCKUP */
{ 'K', ' ', true }, /* TAINT_LIVEPATCH */
{ 'X', ' ', true }, /* TAINT_AUX */
+ { 'Z', ' ', true }, /* TAINT_NO_RETPOLINE */
};

/**
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index f51cf97..6510536 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -2165,6 +2165,14 @@ static void add_intree_flag(struct buffer *b, int is_intree)
buf_printf(b, "\nMODULE_INFO(intree, \"Y\");\n");
}

+/* Cannot check for assembler */
+static void add_retpoline(struct buffer *b)
+{
+ buf_printf(b, "\n#ifdef RETPOLINE\n");
+ buf_printf(b, "MODULE_INFO(retpoline, \"Y\");\n");
+ buf_printf(b, "#endif\n");
+}
+
static void add_staging_flag(struct buffer *b, const char *name)
{
static const char *staging_dir = "drivers/staging";
@@ -2506,6 +2514,7 @@ int main(int argc, char **argv)
err |= check_modname_len(mod);
add_header(&buf, mod);
add_intree_flag(&buf, !external_module);
+ add_retpoline(&buf);
add_staging_flag(&buf, mod->name);
err |= add_versions(&buf, mod);
add_depends(&buf, mod, modules);
--
2.7.4

2018-01-05 02:01:54

by Woodhouse, David

Subject: [PATCH v4 08/13] x86/alternatives: Add missing \n at end of ALTERNATIVE inline asm

Where an ALTERNATIVE is used in the middle of an inline asm block, the
instruction that follows would otherwise be appended directly to the
trailing ".popsection", and the compile fails.
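
As a hypothetical illustration of the failure mode (the feature flag and
call targets are made up for the example):

    unsigned long ret;

    /* ALTERNATIVE expands to a string ending in ".popsection"; without
     * a trailing \n the next string literal is glued onto the same
     * assembler line:
     */
    asm volatile(ALTERNATIVE("call foo", "call bar", X86_FEATURE_XYZ)
                 "mov %%rax, %0" : "=m" (ret));

    /* The assembler then sees ".popsectionmov %rax, ..." as one unknown
     * directive and the build fails.
     */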

Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/include/asm/alternative.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index dbfd085..cf5961c 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -140,7 +140,7 @@ static inline int alternatives_text_reserved(void *start, void *end)
".popsection\n" \
".pushsection .altinstr_replacement, \"ax\"\n" \
ALTINSTR_REPLACEMENT(newinstr, feature, 1) \
- ".popsection"
+ ".popsection\n"

#define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\
OLDINSTR_2(oldinstr, 1, 2) \
@@ -151,7 +151,7 @@ static inline int alternatives_text_reserved(void *start, void *end)
".pushsection .altinstr_replacement, \"ax\"\n" \
ALTINSTR_REPLACEMENT(newinstr1, feature1, 1) \
ALTINSTR_REPLACEMENT(newinstr2, feature2, 2) \
- ".popsection"
+ ".popsection\n"

/*
* Alternative instructions for different CPU types or capabilities.
--
2.7.4

2018-01-05 02:03:11

by Woodhouse, David

Subject: [PATCH v4 09/13] x86/retpoline/irq32: Convert assembler indirect jumps

From: Andi Kleen <[email protected]>

Convert all indirect jumps in the 32-bit IRQ inline asm code to use
non-speculative sequences.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/kernel/irq_32.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index a83b334..e1e58f7 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -20,6 +20,7 @@
#include <linux/mm.h>

#include <asm/apic.h>
+#include <asm/nospec-branch.h>

#ifdef CONFIG_DEBUG_STACKOVERFLOW

@@ -55,11 +56,11 @@ DEFINE_PER_CPU(struct irq_stack *, softirq_stack);
static void call_on_stack(void *func, void *stack)
{
asm volatile("xchgl %%ebx,%%esp \n"
- "call *%%edi \n"
+ NOSPEC_CALL
"movl %%ebx,%%esp \n"
: "=b" (stack)
: "0" (stack),
- "D"(func)
+ [thunk_target] "D"(func)
: "memory", "cc", "edx", "ecx", "eax");
}

@@ -95,11 +96,11 @@ static inline int execute_on_irq_stack(int overflow, struct irq_desc *desc)
call_on_stack(print_stack_overflow, isp);

asm volatile("xchgl %%ebx,%%esp \n"
- "call *%%edi \n"
+ NOSPEC_CALL
"movl %%ebx,%%esp \n"
: "=a" (arg1), "=b" (isp)
: "0" (desc), "1" (isp),
- "D" (desc->handle_irq)
+ [thunk_target] "D" (desc->handle_irq)
: "memory", "cc", "ecx");
return 1;
}
--
2.7.4

2018-01-05 02:03:29

by Woodhouse, David

Subject: [PATCH v4 05/13] x86/retpoline/hyperv: Convert assembler indirect jumps

Convert all indirect jumps in hyperv inline asm code to use non-speculative
sequences when CONFIG_RETPOLINE is enabled.

Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/include/asm/mshyperv.h | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 5400add..532ab44 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -7,6 +7,7 @@
#include <linux/nmi.h>
#include <asm/io.h>
#include <asm/hyperv.h>
+#include <asm/nospec-branch.h>

/*
* The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
@@ -186,10 +187,11 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
return U64_MAX;

__asm__ __volatile__("mov %4, %%r8\n"
- "call *%5"
+ NOSPEC_CALL
: "=a" (hv_status), ASM_CALL_CONSTRAINT,
"+c" (control), "+d" (input_address)
- : "r" (output_address), "m" (hv_hypercall_pg)
+ : "r" (output_address),
+ THUNK_TARGET(hv_hypercall_pg)
: "cc", "memory", "r8", "r9", "r10", "r11");
#else
u32 input_address_hi = upper_32_bits(input_address);
@@ -200,13 +202,13 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
if (!hv_hypercall_pg)
return U64_MAX;

- __asm__ __volatile__("call *%7"
+ __asm__ __volatile__(NOSPEC_CALL
: "=A" (hv_status),
"+c" (input_address_lo), ASM_CALL_CONSTRAINT
: "A" (control),
"b" (input_address_hi),
"D"(output_address_hi), "S"(output_address_lo),
- "m" (hv_hypercall_pg)
+ THUNK_TARGET(hv_hypercall_pg)
: "cc", "memory");
#endif /* !x86_64 */
return hv_status;
@@ -227,10 +229,10 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)

#ifdef CONFIG_X86_64
{
- __asm__ __volatile__("call *%4"
+ __asm__ __volatile__(NOSPEC_CALL
: "=a" (hv_status), ASM_CALL_CONSTRAINT,
"+c" (control), "+d" (input1)
- : "m" (hv_hypercall_pg)
+ : THUNK_TARGET(hv_hypercall_pg)
: "cc", "r8", "r9", "r10", "r11");
}
#else
@@ -238,13 +240,13 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
u32 input1_hi = upper_32_bits(input1);
u32 input1_lo = lower_32_bits(input1);

- __asm__ __volatile__ ("call *%5"
+ __asm__ __volatile__ (NOSPEC_CALL
: "=A"(hv_status),
"+c"(input1_lo),
ASM_CALL_CONSTRAINT
: "A" (control),
"b" (input1_hi),
- "m" (hv_hypercall_pg)
+ THUNK_TARGET(hv_hypercall_pg)
: "cc", "edi", "esi");
}
#endif
--
2.7.4

2018-01-05 02:01:45

by Woodhouse, David

Subject: [PATCH v4 02/13] x86/retpoline/crypto: Convert crypto assembler indirect jumps

Convert all indirect jumps in crypto assembler code to use non-speculative
sequences when CONFIG_RETPOLINE is enabled.

[Andi Kleen: Fix crc32c 'bufp' handling]
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/crypto/aesni-intel_asm.S | 5 +++--
arch/x86/crypto/camellia-aesni-avx-asm_64.S | 3 ++-
arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 3 ++-
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 4 +++-
4 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 16627fe..074c137 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -32,6 +32,7 @@
#include <linux/linkage.h>
#include <asm/inst.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>

/*
* The following macros are used to move an (un)aligned 16 byte value to/from
@@ -2884,7 +2885,7 @@ ENTRY(aesni_xts_crypt8)
pxor INC, STATE4
movdqu IV, 0x30(OUTP)

- call *%r11
+ NOSPEC_CALL r11

movdqu 0x00(OUTP), INC
pxor INC, STATE1
@@ -2929,7 +2930,7 @@ ENTRY(aesni_xts_crypt8)
_aesni_gf128mul_x_ble()
movups IV, (IVP)

- call *%r11
+ NOSPEC_CALL r11

movdqu 0x40(OUTP), INC
pxor INC, STATE1
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index f7c495e..98a717b 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -17,6 +17,7 @@

#include <linux/linkage.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>

#define CAMELLIA_TABLE_BYTE_LEN 272

@@ -1227,7 +1228,7 @@ camellia_xts_crypt_16way:
vpxor 14 * 16(%rax), %xmm15, %xmm14;
vpxor 15 * 16(%rax), %xmm15, %xmm15;

- call *%r9;
+ NOSPEC_CALL r9;

addq $(16 * 16), %rsp;

diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index eee5b39..99d09d3 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -12,6 +12,7 @@

#include <linux/linkage.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>

#define CAMELLIA_TABLE_BYTE_LEN 272

@@ -1343,7 +1344,7 @@ camellia_xts_crypt_32way:
vpxor 14 * 32(%rax), %ymm15, %ymm14;
vpxor 15 * 32(%rax), %ymm15, %ymm15;

- call *%r9;
+ NOSPEC_CALL r9;

addq $(16 * 32), %rsp;

diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
index 7a7de27..480f4e3 100644
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -45,6 +45,7 @@

#include <asm/inst.h>
#include <linux/linkage.h>
+#include <asm/nospec-branch.h>

## ISCSI CRC 32 Implementation with crc32 and pclmulqdq Instruction

@@ -75,6 +76,7 @@
.text
ENTRY(crc_pcl)
#define bufp %rdi
+#define bufp_raw rdi
#define bufp_dw %edi
#define bufp_w %di
#define bufp_b %dil
@@ -172,7 +174,7 @@ continue_block:
movzxw (bufp, %rax, 2), len
lea crc_array(%rip), bufp
lea (bufp, len, 1), bufp
- jmp *bufp
+ NOSPEC_JMP bufp_raw

################################################################
## 2a) PROCESS FULL BLOCKS:
--
2.7.4

2018-01-05 02:01:44

by Woodhouse, David

Subject: [PATCH v4 03/13] x86/retpoline/entry: Convert entry assembler indirect jumps

Convert indirect jumps in the core 32/64-bit entry assembler code to use
non-speculative sequences when CONFIG_RETPOLINE is enabled.

KPTI complicates this a little; the one in entry_SYSCALL_64_trampoline
can't just jump to the thunk because the thunk isn't mapped. So it gets
its own copy of the thunk, inline.

Don't use NOSPEC_CALL in entry_SYSCALL_64_fastpath because the return
address after the 'call' instruction must be *precisely* at the
.Lentry_SYSCALL_64_after_fastpath label for stub_ptregs_64 to work,
and the use of alternatives will mess that up unless we play horrid
games to prepend with NOPs and make the variants the same length. It's
not worth it; in the case where we ALTERNATIVE out the retpoline, the
first instruction at __x86.indirect_thunk.rax is going to be a bare
jmp *%rax anyway.

Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/entry/entry_32.S | 5 +++--
arch/x86/entry/entry_64.S | 22 +++++++++++++++++++---
2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index ace8f32..abd1e5d 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -44,6 +44,7 @@
#include <asm/asm.h>
#include <asm/smap.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>

.section .entry.text, "ax"

@@ -290,7 +291,7 @@ ENTRY(ret_from_fork)

/* kernel thread */
1: movl %edi, %eax
- call *%ebx
+ NOSPEC_CALL ebx
/*
* A kernel thread is allowed to return here after successfully
* calling do_execve(). Exit to userspace to complete the execve()
@@ -919,7 +920,7 @@ common_exception:
movl %ecx, %es
TRACE_IRQS_OFF
movl %esp, %eax # pt_regs pointer
- call *%edi
+ NOSPEC_CALL edi
jmp ret_from_exception
END(common_exception)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f048e38..dbca433 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -37,6 +37,7 @@
#include <asm/pgtable_types.h>
#include <asm/export.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>
#include <linux/err.h>

#include "calling.h"
@@ -191,7 +192,17 @@ ENTRY(entry_SYSCALL_64_trampoline)
*/
pushq %rdi
movq $entry_SYSCALL_64_stage2, %rdi
- jmp *%rdi
+ /*
+ * Open-code the retpoline from retpoline.S, because we can't
+ * just jump to it directly.
+ */
+ ALTERNATIVE "call 2f", "jmp *%rdi", X86_BUG_NO_RETPOLINE
+1:
+ lfence
+ jmp 1b
+2:
+ mov %rdi, (%rsp)
+ ret
END(entry_SYSCALL_64_trampoline)

.popsection
@@ -270,7 +281,12 @@ entry_SYSCALL_64_fastpath:
* It might end up jumping to the slow path. If it jumps, RAX
* and all argument registers are clobbered.
*/
+#ifdef CONFIG_RETPOLINE
+ movq sys_call_table(, %rax, 8), %rax
+ call __x86.indirect_thunk.rax
+#else
call *sys_call_table(, %rax, 8)
+#endif
.Lentry_SYSCALL_64_after_fastpath_call:

movq %rax, RAX(%rsp)
@@ -442,7 +458,7 @@ ENTRY(stub_ptregs_64)
jmp entry_SYSCALL64_slow_path

1:
- jmp *%rax /* Called from C */
+ NOSPEC_JMP rax /* Called from C */
END(stub_ptregs_64)

.macro ptregs_stub func
@@ -521,7 +537,7 @@ ENTRY(ret_from_fork)
1:
/* kernel thread */
movq %r12, %rdi
- call *%rbx
+ NOSPEC_CALL rbx
/*
* A kernel thread is allowed to return here after successfully
* calling do_execve(). Exit to userspace to complete the execve()
--
2.7.4

2018-01-05 02:04:04

by Woodhouse, David

Subject: [PATCH v4 06/13] x86/retpoline/xen: Convert Xen hypercall indirect jumps

Convert the indirect call in the Xen hypercall code to use a
non-speculative sequence when CONFIG_RETPOLINE is enabled.

Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/include/asm/xen/hypercall.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 7cb282e..393c004 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -44,6 +44,7 @@
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/smap.h>
+#include <asm/nospec-branch.h>

#include <xen/interface/xen.h>
#include <xen/interface/sched.h>
@@ -217,9 +218,9 @@ privcmd_call(unsigned call,
__HYPERCALL_5ARG(a1, a2, a3, a4, a5);

stac();
- asm volatile("call *%[call]"
+ asm volatile(NOSPEC_CALL
: __HYPERCALL_5PARAM
- : [call] "a" (&hypercall_page[call])
+ : [thunk_target] "a" (&hypercall_page[call])
: __HYPERCALL_CLOBBER5);
clac();

--
2.7.4

2018-01-05 02:06:47

by Woodhouse, David

Subject: [PATCH v4 07/13] x86/retpoline/checksum32: Convert assembler indirect jumps

Convert all indirect jumps in the 32-bit checksum assembler code to use
non-speculative sequences when CONFIG_RETPOLINE is enabled.

Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/lib/checksum_32.S | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 4d34bb5..d5ef7cc 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -29,7 +29,8 @@
#include <asm/errno.h>
#include <asm/asm.h>
#include <asm/export.h>
-
+#include <asm/nospec-branch.h>
+
/*
* computes a partial checksum, e.g. for TCP/UDP fragments
*/
@@ -156,7 +157,7 @@ ENTRY(csum_partial)
negl %ebx
lea 45f(%ebx,%ebx,2), %ebx
testl %esi, %esi
- jmp *%ebx
+ NOSPEC_JMP ebx

# Handle 2-byte-aligned regions
20: addw (%esi), %ax
@@ -439,7 +440,7 @@ ENTRY(csum_partial_copy_generic)
andl $-32,%edx
lea 3f(%ebx,%ebx), %ebx
testl %esi, %esi
- jmp *%ebx
+ NOSPEC_JMP ebx
1: addl $64,%esi
addl $64,%edi
SRC(movb -32(%edx),%bl) ; SRC(movb (%edx),%bl)
--
2.7.4

2018-01-05 02:06:57

by Woodhouse, David

Subject: [PATCH v4 12/13] x86/retpoline: Add boot time option to disable retpoline

From: Andi Kleen <[email protected]>

Add a 'noretpoline' boot option to disable retpoline and patch out the
extra sequences. It cannot patch out the jumps to the thunk functions
generated by the compiler, but those turn into a single indirect branch
now.
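
The effect on the thunks themselves (a sketch, following the ALTERNATIVE
in retpoline.S): with X86_BUG_NO_RETPOLINE forced on, the thunk body is
patched down to a bare indirect jump, so a compiler-emitted call site
becomes call-plus-jump rather than call-plus-retpoline:

            call __x86.indirect_thunk.rax   # still emitted by the compiler

    __x86.indirect_thunk.rax:
            jmp *%rax                       # patched-in replacement body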

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 3 +++
arch/x86/kernel/cpu/intel.c | 10 ++++++++++
2 files changed, 13 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index af7104a..c48db34 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2623,6 +2623,9 @@

nohugeiomap [KNL,x86] Disable kernel huge I/O mappings.

+ noretpoline [X86] Disable the retpoline kernel indirect branch speculation
+ workarounds. System may allow data leaks with this option.
+
nosmt [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index b1af220..4ab46f5 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -31,6 +31,16 @@
#include <asm/apic.h>
#endif

+#ifdef RETPOLINE
+static int __init noretpoline_setup(char *__unused)
+{
+ pr_info("Retpoline runtime disabled\n");
+ setup_force_cpu_cap(X86_BUG_NO_RETPOLINE);
+ return 1;
+}
+__setup("noretpoline", noretpoline_setup);
+#endif
+
/*
* Just in case our CPU detection goes bad, or you have a weird system,
* allow a way to override the automatic disabling of MPX.
--
2.7.4

2018-01-06 05:54:21

by Randy Dunlap

Subject: Re: [PATCH v4 01/13] x86/retpoline: Add initial retpoline support

On 01/04/18 18:00, David Woodhouse wrote:
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index d4fc98c..1009d1a 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -429,6 +429,19 @@ config GOLDFISH
> def_bool y
> depends on X86_GOLDFISH
>
> +config RETPOLINE
> + bool "Avoid speculative indirect branches in kernel"
> + default y
> + help
> + Compile kernel with the retpoline compiler options to guard against
> + kernel to user data leaks by avoiding speculative indirect

On first reading, I encountered a parse err^W^W <what the heck did that say>
on "kernel to user data". I get it after rereading it, but "kernel-to-user
data leaks" would be better. (IMHO)

> + branches. Requires a compiler with -mindirect-branch=thunk-extern
> + support for full protection. The kernel may run slower.
> +
> + Without compiler support, at least indirect branches in assembler
> + code are eliminated. Since this includes the syscall entry path,
> + it is not entirely pointless.


--
~Randy