This patch series enables basic detection and usage of the x86 indirect
branch speculation feature. It enables indirect branch restricted
speculation (IBRS) on kernel entry and disables it on exit, and it
enumerates the indirect branch prediction barrier (IBPB).
The x86 IBRS feature requires corresponding microcode support.
It mitigates the variant 2 vulnerability described in
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
If IBRS is set, near returns and near indirect jumps/calls will not
allow their predicted target address to be controlled by code that
executed in a less privileged prediction mode before the IBRS mode was
last written with a value of 1, or by code on another logical processor,
so long as all RSB entries from the previous less privileged prediction
mode are overwritten.
Setting IBPB ensures that earlier code's behavior does not control later
indirect branch predictions. It is used when context switching to a new,
untrusted address space. Unlike IBRS, IBPB is a command MSR
and does not retain its state.
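Since IBPB is a one-shot command MSR, the barrier amounts to a single
MSR write. A minimal sketch (not part of this series; the helper name
and call site are hypothetical, while MSR_IA32_PRED_CMD and
FEATURE_SET_IBPB are defined in the first patch):

/* Hypothetical helper: issue the one-shot IBPB barrier. */
static inline void indirect_branch_prediction_barrier(void)
{
	/* Write-only command MSR: bit 0 triggers the flush, no state kept. */
	native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
}

It would be invoked, for example, when switching to a new untrusted
address space, so that predictions trained by the old context cannot
steer the new one.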
On Skylake and later, mitigating indirect branch speculation requires
these patches ("dynamic IBRS") instead of retpoline[1]. If you are very
paranoid, or you run on a CPU where IBRS=1 is cheaper, you may also want
to run in "IBRS always" mode.
See: https://docs.google.com/document/d/e/2PACX-1vSMrwkaoSUBAFc6Fjd19F18c1O9pudkfAY-7lGYGOTN8mc9ul-J6pWadcAaBJZcVA7W_3jlLKRtKRbd/pub
A more detailed description of IBRS is in the first patch.
The series applies on top of the page table isolation changes.
Boot time and run time controls of the IBRS feature are provided.
There are two ways to control IBRS:
1. At boot time
The noibrs kernel boot parameter will disable IBRS usage.
Otherwise, if the parameter is not specified, the system will enable
IBRS and IBPB usage if the CPU supports them.
2. At run time
echo 0 > /sys/kernel/debug/ibrs_enabled will turn off IBRS
echo 1 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in kernel
echo 2 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in both userspace and kernel (IBRS always)
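For reference, a small userspace sketch that reads the current mode
back (not part of this series; assumes debugfs is mounted at
/sys/kernel/debug):

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/kernel/debug/ibrs_enabled", "r");
	int mode;

	if (!f)
		return 1;
	if (fscanf(f, "%d", &mode) == 1)
		printf("ibrs_enabled = %d\n", mode);	/* 0, 1 or 2 */
	fclose(f);
	return 0;
}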
[1] https://lkml.org/lkml/2018/1/4/174
Tim Chen (7):
x86/feature: Detect the x86 feature to control Speculation
x86/enter: MACROS to set/clear IBRS
x86/enter: Use IBRS on syscall and interrupts
x86/idle: Disable IBRS entering idle and enable it on wakeup
x86: Use IBRS for firmware update path
x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature
x86/microcode: Recheck IBRS features on microcode reload
Documentation/admin-guide/kernel-parameters.txt | 4 +
arch/x86/entry/entry_64.S | 24 +++
arch/x86/entry/entry_64_compat.S | 9 +
arch/x86/include/asm/apm.h | 6 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/efi.h | 16 +-
arch/x86/include/asm/msr-index.h | 7 +
arch/x86/include/asm/mwait.h | 19 ++
arch/x86/include/asm/spec_ctrl.h | 253 ++++++++++++++++++++++++
arch/x86/kernel/cpu/Makefile | 1 +
arch/x86/kernel/cpu/microcode/core.c | 6 +
arch/x86/kernel/cpu/scattered.c | 11 ++
arch/x86/kernel/cpu/spec_ctrl.c | 124 ++++++++++++
arch/x86/kernel/process.c | 9 +-
14 files changed, 486 insertions(+), 4 deletions(-)
create mode 100644 arch/x86/include/asm/spec_ctrl.h
create mode 100644 arch/x86/kernel/cpu/spec_ctrl.c
--
2.9.4
Create macros to control IBRS. Use these macros to enable IBRS on kernel entry
paths and disable IBRS on kernel exit paths.
Controlling IBRS touches the rax, rcx and rdx registers, so these
registers need to be saved on paths where they can't be clobbered.
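For readers who don't speak assembly, a hedged C-level sketch of what
ENABLE_IBRS amounts to (illustrative only; the real macro below is asm
and uses ALTERNATIVE patching instead of a runtime feature check):

/* Sketch: the ALTERNATIVE in the asm macro is modeled as a check. */
static inline void enable_ibrs_sketch(void)
{
	if (static_cpu_has(X86_FEATURE_SPEC_CTRL))
		native_wrmsrl(MSR_IA32_SPEC_CTRL,
			      SPEC_CTRL_FEATURE_ENABLE_IBRS);
}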
Signed-off-by: Tim Chen <[email protected]>
---
arch/x86/include/asm/spec_ctrl.h | 80 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 80 insertions(+)
create mode 100644 arch/x86/include/asm/spec_ctrl.h
diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
new file mode 100644
index 0000000..16fc4f58
--- /dev/null
+++ b/arch/x86/include/asm/spec_ctrl.h
@@ -0,0 +1,80 @@
+#ifndef _ASM_X86_SPEC_CTRL_H
+#define _ASM_X86_SPEC_CTRL_H
+
+#include <linux/stringify.h>
+#include <asm/msr-index.h>
+#include <asm/cpufeatures.h>
+#include <asm/alternative-asm.h>
+
+#ifdef __ASSEMBLY__
+
+.macro PUSH_MSR_REGS
+ pushq %rax
+ pushq %rcx
+ pushq %rdx
+.endm
+
+.macro POP_MSR_REGS
+ popq %rdx
+ popq %rcx
+ popq %rax
+.endm
+
+.macro WRMSR_ASM msr_nr:req eax_val:req
+ movl \msr_nr, %ecx
+ movl $0, %edx
+ movl \eax_val, %eax
+ wrmsr
+.endm
+
+.macro ENABLE_IBRS
+ ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
+ PUSH_MSR_REGS
+ WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
+ POP_MSR_REGS
+10:
+.endm
+
+.macro DISABLE_IBRS
+ ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
+ PUSH_MSR_REGS
+ WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_DISABLE_IBRS
+ POP_MSR_REGS
+10:
+.endm
+
+.macro ENABLE_IBRS_CLOBBER
+ ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
+ WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
+10:
+.endm
+
+.macro DISABLE_IBRS_CLOBBER
+ ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
+ WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_DISABLE_IBRS
+10:
+.endm
+
+.macro ENABLE_IBRS_SAVE_AND_CLOBBER save_reg:req
+ ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
+ movl $MSR_IA32_SPEC_CTRL, %ecx
+ rdmsr
+ movl %eax, \save_reg
+
+ movl $0, %edx
+ movl $SPEC_CTRL_FEATURE_ENABLE_IBRS, %eax
+ wrmsr
+10:
+.endm
+
+.macro RESTORE_IBRS_CLOBBER save_reg:req
+ ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
+ /* Set IBRS to the value saved in the save_reg */
+ movl $MSR_IA32_SPEC_CTRL, %ecx
+ movl $0, %edx
+ movl \save_reg, %eax
+ wrmsr
+10:
+.endm
+
+#endif /* __ASSEMBLY__ */
+#endif /* _ASM_X86_SPEC_CTRL_H */
--
2.9.4
On new microcode write, check whether IBRS
is present by rescanning scattered CPU features.
Signed-off-by: Tim Chen <[email protected]>
---
arch/x86/kernel/cpu/microcode/core.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index c4fa4a8..44b9355 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -40,6 +40,7 @@
#include <asm/processor.h>
#include <asm/cmdline.h>
#include <asm/setup.h>
+#include <asm/spec_ctrl.h>
#define DRIVER_VERSION "2.2"
@@ -444,6 +445,11 @@ static ssize_t microcode_write(struct file *file, const char __user *buf,
if (ret > 0)
perf_check_microcode();
+ /* check spec_ctrl capabilities */
+ mutex_lock(&spec_ctrl_mutex);
+ init_scattered_cpuid_features(&boot_cpu_data);
+ mutex_unlock(&spec_ctrl_mutex);
+
mutex_unlock(µcode_mutex);
put_online_cpus();
--
2.9.4
From: David Woodhouse <[email protected]>
With retpoline the kernel is impervious to the indirect branch
prediction attack, but firmware won't be, so we still need to set IBRS
to protect firmware code execution when calling into firmware at
runtime.
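The usage pattern, sketched in C (the firmware_call() name is
hypothetical; the real hook points are in the diff below):

/* Sketch of the wrapper pattern around a runtime firmware call. */
unprotected_firmware_begin();	/* IBRS = 1: firmware isn't retpolined */
firmware_call();		/* hypothetical firmware entry point */
unprotected_firmware_end();	/* IBRS = 0: back to retpolined kernel */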
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Tim Chen <[email protected]>
---
arch/x86/include/asm/apm.h | 6 ++++++
arch/x86/include/asm/efi.h | 16 ++++++++++++++--
arch/x86/include/asm/spec_ctrl.h | 37 +++++++++++++++++++++++++++++++++++++
3 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/apm.h b/arch/x86/include/asm/apm.h
index 4d4015d..1ca4f7b 100644
--- a/arch/x86/include/asm/apm.h
+++ b/arch/x86/include/asm/apm.h
@@ -7,6 +7,8 @@
#ifndef _ASM_X86_MACH_DEFAULT_APM_H
#define _ASM_X86_MACH_DEFAULT_APM_H
+#include <asm/spec_ctrl.h>
+
#ifdef APM_ZERO_SEGS
# define APM_DO_ZERO_SEGS \
"pushl %%ds\n\t" \
@@ -28,6 +30,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
u32 *eax, u32 *ebx, u32 *ecx,
u32 *edx, u32 *esi)
{
+ unprotected_firmware_begin();
/*
* N.B. We do NOT need a cld after the BIOS call
* because we always save and restore the flags.
@@ -44,6 +47,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
"=S" (*esi)
: "a" (func), "b" (ebx_in), "c" (ecx_in)
: "memory", "cc");
+ unprotected_firmware_end();
}
static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
@@ -52,6 +56,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
int cx, dx, si;
bool error;
+ unprotected_firmware_begin();
/*
* N.B. We do NOT need a cld after the BIOS call
* because we always save and restore the flags.
@@ -68,6 +73,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
"=S" (si)
: "a" (func), "b" (ebx_in), "c" (ecx_in)
: "memory", "cc");
+ unprotected_firmware_end();
return error;
}
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb..25bd506 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -6,6 +6,7 @@
#include <asm/pgtable.h>
#include <asm/processor-flags.h>
#include <asm/tlb.h>
+#include <asm/spec_ctrl.h>
/*
* We map the EFI regions needed for runtime services non-contiguously,
@@ -36,8 +37,17 @@
extern asmlinkage unsigned long efi_call_phys(void *, ...);
-#define arch_efi_call_virt_setup() kernel_fpu_begin()
-#define arch_efi_call_virt_teardown() kernel_fpu_end()
+#define arch_efi_call_virt_setup() \
+({ \
+ kernel_fpu_begin(); \
+ unprotected_firmware_begin(); \
+})
+
+#define arch_efi_call_virt_teardown() \
+({ \
+ unprotected_firmware_end(); \
+ kernel_fpu_end(); \
+})
/*
* Wrap all the virtual calls in a way that forces the parameters on the stack.
@@ -73,6 +83,7 @@ struct efi_scratch {
efi_sync_low_kernel_mappings(); \
preempt_disable(); \
__kernel_fpu_begin(); \
+ unprotected_firmware_begin(); \
\
if (efi_scratch.use_pgd) { \
efi_scratch.prev_cr3 = __read_cr3(); \
@@ -91,6 +102,7 @@ struct efi_scratch {
__flush_tlb_all(); \
} \
\
+ unprotected_firmware_end(); \
__kernel_fpu_end(); \
preempt_enable(); \
})
diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
index 28b0314..23b2804 100644
--- a/arch/x86/include/asm/spec_ctrl.h
+++ b/arch/x86/include/asm/spec_ctrl.h
@@ -113,5 +113,42 @@ static inline void unprotected_speculation_end(void)
rmb();
}
+
+#if defined(RETPOLINE)
+/*
+ * RETPOLINE does not protect against indirect speculation
+ * in firmware code. Enable IBRS to protect firmware execution.
+ */
+static inline void unprotected_firmware_begin(void)
+{
+ if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+ __disable_indirect_speculation();
+ else
+ /*
+ * If we intended to disable indirect speculation
+ * but come here due to mis-speculation, we need
+ * to stop the mis-speculation with rmb.
+ */
+ rmb();
+}
+
+static inline void unprotected_firmware_end(void)
+{
+ if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+ __enable_indirect_speculation();
+}
+
+#else
+static inline void unprotected_firmware_begin(void)
+{
+ return;
+}
+
+static inline void unprotected_firmware_end(void)
+{
+ return;
+}
+#endif
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_SPEC_CTRL_H */
--
2.9.4
There are two ways to control IBRS:
1. At boot time
The noibrs kernel boot parameter will disable IBRS usage.
Otherwise, if the parameter is not specified, the system will enable
IBRS and IBPB usage if the CPU supports them.
2. At run time
echo 0 > /sys/kernel/debug/ibrs_enabled will turn off IBRS
echo 1 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in kernel
echo 2 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in both userspace and kernel
The implementation was updated with input from Andrea Arcangeli.
Signed-off-by: Tim Chen <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 4 +
arch/x86/include/asm/spec_ctrl.h | 163 +++++++++++++++++++-----
arch/x86/kernel/cpu/Makefile | 1 +
arch/x86/kernel/cpu/scattered.c | 10 ++
arch/x86/kernel/cpu/spec_ctrl.c | 124 ++++++++++++++++++
5 files changed, 270 insertions(+), 32 deletions(-)
create mode 100644 arch/x86/kernel/cpu/spec_ctrl.c
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 5dfd262..d64f49f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2573,6 +2573,10 @@
noexec=on: enable non-executable mappings (default)
noexec=off: disable non-executable mappings
+ noibrs [X86]
+ Don't use the indirect branch restricted speculation
+ (IBRS) feature.
+
nosmap [X86]
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.
diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
index 23b2804..2c35571 100644
--- a/arch/x86/include/asm/spec_ctrl.h
+++ b/arch/x86/include/asm/spec_ctrl.h
@@ -1,13 +1,17 @@
#ifndef _ASM_X86_SPEC_CTRL_H
#define _ASM_X86_SPEC_CTRL_H
-#include <linux/stringify.h>
#include <asm/msr-index.h>
#include <asm/cpufeatures.h>
-#include <asm/alternative-asm.h>
+
+#define SPEC_CTRL_IBRS_INUSE (1<<0) /* OS enables IBRS usage */
+#define SPEC_CTRL_IBRS_SUPPORTED (1<<1) /* System supports IBRS */
+#define SPEC_CTRL_IBRS_ADMIN_DISABLED (1<<2) /* Admin disables IBRS */
#ifdef __ASSEMBLY__
+.extern spec_ctrl_ibrs
+
.macro PUSH_MSR_REGS
pushq %rax
pushq %rcx
@@ -27,35 +31,63 @@
.endm
.macro ENABLE_IBRS
- ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
+ testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
+ jz .Lskip_\@
+
PUSH_MSR_REGS
WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
POP_MSR_REGS
-10:
+
+ jmp .Ldone_\@
+.Lskip_\@:
+ /*
+ * prevent speculation beyond here as we could want to
+ * stop speculation by enabling IBRS
+ */
+ lfence
+.Ldone_\@:
.endm
.macro DISABLE_IBRS
- ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
+ testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
+ jz .Lskip_\@
+
PUSH_MSR_REGS
WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_DISABLE_IBRS
POP_MSR_REGS
-10:
+
+.Lskip_\@:
.endm
.macro ENABLE_IBRS_CLOBBER
- ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
+ testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
+ jz .Lskip_\@
+
WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
-10:
+
+ jmp .Ldone_\@
+.Lskip_\@:
+ /*
+ * prevent speculation beyond here as we could want to
+ * stop speculation by enabling IBRS
+ */
+ lfence
+.Ldone_\@:
.endm
.macro DISABLE_IBRS_CLOBBER
- ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
+ testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
+ jz .Lskip_\@
+
WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_DISABLE_IBRS
-10:
+
+.Lskip_\@:
.endm
.macro ENABLE_IBRS_SAVE_AND_CLOBBER save_reg:req
- ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
+ testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
+ jz .Lskip_\@
+
movl $MSR_IA32_SPEC_CTRL, %ecx
rdmsr
movl %eax, \save_reg
@@ -63,22 +95,103 @@
movl $0, %edx
movl $SPEC_CTRL_FEATURE_ENABLE_IBRS, %eax
wrmsr
-10:
+
+ jmp .Ldone_\@
+.Lskip_\@:
+ /*
+ * prevent speculation beyond here as we could want to
+ * stop speculation by enabling IBRS
+ */
+ lfence
+.Ldone_\@:
.endm
.macro RESTORE_IBRS_CLOBBER save_reg:req
- ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
+ testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
+ jz .Lskip_\@
+
/* Set IBRS to the value saved in the save_reg */
movl $MSR_IA32_SPEC_CTRL, %ecx
movl $0, %edx
movl \save_reg, %eax
wrmsr
-10:
+
+ jmp .Ldone_\@
+.Lskip_\@:
+ /*
+ * prevent speculation beyond here as we could want to
+ * stop speculation by enabling IBRS
+ */
+ lfence
+.Ldone_\@:
.endm
#else
#include <asm/microcode.h>
+extern int spec_ctrl_ibrs;
+extern struct mutex spec_ctrl_mutex;
+extern unsigned int ibrs_enabled;
+
+enum {
+ IBRS_DISABLED,
+ /* in host kernel, disabled in guest and userland */
+ IBRS_ENABLED,
+ /* in host kernel and host userland, disabled in guest */
+ IBRS_ENABLED_USER,
+ IBRS_MAX = IBRS_ENABLED_USER,
+};
+
+
+static inline void set_ibrs_inuse(void)
+{
+ if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
+ spec_ctrl_ibrs |= SPEC_CTRL_IBRS_INUSE;
+}
+
+static inline void clear_ibrs_inuse(void)
+{
+ spec_ctrl_ibrs &= ~SPEC_CTRL_IBRS_INUSE;
+}
+
+static inline int ibrs_inuse(void)
+{
+ if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_INUSE)
+ return 1;
+ else
+ /*
+ * prevent speculation beyond here as we could want to
+ * stop speculation by enabling IBRS with this check
+ */
+ rmb();
+ return 0;
+}
+
+static inline void set_ibrs_supported(void)
+{
+ spec_ctrl_ibrs |= SPEC_CTRL_IBRS_SUPPORTED;
+ if (!(spec_ctrl_ibrs & SPEC_CTRL_IBRS_ADMIN_DISABLED))
+ set_ibrs_inuse();
+ else
+ /*
+ * prevent speculation beyond here as we could want to
+ * stop speculation by enabling IBRS
+ */
+ rmb();
+}
+
+static inline void set_ibrs_disabled(void)
+{
+ spec_ctrl_ibrs |= SPEC_CTRL_IBRS_ADMIN_DISABLED;
+ if (ibrs_inuse())
+ clear_ibrs_inuse();
+}
+
+static inline void clear_ibrs_disabled(void)
+{
+ spec_ctrl_ibrs &= ~SPEC_CTRL_IBRS_ADMIN_DISABLED;
+}
+
static inline void __disable_indirect_speculation(void)
{
native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);
@@ -96,21 +209,14 @@ static inline void __enable_indirect_speculation(void)
static inline void unprotected_speculation_begin(void)
{
WARN_ON_ONCE(!irqs_disabled());
- if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+ if (ibrs_inuse())
__enable_indirect_speculation();
}
static inline void unprotected_speculation_end(void)
{
- if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+ if (ibrs_inuse())
__disable_indirect_speculation();
- else
- /*
- * If we intended to disable indirect speculation
- * but come here due to mis-speculation, we need
- * to stop the mis-speculation with rmb.
- */
- rmb();
}
@@ -121,20 +227,13 @@ static inline void unprotected_speculation_end(void)
*/
static inline void unprotected_firmware_begin(void)
{
- if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+ if (ibrs_inuse())
__disable_indirect_speculation();
- else
- /*
- * If we intended to disable indirect speculation
- * but come here due to mis-speculation, we need
- * to stop the mis-speculation with rmb.
- */
- rmb();
}
static inline void unprotected_firmware_end(void)
{
- if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+ if (ibrs_inuse())
__enable_indirect_speculation();
}
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 90cb82d..a25f1ab 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -24,6 +24,7 @@ obj-y += match.o
obj-y += bugs.o
obj-$(CONFIG_CPU_FREQ) += aperfmperf.o
obj-y += cpuid-deps.o
+obj-y += spec_ctrl.o
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index bc50c40..2d23a2fe 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -8,6 +8,7 @@
#include <asm/processor.h>
#include <asm/apic.h>
+#include <asm/spec_ctrl.h>
struct cpuid_bit {
u16 feature;
@@ -56,6 +57,15 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
if (regs[cb->reg] & (1 << cb->bit))
set_cpu_cap(c, cb->feature);
+
+ }
+ if (!c->cpu_index) {
+ if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
+ printk(KERN_INFO "FEATURE SPEC_CTRL Present\n");
+ set_ibrs_supported();
+ if (ibrs_inuse())
+ ibrs_enabled = IBRS_ENABLED;
+ }
}
}
diff --git a/arch/x86/kernel/cpu/spec_ctrl.c b/arch/x86/kernel/cpu/spec_ctrl.c
new file mode 100644
index 0000000..6946678
--- /dev/null
+++ b/arch/x86/kernel/cpu/spec_ctrl.c
@@ -0,0 +1,124 @@
+#include <linux/export.h>
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/debugfs.h>
+#include <linux/uaccess.h>
+
+#include <asm/spec_ctrl.h>
+#include <asm/cpufeature.h>
+
+/*
+ * spec_ctrl_ibrs
+ * bit 0 = indicate if ibrs is currently in use
+ * bit 1 = indicate if system supports ibrs
+ * bit 2 = indicate if admin disables ibrs
+ */
+
+int spec_ctrl_ibrs;
+EXPORT_SYMBOL(spec_ctrl_ibrs);
+
+/* mutex to serialize IBRS control changes */
+DEFINE_MUTEX(spec_ctrl_mutex);
+EXPORT_SYMBOL(spec_ctrl_mutex);
+
+unsigned int ibrs_enabled __read_mostly;
+EXPORT_SYMBOL(ibrs_enabled);
+
+static int __init noibrs(char *str)
+{
+ set_ibrs_disabled();
+
+ return 0;
+}
+early_param("noibrs", noibrs);
+
+static ssize_t __enabled_read(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos, unsigned int *field)
+{
+ char buf[32];
+ unsigned int len;
+
+ len = sprintf(buf, "%d\n", READ_ONCE(*field));
+ return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+}
+
+static ssize_t ibrs_enabled_read(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ return __enabled_read(file, user_buf, count, ppos, &ibrs_enabled);
+}
+
+static void spec_ctrl_flush_all_cpus(u32 msr_nr, u64 val)
+{
+ int cpu;
+ get_online_cpus();
+ for_each_online_cpu(cpu)
+ wrmsrl_on_cpu(cpu, msr_nr, val);
+ put_online_cpus();
+}
+
+static ssize_t ibrs_enabled_write(struct file *file,
+ const char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ char buf[32];
+ ssize_t len;
+ unsigned int enable;
+
+ len = min(count, sizeof(buf) - 1);
+ if (copy_from_user(buf, user_buf, len))
+ return -EFAULT;
+
+ buf[len] = '\0';
+ if (kstrtouint(buf, 0, &enable))
+ return -EINVAL;
+
+ if (enable > IBRS_MAX)
+ return -EINVAL;
+
+ mutex_lock(&spec_ctrl_mutex);
+
+ if (enable == IBRS_DISABLED) {
+ /* disable IBRS usage */
+ set_ibrs_disabled();
+ if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
+ spec_ctrl_flush_all_cpus(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_DISABLE_IBRS);
+ } else if (enable == IBRS_ENABLED) {
+ /* enable IBRS usage in kernel */
+ clear_ibrs_disabled();
+ if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
+ set_ibrs_inuse();
+ else
+ /* Platform doesn't support IBRS */
+ enable = IBRS_DISABLED;
+ } else if (enable == IBRS_ENABLED_USER) {
+ /* enable IBRS usage in both userspace and kernel */
+ clear_ibrs_disabled();
+ /* don't change IBRS value once we set it to always on */
+ clear_ibrs_inuse();
+ if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
+ spec_ctrl_flush_all_cpus(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);
+ else
+ /* Platform doesn't support IBRS */
+ enable = IBRS_DISABLED;
+ }
+
+ WRITE_ONCE(ibrs_enabled, enable);
+
+ mutex_unlock(&spec_ctrl_mutex);
+ return count;
+}
+
+static const struct file_operations fops_ibrs_enabled = {
+ .read = ibrs_enabled_read,
+ .write = ibrs_enabled_write,
+ .llseek = default_llseek,
+};
+
+static int __init debugfs_spec_ctrl(void)
+{
+ debugfs_create_file("ibrs_enabled", S_IRUSR | S_IWUSR,
+ arch_debugfs_dir, NULL, &fops_ibrs_enabled);
+ return 0;
+}
+late_initcall(debugfs_spec_ctrl);
--
2.9.4
Clear IBRS when entering idle via mwait, and set it again on idle exit
back into the kernel. While in mwait the CPU is not running, but if we
leave IBRS on it will hurt the performance of the sibling hardware
thread. So we disable IBRS before going idle and re-enable it when we
wake up.
Modify my original implementation of IBRS on the mwait idle path based
on input from Andrea Arcangeli.
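The shape of the mwait path, sketched in C (a sketch only; the real
changes are in the diff below):

/* Sketch: IBRS is only toggled with irqs off; an NMI is the one way
 * in, and the paranoid entry path restores IBRS on its own. */
local_irq_disable();
unprotected_speculation_begin();	/* IBRS = 0 while parked in mwait */
__monitor((void *)&current_thread_info()->flags, 0, 0);
if (!need_resched())
	__mwait(eax, ecx);
unprotected_speculation_end();		/* IBRS = 1 before running code */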
Signed-off-by: Tim Chen <[email protected]>
---
arch/x86/include/asm/mwait.h | 19 +++++++++++++++++++
arch/x86/include/asm/spec_ctrl.h | 37 +++++++++++++++++++++++++++++++++++++
arch/x86/kernel/process.c | 9 +++++++--
3 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 39a2fb2..5cb3bff 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -6,6 +6,7 @@
#include <linux/sched/idle.h>
#include <asm/cpufeature.h>
+#include <asm/spec_ctrl.h>
#define MWAIT_SUBSTATE_MASK 0xf
#define MWAIT_CSTATE_MASK 0xf
@@ -100,15 +101,33 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
{
if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
+ bool can_toggle_ibrs = false;
if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
mb();
clflush((void *)¤t_thread_info()->flags);
mb();
}
+ if (irqs_disabled()) {
+ /*
+ * CPUs run faster with speculation protection
+ * disabled. All CPU threads in a core must
+ * disable speculation protection for it to be
+ * disabled. Disable it while we are idle so the
+ * other hyperthread can run fast.
+ *
+ * nmi uses the save_paranoid model which
+ * always enables ibrs on exception entry
+ * before any indirect jump can run.
+ */
+ can_toggle_ibrs = true;
+ unprotected_speculation_begin();
+ }
__monitor((void *)¤t_thread_info()->flags, 0, 0);
if (!need_resched())
__mwait(eax, ecx);
+ if (can_toggle_ibrs)
+ unprotected_speculation_end();
}
current_clr_polling();
}
diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
index 16fc4f58..28b0314 100644
--- a/arch/x86/include/asm/spec_ctrl.h
+++ b/arch/x86/include/asm/spec_ctrl.h
@@ -76,5 +76,42 @@
10:
.endm
+#else
+#include <asm/microcode.h>
+
+static inline void __disable_indirect_speculation(void)
+{
+ native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);
+}
+
+static inline void __enable_indirect_speculation(void)
+{
+ native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_DISABLE_IBRS);
+}
+
+/*
+ * Interrupts must be disabled to begin unprotected speculation.
+ * Otherwise interrupts could be running in unprotected mode.
+ */
+static inline void unprotected_speculation_begin(void)
+{
+ WARN_ON_ONCE(!irqs_disabled());
+ if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+ __enable_indirect_speculation();
+}
+
+static inline void unprotected_speculation_end(void)
+{
+ if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
+ __disable_indirect_speculation();
+ else
+ /*
+ * If we intended to disable indirect speculation
+ * but come here due to mis-speculation, we need
+ * to stop the mis-speculation with rmb.
+ */
+ rmb();
+}
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_SPEC_CTRL_H */
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 5174159..cb14820 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -39,6 +39,7 @@
#include <asm/switch_to.h>
#include <asm/desc.h>
#include <asm/prctl.h>
+#include <asm/spec_ctrl.h>
/*
* per-CPU TSS segments. Threads are completely 'soft' on Linux,
@@ -461,11 +462,15 @@ static __cpuidle void mwait_idle(void)
mb(); /* quirk */
}
+ unprotected_speculation_begin();
__monitor((void *)¤t_thread_info()->flags, 0, 0);
- if (!need_resched())
+ if (!need_resched()) {
__sti_mwait(0, 0);
- else
+ unprotected_speculation_end();
+ } else {
+ unprotected_speculation_end();
local_irq_enable();
+ }
trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
} else {
local_irq_enable();
--
2.9.4
Set IBRS on kernel entry via syscalls and interrupts. Clear it on
exit.
If an NMI runs while we are exiting the kernel, between DISABLE_IBRS
and SWAPGS, the NMI will turn IBRS bit 0 back on and leave it enabled
when it exits; IBRS bit 0 would then stay enabled in userland until
the next kernel entry.
That is only a minor inefficiency, but we can eliminate it by saving
the MSR when entering the NMI in paranoid_entry and restoring it when
exiting the NMI.
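In pseudo-C, the save/restore pairing that the new
ENABLE_IBRS_SAVE_AND_CLOBBER / RESTORE_IBRS_CLOBBER macros implement
looks roughly like this (a sketch, not the actual code):

/* paranoid_entry: preserve the interrupted context's IBRS value. */
u64 saved = native_read_msr(MSR_IA32_SPEC_CTRL);	/* 0 or 1 */
native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);

/* ... NMI body runs with IBRS on ... */

/* paranoid_exit: write back the saved value, not a blind 0. */
native_wrmsrl(MSR_IA32_SPEC_CTRL, saved);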
Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Tim Chen <[email protected]>
---
arch/x86/entry/entry_64.S | 24 ++++++++++++++++++++++++
arch/x86/entry/entry_64_compat.S | 9 +++++++++
2 files changed, 33 insertions(+)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3f72f5c..0c4d542 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -37,6 +37,7 @@
#include <asm/pgtable_types.h>
#include <asm/export.h>
#include <asm/frame.h>
+#include <asm/spec_ctrl.h>
#include <linux/err.h>
#include "calling.h"
@@ -170,6 +171,8 @@ ENTRY(entry_SYSCALL_64_trampoline)
/* Load the top of the task stack into RSP */
movq CPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp
+ /* Stack is usable, use the non-clobbering IBRS enable: */
+ ENABLE_IBRS
/* Start building the simulated IRET frame. */
pushq $__USER_DS /* pt_regs->ss */
@@ -213,6 +216,8 @@ ENTRY(entry_SYSCALL_64)
* is not required to switch CR3.
*/
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+ /* Stack is usable, use the non-clobbering IBRS enable: */
+ ENABLE_IBRS
TRACE_IRQS_OFF
@@ -407,6 +412,7 @@ syscall_return_via_sysret:
* We are on the trampoline stack. All regs except RDI are live.
* We can do future final exit work right here.
*/
+ DISABLE_IBRS
SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
popq %rdi
@@ -745,6 +751,7 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode)
* We can do future final exit work right here.
*/
+ DISABLE_IBRS
SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
/* Restore RDI. */
@@ -832,6 +839,14 @@ native_irq_return_ldt:
SWAPGS /* to kernel GS */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi /* to kernel CR3 */
+ /*
+ * Normally we enable IBRS when we switch to kernel's CR3.
+ * But we are going to switch back to user CR3 immediately
+ * in this routine after fixing ESPFIX stack. There is
+ * no vulnerable code branching for IBRS to protect.
+ * We don't toggle IBRS to avoid the cost of two MSR writes.
+ */
+
movq PER_CPU_VAR(espfix_waddr), %rdi
movq %rax, (0*8)(%rdi) /* user RAX */
movq (1*8)(%rsp), %rax /* user RIP */
@@ -965,6 +980,8 @@ ENTRY(switch_to_thread_stack)
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+ /* Stack is usable, use the non-clobbering IBRS enable: */
+ ENABLE_IBRS
UNWIND_HINT sp_offset=16 sp_reg=ORC_REG_DI
pushq 7*8(%rdi) /* regs->ss */
@@ -1265,6 +1282,7 @@ ENTRY(paranoid_entry)
1:
SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
+ ENABLE_IBRS_SAVE_AND_CLOBBER save_reg=%r13d
ret
END(paranoid_entry)
@@ -1288,6 +1306,7 @@ ENTRY(paranoid_exit)
testl %ebx, %ebx /* swapgs needed? */
jnz .Lparanoid_exit_no_swapgs
TRACE_IRQS_IRETQ
+ RESTORE_IBRS_CLOBBER save_reg=%r13d
RESTORE_CR3 scratch_reg=%rbx save_reg=%r14
SWAPGS_UNSAFE_STACK
jmp .Lparanoid_exit_restore
@@ -1318,6 +1337,7 @@ ENTRY(error_entry)
SWAPGS
/* We have user CR3. Change to kernel CR3. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+ ENABLE_IBRS_CLOBBER
.Lerror_entry_from_usermode_after_swapgs:
/* Put us onto the real thread stack. */
@@ -1365,6 +1385,7 @@ ENTRY(error_entry)
*/
SWAPGS
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+ ENABLE_IBRS_CLOBBER
jmp .Lerror_entry_done
.Lbstep_iret:
@@ -1379,6 +1400,7 @@ ENTRY(error_entry)
*/
SWAPGS
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+ ENABLE_IBRS
/*
* Pretend that the exception came from user mode: set up pt_regs
@@ -1480,6 +1502,7 @@ ENTRY(nmi)
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
movq %rsp, %rdx
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+ ENABLE_IBRS
UNWIND_HINT_IRET_REGS base=%rdx offset=8
pushq 5*8(%rdx) /* pt_regs->ss */
pushq 4*8(%rdx) /* pt_regs->rsp */
@@ -1730,6 +1753,7 @@ end_repeat_nmi:
movq $-1, %rsi
call do_nmi
+ RESTORE_IBRS_CLOBBER save_reg=%r13d
RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
testl %ebx, %ebx /* swapgs needed? */
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 40f1700..88ee1c0 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -14,6 +14,7 @@
#include <asm/irqflags.h>
#include <asm/asm.h>
#include <asm/smap.h>
+#include <asm/spec_ctrl.h>
#include <linux/linkage.h>
#include <linux/err.h>
@@ -54,6 +55,7 @@ ENTRY(entry_SYSENTER_compat)
SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+ ENABLE_IBRS
/*
* User tracing code (ptrace or signal handlers) might assume that
@@ -224,6 +226,7 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
* preserved during the C calls inside TRACE_IRQS_OFF anyway.
*/
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi
+ ENABLE_IBRS_CLOBBER /* clobbers %rax, %rcx, %rdx */
/*
* User mode is traced as though IRQs are on, and SYSENTER
@@ -240,6 +243,12 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
/* Opportunistic SYSRET */
sysret32_from_system_call:
TRACE_IRQS_ON /* User mode traces as IRQs on. */
+ /*
+ * Clobber of %rax, %rcx, %rdx is OK before register restoring.
+ * This is safe to do here because we have no indirect branches
+ * between here and the return to userspace (sysretl).
+ */
+ DISABLE_IBRS_CLOBBER
movq RBX(%rsp), %rbx /* pt_regs->rbx */
movq RBP(%rsp), %rbp /* pt_regs->rbp */
movq EFLAGS(%rsp), %r11 /* pt_regs->flags (in r11) */
--
2.9.4
CPUID with EAX=0x7 (ECX=0) returns EDX bit 26 to indicate the presence
of this feature, which exposes two new MSRs:
IA32_SPEC_CTRL (0x48) and IA32_PRED_CMD (0x49)
IA32_SPEC_CTRL, bit0 – Indirect Branch Restricted Speculation (IBRS)
IA32_PRED_CMD, bit0 – Indirect Branch Prediction Barrier (IBPB)
If IBRS is set, near returns and near indirect jumps/calls will not
allow their predicted target address to be controlled by code that
executed in a less privileged prediction mode before the IBRS mode was
last written with a value of 1, or by code on another logical processor,
so long as all RSB entries from the previous less privileged prediction
mode are overwritten.
Setting IBPB ensures that earlier code's behavior does not control later
indirect branch predictions. It is used when context switching to a new,
untrusted address space. Unlike IBRS, it is a command MSR and does not
retain its state.
* Thus a near indirect jump/call/return may be affected by code in a
less privileged prediction mode that executed AFTER IBRS mode was last
written with a value of 1
* There is no need to clear IBRS before writing it with a value of
1. Unconditionally writing it with a value of 1 after the prediction
mode change is sufficient
* Note: IBRS is not required in order to isolate branch predictions for
SMM or SGX enclaves
* Code executed by a sibling logical processor cannot control indirect
jump/call/return predicted target when IBRS is set
* SMEP will prevent supervisor mode from using RSB entries filled by user
code; this can reduce the need for software to overwrite RSB entries
* IBRS is not guaranteed to differentiate two applications that use
the same CR3 due to recycling. Software can use an IBPB command when
recycling a page table base address.
* VMM software can similarly use an IBPB when recycling a controlling
VMCS pointer address
CPU performance could be reduced when running with IBRS set.
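A minimal detection sketch (the series wires this up through the
scattered-feature table in the diff below; the boot_cpu_data call site
here is illustrative):

/* Check CPUID.(EAX=7,ECX=0):EDX[26] for SPEC_CTRL support. */
unsigned int eax, ebx, ecx, edx;

cpuid_count(0x00000007, 0, &eax, &ebx, &ecx, &edx);
if (edx & (1 << 26))
	set_cpu_cap(&boot_cpu_data, X86_FEATURE_SPEC_CTRL);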
Signed-off-by: Tim Chen <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 7 +++++++
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 9 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 86c68cb..431f393 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -209,6 +209,7 @@
#define X86_FEATURE_AVX512_4FMAPS ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */
#define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
+#define X86_FEATURE_SPEC_CTRL ( 7*32+19) /* Control Speculation Control */
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1e7d710..f51e516 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -42,6 +42,12 @@
#define MSR_PPIN_CTL 0x0000004e
#define MSR_PPIN 0x0000004f
+#define MSR_IA32_SPEC_CTRL 0x00000048
+#define SPEC_CTRL_FEATURE_DISABLE_IBRS (0 << 0)
+#define SPEC_CTRL_FEATURE_ENABLE_IBRS (1 << 0)
+
+#define MSR_IA32_PRED_CMD 0x00000049
+
#define MSR_IA32_PERFCTR0 0x000000c1
#define MSR_IA32_PERFCTR1 0x000000c2
#define MSR_FSB_FREQ 0x000000cd
@@ -439,6 +445,7 @@
#define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1)
#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX (1<<2)
#define FEATURE_CONTROL_LMCE (1<<20)
+#define FEATURE_SET_IBPB (1<<0)
#define MSR_IA32_APICBASE 0x0000001b
#define MSR_IA32_APICBASE_BSP (1<<8)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 05459ad..bc50c40 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -24,6 +24,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_INTEL_PT, CPUID_EBX, 25, 0x00000007, 0 },
{ X86_FEATURE_AVX512_4VNNIW, CPUID_EDX, 2, 0x00000007, 0 },
{ X86_FEATURE_AVX512_4FMAPS, CPUID_EDX, 3, 0x00000007, 0 },
+ { X86_FEATURE_SPEC_CTRL, CPUID_EDX, 26, 0x00000007, 0 },
{ X86_FEATURE_CAT_L3, CPUID_EBX, 1, 0x00000010, 0 },
{ X86_FEATURE_CAT_L2, CPUID_EBX, 2, 0x00000010, 0 },
{ X86_FEATURE_CDP_L3, CPUID_ECX, 2, 0x00000010, 1 },
--
2.9.4
On Thu, Jan 04, 2018 at 09:56:48AM -0800, Tim Chen wrote:
> On new microcode write, check whether IBRS
> is present by rescanning scattered CPU features.
>
> Signed-off-by: Tim Chen <[email protected]>
> ---
> arch/x86/kernel/cpu/microcode/core.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
> index c4fa4a8..44b9355 100644
> --- a/arch/x86/kernel/cpu/microcode/core.c
> +++ b/arch/x86/kernel/cpu/microcode/core.c
> @@ -40,6 +40,7 @@
> #include <asm/processor.h>
> #include <asm/cmdline.h>
> #include <asm/setup.h>
> +#include <asm/spec_ctrl.h>
>
> #define DRIVER_VERSION "2.2"
>
> @@ -444,6 +445,11 @@ static ssize_t microcode_write(struct file *file, const char __user *buf,
> if (ret > 0)
> perf_check_microcode();
>
> + /* check spec_ctrl capabilities */
> + mutex_lock(&spec_ctrl_mutex);
> + init_scattered_cpuid_features(&boot_cpu_data);
No need for that - make a specific function like perf_check_microcode()
which checks only the IBRS bit and updates stuff accordingly.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, Jan 04, 2018 at 09:56:47AM -0800, Tim Chen wrote:
> There are 2 ways to control IBRS
>
> 1. At boot time
> noibrs kernel boot parameter will disable IBRS usage
>
> Otherwise if the above parameters are not specified, the system
> will enable ibrs and ibpb usage if the cpu supports it.
>
> 2. At run time
> echo 0 > /sys/kernel/debug/ibrs_enabled will turn off IBRS
> echo 1 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in kernel
> echo 2 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in both userspace and kernel
I am not sure that tristate is really needed. What's wrong with on/off
only?
Also, stuff like that:
+/* mutex to serialize IBRS control changes */
+DEFINE_MUTEX(spec_ctrl_mutex);
+EXPORT_SYMBOL(spec_ctrl_mutex);
looks ugly.
Wrapper functions which everything calls and which hide the internals
are much preferable.
And then, if at all, this needs to be connected to the retpolines fun,
methinks, so that it can be decided at boot what to use.
Thx.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, Jan 04, 2018 at 07:28:58PM +0100, Borislav Petkov wrote:
> On Thu, Jan 04, 2018 at 09:56:48AM -0800, Tim Chen wrote:
> > On new microcode write, check whether IBRS
> > is present by rescanning scattered CPU features.
> >
> > Signed-off-by: Tim Chen <[email protected]>
> > ---
> > arch/x86/kernel/cpu/microcode/core.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
> > index c4fa4a8..44b9355 100644
> > --- a/arch/x86/kernel/cpu/microcode/core.c
> > +++ b/arch/x86/kernel/cpu/microcode/core.c
> > @@ -40,6 +40,7 @@
> > #include <asm/processor.h>
> > #include <asm/cmdline.h>
> > #include <asm/setup.h>
> > +#include <asm/spec_ctrl.h>
> >
> > #define DRIVER_VERSION "2.2"
> >
> > @@ -444,6 +445,11 @@ static ssize_t microcode_write(struct file *file, const char __user *buf,
> > if (ret > 0)
> > perf_check_microcode();
> >
> > + /* check spec_ctrl capabilities */
> > + mutex_lock(&spec_ctrl_mutex);
> > + init_scattered_cpuid_features(&boot_cpu_data);
>
> No need for that - make a specific function like perf_check_microcode()
> which checks only the IBRS bit and updates stuff accordingly.
It would be better I agree. I've got this:
void spec_ctrl_rescan_cpuid(void)
{
int cpu;
if (use_ibp_disable)
return;
mutex_lock(&spec_ctrl_mutex);
if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
/* detect spec ctrl related cpuid additions */
init_scattered_cpuid_features(&boot_cpu_data);
spec_ctrl_init(&boot_cpu_data);
/*
* The SPEC_CTRL and IBPB_SUPPORT cpuid bits may have
* just been set in the boot_cpu_data, transfer them
* to the per-cpu data too. This must run after
* spec_ctrl_init() to take care of
* setup_force_cpu_cap() too.
*/
if (cpu_has_spec_ctrl())
for_each_online_cpu(cpu)
set_cpu_cap(&cpu_data(cpu),
X86_FEATURE_SPEC_CTRL);
if (boot_cpu_has(X86_FEATURE_IBPB_SUPPORT))
for_each_online_cpu(cpu)
set_cpu_cap(&cpu_data(cpu),
X86_FEATURE_IBPB_SUPPORT);
}
mutex_unlock(&spec_ctrl_mutex);
}
However we have to start somewhere, so that is a simpler start.
On Thu, Jan 04, 2018 at 09:56:47AM -0800, Tim Chen wrote:
> There are 2 ways to control IBRS
>
> 1. At boot time
> noibrs kernel boot parameter will disable IBRS usage
>
> Otherwise if the above parameters are not specified, the system
> will enable ibrs and ibpb usage if the cpu supports it.
>
> 2. At run time
> echo 0 > /sys/kernel/debug/ibrs_enabled will turn off IBRS
> echo 1 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in kernel
> echo 2 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in both userspace and kernel
debug/x86/?
Reviewed-by: Andrea Arcangeli <[email protected]>
Thanks,
Andrea
On 01/04/2018 10:33 AM, Borislav Petkov wrote:
>> 2. At run time
>> echo 0 > /sys/kernel/debug/ibrs_enabled will turn off IBRS
>> echo 1 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in kernel
>> echo 2 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in both userspace and kernel
> I am not sure that tristate is really needed. What's wrong with on/off
> only?
Lots of things:
Distros have the tri-state already deployed.
Paranoid people want "IBRS always" aka "ibrs 2".
Future CPUs may have IBRS cheap enough that it is not worth switching.
On Thu, Jan 04, 2018 at 07:33:45PM +0100, Borislav Petkov wrote:
> On Thu, Jan 04, 2018 at 09:56:47AM -0800, Tim Chen wrote:
> > There are 2 ways to control IBRS
> >
> > 1. At boot time
> > noibrs kernel boot parameter will disable IBRS usage
> >
> > Otherwise if the above parameters are not specified, the system
> > will enable ibrs and ibpb usage if the cpu supports it.
> >
> > 2. At run time
> > echo 0 > /sys/kernel/debug/ibrs_enabled will turn off IBRS
> > echo 1 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in kernel
> > echo 2 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in both userspace and kernel
>
> I am not sure that tristate is really needed. What's wrong with on/off
> only?
well, tristate is to provide a new feature (which is also, incidentally,
the mode new silicon will prefer to run in).
> Also, stuff like that:
>
> +/* mutex to serialize IBRS control changes */
> +DEFINE_MUTEX(spec_ctrl_mutex);
> +EXPORT_SYMBOL(spec_ctrl_mutex);
>
> looks ugly.
Consolidating in arch/x86/kernel/spec_ctrl.c would allow removing that
export.
Here I've got:
static DEFINE_MUTEX(spec_ctrl_mutex);
> Wrapper functions which everything calls and they hide internals are
> much more preferrable.
Agreed.
> And then, if at all, this needs to be connected to the retpolines fun,
> methinks, so that it can be decided at boot what to use.
Turning this off at any time is very easy; making retpoline
runtime-disabled is more difficult.
Thanks,
Andrea
On Thu, 4 Jan 2018 09:56:46 -0800
Tim Chen <[email protected]> wrote:
> From: David Woodhouse <[email protected]>
>
> We are impervious to the indirect branch prediction attack with retpoline
> but firmware won't be, so we still need to set IBRS to protect
> firmware code execution when calling into firmware at runtime.
If you are going to care about APM then you also need to care about
BIOS32 interfaces (arch/x86/pc/pcibios.c) and PNPBIOS
(drivers/pnp/pnpbios/bioscalls.c)
Alan
On Thu, Jan 04, 2018 at 07:34:30PM +0100, Andrea Arcangeli wrote:
> void spec_ctrl_rescan_cpuid(void)
> {
> int cpu;
>
> if (use_ibp_disable)
> return;
> mutex_lock(&spec_ctrl_mutex);
> if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
> boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
> /* detect spec ctrl related cpuid additions */
> init_scattered_cpuid_features(&boot_cpu_data);
You don't need to noodle through all the scattered features - just the
two bits.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, Jan 04, 2018 at 10:36:30AM -0800, Dave Hansen wrote:
> Distros have the tri-state already deployed.
So that's not really a reason.
> Paranoid people want "IBRS always" aka "ibrs 2".
So why not "IBRS always" or off? No need for the "IBRS only in the
kernel" setting.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, 4 Jan 2018 09:56:47 -0800
Tim Chen <[email protected]> wrote:
> There are 2 ways to control IBRS
>
> 1. At boot time
> noibrs kernel boot parameter will disable IBRS usage
>
> Otherwise if the above parameters are not specified, the system
> will enable ibrs and ibpb usage if the cpu supports it.
>
> 2. At run time
> echo 0 > /sys/kernel/debug/ibrs_enabled will turn off IBRS
> echo 1 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in kernel
> echo 2 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in both userspace and kernel
>
> The implementation was updated with input from Andrea Arcangeli.
>
Why are these under debug ? This isn't a debugging thing.
Alan
On 01/04/2018 10:38 AM, Andrea Arcangeli wrote:
>> And then, if at all, this needs to be connected to the retpolines fun,
>> methinks, so that it can be decided at boot what to use.
Yes, we need to reconcile this with the retpoline. I've tried to
capture what we do where in here:
> https://docs.google.com/document/d/e/2PACX-1vSMrwkaoSUBAFc6Fjd19F18c1O9pudkfAY-7lGYGOTN8mc9ul-J6pWadcAaBJZcVA7W_3jlLKRtKRbd/pub
Any feedback on this, especially on the CPUs that aren't specifically
covered would be welcome.
On Thu, Jan 04, 2018 at 07:38:26PM +0100, Andrea Arcangeli wrote:
> Consolidating in arch/x86/kernel/spec_ctrl.c would allow removing that
> export.
>
> Here I've got:
>
> static DEFINE_MUTEX(spec_ctrl_mutex);
Yap.
> Turning this off at any time is very easy, making reptoline runtime
> disabled is more difficult.
So dwmw2 is doing alternatives. Seems workable to me.
Also, you don't want to slap all those different settings in front of
the user and then expect her to make an informed choice in all cases.
So it should be a very simple first set of choices what to set and then
add more only when really needed and when it really makes sense.
IMO.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
+ Tom.
On Thu, Jan 04, 2018 at 10:54:38AM -0800, Dave Hansen wrote:
> On 01/04/2018 10:38 AM, Andrea Arcangeli wrote:
> >> And then, if at all, this needs to be connected to the retpolines fun,
> >> methinks, so that it can be decided at boot what to use.
> Yes, we need to reconcile this with the retpoline. I've tried to
> capture what we do where in here:
>
> > https://docs.google.com/document/d/e/2PACX-1vSMrwkaoSUBAFc6Fjd19F18c1O9pudkfAY-7lGYGOTN8mc9ul-J6pWadcAaBJZcVA7W_3jlLKRtKRbd/pub
>
> Any feedback on this, especially on the CPUs that aren't specifically
> covered would be welcome.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, Jan 04, 2018 at 07:52:19PM +0100, Borislav Petkov wrote:
> So why not "IBRS always" or off? No need for the "IBRS only in the
> kernel" setting.
Because it's slower (or much slower, depending on how much stuff the
microcode has to disable in the CPU to provide IBRS), and you only need
that kind of protection in the kernel if you have PTI enabled already.
ibrs 1 (not 2) is the current default because of that.
On 01/04/2018 10:52 AM, Borislav Petkov wrote:
>> Paranoid people want "IBRS always" aka "ibrs 2".
>
> So why not "IBRS always" or off? No need for the "IBRS only in the
> kernel" setting.
IBRS=1 slows execution down. If it's on all the time, you pay a
performance cost in userspace. The assumption is that the user/kernel
boundary switching cost is below the cost of having it on all the time.
On Thu, Jan 4, 2018 at 9:56 AM, Tim Chen <[email protected]> wrote:
>
> Speculation on Skylake and later requires these patches ("dynamic IBRS")
> be used instead of retpoline[1].
Can somebody explain this part?
I was assuming that retpoline would work around this issue on all uarchs.
This seems to say "retpoline does nothing on Skylake+"
Linus
On 01/04/2018 10:34 AM, Andrea Arcangeli wrote:
> On Thu, Jan 04, 2018 at 09:56:47AM -0800, Tim Chen wrote:
>> There are 2 ways to control IBRS
>>
>> 1. At boot time
>> noibrs kernel boot parameter will disable IBRS usage
>>
>> Otherwise if the above parameters are not specified, the system
>> will enable ibrs and ibpb usage if the cpu supports it.
>>
>> 2. At run time
>> echo 0 > /sys/kernel/debug/ibrs_enabled will turn off IBRS
>> echo 1 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in kernel
>> echo 2 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in both userspace and kernel
>
> debug/x86/?
>
> Reviewed-by: Andrea Arcangeli <[email protected]>
>
> Thanks,
> Andrea
>
Yes, will correct.
Thanks.
On Thu, Jan 4, 2018 at 11:56 AM, Tim Chen <[email protected]> wrote:
> This patch series enables the basic detection and usage of x86 indirect
> branch speculation feature. It enables the indirect branch restricted
> speculation (IBRS) on kernel entry and disables it on exit.
> It enumerates the indirect branch prediction barrier (IBPB).
>
> The x86 IBRS feature requires corresponding microcode support.
> It mitigates the variant 2 vulnerability described in
> https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
>
Are there plans to make the corresponding microcode support available?
On Thu, Jan 04, 2018 at 10:59:35AM -0800, Dave Hansen wrote:
> On 01/04/2018 10:52 AM, Borislav Petkov wrote:
> >> Paranoid people want "IBRS always" aka "ibrs 2".
> >
> > So why not "IBRS always" or off? No need for the "IBRS only in the
> > kernel" setting.
>
> IBRS=1 slows execution down. If it's on all the time, you pay a
> performance cost in userspace. The assumption is that the user/kernel
> boundary switching cost is below the cost of having it on all the time.
Ok, so we need to make this distinction visible to users so that they
are aware.
Also, adding a common, combined option which is called something like
make_my_kernel_supa_dupa_secure
or so would make sense too, for all those paranoid people. Because I see
the mess coming:
ibrs= ibpb= pti= ...
A helluva confusion that would be.
Thx.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 01/04/2018 11:05 AM, Justin Forbes wrote:
> On Thu, Jan 4, 2018 at 11:56 AM, Tim Chen <[email protected]> wrote:
>> This patch series enables the basic detection and usage of x86 indirect
>> branch speculation feature. It enables the indirect branch restricted
>> speculation (IBRS) on kernel entry and disables it on exit.
>> It enumerates the indirect branch prediction barrier (IBPB).
>>
>> The x86 IBRS feature requires corresponding microcode support.
>> It mitigates the variant 2 vulnerability described in
>> https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
>>
>
> Are there plans to make the corresponding microcode support available?
>
The microcode patches should be released soon.
Tim
On 01/04/2018 10:50 AM, Borislav Petkov wrote:
> On Thu, Jan 04, 2018 at 07:34:30PM +0100, Andrea Arcangeli wrote:
>> void spec_ctrl_rescan_cpuid(void)
>> {
>> int cpu;
>>
>> if (use_ibp_disable)
>> return;
>> mutex_lock(&spec_ctrl_mutex);
>> if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
>> boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
>> /* detect spec ctrl related cpuid additions */
>> init_scattered_cpuid_features(&boot_cpu_data);
>
> You don't need to noodle through all the scattered features - just the
> two bits.
>
Sure. Will update in rev 2.
Tim
On Thu, 2018-01-04 at 11:00 -0800, Linus Torvalds wrote:
> On Thu, Jan 4, 2018 at 9:56 AM, Tim Chen <[email protected]>
> wrote:
> >
> >
> > Speculation on Skylake and later requires these patches ("dynamic
> > IBRS")
> > be used instead of retpoline[1].
> Can somebody explain this part?
>
> I was assuming that retpoline would work around this issue on all
> uarchs.
>
> This seems to say "retpoline does nothing on Skylake+"
On Skylake the target for a 'ret' instruction may also come from the
BTB. So if you ever let the RSB (which remembers where the 'call's came
from) get empty, you end up vulnerable.
Other than the obvious call stack of more than 16 calls in depth,
there's also a big list of other things which can empty the RSB,
including an SMI.
Which basically makes retpoline on Skylake+ *very* hard to use
reliably. The plan is to use IBRS there and not retpoline.
On Thu, Jan 4, 2018 at 11:19 AM, David Woodhouse <[email protected]> wrote:
>
> On Skylake the target for a 'ret' instruction may also come from the
> BTB. So if you ever let the RSB (which remembers where the 'call's came
> from) get empty, you end up vulnerable.
That sounds like it could cause mispredicts, but it doesn't sound _exploitable_.
Sure, interrupts in between the call instruction and the 'ret' could
overflow the return stack. And we could migrate to another CPU. And so
apparently SMM clears the return stack too.
... but again, none of them sound even remotely _exploitable_.
Remember: it's not mispredicts that leak information. It's *exploits*
that use forced very specific mispredicts to leak information.
There's a big difference there. And I think patch authors should keep
that difference in mind.
For example, flushing the BTB at kernel entry doesn't mean that later
in-kernel indirect branches don't get predicted, and doesn't even mean
that they don't get mis-predicted. It only means that an exploit can't
pre-populate those things and use them for exploits.
Linus
On Thu, 2018-01-04 at 11:33 -0800, Linus Torvalds wrote:
> On Thu, Jan 4, 2018 at 11:19 AM, David Woodhouse <[email protected]> wrote:
> >
> > On Skylake the target for a 'ret' instruction may also come from the
> > BTB. So if you ever let the RSB (which remembers where the 'call's came
> > from) get empty, you end up vulnerable.
>
> That sounds like it could cause mispredicts, but it doesn't sound _exploitable_.
>
> Sure, interrupts in between the call instruction and the 'ret' could
> overflow the return stack. And we could migrate to another CPU. And so
> apparently SMM clears the return stack too.
>
> ... but again, none of them sound even remotely _exploitable_.
The concern is that the attacker could poison the BTB for a 'ret'
instruction, as in the general case of the SP2 (conditional branch
misprediction) attack, so that it predicts a branch to an address of
the attacker's choice.
Now *most* of the time, one might expect the target for that 'ret' to
come from the RSB. But if there is a way to force the RSB to empty, or
the attacker is just happy to keep trying, and wait for things like SMI
to make it work every now and then, then it *might* be exploitable.
It's quite possible that a proof exists that all the above is *so*
hypothetical and unlikely, that we might as well use retpoline on
Skylake too. So far, nobody's proved it sufficiently; that's all.
So we're erring on the side of caution there.
On 04/01/18 19:33, Linus Torvalds wrote:
> On Thu, Jan 4, 2018 at 11:19 AM, David Woodhouse <[email protected]> wrote:
>> On Skylake the target for a 'ret' instruction may also come from the
>> BTB. So if you ever let the RSB (which remembers where the 'call's came
>> from) get empty, you end up vulnerable.
> That sounds like it could cause mispredicts, but it doesn't sound _exploitable_.
>
> Sure, interrupts in between the call instruction and the 'ret' could
> overflow the return stack. And we could migrate to another CPU. And so
> apparently SMM clears the return stack too.
>
> ... but again, none of them sound even remotely _exploitable_.
>
> Remember: it's not mispredicts that leak information. It's *exploits"
> that use forced very specific mispredicts to leak information.
>
> There's a big difference there. And I think patch authors should keep
> that difference in mind.
>
> For example, flushing the BTB at kernel entry doesn't mean that later
> in-kernel indirect branches don't get predicted, and doesn't even mean
> that they don't get mis-predicted. It only means that an exploit can't
> pre-populate those things and use them for exploits.
Retpoline as a mitigation strategy swaps indirect branches for returns,
to avoid using predictions which come from the BTB, as they can be
poisoned by an attacker.
The problem with Skylake+ is that an RSB underflow falls back to using a
BTB prediction, which allows the attacker to take control of speculation.
Also remember that sibling threads share a BTB, so you can't rely on an
isolated straight-line codepath on the current CPU for safety (e.g. by
issuing an IBPB on every entry to supervisor mode).
~Andrew
On Thu, 2018-01-04 at 19:40 +0000, Andrew Cooper wrote:
>
> Also remember that sibling threads share a BTB, so you can't rely on
> isolated straight-line codepath on the current cpu for safety. (e.g. by
> issuing an IBPB on every entry to supervisor mode).
That is just one of a whole litany of reasons the RSB gets flushed,
*many* of which were individually sufficient for us to throw our toys
out of the pram and say "OK, this doesn't work on Skylake".
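For reference, "overwriting RSB entries" (RSB stuffing) looks roughly
like the following sketch; this is a loop-based variant for
illustration only, and the entry count and register choice here are
arbitrary:

/*
 * Stuff the RSB with 16 harmless entries so that a later underflow
 * cannot fall back to a stale or attacker-controlled prediction. Each
 * 'call' pushes one RSB entry; the architectural return address is then
 * discarded with 'add', while the RSB entry remains.
 */
asm(
"	.text\n"
"rsb_stuff_sketch:\n"
"	push	%rbx\n"
"	mov	$16, %ebx\n"
"1:	call	2f\n"
"3:	pause\n"
"	lfence\n"
"	jmp	3b\n"		/* capture speculation of the paired rets */
"2:	add	$8, %rsp\n"	/* drop return address, keep the RSB entry */
"	dec	%ebx\n"
"	jnz	1b\n"
"	pop	%rbx\n"
"	ret\n"
);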
On Thu, Jan 04, 2018 at 09:56:42AM -0800, Tim Chen wrote:
> cpuid ax=0x7, return rdx bit 26 to indicate presence of this feature
> IA32_SPEC_CTRL (0x48) and IA32_PRED_CMD (0x49)
> IA32_SPEC_CTRL, bit0 – Indirect Branch Restricted Speculation (IBRS)
> IA32_PRED_CMD, bit0 – Indirect Branch Prediction Barrier (IBPB)
>
> If IBRS is set, near returns and near indirect jumps/calls will not
> allow their predicted target address to be controlled by code that
> executed in a less privileged prediction mode before the IBRS mode was
> last written with a value of 1 or on another logical processor so long
> as all RSB entries from the previous less privileged prediction mode
> are overwritten.
>
> Setting of IBPB ensures that earlier code's behavior does not control later
> indirect branch predictions. It is used when context switching to new
> untrusted address space. Unlike IBRS, it is a command MSR and does not retain
> its state.
>
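In code terms, "command MSR" means the barrier is a single
fire-and-forget write; a sketch (MSR_IA32_PRED_CMD is from the
msr-index.h hunk in this patch, while FEATURE_SET_IBPB is an
illustrative stand-in name for its bit-0 define):

#include <asm/msr.h>

#define FEATURE_SET_IBPB (1ULL << 0)	/* stand-in for the bit-0 define */

/* IA32_PRED_CMD (0x49) is write-only and non-sticky: writing bit 0
 * injects a one-shot prediction barrier; nothing is cleared later. */
static inline void issue_ibpb(void)
{
	native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
}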
> * Thus a near indirect jump/call/return may be affected by code in a
> less privileged prediction mode that executed AFTER IBRS mode was last
> written with a value of 1
>
> * There is no need to clear IBRS before writing it with a value of
> 1. Unconditionally writing it with a value of 1 after the prediction
> mode change is sufficient
>
> * Note: IBRS is not required in order to isolate branch predictions for
> SMM or SGX enclaves
>
> * Code executed by a sibling logical processor cannot control indirect
> jump/call/return predicted target when IBRS is set
>
> * SMEP will prevent supervisor mode using RSB entries filled by user code;
> this can reduce the need for software to overwrite RSB entries
>
> * IBRS is not guaranteed to differentiate two applications that use
> the same CR3 due to recycling. Software can use an IBPB command when
> recycling a page table base address.
>
> * VMM software can similarly use an IBPB when recycling a controlling
> VMCS pointer address
>
> CPU performance could be reduced when running with IBRS set.
>
> Signed-off-by: Tim Chen <[email protected]>
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/msr-index.h | 7 +++++++
> arch/x86/kernel/cpu/scattered.c | 1 +
> 3 files changed, 9 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 86c68cb..431f393 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -209,6 +209,7 @@
> #define X86_FEATURE_AVX512_4FMAPS ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */
>
> #define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
> +#define X86_FEATURE_SPEC_CTRL ( 7*32+19) /* Control Speculation Control */
>
> /* Virtualization flags: Linux defined, word 8 */
> #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
You should have gotten a build warning with just this patch, please also
update tools/arch/x86/include/asm/cpufeatures.h to fix that.
And why not use a free slot, (7*32+13) or (7*32+12) is free, right?
Or were you just trying to make backports "easier"? :)
thanks,
greg k-h
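As an aside, the enumeration quoted above is easy to sanity-check from
userspace with a throwaway C program (a sketch; it assumes GCC's
<cpuid.h> and that CPUID leaf 7 exists on the machine):

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID leaf 7, subleaf 0: EDX bit 26 enumerates IBRS/IBPB */
	__cpuid_count(7, 0, eax, ebx, ecx, edx);
	printf("SPEC_CTRL (IBRS/IBPB): %s\n",
	       (edx & (1u << 26)) ? "present" : "absent");
	return 0;
}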
On Thu, Jan 04, 2018 at 09:56:44AM -0800, Tim Chen wrote:
>
> That is a minor inefficiency only, but we can eliminate it by saving
> the MSR when entering the NMI in save_paranoid and restoring it when
> exiting the NMI.
Any hints as to what exactly "minor" means in cycles here? :)
thanks,
greg k-h
On Thu, Jan 04, 2018 at 09:56:46AM -0800, Tim Chen wrote:
> From: David Woodhouse <[email protected]>
>
> We are impervious to the indirect branch prediction attack with retpoline
> but firmware won't be, so we still need to set IBRS to protect
> firmware code execution when calling into firmware at runtime.
Wait, what?
Maybe it's just the wine from dinner talking, but if the firmware has
issues, we have bigger things to worry about here, right? It already
handed over the "chain of trust" to us, so we have already implicitly
trusted that the firmware was correct here. So why do we need to do
anything about firmware calls in this manner?
Or am I totally missing something else here?
thanks,
greg k-h
On Thu, 2018-01-04 at 21:05 +0100, Greg KH wrote:
> On Thu, Jan 04, 2018 at 09:56:46AM -0800, Tim Chen wrote:
> >
> > From: David Woodhouse <[email protected]>
> >
> > We are impervious to the indirect branch prediction attack with
> > retpoline
> > but firmware won't be, so we still need to set IBRS to protect
> > firmware code execution when calling into firmware at runtime.
> Wait, what?
>
> Maybe it's just the wine from dinner talking, but if the firmware has
> issues, we have bigger things to worry about here, right? It already
> handed over the "chain of trust" to us, so we have already implicitly
> trusted that the firmware was correct here. So why do we need to do
> anything about firmware calls in this manner?
In the ideal world, firmware exists to boot the kernel and then it gets
out of the way, never to be thought of again.
In the Intel world, firmware idiocy permeates everything and we
sometimes end up making calls to it at runtime.
If an attacker can poison the BTB for an indirect branch in EFI
firmware, then reliably do something which calls EFI runtime calls,
that might lead to an exploit. So if we're using retpoline for the
kernel, then we should be setting IBRS before any firmware calls.
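Concretely, with the helpers this series introduces, a runtime service
call would be bracketed roughly as follows (a sketch: the begin/end
helpers are from the series, while the wrapper and the choice of
efi.get_time are made up for illustration):

#include <linux/efi.h>
#include <asm/spec_ctrl.h>

/* Set IBRS=1 around the firmware call, since retpoline protects only
 * kernel-built code; drop back to retpoline-protected mode after. */
static efi_status_t protected_get_time(efi_time_t *tm, efi_time_cap_t *tc)
{
	efi_status_t status;

	unprotected_firmware_begin();
	status = efi.get_time(tm, tc);
	unprotected_firmware_end();

	return status;
}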
On Thu, Jan 04, 2018 at 09:56:47AM -0800, Tim Chen wrote:
> There are 2 ways to control IBRS
>
> 1. At boot time
> noibrs kernel boot parameter will disable IBRS usage
>
> Otherwise if the above parameters are not specified, the system
> will enable ibrs and ibpb usage if the cpu supports it.
>
> 2. At run time
> echo 0 > /sys/kernel/debug/ibrs_enabled will turn off IBRS
> echo 1 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in kernel
> echo 2 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in both userspace and kernel
>
> The implementation was updated with input from Andrea Arcangeli.
>
> Signed-off-by: Tim Chen <[email protected]>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 4 +
> arch/x86/include/asm/spec_ctrl.h | 163 +++++++++++++++++++-----
> arch/x86/kernel/cpu/Makefile | 1 +
> arch/x86/kernel/cpu/scattered.c | 10 ++
> arch/x86/kernel/cpu/spec_ctrl.c | 124 ++++++++++++++++++
> 5 files changed, 270 insertions(+), 32 deletions(-)
> create mode 100644 arch/x86/kernel/cpu/spec_ctrl.c
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 5dfd262..d64f49f 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2573,6 +2573,10 @@
> noexec=on: enable non-executable mappings (default)
> noexec=off: disable non-executable mappings
>
> + noibrs [X86]
> + Don't use indirect branch restricted speculation (IBRS)
> + feature.
> +
> nosmap [X86]
> Disable SMAP (Supervisor Mode Access Prevention)
> even if it is supported by processor.
> diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
> index 23b2804..2c35571 100644
> --- a/arch/x86/include/asm/spec_ctrl.h
> +++ b/arch/x86/include/asm/spec_ctrl.h
> @@ -1,13 +1,17 @@
> #ifndef _ASM_X86_SPEC_CTRL_H
> #define _ASM_X86_SPEC_CTRL_H
>
> -#include <linux/stringify.h>
> #include <asm/msr-index.h>
> #include <asm/cpufeatures.h>
> -#include <asm/alternative-asm.h>
> +
> +#define SPEC_CTRL_IBRS_INUSE (1<<0) /* OS enables IBRS usage */
> +#define SPEC_CTRL_IBRS_SUPPORTED (1<<1) /* System supports IBRS */
> +#define SPEC_CTRL_IBRS_ADMIN_DISABLED (1<<2) /* Admin disables IBRS */
>
> #ifdef __ASSEMBLY__
>
> +.extern spec_ctrl_ibrs
> +
> .macro PUSH_MSR_REGS
> pushq %rax
> pushq %rcx
> @@ -27,35 +31,63 @@
> .endm
>
> .macro ENABLE_IBRS
> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
> + jz .Lskip_\@
> +
> PUSH_MSR_REGS
> WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
> POP_MSR_REGS
> -10:
> +
> + jmp .Ldone_\@
> +.Lskip_\@:
> + /*
> + * prevent speculation beyond here as we could want to
> + * stop speculation by enabling IBRS
> + */
> + lfence
> +.Ldone_\@:
> .endm
>
> .macro DISABLE_IBRS
> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
> + jz .Lskip_\@
> +
> PUSH_MSR_REGS
> WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_DISABLE_IBRS
> POP_MSR_REGS
> -10:
> +
> +.Lskip_\@:
> .endm
>
> .macro ENABLE_IBRS_CLOBBER
> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
> + jz .Lskip_\@
> +
> WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
> -10:
> +
> + jmp .Ldone_\@
> +.Lskip_\@:
> + /*
> + * prevent speculation beyond here as we could want to
> + * stop speculation by enabling IBRS
> + */
> + lfence
> +.Ldone_\@:
> .endm
>
> .macro DISABLE_IBRS_CLOBBER
> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
> + jz .Lskip_\@
> +
> WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_DISABLE_IBRS
> -10:
> +
> +.Lskip_\@:
> .endm
>
> .macro ENABLE_IBRS_SAVE_AND_CLOBBER save_reg:req
> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
> + jz .Lskip_\@
> +
> movl $MSR_IA32_SPEC_CTRL, %ecx
> rdmsr
> movl %eax, \save_reg
> @@ -63,22 +95,103 @@
> movl $0, %edx
> movl $SPEC_CTRL_FEATURE_ENABLE_IBRS, %eax
> wrmsr
> -10:
> +
> + jmp .Ldone_\@
> +.Lskip_\@:
> + /*
> + * prevent speculation beyond here as we could want to
> + * stop speculation by enabling IBRS
> + */
> + lfence
> +.Ldone_\@:
> .endm
>
> .macro RESTORE_IBRS_CLOBBER save_reg:req
> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
> + jz .Lskip_\@
> +
> /* Set IBRS to the value saved in the save_reg */
> movl $MSR_IA32_SPEC_CTRL, %ecx
> movl $0, %edx
> movl \save_reg, %eax
> wrmsr
> -10:
> +
> + jmp .Ldone_\@
> +.Lskip_\@:
> + /*
> + * prevent speculation beyond here as we could want to
> + * stop speculation by enabling IBRS
> + */
> + lfence
> +.Ldone_\@:
> .endm
>
> #else
> #include <asm/microcode.h>
>
> +extern int spec_ctrl_ibrs;
> +extern struct mutex spec_ctrl_mutex;
> +extern unsigned int ibrs_enabled;
> +
> +enum {
> + IBRS_DISABLED,
> + /* in host kernel, disabled in guest and userland */
> + IBRS_ENABLED,
> + /* in host kernel and host userland, disabled in guest */
> + IBRS_ENABLED_USER,
> + IBRS_MAX = IBRS_ENABLED_USER,
Yup, not a bit field, your documentation is wrong :(
> +};
> +
> +
> +static inline void set_ibrs_inuse(void)
> +{
> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
> + spec_ctrl_ibrs |= SPEC_CTRL_IBRS_INUSE;
> +}
> +
> +static inline void clear_ibrs_inuse(void)
> +{
> + spec_ctrl_ibrs &= ~SPEC_CTRL_IBRS_INUSE;
> +}
> +
> +static inline int ibrs_inuse(void)
> +{
> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_INUSE)
> + return 1;
> + else
> + /*
> + * prevent speculation beyond here as we could want to
> + * stop speculation by enabling IBRS with this check
> + */
> + rmb();
> + return 0;
> +}
> +
> +static inline void set_ibrs_supported(void)
> +{
> + spec_ctrl_ibrs |= SPEC_CTRL_IBRS_SUPPORTED;
> + if (!(spec_ctrl_ibrs & SPEC_CTRL_IBRS_ADMIN_DISABLED))
> + set_ibrs_inuse();
> + else
> + /*
> + * prevent speculation beyond here as we could want to
> + * stop speculation by enabling IBRS
> + */
> + rmb();
> +}
> +
> +static inline void set_ibrs_disabled(void)
> +{
> + spec_ctrl_ibrs |= SPEC_CTRL_IBRS_ADMIN_DISABLED;
> + if (ibrs_inuse())
> + clear_ibrs_inuse();
> +}
> +
> +static inline void clear_ibrs_disabled(void)
> +{
> + spec_ctrl_ibrs &= ~SPEC_CTRL_IBRS_ADMIN_DISABLED;
> +}
> +
> static inline void __disable_indirect_speculation(void)
> {
> native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);
> @@ -96,21 +209,14 @@ static inline void __enable_indirect_speculation(void)
> static inline void unprotected_speculation_begin(void)
> {
> WARN_ON_ONCE(!irqs_disabled());
> - if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> + if (ibrs_inuse())
> __enable_indirect_speculation();
> }
>
> static inline void unprotected_speculation_end(void)
> {
> - if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> + if (ibrs_inuse())
> __disable_indirect_speculation();
> - else
> - /*
> - * If we intended to disable indirect speculation
> - * but come here due to mis-speculation, we need
> - * to stop the mis-speculation with rmb.
> - */
> - rmb();
> }
>
>
> @@ -121,20 +227,13 @@ static inline void unprotected_speculation_end(void)
> */
> static inline void unprotected_firmware_begin(void)
> {
> - if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> + if (ibrs_inuse())
> __disable_indirect_speculation();
> - else
> - /*
> - * If we intended to disable indirect speculation
> - * but come here due to mis-speculation, we need
> - * to stop the mis-speculation with rmb.
> - */
> - rmb();
> }
>
> static inline void unprotected_firmware_end(void)
> {
> - if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> + if (ibrs_inuse())
> __enable_indirect_speculation();
> }
>
> diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
> index 90cb82d..a25f1ab 100644
> --- a/arch/x86/kernel/cpu/Makefile
> +++ b/arch/x86/kernel/cpu/Makefile
> @@ -24,6 +24,7 @@ obj-y += match.o
> obj-y += bugs.o
> obj-$(CONFIG_CPU_FREQ) += aperfmperf.o
> obj-y += cpuid-deps.o
> +obj-y += spec_ctrl.o
>
> obj-$(CONFIG_PROC_FS) += proc.o
> obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index bc50c40..2d23a2fe 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -8,6 +8,7 @@
> #include <asm/processor.h>
>
> #include <asm/apic.h>
> +#include <asm/spec_ctrl.h>
>
> struct cpuid_bit {
> u16 feature;
> @@ -56,6 +57,15 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
>
> if (regs[cb->reg] & (1 << cb->bit))
> set_cpu_cap(c, cb->feature);
> +
> + }
> + if (!c->cpu_index) {
> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> + printk(KERN_INFO "FEATURE SPEC_CTRL Present\n");
> + set_ibrs_supported();
> + if (ibrs_inuse())
> + ibrs_enabled = IBRS_ENABLED;
> + }
> }
> }
>
> diff --git a/arch/x86/kernel/cpu/spec_ctrl.c b/arch/x86/kernel/cpu/spec_ctrl.c
> new file mode 100644
> index 0000000..6946678
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/spec_ctrl.c
> @@ -0,0 +1,124 @@
> +#include <linux/export.h>
No copyright or SPDX line?
At least it doesn't have the old horrid Intel header boilerplate, so I
should be thankful of that. But it isn't ok like this either, sorry.
> +#include <linux/init.h>
> +#include <linux/mutex.h>
> +#include <linux/debugfs.h>
> +#include <linux/uaccess.h>
> +
> +#include <asm/spec_ctrl.h>
> +#include <asm/cpufeature.h>
> +
> +/*
> + * spec_ctrl_ibrs
> + * bit 0 = indicate if ibrs is currently in use
> + * bit 1 = indicate if system supports ibrs
> + * bit 2 = indicate if admin disables ibrs
Why bits and not integer values? Can you mix them?
> + */
> +
> +int spec_ctrl_ibrs;
> +EXPORT_SYMBOL(spec_ctrl_ibrs);
Why is this exported? What module will ever need this?
And horrid global symbol name, it doesn't say what it is in an obvious
way just by looking at it.
> +
> +/* mutex to serialize IBRS control changes */
> +DEFINE_MUTEX(spec_ctrl_mutex);
> +EXPORT_SYMBOL(spec_ctrl_mutex);
Why export this? You only ever use it in one place? Again, what module
needs it?
And I have to ask, EXPORT_SYMBOL_GPL() for this and spec_ctrl_ibrs as
well?
> +unsigned int ibrs_enabled __read_mostly;
> +EXPORT_SYMBOL(ibrs_enabled);
Again, what module needs this?
And did __read_mostly really matter in performance tests?
> +
> +static int __init noibrs(char *str)
> +{
> + set_ibrs_disabled();
> +
> + return 0;
> +}
> +early_param("noibrs", noibrs);
> +
> +static ssize_t __enabled_read(struct file *file, char __user *user_buf,
> + size_t count, loff_t *ppos, unsigned int *field)
> +{
> + char buf[32];
> + unsigned int len;
> +
> + len = sprintf(buf, "%d\n", READ_ONCE(*field));
> + return simple_read_from_buffer(user_buf, count, ppos, buf, len);
> +}
> +
> +static ssize_t ibrs_enabled_read(struct file *file, char __user *user_buf,
> + size_t count, loff_t *ppos)
> +{
> + return __enabled_read(file, user_buf, count, ppos, &ibrs_enabled);
> +}
It's a single variable, why such complex debugfs for that? This should
be handled by a helper macro already, right?
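Presumably the helper being alluded to is debugfs_create_u32(), which
collapses the read-only side to one call (a sketch; the write path
below would still need its custom validation):

#include <linux/debugfs.h>
#include <linux/init.h>

static u32 ibrs_enabled_dbg;	/* stand-in for the patch's ibrs_enabled */

static int __init ibrs_debugfs_init(void)
{
	/* replaces __enabled_read()/ibrs_enabled_read() in one call */
	debugfs_create_u32("ibrs_enabled", 0444, NULL, &ibrs_enabled_dbg);
	return 0;
}
late_initcall(ibrs_debugfs_init);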
> +static void spec_ctrl_flush_all_cpus(u32 msr_nr, u64 val)
> +{
> + int cpu;
> + get_online_cpus();
> + for_each_online_cpu(cpu)
> + wrmsrl_on_cpu(cpu, msr_nr, val);
> + put_online_cpus();
> +}
> +
> +static ssize_t ibrs_enabled_write(struct file *file,
> + const char __user *user_buf,
> + size_t count, loff_t *ppos)
> +{
> + char buf[32];
> + ssize_t len;
> + unsigned int enable;
> +
> + len = min(count, sizeof(buf) - 1);
> + if (copy_from_user(buf, user_buf, len))
> + return -EFAULT;
> +
> + buf[len] = '\0';
> + if (kstrtouint(buf, 0, &enable))
> + return -EINVAL;
3 value "flags" are horrid :(
> +
> + if (enable > IBRS_MAX)
> + return -EINVAL;
> +
> + mutex_lock(&spec_ctrl_mutex);
Always run checkpatch.pl so you don't get grumpy kernel maintainers
telling you to run checkpatch.pl :(
> +
> + if (enable == IBRS_DISABLED) {
> + /* disable IBRS usage */
> + set_ibrs_disabled();
> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
> + spec_ctrl_flush_all_cpus(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_DISABLE_IBRS);
> + } else if (enable == IBRS_ENABLED) {
> + /* enable IBRS usage in kernel */
> + clear_ibrs_disabled();
> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
> + set_ibrs_inuse();
> + else
> + /* Platform doesn't support IBRS */
> + enable = IBRS_DISABLED;
> + } else if (enable == IBRS_ENABLED_USER) {
> + /* enable IBRS usage in both userspace and kernel */
> + clear_ibrs_disabled();
> + /* don't change IBRS value once we set it to always on */
> + clear_ibrs_inuse();
> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
> + spec_ctrl_flush_all_cpus(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);
> + else
> + /* Platform doesn't support IBRS */
> + enable = IBRS_DISABLED;
> + }
> +
> + WRITE_ONCE(ibrs_enabled, enable);
It's a debugfs write callback, why do you care about WRITE_ONCE()?
thanks,
greg k-h
On 04/01/18 20:05, Greg KH wrote:
> On Thu, Jan 04, 2018 at 09:56:46AM -0800, Tim Chen wrote:
>> From: David Woodhouse <[email protected]>
>>
>> We are impervious to the indirect branch prediction attack with retpoline
>> but firmware won't be, so we still need to set IBRS to protect
>> firmware code execution when calling into firmware at runtime.
> Wait, what?
>
> Maybe it's just the wine from dinner talking, but if the firmware has
> issues, we have bigger things to worry about here, right? It already
> handed over the "chain of trust" to us, so we have already implicitly
> trusted that the firmware was correct here. So why do we need to do
> anything about firmware calls in this manner?
>
> Or am I totally missing something else here?
The firmware doesn't have to be malicious to cause problems for the OS.
There is still an open question of what happens in the RSB-to-SMM case,
where the SMM handler empties the RSB just before supervisor code
executes a ret instruction. Hardware (other than the Skylake+ case
which uses a BTB prediction) speculates to the stale top-of-RSB entry,
for want of anything better to do. (AMD have confirmed this, Intel
haven't replied to my query yet.)
Therefore, a crafty piece of userspace can stick a speculative leaky
gadget at a linear address which aliases the SMM code, and wait for an
SMI to hit.
To mitigate, a kernel has to hope that the SMM handler doesn't run in
non-identity mappings, and either rely on SMEP being active, or disallow
userspace mmap()s covering the SMM region.
True, exploiting this is probably on the upper end of the difficulty
scale here, but I'm willing to bet it's not the only unexpected
interaction going.
~Andrew
On 01/04/2018 12:00 PM, Greg KH wrote:
> On Thu, Jan 04, 2018 at 09:56:44AM -0800, Tim Chen wrote:
>>
>> That is a minor inefficiency only, but we can eliminate it by saving
>> the MSR when entering the NMI in save_paranoid and restoring it when
>> exiting the NMI.
>
> Any hints as to what exactly "minor" means in cycles here? :)
>
The current implementation does not have this inefficiency. The
comment is there to explain why we need to save the IBRS state in
save_paranoid.
The issue is that if we don't save the IBRS state for the NMI, then
for nested interrupts it is hard to figure out, when we are returning
from the NMI, whether we are returning to user space or kernel space.
And if we do the safe thing by leaving IBRS on, there is a possibility
that we may return to user space with IBRS enabled, which will affect
performance. The possibility of hitting this is minor, but we still
want to eliminate it.
Thanks.
Tim
On 01/04/2018 09:56 AM, Tim Chen wrote:
> If NMI runs when exiting kernel between IBRS_DISABLE and
> SWAPGS, the NMI would have turned on IBRS bit 0 and then it would have
> left enabled when exiting the NMI. IBRS bit 0 would then be left
> enabled in userland until the next enter kernel.
>
> That is a minor inefficiency only, but we can eliminate it by saving
> the MSR when entering the NMI in save_paranoid and restoring it when
> exiting the NMI.
Can I suggest an alternate description for the NMI case? This is
long-winded, but it should keep me from having to think through it yet
again. :)
"
The normal interrupt code uses the 'error_entry' path which uses the
Code Segment (CS) of the instruction that was interrupted to tell
whether it interrupted the kernel or userspace and thus has to switch
IBRS, or leave it alone.
The NMI code is different. It uses 'paranoid_entry' because it can
interrupt the kernel while it is running with a userspace IBRS (and %GS
and CR3) value, but has a kernel CS. If we used the same approach as
the normal interrupt code, we might do the following:
SYSENTER_entry
<-------------- NMI HERE
IBRS=1
do_something()
IBRS=0
SYSRET
The NMI code might notice that we are running in the kernel and decide
that it is OK to skip the IBRS=1. This would leave it running
unprotected with IBRS=0, which is bad.
However, if we unconditionally set IBRS=1, in the NMI, we might get the
following case:
SYSENTER_entry
IBRS=1
do_something()
IBRS=0
<-------------- NMI HERE (set IBRS=1)
SYSRET
and we would return to userspace with IBRS=1. Userspace would run
slowly until we entered and exited the kernel again. (This is the case
Tim is alluding to in the patch description).
Instead of those two approaches, we chose a third one where we simply
save the IBRS value in a scratch register (%r13) and then restore that
value, verbatim. This is what PTI does with CR3 and it works beautifully.
"
On 01/04/2018 11:58 AM, Greg KH wrote:
> On Thu, Jan 04, 2018 at 09:56:42AM -0800, Tim Chen wrote:
>> cpuid ax=0x7, return rdx bit 26 to indicate presence of this feature
>> IA32_SPEC_CTRL (0x48) and IA32_PRED_CMD (0x49)
>> IA32_SPEC_CTRL, bit0 – Indirect Branch Restricted Speculation (IBRS)
>> IA32_PRED_CMD, bit0 – Indirect Branch Prediction Barrier (IBPB)
>>
>> If IBRS is set, near returns and near indirect jumps/calls will not
>> allow their predicted target address to be controlled by code that
>> executed in a less privileged prediction mode before the IBRS mode was
>> last written with a value of 1 or on another logical processor so long
>> as all RSB entries from the previous less privileged prediction mode
>> are overwritten.
>>
>> Setting of IBPB ensures that earlier code's behavior does not control later
>> indirect branch predictions. It is used when context switching to new
>> untrusted address space. Unlike IBRS, it is a command MSR and does not retain
>> its state.
>>
>> * Thus a near indirect jump/call/return may be affected by code in a
>> less privileged prediction mode that executed AFTER IBRS mode was last
>> written with a value of 1
>>
>> * There is no need to clear IBRS before writing it with a value of
>> 1. Unconditionally writing it with a value of 1 after the prediction
>> mode change is sufficient
>>
>> * Note: IBRS is not required in order to isolate branch predictions for
>> SMM or SGX enclaves
>>
>> * Code executed by a sibling logical processor cannot control indirect
>> jump/call/return predicted target when IBRS is set
>>
>> * SMEP will prevent supervisor mode using RSB entries filled by user code;
>> this can reduce the need for software to overwrite RSB entries
>>
>> * IBRS is not guaranteed to differentiate two applications that use
>> the same CR3 due to recycling. Software can use an IBPB command when
>> recycling a page table base address.
>>
>> * VMM software can similarly use an IBPB when recycling a controlling
>> VMCS pointer address
>>
>> CPU performance could be reduced when running with IBRS set.
>>
>> Signed-off-by: Tim Chen <[email protected]>
>> ---
>> arch/x86/include/asm/cpufeatures.h | 1 +
>> arch/x86/include/asm/msr-index.h | 7 +++++++
>> arch/x86/kernel/cpu/scattered.c | 1 +
>> 3 files changed, 9 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index 86c68cb..431f393 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -209,6 +209,7 @@
>> #define X86_FEATURE_AVX512_4FMAPS ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */
>>
>> #define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
>> +#define X86_FEATURE_SPEC_CTRL ( 7*32+19) /* Control Speculation Control */
>>
>> /* Virtualization flags: Linux defined, word 8 */
>> #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
>
> You should have gotten a build warning with just this patch, please also
> update tools/arch/x86/include/asm/cpufeatures.h to fix that.
Sorry about that. I was trying to get this patchset posted quickly,
so I may have missed a few things.
>
> And why not use a free slot, (7*32+13) or (7*32+12) is free, right?
>
> Or were you just trying to make backports "easier"? :)
>
>
There are future features related to speculation control, so I'm
putting this here so that they can stay together.
Tim
On Thu, Jan 04, 2018 at 09:05:15PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 09:56:46AM -0800, Tim Chen wrote:
> > From: David Woodhouse <[email protected]>
> >
> > We are impervious to the indirect branch prediction attack with retpoline
> > but firmware won't be, so we still need to set IBRS to protect
> > firmware code execution when calling into firmware at runtime.
>
> Wait, what?
>
> Maybe it's just the wine from dinner talking, but if the firmware has
> issues, we have bigger things to worry about here, right? It already
> handed over the "chain of trust" to us, so we have already implicitly
> trusted that the firmware was correct here. So why do we need to do
> anything about firmware calls in this manner?
>
> Or am I totally missing something else here?
Retpoline-only case there. The BIOS isn't built with retpoline. The
kernel is trusted too, but it can be tricked unless built with
retpoline.
If retpoline weren't used, IBRS would already be set and no issue
would materialize in the BIOS.
This is just fallout of retpoline being used, which requires closing
holes here and there with IBRS where retpoline alone can't reach.
On Thu, 2018-01-04 at 09:56 -0800, Tim Chen wrote:
> @@ -44,6 +47,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in,
> u32 ecx_in,
> "=S" (*esi)
> : "a" (func), "b" (ebx_in), "c" (ecx_in)
> : "memory", "cc");
> + unprotected_formware_end();
Just an obvious typo (form/firm) but I thought I'd report it.
> }
>
> static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
> @@ -52,6 +56,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32
> ebx_in,
> int cx, dx, si;
> bool error;
>
> + unprotected_firmware_begin();
> /*
> * N.B. We do NOT need a cld after the BIOS call
> * because we always save and restore the flags.
> @@ -68,6 +73,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32
> ebx_in,
> "=S" (si)
> : "a" (func), "b" (ebx_in), "c" (ecx_in)
> : "memory", "cc");
> + unprotected_formware_end();
> return error;
> }
Same here.
--
Yves-Alexis
On 01/04/2018 12:16 PM, Greg KH wrote:
> On Thu, Jan 04, 2018 at 09:56:47AM -0800, Tim Chen wrote:
>> There are 2 ways to control IBRS
>>
>> 1. At boot time
>> noibrs kernel boot parameter will disable IBRS usage
>>
>> Otherwise if the above parameters are not specified, the system
>> will enable ibrs and ibpb usage if the cpu supports it.
>>
>> 2. At run time
>> echo 0 > /sys/kernel/debug/ibrs_enabled will turn off IBRS
>> echo 1 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in kernel
>> echo 2 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in both userspace and kernel
>>
>> The implementation was updated with input from Andrea Arcangeli.
>>
>> Signed-off-by: Tim Chen <[email protected]>
>> ---
>> Documentation/admin-guide/kernel-parameters.txt | 4 +
>> arch/x86/include/asm/spec_ctrl.h | 163 +++++++++++++++++++-----
>> arch/x86/kernel/cpu/Makefile | 1 +
>> arch/x86/kernel/cpu/scattered.c | 10 ++
>> arch/x86/kernel/cpu/spec_ctrl.c | 124 ++++++++++++++++++
>> 5 files changed, 270 insertions(+), 32 deletions(-)
>> create mode 100644 arch/x86/kernel/cpu/spec_ctrl.c
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 5dfd262..d64f49f 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2573,6 +2573,10 @@
>> noexec=on: enable non-executable mappings (default)
>> noexec=off: disable non-executable mappings
>>
>> + noibrs [X86]
>> + Don't use indirect branch restricted speculation (IBRS)
>> + feature.
>> +
>> nosmap [X86]
>> Disable SMAP (Supervisor Mode Access Prevention)
>> even if it is supported by processor.
>> diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
>> index 23b2804..2c35571 100644
>> --- a/arch/x86/include/asm/spec_ctrl.h
>> +++ b/arch/x86/include/asm/spec_ctrl.h
>> @@ -1,13 +1,17 @@
>> #ifndef _ASM_X86_SPEC_CTRL_H
>> #define _ASM_X86_SPEC_CTRL_H
>>
>> -#include <linux/stringify.h>
>> #include <asm/msr-index.h>
>> #include <asm/cpufeatures.h>
>> -#include <asm/alternative-asm.h>
>> +
>> +#define SPEC_CTRL_IBRS_INUSE (1<<0) /* OS enables IBRS usage */
>> +#define SPEC_CTRL_IBRS_SUPPORTED (1<<1) /* System supports IBRS */
>> +#define SPEC_CTRL_IBRS_ADMIN_DISABLED (1<<2) /* Admin disables IBRS */
>>
>> #ifdef __ASSEMBLY__
>>
>> +.extern spec_ctrl_ibrs
>> +
>> .macro PUSH_MSR_REGS
>> pushq %rax
>> pushq %rcx
>> @@ -27,35 +31,63 @@
>> .endm
>>
>> .macro ENABLE_IBRS
>> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
>> + jz .Lskip_\@
>> +
>> PUSH_MSR_REGS
>> WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
>> POP_MSR_REGS
>> -10:
>> +
>> + jmp .Ldone_\@
>> +.Lskip_\@:
>> + /*
>> + * prevent speculation beyond here as we could want to
>> + * stop speculation by enabling IBRS
>> + */
>> + lfence
>> +.Ldone_\@:
>> .endm
>>
>> .macro DISABLE_IBRS
>> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
>> + jz .Lskip_\@
>> +
>> PUSH_MSR_REGS
>> WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_DISABLE_IBRS
>> POP_MSR_REGS
>> -10:
>> +
>> +.Lskip_\@:
>> .endm
>>
>> .macro ENABLE_IBRS_CLOBBER
>> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
>> + jz .Lskip_\@
>> +
>> WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
>> -10:
>> +
>> + jmp .Ldone_\@
>> +.Lskip_\@:
>> + /*
>> + * prevent speculation beyond here as we could want to
>> + * stop speculation by enabling IBRS
>> + */
>> + lfence
>> +.Ldone_\@:
>> .endm
>>
>> .macro DISABLE_IBRS_CLOBBER
>> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
>> + jz .Lskip_\@
>> +
>> WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_DISABLE_IBRS
>> -10:
>> +
>> +.Lskip_\@:
>> .endm
>>
>> .macro ENABLE_IBRS_SAVE_AND_CLOBBER save_reg:req
>> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
>> + jz .Lskip_\@
>> +
>> movl $MSR_IA32_SPEC_CTRL, %ecx
>> rdmsr
>> movl %eax, \save_reg
>> @@ -63,22 +95,103 @@
>> movl $0, %edx
>> movl $SPEC_CTRL_FEATURE_ENABLE_IBRS, %eax
>> wrmsr
>> -10:
>> +
>> + jmp .Ldone_\@
>> +.Lskip_\@:
>> + /*
>> + * prevent speculation beyond here as we could want to
>> + * stop speculation by enabling IBRS
>> + */
>> + lfence
>> +.Ldone_\@:
>> .endm
>>
>> .macro RESTORE_IBRS_CLOBBER save_reg:req
>> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
>> + jz .Lskip_\@
>> +
>> /* Set IBRS to the value saved in the save_reg */
>> movl $MSR_IA32_SPEC_CTRL, %ecx
>> movl $0, %edx
>> movl \save_reg, %eax
>> wrmsr
>> -10:
>> +
>> + jmp .Ldone_\@
>> +.Lskip_\@:
>> + /*
>> + * prevent speculation beyond here as we could want to
>> + * stop speculation by enabling IBRS
>> + */
>> + lfence
>> +.Ldone_\@:
>> .endm
>>
>> #else
>> #include <asm/microcode.h>
>>
>> +extern int spec_ctrl_ibrs;
>> +extern struct mutex spec_ctrl_mutex;
>> +extern unsigned int ibrs_enabled;
>> +
>> +enum {
>> + IBRS_DISABLED,
>> + /* in host kernel, disabled in guest and userland */
>> + IBRS_ENABLED,
>> + /* in host kernel and host userland, disabled in guest */
>> + IBRS_ENABLED_USER,
>> + IBRS_MAX = IBRS_ENABLED_USER,
>
> Yup, not a bit field, your documentation is wrong :(
>
>> +};
>> +
>> +
>> +static inline void set_ibrs_inuse(void)
>> +{
>> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
>> + spec_ctrl_ibrs |= SPEC_CTRL_IBRS_INUSE;
>> +}
>> +
>> +static inline void clear_ibrs_inuse(void)
>> +{
>> + spec_ctrl_ibrs &= ~SPEC_CTRL_IBRS_INUSE;
>> +}
>> +
>> +static inline int ibrs_inuse(void)
>> +{
>> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_INUSE)
>> + return 1;
>> + else
>> + /*
>> + * prevent speculation beyond here as we could want to
>> + * stop speculation by enabling IBRS with this check
>> + */
>> + rmb();
>> + return 0;
>> +}
>> +
>> +static inline void set_ibrs_supported(void)
>> +{
>> + spec_ctrl_ibrs |= SPEC_CTRL_IBRS_SUPPORTED;
>> + if (!(spec_ctrl_ibrs & SPEC_CTRL_IBRS_ADMIN_DISABLED))
>> + set_ibrs_inuse();
>> + else
>> + /*
>> + * prevent speculation beyond here as we could want to
>> + * stop speculation by enabling IBRS
>> + */
>> + rmb();
>> +}
>> +
>> +static inline void set_ibrs_disabled(void)
>> +{
>> + spec_ctrl_ibrs |= SPEC_CTRL_IBRS_ADMIN_DISABLED;
>> + if (ibrs_inuse())
>> + clear_ibrs_inuse();
>> +}
>> +
>> +static inline void clear_ibrs_disabled(void)
>> +{
>> + spec_ctrl_ibrs &= ~SPEC_CTRL_IBRS_ADMIN_DISABLED;
>> +}
>> +
>> static inline void __disable_indirect_speculation(void)
>> {
>> native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);
>> @@ -96,21 +209,14 @@ static inline void __enable_indirect_speculation(void)
>> static inline void unprotected_speculation_begin(void)
>> {
>> WARN_ON_ONCE(!irqs_disabled());
>> - if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
>> + if (ibrs_inuse())
>> __enable_indirect_speculation();
>> }
>>
>> static inline void unprotected_speculation_end(void)
>> {
>> - if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
>> + if (ibrs_inuse())
>> __disable_indirect_speculation();
>> - else
>> - /*
>> - * If we intended to disable indirect speculation
>> - * but come here due to mis-speculation, we need
>> - * to stop the mis-speculation with rmb.
>> - */
>> - rmb();
>> }
>>
>>
>> @@ -121,20 +227,13 @@ static inline void unprotected_speculation_end(void)
>> */
>> static inline void unprotected_firmware_begin(void)
>> {
>> - if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
>> + if (ibrs_inuse())
>> __disable_indirect_speculation();
>> - else
>> - /*
>> - * If we intended to disable indirect speculation
>> - * but come here due to mis-speculation, we need
>> - * to stop the mis-speculation with rmb.
>> - */
>> - rmb();
>> }
>>
>> static inline void unprotected_firmware_end(void)
>> {
>> - if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
>> + if (ibrs_inuse())
>> __enable_indirect_speculation();
>> }
>>
>> diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
>> index 90cb82d..a25f1ab 100644
>> --- a/arch/x86/kernel/cpu/Makefile
>> +++ b/arch/x86/kernel/cpu/Makefile
>> @@ -24,6 +24,7 @@ obj-y += match.o
>> obj-y += bugs.o
>> obj-$(CONFIG_CPU_FREQ) += aperfmperf.o
>> obj-y += cpuid-deps.o
>> +obj-y += spec_ctrl.o
>>
>> obj-$(CONFIG_PROC_FS) += proc.o
>> obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
>> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
>> index bc50c40..2d23a2fe 100644
>> --- a/arch/x86/kernel/cpu/scattered.c
>> +++ b/arch/x86/kernel/cpu/scattered.c
>> @@ -8,6 +8,7 @@
>> #include <asm/processor.h>
>>
>> #include <asm/apic.h>
>> +#include <asm/spec_ctrl.h>
>>
>> struct cpuid_bit {
>> u16 feature;
>> @@ -56,6 +57,15 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
>>
>> if (regs[cb->reg] & (1 << cb->bit))
>> set_cpu_cap(c, cb->feature);
>> +
>> + }
>> + if (!c->cpu_index) {
>> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
>> + printk(KERN_INFO "FEATURE SPEC_CTRL Present\n");
>> + set_ibrs_supported();
>> + if (ibrs_inuse())
>> + ibrs_enabled = IBRS_ENABLED;
>> + }
>> }
>> }
>>
>> diff --git a/arch/x86/kernel/cpu/spec_ctrl.c b/arch/x86/kernel/cpu/spec_ctrl.c
>> new file mode 100644
>> index 0000000..6946678
>> --- /dev/null
>> +++ b/arch/x86/kernel/cpu/spec_ctrl.c
>> @@ -0,0 +1,124 @@
>> +#include <linux/export.h>
>
> No copyright or SPDX line?
>
> At least it doesn't have the old horrid Intel header boilerplate, so I
> should be thankful of that. But it isn't ok like this either, sorry.
>
>> +#include <linux/init.h>
>> +#include <linux/mutex.h>
>> +#include <linux/debugfs.h>
>> +#include <linux/uaccess.h>
>> +
>> +#include <asm/spec_ctrl.h>
>> +#include <asm/cpufeature.h>
>> +
>> +/*
>> + * spec_ctrl_ibrs
>> + * bit 0 = indicate if ibrs is currently in use
>> + * bit 1 = indicate if system supports ibrs
>> + * bit 2 = indicate if admin disables ibrs
>
> Why bits and not integer values? Can you mix them?
spec_ctrl_ibrs has bit fields for internal speculation control and is
not exposed to the user.
ibrs_enabled is what gets reported to the user, and it takes the enum
values:
+enum {
+ IBRS_DISABLED,
+ /* in host kernel, disabled in guest and userland */
+ IBRS_ENABLED,
+ /* in host kernel and host userland, disabled in guest */
+ IBRS_ENABLED_USER,
+ IBRS_MAX = IBRS_ENABLED_USER,
+};
+
>
>> + */
>> +
>> +int spec_ctrl_ibrs;
>> +EXPORT_SYMBOL(spec_ctrl_ibrs);
>
> Why is this exported? What module will ever need this?
>
> And horrid global symbol name, it doesn't say what it is in an obvious
> way just by looking at it.
>
>> +
>> +/* mutex to serialize IBRS control changes */
>> +DEFINE_MUTEX(spec_ctrl_mutex);
>> +EXPORT_SYMBOL(spec_ctrl_mutex);
>
> Why export this? You only ever use it in one place? Again, what module
> needs it?
It is needed for places that potentially have more than one CPU
touching the spec_ctrl_ibrs value. I'll see if I can consolidate them
into spec_ctrl.c to avoid the export.
>
> And I have to ask, EXPORT_SYMBOL_GPL() for this and spec_ctrl_ibrs as
> well?
>
>> +unsigned int ibrs_enabled __read_mostly;
>> +EXPORT_SYMBOL(ibrs_enabled);
>
> Again, what module needs this?
>
> And did __read_mostly really matter in performance tests?
>
>> +
>> +static int __init noibrs(char *str)
>> +{
>> + set_ibrs_disabled();
>> +
>> + return 0;
>> +}
>> +early_param("noibrs", noibrs);
>> +
>> +static ssize_t __enabled_read(struct file *file, char __user *user_buf,
>> + size_t count, loff_t *ppos, unsigned int *field)
>> +{
>> + char buf[32];
>> + unsigned int len;
>> +
>> + len = sprintf(buf, "%d\n", READ_ONCE(*field));
>> + return simple_read_from_buffer(user_buf, count, ppos, buf, len);
>> +}
>> +
>> +static ssize_t ibrs_enabled_read(struct file *file, char __user *user_buf,
>> + size_t count, loff_t *ppos)
>> +{
>> + return __enabled_read(file, user_buf, count, ppos, &ibrs_enabled);
>> +}
>
> It's a single variable, why such complex debugfs for that? This should
> be handled by a helper macro already, right?
>
>> +static void spec_ctrl_flush_all_cpus(u32 msr_nr, u64 val)
>> +{
>> + int cpu;
>> + get_online_cpus();
>> + for_each_online_cpu(cpu)
>> + wrmsrl_on_cpu(cpu, msr_nr, val);
>> + put_online_cpus();
>> +}
>> +
>> +static ssize_t ibrs_enabled_write(struct file *file,
>> + const char __user *user_buf,
>> + size_t count, loff_t *ppos)
>> +{
>> + char buf[32];
>> + ssize_t len;
>> + unsigned int enable;
>> +
>> + len = min(count, sizeof(buf) - 1);
>> + if (copy_from_user(buf, user_buf, len))
>> + return -EFAULT;
>> +
>> + buf[len] = '\0';
>> + if (kstrtouint(buf, 0, &enable))
>> + return -EINVAL;
>
> 3 value "flags" are horrid :(
Unfortunately, IBRS can have 3 different usage scenarios:
1. disabled all the time
2. used only in kernel and privileged code
3. enabled all the time in both user and kernel space.
>
>> +
>> + if (enable > IBRS_MAX)
>> + return -EINVAL;
>> +
>> + mutex_lock(&spec_ctrl_mutex);
>
> Always run checkpatch.pl so you don't get grumpy kernel maintainers
> telling you to run checkpatch.pl :(
>
>> +
>> + if (enable == IBRS_DISABLED) {
>> + /* disable IBRS usage */
>> + set_ibrs_disabled();
>> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
>> + spec_ctrl_flush_all_cpus(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_DISABLE_IBRS);
>> + } else if (enable == IBRS_ENABLED) {
>> + /* enable IBRS usage in kernel */
>> + clear_ibrs_disabled();
>> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
>> + set_ibrs_inuse();
>> + else
>> + /* Platform doesn't support IBRS */
>> + enable = IBRS_DISABLED;
>> + } else if (enable == IBRS_ENABLED_USER) {
>> + /* enable IBRS usage in both userspace and kernel */
>> + clear_ibrs_disabled();
>> + /* don't change IBRS value once we set it to always on */
>> + clear_ibrs_inuse();
>> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
>> + spec_ctrl_flush_all_cpus(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);
>> + else
>> + /* Platform doesn't support IBRS */
>> + enable = IBRS_DISABLED;
>> + }
>> +
>> + WRITE_ONCE(ibrs_enabled, enable);
>
> It's a debugfs write callback, why do you care about WRITE_ONCE()?
Probably not necessary now. It is serialized by spec_ctrl_mutex.
>
> thanks,
>
> greg k-h
>
On Thu, 2018-01-04 at 11:10 -0800, Tim Chen wrote:
> > Are there plans to make the corresponding microcode support available?
> >
>
> The microcode patches should be released soon.
In the meantime, Lenovo has started issuing BIOS/UEFI updates which include
microcode updates for this.
See for example for the ThinkPad X250
https://download.lenovo.com/pccbbs/mobiles/n10ur16w.txt
https://download.lenovo.com/pccbbs/mobiles/n10ur16w.iso
iucode-tool -L -tr n10ur16w.iso |grep 2017-11
001/020: sig 0x000306d4, pf_mask 0xc0, 2017-11-17, rev 0x0028, size 18432
001/022: sig 0x00040651, pf_mask 0x72, 2017-11-20, rev 0x0021, size 22528
001/025: sig 0x000306d4, pf_mask 0xc0, 2017-11-17, rev 0x0028, size 18432
001/027: sig 0x00040651, pf_mask 0x72, 2017-11-20, rev 0x0021, size 22528
001/030: sig 0x000306c3, pf_mask 0x32, 2017-11-20, rev 0x0023, size 23552
001/032: sig 0x00040651, pf_mask 0x72, 2017-11-20, rev 0x0021, size 22528
Regards,
--
Yves-Alexis
On 01/04/2018 12:51 PM, Yves-Alexis Perez wrote:
> On Thu, 2018-01-04 at 09:56 -0800, Tim Chen wrote:
>> @@ -44,6 +47,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in,
>> u32 ecx_in,
>> "=S" (*esi)
>> : "a" (func), "b" (ebx_in), "c" (ecx_in)
>> : "memory", "cc");
>> + unprotected_formware_end();
>
> Just an obvious typo (form/firm) but I thought I'd report it.
>> }
>>
>> static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
>> @@ -52,6 +56,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32
>> ebx_in,
>> int cx, dx, si;
>> bool error;
>>
>> + unprotected_firmware_begin();
>> /*
>> * N.B. We do NOT need a cld after the BIOS call
>> * because we always save and restore the flags.
>> @@ -68,6 +73,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32
>> ebx_in,
>> "=S" (si)
>> : "a" (func), "b" (ebx_in), "c" (ecx_in)
>> : "memory", "cc");
>> + unprotected_formware_end();
>> return error;
>> }
>
> Same here.
>
Thanks. Noted.
Tim
> On Thu, Jan 4, 2018 at 11:19 AM, David Woodhouse <[email protected]>
> wrote:
> >
> > On Skylake the target for a 'ret' instruction may also come from the
> > BTB. So if you ever let the RSB (which remembers where the 'call's came
> > from) get empty, you end up vulnerable.
>
> That sounds like it could cause mispredicts, but it doesn't sound _exploitable_.
>
> Sure, interrupts in between the call instruction and the 'ret' could
> overflow the return stack. And we could migrate to another CPU. And so
> apparently SMM clears the return stack too.
>
> ... but again, none of them sound even remotely _exploitable_.
>
This is about the level of paranoia you are comfortable with.
Retpoline on Skylake raises the bar for the issue enormously, but there is a set of corner cases that exist and that are not trivial to prove you have dealt with.
Personally I am comfortable with retpoline on Skylake, but I would like to have IBRS as an opt-in for the paranoid.
On Thu, Jan 04, 2018 at 09:56:43AM -0800, Tim Chen wrote:
> diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
> new file mode 100644
> index 0000000..16fc4f58
> --- /dev/null
> +++ b/arch/x86/include/asm/spec_ctrl.h
Does this really have to live outside of arch/x86/entry/ ?
> +.macro ENABLE_IBRS
> + ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + PUSH_MSR_REGS
> + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
> + POP_MSR_REGS
> +10:
> +.endm
> +
> +.macro DISABLE_IBRS
> + ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + PUSH_MSR_REGS
> + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_DISABLE_IBRS
> + POP_MSR_REGS
> +10:
> +.endm
> +
> +.macro ENABLE_IBRS_CLOBBER
> + ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
> +10:
> +.endm
> +
> +.macro DISABLE_IBRS_CLOBBER
> + ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_DISABLE_IBRS
> +10:
> +.endm
> +
> +.macro ENABLE_IBRS_SAVE_AND_CLOBBER save_reg:req
> + ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + movl $MSR_IA32_SPEC_CTRL, %ecx
> + rdmsr
> + movl %eax, \save_reg
> +
> + movl $0, %edx
> + movl $SPEC_CTRL_FEATURE_ENABLE_IBRS, %eax
> + wrmsr
> +10:
> +.endm
> +
> +.macro RESTORE_IBRS_CLOBBER save_reg:req
> + ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + /* Set IBRS to the value saved in the save_reg */
> + movl $MSR_IA32_SPEC_CTRL, %ecx
> + movl $0, %edx
whitespace damage
> + movl \save_reg, %eax
> + wrmsr
> +10:
> +.endm
Should not all those 10 things look like .Ldone_\@ or something ?
On 01/04/2018 02:16 PM, Peter Zijlstra wrote:
> On Thu, Jan 04, 2018 at 09:56:43AM -0800, Tim Chen wrote:
>> diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
>> new file mode 100644
>> index 0000000..16fc4f58
>> --- /dev/null
>> +++ b/arch/x86/include/asm/spec_ctrl.h
>
> Does this really have to live outside of arch/x86/entry/ ?
>
There are some inline C routines later in this file that will be
needed by other functions. We want to consolidate them in the same
file.
>
>> +.macro ENABLE_IBRS
>> + ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + PUSH_MSR_REGS
>> + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
>> + POP_MSR_REGS
>> +10:
>> +.endm
>> +
>> +.macro DISABLE_IBRS
>> + ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + PUSH_MSR_REGS
>> + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_DISABLE_IBRS
>> + POP_MSR_REGS
>> +10:
>> +.endm
>> +
>> +.macro ENABLE_IBRS_CLOBBER
>> + ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
>> +10:
>> +.endm
>> +
>> +.macro DISABLE_IBRS_CLOBBER
>> + ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_DISABLE_IBRS
>> +10:
>> +.endm
>> +
>> +.macro ENABLE_IBRS_SAVE_AND_CLOBBER save_reg:req
>> + ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + movl $MSR_IA32_SPEC_CTRL, %ecx
>> + rdmsr
>> + movl %eax, \save_reg
>> +
>> + movl $0, %edx
>> + movl $SPEC_CTRL_FEATURE_ENABLE_IBRS, %eax
>> + wrmsr
>> +10:
>> +.endm
>> +
>> +.macro RESTORE_IBRS_CLOBBER save_reg:req
>> + ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + /* Set IBRS to the value saved in the save_reg */
>> + movl $MSR_IA32_SPEC_CTRL, %ecx
>> + movl $0, %edx
>
> whitespace damage
Will fix
>
>> + movl \save_reg, %eax
>> + wrmsr
>> +10:
>> +.endm
>
> Should not all those 10 things look like .Ldone_\@ or something ?
>
They are replaced in patch 6 by .Ldone_\@.
Tim
On 01/04/2018 02:21 PM, Tim Chen wrote:
>> Does this really have to live outside of arch/x86/entry/ ?
>>
> There are some inline C routines later in this file
> that will be needed by other functions. We want to consolidate
> them in the same file.
We could put all of the assembly into calling.h along with the PTI
assembly. Seems as sane a place as anywhere else to put it.
On Thu, Jan 04, 2018 at 09:56:44AM -0800, Tim Chen wrote:
> Set IBRS upon kernel entrance via syscall and interrupts. Clear it
> upon exit.
So not only did we add a CR3 write, we're now adding an MSR write to the
entry/exit paths. Please tell me that these are 'fast' MSRs? Given
people are already reporting stupid numbers with just the existing
PTI/CR3, what kind of pain are we going to get from adding this?
> If NMI runs when exiting kernel between IBRS_DISABLE and
> SWAPGS, the NMI would have turned on IBRS bit 0 and then it would have
> left enabled when exiting the NMI. IBRS bit 0 would then be left
> enabled in userland until the next enter kernel.
>
> That is a minor inefficiency only, but we can eliminate it by saving
> the MSR when entering the NMI in save_paranoid and restoring it when
> exiting the NMI.
>
> Signed-off-by: Andrea Arcangeli <[email protected]>
> Signed-off-by: Tim Chen <[email protected]>
Invalid SoB chain, either you lost a From: Andrea or you need something
else.
On Thu, Jan 04, 2018 at 09:56:45AM -0800, Tim Chen wrote:
> @@ -100,15 +101,33 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
> {
> if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
> + bool can_toggle_ibrs = false;
> if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
> mb();
> clflush((void *)&current_thread_info()->flags);
> mb();
> }
>
> + if (irqs_disabled()) {
> + /*
> + * CPUs run faster with speculation protection
> + * disabled. All CPU threads in a core must
> + * disable speculation protection for it to be
> + * disabled. Disable it while we are idle so the
> + * other hyperthread can run fast.
> + *
> + * nmi uses the save_paranoid model which
> + * always enables ibrs on exception entry
> + * before any indirect jump can run.
> + */
> + can_toggle_ibrs = true;
> + unprotected_speculation_begin();
> + }
> __monitor((void *)&current_thread_info()->flags, 0, 0);
> if (!need_resched())
> __mwait(eax, ecx);
> + if (can_toggle_ibrs)
> + unprotected_speculation_end();
> }
> current_clr_polling();
> }
Argh.. no. Who is calling this with IRQs enabled? And why can't we frob
the MSR with IRQs enabled? That comment doesn't seem to explain
anything.
> diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
> index 16fc4f58..28b0314 100644
> --- a/arch/x86/include/asm/spec_ctrl.h
> +++ b/arch/x86/include/asm/spec_ctrl.h
> @@ -76,5 +76,42 @@
> 10:
> .endm
>
> +#else
> +#include <asm/microcode.h>
> +
> +static inline void __disable_indirect_speculation(void)
> +{
> + native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);
> +}
> +
> +static inline void __enable_indirect_speculation(void)
> +{
> + native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_DISABLE_IBRS);
> +}
> +
> +/*
> + * Interrupts must be disabled to begin unprotected speculation.
> + * Otherwise interrupts could be running in unprotected mode.
> + */
> +static inline void unprotected_speculation_begin(void)
> +{
> + WARN_ON_ONCE(!irqs_disabled());
lockdep_assert_irqs_disabled()
> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> + __enable_indirect_speculation();
> +}
> +
> +static inline void unprotected_speculation_end(void)
> +{
> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> + __disable_indirect_speculation();
> + else
> + /*
> + * If we intended to disable indirect speculation
> + * but come here due to mis-speculation, we need
> + * to stop the mis-speculation with rmb.
> + */
> + rmb();
Code is lacking {}, also the comment doesn't make sense. If we don't
have the MSR, why are we doing an LFENCE?
And why are these boot_cpu_has() and not static_cpu_has() ?
On Thu, Jan 04, 2018 at 09:56:46AM -0800, Tim Chen wrote:
> diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
> index 28b0314..23b2804 100644
> --- a/arch/x86/include/asm/spec_ctrl.h
> +++ b/arch/x86/include/asm/spec_ctrl.h
> @@ -113,5 +113,42 @@ static inline void unprotected_speculation_end(void)
> rmb();
> }
>
> +
> +#if defined(RETPOLINE)
> +/*
> + * RETPOLINE does not protect against indirect speculation
> + * in firmware code. Enable IBRS to protect firmware execution.
> + */
> +static inline void unprotected_firmware_begin(void)
> +{
> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> + __disable_indirect_speculation();
> + else
> + /*
> + * If we intended to disable indirect speculation
> + * but come here due to mis-speculation, we need
> + * to stop the mis-speculation with rmb.
> + */
> + rmb();
> +}
Looks like an exact replica of unprotected_speculation_end(); are we
going for max linecount or something?
> +
> +static inline void unprotected_firmware_end(void)
> +{
> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> + __enable_indirect_speculation();
> +}
> +
> +#else
> +static inline void unprotected_firmware_begin(void)
> +{
> + return;
> +}
> +
> +static inline void unprotected_firmware_end(void)
> +{
> + return;
> +}
Those returns are superfluous.
> +#endif
> +
> #endif /* __ASSEMBLY__ */
> #endif /* _ASM_X86_SPEC_CTRL_H */
> --
> 2.9.4
>
On Thu, Jan 04, 2018 at 09:56:47AM -0800, Tim Chen wrote:
> .macro ENABLE_IBRS
> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
> + jz .Lskip_\@
> +
> PUSH_MSR_REGS
> WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
> POP_MSR_REGS
> -10:
> +
> + jmp .Ldone_\@
> +.Lskip_\@:
> + /*
> + * prevent speculation beyond here as we could want to
> + * stop speculation by enabling IBRS
> + */
> + lfence
> +.Ldone_\@:
> .endm
Yeah no. We have jump labels for this stuff. There is no reason
whatsoever to do dynamic tests for a variable that _never_ changes.
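The jump-label version of the C-side check would look roughly like
this (a sketch; the asm macros would use the matching static-jump
machinery rather than a memory test):

#include <linux/jump_label.h>
#include <linux/types.h>

static DEFINE_STATIC_KEY_FALSE(spec_ctrl_ibrs_key);

static inline bool ibrs_inuse_sketch(void)
{
	/* patched to a straight-line NOP/JMP at runtime, no load+test */
	return static_branch_unlikely(&spec_ctrl_ibrs_key);
}

/* Only flipped from the mutex-serialized debugfs write / boot param. */
static void set_ibrs_inuse_sketch(bool on)
{
	if (on)
		static_branch_enable(&spec_ctrl_ibrs_key);
	else
		static_branch_disable(&spec_ctrl_ibrs_key);
}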
On Thu, Jan 04, 2018 at 11:47:31PM +0100, Peter Zijlstra wrote:
> Argh.. no. Who is calling this with IRQs enabled? And why can't we frob
> the MSR with IRQs enabled? That comment doesn't seem to explain
> anything.
Why we can't is easy to explain: the irq handler would run in such a
case, and that path isn't using save_paranoid; it relies on KERNEL_CS
and assumes IBRS is already set.
The irqs_disabled() check can be dropped if you do enough verification
that the case never happens. Initially it wasn't obvious that the
irqs-disabled invariant would always be enforced by the multitude of
callers this has (and that varies on different codebases). I didn't
want to deal with such an occurrence and risk even more trouble.
Later I did the verification and dropped the irqs_disabled() too.
It should be possible to drop it, but it generally doesn't hurt to
start out more obviously safe and optimize later.
On Thu, Jan 04, 2018 at 11:33:21PM +0100, Peter Zijlstra wrote:
> So not only did we add a CR3 write, we're now adding an MSR write to the
> entry/exit paths. Please tell me that these are 'fast' MSRs? Given
> people are already reporting stupid numbers with just the existing
> PTI/CR3, what kind of pain are we going to get from adding this?
On Skylake it costs roughly the same as a CR3 write with bit 63 set,
and Skylake also runs relatively fast with IBRS enabled. On earlier
CPUs enabling IBRS slows down the CPU quite a bit, so the primary
concern is for older CPUs, and there the MSR write is the least of
the worries.
ibrs 2 will set IBRS all the time (only guest mode will alter it, and
it will always be restored to IBRS set during vmexit), so there will
be no cost on kernel enter/exit (and also no cost in vmenter/vmexit
if the guest leaves it always set). Future silicon will prefer to run
in ibrs 2 mode always, but current silicon runs faster at ibrs 1,
despite the MSR write, for most workloads (kernel builds etc.).
Thanks,
Andrea
On 01/04/2018 02:47 PM, Peter Zijlstra wrote:
> On Thu, Jan 04, 2018 at 09:56:45AM -0800, Tim Chen wrote:
>> @@ -100,15 +101,33 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
>> static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
>> {
>> if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
>> + bool can_toggle_ibrs = false;
>> if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
>> mb();
>> clflush((void *)&current_thread_info()->flags);
>> mb();
>> }
>>
>> + if (irqs_disabled()) {
>> + /*
>> + * CPUs run faster with speculation protection
>> + * disabled. All CPU threads in a core must
>> + * disable speculation protection for it to be
>> + * disabled. Disable it while we are idle so the
>> + * other hyperthread can run fast.
>> + *
>> + * nmi uses the save_paranoid model which
>> + * always enables ibrs on exception entry
>> + * before any indirect jump can run.
>> + */
>> + can_toggle_ibrs = true;
>> + unprotected_speculation_begin();
>> + }
>> __monitor((void *)&current_thread_info()->flags, 0, 0);
>> if (!need_resched())
>> __mwait(eax, ecx);
>> + if (can_toggle_ibrs)
>> + unprotected_speculation_end();
>> }
>> current_clr_polling();
>> }
>
> Argh.. no. Who is calling this with IRQs enabled? And why can't we frob
> the MSR with IRQs enabled? That comment doesn't seem to explain
> anything.
No one should be calling this with IRQs enabled. This check is probably
just paranoid. I can get rid of it.
We cannot have IRQs enabled here, as if we disable IBRS and an interrupt
comes in, we will be running the interrupt code without IBRS in unsafe
speculation mode.
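With the check dropped and Peter's lockdep_assert_irqs_disabled() used
instead, the hunk would reduce to something like this (a sketch,
assuming every caller really does run with IRQs off):

static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
{
	if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
		if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
			mb();
			clflush((void *)&current_thread_info()->flags);
			mb();
		}

		/*
		 * All callers must have IRQs off; the assert could
		 * equally live in unprotected_speculation_begin().
		 */
		lockdep_assert_irqs_disabled();
		unprotected_speculation_begin();
		__monitor((void *)&current_thread_info()->flags, 0, 0);
		if (!need_resched())
			__mwait(eax, ecx);
		unprotected_speculation_end();
	}
	current_clr_polling();
}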
>
>> diff --git a/arch/x86/include/asm/spec_ctrl.h b/arch/x86/include/asm/spec_ctrl.h
>> index 16fc4f58..28b0314 100644
>> --- a/arch/x86/include/asm/spec_ctrl.h
>> +++ b/arch/x86/include/asm/spec_ctrl.h
>> @@ -76,5 +76,42 @@
>> 10:
>> .endm
>>
>> +#else
>> +#include <asm/microcode.h>
>> +
>> +static inline void __disable_indirect_speculation(void)
>> +{
>> + native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);
>> +}
>> +
>> +static inline void __enable_indirect_speculation(void)
>> +{
>> + native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_DISABLE_IBRS);
>> +}
>> +
>> +/*
>> + * Interrupts must be disabled to begin unprotected speculation.
>> + * Otherwise interrupts could be running in unprotected mode.
>> + */
>> +static inline void unprotected_speculation_begin(void)
>> +{
>> + WARN_ON_ONCE(!irqs_disabled());
>
> lockdep_assert_irqs_disabled()
OK
>
>> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
>> + __enable_indirect_speculation();
>> +}
>> +
>> +static inline void unprotected_speculation_end(void)
>> +{
>> + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
>> + __disable_indirect_speculation();
>> + else
>> + /*
>> + * If we intended to disable indirect speculation
>> + * but come here due to mis-speculation, we need
>> + * to stop the mis-speculation with rmb.
>> + */
>> + rmb();
>
> Code is lacking {}, also the comment doesn't make sense. If we don't
> have the MSR, why are we doing an LFENCE?
The reason for the lfence is that we intend to set IBRS and disable
speculation. But if the CPU mis-speculates into the else branch, we
could be speculating without IBRS. The lfence stops that mis-speculation.
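For reference, the braced shape being asked for would be:

static inline void unprotected_speculation_end(void)
{
	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
		__disable_indirect_speculation();
	} else {
		/*
		 * We intended to set IBRS, but mis-speculation could
		 * bring us here anyway; stop it with a barrier.
		 */
		rmb();
	}
}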
>
> And why are these boot_cpu_has() and not static_cpu_has() ?
>
It probably doesn't matter as we will be switching the check
to the spec_ctrl_ibrs a couple of patches later.
On 01/04/2018 02:54 PM, Peter Zijlstra wrote:
> On Thu, Jan 04, 2018 at 09:56:47AM -0800, Tim Chen wrote:
>> .macro ENABLE_IBRS
>> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
>> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
>> + jz .Lskip_\@
>> +
>> PUSH_MSR_REGS
>> WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
>> POP_MSR_REGS
>> -10:
>> +
>> + jmp .Ldone_\@
>> +.Lskip_\@:
>> + /*
>> + * prevent speculation beyond here as we could want to
>> + * stop speculation by enabling IBRS
>> + */
>> + lfence
>> +.Ldone_\@:
>> .endm
>
>
> Yeah no. We have jump labels for this stuff. There is no reason
> whatsoever to do dynamic tests for a variable that _never_ changes.
>
The admin can change the spec_ctrl_ibrs value at run time, or when we
scan new microcode. So it doesn't change often, but it can.
There may be times when the admin wants to run the system
in a more secure mode, and times when it is okay to leave out
IBRS.
Tim
On Thu, Jan 04, 2018 at 03:22:09PM -0800, Tim Chen wrote:
> No one should be calling this with IRQs enabled. This check is probably
> just paranoid. I can get rid of it.
Yes, confirmed.
> It probably doesn't matter as we will be switching the check
> to the spec_ctrl_ibrs a couple of patches later.
The other problem with static_cpu_has is that later the code intends to
support the cpuid rescan after a late microcode update, which will set
bits in boot_cpu_data (and transfer them to each cpu_has). If we only
supported early microcode updates, static_cpu_has would have been ok
too.
On Fri, 5 Jan 2018, Andrea Arcangeli wrote:
> On Thu, Jan 04, 2018 at 03:22:09PM -0800, Tim Chen wrote:
> > No one should be calling this with IRQs enabled. This check is probably
> > just paranoid. I can get rid of it.
>
> Yes, confirmed.
>
> > It probably doesn't matter as we will be switching the check
> > to the spec_ctrl_ibrs a couple of patches later.
>
> The other problem with static_cpu_has is that later the code intends to
> support the cpuid rescan after a late microcode update, which will set
> bits in boot_cpu_data (and transfer them to each cpu_has). If we only
> supported early microcode updates, static_cpu_has would have been ok
> too.
What's the problem with making the early update mandatory for this?
Thanks,
tglx
On Thu, Jan 04, 2018 at 03:26:52PM -0800, Tim Chen wrote:
> On 01/04/2018 02:54 PM, Peter Zijlstra wrote:
> > On Thu, Jan 04, 2018 at 09:56:47AM -0800, Tim Chen wrote:
> >> .macro ENABLE_IBRS
> >> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL
> >> + testl $SPEC_CTRL_IBRS_INUSE, spec_ctrl_ibrs
> >> + jz .Lskip_\@
> >> +
> >> PUSH_MSR_REGS
> >> WRMSR_ASM $MSR_IA32_SPEC_CTRL, $SPEC_CTRL_FEATURE_ENABLE_IBRS
> >> POP_MSR_REGS
> >> -10:
> >> +
> >> + jmp .Ldone_\@
> >> +.Lskip_\@:
> >> + /*
> >> + * prevent speculation beyond here as we could want to
> >> + * stop speculation by enabling IBRS
> >> + */
> >> + lfence
> >> +.Ldone_\@:
> >> .endm
> >
> >
> > Yeah no. We have jump labels for this stuff. There is no reason
> > whatsoever to do dynamic tests for a variable that _never_ changes.
> >
>
> The admin can change the spec_ctrl_ibrs value at run time, or when we
> scan new microcode. So it doesn't change often, but it can.
>
> There may be times when the admin wants to run the system
> in a more secure mode, and times when it is okay to leave out
> IBRS.
I think he meant to use STATIC_JUMP_IF_TRUE instead of testl on
spec_ctrl_ibrs, to avoid the conditional jump but still allow
enabling/disabling the branch. I suggested using a static key earlier
too. In older kernels there's not even the boilerplate to check the static
key in asm, so I used a pcp (per-cpu) bitfield in an already guaranteed
hot cacheline; in practice it's as fast as a static key, but a static
key is always theoretically preferable.
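For the C side, a rough sketch of the static key variant (the key name
here is made up):

#include <linux/jump_label.h>

DEFINE_STATIC_KEY_FALSE(spec_ctrl_ibrs_key);	/* hypothetical name */

static inline void spec_ctrl_set_ibrs(void)
{
	/* compiles to a patched NOP/jmp, no runtime load and test */
	if (static_branch_unlikely(&spec_ctrl_ibrs_key))
		native_wrmsrl(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);
}

The debugfs write handler would then flip it with static_branch_enable()
/ static_branch_disable(), and the asm entry paths would test the same
key once the STATIC_JUMP_IF_TRUE boilerplate is available.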
On Fri, Jan 05, 2018 at 12:51:34AM +0100, Andrea Arcangeli wrote:
> In older kernels there's not even the boilerplate to check the static
> key in asm,
This is the reason why I didn't use a static_key too.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Fri, Jan 05, 2018 at 12:45:58AM +0100, Thomas Gleixner wrote:
> What's the problem to make the early update mandatory for this?
That will make a few differences. A host reboot will be required to
use the microcode features if you upgrade the kernel before the
microcode_ctl package, and you won't be able to leverage the new
microcode unless you have an updated initramfs. That's a first-time-only
issue, so I guess it doesn't matter much. After that, if you don't use
initrd/initramfs you won't be able to use the new microcode at all;
just having stuff updated in /lib won't help anymore.
We don't rely on late microcode at all, so it's fine if early
microcode is the only workable option. But in that case it would look
safer to remove all those different APIs, which would otherwise be
left to bitrot: they can still update the microcode, but the kernel
can't use the new microcode features if it uses static_cpu_has.
Personally I tend to prefer not being forced to use an initramfs
while testing kernels, so it won't be me converting this to
static_cpu_has and making early microcode mandatory (in fact I went a
long way to make sure late microcode worked). But in production it
will make no difference whatsoever if all the late microcode options
are unworkable and gone, so it's no problem.
On Fri, 5 Jan 2018, Borislav Petkov wrote:
> On Fri, Jan 05, 2018 at 12:51:34AM +0100, Andrea Arcangeli wrote:
> > In older kernels there's not even the boilerplate to check the static
> > key in asm,
>
> This is the reason why I didn't use a static_key too.
We are talking about upstream first, and given the amount of crap you
backported already, the few lines for the ASM jump labels really do not
make a difference anymore.
I so hope this exercise will finally stop the kernel-necrophilia in
distros. I know it won't but hope dies last.
Thanks,
tglx
On 01/04/2018 02:33 PM, Peter Zijlstra wrote:
> On Thu, Jan 04, 2018 at 09:56:44AM -0800, Tim Chen wrote:
>> Set IBRS upon kernel entrance via syscall and interrupts. Clear it
>> upon exit.
>
> So not only did we add a CR3 write, we're now adding an MSR write to the
> entry/exit paths. Please tell me that these are 'fast' MSRs? Given
> people are already reporting stupid numbers with just the existing
> PTI/CR3, what kind of pain are we going to get from adding this?
This "dynamic IBRS" that does runtime switching will not be on by
default and will be patched around by alternatives unless someone
explicitly opts in.
If you decide you want the additional protection that it provides, you
can take the performance hit. How much is that? We've been saying that
these new MSRs are roughly as expensive as the CR3 writes. How
expensive are those? Don't take my word for it, a few folks were
talking about it today:
Google says[1]: "We see negligible impact on performance."
Amazon says[2]: "We don’t expect meaningful performance impact."
I chopped a few qualifiers out of there, but I think that roughly
captures the sentiment.
1.
https://security.googleblog.com/2018/01/more-details-about-mitigations-for-cpu_4.html
2.
http://www.businessinsider.com/google-amazon-performance-hit-meltdown-spectre-fixes-overblown-2018-1
On Thu, Jan 4, 2018 at 4:08 PM, Dave Hansen <[email protected]> wrote:
> On 01/04/2018 02:33 PM, Peter Zijlstra wrote:
>> On Thu, Jan 04, 2018 at 09:56:44AM -0800, Tim Chen wrote:
>>> Set IBRS upon kernel entrance via syscall and interrupts. Clear it
>>> upon exit.
>>
>> So not only did we add a CR3 write, we're now adding an MSR write to the
>> entry/exit paths. Please tell me that these are 'fast' MSRs? Given
>> people are already reporting stupid numbers with just the existing
>> PTI/CR3, what kind of pain are we going to get from adding this?
>
> This "dynamic IBRS" that does runtime switching will not be on by
> default and will be patched around by alternatives unless someone
> explicitly opts in.
>
> If you decide you want the additional protection that it provides, you
> can take the performance hit. How much is that? We've been saying that
> these new MSRs are roughly as expensive as the CR3 writes. How
> expensive are those? Don't take my word for it, a few folks were
> talking about it today:
>
> Google says[1]: "We see negligible impact on performance."
> Amazon says[2]: "We don’t expect meaningful performance impact."
>
> I chopped a few qualifiers out of there, but I think that roughly
> captures the sentiment.
>
> 1.
> https://security.googleblog.com/2018/01/more-details-about-mitigations-for-cpu_4.html
> 2.
> http://www.businessinsider.com/google-amazon-performance-hit-meltdown-spectre-fixes-overblown-2018-1
Do we need an arch_prctl() to enable IBRS for user mode?
On Thu, Jan 4, 2018 at 2:23 PM, Dave Hansen <[email protected]> wrote:
> On 01/04/2018 02:21 PM, Tim Chen wrote:
>>> Does this really have to live outside of arch/x86/entry/ ?
>>>
>> There are some inline C routines later in this file
>> that will be needed by other functions. Want to consolidate
>> them in the same file.
>
> We could put all of the assembly into calling.h along with the PTI
> assembly. Seems as sane a place as anywhere else to put it.
>
We should also stop thinking that NMI is at all special. All the
paranoid entry paths + NMI should just save and restore it, just like
CR3. Otherwise we get nasty corner cases with MCE, kprobes, etc.
On 01/04/2018 08:54 PM, Andy Lutomirski wrote:
> On Thu, Jan 4, 2018 at 2:23 PM, Dave Hansen <[email protected]> wrote:
>> On 01/04/2018 02:21 PM, Tim Chen wrote:
>>>> Does this really have to live outside of arch/x86/entry/ ?
>>>>
>>> There are some inline C routines later in this file
>>> that will be needed by other functions. Want to consolidate
>>> them in the same file.
>>
>> We could put all of the assembly into calling.h along with the PTI
>> assembly. Seems as sane a place as anywhere else to put it.
>
> We should also stop thinking that NMI is at all special. All the
> paranoid entry paths + NMI should just save and restore it, just like
> CR3. Otherwise we get nasty corner cases with MCE, kprobes, etc.
I've probably been too imprecise in my language here. The goal is
absolutely to deal with all the paranoid paths. It's just that the NMI
one is the easiest to understand and easiest to exercise.
It also *is* special because it's the only one needing paranoid handling
that does not use paranoid_exit itself.
On 01/04/2018 08:51 PM, Andy Lutomirski wrote:
> Do we need an arch_prctl() to enable IBRS for user mode?
Eventually, once the dust settles. I think there's a spectrum of
paranoia here, that is roughly (with increasing paranoia):
1. do nothing
2. do retpoline
3. do IBRS in kernel
4. do IBRS always
I think you're asking for ~3.5.
Patches for 1-3 are out there and 4 is pretty straightforward. Doing an
arch_prctl() is still straightforward, but will be a much more niche
thing than any of the other choices. Plus, with a user interface, we
have to argue over the ABI for at least a month or two. ;)
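Just to make the shape of ~3.5 concrete, a hypothetical sketch (the
ARCH_SET_IBRS number and the thread field are invented here, which is
exactly the ABI we'd end up arguing about):

/* hypothetical arch_prctl hook; constant and field name are made up */
#define ARCH_SET_IBRS	0x1021

static int ibrs_prctl_set(struct task_struct *tsk, unsigned long enable)
{
	if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL))
		return -ENODEV;
	/* hypothetical per-task flag, applied on context switch */
	tsk->thread.ibrs_user = !!enable;
	return 0;
}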
* Linus Torvalds:
> On Thu, Jan 4, 2018 at 9:56 AM, Tim Chen <[email protected]> wrote:
>>
>> Speculation on Skylake and later requires these patches ("dynamic IBRS")
>> be used instead of retpoline[1].
>
> Can somebody explain this part?
>
> I was assuming that retpoline would work around this issue on all uarchs.
>
> This seems to say "retpoline does nothing on Skylake+"
Retpoline also looks incompatible with CET, so future Intel CPUs will
eventually need a different approach anyway.
On Fri, 2018-01-05 at 06:25 +0100, Florian Weimer wrote:
>
> Retpoline also looks incompatible with CET, so future Intel CPUs will
> eventually need a different approach anyway.
CPUs with CET will have the indirect branch problem fixed, so the
retpoline ALTERNATIVE will be patched to a bare 'jmp *%reg'.
That's why we switched the ABI for the retpoline from having the target
address on the stack to passing it in a register.
On Thu, 2018-01-04 at 09:56 -0800, Tim Chen wrote:
> cpuid ax=0x7, return rdx bit 26 to indicate presence of this feature
> IA32_SPEC_CTRL (0x48) and IA32_PRED_CMD (0x49)
> IA32_SPEC_CTRL, bit0 – Indirect Branch Restricted Speculation (IBRS)
> IA32_PRED_CMD, bit0 – Indirect Branch Prediction Barrier (IBPB)
In a previous iteration of these patches, hadn't you already merged
support for the AMD variant? Where'd that go?
On Thu, 2018-01-04 at 09:56 -0800, Tim Chen wrote:
>
> + mutex_lock(&spec_ctrl_mutex);
> +
> + if (enable == IBRS_DISABLED) {
> + /* disable IBRS usage */
> + set_ibrs_disabled();
> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
> + spec_ctrl_flush_all_cpus(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_DISABLE_IBRS);
> + } else if (enable == IBRS_ENABLED) {
> + /* enable IBRS usage in kernel */
> + clear_ibrs_disabled();
> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
> + set_ibrs_inuse();
> + else
> + /* Platform doesn't support IBRS */
> + enable = IBRS_DISABLED;
> + } else if (enable == IBRS_ENABLED_USER) {
> + /* enable IBRS usage in both userspace and kernel */
> + clear_ibrs_disabled();
> + /* don't change IBRS value once we set it to always on */
> + clear_ibrs_inuse();
> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
> + spec_ctrl_flush_all_cpus(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);
> + else
> + /* Platform doesn't support IBRS */
> + enable = IBRS_DISABLED;
> + }
This doesn't take the retpoline status into account. If we have
retpoline, we don't need IBRS in the kernel.
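i.e. something along the lines of (a sketch, keyed off the same
RETPOLINE define this series already uses):

#ifdef RETPOLINE
	/*
	 * A retpolined kernel doesn't need IBRS for kernel-mode
	 * protection, so don't let "IBRS in kernel" be selected.
	 */
	if (enable == IBRS_ENABLED)
		enable = IBRS_DISABLED;
#endif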
On 04/01/2018 22:22, Van De Ven, Arjan wrote:
> this is about a level of paranoia you are comfortable with.
>
> Retpoline on Skylake raises the bar for the issue enormously, but
> there are a set of corner cases that exist and that are not trivial
> to prove you dealt with them.
>
> personally I am comfortable with retpoline on Skylake, but I would
> like to have IBRS as an opt in for the paranoid.
>
IBRS actually seems to perform more than decently on Skylake with the
microcode updates we're shipping in RHEL---much more so than Broadwell
or Haswell. Can you confirm that this is expected?
Paolo
On Thu, Jan 4, 2018 at 11:33 AM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Jan 4, 2018 at 11:19 AM, David Woodhouse <[email protected]> wrote:
>>
>> On Skylake the target for a 'ret' instruction may also come from the
>> BTB. So if you ever let the RSB (which remembers where the 'call's came
>> from) get empty, you end up vulnerable.
>
> That sounds like it could cause mispredicts, but it doesn't sound _exploitable_.
>
> Sure, interrupts in between the call instruction and the 'ret' could
> overflow the return stack. And we could migrate to another CPU. And so
> apparently SMM clears the return stack too.
>
> ... but again, none of them sound even remotely _exploitable_.
These are also mitigatable; the retpoline sequence itself will never
result in an RSB underflow.
So long as the underlying binary satisfies the precondition that it
will not underflow its own RSB.
Then if we subsequently guarantee never to _reduce_ the number of
entries in its RSB at any point remote to its own execution, then the
precondition is preserved and underflow will not occur.
This does not preclude perturbing the RSB; it only requires that its
contents contain at least as many (non-exploitable) entries as when
execution was interrupted.
We do not need to actually track how many entries it is currently
using, because returning control to it with a full RSB guarantees that
the number of entries cannot have decreased.
>
> Remember: it's not mispredicts that leak information. It's *exploits"
> that use forced very specific mispredicts to leak information.
>
> There's a big difference there. And I think patch authors should keep
> that difference in mind.
>
> For example, flushing the BTB at kernel entry doesn't mean that later
> in-kernel indirect branches don't get predicted, and doesn't even mean
> that they don't get mis-predicted. It only means that an exploit can't
> pre-populate those things and use them for exploits.
Or worse, that a co-executing SMT sibling is not continually refilling them.
>
> Linus
On Thu, 4 Jan 2018 21:11:23 -0800
Dave Hansen <[email protected]> wrote:
> On 01/04/2018 08:51 PM, Andy Lutomirski wrote:
> > Do we need an arch_prctl() to enable IBRS for user mode?
>
> Eventually, once the dust settles. I think there's a spectrum of
> paranoia here, that is roughly (with increasing paranoia):
>
> 1. do nothing
> 2. do retpoline
> 3. do IBRS in kernel
> 4. do IBRS always
>
> I think you're asking for ~3.5.
And we'll actually end up with cgroups needing to handle this, and a
prctl, because the answer is simply not a systemwide single constant.
To start with, if my code has CAP_SYS_RAWIO, who gives a **** about
IBRS protecting it.
Likewise, on many real world systems I trust my base OS (or I might as
well turn off the power), I sort of trust my apps, and I deeply
distrust my web browser, which itself probably wants to turn some of
the protections on for crap like javascript and webassembly.
If I'm running containers, my desktop is probably #2 and my containers
#3 or #4.
There's no point getting hung up about a single magic default number,
because that's not how it's going to end up.
Alan
On Fri, Jan 5, 2018 at 3:32 AM, Paolo Bonzini <[email protected]> wrote:
> On 04/01/2018 22:22, Van De Ven, Arjan wrote:
>> this is about a level of paranoia you are comfortable with.
>>
>> Retpoline on Skylake raises the bar for the issue enormously, but
>> there are a set of corner cases that exist and that are not trivial
>> to prove you dealt with them.
>>
>> personally I am comfortable with retpoline on Skylake, but I would
>> like to have IBRS as an opt in for the paranoid.
>>
>
> IBRS actually seems to perform more than decently on Skylake with the
> microcode updates we're shipping in RHEL---much more so than Broadwell
> or Haswell. Can you confirm that this is expected?
>
The performance cost of IBRS varies with processor generation.
Skylake incurs the least overhead. It is expected that future
generations will be better still.
There are two distinct costs: the overhead on an indirect branch, and
the transition cost of enabling/disabling the feature as we schedule
into (and out of) protected code.
A naked indirect call (with IBRS enabled) on Skylake and a retpolined
call have approximately the same cost.
(I have not compared this cost for pre-Skylake uarchs.)
The main difference is that a retpolined binary does not incur the
transition costs, nor does it interact with sibling execution while
protected code is running.
A trade-off to consider when choosing between them is that retpoline,
on the other hand, carries a higher implementation (versus run-time)
cost.
As Arjan references above, this is also a sliding scale. The bar is
significantly raised by compiling with retpoline alone, and the
vulnerability is wholly eliminated if the preconditions are also
satisfied.
(And obviously, if you do not have the sources to the target you are
trying to protect, IBRS allows you to run it in a protected fashion --
while it cannot easily be retpolined.)
> Dave Hansen <[email protected]> wrote:
>> On 01/04/2018 08:51 PM, Andy Lutomirski wrote:
>> > Do we need an arch_prctl() to enable IBRS for user mode?
>>
>> Eventually, once the dust settles. I think there's a spectrum of
>> paranoia here, that is roughly (with increasing paranoia):
>>
>> 1. do nothing
>> 2. do retpoline
>> 3. do IBRS in kernel
>> 4. do IBRS always
>>
>> I think you're asking for ~3.5.
>>
>> Patches for 1-3 are out there and 4 is pretty straightforward. Doing an
>> arch_prctl() is still straightforward, but will be a much more niche
>> thing than any of the other choices. Plus, with a user interface, we
>> have to argue over the ABI for at least a month or two. ;)
I was chatting to Andrea about this, and we came to the conclusion one
use might be for qemu; I was worried about (theoretically) whether
userspace in a guest could read privileged data from the guest kernel by
attacking the qemu process rather than by attacking the kernels.
Dave
--
Dr. David Alan Gilbert / [email protected] / Manchester, UK
On Thu, 4 Jan 2018, Tim Chen wrote:
> +#define MSR_IA32_SPEC_CTRL 0x00000048
> +#define SPEC_CTRL_FEATURE_DISABLE_IBRS (0 << 0)
> +#define SPEC_CTRL_FEATURE_ENABLE_IBRS (1 << 0)
> +
> +#define MSR_IA32_PRED_CMD 0x00000049
> +
> #define MSR_IA32_PERFCTR0 0x000000c1
> #define MSR_IA32_PERFCTR1 0x000000c2
> #define MSR_FSB_FREQ 0x000000cd
> @@ -439,6 +445,7 @@
> #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1)
> #define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX (1<<2)
> #define FEATURE_CONTROL_LMCE (1<<20)
> +#define FEATURE_SET_IBPB (1<<0)
So how is that bit related to the control bits above? This file is
structured in obvious ways ....
Thanks,
tglx
On Thu, 4 Jan 2018, Tim Chen wrote:
> On 01/04/2018 02:16 PM, Peter Zijlstra wrote:
> >> + movl \save_reg, %eax
> >> + wrmsr
> >> +10:
> >> +.endm
> >
> > Should not all those 10 things look like .Ldone_\@ or something ?
> >
>
> They are removed in patch 6 with .Ldone_\@
So we need to develop speculative review capabilities from now on?
Please get your act together and fold stuff to the right place upfront, so
we can concentrate on the actual implementation review.
Thanks,
tglx
On Thu, Jan 04, 2018 at 10:01:52PM +0100, Yves-Alexis Perez wrote:
> On Thu, 2018-01-04 at 11:10 -0800, Tim Chen wrote:
> > > Are there plans to make the corresponding microcode support available?
> > >
> >
> > The microcode patches should be released soon.
>
> In the meantime, Lenovo has started issuing BIOS/UEFI updates which include
> microcode updates for this.
>
> See for example for the ThinkPad X250
>
> https://download.lenovo.com/pccbbs/mobiles/n10ur16w.txt
> https://download.lenovo.com/pccbbs/mobiles/n10ur16w.iso
>
> iucode-tool -L -tr n10ur16w.iso |grep 2017-11
> 001/020: sig 0x000306d4, pf_mask 0xc0, 2017-11-17, rev 0x0028, size 18432
That's been out for a while now:
https://downloadcenter.intel.com/download/27337/Linux-Processor-Microcode-Data-File
so I really don't think this is an "update" yet.
I saw a bios update for my Dell laptop dated December 28, but it didn't
seem to change anything with the cpuids or microcode status from before.
A hard answer to what "soon" means here would be nice to get, as I'm
seeing a lot of misunderstanding floating around right now :(
thanks,
greg k-h
On Thu, Jan 04, 2018 at 07:50:33PM +0100, Borislav Petkov wrote:
> On Thu, Jan 04, 2018 at 07:34:30PM +0100, Andrea Arcangeli wrote:
> > void spec_ctrl_rescan_cpuid(void)
> > {
> > int cpu;
> >
> > if (use_ibp_disable)
> > return;
> > mutex_lock(&spec_ctrl_mutex);
> > if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
> > boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
> > /* detect spec ctrl related cpuid additions */
> > init_scattered_cpuid_features(&boot_cpu_data);
>
> You don't need to noodle through all the scattered features - just the
> two bits.
Does it really matter? Rescanning everything can't hurt here, and it
shouldn't be noticeable in any boot time chart. I feel like arguing
about tiny stuff like this takes away from the obvious other problems
this whole patch series had :(
But hey, I'm guilty of it numerous times as well, I know, so I'll shut
up now...
thanks,
greg k-h
On Thu, 4 Jan 2018, Tim Chen wrote:
> Set IBRS upon kernel entrance via syscall and interrupts. Clear it
> upon exit.
I have no idea which kernel this is supposed to apply to. It fails on
Linus' tree and on tip x86/pti.
Can you please finally ditch the broken and outdated version of PTI
this is based on? I really wonder how this stuff is developed and
tested when it's missing essential fixes.
Please post patches against tip x86/pti where the lastest fixes and patches
concerning this mess are staged for both Linus and 4.14.stable
Thanks,
tglx
On Fri, Jan 05, 2018 at 02:32:17PM +0100, Greg KH wrote:
> Does it really matter? Rescanning everything can't hurt here, and it
> shouldn't be noticeable in any boot time chart. I feel like arguing
> about tiny stuff like this takes away from the obvious other problems
> this whole patch series had :(
It does because others will come and copy this and then we're stuck
with cleaning up the mess. The whole CPUID parsing needs a much desired
cleanup but there's other more pressing shit going on...
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Thu, 4 Jan 2018, Tim Chen wrote:
> #ifdef APM_ZERO_SEGS
> # define APM_DO_ZERO_SEGS \
> "pushl %%ds\n\t" \
> @@ -28,6 +30,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
> u32 *eax, u32 *ebx, u32 *ecx,
> u32 *edx, u32 *esi)
> {
> + unprotected_firmware_begin();
> /*
> * N.B. We do NOT need a cld after the BIOS call
> * because we always save and restore the flags.
> @@ -44,6 +47,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
> "=S" (*esi)
> : "a" (func), "b" (ebx_in), "c" (ecx_in)
> : "memory", "cc");
> + unprotected_formware_end();
This doesn't even compile unless the new compiler speculates that into
unprotected_firmware_end().
Thanks,
tglx
On Fri, Jan 05, 2018 at 02:09:43PM +0100, Thomas Gleixner wrote:
> On Thu, 4 Jan 2018, Tim Chen wrote:
> > +#define MSR_IA32_SPEC_CTRL 0x00000048
> > +#define SPEC_CTRL_FEATURE_DISABLE_IBRS (0 << 0)
> > +#define SPEC_CTRL_FEATURE_ENABLE_IBRS (1 << 0)
> > +
> > +#define MSR_IA32_PRED_CMD 0x00000049
> > +
> > #define MSR_IA32_PERFCTR0 0x000000c1
> > #define MSR_IA32_PERFCTR1 0x000000c2
> > #define MSR_FSB_FREQ 0x000000cd
> > @@ -439,6 +445,7 @@
> > #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1)
> > #define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX (1<<2)
> > #define FEATURE_CONTROL_LMCE (1<<20)
> > +#define FEATURE_SET_IBPB (1<<0)
>
> So how is that bit related to the control bits above? This file is
> structured in obvious ways ....
It's not related; the FEATURE_SET_IBPB value is specific and only
meaningful to MSR_IA32_PRED_CMD.
The only use that you can ever make of that is:
static inline void __spec_ctrl_ibpb(void)
{
native_wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
}
static inline void spec_ctrl_ibpb(void)
{
if (static_cpu_has(X86_FEATURE_IBPB_SUPPORT)) {
/* obsolete line deleted, to be replaced with a static key */
__spec_ctrl_ibpb();
}
}
You need to set X86_FEATURE_IBPB_SUPPORT by hand if
X86_FEATURE_SPEC_CTRL is present. On some CPUs X86_FEATURE_IBPB_SUPPORT
will show up in cpuid; on others only X86_FEATURE_SPEC_CTRL shows up,
but that always implies X86_FEATURE_IBPB_SUPPORT.
void spec_ctrl_init(struct cpuinfo_x86 *c)
{
if (c->x86_vendor != X86_VENDOR_INTEL &&
c->x86_vendor != X86_VENDOR_AMD)
return;
if (c != &boot_cpu_data) {
spec_ctrl_cpu_init();
return;
}
[..]
if (cpu_has_spec_ctrl()) {
setup_force_cpu_cap(X86_FEATURE_IBPB_SUPPORT);
[..]
}
static void identify_cpu(struct cpuinfo_x86 *c)
[..]
/* Set up SMEP/SMAP */
setup_smep(c);
setup_smap(c);
+ spec_ctrl_init(c);
[..]
static inline int cpu_has_spec_ctrl(void)
{
return static_cpu_has(X86_FEATURE_SPEC_CTRL);
}
Note: if you don't drop all late microcode as discussed yesterday, the
above has to do "if (this/boot_cpu_has()) return 1; rmb(); return 0;",
with the rmb() in the return-0 path needed if there's more than one
branch the CPU can speculate through. Not from where it's called in
spec_ctrl_init, but for all the other places that check
cpu_has_spec_ctrl().
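Spelled out, that would be:

static inline int cpu_has_spec_ctrl(void)
{
	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL))
		return 1;
	/* stop speculation through the unsupported path */
	rmb();
	return 0;
}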
Now, about the late microcode: my preference is not for static_cpu_has
plus forcing early microcode. My long term preference is to start with
this/boot_cpu_has() and then turn static_cpu_has into a true static
key, so that setup_force_cpu_cap also flips the static key for all
static_cpu_has(X86_FEATURE_IBPB_SUPPORT) users even when run any time
after boot, and not only when run before the static_cpu_has alternative
is patched in.
On Fri, Jan 05, 2018 at 02:37:17PM +0100, Borislav Petkov wrote:
> On Fri, Jan 05, 2018 at 02:32:17PM +0100, Greg KH wrote:
> > Does it really matter? Rescanning everything can't hurt here, and it
> > shouldn't be noticeable in any boot time chart. I feel like arguing
> > about tiny stuff like this takes away from the obvious other problems
> > this whole patch series had :(
>
> It does because others will come and copy this and then we're stuck
> with cleaning up the mess. The whole CPUID parsing needs a much desired
> cleanup but there's other more pressing shit going on..
Ok, fair enough, and Tim already said he fixed this up in his series,
sorry for the noise.
greg k-h
On Fri, 2018-01-05 at 14:28 +0100, Greg KH wrote:
> > iucode-tool -L -tr n10ur16w.iso |grep 2017-11
> > 001/020: sig 0x000306d4, pf_mask 0xc0, 2017-11-17, rev 0x0028, size
> > 18432
>
> That's been out for a while now:
> https://downloadcenter.intel.com/download/27337/Linux-Processor-Microcode-Data-File
>
> so I really don't think this is an "update" yet.
Actually for my processor (0x000306d4 / i5-5200U) this release (20171117) only
contains:
050/001: sig 0x000306d4, pf_mask 0xc0, 2017-01-27, rev 0x0025, size 17408
I find it weird that the Intel release from 2017-11-17 doesn't contain a
microcode file supposedly released on the same day, but it seems to be the
case.
> I saw a bios update for my Dell laptop dated December 28, but it didn't
> seem to change anything with the cpuids or microcode status from before.
Not sure which distribution you use nowadays but Henrique just pushed a
microcode update to Debian with some collected microcodes:
https://tracker.debian.org/news/899110
Obviously it lacks a *lot* of processors (especially pre-Haswell).
>
> A hard answer to what "soon" means here would be nice to get, as I'm
> seeing a lot of misunderstanding floating around right now :(
Agreed, but I'm not sure who can provide the information.
Regards,
--
Yves-Alexis
On Thu, 4 Jan 2018, Dave Hansen wrote:
> On 01/04/2018 10:33 AM, Borislav Petkov wrote:
> >> 2. At run time
> >> echo 0 > /sys/kernel/debug/ibrs_enabled will turn off IBRS
> >> echo 1 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in kernel
> >> echo 2 > /sys/kernel/debug/ibrs_enabled will turn on IBRS in both userspace and kernel
> > I am not sure that tristate is really needed. What's wrong with on/off
> > only?
>
> Lots of things:
>
> Distros have the tri-state already deployed.
Which is the least relevant argument. Distros have shipped a lot of crap.
> Paranoid people want "IBRS always" aka "ibrs 2".
> Future CPUs may have cheap-enough IBRS to not be worth switching it.
So again, as I said to David in the retpoline thread.
1) Start off with a CPU bug flag which describes the attack mode.
2) Have feature flags, which denote the mitigation mode
3) Have proper command line switching similar to pti
4) Keep the sysctl stuff separate
Thanks,
tglx
On Fri, 5 Jan 2018, Andrea Arcangeli wrote:
> On Fri, Jan 05, 2018 at 02:09:43PM +0100, Thomas Gleixner wrote:
> Now, about the late microcode: my preference is not for static_cpu_has
> plus forcing early microcode. My long term preference is to start with
> this/boot_cpu_has() and then turn static_cpu_has into a true static
> key, so that setup_force_cpu_cap also flips the static key for all
> static_cpu_has(X86_FEATURE_IBPB_SUPPORT) users even when run any time
> after boot, and not only when run before the static_cpu_has alternative
> is patched in.
Fair enough. We can avoid the static_cpu_has() to begin with and fix it
later.
Thanks,
tglx
On Fri, Jan 05, 2018 at 02:47:45PM +0100, Yves-Alexis Perez wrote:
> On Fri, 2018-01-05 at 14:28 +0100, Greg KH wrote:
> > > iucode-tool -L -tr n10ur16w.iso |grep 2017-11
> > > 001/020: sig 0x000306d4, pf_mask 0xc0, 2017-11-17, rev 0x0028, size
> > > 18432
> >
> > That's been out for a while now:
> > https://downloadcenter.intel.com/download/27337/Linux-Processor-Microcode-Data-File
> >
> > so I really don't think this is an "update" yet.
>
> Actually for my processor (0x000306d4 / i5-5200U) this release (20171117) only
> contains:
>
> 050/001: sig 0x000306d4, pf_mask 0xc0, 2017-01-27, rev 0x0025, size 17408
>
> I find it weird that the Intel release from 2017-11-17 doesn't contain a
> microcode file supposedly released on the same day, but it seems to be the
> case.
Doh, same here, sorry about that, should have checked the microcode
image file myself.
> > I saw a bios update for my Dell laptop dated December 28, but it didn't
> > seem to change anything with the cpuids or microcode status from before.
>
> Not sure which distribution you use nowadays but Henrique just pushed a
> microcode update to Debian with some collected microcodes:
>
> https://tracker.debian.org/news/899110
>
> Obviously it lacks a *lot* of processors (especially pre-Haswell).
I'm running Arch, but it would be nice to know where those microcode
updates came from, given that they aren't on the "official" Intel page
yet :)
> > A hard answer to what "soon" means here would be nice to get, as I'm
> > seeing a lot of misunderstanding floating around right now :(
>
> Agreed, but I'm not sure who can provide the information.
That's why I'm asking here, hopefully Arjan might have an idea.
thanks,
greg k-h
On 05/01/2018 15:01, Greg KH wrote:
>> Obviously it lacks a *lot* of processors (especially pre-Haswell).
>
> I'm running Arch, but it would be nice to know where those microcode
> updates came from, given that they aren't on the "official" Intel page
> yet :)
Those from November seem way too early to include IBRS/IBPB. Maybe the
two from December 3rd, but I wouldn't be 100% sure.
So it would be even nicer to know how those microcode updates were tested.
(And by the way, the LFENCE change is for variant 1 aka CVE-2017-5753).
Paolo
On Fri, 2018-01-05 at 03:52 -0800, Paul Turner wrote:
>
> These are also mitigatable; the retpoline sequence itself will never
> result in an RSB underflow.
Unless an event occurs which clears the RSB between the CALL and the
RET of the retpoline.
> So long as the underlying binary satisfies the precondition that it
> will not underflow its own RSB.
>
> Then if we subsequently guarantee never to _reduce_ the number of
> entries in its RSB at any point remote to its own execution, then the
> precondition is preserved and underflow will not occur.
The problem is that underflow can occur not only on a retpoline, but
also on *any* bare ret.
Unless we want to do something evil like turning them all into a
sequence of 'call $+1; sub $8, %rsp; ret' and narrowing the race window
for that 'external event' to be negligible.
On the whole, since IBRS doesn't perform as badly on Skylake+ as it
does on earlier CPUs, it makes more sense just to use IBRS on Skylake+.
Unless we *only* have retpoline, of course, in which case we use that.
> > So long as the underlying binary satisfies the precondition that it
> > will not underflow its own RSB.
> >
> > Then if we subsequently guarantee never to _reduce_ the number of
> > entries in its RSB at any point remote to its own execution, then the
> > precondition is preserved and underflow will not occur.
>
> The problem is that underflow can occur not only on a retpoline, but
> also on *any* bare ret.
That is not true, though.
Yes, there are NMIs and interrupts, but Paul managed those.
The cases where underflow can happen are not infinite or unbounded;
they are a specific set of causes, as Paul already mentioned.
Now there is complexity in proving that all of that is handled, and if
you're that paranoid, IBRS is certainly an option; its performance,
while not nearly as good as retpoline's, is not horrid either on
Skylake.
This is why I said I would like to see retpoline be the default, with
IBRS an opt-in for the paranoid. I guess David will turn that on ;-)
On Thu, Jan 04, 2018 at 09:22:34PM +0000, Van De Ven, Arjan wrote:
> personally I am comfortable with retpoline on Skylake, but I would
>like to have IBRS as an opt in for the paranoid.
I think this whole variant#2 issue has to be fixed mathematically or
not at all; the reason is that it's already "near zero risk" to begin
with.
If we leave any theoretical issues open, we may as well stick with the
"near zero risk" wording on the AMD website and not fix anything.
Doing a huge amount of work with retpoline and then finding that SMM
is called reproducibly somehow and a new PoC could exist for it: not
fun.
Whatever the solution, it should be total and guaranteed, or a huge
amount of resources is wasted.
Again, spectre variant#2 is "near zero risk" to begin with, on all
CPUs, because of the complexity of mounting an attack on any given
kernel/CPU combination out there. I'm mostly thinking about the "guest
mode"/"host kernel mode" isolation here, but this should apply in
general to all kernel protection for spectre variant#2.
If we can do a 3-way alternative for IBRS on skylake, retpoline + IBRS
around all BIOS, graphics firmware and PCI firmware, plus kernel asm
patched with a 2-way alternative between AMD retpoline and Intel
retpoline emitted 2-way by the same gcc build, and still guarantee this
solution is as mathematically safe as pure IBRS (which is obviously
mathematically safe), ok. Otherwise I'd prefer to stick to pure IBRS
and skip the retpoline complexity and fallout here and there around
BIOS asm firmware, SMM etc., not to mention the need to rebuild qemu
with retpoline, with 2-way amd/intel alternative self-modifying code in
userland during qemu startup, to achieve ibrs 1 ibpb 1 + prctl IBRS on
qemu with retpoline.
Thanks,
Andrea
On Friday, January 05, 2018 3:32 AM, Paolo Bonzini wrote:
> > personally I am comfortable with retpoline on Skylake, but I would
> > like to have IBRS as an opt in for the paranoid.
> >
>
> IBRS actually seems to perform more than decently on Skylake with the
> microcode updates we're shipping in RHEL---much more so than Broadwell
> or Haswell. Can you confirm that this is expected?
Retpoline outperforms IBRS even on the skylake family (including the
client kabylake etc.) by some real margin, but IBRS is certainly not
the end of the world either... more a case of 'we can do even faster'.
The difference also gets bigger in retpoline's favor for older
generation CPUs.
>
> Doing a huge amount of work with reptoline and then you find SMM is
> called reproducibly somehow and a new PoC could exist for it, not fun.
We want retpoline for broadwell and earlier anyway.
I'm sorry but your whole statement reeks a little bit of "perfect is the enemy of good"
On Fri, 2018-01-05 at 15:26 +0100, Paolo Bonzini wrote:
> Those from November seem way too early to include IBRS/IBPB. Maybe the
> two from December 3rd, but I wouldn't be 100% sure.
So, for my CPU with updated microcode:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 61
model name : Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
stepping : 4
microcode : 0x28
cpuid returns:
0x00000007 0x00: eax=0x00000000 ebx=0x021c27ab ecx=0x00000000 edx=0x0c000000
So bits 26/27 are set, which as I understand it means IBRS is supported
(but I would appreciate any pointer to relevant documentation on this).
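For reference, a quick userspace check of those bits could look like
this (assuming, per the patch description, that bit 26 is the
SPEC_CTRL/IBRS enumeration and bit 27 is STIBP):

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID leaf 7, subleaf 0; assumes leaf 7 exists on this CPU */
	__cpuid_count(7, 0, eax, ebx, ecx, edx);
	printf("edx = 0x%08x\n", edx);
	printf("bit 26 (IBRS/IBPB): %u\n", (edx >> 26) & 1);
	printf("bit 27 (STIBP):     %u\n", (edx >> 27) & 1);
	return 0;
}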
>
> So it would be even nicer to know how those microcode updates were tested.
At least I didn't test IBRS/IBPB here. I could do it provided I'm pointed to a
tree with all the things to test.
>
> (And by the way, the LFENCE change is for variant 1 aka CVE-2017-5753).
Ok, good to know. Is the kernel support for LFENCE in the same thread?
(I have to admit I'm a bit lost.)
Regards,
--
Yves-Alexis
On Fri, 5 Jan 2018, Andrea Arcangeli wrote:
> On Thu, Jan 04, 2018 at 09:22:34PM +0000, Van De Ven, Arjan wrote:
> > personally I am comfortable with retpoline on Skylake, but I would
> >like to have IBRS as an opt in for the paranoid.
>
> I think this whole variant#2 issue has to be fixed mathematically or
> not at all; the reason is that it's already "near zero risk" to begin
> with.
>
> If we leave any theoretical issues open, we may as well stick with the
> "near zero risk" wording on the AMD website and not fix anything.
>
> Doing a huge amount of work with retpoline and then finding that SMM
> is called reproducibly somehow and a new PoC could exist for it: not
> fun.
>
> Whatever the solution, it should be total and guaranteed, or a huge
> amount of resources is wasted.
>
> Again, spectre variant#2 is "near zero risk" to begin with, on all
> CPUs, because of the complexity of mounting an attack on any given
> kernel/CPU combination out there. I'm mostly thinking about the "guest
> mode"/"host kernel mode" isolation here, but this should apply in
> general to all kernel protection for spectre variant#2.
>
> If we can do a 3-way alternative for IBRS on skylake, retpoline + IBRS
> around all BIOS, graphics firmware and PCI firmware, plus kernel asm
> patched with a 2-way alternative between AMD retpoline and Intel
> retpoline emitted 2-way by the same gcc build, and still guarantee this
> solution is as mathematically safe as pure IBRS (which is obviously
> mathematically safe), ok. Otherwise I'd prefer to stick to pure IBRS
> and skip the retpoline complexity and fallout here and there around
> BIOS asm firmware, SMM etc., not to mention the need to rebuild qemu
> with retpoline, with 2-way amd/intel alternative self-modifying code in
> userland during qemu startup, to achieve ibrs 1 ibpb 1 + prctl IBRS on
> qemu with retpoline.
I agree. The number of options we get, plus the desire to do
prctl/cgroup and whatever-based enable/disable on top, makes me nervous
as hell. Especially the per task/process/cgroup runtime magic is bound
to dig yet another speculation-attack-vector hole into the existing
mess.
Thanks,
tglx
On Fri, Jan 05, 2018 at 02:52:33PM +0000, Van De Ven, Arjan wrote:
> I'm sorry but your whole statement reeks a little bit of "perfect is the enemy of good"
My point is exactly that this sentence could apply to spectre
variant#2 in the first place.
If we start moving in any direction, either we reach "perfection", or
we end up moving towards perfection while still sitting in "perfect is
the enemy of good" territory, which doesn't sound great to me.
The retpoline status, with a 3-way "IBRS skylake mode" or a "2-way
retpoline code emission from gcc for older CPUs or CPUs without
SPEC_CTRL", plus IBRS still forced around all firmware/BIOS/SMM (how do
you do IBRS on those CPUs without IBRS? that requires yet another IBPB
alternative combined with the AMD retpoline case), is far from "less is
more" or KISS.
This whole retpoline work is a total waste for future silicon, so I
like to keep things simple and obviously mathematically safe at the
same time... and also easy to activate for a fully mathematically safe
qemu ibrs 2 mode, without having to do 2-way retpoline gcc code
emission and dynamic patching at qemu startup.
On 1/5/2018 5:14 AM, David Woodhouse wrote:
> On Thu, 2018-01-04 at 09:56 -0800, Tim Chen wrote:
>> cpuid ax=0x7, return rdx bit 26 to indicate presence of this feature
>> IA32_SPEC_CTRL (0x48) and IA32_PRED_CMD (0x49)
>> IA32_SPEC_CTRL, bit0 – Indirect Branch Restricted Speculation (IBRS)
>> IA32_PRED_CMD, bit0 – Indirect Branch Prediction Barrier (IBPB)
>
> In a previous iteration of these patches, hadn't you already merged
> support for the AMD variant? Where'd that go?
It looks like this series is strictly IBRS related. Even though some of
the IBPB definitions are in here, there isn't support later for indicating
IBPB is in use or any places where IBPB would be used. I was assuming that
there would be another series for IBPB support. Is that the plan?
AMD is covered by the IBRS support as is, but we also have support for
the IBPB feature alone, identified by a different CPUID bit. I was
waiting for the IBPB series to see if I needed to submit anything or
whether the patches were included.
Thanks,
Tom
Hello everyone,
On Fri, Jan 05, 2018 at 02:32:17PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 07:50:33PM +0100, Borislav Petkov wrote:
> > On Thu, Jan 04, 2018 at 07:34:30PM +0100, Andrea Arcangeli wrote:
> > > void spec_ctrl_rescan_cpuid(void)
> > > {
> > > int cpu;
> > >
> > > if (use_ibp_disable)
> > > return;
> > > mutex_lock(&spec_ctrl_mutex);
> > > if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
> > > boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
> > > /* detect spec ctrl related cpuid additions */
> > > init_scattered_cpuid_features(&boot_cpu_data);
> >
> > You don't need to noodle through all the scattered features - just the
> > two bits.
>
> Does it really matter? Rescanning everything can't hurt here, and it
> shouldn't be noticeable in any boot time chart. I feel like arguing
> about tiny stuff like this takes away from the obvious other problems
> this whole patch series had :(
Yes, IMHO it's not worth trying to filter those two bits; sharing the
same function looked much cleaner.
It always helps to get patches to perfection, so reviews are welcome,
but IMHO the best way forward is to release a kernel with PTI, and then
in rc1 of the next cycle apply an ibrs/ibpb solution for variant#2 in
whatever shape and form (not necessarily immediately hyper-optimized
with a static key for the entry code branching, or static_cpu_has, for
once), so that it activates the microcode IBRS and IBPB. Then we can
optimize it in rc2 and later.
About retpoline, I think all the retpoline talk is a waste of time
right now. Retpoline is an attempt to optimize for old CPUs only, it's
hugely more complex to implement and deploy, and for future silicon
it's useless. Even retpoline still requires IBRS, and some CPUs have no
ibrs, so it would require yet another alternative there, and retpoline
can't fix skylake anyway. Retpoline requires new gcc code that doesn't
even exist today to do a 2-way code emission with two different
retpolines for different CPUs, with no existing kernel code that can
reconcile it at boot time by patching. There's no way to
runtime-disable it once applied, and AMD fam 0x10 0x12 0x16 also
wouldn't get any benefit from retpoline; it's not just skylake that
wouldn't use it.
Thanks,
Andrea
On Fri, 2018-01-05 at 14:42 +0000, Van De Ven, Arjan wrote:
> This is why I said I would like to see retpoline be the default, with
> IBRS an opt-in for the paranoid. I guess David will turn that on ;-)
I can live with that.
It really depends how you look at it.
We had IBRS first, and especially on Broadwell and earlier, its
performance really is painful.
Then came retpoline, purely as an optimisation. A very *important*
performance improvement, but an optimisation nonetheless.
When looking at optimisations, it is rare for us to say "oh, well it
opens up only a *small* theoretical security hole, but it's faster so
that's OK".
Retpoline is sufficient on pre-Skylake. On Skylake... it's complicated,
and that's why my default position has always been that we should
prefer IBRS there. Especially as it isn't *even* as much of a
performance win there.
But you're right, the holes that it opens up when compared to IBRS are
minimal, and it *is* still a performance win (just less so than on
earlier CPUs). So if your recommendation is that we stick with
retpoline until IBRS_ATT comes along, I can live with that. People
always have the option to build without retpoline, and then they get to
use IBRS in the kernel if they want.
On Fri, Jan 05, 2018 at 03:38:24PM +0000, David Woodhouse wrote:
> We had IBRS first, and especially on Broadwell and earlier, its
> performance really is painful.
>
> Then came retpoline, purely as an optimisation. A very *important*
> performance improvement, but an optimisation nonetheless.
>
> When looking at optimisations, it is rare for us to say "oh, well it
> opens up only a *small* theoretical security hole, but it's faster so
> that's OK".
I couldn't express the way I look at retpolines any better than the
above.
Now, seeing how this thing escalates, with 2-way gcc code emission, an
AMD version that will not have ibrs at all to call around the BIOS
etc., IBRS still needed on skylake as a 3-way alternative with no code
existing to even patch it all at boot like that, and then qemu having
to be compiled with retpoline and a 2-way alternative there too to
achieve ibrs 2 ibpb 1, it's not looking like the way to go to me.
Not in the short term at least, and for the long term "retpoline" is
guaranteed to be a totally useless effort.
retpoline is like highmem, kmap() etc.: eventually nobody will use
retpoline at all. retpoline is in fact more interesting for downstream,
if anything, where at least we won't have to deal with it forever,
where there's more control over the toolchain used, and where after a
certain number of years that code expires.
I can imagine we'll unfortunately have to deal with retpoline at some
point, but starting with retpoline looks backwards, especially if it's
the long term you're planning for.
Thanks,
Andrea
On Thu, Jan 04, 2018 at 08:08:55PM +0000, Woodhouse, David wrote:
> On Thu, 2018-01-04 at 21:05 +0100, Greg KH wrote:
> > On Thu, Jan 04, 2018 at 09:56:46AM -0800, Tim Chen wrote:
> > >
> > > From: David Woodhouse <[email protected]>
> > >
> > > We are impervious to the indirect branch prediction attack with
> > > retpoline
> > > but firmware won't be, so we still need to set IBRS to protect
> > > firmware code execution when calling into firmware at runtime.
> > Wait, what?
> >
> > Maybe it's just the wine from dinner talking, but if the firmware has
> > issues, we have bigger things to worry about here, right? It already
> > handed over the "chain of trust" to us, so we have already implicitly
> > trusted that the firmware was correct here. So why do we need to do
> > anything about firmware calls in this manner?
>
> In the ideal world, firmware exists to boot the kernel and then it gets
> out of the way, never to be thought of again.
>
> In the Intel world, firmware idiocy permeates everything and we
> sometimes end up making calls to it at runtime.
>
> If an attacker can poison the BTB for an indirect branch in EFI
> firmware, then reliably do something which calls EFI runtime calls,
> that might lead to an exploit. So if we're using retpoline for the
> kernel, then we should be setting IBRS before any firmware calls.
Ugh, ok, seems a bit far-fetched to me, but I will not object anymore.
Except that the patch doesn't actually build, which means no one has
actually tested it at all :(
greg k-h
On Fri, Jan 05, 2018 at 05:08:48PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 08:08:55PM +0000, Woodhouse, David wrote:
> > On Thu, 2018-01-04 at 21:05 +0100, Greg KH wrote:
> > > On Thu, Jan 04, 2018 at 09:56:46AM -0800, Tim Chen wrote:
> > > >
> > > > From: David Woodhouse <[email protected]>
> > > >
> > > > We are impervious to the indirect branch prediction attack with
> > > > retpoline
> > > > but firmware won't be, so we still need to set IBRS to protect
> > > > firmware code execution when calling into firmware at runtime.
> > > Wait, what?
> > >
> > > Maybe it's just the wine from dinner talking, but if the firmware has
> > > issues, we have bigger things to worry about here, right? It already
> > > handed over the "chain of trust" to us, so we have already implicitly
> > > trusted that the firmware was correct here. So why do we need to do
> > > anything about firmware calls in this manner?
> >
> > In the ideal world, firmware exists to boot the kernel and then it gets
> > out of the way, never to be thought of again.
> >
> > In the Intel world, firmware idiocy permeates everything and we
> > sometimes end up making calls to it at runtime.
> >
> > If an attacker can poison the BTB for an indirect branch in EFI
> > firmware, then reliably do something which calls EFI runtime calls,
> > that might lead to an exploit. So if we're using retpoline for the
> > kernel, then we should be setting IBRS before any firmware calls.
>
> Ugh, ok, seems a bit far-fetched to me, but I will not object anymore.
>
> Except that the patch doesn't actually build, which means no one has
> actually tested it at all :(
One more wish: I personally would prefer the whole
+#if defined(RETPOLINE)
to be dropped too, and the series respun without any RETPOLINE
knowledge at all. It can't be activated anyway, so it should go in a
different patchset that can actually activate it, if anything; that's
much lower prio.
It was great info to know retpoline still needs IBRS anyway, though.
Thanks,
Andrea
On Fri, 2018-01-05 at 17:05 +0100, Andrea Arcangeli wrote:
> On Fri, Jan 05, 2018 at 03:38:24PM +0000, David Woodhouse wrote:
> >
> > We had IBRS first, and especially on Broadwell and earlier, its
> > performance really is painful.
> >
> > Then came retpoline, purely as an optimisation. A very *important*
> > performance improvement, but an optimisation nonetheless.
> >
> > When looking at optimisations, it is rare for us to say "oh, well it
> > opens up only a *small* theoretical security hole, but it's faster so
> > that's OK".
> I couldn't express any better than the above, the way I look at
> repotlines.
>
> Now, seeing how this thing escalates (2-way gcc code emission, an AMD
> variant that won't have IBRS at all to wrap around the BIOS calls,
> IBRS still needed on Skylake as a 3-way alternative with no code yet
> to patch it all in at boot, and qemu having to be compiled with
> retpoline plus a 2-way alternative there too, just to achieve ibrs 2
> ibpb 1), it's not looking like the way to go to me.
>
> Not in the short term at least, and for the long term "retpoline" is
> guaranteed to be a totally useless effort.
>
> retpoline is like highmem kmap() etc.: eventually nobody will ever
> use it. If anything, retpoline is more interesting for downstream,
> where there's more control over the toolchain used and the code
> expires after a certain number of years, so at least we won't have
> to deal with it forever.
>
> I can imagine we'll unfortunately have to deal with retpoline at some
> point, but starting with retpoline at least looks backwards,
> especially if it's the long term you're planning for.
You are completely ignoring pre-Skylake here.
On pre-Skylake, retpoline is perfectly sufficient and it's a *lot*
faster than the IBRS option which is almost prohibitively slow.
We didn't do it just for fun. And it's working fine; it isn't *that*
complex.
On Fri, Jan 05, 2018 at 04:37:30PM +0000, David Woodhouse wrote:
> You are completely ignoring pre-Skylake here.
>
> On pre-Skylake, retpoline is perfectly sufficient and it's a *lot*
> faster than the IBRS option which is almost prohibitively slow.
>
> We didn't do it just for fun. And it's working fine; it isn't *that*
> complex.
How do you enable IBRS when the CPU switches to SMM?
Do you already have this 2-way code emission from gcc, with 3-way
alternatives patching at boot between IBRS and the two retpoline
versions emitted by gcc, plus alternatives between IBRS and IBPB for
CPUs where SPEC_CTRL is missing but IBPB_SUPPORT is available?
Or are you talking about having done this only on a non-upstream Xen
build, without the 2-way code emission from gcc?
> On Fri, Jan 05, 2018 at 04:37:30PM +0000, David Woodhouse wrote:
> > You are completely ignoring pre-Skylake here.
> >
> > On pre-Skylake, retpoline is perfectly sufficient and it's a *lot*
> > faster than the IBRS option which is almost prohibitively slow.
> >
> > We didn't do it just for fun. And it's working fine; it isn't *that*
> > complex.
>
> How do you enable IBRS when the CPU switches to SMM?
You do not need to; SMM takes care of itself.
On Fri, 2018-01-05 at 17:42 +0100, Andrea Arcangeli wrote:
> On Fri, Jan 05, 2018 at 04:37:30PM +0000, David Woodhouse wrote:
> > You are completely ignoring pre-Skylake here.
> >
> > On pre-Skylake, retpoline is perfectly sufficient and it's a *lot*
> > faster than the IBRS option which is almost prohibitively slow.
> >
> > We didn't do it just for fun. And it's working fine; it isn't *that*
> > complex.
>
> How do you enable IBRS when the CPU switches to SMM?
SMM is fine, as Arjan said. It's only for stuff like EFI runtime calls,
and then only if you're really paranoid.
> Do you already have this 2-way code emission from gcc, with 3-way
> alternatives patching at boot between IBRS and the two retpoline
> versions emitted by gcc, plus alternatives between IBRS and IBPB for
> CPUs where SPEC_CTRL is missing but IBPB_SUPPORT is available?
This was implemented in Intel's patch sets that they've been sending
out. I don't really know why we've suddenly gone back to the drawing
board and turned things around to put retpoline first in the series,
etc.
I'm also mildly concerned that all the variant 1 patches have just
disappeared.
> Or are you talking about having done this only on a non-upstream Xen
> build, without the 2-way code emission from gcc?
Xen has it too, but no. I was talking about Linux.
On 01/05/2018 07:14 AM, Tom Lendacky wrote:
> On 1/5/2018 5:14 AM, David Woodhouse wrote:
>> On Thu, 2018-01-04 at 09:56 -0800, Tim Chen wrote:
>>> cpuid ax=0x7, return rdx bit 26 to indicate presence of this feature
>>> IA32_SPEC_CTRL (0x48) and IA32_PRED_CMD (0x49)
>>> IA32_SPEC_CTRL, bit0 – Indirect Branch Restricted Speculation (IBRS)
>>> IA32_PRED_CMD, bit0 – Indirect Branch Prediction Barrier (IBPB)
>>
>> In a previous iteration of these patches, hadn't you already merged
>> support for the AMD variant? Where'd that go?
>
> It looks like this series is strictly IBRS related. Even though some of
> the IBPB definitions are in here, there isn't support later for indicating
> IBPB is in use or any places where IBPB would be used. I was assuming that
> there would be another series for IBPB support. Is that the plan?
IBPB is meant to be a separate series. We want to get retpoline
and IBRS out first.
>
> AMD is covered by the IBRS support as is, but we also have support for
> the IBPB feature alone, identified by a different CPUID bit. I was
> waiting for the IBPB series to see if I needed to submit anything or
> whether the patches were included.
>
Tim
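[The enumeration quoted in this email reduces to a single CPUID check. A minimal user-space sketch, for illustration only; the kernel uses its own cpuid helpers:

	#include <cpuid.h>	/* GCC/Clang builtin cpuid helpers */
	#include <stdio.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/* CPUID leaf 7, subleaf 0: EDX bit 26 enumerates SPEC_CTRL
		 * (IBRS via IA32_SPEC_CTRL, IBPB via IA32_PRED_CMD). */
		if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) &&
		    (edx & (1u << 26)))
			puts("SPEC_CTRL (IBRS/IBPB) enumerated");
		else
			puts("SPEC_CTRL not enumerated");
		return 0;
	}]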
On 01/05/2018 04:27 AM, Dr. David Alan Gilbert wrote:
>>> Patches for 1-3 are out there and 4 is pretty straightforward. Doing a
>>> arch_prctl() is still straightforward, but will be a much more niche
>>> thing than any of the other choices. Plus, with a user interface, we
>>> have to argue over the ABI for at least a month or two. ;)
> I was chatting to Andrea about this, and we came to the conclusion
> that one use might be for qemu; I was worried about whether, in
> theory, userspace in a guest could read privileged data from the
> guest kernel by attacking the qemu process rather than by attacking
> the kernels.
Theoretically, I believe it's possible. The SMEP-based mitigations are
effective when crossing rings, but do not help with
guest-ring0->host-ring0 or presumably guest-ring3->host-ring3.
For the same-ring things, we have the indirect branch predictor flush
operation MSR (IBPB). Expect those to be posted once we have the IBRS
and retpoline approaches settled.
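[For concreteness, issuing that barrier is a single write to IA32_PRED_CMD (0x49), bit 0, per the enumeration quoted earlier in the thread. A minimal kernel-side sketch; where to hook it, e.g. on the mm switch, is an assumption:

	/* Sketch: flush indirect branch predictor state so code that ran
	 * earlier can't influence indirect branch predictions afterwards. */
	#define MSR_IA32_PRED_CMD	0x00000049	/* per the enumeration above */
	#define PRED_CMD_IBPB		(1UL << 0)

	static inline void indirect_branch_prediction_barrier(void)
	{
		native_wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
	}

A context-switch path would call this when switching to a new untrusted address space.]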
On 01/05/2018 03:16 AM, David Woodhouse wrote:
> On Thu, 2018-01-04 at 09:56 -0800, Tim Chen wrote:
>>
>> + mutex_lock(&spec_ctrl_mutex);
>> +
>> + if (enable == IBRS_DISABLED) {
>> + /* disable IBRS usage */
>> + set_ibrs_disabled();
>> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
>> + spec_ctrl_flush_all_cpus(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_DISABLE_IBRS);
>> + } else if (enable == IBRS_ENABLED) {
>> + /* enable IBRS usage in kernel */
>> + clear_ibrs_disabled();
>> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
>> + set_ibrs_inuse();
>> + else
>> + /* Platform doesn't support IBRS */
>> + enable = IBRS_DISABLED;
>> + } else if (enable == IBRS_ENABLED_USER) {
>> + /* enable IBRS usage in both userspace and kernel */
>> + clear_ibrs_disabled();
>> + /* don't change IBRS value once we set it to always on */
>> + clear_ibrs_inuse();
>> + if (spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED)
>> + spec_ctrl_flush_all_cpus(MSR_IA32_SPEC_CTRL, SPEC_CTRL_FEATURE_ENABLE_IBRS);
>> + else
>> + /* Platform doesn't support IBRS */
>> + enable = IBRS_DISABLED;
>> + }
>
> This doesn't take the retpoline status into account. If we have
> retpoline, we don't need IBRS in the kernel.
>
If retpoline is used, we don't enable IBRS automatically during feature
detection. But a paranoid admin still has the choice of explicitly
issuing a command to enable IBRS here.
Tim
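[A sketch of that default policy, reusing names from the quoted patch; the function itself is hypothetical:

	/* Sketch: choose the boot-time IBRS default. With retpoline built
	 * in, leave IBRS off by default; the admin can still turn it on
	 * via the ibrs_enabled knob. */
	static void __init spec_ctrl_select_default(void)	/* hypothetical */
	{
		if (!(spec_ctrl_ibrs & SPEC_CTRL_IBRS_SUPPORTED))
			return;			/* CPU lacks SPEC_CTRL */
	#if defined(RETPOLINE)
		return;				/* retpoline covers the kernel */
	#else
		set_ibrs_inuse();		/* default: IBRS in kernel */
	#endif
	}]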