Using retpoline ensures the kernel is safe because it doesn't contain
any indirect branches, but firmware still can contain them, and we make
calls into firmware at runtime. Where IBRS microcode support is
available, use it before calling into firmware.
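In outline, the bracketing this series adds around each runtime
firmware call looks like this (a sketch only; the helper names match
the patch below, the call site itself is illustrative):

firmware_restrict_branch_speculation_start();  /* WRMSR: SPEC_CTRL = IBRS */
/* ... runtime call into EFI / APM / hpwdt firmware ... */
firmware_restrict_branch_speculation_end();    /* WRMSR: SPEC_CTRL = 0 */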
While doing that, I noticed that we were calling C functions without
telling the compiler about the call-clobbered registers. Stop that.
This also contains the always_inline fix for the performance problem
introduced by retpoline in KVM code, and fixes some other issues with
the per-vCPU KVM handling for the SPEC_CTRL MSR.
Finally, update the microcode blacklist to reflect the latest
information from Intel.
v2: Drop IBRS_ALL patch for the time being
Add KVM MSR fixes (karahmed)
Update microcode blacklist
David Woodhouse (4):
x86/speculation: Update Speculation Control microcode blacklist
Revert "x86/speculation: Simplify
indirect_branch_prediction_barrier()"
KVM: x86: Reduce retpoline performance impact in
slot_handle_level_range()
x86/speculation: Use IBRS if available before calling into firmware
KarimAllah Ahmed (2):
X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR
bitmap
arch/x86/include/asm/apm.h | 6 ++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/efi.h | 17 +++++++++++++++--
arch/x86/include/asm/nospec-branch.h | 32 ++++++++++++++++++++++++++++----
arch/x86/include/asm/processor.h | 3 ---
arch/x86/kernel/cpu/bugs.c | 18 +++++++++++-------
arch/x86/kernel/cpu/intel.c | 4 ----
arch/x86/kvm/mmu.c | 10 +++++-----
arch/x86/kvm/vmx.c | 7 ++++---
drivers/watchdog/hpwdt.c | 3 +++
10 files changed, 73 insertions(+), 28 deletions(-)
--
2.7.4
With retpoline, tight loops of "call this function for every XXX" are
very much pessimised by taking a prediction miss *every* time. This one
is by far the biggest contributor to the guest launch time with retpoline.
By marking the iterator slot_handle_…() functions always_inline, we can
ensure that the indirect function call can be optimised away into a
direct call and it actually generates slightly smaller code because
some of the other conditionals can get optimised away too.
Performance is now pretty close to what we see with nospectre_v2 on
the command line.
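To illustrate the mechanism (a simplified, self-contained sketch; the
handler and its int-based signature are invented for the example, not
the real mmu.c types):

#include <linux/compiler.h>     /* __always_inline */
#include <linux/types.h>        /* bool */

typedef bool (*item_handler)(int idx);

static bool count_handler(int idx)
{
        return idx > 0;         /* stand-in for the real per-slot work */
}

/*
 * Without __always_inline, every iteration makes an indirect call
 * through 'fn', which retpoline turns into a guaranteed
 * branch-prediction miss:
 */
static __always_inline bool walk_items(item_handler fn, int start, int end)
{
        bool flush = false;
        int i;

        for (i = start; i < end; i++)
                flush |= fn(i);
        return flush;
}

static bool walk_all_items(void)
{
        /*
         * Once walk_items() is inlined here, 'fn' is a compile-time
         * constant, so the compiler can emit a direct call to
         * count_handler() (or inline it too), bypassing the retpoline
         * thunk entirely.
         */
        return walk_items(count_handler, 0, 512);
}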
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Filippo Sironi <[email protected]>
Tested-by: Filippo Sironi <[email protected]>
---
arch/x86/kvm/mmu.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2b8eb4d..cc83bdc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5058,7 +5058,7 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
/* The caller should hold mmu-lock before calling this function. */
-static bool
+static __always_inline bool
slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, int start_level, int end_level,
gfn_t start_gfn, gfn_t end_gfn, bool lock_flush_tlb)
@@ -5088,7 +5088,7 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
return flush;
}
-static bool
+static __always_inline bool
slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, int start_level, int end_level,
bool lock_flush_tlb)
@@ -5099,7 +5099,7 @@ slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
lock_flush_tlb);
}
-static bool
+static __always_inline bool
slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, bool lock_flush_tlb)
{
@@ -5107,7 +5107,7 @@ slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
}
-static bool
+static __always_inline bool
slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, bool lock_flush_tlb)
{
@@ -5115,7 +5115,7 @@ slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
}
-static bool
+static __always_inline bool
slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, bool lock_flush_tlb)
{
--
2.7.4
From: KarimAllah Ahmed <[email protected]>
We either clear the CPU_BASED_USE_MSR_BITMAPS and end up intercepting all
MSR accesses or create a valid L02 MSR bitmap and use that. This decision
has to be made every time we evaluate whether we are going to generate the
L02 MSR bitmap.
Before commit 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
this was probably OK since the decision was always identical. This is no
longer the case now since the MSR bitmap might actually change once we
decide to not intercept SPEC_CTRL and PRED_CMD.
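That is, the control bit now has to be set or cleared explicitly on
every evaluation. An annotated restatement of the hunk below:

if (cpu_has_vmx_msr_bitmap() &&
    nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS) &&
    nested_vmx_merge_msr_bitmap(vcpu, vmcs12))
        /* Merge succeeded: (re-)enable the L02 bitmap, since an
         * earlier pass may have cleared the bit. */
        vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL,
                      CPU_BASED_USE_MSR_BITMAPS);
else
        /* No valid bitmap: fall back to intercepting all MSRs. */
        vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
                        CPU_BASED_USE_MSR_BITMAPS);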
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/kvm/vmx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 599179b..91e3539 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10130,7 +10130,8 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
if (cpu_has_vmx_msr_bitmap() &&
nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS) &&
nested_vmx_merge_msr_bitmap(vcpu, vmcs12))
- ;
+ vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL,
+ CPU_BASED_USE_MSR_BITMAPS);
else
vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
CPU_BASED_USE_MSR_BITMAPS);
--
2.7.4
Retpoline means the kernel is safe because it has no indirect branches.
But firmware isn't, so use IBRS for firmware calls if it's available.
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/include/asm/apm.h | 6 ++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/efi.h | 17 +++++++++++++++--
arch/x86/include/asm/nospec-branch.h | 37 +++++++++++++++++++++++++++---------
arch/x86/kernel/cpu/bugs.c | 12 +++++++++++-
drivers/watchdog/hpwdt.c | 3 +++
6 files changed, 64 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/apm.h b/arch/x86/include/asm/apm.h
index 4d4015d..4483616 100644
--- a/arch/x86/include/asm/apm.h
+++ b/arch/x86/include/asm/apm.h
@@ -7,6 +7,8 @@
#ifndef _ASM_X86_MACH_DEFAULT_APM_H
#define _ASM_X86_MACH_DEFAULT_APM_H
+#include <asm/spec_ctrl.h>
+
#ifdef APM_ZERO_SEGS
# define APM_DO_ZERO_SEGS \
"pushl %%ds\n\t" \
@@ -32,6 +34,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
* N.B. We do NOT need a cld after the BIOS call
* because we always save and restore the flags.
*/
+ firmware_restrict_branch_speculation_start();
__asm__ __volatile__(APM_DO_ZERO_SEGS
"pushl %%edi\n\t"
"pushl %%ebp\n\t"
@@ -44,6 +47,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
"=S" (*esi)
: "a" (func), "b" (ebx_in), "c" (ecx_in)
: "memory", "cc");
+ firmware_restrict_branch_speculation_end();
}
static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
@@ -56,6 +60,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
* N.B. We do NOT need a cld after the BIOS call
* because we always save and restore the flags.
*/
+ firmware_restrict_branch_speculation_start();
__asm__ __volatile__(APM_DO_ZERO_SEGS
"pushl %%edi\n\t"
"pushl %%ebp\n\t"
@@ -68,6 +73,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
"=S" (si)
: "a" (func), "b" (ebx_in), "c" (ecx_in)
: "memory", "cc");
+ firmware_restrict_branch_speculation_end();
return error;
}
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 73b5fff..66c1434 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -211,6 +211,7 @@
#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* "" Fill RSB on context switches */
#define X86_FEATURE_USE_IBPB ( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled */
+#define X86_FEATURE_USE_IBRS_FW ( 7*32+22) /* "" Use IBRS during runtime firmware calls */
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb..a399c1e 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -6,6 +6,7 @@
#include <asm/pgtable.h>
#include <asm/processor-flags.h>
#include <asm/tlb.h>
+#include <asm/nospec-branch.h>
/*
* We map the EFI regions needed for runtime services non-contiguously,
@@ -36,8 +37,18 @@
extern asmlinkage unsigned long efi_call_phys(void *, ...);
-#define arch_efi_call_virt_setup() kernel_fpu_begin()
-#define arch_efi_call_virt_teardown() kernel_fpu_end()
+#define arch_efi_call_virt_setup() \
+({ \
+ kernel_fpu_begin(); \
+ firmware_restrict_branch_speculation_start(); \
+})
+
+#define arch_efi_call_virt_teardown() \
+({ \
+ firmware_restrict_branch_speculation_end(); \
+ kernel_fpu_end(); \
+})
+
/*
* Wrap all the virtual calls in a way that forces the parameters on the stack.
@@ -73,6 +84,7 @@ struct efi_scratch {
efi_sync_low_kernel_mappings(); \
preempt_disable(); \
__kernel_fpu_begin(); \
+ firmware_restrict_branch_speculation_start(); \
\
if (efi_scratch.use_pgd) { \
efi_scratch.prev_cr3 = __read_cr3(); \
@@ -91,6 +103,7 @@ struct efi_scratch {
__flush_tlb_all(); \
} \
\
+ firmware_restrict_branch_speculation_end(); \
__kernel_fpu_end(); \
preempt_enable(); \
})
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 300cc15..788c4da 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -162,17 +162,36 @@ static inline void vmexit_fill_RSB(void)
#endif
}
+#define alternative_msr_write(_msr, _val, _feature) \
+ asm volatile(ALTERNATIVE("", \
+ "movl %[msr], %%ecx\n\t" \
+ "movl %[val], %%eax\n\t" \
+ "movl $0, %%edx\n\t" \
+ "wrmsr", \
+ _feature) \
+ : : [msr] "i" (_msr), [val] "i" (_val) \
+ : "eax", "ecx", "edx", "memory")
+
static inline void indirect_branch_prediction_barrier(void)
{
- asm volatile(ALTERNATIVE("",
- "movl %[msr], %%ecx\n\t"
- "movl %[val], %%eax\n\t"
- "movl $0, %%edx\n\t"
- "wrmsr",
- X86_FEATURE_USE_IBPB)
- : : [msr] "i" (MSR_IA32_PRED_CMD),
- [val] "i" (PRED_CMD_IBPB)
- : "eax", "ecx", "edx", "memory");
+ alternative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB,
+ X86_FEATURE_USE_IBPB);
+}
+
+/*
+ * With retpoline, we must use IBRS to restrict branch prediction
+ * before calling into firmware.
+ */
+static inline void firmware_restrict_branch_speculation_start(void)
+{
+ alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
+ X86_FEATURE_USE_IBRS_FW);
+}
+
+static inline void firmware_restrict_branch_speculation_end(void)
+{
+ alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
+ X86_FEATURE_USE_IBRS_FW);
}
#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 61152aa..6f6d763 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -303,6 +303,15 @@ static void __init spectre_v2_select_mitigation(void)
setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
pr_info("Enabling Indirect Branch Prediction Barrier\n");
}
+
+ /*
+ * Retpoline means the kernel is safe because it has no indirect
+ * branches. But firmware isn't, so use IBRS to protect that.
+ */
+ if (boot_cpu_has(X86_FEATURE_IBRS)) {
+ setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
+ pr_info("Enabling Restricted Speculation for firmware calls\n");
+ }
}
#undef pr_fmt
@@ -332,8 +341,9 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
return sprintf(buf, "Not affected\n");
- return sprintf(buf, "%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
+ return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "",
+ boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
spectre_v2_module_string());
}
#endif
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 67fbe35..bab3721 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -38,6 +38,7 @@
#endif /* CONFIG_HPWDT_NMI_DECODING */
#include <asm/nmi.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>
#define HPWDT_VERSION "1.4.0"
#define SECS_TO_TICKS(secs) ((secs) * 1000 / 128)
@@ -486,11 +487,13 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
if (!hpwdt_nmi_decoding)
return NMI_DONE;
+ firmware_restrict_branch_speculation_start();
spin_lock_irqsave(&rom_lock, rom_pl);
if (!die_nmi_called && !is_icru && !is_uefi)
asminline_call(&cmn_regs, cru_rom_addr);
die_nmi_called = 1;
spin_unlock_irqrestore(&rom_lock, rom_pl);
+ firmware_restrict_branch_speculation_end();
if (allow_kdump)
hpwdt_stop();
--
2.7.4
From: KarimAllah Ahmed <[email protected]>
These two variables should indicate whether SPEC_CTRL and PRED_CMD are
supposed to be passed through to L2 guests or not. However,
msr_write_intercepted_l01() returns 'true' when writes to the MSR are
still intercepted, i.e. when it is not passed through. So just invert
the result of msr_write_intercepted_l01() to implement the correct
semantics.
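Annotated, the fixed lines read:

/*
 * msr_write_intercepted_l01() returns true when L0/L1 still intercept
 * writes to the MSR, i.e. when it is NOT passed through -- the
 * opposite sense of what these flags mean:
 */
bool pred_cmd = !msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
bool spec_ctrl = !msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);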
Fixes: 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/kvm/vmx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bee4c49..599179b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10219,8 +10219,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
* updated to reflect this when L1 (or its L2s) actually write to
* the MSR.
*/
- bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
- bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
+ bool pred_cmd = !msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+ bool spec_ctrl = !msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
!pred_cmd && !spec_ctrl)
--
2.7.4
Intel have retroactively blessed the 0xc2 microcode on Skylake mobile
and desktop parts, and the Gemini Lake 0x22 microcode is apparently fine
too. We blacklisted the latter purely because it was present with all
the other problematic ones in the 2018-01-08 release, but now it's
explicitly listed as OK.
We still list 0x84 for the various Kaby Lake / Coffee Lake parts, as
that appeared in one version of the blacklist and then reverted to
0x80 again. We can change it if 0x84 is actually announced to be safe.
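For context, each entry in that table is a (model, stepping, microcode
revision) tuple identifying a known-bad IBRS/IBPB microcode. A minimal
self-contained sketch of such a lookup (the helper name and the
exact-match comparison are illustrative, not the literal intel.c code):

#include <linux/types.h>        /* u8, u32, bool */

struct sku_microcode {
        u8 model;
        u8 stepping;
        u32 microcode;
};

static bool ucode_is_blacklisted(const struct sku_microcode *tbl, int n,
                                 u8 model, u8 stepping, u32 rev)
{
        int i;

        for (i = 0; i < n; i++) {
                if (tbl[i].model == model &&
                    tbl[i].stepping == stepping &&
                    tbl[i].microcode == rev)
                        return true;    /* unstable Spectre microcode */
        }
        return false;
}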
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 319bf98..f73b814 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -123,8 +123,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
{ INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x84 },
{ INTEL_FAM6_SKYLAKE_X, 0x03, 0x0100013e },
{ INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003c },
- { INTEL_FAM6_SKYLAKE_MOBILE, 0x03, 0xc2 },
- { INTEL_FAM6_SKYLAKE_DESKTOP, 0x03, 0xc2 },
{ INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
{ INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x1b },
{ INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
@@ -136,8 +134,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
{ INTEL_FAM6_HASWELL_X, 0x02, 0x3b },
{ INTEL_FAM6_HASWELL_X, 0x04, 0x10 },
{ INTEL_FAM6_IVYBRIDGE_X, 0x04, 0x42a },
- /* Updated in the 20180108 release; blacklist until we know otherwise */
- { INTEL_FAM6_ATOM_GEMINI_LAKE, 0x01, 0x22 },
/* Observed in the wild */
{ INTEL_FAM6_SANDYBRIDGE_X, 0x06, 0x61b },
{ INTEL_FAM6_SANDYBRIDGE_X, 0x07, 0x712 },
--
2.7.4
This reverts commit 64e16720ea0879f8ab4547e3b9758936d483909b.
We cannot call C functions like that, without marking all the
call-clobbered registers as, well, clobbered. We might have got away
with it for now because the __ibp_barrier() function was *fairly*
unlikely to actually use any other registers. But no. Just no.
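The underlying rule: on x86-64 a called C function may freely clobber
rax, rcx, rdx, rsi, rdi, r8-r11, the flags and the vector registers, so
an asm "call" would need to declare essentially all of them. A sketch of
what the reverted code would have needed (and even then the red zone and
stack alignment remain hazards, which is why open-coding the WRMSR in
the hunk below is the better fix):

/*
 * Illustrative only -- __ibp_barrier() is the (now removed) C helper.
 * The full call-clobbered set must be declared, because the compiler
 * cannot see inside the called function:
 */
asm volatile("call __ibp_barrier"
             : : : "rax", "rcx", "rdx", "rsi", "rdi",
                   "r8", "r9", "r10", "r11", "cc", "memory");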
Signed-off-by: David Woodhouse <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 13 +++++++++----
arch/x86/include/asm/processor.h | 3 ---
arch/x86/kernel/cpu/bugs.c | 6 ------
3 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 4d57894..300cc15 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -164,10 +164,15 @@ static inline void vmexit_fill_RSB(void)
static inline void indirect_branch_prediction_barrier(void)
{
- alternative_input("",
- "call __ibp_barrier",
- X86_FEATURE_USE_IBPB,
- ASM_NO_INPUT_CLOBBER("eax", "ecx", "edx", "memory"));
+ asm volatile(ALTERNATIVE("",
+ "movl %[msr], %%ecx\n\t"
+ "movl %[val], %%eax\n\t"
+ "movl $0, %%edx\n\t"
+ "wrmsr",
+ X86_FEATURE_USE_IBPB)
+ : : [msr] "i" (MSR_IA32_PRED_CMD),
+ [val] "i" (PRED_CMD_IBPB)
+ : "eax", "ecx", "edx", "memory");
}
#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 513f960..99799fb 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -969,7 +969,4 @@ bool xen_set_default_idle(void);
void stop_this_cpu(void *dummy);
void df_debug(struct pt_regs *regs, long error_code);
-
-void __ibp_barrier(void);
-
#endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 71949bf..61152aa 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -337,9 +337,3 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
spectre_v2_module_string());
}
#endif
-
-void __ibp_barrier(void)
-{
- __wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);
-}
-EXPORT_SYMBOL_GPL(__ibp_barrier);
--
2.7.4
* David Woodhouse <[email protected]> wrote:
> From: KarimAllah Ahmed <[email protected]>
>
> We either clear the CPU_BASED_USE_MSR_BITMAPS and end up intercepting all
> MSR accesses or create a valid L02 MSR bitmap and use that. This decision
> has to be made every time we evaluate whether we are going to generate the
> L02 MSR bitmap.
>
> Before commit 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
> this was probably OK since the decision was always identical. This is no
> longer the case now since the MSR bitmap might actually change once we
> decide to not intercept SPEC_CTRL and PRED_CMD.
Note, I fixed the changelog to refer to the correct upstream SHA1, which is:
d28b387fb74d: KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
Thanks,
Ingo
Paolo, Radim,
* David Woodhouse <[email protected]> wrote:
> David Woodhouse (4):
> KVM: x86: Reduce retpoline performance impact in slot_handle_level_range()
> KarimAllah Ahmed (2):
> X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
> KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap
Similarly to the previous Spectre patches I've applied these three KVM patches to
tip:x86/pti too, to keep them all in a single backportable group of commits. They
all look correct to me and solve real problems, and there's no conflict with
current upstream KVM code.
Let me know if that's OK to you or if you'd like to see any changes to them.
Thanks,
Ingo
* Woodhouse, David <[email protected]> wrote:
> On Sun, 2018-02-11 at 11:19 +0100, Ingo Molnar wrote:
> > Note, I fixed the changelog to refer to the correct upstream SHA1,
> > which is:
> >
> >   d28b387fb74d: KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
>
> Thanks for catching that.
>
> Wouldn't it be nice if 'git rebase --interactive tip/x86/pti' had done
> that *for* me? :)
Yeah, but given that the commit title changed as well:
086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
I'd rather not have tooling guess about such things.
Thanks,
Ingo
* David Woodhouse <[email protected]> wrote:
> Retpoline means the kernel is safe because it has no indirect branches.
> But firmware isn't, so use IBRS for firmware calls if it's available.
Ok, this approach looks good to me in principle, but:
> --- a/arch/x86/include/asm/apm.h
> +++ b/arch/x86/include/asm/apm.h
> @@ -7,6 +7,8 @@
> #ifndef _ASM_X86_MACH_DEFAULT_APM_H
> #define _ASM_X86_MACH_DEFAULT_APM_H
>
> +#include <asm/spec_ctrl.h>
I cannot see this header file upstream anywhere, nor in any other patch in my mbox
- is there some dependency that has to be applied first?
Thanks,
Ingo
Retpoline means the kernel is safe because it has no indirect branches.
But firmware isn't, so use IBRS for firmware calls if it's available.
Signed-off-by: David Woodhouse <[email protected]>
---
Helps to include the right header file.
arch/x86/include/asm/apm.h | 6 ++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/efi.h | 17 +++++++++++++++--
arch/x86/include/asm/nospec-branch.h | 37 +++++++++++++++++++++++++++---------
arch/x86/kernel/cpu/bugs.c | 12 +++++++++++-
drivers/watchdog/hpwdt.c | 3 +++
6 files changed, 64 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/apm.h b/arch/x86/include/asm/apm.h
index 4d4015d..c356098 100644
--- a/arch/x86/include/asm/apm.h
+++ b/arch/x86/include/asm/apm.h
@@ -7,6 +7,8 @@
#ifndef _ASM_X86_MACH_DEFAULT_APM_H
#define _ASM_X86_MACH_DEFAULT_APM_H
+#include <asm/nospec-branch.h>
+
#ifdef APM_ZERO_SEGS
# define APM_DO_ZERO_SEGS \
"pushl %%ds\n\t" \
@@ -32,6 +34,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
* N.B. We do NOT need a cld after the BIOS call
* because we always save and restore the flags.
*/
+ firmware_restrict_branch_speculation_start();
__asm__ __volatile__(APM_DO_ZERO_SEGS
"pushl %%edi\n\t"
"pushl %%ebp\n\t"
@@ -44,6 +47,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
"=S" (*esi)
: "a" (func), "b" (ebx_in), "c" (ecx_in)
: "memory", "cc");
+ firmware_restrict_branch_speculation_end();
}
static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
@@ -56,6 +60,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
* N.B. We do NOT need a cld after the BIOS call
* because we always save and restore the flags.
*/
+ firmware_restrict_branch_speculation_start();
__asm__ __volatile__(APM_DO_ZERO_SEGS
"pushl %%edi\n\t"
"pushl %%ebp\n\t"
@@ -68,6 +73,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
"=S" (si)
: "a" (func), "b" (ebx_in), "c" (ecx_in)
: "memory", "cc");
+ firmware_restrict_branch_speculation_end();
return error;
}
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 73b5fff..66c1434 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -211,6 +211,7 @@
#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* "" Fill RSB on context switches */
#define X86_FEATURE_USE_IBPB ( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled */
+#define X86_FEATURE_USE_IBRS_FW ( 7*32+22) /* "" Use IBRS during runtime firmware calls */
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb..a399c1e 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -6,6 +6,7 @@
#include <asm/pgtable.h>
#include <asm/processor-flags.h>
#include <asm/tlb.h>
+#include <asm/nospec-branch.h>
/*
* We map the EFI regions needed for runtime services non-contiguously,
@@ -36,8 +37,18 @@
extern asmlinkage unsigned long efi_call_phys(void *, ...);
-#define arch_efi_call_virt_setup() kernel_fpu_begin()
-#define arch_efi_call_virt_teardown() kernel_fpu_end()
+#define arch_efi_call_virt_setup() \
+({ \
+ kernel_fpu_begin(); \
+ firmware_restrict_branch_speculation_start(); \
+})
+
+#define arch_efi_call_virt_teardown() \
+({ \
+ firmware_restrict_branch_speculation_end(); \
+ kernel_fpu_end(); \
+})
+
/*
* Wrap all the virtual calls in a way that forces the parameters on the stack.
@@ -73,6 +84,7 @@ struct efi_scratch {
efi_sync_low_kernel_mappings(); \
preempt_disable(); \
__kernel_fpu_begin(); \
+ firmware_restrict_branch_speculation_start(); \
\
if (efi_scratch.use_pgd) { \
efi_scratch.prev_cr3 = __read_cr3(); \
@@ -91,6 +103,7 @@ struct efi_scratch {
__flush_tlb_all(); \
} \
\
+ firmware_restrict_branch_speculation_end(); \
__kernel_fpu_end(); \
preempt_enable(); \
})
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 300cc15..788c4da 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -162,17 +162,36 @@ static inline void vmexit_fill_RSB(void)
#endif
}
+#define alternative_msr_write(_msr, _val, _feature) \
+ asm volatile(ALTERNATIVE("", \
+ "movl %[msr], %%ecx\n\t" \
+ "movl %[val], %%eax\n\t" \
+ "movl $0, %%edx\n\t" \
+ "wrmsr", \
+ _feature) \
+ : : [msr] "i" (_msr), [val] "i" (_val) \
+ : "eax", "ecx", "edx", "memory")
+
static inline void indirect_branch_prediction_barrier(void)
{
- asm volatile(ALTERNATIVE("",
- "movl %[msr], %%ecx\n\t"
- "movl %[val], %%eax\n\t"
- "movl $0, %%edx\n\t"
- "wrmsr",
- X86_FEATURE_USE_IBPB)
- : : [msr] "i" (MSR_IA32_PRED_CMD),
- [val] "i" (PRED_CMD_IBPB)
- : "eax", "ecx", "edx", "memory");
+ alternative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB,
+ X86_FEATURE_USE_IBPB);
+}
+
+/*
+ * With retpoline, we must use IBRS to restrict branch prediction
+ * before calling into firmware.
+ */
+static inline void firmware_restrict_branch_speculation_start(void)
+{
+ alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
+ X86_FEATURE_USE_IBRS_FW);
+}
+
+static inline void firmware_restrict_branch_speculation_end(void)
+{
+ alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
+ X86_FEATURE_USE_IBRS_FW);
}
#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 61152aa..6f6d763 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -303,6 +303,15 @@ static void __init spectre_v2_select_mitigation(void)
setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
pr_info("Enabling Indirect Branch Prediction Barrier\n");
}
+
+ /*
+ * Retpoline means the kernel is safe because it has no indirect
+ * branches. But firmware isn't, so use IBRS to protect that.
+ */
+ if (boot_cpu_has(X86_FEATURE_IBRS)) {
+ setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
+ pr_info("Enabling Restricted Speculation for firmware calls\n");
+ }
}
#undef pr_fmt
@@ -332,8 +341,9 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
return sprintf(buf, "Not affected\n");
- return sprintf(buf, "%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
+ return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "",
+ boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
spectre_v2_module_string());
}
#endif
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 67fbe35..bab3721 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -38,6 +38,7 @@
#endif /* CONFIG_HPWDT_NMI_DECODING */
#include <asm/nmi.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>
#define HPWDT_VERSION "1.4.0"
#define SECS_TO_TICKS(secs) ((secs) * 1000 / 128)
@@ -486,11 +487,13 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
if (!hpwdt_nmi_decoding)
return NMI_DONE;
+ firmware_restrict_branch_speculation_start();
spin_lock_irqsave(&rom_lock, rom_pl);
if (!die_nmi_called && !is_icru && !is_uefi)
asminline_call(&cmn_regs, cru_rom_addr);
die_nmi_called = 1;
spin_unlock_irqrestore(&rom_lock, rom_pl);
+ firmware_restrict_branch_speculation_end();
if (allow_kdump)
hpwdt_stop();
--
2.7.4
* David Woodhouse <[email protected]> wrote:
> + /*
> + * Retpoline means the kernel is safe because it has no indirect
> + * branches. But firmware isn't, so use IBRS to protect that.
> + */
> + if (boot_cpu_has(X86_FEATURE_IBRS)) {
> + setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
> + pr_info("Enabling Restricted Speculation for firmware calls\n");
> + }
I have changed this text to say:
pr_info("Spectre mitigation: Restricting branch speculation (enabling IBRS) for firmware calls\n");
In fact while at it I found and improved a few other details as well, such as:
* Retpoline means the kernel is safe because it has no indirect
- * branches. But firmware isn't, so use IBRS to protect that.
+ * branches. But we don't know whether the firmware is safe, so
+ * use IBRS to protect against that:
most Spectre related messages are now harmonized:
arch/x86/kernel/cpu/bugs.c: pr_info("Spectre mitigation: Filling RSB on context switch\n");
arch/x86/kernel/cpu/bugs.c: pr_info("Spectre mitigation: Enabling Indirect Branch Prediction Barrier (IBPB)\n");
arch/x86/kernel/cpu/bugs.c: pr_info("Spectre mitigation: Restricting branch speculation (enabling IBRS) for firmware calls\n");
Find the full patch below.
Thanks,
Ingo
=========================>
From 82c2b2f29691143a05181333f387e786646aa28b Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Sat, 10 Feb 2018 11:51:57 +0100
Subject: [PATCH] x86/speculation: Clean up various Spectre related details
Harmonize all the Spectre messages so that a:
dmesg | grep -i spectre
... gives us most Spectre related kernel boot messages.
Also fix a few other details:
- clarify a comment about firmware speculation control
- s/KPTI/PTI
- remove various line-breaks that made the code uglier
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/bugs.c | 33 ++++++++++++++-------------------
1 file changed, 14 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 6f6d763225c8..eff45477fcca 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -162,8 +162,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
return SPECTRE_V2_CMD_NONE;
else {
- ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
- sizeof(arg));
+ ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg));
if (ret < 0)
return SPECTRE_V2_CMD_AUTO;
@@ -175,8 +174,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
}
if (i >= ARRAY_SIZE(mitigation_options)) {
- pr_err("unknown option (%s). Switching to AUTO select\n",
- mitigation_options[i].option);
+ pr_err("unknown option (%s). Switching to AUTO select\n", mitigation_options[i].option);
return SPECTRE_V2_CMD_AUTO;
}
}
@@ -185,8 +183,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
cmd == SPECTRE_V2_CMD_RETPOLINE_AMD ||
cmd == SPECTRE_V2_CMD_RETPOLINE_GENERIC) &&
!IS_ENABLED(CONFIG_RETPOLINE)) {
- pr_err("%s selected but not compiled in. Switching to AUTO select\n",
- mitigation_options[i].option);
+ pr_err("%s selected but not compiled in. Switching to AUTO select\n", mitigation_options[i].option);
return SPECTRE_V2_CMD_AUTO;
}
@@ -256,14 +253,14 @@ static void __init spectre_v2_select_mitigation(void)
goto retpoline_auto;
break;
}
- pr_err("kernel not compiled with retpoline; no mitigation available!");
+ pr_err("Spectre mitigation: kernel not compiled with retpoline; no mitigation available!");
return;
retpoline_auto:
if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
retpoline_amd:
if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
- pr_err("LFENCE not serializing. Switching to generic retpoline\n");
+ pr_err("Spectre mitigation: LFENCE not serializing, switching to generic retpoline\n");
goto retpoline_generic;
}
mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
@@ -281,7 +278,7 @@ static void __init spectre_v2_select_mitigation(void)
pr_info("%s\n", spectre_v2_strings[mode]);
/*
- * If neither SMEP or KPTI are available, there is a risk of
+ * If neither SMEP or PTI are available, there is a risk of
* hitting userspace addresses in the RSB after a context switch
* from a shallow call stack to a deeper one. To prevent this fill
* the entire RSB, even when using IBRS.
@@ -295,30 +292,30 @@ static void __init spectre_v2_select_mitigation(void)
if ((!boot_cpu_has(X86_FEATURE_PTI) &&
!boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
- pr_info("Filling RSB on context switch\n");
+ pr_info("Spectre mitigation: Filling RSB on context switch\n");
}
/* Initialize Indirect Branch Prediction Barrier if supported */
if (boot_cpu_has(X86_FEATURE_IBPB)) {
setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
- pr_info("Enabling Indirect Branch Prediction Barrier\n");
+ pr_info("Spectre mitigation: Enabling Indirect Branch Prediction Barrier (IBPB)\n");
}
/*
* Retpoline means the kernel is safe because it has no indirect
- * branches. But firmware isn't, so use IBRS to protect that.
+ * branches. But we don't know whether the firmware is safe, so
+ * use IBRS to protect against that:
*/
if (boot_cpu_has(X86_FEATURE_IBRS)) {
setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
- pr_info("Enabling Restricted Speculation for firmware calls\n");
+ pr_info("Spectre mitigation: Restricting branch speculation (enabling IBRS) for firmware calls\n");
}
}
#undef pr_fmt
#ifdef CONFIG_SYSFS
-ssize_t cpu_show_meltdown(struct device *dev,
- struct device_attribute *attr, char *buf)
+ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf)
{
if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
return sprintf(buf, "Not affected\n");
@@ -327,16 +324,14 @@ ssize_t cpu_show_meltdown(struct device *dev,
return sprintf(buf, "Vulnerable\n");
}
-ssize_t cpu_show_spectre_v1(struct device *dev,
- struct device_attribute *attr, char *buf)
+ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf)
{
if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
return sprintf(buf, "Not affected\n");
return sprintf(buf, "Mitigation: __user pointer sanitization\n");
}
-ssize_t cpu_show_spectre_v2(struct device *dev,
- struct device_attribute *attr, char *buf)
+ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, char *buf)
{
if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
return sprintf(buf, "Not affected\n");
On Sun, 2018-02-11 at 19:50 +0100, Ingo Molnar wrote:
>
> From 82c2b2f29691143a05181333f387e786646aa28b Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <[email protected]>
> Date: Sat, 10 Feb 2018 11:51:57 +0100
> Subject: [PATCH] x86/speculation: Clean up various Spectre related details
>
> Harmonize all the Spectre messages so that a:
>
> dmesg | grep -i spectre
>
> ... gives us most Spectre related kernel boot messages.
>
> Also fix a few other details:
>
> - clarify a comment about firmware speculation control
>
> - s/KPTI/PTI
>
> - remove various line-breaks that made the code uglier
>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Arjan van de Ven <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: David Woodhouse <[email protected]>
> Cc: David Woodhouse <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Josh Poimboeuf <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: David Woodhouse <[email protected]>
with a couple of comments:
- * If neither SMEP or KPTI are available, there is a risk of
+ * If neither SMEP or PTI are available, there is a risk of
Make that 'neither SMEP nor PTI' while you're at it though please;
that's bugged me a couple of times in passing.
And should these say 'Spectre v2' not just 'Spectre'?
* David Woodhouse <[email protected]> wrote:
>
>
> On Sun, 2018-02-11 at 19:50 +0100, Ingo Molnar wrote:
> >
> > From 82c2b2f29691143a05181333f387e786646aa28b Mon Sep 17 00:00:00 2001
> > From: Ingo Molnar <[email protected]>
> > Date: Sat, 10 Feb 2018 11:51:57 +0100
> > Subject: [PATCH] x86/speculation: Clean up various Spectre related details
> >
> > Harmonize all the Spectre messages so that a:
> >
> > >     dmesg | grep -i spectre
> >
> > ... gives us most Spectre related kernel boot messages.
> >
> > Also fix a few other details:
> >
> > >  - clarify a comment about firmware speculation control
> > >
> > >  - s/KPTI/PTI
> > >
> > >  - remove various line-breaks that made the code uglier
> >
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Arjan van de Ven <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: David Woodhouse <[email protected]>
> > Cc: David Woodhouse <[email protected]>
> > Cc: Greg Kroah-Hartman <[email protected]>
> > Cc: Josh Poimboeuf <[email protected]>
> > Cc: Linus Torvalds <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Signed-off-by: Ingo Molnar <[email protected]>
>
>
>
> Acked-by: David Woodhouse <[email protected]>
Thanks, added.
> with a couple of comments:
>
>
> -        * If neither SMEP or KPTI are available, there is a risk of
> +        * If neither SMEP or PTI are available, there is a risk of
>
> Make that 'neither SMEP nor PTI' while you're at it though please;
> that's bugged me a couple of times in passing.
Ok, fixed that too.
>
> And should these say 'Spectre v2' not just 'Spectre'?
Yeah, you are probably right, but I didn't want to make the messages too specific
- do we really know that this is the end of Spectre-style speculation holes?
Thanks,
Ingo
Commit-ID: 670c3e8da87fa4046a55077b1409cf250865a203
Gitweb: https://git.kernel.org/tip/670c3e8da87fa4046a55077b1409cf250865a203
Author: David Woodhouse <[email protected]>
AuthorDate: Sun, 11 Feb 2018 15:19:19 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 11 Feb 2018 19:44:46 +0100
x86/speculation: Use IBRS if available before calling into firmware
Retpoline means the kernel is safe because it has no indirect branches.
But firmware isn't, so use IBRS for firmware calls if it's available.
Signed-off-by: David Woodhouse <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/apm.h | 6 ++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/efi.h | 17 +++++++++++++++--
arch/x86/include/asm/nospec-branch.h | 37 +++++++++++++++++++++++++++---------
arch/x86/kernel/cpu/bugs.c | 12 +++++++++++-
drivers/watchdog/hpwdt.c | 3 +++
6 files changed, 64 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/apm.h b/arch/x86/include/asm/apm.h
index 4d4015d..c356098 100644
--- a/arch/x86/include/asm/apm.h
+++ b/arch/x86/include/asm/apm.h
@@ -7,6 +7,8 @@
#ifndef _ASM_X86_MACH_DEFAULT_APM_H
#define _ASM_X86_MACH_DEFAULT_APM_H
+#include <asm/nospec-branch.h>
+
#ifdef APM_ZERO_SEGS
# define APM_DO_ZERO_SEGS \
"pushl %%ds\n\t" \
@@ -32,6 +34,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
* N.B. We do NOT need a cld after the BIOS call
* because we always save and restore the flags.
*/
+ firmware_restrict_branch_speculation_start();
__asm__ __volatile__(APM_DO_ZERO_SEGS
"pushl %%edi\n\t"
"pushl %%ebp\n\t"
@@ -44,6 +47,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in,
"=S" (*esi)
: "a" (func), "b" (ebx_in), "c" (ecx_in)
: "memory", "cc");
+ firmware_restrict_branch_speculation_end();
}
static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
@@ -56,6 +60,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
* N.B. We do NOT need a cld after the BIOS call
* because we always save and restore the flags.
*/
+ firmware_restrict_branch_speculation_start();
__asm__ __volatile__(APM_DO_ZERO_SEGS
"pushl %%edi\n\t"
"pushl %%ebp\n\t"
@@ -68,6 +73,7 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in,
"=S" (si)
: "a" (func), "b" (ebx_in), "c" (ecx_in)
: "memory", "cc");
+ firmware_restrict_branch_speculation_end();
return error;
}
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 73b5fff..66c1434 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -211,6 +211,7 @@
#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* "" Fill RSB on context switches */
#define X86_FEATURE_USE_IBPB ( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled */
+#define X86_FEATURE_USE_IBRS_FW ( 7*32+22) /* "" Use IBRS during runtime firmware calls */
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb..a399c1e 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -6,6 +6,7 @@
#include <asm/pgtable.h>
#include <asm/processor-flags.h>
#include <asm/tlb.h>
+#include <asm/nospec-branch.h>
/*
* We map the EFI regions needed for runtime services non-contiguously,
@@ -36,8 +37,18 @@
extern asmlinkage unsigned long efi_call_phys(void *, ...);
-#define arch_efi_call_virt_setup() kernel_fpu_begin()
-#define arch_efi_call_virt_teardown() kernel_fpu_end()
+#define arch_efi_call_virt_setup() \
+({ \
+ kernel_fpu_begin(); \
+ firmware_restrict_branch_speculation_start(); \
+})
+
+#define arch_efi_call_virt_teardown() \
+({ \
+ firmware_restrict_branch_speculation_end(); \
+ kernel_fpu_end(); \
+})
+
/*
* Wrap all the virtual calls in a way that forces the parameters on the stack.
@@ -73,6 +84,7 @@ struct efi_scratch {
efi_sync_low_kernel_mappings(); \
preempt_disable(); \
__kernel_fpu_begin(); \
+ firmware_restrict_branch_speculation_start(); \
\
if (efi_scratch.use_pgd) { \
efi_scratch.prev_cr3 = __read_cr3(); \
@@ -91,6 +103,7 @@ struct efi_scratch {
__flush_tlb_all(); \
} \
\
+ firmware_restrict_branch_speculation_end(); \
__kernel_fpu_end(); \
preempt_enable(); \
})
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 300cc15..788c4da 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -162,17 +162,36 @@ static inline void vmexit_fill_RSB(void)
#endif
}
+#define alternative_msr_write(_msr, _val, _feature) \
+ asm volatile(ALTERNATIVE("", \
+ "movl %[msr], %%ecx\n\t" \
+ "movl %[val], %%eax\n\t" \
+ "movl $0, %%edx\n\t" \
+ "wrmsr", \
+ _feature) \
+ : : [msr] "i" (_msr), [val] "i" (_val) \
+ : "eax", "ecx", "edx", "memory")
+
static inline void indirect_branch_prediction_barrier(void)
{
- asm volatile(ALTERNATIVE("",
- "movl %[msr], %%ecx\n\t"
- "movl %[val], %%eax\n\t"
- "movl $0, %%edx\n\t"
- "wrmsr",
- X86_FEATURE_USE_IBPB)
- : : [msr] "i" (MSR_IA32_PRED_CMD),
- [val] "i" (PRED_CMD_IBPB)
- : "eax", "ecx", "edx", "memory");
+ alternative_msr_write(MSR_IA32_PRED_CMD, PRED_CMD_IBPB,
+ X86_FEATURE_USE_IBPB);
+}
+
+/*
+ * With retpoline, we must use IBRS to restrict branch prediction
+ * before calling into firmware.
+ */
+static inline void firmware_restrict_branch_speculation_start(void)
+{
+ alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
+ X86_FEATURE_USE_IBRS_FW);
+}
+
+static inline void firmware_restrict_branch_speculation_end(void)
+{
+ alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
+ X86_FEATURE_USE_IBRS_FW);
}
#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 61152aa..6f6d763 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -303,6 +303,15 @@ retpoline_auto:
setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
pr_info("Enabling Indirect Branch Prediction Barrier\n");
}
+
+ /*
+ * Retpoline means the kernel is safe because it has no indirect
+ * branches. But firmware isn't, so use IBRS to protect that.
+ */
+ if (boot_cpu_has(X86_FEATURE_IBRS)) {
+ setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
+ pr_info("Enabling Restricted Speculation for firmware calls\n");
+ }
}
#undef pr_fmt
@@ -332,8 +341,9 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
return sprintf(buf, "Not affected\n");
- return sprintf(buf, "%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
+ return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "",
+ boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
spectre_v2_module_string());
}
#endif
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 67fbe35..bab3721 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -38,6 +38,7 @@
#endif /* CONFIG_HPWDT_NMI_DECODING */
#include <asm/nmi.h>
#include <asm/frame.h>
+#include <asm/nospec-branch.h>
#define HPWDT_VERSION "1.4.0"
#define SECS_TO_TICKS(secs) ((secs) * 1000 / 128)
@@ -486,11 +487,13 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
if (!hpwdt_nmi_decoding)
return NMI_DONE;
+ firmware_restrict_branch_speculation_start();
spin_lock_irqsave(&rom_lock, rom_pl);
if (!die_nmi_called && !is_icru && !is_uefi)
asminline_call(&cmn_regs, cru_rom_addr);
die_nmi_called = 1;
spin_unlock_irqrestore(&rom_lock, rom_pl);
+ firmware_restrict_branch_speculation_end();
if (allow_kdump)
hpwdt_stop();
Commit-ID: 1751342095f0d2b36fa8114d8e12c5688c455ac4
Gitweb: https://git.kernel.org/tip/1751342095f0d2b36fa8114d8e12c5688c455ac4
Author: David Woodhouse <[email protected]>
AuthorDate: Sat, 10 Feb 2018 23:39:22 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 11 Feb 2018 11:24:15 +0100
x86/speculation: Update Speculation Control microcode blacklist
Intel have retroactively blessed the 0xc2 microcode on Skylake mobile
and desktop parts, and the Gemini Lake 0x22 microcode is apparently fine
too. We blacklisted the latter purely because it was present with all
the other problematic ones in the 2018-01-08 release, but now it's
explicitly listed as OK.
We still list 0x84 for the various Kaby Lake / Coffee Lake parts, as
that appeared in one version of the blacklist and then reverted to
0x80 again. We can change it if 0x84 is actually announced to be safe.
Signed-off-by: David Woodhouse <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/intel.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 319bf98..f73b814 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -123,8 +123,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
{ INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x84 },
{ INTEL_FAM6_SKYLAKE_X, 0x03, 0x0100013e },
{ INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003c },
- { INTEL_FAM6_SKYLAKE_MOBILE, 0x03, 0xc2 },
- { INTEL_FAM6_SKYLAKE_DESKTOP, 0x03, 0xc2 },
{ INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
{ INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x1b },
{ INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
@@ -136,8 +134,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
{ INTEL_FAM6_HASWELL_X, 0x02, 0x3b },
{ INTEL_FAM6_HASWELL_X, 0x04, 0x10 },
{ INTEL_FAM6_IVYBRIDGE_X, 0x04, 0x42a },
- /* Updated in the 20180108 release; blacklist until we know otherwise */
- { INTEL_FAM6_ATOM_GEMINI_LAKE, 0x01, 0x22 },
/* Observed in the wild */
{ INTEL_FAM6_SANDYBRIDGE_X, 0x06, 0x61b },
{ INTEL_FAM6_SANDYBRIDGE_X, 0x07, 0x712 },
Commit-ID: ff37dc0cd96c266c7700386b7ba48abc32a91b1f
Gitweb: https://git.kernel.org/tip/ff37dc0cd96c266c7700386b7ba48abc32a91b1f
Author: KarimAllah Ahmed <[email protected]>
AuthorDate: Sat, 10 Feb 2018 23:39:26 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 11 Feb 2018 11:24:16 +0100
KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap
We either clear the CPU_BASED_USE_MSR_BITMAPS and end up intercepting all
MSR accesses or create a valid L02 MSR bitmap and use that. This decision
has to be made every time we evaluate whether we are going to generate the
L02 MSR bitmap.
Before commit:
d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
... this was probably OK since the decision was always identical.
This is no longer the case now since the MSR bitmap might actually
change once we decide to not intercept SPEC_CTRL and PRED_CMD.
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kvm/vmx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 599179b..91e3539 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10130,7 +10130,8 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
if (cpu_has_vmx_msr_bitmap() &&
nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS) &&
nested_vmx_merge_msr_bitmap(vcpu, vmcs12))
- ;
+ vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL,
+ CPU_BASED_USE_MSR_BITMAPS);
else
vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
CPU_BASED_USE_MSR_BITMAPS);
Commit-ID: 930ce1a7a55bc0eb8917f453ee22f1b6d67df5cd
Gitweb: https://git.kernel.org/tip/930ce1a7a55bc0eb8917f453ee22f1b6d67df5cd
Author: David Woodhouse <[email protected]>
AuthorDate: Sat, 10 Feb 2018 23:39:23 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 11 Feb 2018 11:24:15 +0100
Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"
This reverts commit 64e16720ea0879f8ab4547e3b9758936d483909b.
We cannot call C functions like that, without marking all the
call-clobbered registers as, well, clobbered. We might have got away
with it for now because the __ibp_barrier() function was *fairly*
unlikely to actually use any other registers. But no. Just no.
Signed-off-by: David Woodhouse <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 13 +++++++++----
arch/x86/include/asm/processor.h | 3 ---
arch/x86/kernel/cpu/bugs.c | 6 ------
3 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 4d57894..300cc15 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -164,10 +164,15 @@ static inline void vmexit_fill_RSB(void)
static inline void indirect_branch_prediction_barrier(void)
{
- alternative_input("",
- "call __ibp_barrier",
- X86_FEATURE_USE_IBPB,
- ASM_NO_INPUT_CLOBBER("eax", "ecx", "edx", "memory"));
+ asm volatile(ALTERNATIVE("",
+ "movl %[msr], %%ecx\n\t"
+ "movl %[val], %%eax\n\t"
+ "movl $0, %%edx\n\t"
+ "wrmsr",
+ X86_FEATURE_USE_IBPB)
+ : : [msr] "i" (MSR_IA32_PRED_CMD),
+ [val] "i" (PRED_CMD_IBPB)
+ : "eax", "ecx", "edx", "memory");
}
#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 513f960..99799fb 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -969,7 +969,4 @@ bool xen_set_default_idle(void);
void stop_this_cpu(void *dummy);
void df_debug(struct pt_regs *regs, long error_code);
-
-void __ibp_barrier(void);
-
#endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 71949bf..61152aa 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -337,9 +337,3 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
spectre_v2_module_string());
}
#endif
-
-void __ibp_barrier(void)
-{
- __wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);
-}
-EXPORT_SYMBOL_GPL(__ibp_barrier);
Commit-ID: fb5b90b795c76e9c10c520fcdb7fe0d7b8334833
Gitweb: https://git.kernel.org/tip/fb5b90b795c76e9c10c520fcdb7fe0d7b8334833
Author: KarimAllah Ahmed <[email protected]>
AuthorDate: Sat, 10 Feb 2018 23:39:25 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 11 Feb 2018 11:24:16 +0100
X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
These two variables should indicate whether SPEC_CTRL and PRED_CMD are
supposed to be passed through to L2 guests or not. However,
msr_write_intercepted_l01() returns 'true' when writes to the MSR are
still intercepted, i.e. when it is not passed through. So just invert
the result of msr_write_intercepted_l01() to implement the correct
semantics.
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kvm/vmx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bee4c49..599179b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10219,8 +10219,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
* updated to reflect this when L1 (or its L2s) actually write to
* the MSR.
*/
- bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
- bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
+ bool pred_cmd = !msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+ bool spec_ctrl = !msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
!pred_cmd && !spec_ctrl)
Commit-ID: 33f1e899478efb7c77b2b833e7edee1203a24a48
Gitweb: https://git.kernel.org/tip/33f1e899478efb7c77b2b833e7edee1203a24a48
Author: David Woodhouse <[email protected]>
AuthorDate: Sat, 10 Feb 2018 23:39:24 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 11 Feb 2018 11:24:15 +0100
KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(), by always inlining iterator helper methods
With retpoline, tight loops of "call this function for every XXX" are
very much pessimised by taking a prediction miss *every* time. This one
is by far the biggest contributor to the guest launch time with retpoline.
By marking the iterator slot_handle_…() functions always_inline, we can
ensure that the indirect function call can be optimised away into a
direct call and it actually generates slightly smaller code because
some of the other conditionals can get optimised away too.
Performance is now pretty close to what we see with nospectre_v2 on
the command line.
Suggested-by: Linus Torvalds <[email protected]>
Tested-by: Filippo Sironi <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Filippo Sironi <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kvm/mmu.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2b8eb4d..cc83bdc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5058,7 +5058,7 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
/* The caller should hold mmu-lock before calling this function. */
-static bool
+static __always_inline bool
slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, int start_level, int end_level,
gfn_t start_gfn, gfn_t end_gfn, bool lock_flush_tlb)
@@ -5088,7 +5088,7 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
return flush;
}
-static bool
+static __always_inline bool
slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, int start_level, int end_level,
bool lock_flush_tlb)
@@ -5099,7 +5099,7 @@ slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
lock_flush_tlb);
}
-static bool
+static __always_inline bool
slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, bool lock_flush_tlb)
{
@@ -5107,7 +5107,7 @@ slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
}
-static bool
+static __always_inline bool
slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, bool lock_flush_tlb)
{
@@ -5115,7 +5115,7 @@ slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
}
-static bool
+static __always_inline bool
slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, bool lock_flush_tlb)
{
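To see why the annotation helps, a reduced example (illustrative only,
not kernel code): once the iterator is forcibly inlined, the function
pointer is a compile-time constant at every call site, so the compiler
can emit a direct call instead of going through a retpoline thunk:

typedef bool (*handler_t)(int item);

static __always_inline bool
for_each_item(handler_t fn, int start, int end)
{
	bool any = false;
	int i;

	/* After inlining, 'fn' is a known constant here. */
	for (i = start; i < end; i++)
		any |= fn(i);
	return any;
}

static bool touch_item(int item)
{
	return item != 0;
}

bool touch_all(void)
{
	/* Compiles down to direct calls to touch_item() in the loop;
	 * with a plain 'static bool for_each_item(...)' the call would
	 * stay indirect and mispredict on every iteration. */
	return for_each_item(touch_item, 0, 1024);
}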
Hi,
On Sun, Feb 11, 2018 at 11:19:10AM -0800, tip-bot for David Woodhouse wrote:
> x86/speculation: Use IBRS if available before calling into firmware
>
> Retpoline means the kernel is safe because it has no indirect branches.
> But firmware isn't, so use IBRS for firmware calls if it's available.
As far as I understand, retpoline alone means the mitigation is still
not enough.
Also, David W has mentioned [1] that even with retpoline, IBPB is also
required (except on Skylake).
If IBPB & IBRS are not supported by the ucode, shouldn't the below
indicate something along the lines of 'Mitigation not enough'?
> - return sprintf(buf, "%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
> + return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
> boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "",
> + boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
> spectre_v2_module_string());
On 4.16-rc1, w/ GCC 7.3.0,
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full generic retpoline
Here, for the user (at least for me), it is not clear whether the
mitigation is enough. On the present system (Ivy Bridge), as a ucode
update is not available, IBPB is not printed along with
"spectre_v2:Mitigation", so unless I am missing something, until then
this system should be considered vulnerable; but for a user not
familiar with the details of the issue, that cannot be deduced.
Perhaps an additional status field [OKAY, PARTIAL] next to Mitigation in
sysfs might be helpful. All these changes are up in the air for me; this
is from a user perspective, sorry if my feedback seems idiotic.
afzal
[1] lkml.kernel.org/r/[email protected]
On 11/02/2018 00:39, David Woodhouse wrote:
> Using retpoline ensures the kernel is safe because it doesn't contain
> any indirect branches, but firmware still can — and we make calls into
> firmware at runtime. Where the IBRS microcode support is available, use
> that before calling into firmware.
>
> While doing that, I noticed that we were calling C functions without
> telling the compiler about the call-clobbered registers. Stop that.
>
> This also contains the always_inline fix for the performance problem
> introduced by retpoline in KVM code, and fixes some other issues with
> the per-vCPU KVM handling for the SPEC_CTRL MSR.
>
> Finally, update the microcode blacklist to reflect the latest
> information from Intel.
>
> v2: Drop IBRS_ALL patch for the time being
> Add KVM MSR fixes (karahmed)
> Update microcode blacklist
>
>
>
> David Woodhouse (4):
> x86/speculation: Update Speculation Control microcode blacklist
> Revert "x86/speculation: Simplify
> indirect_branch_prediction_barrier()"
> KVM: x86: Reduce retpoline performance impact in
> slot_handle_level_range()
> x86/speculation: Use IBRS if available before calling into firmware
>
> KarimAllah Ahmed (2):
> X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
> KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR
> bitmap
>
> arch/x86/include/asm/apm.h | 6 ++++++
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/efi.h | 17 +++++++++++++++--
> arch/x86/include/asm/nospec-branch.h | 32 ++++++++++++++++++++++++++++----
> arch/x86/include/asm/processor.h | 3 ---
> arch/x86/kernel/cpu/bugs.c | 18 +++++++++++-------
> arch/x86/kernel/cpu/intel.c | 4 ----
> arch/x86/kvm/mmu.c | 10 +++++-----
> arch/x86/kvm/vmx.c | 7 ++++---
> drivers/watchdog/hpwdt.c | 3 +++
> 10 files changed, 73 insertions(+), 28 deletions(-)
>
Acked-by: Paolo Bonzini <[email protected]>
* tip-bot for David Woodhouse <[email protected]> wrote:
> Commit-ID: 670c3e8da87fa4046a55077b1409cf250865a203
> Gitweb: https://git.kernel.org/tip/670c3e8da87fa4046a55077b1409cf250865a203
> Author: David Woodhouse <[email protected]>
> AuthorDate: Sun, 11 Feb 2018 15:19:19 +0000
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Sun, 11 Feb 2018 19:44:46 +0100
>
> x86/speculation: Use IBRS if available before calling into firmware
>
> Retpoline means the kernel is safe because it has no indirect branches.
> But firmware isn't, so use IBRS for firmware calls if it's available.
>
> Signed-off-by: David Woodhouse <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> arch/x86/include/asm/apm.h | 6 ++++++
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/efi.h | 17 +++++++++++++++--
> arch/x86/include/asm/nospec-branch.h | 37 +++++++++++++++++++++++++++---------
> arch/x86/kernel/cpu/bugs.c | 12 +++++++++++-
> drivers/watchdog/hpwdt.c | 3 +++
> 6 files changed, 64 insertions(+), 12 deletions(-)
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> +/*
> + * With retpoline, we must use IBRS to restrict branch prediction
> + * before calling into firmware.
> + */
> +static inline void firmware_restrict_branch_speculation_start(void)
> +{
> + alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> + X86_FEATURE_USE_IBRS_FW);
> +}
> +
> +static inline void firmware_restrict_branch_speculation_end(void)
> +{
> + alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> + X86_FEATURE_USE_IBRS_FW);
BTW., there's a detail that only occurred to me today, this enabling/disabling
sequence is not NMI safe, and it might be called from NMI context:
> --- a/drivers/watchdog/hpwdt.c
> +++ b/drivers/watchdog/hpwdt.c
> @@ -38,6 +38,7 @@
> #endif /* CONFIG_HPWDT_NMI_DECODING */
> #include <asm/nmi.h>
> #include <asm/frame.h>
> +#include <asm/nospec-branch.h>
>
> #define HPWDT_VERSION "1.4.0"
> #define SECS_TO_TICKS(secs) ((secs) * 1000 / 128)
> @@ -486,11 +487,13 @@ static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
> if (!hpwdt_nmi_decoding)
> return NMI_DONE;
>
> + firmware_restrict_branch_speculation_start();
> spin_lock_irqsave(&rom_lock, rom_pl);
> if (!die_nmi_called && !is_icru && !is_uefi)
> asminline_call(&cmn_regs, cru_rom_addr);
> die_nmi_called = 1;
> spin_unlock_irqrestore(&rom_lock, rom_pl);
> + firmware_restrict_branch_speculation_end();
>
> if (allow_kdump)
> hpwdt_stop();
But ... this is a (comparatively rare) hardware-watchdog tick function, and the
chance of racing with, say, an EFI call seems minuscule. The race will result in an
EFI call being performed with speculation enabled, sporadically.
Is this something we should worry about?
Thanks,
Ingo
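Schematically, the interleaving in question would be (hypothetical
trace, using the helpers quoted above):

	firmware_restrict_branch_speculation_start();		/* IBRS = 1 */
		<NMI>
		firmware_restrict_branch_speculation_start();	/* IBRS = 1 again */
		asminline_call(...);				/* firmware call, protected */
		firmware_restrict_branch_speculation_end();	/* IBRS = 0 */
		</NMI>
	efi_call(...);						/* firmware call, IBRS already 0 */
	firmware_restrict_branch_speculation_end();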
On Sat, Feb 10, 2018 at 11:39:22PM +0000, David Woodhouse wrote:
>Intel have retroactively blessed the 0xc2 microcode on Skylake mobile
>and desktop parts, and the Gemini Lake 0x22 microcode is apparently fine
>too. We blacklisted the latter purely because it was present with all
>the other problematic ones in the 2018-01-08 release, but now it's
>explicitly listed as OK.
>
>We still list 0x84 for the various Kaby Lake / Coffee Lake parts, as
>that appeared in one version of the blacklist and then reverted to
>0x80 again. We can change it if 0x84 is actually announced to be safe.
>
>Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
>---
> arch/x86/kernel/cpu/intel.c | 4 ----
> 1 file changed, 4 deletions(-)
>
>diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
>index 319bf98..f73b814 100644
>--- a/arch/x86/kernel/cpu/intel.c
>+++ b/arch/x86/kernel/cpu/intel.c
>@@ -123,8 +123,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
> { INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x84 },
> { INTEL_FAM6_SKYLAKE_X, 0x03, 0x0100013e },
> { INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003c },
>- { INTEL_FAM6_SKYLAKE_MOBILE, 0x03, 0xc2 },
>- { INTEL_FAM6_SKYLAKE_DESKTOP, 0x03, 0xc2 },
> { INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
> { INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x1b },
> { INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
>@@ -136,8 +134,6 @@ static const struct sku_microcode spectre_bad_microcodes[] = {
> { INTEL_FAM6_HASWELL_X, 0x02, 0x3b },
> { INTEL_FAM6_HASWELL_X, 0x04, 0x10 },
> { INTEL_FAM6_IVYBRIDGE_X, 0x04, 0x42a },
>- /* Updated in the 20180108 release; blacklist until we know otherwise */
>- { INTEL_FAM6_ATOM_GEMINI_LAKE, 0x01, 0x22 },
> /* Observed in the wild */
> { INTEL_FAM6_SANDYBRIDGE_X, 0x06, 0x61b },
> { INTEL_FAM6_SANDYBRIDGE_X, 0x07, 0x712 },
>--
>2.7.4
>
On Mon, Feb 12, 2018 at 11:22:11AM +0100, Ingo Molnar wrote:
> > +static inline void firmware_restrict_branch_speculation_start(void)
> > +{
> > + alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> > + X86_FEATURE_USE_IBRS_FW);
> > +}
> > +
> > +static inline void firmware_restrict_branch_speculation_end(void)
> > +{
> > + alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> > + X86_FEATURE_USE_IBRS_FW);
>
> BTW., there's a detail that only occurred to me today, this enabling/disabling
> sequence is not NMI safe, and it might be called from NMI context:
Wait, we're doing firmware from NMI? That sounds like a _REALLY_ bad
idea.
On Mon, Feb 12, 2018 at 12:50:02PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 12, 2018 at 11:22:11AM +0100, Ingo Molnar wrote:
> > > +static inline void firmware_restrict_branch_speculation_start(void)
> > > +{
> > > + alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> > > + X86_FEATURE_USE_IBRS_FW);
> > > +}
> > > +
> > > +static inline void firmware_restrict_branch_speculation_end(void)
> > > +{
> > > + alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> > > + X86_FEATURE_USE_IBRS_FW);
> >
> > BTW., there's a detail that only occurred to me today, this enabling/disabling
> > sequence is not NMI safe, and it might be called from NMI context:
>
> Wait, we're doing firmware from NMI? That sounds like a _REALLY_ bad
> idea.
Argh, its that stupid watchdog driver again.. Not only does it call
firmware, it also uses (!raw) spinlock.
On Sun, 2018-02-11 at 20:43 +0100, Ingo Molnar wrote:
> > And should these say 'Spectre v2' not just 'Spectre'?
>
> Yeah, you are probably right, but I didn't want to make the messages too specific
> - do we really know that this is the end of Spectre-style speculation holes?
Well... if a new problem is also remedied by use of IBRS/IBPB and
retpoline, I think we can happily call it a subclass of "Spectre v2".
And if it *isn't* addressed by those same things, then it's clearly
something different. Either way, these messages should be 'v2', no?
On the whole though, there are plenty of better things to be worrying
about :)
On 02/12/2018 02:22 AM, Ingo Molnar wrote:
>> +static inline void firmware_restrict_branch_speculation_end(void)
>> +{
>> + alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
>> + X86_FEATURE_USE_IBRS_FW);
> BTW., there's a detail that only occurred to me today, this enabling/disabling
> sequence is not NMI safe, and it might be called from NMI context:
FWIW, Tim Chen and I talked about this a bunch. We ended up just
saving/restoring the MSR verbatim in the NMI handler the same way we do
CR3, stashing it in a high general-purpose register (%r12?). That costs
an RDMSR (at least) and a WRMSR (which you can optimize out). We have a
patch for that somewhere if anybody wants it.
We also came to the same conclusion that it's a rather challenging thing
to exploit these cases, especially when you consider that we can easily
do RSB stuffing on NMI exit just before going back to the firmware.
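A minimal sketch of that save/restore idea (assumed code, not the
actual patch Dave mentions; the real thing lives in the NMI entry asm
and stashes the value in a callee-saved register, which is also what
makes it safe for nested NMIs):

static DEFINE_PER_CPU(u64, nmi_saved_spec_ctrl);

/* On NMI entry: remember what the interrupted context had set. */
static void nmi_save_spec_ctrl(void)
{
	u64 val;

	if (!boot_cpu_has(X86_FEATURE_USE_IBRS_FW))
		return;
	rdmsrl(MSR_IA32_SPEC_CTRL, val);
	this_cpu_write(nmi_saved_spec_ctrl, val);
}

/* On NMI exit: put it back, so an interrupted firmware call still
 * runs with the IBRS setting it established. */
static void nmi_restore_spec_ctrl(void)
{
	if (boot_cpu_has(X86_FEATURE_USE_IBRS_FW))
		wrmsrl(MSR_IA32_SPEC_CTRL, this_cpu_read(nmi_saved_spec_ctrl));
}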
On Mon, 2018-02-12 at 11:29 +0530, afzal mohammed wrote:
> Hi,
>
> On Sun, Feb 11, 2018 at 11:19:10AM -0800, tip-bot for David Woodhouse wrote:
>
> >
> > x86/speculation: Use IBRS if available before calling into firmware
> >
> > Retpoline means the kernel is safe because it has no indirect branches.
> > But firmware isn't, so use IBRS for firmware calls if it's available.
>
> As far as I understand, retpoline alone means the mitigation is still
> not enough.
>
> Also, David W has mentioned [1] that even with retpoline, IBPB is also
> required (except on Skylake).
Retpoline is sufficient to protect the *kernel*, which is the biggest
target. (Except on Skylake, where IBRS is the only full mitigation and
people are still working trying to come up with a "good enough"
mitigation that isn't IBRS.)
On all CPUs, you need IBPB to protect userspace processes from each
other, although since it's slow we don't actually *do* that for every
context switch; only when switching to non-dumpable processes.
That IBPB requirement for protecting userspace is true even on the next
generation of CPUs with the "Enhanced IBRS" (IBRS_ALL) feature. It only
goes away in CPUs which are even *further* in the future, when Intel
manage to fix it completely in hardware. They haven't even documented
the feature bit they're going to advertise to indicate that fix yet!
> If IBPB & IBRS are not supported by the ucode, shouldn't the below
> indicate something along the lines of 'Mitigation not enough'?
>
> >
> > - return sprintf(buf, "%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
> > + return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled],
> > boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "",
> > + boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
> > spectre_v2_module_string());
> On 4.16-rc1, w/ GCC 7.3.0,
>
> /sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
> /sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user pointer sanitization
> /sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full generic retpoline
>
> Here, for the user (at least for me), it is not clear whether the
> mitigation is enough. On the present system (Ivy Bridge), as a ucode
> update is not available, IBPB is not printed along with
> "spectre_v2:Mitigation", so unless I am missing something, until then
> this system should be considered vulnerable; but for a user not
> familiar with the details of the issue, that cannot be deduced.
>
> Perhaps an additional status field [OKAY, PARTIAL] next to Mitigation in
> sysfs might be helpful. All these changes are up in the air for me; this
> is from a user perspective, sorry if my feedback seems idiotic.
Given that we only do it for non-dumpable processes, it's *always*
going to be only partial. (Although I think Thomas was looking at a
command line option to make that happen on every context switch?)
And on Skylake the current plan is that strictly speaking it would also
be partial.
I understand the concern, but I'm not sure that there's much we can do
to improve it. If it says "Mitigation:" that's generally OK, and if it
says anything else, it's not.
On Mon, 2018-02-12 at 12:50 +0100, Peter Zijlstra wrote:
> On Mon, Feb 12, 2018 at 11:22:11AM +0100, Ingo Molnar wrote:
> > > +static inline void firmware_restrict_branch_speculation_start(void)
> > > +{
> > > + alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> > > + X86_FEATURE_USE_IBRS_FW);
> > > +}
> > > +
> > > +static inline void firmware_restrict_branch_speculation_end(void)
> > > +{
> > > + alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> > > + X86_FEATURE_USE_IBRS_FW);
> >
> > BTW., there's a detail that only occurred to me today, this enabling/disabling
> > sequence is not NMI safe, and it might be called from NMI context:
>
> Wait, we're doing firmware from NMI? That sounds like a _REALLY_ bad
> idea.
And spin_lock_irqsave() too. Which is probably why I missed the fact
that this was being called in NMI context.
Yay for HP and their persistent attempts to "value subtract" in their
firmware offerings.
I'm tempted to drop that part of the patch and declare that if you're
using this driver, the potential for stray branch prediction when you
call into the firmware from the NMI handler is the *least* of your
problems.
I *will* go back over the other parts of the patch and audit them for
preempt safety though; there could potentially be a similar issue
there. I think I put them close enough to the actual firmware calls
that if we aren't already preempt-safe then we were screwed anyway, but
*maybe* there's merit in making the macros explicitly bump the preempt
count anyway.
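For what it's worth, "explicitly bump the preempt count" would
presumably look something like this (a sketch under that assumption,
not what's in the tree today):

static inline void firmware_restrict_branch_speculation_start(void)
{
	/* Pin the task: the MSR write and the later clear must happen
	 * on the same CPU, and no preempted task should run with IBRS
	 * unexpectedly left on. */
	preempt_disable();
	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
			      X86_FEATURE_USE_IBRS_FW);
}

static inline void firmware_restrict_branch_speculation_end(void)
{
	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
			      X86_FEATURE_USE_IBRS_FW);
	preempt_enable();
}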
On Mon, Feb 12, 2018 at 12:27:19PM +0000, David Woodhouse wrote:
> On Mon, 2018-02-12 at 12:50 +0100, Peter Zijlstra wrote:
> > Wait, we're doing firmware from NMI? That sounds like a _REALLY_ bad
> > idea.
>
> And spin_lock_irqsave() too. Which is probably why I missed the fact
> that this was being called in NMI context.
>
> Yay for HP and their persistent attempts to "value subtract" in their
> firmware offerings.
>
> I'm tempted to drop that part of the patch and declare that if you're
> using this driver, the potential for stray branch prediction when you
> call into the firmware from the NMI handler is the *least* of your
> problems.
We should probably mark it BROKEN or something, or move it into staging.
On Mon, 12 Feb 2018, David Woodhouse wrote:
> On Sat, 2018-02-10 at 23:39 +0000, David Woodhouse wrote:
> >
> > --- a/arch/x86/kernel/cpu/intel.c
> > +++ b/arch/x86/kernel/cpu/intel.c
> > @@ -123,8 +123,6 @@ static const struct sku_microcode
> > spectre_bad_microcodes[] = {
> > { INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x84 },
> > { INTEL_FAM6_SKYLAKE_X, 0x03, 0x0100013e },
> > { INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003c },
> > - { INTEL_FAM6_SKYLAKE_MOBILE, 0x03, 0xc2 },
> > - { INTEL_FAM6_SKYLAKE_DESKTOP, 0x03, 0xc2 },
> > { INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
> > { INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x1b },
> > { INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
>
> Arjan points out that the SKYLAKE_DESKTOP one there is premature. There
> are *two* rows in Intel's table which match that CPUID (506E3).
>
> Only *one* of them ("Skylake H/S") has cleared the 0xC2 microcode for
> use, while the "Skylake E3" line still doesn't approve it. (But doesn't
> explicitly list it in the "STOP deploying" column any more either,
> which it probably should, and might have helped me notice.)
>
> Ingo, Thomas: do you want to drop this patch which is already in
> tip/x86/pti and have a new version with the SKYLAKE_DESKTOP no longer
> removed? Or shall I send an incremental patch to add it back?
Delta patch please.
On Sat, 2018-02-10 at 23:39 +0000, David Woodhouse wrote:
>
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -123,8 +123,6 @@ static const struct sku_microcode
> spectre_bad_microcodes[] = {
> { INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x84 },
> { INTEL_FAM6_SKYLAKE_X, 0x03, 0x0100013e },
> { INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003c },
> - { INTEL_FAM6_SKYLAKE_MOBILE, 0x03, 0xc2 },
> - { INTEL_FAM6_SKYLAKE_DESKTOP, 0x03, 0xc2 },
> { INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
> { INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x1b },
> { INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
Arjan points out that the SKYLAKE_DESKTOP one there is premature. There
are *two* rows in Intel's table which match that CPUID (506E3).
Only *one* of them ("Skylake H/S") has cleared the 0xC2 microcode for
use, while the "Skylake E3" line still doesn't approve it. (But doesn't
explicitly list it in the "STOP deploying" column any more either,
which it probably should, and might have helped me notice.)
Ingo, Thomas: do you want to drop this patch which is already in
tip/x86/pti and have a new version with the SKYLAKE_DESKTOP no longer
removed? Or shall I send an incremental patch to add it back?
On Mon, Feb 12, 2018 at 08:13:31AM -0800, Dave Hansen wrote:
> On 02/12/2018 02:22 AM, Ingo Molnar wrote:
> >> +static inline void firmware_restrict_branch_speculation_end(void)
> >> +{
> >> + alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> >> + X86_FEATURE_USE_IBRS_FW);
> > BTW., there's a detail that only occurred to me today, this enabling/disabling
> > sequence is not NMI safe, and it might be called from NMI context:
>
> FWIW, Tim Chen and I talked about this a bunch. We ended up just
> saving/restoring the MSR verbatim in the NMI handler the same way we do
> > CR3, stashing it in a high general-purpose register (%r12?). That costs
> > an RDMSR (at least) and a WRMSR (which you can optimize out). We have a
> patch for that somewhere if anybody wants it.
I would really rather not do that on the NMI path.. And if we _have_ to,
please keep a software shadow of that MSR state, such that we can avoid
touching that MSR 99% of the time.
* Peter Zijlstra <[email protected]> wrote:
> On Mon, Feb 12, 2018 at 08:13:31AM -0800, Dave Hansen wrote:
> > On 02/12/2018 02:22 AM, Ingo Molnar wrote:
> > >> +static inline void firmware_restrict_branch_speculation_end(void)
> > >> +{
> > >> + alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> > >> + X86_FEATURE_USE_IBRS_FW);
> > > BTW., there's a detail that only occurred to me today, this enabling/disabling
> > > sequence is not NMI safe, and it might be called from NMI context:
> >
> > FWIW, Tim Chen and I talked about this a bunch. We ended up just
> > saving/restoring the MSR verbatim in the NMI handler the same way we do
> > CR3, stashing it in a high general-purpose register (%r12?). That costs
> > an RDMSR (at least) and a WRMSR (which you can optimize out). We have a
> > patch for that somewhere if anybody wants it.
>
> I would really rather not do that on the NMI path.. And if we _have_ to,
> please keep a software shadow of that MSR state, such that we can avoid
> touching that MSR 99% of the time.
Yeah, I'd rather avoid doing firmware calls from NMI context altogether.
Thanks,
Ingo
* David Woodhouse <[email protected]> wrote:
> On Mon, 2018-02-12 at 12:50 +0100, Peter Zijlstra wrote:
> > On Mon, Feb 12, 2018 at 11:22:11AM +0100, Ingo Molnar wrote:
> > > > +static inline void firmware_restrict_branch_speculation_start(void)
> > > > +{
> > > > + alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> > > > + X86_FEATURE_USE_IBRS_FW);
> > > > +}
> > > > +
> > > > +static inline void firmware_restrict_branch_speculation_end(void)
> > > > +{
> > > > + alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> > > > + X86_FEATURE_USE_IBRS_FW);
> > >
> > > BTW., there's a detail that only occurred to me today, this enabling/disabling
> > > sequence is not NMI safe, and it might be called from NMI context:
> >
> > Wait, we're doing firmware from NMI? That sounds like a _REALLY_ bad
> > idea.
>
> And spin_lock_irqsave() too. Which is probably why I missed the fact
> that this was being called in NMI context.
>
> Yay for HP and their persistent attempts to "value subtract" in their
> firmware offerings.
>
> I'm tempted to drop that part of the patch and declare that if you're
> using this driver, the potential for stray branch prediction when you
> call into the firmware from the NMI handler is the *least* of your
> problems.
>
> I *will* go back over the other parts of the patch and audit them for
> preempt safety though; there could potentially be a similar issue
> there. I think I put them close enough to the actual firmware calls
> that if we aren't already preempt-safe then we were screwed anyway, but
> *maybe* there's merit in making the macros explicitly bump the preempt
> count anyway.
Ok, meanwhile I'm removing this patch from the x86/pti branch, and since the
branch has to be rebased anyway, I'll merge these into a single patch:
85d8426e0720: x86/speculation: Correct Speculation Control microcode blacklist again
1751342095f0: x86/speculation: Update Speculation Control microcode blacklist
Thanks,
Ingo
* Paolo Bonzini <[email protected]> wrote:
> On 11/02/2018 00:39, David Woodhouse wrote:
> > Using retpoline ensures the kernel is safe because it doesn't contain
> > any indirect branches, but firmware still can — and we make calls into
> > firmware at runtime. Where the IBRS microcode support is available, use
> > that before calling into firmware.
> >
> > While doing that, I noticed that we were calling C functions without
> > telling the compiler about the call-clobbered registers. Stop that.
> >
> > This also contains the always_inline fix for the performance problem
> > introduced by retpoline in KVM code, and fixes some other issues with
> > the per-vCPU KVM handling for the SPEC_CTRL MSR.
> >
> > Finally, update the microcode blacklist to reflect the latest
> > information from Intel.
> >
> > v2: Drop IBRS_ALL patch for the time being
> > Add KVM MSR fixes (karahmed)
> > Update microcode blacklist
> >
> >
> >
> > David Woodhouse (4):
> > x86/speculation: Update Speculation Control microcode blacklist
> > Revert "x86/speculation: Simplify
> > indirect_branch_prediction_barrier()"
> > KVM: x86: Reduce retpoline performance impact in
> > slot_handle_level_range()
> > x86/speculation: Use IBRS if available before calling into firmware
> >
> > KarimAllah Ahmed (2):
> > X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
> > KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR
> > bitmap
> >
> > arch/x86/include/asm/apm.h | 6 ++++++
> > arch/x86/include/asm/cpufeatures.h | 1 +
> > arch/x86/include/asm/efi.h | 17 +++++++++++++++--
> > arch/x86/include/asm/nospec-branch.h | 32 ++++++++++++++++++++++++++++----
> > arch/x86/include/asm/processor.h | 3 ---
> > arch/x86/kernel/cpu/bugs.c | 18 +++++++++++-------
> > arch/x86/kernel/cpu/intel.c | 4 ----
> > arch/x86/kvm/mmu.c | 10 +++++-----
> > arch/x86/kvm/vmx.c | 7 ++++---
> > drivers/watchdog/hpwdt.c | 3 +++
> > 10 files changed, 73 insertions(+), 28 deletions(-)
> >
>
> Acked-by: Paolo Bonzini <[email protected]>
Thanks - I've added your Ack to the three KVM patches.
Thanks,
Ingo
* David Woodhouse <[email protected]> wrote:
> On Sun, 2018-02-11 at 20:43 +0100, Ingo Molnar wrote:
> > > And should these say 'Spectre v2' not just 'Spectre'?
> >
> > Yeah, you are probably right, but I didn't want to make the messages too specific
> > - do we really know that this is the end of Spectre-style speculation holes?
>
> Well... if a new problem is also remedied by use of IBRS/IBPB and
> retpoline, I think we can happily call it a subclass of "Spectre v2".
>
> And if it *isn't* addressed by those same things, then it's clearly
> something different. Either way, these messages should be 'v2', no?
Ok, fair enough - I've changed it to v2 as you suggest:
- pr_info("Filling RSB on context switch\n");
+ pr_info("Spectre v2 mitigation: Filling RSB on context switch\n");
- pr_info("Enabling Indirect Branch Prediction Barrier\n");
+ pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n");
> On the whole though, there are plenty of better things to be worrying
> about :)
Sure - nevertheless I fixed these while they were still hot ;-)
Thanks,
Ingo
Commit-ID: 206587a9fb764d71f035dc7f6d3b6488f5d5b304
Gitweb: https://git.kernel.org/tip/206587a9fb764d71f035dc7f6d3b6488f5d5b304
Author: KarimAllah Ahmed <[email protected]>
AuthorDate: Sat, 10 Feb 2018 23:39:25 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 13 Feb 2018 09:00:06 +0100
X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs
These two variables should check whether SPEC_CTRL and PRED_CMD are
supposed to be passed through to L2 guests or not, but
msr_write_intercepted_l01() returns 'true' when the MSR is *not* passed
through. So just invert the result of msr_write_intercepted_l01() to
implement the correct semantics.
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kvm/vmx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bee4c49..599179b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10219,8 +10219,8 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu,
* updated to reflect this when L1 (or its L2s) actually write to
* the MSR.
*/
- bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
- bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
+ bool pred_cmd = !msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
+ bool spec_ctrl = !msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
!pred_cmd && !spec_ctrl)
Commit-ID: 3712caeb14dcb33fb4d5114f14c0beef10aca101
Gitweb: https://git.kernel.org/tip/3712caeb14dcb33fb4d5114f14c0beef10aca101
Author: KarimAllah Ahmed <[email protected]>
AuthorDate: Sat, 10 Feb 2018 23:39:26 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 13 Feb 2018 09:00:17 +0100
KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap
We either clear the CPU_BASED_USE_MSR_BITMAPS and end up intercepting all
MSR accesses or create a valid L02 MSR bitmap and use that. This decision
has to be made every time we evaluate whether we are going to generate the
L02 MSR bitmap.
Before commit:
d28b387fb74d ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
... this was probably OK since the decision was always identical.
This is no longer the case now since the MSR bitmap might actually
change once we decide to not intercept SPEC_CTRL and PRED_CMD.
Signed-off-by: KarimAllah Ahmed <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kvm/vmx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 599179b..91e3539 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10130,7 +10130,8 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu,
if (cpu_has_vmx_msr_bitmap() &&
nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS) &&
nested_vmx_merge_msr_bitmap(vcpu, vmcs12))
- ;
+ vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL,
+ CPU_BASED_USE_MSR_BITMAPS);
else
vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL,
CPU_BASED_USE_MSR_BITMAPS);
Commit-ID: 928a4c39484281f8ca366f53a1db79330d058401
Gitweb: https://git.kernel.org/tip/928a4c39484281f8ca366f53a1db79330d058401
Author: David Woodhouse <[email protected]>
AuthorDate: Sat, 10 Feb 2018 23:39:24 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 13 Feb 2018 08:59:45 +0100
KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(), by always inlining iterator helper methods
With retpoline, tight loops of "call this function for every XXX" are
very much pessimised by taking a prediction miss *every* time. This one
is by far the biggest contributor to the guest launch time with retpoline.
By marking the iterator slot_handle_…() functions always_inline, we can
ensure that the indirect function call can be optimised away into a
direct call and it actually generates slightly smaller code because
some of the other conditionals can get optimised away too.
Performance is now pretty close to what we see with nospectre_v2 on
the command line.
Suggested-by: Linus Torvalds <[email protected]>
Tested-by: Filippo Sironi <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Filippo Sironi <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kvm/mmu.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2b8eb4d..cc83bdc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5058,7 +5058,7 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
/* The caller should hold mmu-lock before calling this function. */
-static bool
+static __always_inline bool
slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, int start_level, int end_level,
gfn_t start_gfn, gfn_t end_gfn, bool lock_flush_tlb)
@@ -5088,7 +5088,7 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
return flush;
}
-static bool
+static __always_inline bool
slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, int start_level, int end_level,
bool lock_flush_tlb)
@@ -5099,7 +5099,7 @@ slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
lock_flush_tlb);
}
-static bool
+static __always_inline bool
slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, bool lock_flush_tlb)
{
@@ -5107,7 +5107,7 @@ slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
}
-static bool
+static __always_inline bool
slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, bool lock_flush_tlb)
{
@@ -5115,7 +5115,7 @@ slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
}
-static bool
+static __always_inline bool
slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
slot_level_handler fn, bool lock_flush_tlb)
{
Commit-ID: f208820a321f9b23d77d7eed89945d862d62a3ed
Gitweb: https://git.kernel.org/tip/f208820a321f9b23d77d7eed89945d862d62a3ed
Author: David Woodhouse <[email protected]>
AuthorDate: Sat, 10 Feb 2018 23:39:23 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 13 Feb 2018 08:59:00 +0100
Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"
This reverts commit 64e16720ea0879f8ab4547e3b9758936d483909b.
We cannot call C functions like that, without marking all the
call-clobbered registers as, well, clobbered. We might have got away
with it for now because the __ibp_barrier() function was *fairly*
unlikely to actually use any other registers. But no. Just no.
Signed-off-by: David Woodhouse <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 13 +++++++++----
arch/x86/include/asm/processor.h | 3 ---
arch/x86/kernel/cpu/bugs.c | 6 ------
3 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 4d57894..300cc15 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -164,10 +164,15 @@ static inline void vmexit_fill_RSB(void)
static inline void indirect_branch_prediction_barrier(void)
{
- alternative_input("",
- "call __ibp_barrier",
- X86_FEATURE_USE_IBPB,
- ASM_NO_INPUT_CLOBBER("eax", "ecx", "edx", "memory"));
+ asm volatile(ALTERNATIVE("",
+ "movl %[msr], %%ecx\n\t"
+ "movl %[val], %%eax\n\t"
+ "movl $0, %%edx\n\t"
+ "wrmsr",
+ X86_FEATURE_USE_IBPB)
+ : : [msr] "i" (MSR_IA32_PRED_CMD),
+ [val] "i" (PRED_CMD_IBPB)
+ : "eax", "ecx", "edx", "memory");
}
#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 513f960..99799fb 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -969,7 +969,4 @@ bool xen_set_default_idle(void);
void stop_this_cpu(void *dummy);
void df_debug(struct pt_regs *regs, long error_code);
-
-void __ibp_barrier(void);
-
#endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 71949bf..61152aa 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -337,9 +337,3 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
spectre_v2_module_string());
}
#endif
-
-void __ibp_barrier(void)
-{
- __wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);
-}
-EXPORT_SYMBOL_GPL(__ibp_barrier);
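The rule being restored here, shown on a reduced example (not kernel
code): a 'call' from inline asm must declare everything the calling
convention lets the callee trash, which for the original "call
__ibp_barrier" alternative was far more than the eax/ecx/edx it listed:

extern void some_c_function(void);

static inline void call_from_asm(void)
{
	/* The x86-64 calling convention allows the callee to clobber
	 * rax, rcx, rdx, rsi, rdi, r8-r11 and the flags, so all of
	 * them must appear in the clobber list; the open-coded wrmsr
	 * in the revert sidesteps this with a small, known clobber set. */
	asm volatile("call some_c_function"
		     : : : "rax", "rcx", "rdx", "rsi", "rdi",
		       "r8", "r9", "r10", "r11", "cc", "memory");
}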
On Tue, Feb 13, 2018 at 12:58:21AM -0800, tip-bot for David Woodhouse wrote:
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -164,10 +164,15 @@ static inline void vmexit_fill_RSB(void)
>
> static inline void indirect_branch_prediction_barrier(void)
> {
> + asm volatile(ALTERNATIVE("",
> + "movl %[msr], %%ecx\n\t"
> + "movl %[val], %%eax\n\t"
> + "movl $0, %%edx\n\t"
> + "wrmsr",
> + X86_FEATURE_USE_IBPB)
> + : : [msr] "i" (MSR_IA32_PRED_CMD),
> + [val] "i" (PRED_CMD_IBPB)
> + : "eax", "ecx", "edx", "memory");
> }
Joe Konno pointed out that we now need the below line too, because we're
using MSR_IA32_PRED_CMD in this header.
With the existing code that's not a problem per se, but my objtool
retpoline annotation things did stumble over this.
Do we want to fold it into the objtool annotation patch or have it
separate?
---
arch/x86/include/asm/nospec-branch.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 300cc159b4a0..76b058533e47 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -6,6 +6,7 @@
#include <asm/alternative.h>
#include <asm/alternative-asm.h>
#include <asm/cpufeatures.h>
+#include <asm/msr-index.h>
#ifdef __ASSEMBLY__
* Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Feb 13, 2018 at 12:58:21AM -0800, tip-bot for David Woodhouse wrote:
>
> > --- a/arch/x86/include/asm/nospec-branch.h
> > +++ b/arch/x86/include/asm/nospec-branch.h
> > @@ -164,10 +164,15 @@ static inline void vmexit_fill_RSB(void)
> >
> > static inline void indirect_branch_prediction_barrier(void)
> > {
> > + asm volatile(ALTERNATIVE("",
> > + "movl %[msr], %%ecx\n\t"
> > + "movl %[val], %%eax\n\t"
> > + "movl $0, %%edx\n\t"
> > + "wrmsr",
> > + X86_FEATURE_USE_IBPB)
> > + : : [msr] "i" (MSR_IA32_PRED_CMD),
> > + [val] "i" (PRED_CMD_IBPB)
> > + : "eax", "ecx", "edx", "memory");
> > }
>
> Joe Konno pointed out that we now need the below line too, because we're
> using MSR_IA32_PRED_CMD in this header.
>
> With the existing code that's not a problem per se, but my objtool
> retpoline annotation things did stumble over this.
>
> Do we want to fold it into the objtool annotation patch or have it
> separate?
Separate would be better, it makes sense and is one problem less to worry about?
Thanks,
Ingo
On Tue, Feb 13, 2018 at 12:28:38PM +0100, Ingo Molnar wrote:
> Separate would be better, it makes sense and is one problem less to worry about?
Something like so then? I'm not entirely sure which commit wants to go
in Fixes; I picked the earlier one, but it could equally have been:
Fixes: f208820a321f ("Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"")
---
Subject: x86/speculation: Add msr-index.h
From: Peter Zijlstra <[email protected]>
Date: Tue, 13 Feb 2018 10:41:32 +0100
Joe Konno reported a compile failure resulting from using an MSR
without inclusion of msr-index.h, and while the current code builds
fine (by accident) this needs fixing for future patches.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 20ffa1caecca ("x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support")
Reported-by: Joe Konno <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 1 +
1 file changed, 1 insertion(+)
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -6,6 +6,7 @@
#include <asm/alternative.h>
#include <asm/alternative-asm.h>
#include <asm/cpufeatures.h>
+#include <asm/msr-index.h>
#ifdef __ASSEMBLY__
* Peter Zijlstra <[email protected]> wrote:
> On Tue, Feb 13, 2018 at 12:28:38PM +0100, Ingo Molnar wrote:
>
> > Separate would be better, it makes sense and is one problem less to worry about?
>
> Something like so then?
Yeah, perfect!
> I'm not entirely sure which commit wants to go in Fixes; I picked the earlier
> one, but it could equally have been:
>
> Fixes: f208820a321f ("Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()"")
I'll leave the tag out (in case a rebase has to be done), as these commits are
close enough to each other and are also in a single backporting block of commits.
Thanks,
Ingo
Commit-ID: 30d97b098534d1d35b198531870356a29cc066c2
Gitweb: https://git.kernel.org/tip/30d97b098534d1d35b198531870356a29cc066c2
Author: Peter Zijlstra <[email protected]>
AuthorDate: Tue, 13 Feb 2018 14:28:19 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 13 Feb 2018 14:39:34 +0100
x86/speculation: Add <asm/msr-index.h> dependency
Joe Konno reported a compile failure resulting from using an MSR
without inclusion of <asm/msr-index.h>, and while the current code builds
fine (by accident) this needs fixing for future patches.
Reported-by: Joe Konno <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 20ffa1caecca ("x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 300cc15..76b0585 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -6,6 +6,7 @@
#include <asm/alternative.h>
#include <asm/alternative-asm.h>
#include <asm/cpufeatures.h>
+#include <asm/msr-index.h>
#ifdef __ASSEMBLY__
On 02/12/2018 11:55 PM, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
>> On Mon, Feb 12, 2018 at 08:13:31AM -0800, Dave Hansen wrote:
>>> On 02/12/2018 02:22 AM, Ingo Molnar wrote:
>>>>> +static inline void firmware_restrict_branch_speculation_end(void)
>>>>> +{
>>>>> + alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
>>>>> + X86_FEATURE_USE_IBRS_FW);
>>>> BTW., there's a detail that only occurred to me today, this enabling/disabling
>>>> sequence is not NMI safe, and it might be called from NMI context:
>>>
>>> FWIW, Tim Chen and I talked about this a bunch. We ended up just
>>> saving/restoring the MSR verbatim in the NMI handler the same way we do
>>> CR3, stashing it in a high general-purpose register (%r12?). That costs
>>> an RDMSR (at least) and a WRMSR (which you can optimize out). We have a
>>> patch for that somewhere if anybody wants it.
>>
>> I would really rather not do that on the NMI path.. And if we _have_ to,
>> please keep a software shadow of that MSR state, such that we can avoid
>> touching that MSR 99% of the time.
>
> Yeah, I'd rather avoid doing firmware calls from NMI context altogether.
>
> Thanks,
>
> Ingo
>
Dave Hansen and I had some discussions on how to handle the nested NMI
and firmware calls. We thought of using
a per cpu counter to record the nesting call depth
and toggle IBRS appropriately for the depth 0->1 and 1->0
transition. Will this change be sufficient?
Thanks.
Tim
commit 55546c27a006198630c57b900abcbd3baaabf63a
Author: Tim Chen <[email protected]>
Date: Tue Feb 13 04:10:41 2018 -0800
x86/firmware: Prevent IBRS from being turned off prematurely.
Dave Woodhouse proposed to use IBRS to protect the firmware
call path against Spectre exploits. However, the firmware path
can go through an NMI and we can get nested calls, causing
unsafe firmware calls with IBRS missing, as illustrated below:
firmware_restrict_branch_speculation_start (set IBRS=1)
NMI
firmware_restrict_branch_speculation_start (set IBRS=1)
firmware call
firmware_restrict_branch_speculation_end (set IBRS=0)
NMI return
firmware call (with IBRS=0) <---- unsafe call, premature IBRS disabling
firmware_restrict_branch_speculation_end (set IBRS=0)
This patch proposes using a per cpu counter to track the IBRS
firmware call nesting depth, to ensure that we don't turn off
IBRS prematurely before calling firmware.
Signed-off-by: Tim Chen <[email protected]>
---
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 297d457..1e9c828 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -146,6 +146,8 @@ enum spectre_v2_mitigation {
extern char __indirect_thunk_start[];
extern char __indirect_thunk_end[];
+DECLARE_PER_CPU(int, spec_ctrl_ibrs_fw_depth);
+
/*
* On VMEXIT we must ensure that no RSB predictions learned in the guest
* can be followed in the host, by overwriting the RSB completely. Both
@@ -186,14 +188,16 @@ static inline void indirect_branch_prediction_barrier(void)
*/
static inline void firmware_restrict_branch_speculation_start(void)
{
- alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
+ if (this_cpu_inc_return(spec_ctrl_ibrs_fw_depth) == 1)
+ alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
X86_FEATURE_USE_IBRS_FW);
}
static inline void firmware_restrict_branch_speculation_end(void)
{
- alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
- X86_FEATURE_USE_IBRS_FW);
+ if (this_cpu_dec_return(spec_ctrl_ibrs_fw_depth) == 0)
+ alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
+ X86_FEATURE_USE_IBRS_FW);
}
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index c994dab..4ab13f0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -27,6 +27,8 @@
#include <asm/intel-family.h>
static void __init spectre_v2_select_mitigation(void);
+DEFINE_PER_CPU(int, spec_ctrl_ibrs_fw_depth);
+EXPORT_PER_CPU_SYMBOL(spec_ctrl_ibrs_fw_depth);
void __init check_bugs(void)
{
On Wed, Feb 14, 2018 at 09:56:14AM +0100, Peter Zijlstra wrote:
> On Tue, Feb 13, 2018 at 05:49:47PM -0800, Tim Chen wrote:
>
> > static inline void firmware_restrict_branch_speculation_start(void)
> > {
> > + if (this_cpu_inc_return(spec_ctrl_ibrs_fw_depth) == 1)
> > + alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> > X86_FEATURE_USE_IBRS_FW);
> > }
> >
> > static inline void firmware_restrict_branch_speculation_end(void)
> > {
> > + if (this_cpu_dec_return(spec_ctrl_ibrs_fw_depth) == 0)
> > + alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> > + X86_FEATURE_USE_IBRS_FW);
> > }
>
>
> At the very least this must disable and re-enable preemption, such that
> we guarantee we inc/dec the same counter. ISTR some firmware calls (EFI)
> actually are preemptible so that wouldn't work.
>
> Further, consider:
>
> this_cpu_inc_return() // 0->1
> <NMI>
> this_cpu_inc_return() // 1->2
> call_broken_arse_firmware()
> this_cpu_dec_return() // 2->1
> </NMI>
> wrmsr(SPEC_CTRL, IBRS);
>
> /* from dodgy firmware crap */
s/from/more/
typing hard.
> this_cpu_dec_return() // 1->0
> wrmsr(SPEC_CTRL, 0);
On Tue, Feb 13, 2018 at 05:49:47PM -0800, Tim Chen wrote:
> static inline void firmware_restrict_branch_speculation_start(void)
> {
> + if (this_cpu_inc_return(spec_ctrl_ibrs_fw_depth) == 1)
> + alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
> X86_FEATURE_USE_IBRS_FW);
> }
>
> static inline void firmware_restrict_branch_speculation_end(void)
> {
> + if (this_cpu_dec_return(spec_ctrl_ibrs_fw_depth) == 0)
> + alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
> + X86_FEATURE_USE_IBRS_FW);
> }
At the very least this must disable and re-enable preemption, such that
we guarantee we inc/dec the same counter. ISTR some firmware calls (EFI)
actually are preemptible so that wouldn't work.
Further, consider:
this_cpu_inc_return() // 0->1
<NMI>
this_cpu_inc_return() // 1->2
call_broken_arse_firmware()
this_cpu_dec_return() // 2->1
</NMI>
wrmsr(SPEC_CTRL, IBRS);
/* from dodgy firmware crap */
this_cpu_dec_return() // 1->0
wrmsr(SPEC_CTRL, 0);
* Tim Chen <[email protected]> wrote:
> Dave Hansen and I had some discussions on how to handle the nested NMI and
> firmware calls. We thought of using a per cpu counter to record the nesting
> call depth and toggle IBRS appropriately for the depth 0->1 and 1->0 transition.
> Will this change be sufficient?
Yeah, so I think the first question should be: does the firmware call from NMI
context make sense to begin with?
Because in this particular case it does not appear to be so: the reason for the
BIOS/firmware call appears to be to determine how we nmi_panic() after receiving
an NMI that no other NMI handler handled: with a passive-aggressive "I don't know"
panic message or with a slightly more informative panic message.
That's not a real value-add to users - so we can avoid all these complications by
applying the patch below:
drivers/watchdog/hpwdt.c | 30 ++++--------------------------
1 file changed, 4 insertions(+), 26 deletions(-)
As a bonus the spinlock use can be removed as well.
Thanks,
Ingo
====================>
From b038428a739a3fcf0b9678305c131f60af7422ca Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Wed, 14 Feb 2018 10:24:41 +0100
Subject: [PATCH] watchdog: hpwdt: Remove spinlock acquire and BIOS calls from
NMI context
Taking a spinlock and calling into the firmware are problematic things to
do from NMI callbacks.
It also seems completely pointless in this particular case:
- cmn_regs is updated by the firmware call in the hpwdt_pretimeout() NMI
callback, but there's almost no use for it: we use it only to determine
whether to panic with an 'unknown NMI' or a slightly more descriptive
message.
- cmn_regs and rom_lock is not used anywhere else, other than early detection
of the firmware capability.
So remove rom_lock, make the cmn_regs call from the detection routine only,
and thus make the NMI callback a lot more robust.
Signed-off-by: Ingo Molnar <[email protected]>
---
drivers/watchdog/hpwdt.c | 30 ++++--------------------------
1 file changed, 4 insertions(+), 26 deletions(-)
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index f1f00dfc0e68..2544fe482fe3 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -150,9 +150,7 @@ static unsigned int hpwdt_nmi_decoding;
static unsigned int allow_kdump = 1;
static unsigned int is_icru;
static unsigned int is_uefi;
-static DEFINE_SPINLOCK(rom_lock);
static void *cru_rom_addr;
-static struct cmn_registers cmn_regs;
extern asmlinkage void asminline_call(struct cmn_registers *pi86Regs,
unsigned long *pRomEntry);
@@ -225,6 +223,7 @@ static int cru_detect(unsigned long map_entry,
unsigned long physical_bios_base = 0;
unsigned long physical_bios_offset = 0;
int retval = -ENODEV;
+ struct cmn_registers cmn_regs = { };
bios32_map = ioremap(map_entry, (2 * PAGE_SIZE));
@@ -486,32 +485,18 @@ static int hpwdt_my_nmi(void)
*/
static int hpwdt_pretimeout(unsigned int ulReason, struct pt_regs *regs)
{
- unsigned long rom_pl;
- static int die_nmi_called;
-
if (!hpwdt_nmi_decoding)
return NMI_DONE;
if ((ulReason == NMI_UNKNOWN) && !hpwdt_my_nmi())
return NMI_DONE;
- spin_lock_irqsave(&rom_lock, rom_pl);
- if (!die_nmi_called && !is_icru && !is_uefi)
- asminline_call(&cmn_regs, cru_rom_addr);
- die_nmi_called = 1;
- spin_unlock_irqrestore(&rom_lock, rom_pl);
-
if (allow_kdump)
hpwdt_stop();
- if (!is_icru && !is_uefi) {
- if (cmn_regs.u1.ral == 0) {
- nmi_panic(regs, "An NMI occurred, but unable to determine source.\n");
- return NMI_HANDLED;
- }
- }
- nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
- "for the NMI is logged in any one of the following "
+ nmi_panic(regs,
+ "An NMI occurred. Depending on your system the reason "
+ "for the NMI might be logged in any one of the following "
"resources:\n"
"1. Integrated Management Log (IML)\n"
"2. OA Syslog\n"
@@ -744,13 +729,6 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
HPWDT_ARCH);
return retval;
}
-
- /*
- * We know this is the only CRU call we need to make so lets keep as
- * few instructions as possible once the NMI comes in.
- */
- cmn_regs.u1.rah = 0x0D;
- cmn_regs.u1.ral = 0x02;
}
/*
On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> Because in this particular case it does not appear to be so: the reason for the
> BIOS/firmware call appears to be to determine how we nmi_panic() after receiving
> an NMI that no other NMI handler handled: with a passive-aggressive "I don't know"
> panic message or with a slightly more informative panic message.
However much I like just ripping all that out, I think the ROM call
actually does that logging, or that is how I read things.
If you look at the original Changelog for that driver:
Hp is providing a Hardware WatchDog Timer driver that will only work with the
specific HW Timer located in the HP ProLiant iLO 2 ASIC. The iLO 2 HW Timer
will generate a Non-maskable Interrupt (NMI) 9 seconds before physically
resetting the server, by removing power, so that the event can be logged to
the HP Integrated Management Log (IML), a Non-Volatile Random Access Memory
(NVRAM). The logging of the event is performed using the HP ProLiant ROM via
an Industry Standard access known as a BIOS Service Directory Entry.
+ Jerry
On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
>
> * Tim Chen <[email protected]> wrote:
>
> > Dave Hansen and I had some discussions on how to handle the nested NMI and
> > firmware calls. We thought of using a per cpu counter to record the nesting
> > call depth and toggle IBRS appropriately for the depth 0->1 and 1->0 transition.
> > Will this change be sufficient?
>
> Yeah, so I think the first question should be: does the firmware call from NMI
> context make sense to begin with?
>
> Because in this particular case it does not appear to be so: the reason for the
> BIOS/firmware call appears to be to determine how we nmi_panic() after receiving
> an NMI that no other NMI handler handled: with a passive-aggressive "I don't know"
> panic message or with a slightly more informative panic message.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* Peter Zijlstra <[email protected]> wrote:
> On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> > Because in this particular case it does not appear to be so: the reason for the
> > BIOS/firmware call appears to be to determine how we nmi_panic() after receiving
> > an NMI that no other NMI handler handled: with a passive-aggressive "I don't know"
> > panic message or with a slightly more informative panic message.
>
> However much I like just ripping all that out, I think the ROM call
> actually does that logging, or that is how I read things.
>
> If you look at the original Changelog for that driver:
>
> Hp is providing a Hardware WatchDog Timer driver that will only work with the
> specific HW Timer located in the HP ProLiant iLO 2 ASIC. The iLO 2 HW Timer
> will generate a Non-maskable Interrupt (NMI) 9 seconds before physically
> resetting the server, by removing power, so that the event can be logged to
> the HP Integrated Management Log (IML), a Non-Volatile Random Access Memory
> (NVRAM). The logging of the event is performed using the HP ProLiant ROM via
> an Industry Standard access known as a BIOS Service Directory Entry.
Ok, that appears to be the case, too bad.
But the good news: if this callback is executed only once per system lifetime then
we don't actually have to perform *any* modification on this driver, right? The
reason is that this callback will panic unconditionally after performing the BIOS
call. The control flow to the panic is unconditional:
spin_lock_irqsave(&rom_lock, rom_pl);
if (!die_nmi_called && !is_icru && !is_uefi)
asminline_call(&cmn_regs, cru_rom_addr);
...
if (!is_icru && !is_uefi) {
if (cmn_regs.u1.ral == 0) {
nmi_panic(regs, "An NMI occurred, but unable to determine source.\n");
...
nmi_panic(regs, "An NMI occurred. Depending on your system the reason "
"for the NMI is logged in any one of the following "
"resources:\n"
"1. Integrated Management Log (IML)\n"
"2. OA Syslog\n"
"3. OA Forward Progress Log\n"
"4. iLO Event Log");
This callback does not get executed when we get perf NMIs, correct?
Ingo
Ingo,
I have a patch set under review that brings hpwdt into compliance
with the watchdog core.
One of the changes removes the callback into firmware in hpwdt_pretimeout
and its associated spinlock.
https://lkml.org/lkml/2018/2/12/30
I will add you to the CC list of the next version of the set.
Jerry
On Wed, Feb 14, 2018 at 10:44:05AM +0100, Borislav Petkov wrote:
> + Jerry
>
> On Wed, Feb 14, 2018 at 10:31:59AM +0100, Ingo Molnar wrote:
> >
> > * Tim Chen <[email protected]> wrote:
> >
> > > Dave Hansen and I had some discussions on how to handle the nested NMI and
> > > firmware calls. We thought of using a per cpu counter to record the nesting
> > > call depth and toggle IBRS appropriately for the depth 0->1 and 1->0 transition.
> > > Will this change be sufficient?
> >
> > Yeah, so I think the first question should be: does the firmware call from NMI
> > context make sense to begin with?
> >
> > Because in this particular case it does not appear to be so: the reason for the
> > BIOS/firmware call appears to be to determine how we nmi_panic() after receiving
> > an NMI that no other NMI handler handled: with a passive-aggressive "I don't know"
> > panic message or with a slightly more informative panic message.
--
-----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett Packard Enterprise
-----------------------------------------------------------------------------
On 02/14/2018 12:56 AM, Peter Zijlstra wrote:
>
> At the very least this must disable and re-enable preemption, such that
> we guarantee we inc/dec the same counter. ISTR some firmware calls (EFI)
> actually are preemptible so that wouldn't work.
>
> Further, consider:
>
> this_cpu_inc_return() // 0->1
> <NMI>
> this_cpu_inc_return() // 1->2
> call_broken_arse_firmware()
> this_cpu_dec_return() // 2->1
> </NMI>
> wrmsr(SPEC_CTRL, IBRS);
>
> /* from dodgy firmware crap */
>
> this_cpu_dec_return() // 1->0
> wrmsr(SPEC_CTRL, 0);
>
How about the following patch.
Thanks.
Tim
---
From a37d28622781acf2789dd63f2fdb57be733f15a4 Mon Sep 17 00:00:00 2001
From: Tim Chen <[email protected]>
Date: Tue, 13 Feb 2018 04:10:41 -0800
Subject: [PATCH] x86/firmware: Prevent IBRS from being turned off prematurely.
Dave Woodhouse proposed using IBRS to protect the firmware
call path against Spectre exploits. However, the firmware path
can go through an NMI, and we can get nested calls, causing
unsafe firmware calls with IBRS missing, as illustrated below:
firmware_restrict_branch_speculation_start (set IBRS=1)
NMI
firmware_restrict_branch_speculation_start (set IBRS=1)
firmware call
firmware_restrict_branch_speculation_end (set IBRS=0)
NMI return
firmware call (with IBRS=0) <---- unsafe call, premature IBRS disabling
firmware_restrict_branch_speculation_end (set IBRS=0)
This patch proposes using a per-CPU counter to track the IBRS
firmware call nesting depth, to ensure that IBRS is not turned off
prematurely while a nested firmware call is still in progress.
Signed-off-by: Tim Chen <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 10 ++++++++--
arch/x86/kernel/cpu/bugs.c | 2 ++
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 297d457..a8dd9ea 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -146,6 +146,8 @@ enum spectre_v2_mitigation {
extern char __indirect_thunk_start[];
extern char __indirect_thunk_end[];
+DECLARE_PER_CPU(int, spec_ctrl_ibrs_fw_depth);
+
/*
* On VMEXIT we must ensure that no RSB predictions learned in the guest
* can be followed in the host, by overwriting the RSB completely. Both
@@ -186,14 +188,18 @@ static inline void indirect_branch_prediction_barrier(void)
*/
static inline void firmware_restrict_branch_speculation_start(void)
{
+ preempt_disable();
+ this_cpu_inc(spec_ctrl_ibrs_fw_depth);
alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
X86_FEATURE_USE_IBRS_FW);
}
static inline void firmware_restrict_branch_speculation_end(void)
{
- alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
- X86_FEATURE_USE_IBRS_FW);
+ if (this_cpu_dec_return(spec_ctrl_ibrs_fw_depth) == 0)
+ alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
+ X86_FEATURE_USE_IBRS_FW);
+ preempt_enable();
}
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index c994dab..4ab13f0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -27,6 +27,8 @@
#include <asm/intel-family.h>
static void __init spectre_v2_select_mitigation(void);
+DEFINE_PER_CPU(int, spec_ctrl_ibrs_fw_depth);
+EXPORT_PER_CPU_SYMBOL(spec_ctrl_ibrs_fw_depth);
void __init check_bugs(void)
{
--
2.7.4
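[ Note on the preempt_disable()/preempt_enable() pair in the patch above,
  per Peter's point that some firmware calls (EFI) are preemptible: without
  the guards, a caller could be migrated between the increment and the
  decrement and touch two different CPUs' counters. A hypothetical,
  unguarded call site: ]

	this_cpu_inc(spec_ctrl_ibrs_fw_depth);  /* on CPU 0: its depth 0->1  */
	/* ... task is preempted here and migrated to CPU 1 ... */
	this_cpu_dec(spec_ctrl_ibrs_fw_depth);  /* on CPU 1: its depth 0->-1 */
	/* CPU 0's depth stays at 1, so IBRS would never be cleared there. */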
* Jerry Hoemann <[email protected]> wrote:
>
> Ingo,
>
> I have a patch set under review that brings hpwdt into compliance
> with the watchdog core.
>
> One of the changes removes the callback into firmware in hpwdt_pretimeout
> and its associated spinlock.
>
> https://lkml.org/lkml/2018/2/12/30
drivers/watchdog/hpwdt.c | 490 +----------------------------------------------
1 file changed, 8 insertions(+), 482 deletions(-)
Very nice, and this should solve all the NMI handling complications!
> I will add you to the CC list of the next version of the set.
Thanks!
Ingo
* Tim Chen <[email protected]> wrote:
> On 02/14/2018 12:56 AM, Peter Zijlstra wrote:
>
>
> How about the following patch.
These fragile complications of the interface should now be unnecessary, as the
only driver that called firmware from NMI callbacks (hpwdt.c) is going to remove
those firmware callbacks in the near future - solving the problem at the source.
Thanks,
Ingo
Commit-ID: ea00f301285ea2f07393678cd2b6057878320c9d
Gitweb: https://git.kernel.org/tip/ea00f301285ea2f07393678cd2b6057878320c9d
Author: Peter Zijlstra <[email protected]>
AuthorDate: Tue, 13 Feb 2018 14:28:19 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 15 Feb 2018 01:15:51 +0100
x86/speculation: Add <asm/msr-index.h> dependency
Joe Konno reported a compile failure resulting from using an MSR
without inclusion of <asm/msr-index.h>, and while the current code builds
fine (by accident) this needs fixing for future patches.
Reported-by: Joe Konno <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 20ffa1caecca ("x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 300cc15..76b0585 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -6,6 +6,7 @@
#include <asm/alternative.h>
#include <asm/alternative-asm.h>
#include <asm/cpufeatures.h>
+#include <asm/msr-index.h>
#ifdef __ASSEMBLY__
On 02/14/2018 03:19 PM, Ingo Molnar wrote:
>
> * Tim Chen <[email protected]> wrote:
>
>> On 02/14/2018 12:56 AM, Peter Zijlstra wrote:
>>
>>
>> How about the following patch.
>
> These fragile complications of the interface should now be unnecessary, as the
> only driver that called firmware from NMI callbacks (hpwdt.c) is going to remove
> those firmware callbacks in the near future - solving the problem at the source.
>
> Thanks,
>
> Ingo
>
Sounds good. I sent this out before seeing the other mails on removing the NMI callbacks
from hpwdt.c.
Tim
On Thu, Feb 15, 2018 at 12:17:04AM +0100, Ingo Molnar wrote:
>
> * Jerry Hoemann <[email protected]> wrote:
>
> >
> > Ingo,
> >
> > I have a patch set under review that brings hpwdt into compliance
> > with the watchdog core.
> >
> > One of the changes removes the callback into firmware in hpwdt_pretimeout
> > and its associated spinlock.
> >
> > https://lkml.org/lkml/2018/2/12/30
>
> drivers/watchdog/hpwdt.c | 490 +----------------------------------------------
> 1 file changed, 8 insertions(+), 482 deletions(-)
>
> Very nice, and this should solve all the NMI handling complications!
>
> > I will add you to the CC list of the next version of the set.
>
> Thanks!
>
> Ingo
Ingo,
Is your desire to remove the firmware callback/spinlock in hpwdt_pretimeout
related to David Woodhouse's patch set:
https://lkml.org/lkml/2018/2/14/305 ?
Which I think is meant to mitigate performance issues resulting from the
changes to address Spectre/Meltdown?
Are there any other changes to hpwdt needed related to this work?
Thanks
Jerry
--
-----------------------------------------------------------------------------
Jerry Hoemann Software Engineer Hewlett Packard Enterprise
-----------------------------------------------------------------------------
* Jerry Hoemann <[email protected]> wrote:
> Ingo,
>
> Is your desire to remove the firmware callback/spinlock in hpwdt_pretimeout
> related to David Woodhouse's patch set:
>
> https://lkml.org/lkml/2018/2/14/305 ?
>
> Which I think is meant to mitigate performance issues resulting from the
> changes to address Spectre/Meltdown?
So the motivation was that we are trying to wrap BIOS/EFI calls into
Spectre-disabling sections - and while doing that we realized that hpwdt calls the
firmware from an NMI callback, which would have complicated the Spectre work.
But with that call removed from NMI context it's all perfect!
> Are there any other changes to hpwdt needed related to this work?
No other changes needed I think!
Thanks,
Ingo
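[ A sketch of the "wrapping" being described, using the helpers from the
  patches in this thread; the surrounding function is hypothetical, not
  the actual efi.h plumbing: ]

static efi_status_t example_get_time(efi_time_t *tm, efi_time_cap_t *tc)
{
	efi_status_t status;

	/* Set IBRS=1 (where the microcode supports it) before running
	 * firmware code, which may still contain indirect branches. */
	firmware_restrict_branch_speculation_start();
	status = efi.get_time(tm, tc);
	firmware_restrict_branch_speculation_end();     /* IBRS=0 */

	return status;
}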
On Thu, Feb 15, 2018 at 10:44:44AM -0700, Jerry Hoemann wrote:
> Is your desire to remove the firmware callback/spinlock in hpwdt_pretimeout
> related to David Woodhouse's patch set:
That's the work that made us find this code, but no, even without that,
code like that is entirely dodgy. NMI code needs to be very careful and
firmware just isn't something I trust to get things right. Worse, it's
not something we can fix.
And using a spinlock from NMI context is just wrong; if anything it needs
to be a raw_spinlock, but even then, yuck.
On 02/11/2018 11:19 AM, tip-bot for David Woodhouse wrote:
> })
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index 300cc15..788c4da 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -162,17 +162,36 @@ static inline void vmexit_fill_RSB(void)
> #endif
> }
>
> +#define alternative_msr_write(_msr, _val, _feature) \
> + asm volatile(ALTERNATIVE("", \
> + "movl %[msr], %%ecx\n\t" \
> + "movl %[val], %%eax\n\t" \
> + "movl $0, %%edx\n\t" \
> + "wrmsr", \
> + _feature) \
> + : : [msr] "i" (_msr), [val] "i" (_val) \
> + : "eax", "ecx", "edx", "memory")
> +
I encountered a hang on one machine but not others when using the above
macro. It is probably an alignment thing with ALTERNATIVE, as the problem went
away after I made the change below:
Tim
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 8f2ff74..0f65bd2 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -148,6 +148,7 @@ extern char __indirect_thunk_end[];
#define alternative_msr_write(_msr, _val, _feature) \
asm volatile(ALTERNATIVE("", \
+ ".align 16\n\t" \
"movl %[msr], %%ecx\n\t" \
"movl %[val], %%eax\n\t" \
"movl $0, %%edx\n\t" \
On Fri, 2018-02-16 at 10:44 -0800, Tim Chen wrote:
>
> I encountered a hang on one machine but not others when using the above
> macro. It is probably an alignment thing with ALTERNATIVE, as the
> problem went away after I made the change below:
>
> Tim
>
> diff --git a/arch/x86/include/asm/nospec-branch.h
> b/arch/x86/include/asm/nospec-branch.h
> index 8f2ff74..0f65bd2 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -148,6 +148,7 @@ extern char __indirect_thunk_end[];
>
> #define alternative_msr_write(_msr, _val, _feature) \
> asm volatile(ALTERNATIVE("", \
> + ".align 16\n\t" \
> "movl %[msr], %%ecx\n\t" \
> "movl %[val], %%eax\n\t" \
> "movl $0, %%edx\n\t" \
That's weird. Note that .align in an altinstr section isn't actually
going to do what you'd expect; the oldinstr and altinstr sections
aren't necessarily aligned the same, so however many NOPs it inserts
into the alternative might end up *misaligning* it in the
code that actually gets executed.
Are you sure you're not running a kernel where the alternatives code
would turn that alternative, which *starts* with a NOP, into *all* NOPs?
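[ To make that concrete: a toy user-space model, with hypothetical names,
  of what the patching step effectively does (see arch/x86/kernel/alternative.c
  for the real code). The replacement bytes -- including any NOPs a ".align"
  emitted into the altinstr section -- are copied verbatim to the oldinstr
  address, so the pad guarantees nothing about alignment at the final
  location: ]

#include <stdint.h>
#include <string.h>

/* Toy model only: patch an alternative site by copying the replacement. */
static void toy_patch_site(uint8_t *oldinstr, const uint8_t *altinstr,
			   size_t alt_len)
{
	/*
	 * The ".align" pad travels inside altinstr[]; after the copy it
	 * sits at some fixed offset from oldinstr, which is 16-byte
	 * aligned only if oldinstr itself happened to be suitably
	 * aligned in the first place.
	 */
	memcpy(oldinstr, altinstr, alt_len);
}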
On 02/16/2018 11:16 AM, David Woodhouse wrote:
> On Fri, 2018-02-16 at 10:44 -0800, Tim Chen wrote:
>>
>> I encountered a hang on one machine but not others when using the above
>> macro. It is probably an alignment thing with ALTERNATIVE, as the
>> problem went away after I made the change below:
>>
>> Tim
>>
>
> That's weird. Note that .align in an altinstr section isn't actually
> going to do what you'd expect; the oldinstr and altinstr sections
> aren't necessarily aligned the same, so however many NOPs it inserts
> into the alternative might end up *misaligning* it in the
> code that actually gets executed.
>
> Are you sure you're not running a kernel where the alternatives code
> would turn that alternative, which *starts* with a NOP, into *all* NOPs?
>
I rebuilt the kernel again without the align. I'm no longer
seeing the issue on the machine that had it earlier.
So let's ignore this for now, as I can't reproduce the problem.
It was probably some other issue causing the hang I saw earlier.
Thanks.
Tim
* Tim Chen <[email protected]> wrote:
> On 02/16/2018 11:16 AM, David Woodhouse wrote:
> > On Fri, 2018-02-16 at 10:44 -0800, Tim Chen wrote:
> >>
> >> I encountered a hang on one machine but not others when using the above
> >> macro. It is probably an alignment thing with ALTERNATIVE, as the
> >> problem went away after I made the change below:
> >>
> >> Tim
> >>
> >
> > That's weird. Note that .align in an altinstr section isn't actually
> > going to do what you'd expect; the oldinstr and altinstr sections
> > aren't necessarily aligned the same, so however many NOPs it inserts
> > into the alternative might end up *misaligning* it in the
> > code that actually gets executed.
> >
> > Are you sure you're not running a kernel where the alternatives code
> > would turn that alternative, which *starts* with a NOP, into *all* NOPs?
> >
>
> I rebuilt the kernel again without the align. I'm no longer
> seeing the issue on the machine that had it earlier.
> So let's ignore this for now, as I can't reproduce the problem.
>
> It was probably some other issue causing the hang I saw earlier.
Note that PeterZ was struggling with intermittent boot hangs yesterday as well,
hangs which came and went during several (fruitless) bisection attempts. Then at
a certain point the hangs went away altogether.
The symptoms for both his and your hangs are consistent with an alignment
dependent bug.
My other guess is that it's perhaps somehow microcode related?
I'm not seeing any hangs myself, on various test systems.
Thanks,
Ingo
On Sat, Feb 17, 2018 at 11:26:16AM +0100, Ingo Molnar wrote:
> Note that PeterZ was struggling with intermittent boot hangs yesterday as well,
> hangs which came and went during several (fruitless) bisection attempts. Then at
> a certain point the hangs went away altogether.
>
> The symptoms for both his and your hangs are consistent with an alignment
> dependent bug.
Mine would consistently hang right after
"Freeing SMP alternatives memory: 44K"
At one point I bisected it to commit:
a06cc94f3f8d ("x86/build: Drop superfluous ALIGN from the linker script")
But shortly thereafter I started having trouble reproducing, and now I
can run kernels that before would consistently fail to boot :/
> My other guess is that it's perhaps somehow microcode related?
I did not update or otherwise change packages while I was bisecting; the
machine is:
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping : 4
microcode : 0x428
Like I wrote on IRC: what _seems_ to have 'cured' things is clearing out
my /boot. The number of kernels generated by the bisect was immense, and
running 'update-grub' was taking _much_ longer than actually building a
new kernel.
What I have not tried is again generating and installing that many
kernels to see if that will make it go 'funny' again.
On Mon, 2018-02-19 at 10:20 +0100, Peter Zijlstra wrote:
>
> I did not update or otherwise change packages while I was bisecting; the
> machine is:
>
> vendor_id : GenuineIntel
> cpu family : 6
> model : 62
> model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> stepping : 4
> microcode : 0x428
That's IVX with a microcode that doesn't *have* IBRS/IBPB. I don't
think there's a publicly available microcode that does; I assume you
didn't have one and build it into your kernel for early loading, and
thus you really weren't even using IBRS here? The code never even gets
patched in?
* Peter Zijlstra <[email protected]> wrote:
> On Sat, Feb 17, 2018 at 11:26:16AM +0100, Ingo Molnar wrote:
> > Note that PeterZ was struggling with intermittent boot hangs yesterday as well,
> > hangs which came and went during several (fruitless) bisection attempts. Then at
> > a certain point the hangs went away altogether.
> >
> > The symptoms for both his and your hangs are consistent with an alignment
> > dependent bug.
>
> Mine would consistently hang right after
>
> "Freeing SMP alternatives memory: 44K"
>
> At one point I bisected it to commit:
>
> a06cc94f3f8d ("x86/build: Drop superfluous ALIGN from the linker script")
So, just to make sure this commit had no effect: I cannot really see anything
wrong with it. It makes a single substantial change, which is to remove
this explicit alignment:
. = ALIGN(8);
TEXT_TEXT
which seems fine to me, since it expanded to:
. = ALIGN(8);
ALIGN_FUNCTION(); \
...
which expanded to:
. = ALIGN(8);
. = ALIGN(8);
...
... a duplication which the commit removed.
... where all the relevant definitions of TEXT_TEXT and ALIGN_FUNCTION are
unconditional and not overridden anywhere for arch/x86 builds.
I.e. the commit is a NOP AFAICS.
Thanks,
Ingo
* David Woodhouse <[email protected]> wrote:
> On Mon, 2018-02-19 at 10:20 +0100, Peter Zijlstra wrote:
> >
> > I did not update or otherwise change packages while I was bisecting; the
> > machine is:
> >
> > vendor_id       : GenuineIntel
> > cpu family      : 6
> > model           : 62
> > model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> > stepping        : 4
> > microcode       : 0x428
>
> That's IVX with a microcode that doesn't *have* IBRS/IBPB. I don't
> think there's a publicly available microcode that does; I assume you
> didn't have one and build it into your kernel for early loading, and
> thus you really weren't even using IBRS here? The code never even gets
> patched in?
Note that PeterZ's boot troubles only match the *symptoms* of the spurious
failures reported by Tim Chen. Your commit wasn't bisected to.
I linked these two reports on the (remote) possibility that they might be related
via some alignment dependent bug somewhere else in the x86 kernel - possibly
completely unrelated to any IBRS/IBPB details.
Thanks,
Ingo
On Mon, 2018-02-19 at 10:39 +0100, Ingo Molnar wrote:
> * David Woodhouse <[email protected]> wrote:
>
> >
> > On Mon, 2018-02-19 at 10:20 +0100, Peter Zijlstra wrote:
> > >
> > >
> > > I did not update or otherwise change packages while I was bisecting; the
> > > machine is:
> > >
> > > vendor_id : GenuineIntel
> > > cpu family : 6
> > > model : 62
> > > model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> > > stepping : 4
> > > microcode : 0x428
> > That's IVX with a microcode that doesn't *have* IBRS/IBPB. I don't
> > think there's a publicly available microcode that does; I assume you
> > didn't have one and build it into your kernel for early loading, and
> > thus you really weren't even using IBRS here? The code never even gets
> > patched in?
> Note that PeterZ's boot troubles only match the *symptoms* of the spurious
> failures reported by Tim Chen. Your commit wasn't bisected to.
>
> I linked these two reports on the (remote) possibility that they might be related
> via some alignment dependent bug somewhere else in the x86 kernel - possibly
> completely unrelated to any IBRS/IBPB details.
Understood. I was merely *accentuating* the "completely unrelated to
any IBRS/IBPB" bit, in a "la la la I'm ignoring this because I'm
working on other things" kind of a way... :)
On Mon, Feb 19, 2018 at 09:29:12AM +0000, David Woodhouse wrote:
> On Mon, 2018-02-19 at 10:20 +0100, Peter Zijlstra wrote:
> >
> > I did not update or otherwise change packages while I was bisecting; the
> > machine is:
> >
> > vendor_id       : GenuineIntel
> > cpu family      : 6
> > model           : 62
> > model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> > stepping        : 4
> > microcode       : 0x428
>
> That's IVX with a microcode that doesn't *have* IBRS/IBPB. I don't
> think there's a publicly available microcode that does; I assume you
> didn't have one and build it into your kernel for early loading, and
> thus you really weren't even using IBRS here? The code never even gets
> patched in?
I wasn't using IBRS afaik. I was bisecting tip/master before the
IBRS/firmware patches (not sure whether they're in now or not).
No fancy ucode setup; I think I have a ucode image in the initrd, but that
would be the same for all kernels.