2024-04-11 05:42:04

by Josh Poimboeuf

Subject: [PATCH 0/7] x86/bugs: BHI fixes / improvements

An assortment of BHI fixes.

Josh Poimboeuf (7):
x86/bugs: BHI documentation fixes
x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES
x86/bugs: Fix BHI handling of RRSBA
x86/bugs: Clarify that syscall hardening isn't a BHI mitigation
x86/bugs: Only harden syscalls when needed
x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with
CONFIG_MITIGATION_SPECTRE_BHI

Documentation/admin-guide/hw-vuln/spectre.rst | 22 ++---
.../admin-guide/kernel-parameters.txt | 12 +--
arch/x86/Kconfig | 22 +----
arch/x86/entry/common.c | 30 +++++-
arch/x86/entry/syscall_32.c | 11 +--
arch/x86/entry/syscall_64.c | 8 +-
arch/x86/entry/syscall_x32.c | 7 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/syscall.h | 8 +-
arch/x86/kernel/cpu/bugs.c | 98 +++++++++++--------
10 files changed, 119 insertions(+), 100 deletions(-)

--
2.44.0



2024-04-11 05:42:12

by Josh Poimboeuf

Subject: [PATCH 1/7] x86/bugs: BHI documentation fixes

Fix up some inaccuracies in the BHI documentation.

Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <[email protected]>
---
Documentation/admin-guide/hw-vuln/spectre.rst | 15 ++++++++-------
Documentation/admin-guide/kernel-parameters.txt | 12 +++++++-----
2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index b70b1d8bd8e6..3cf18e4a1d9a 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -439,11 +439,11 @@ The possible values in this file are:
- System is protected by retpoline
* - BHI: BHI_DIS_S
- System is protected by BHI_DIS_S
- * - BHI: SW loop; KVM SW loop
+ * - BHI: SW loop, KVM SW loop
- System is protected by software clearing sequence
* - BHI: Syscall hardening
- Syscalls are hardened against BHI
- * - BHI: Syscall hardening; KVM: SW loop
+ * - BHI: Syscall hardening, KVM: SW loop
- System is protected from userspace attacks by syscall hardening; KVM is protected by software clearing sequence

Full mitigation might require a microcode update from the CPU
@@ -666,13 +666,14 @@ kernel command line.
of the HW BHI control and the SW BHB clearing sequence.

on
- unconditionally enable.
+ (default) Enable the HW or SW mitigation as
+ needed.
off
- unconditionally disable.
+ Disable the mitigation.
auto
- enable if hardware mitigation
- control(BHI_DIS_S) is available, otherwise
- enable alternate mitigation in KVM.
+ Enable the HW mitigation if needed, but
+ *don't* enable the SW mitigation except for KVM.
+ The system may be vulnerable.

For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 70046a019d42..a029ad6c4963 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3444,6 +3444,7 @@
retbleed=off [X86]
spec_rstack_overflow=off [X86]
spec_store_bypass_disable=off [X86,PPC]
+ spectre_bhi=off [X86]
spectre_v2_user=off [X86]
srbds=off [X86,INTEL]
ssbd=force-off [ARM64]
@@ -6069,11 +6070,12 @@
deployment of the HW BHI control and the SW BHB
clearing sequence.

- on - unconditionally enable.
- off - unconditionally disable.
- auto - (default) enable hardware mitigation
- (BHI_DIS_S) if available, otherwise enable
- alternate mitigation in KVM.
+ on - (default) Enable the HW or SW mitigation
+ as needed.
+ off - Disable the mitigation.
+ auto - Enable the HW mitigation if needed, but
+ *don't* enable the SW mitigation except
+ for KVM. The system may be vulnerable.

spectre_v2= [X86,EARLY] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability.
--
2.44.0


2024-04-11 05:43:13

by Josh Poimboeuf

Subject: [PATCH 2/7] x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES

There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and
over. It's even read in the BHI sysfs function, which is a big no-no.
Just read it once and cache it.
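
For illustration, a minimal userspace sketch of the caching pattern (the
names and the simulated MSR value below are made up; the kernel caches the
result of x86_read_arch_cap_msr() in an __ro_after_init variable during
cpu_select_mitigations()):

#include <stdint.h>
#include <stdio.h>

#define ARCH_CAP_RRSBA (1ULL << 19)     /* bit position as in msr-index.h */

static uint64_t arch_cap_cache;         /* kernel marks this __ro_after_init */

static uint64_t expensive_msr_read(void)
{
        puts("rdmsr");                  /* should print exactly once */
        return ARCH_CAP_RRSBA;          /* pretend hardware value */
}

static void init_once(void)
{
        arch_cap_cache = expensive_msr_read();
}

int main(void)
{
        init_once();

        /* later queries (including sysfs-style reporting) hit the cache */
        for (int i = 0; i < 3; i++)
                printf("RRSBA: %s\n",
                       (arch_cap_cache & ARCH_CAP_RRSBA) ? "yes" : "no");
        return 0;
}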

Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/kernel/cpu/bugs.c | 22 +++++++---------------
1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 295463707e68..27d6d64eeec3 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -61,6 +61,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
EXPORT_SYMBOL_GPL(x86_pred_cmd);

+static u64 __ro_after_init ia32_cap;
+
static DEFINE_MUTEX(spec_ctrl_mutex);

void (*x86_return_thunk)(void) __ro_after_init = __x86_return_thunk;
@@ -144,6 +146,8 @@ void __init cpu_select_mitigations(void)
x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
}

+ ia32_cap = x86_read_arch_cap_msr();
+
/* Select the proper CPU mitigations before patching alternatives: */
spectre_v1_select_mitigation();
spectre_v2_select_mitigation();
@@ -301,8 +305,6 @@ static const char * const taa_strings[] = {

static void __init taa_select_mitigation(void)
{
- u64 ia32_cap;
-
if (!boot_cpu_has_bug(X86_BUG_TAA)) {
taa_mitigation = TAA_MITIGATION_OFF;
return;
@@ -341,7 +343,6 @@ static void __init taa_select_mitigation(void)
* On MDS_NO=1 CPUs if ARCH_CAP_TSX_CTRL_MSR is not set, microcode
* update is required.
*/
- ia32_cap = x86_read_arch_cap_msr();
if ( (ia32_cap & ARCH_CAP_MDS_NO) &&
!(ia32_cap & ARCH_CAP_TSX_CTRL_MSR))
taa_mitigation = TAA_MITIGATION_UCODE_NEEDED;
@@ -401,8 +402,6 @@ static const char * const mmio_strings[] = {

static void __init mmio_select_mitigation(void)
{
- u64 ia32_cap;
-
if (!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA) ||
boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN) ||
cpu_mitigations_off()) {
@@ -413,8 +412,6 @@ static void __init mmio_select_mitigation(void)
if (mmio_mitigation == MMIO_MITIGATION_OFF)
return;

- ia32_cap = x86_read_arch_cap_msr();
-
/*
* Enable CPU buffer clear mitigation for host and VMM, if also affected
* by MDS or TAA. Otherwise, enable mitigation for VMM only.
@@ -508,7 +505,7 @@ static void __init rfds_select_mitigation(void)
if (rfds_mitigation == RFDS_MITIGATION_OFF)
return;

- if (x86_read_arch_cap_msr() & ARCH_CAP_RFDS_CLEAR)
+ if (ia32_cap & ARCH_CAP_RFDS_CLEAR)
setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
else
rfds_mitigation = RFDS_MITIGATION_UCODE_NEEDED;
@@ -659,8 +656,6 @@ void update_srbds_msr(void)

static void __init srbds_select_mitigation(void)
{
- u64 ia32_cap;
-
if (!boot_cpu_has_bug(X86_BUG_SRBDS))
return;

@@ -669,7 +664,6 @@ static void __init srbds_select_mitigation(void)
* are only exposed to SRBDS when TSX is enabled or when CPU is affected
* by Processor MMIO Stale Data vulnerability.
*/
- ia32_cap = x86_read_arch_cap_msr();
if ((ia32_cap & ARCH_CAP_MDS_NO) && !boot_cpu_has(X86_FEATURE_RTM) &&
!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA))
srbds_mitigation = SRBDS_MITIGATION_TSX_OFF;
@@ -813,7 +807,7 @@ static void __init gds_select_mitigation(void)
/* Will verify below that mitigation _can_ be disabled */

/* No microcode */
- if (!(x86_read_arch_cap_msr() & ARCH_CAP_GDS_CTRL)) {
+ if (!(ia32_cap & ARCH_CAP_GDS_CTRL)) {
if (gds_mitigation == GDS_MITIGATION_FORCE) {
/*
* This only needs to be done on the boot CPU so do it
@@ -1908,8 +1902,6 @@ static void update_indir_branch_cond(void)
/* Update the static key controlling the MDS CPU buffer clear in idle */
static void update_mds_branch_idle(void)
{
- u64 ia32_cap = x86_read_arch_cap_msr();
-
/*
* Enable the idle clearing if SMT is active on CPUs which are
* affected only by MSBDS and not any other MDS variant.
@@ -2818,7 +2810,7 @@ static const char * const spectre_bhi_state(void)
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP))
return "; BHI: SW loop, KVM: SW loop";
else if (boot_cpu_has(X86_FEATURE_RETPOLINE) &&
- !(x86_read_arch_cap_msr() & ARCH_CAP_RRSBA))
+ !(ia32_cap & ARCH_CAP_RRSBA))
return "; BHI: Retpoline";
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT))
return "; BHI: Syscall hardening, KVM: SW loop";
--
2.44.0


2024-04-11 05:43:25

by Josh Poimboeuf

Subject: [PATCH 4/7] x86/bugs: Clarify that syscall hardening isn't a BHI mitigation

While syscall hardening helps prevent some BHI attacks, there's still
other low-hanging fruit remaining. Don't classify it as a mitigation
and make it clear that the system may still be vulnerable if it doesn't
have a HW or SW mitigation enabled.
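
For reference, these strings appear as a suffix of the spectre_v2 sysfs
file; a trivial reader (the path is the standard sysfs vulnerabilities
directory; the sample output in the comment is only illustrative):

#include <stdio.h>

int main(void)
{
        /* prints e.g. "Mitigation: ...; BHI: Vulnerable, KVM: SW loop" */
        FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/spectre_v2", "r");
        char line[256];

        if (!f)
                return 1;
        if (fgets(line, sizeof(line), f))
                fputs(line, stdout);
        fclose(f);
        return 0;
}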

Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <[email protected]>
---
Documentation/admin-guide/hw-vuln/spectre.rst | 11 +++++------
Documentation/admin-guide/kernel-parameters.txt | 3 +--
arch/x86/kernel/cpu/bugs.c | 6 +++---
3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 3cf18e4a1d9a..5a39acf82483 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -441,10 +441,10 @@ The possible values in this file are:
- System is protected by BHI_DIS_S
* - BHI: SW loop, KVM SW loop
- System is protected by software clearing sequence
- * - BHI: Syscall hardening
- - Syscalls are hardened against BHI
- * - BHI: Syscall hardening, KVM: SW loop
- - System is protected from userspace attacks by syscall hardening; KVM is protected by software clearing sequence
+ * - BHI: Vulnerable
+ - System is vulnerable to BHI
+ * - BHI: Vulnerable, KVM: SW loop
+ - System is vulnerable; KVM is protected by software clearing sequence

Full mitigation might require a microcode update from the CPU
vendor. When the necessary microcode is not available, the kernel will
@@ -661,8 +661,7 @@ kernel command line.
spectre_bhi=

[X86] Control mitigation of Branch History Injection
- (BHI) vulnerability. Syscalls are hardened against BHI
- regardless of this setting. This setting affects the deployment
+ (BHI) vulnerability. This setting affects the deployment
of the HW BHI control and the SW BHB clearing sequence.

on
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a029ad6c4963..a3874cc97892 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6065,8 +6065,7 @@
See Documentation/admin-guide/laptops/sonypi.rst

spectre_bhi= [X86] Control mitigation of Branch History Injection
- (BHI) vulnerability. Syscalls are hardened against BHI
- reglardless of this setting. This setting affects the
+ (BHI) vulnerability. This setting affects the
deployment of the HW BHI control and the SW BHB
clearing sequence.

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 0755600d5d18..a65c70709bb5 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2818,10 +2818,10 @@ static const char * const spectre_bhi_state(void)
return "; BHI: SW loop, KVM: SW loop";
else if (boot_cpu_has(X86_FEATURE_RETPOLINE) && rrsba_disabled)
return "; BHI: Retpoline";
- else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT))
- return "; BHI: Syscall hardening, KVM: SW loop";
+ else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT))
+ return "; BHI: Vulnerable, KVM: SW loop";

- return "; BHI: Vulnerable (Syscall hardening enabled)";
+ return "; BHI: Vulnerable";
}

static ssize_t spectre_v2_show_state(char *buf)
--
2.44.0


2024-04-11 05:43:32

by Josh Poimboeuf

Subject: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

Syscall hardening (i.e., converting the syscall indirect branch to a
series of direct branches) may cause performance regressions in certain
scenarios. Only use syscall hardening when indirect branches are
considered unsafe.
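
As a rough userspace sketch of the two dispatch styles this patch selects
between at boot (the names below are made up; in the kernel the
direct-branch variant is the generated switch in x64_sys_call() and
friends, and the selection is done via ALTERNATIVE() on
X86_FEATURE_INDIRECT_SAFE):

#include <stdio.h>

typedef long (*sys_call_ptr_t)(long);

static long sys_foo(long a) { return a + 1; }
static long sys_bar(long a) { return a * 2; }

static const sys_call_ptr_t table[] = { sys_foo, sys_bar };

/* Indirect dispatch: one function-pointer call (retpoline territory). */
static long dispatch_indirect(int nr, long arg)
{
        return table[nr](arg);
}

/* Hardened dispatch: direct branches only, no indirect call. */
static long dispatch_direct(int nr, long arg)
{
        switch (nr) {
        case 0: return sys_foo(arg);
        case 1: return sys_bar(arg);
        }
        return -1;
}

int main(void)
{
        /* nr is assumed already bounds-checked, as in do_syscall_x64() */
        printf("%ld %ld\n", dispatch_indirect(0, 41), dispatch_direct(1, 21));
        return 0;
}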

Fixes: 1e3ad78334a6 ("x86/syscall: Don't force use of indirect calls for system calls")
Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/entry/common.c | 30 +++++++++++++++++++++++++---
arch/x86/entry/syscall_32.c | 11 +---------
arch/x86/entry/syscall_64.c | 8 +-------
arch/x86/entry/syscall_x32.c | 7 ++++++-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/syscall.h | 8 +++++++-
arch/x86/kernel/cpu/bugs.c | 32 +++++++++++++++++++++++++++++-
7 files changed, 74 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6de50b80702e..80d432d2fe44 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -39,6 +39,28 @@

#ifdef CONFIG_X86_64

+/*
+ * Do either a direct or an indirect call, depending on whether indirect calls
+ * are considered safe.
+ */
+#define __do_syscall(table, func_direct, nr, regs) \
+({ \
+ unsigned long __rax, __rdi, __rsi; \
+ \
+ asm_inline volatile( \
+ ALTERNATIVE("call " __stringify(func_direct) "\n\t", \
+ ANNOTATE_RETPOLINE_SAFE \
+ "call *%[func_ptr]\n\t", \
+ X86_FEATURE_INDIRECT_SAFE) \
+ : "=D" (__rdi), "=S" (__rsi), "=a" (__rax), \
+ ASM_CALL_CONSTRAINT \
+ : "0" (regs), "1" (nr), [func_ptr] "r" (table[nr]) \
+ : "rdx", "rcx", "r8", "r9", "r10", "r11", \
+ "cc", "memory"); \
+ \
+ __rax; \
+})
+
static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
{
/*
@@ -49,7 +71,7 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)

if (likely(unr < NR_syscalls)) {
unr = array_index_nospec(unr, NR_syscalls);
- regs->ax = x64_sys_call(regs, unr);
+ regs->ax = __do_syscall(sys_call_table, x64_sys_call, unr, regs);
return true;
}
return false;
@@ -66,7 +88,7 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)

if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
xnr = array_index_nospec(xnr, X32_NR_syscalls);
- regs->ax = x32_sys_call(regs, xnr);
+ regs->ax = __do_syscall(x32_sys_call_table, x32_sys_call, xnr, regs);
return true;
}
return false;
@@ -147,6 +169,8 @@ static int ia32_emulation_override_cmdline(char *arg)
return kstrtobool(arg, &__ia32_enabled);
}
early_param("ia32_emulation", ia32_emulation_override_cmdline);
+#else
+#define __do_syscall(table, func_direct, nr, regs) table[nr](regs)
#endif

/*
@@ -162,7 +186,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)

if (likely(unr < IA32_NR_syscalls)) {
unr = array_index_nospec(unr, IA32_NR_syscalls);
- regs->ax = ia32_sys_call(regs, unr);
+ regs->ax = __do_syscall(ia32_sys_call_table, ia32_sys_call, unr, regs);
} else if (nr != -1) {
regs->ax = __ia32_sys_ni_syscall(regs);
}
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index c2235bae17ef..9185870a3ab3 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -14,25 +14,16 @@
#endif

#define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
-
#include <asm/syscalls_32.h>
#undef __SYSCALL

-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
-#ifdef CONFIG_X86_32
#define __SYSCALL(nr, sym) __ia32_##sym,
-const sys_call_ptr_t sys_call_table[] = {
+__visible const sys_call_ptr_t ia32_sys_call_table[] = {
#include <asm/syscalls_32.h>
};
#undef __SYSCALL
-#endif

#define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
-
long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 33b3f09e6f15..c368048efa41 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -11,19 +11,13 @@
#include <asm/syscalls_64.h>
#undef __SYSCALL

-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
#define __SYSCALL(nr, sym) __x64_##sym,
-const sys_call_ptr_t sys_call_table[] = {
+asmlinkage const sys_call_ptr_t sys_call_table[] = {
#include <asm/syscalls_64.h>
};
#undef __SYSCALL

#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
-
long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 03de4a932131..89a717267fab 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -11,8 +11,13 @@
#include <asm/syscalls_x32.h>
#undef __SYSCALL

-#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
+#define __SYSCALL(nr, sym) __x64_##sym,
+asmlinkage const sys_call_ptr_t x32_sys_call_table[] = {
+#include <asm/syscalls_x32.h>
+};
+#undef __SYSCALL

+#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..7c87fe80c696 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -470,6 +470,7 @@
#define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
#define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_INDIRECT_SAFE (21*32+ 5) /* "" Indirect branches aren't vulnerable to Spectre v2 */

/*
* BUG word(s)
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 2fc7bc3863ff..dfb59521244c 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -16,14 +16,20 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>

-/* This is used purely for kernel/trace/trace_syscalls.c */
typedef long (*sys_call_ptr_t)(const struct pt_regs *);
extern const sys_call_ptr_t sys_call_table[];

+#if defined(CONFIG_X86_32)
+#define ia32_sys_call_table sys_call_table
+#else
/*
* These may not exist, but still put the prototypes in so we
* can use IS_ENABLED().
*/
+extern const sys_call_ptr_t ia32_sys_call_table[];
+extern const sys_call_ptr_t x32_sys_call_table[];
+#endif
+
extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index a65c70709bb5..efffd87381b1 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1669,6 +1669,15 @@ static void __init bhi_select_mitigation(void)
if (!IS_ENABLED(CONFIG_X86_64))
return;

+ /*
+ * There's no hardware mitigation in place, so mark indirect branches
+ * as unsafe.
+ *
+ * One could argue the SW loop makes indirect branches safe again, but
+ * Linus prefers it this way.
+ */
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
+
/* Mitigate KVM by default */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
@@ -1686,6 +1695,21 @@ static void __init spectre_v2_select_mitigation(void)
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;

+ /*
+ * X86_FEATURE_INDIRECT_SAFE indicates whether indirect calls can be
+ * considered safe. That means either:
+ *
+ * - the CPU isn't vulnerable to Spectre v2 or its variants;
+ *
+ * - a hardware mitigation is in place (e.g., IBRS, BHI_DIS_S); or
+ *
+ * - the user turned off mitigations altogether.
+ *
+ * Assume innocence until proven guilty: set the cap bit now, then
+ * clear it later if/when needed.
+ */
+ setup_force_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
+
/*
* If the CPU is not affected and the command line mode is NONE or AUTO
* then nothing to do.
@@ -1720,6 +1744,7 @@ static void __init spectre_v2_select_mitigation(void)

case SPECTRE_V2_CMD_RETPOLINE_LFENCE:
pr_err(SPECTRE_V2_LFENCE_MSG);
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
mode = SPECTRE_V2_LFENCE;
break;

@@ -1772,11 +1797,16 @@ static void __init spectre_v2_select_mitigation(void)
break;

case SPECTRE_V2_LFENCE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
+ fallthrough;
case SPECTRE_V2_EIBRS_LFENCE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
- fallthrough;
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ break;

case SPECTRE_V2_RETPOLINE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
+ fallthrough;
case SPECTRE_V2_EIBRS_RETPOLINE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
break;
--
2.44.0


2024-04-11 05:43:39

by Josh Poimboeuf

Subject: [PATCH 7/7] x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI

For consistency with the other CONFIG_MITIGATION_* options, replace the
CONFIG_SPECTRE_BHI_{ON,OFF} options with a single
CONFIG_MITIGATION_SPECTRE_BHI option.

Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/Kconfig | 17 +++--------------
arch/x86/kernel/cpu/bugs.c | 2 +-
2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b63b6767a63d..4474bf32d0a4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2633,27 +2633,16 @@ config MITIGATION_RFDS
stored in floating point, vector and integer registers.
See also <file:Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst>

-choice
- prompt "Clear branch history"
+config MITIGATION_SPECTRE_BHI
+ bool "Mitigate Spectre-BHB (Branch History Injection)"
depends on CPU_SUP_INTEL
- default SPECTRE_BHI_ON
+ default y
help
Enable BHI mitigations. BHI attacks are a form of Spectre V2 attacks
where the branch history buffer is poisoned to speculatively steer
indirect branches.
See <file:Documentation/admin-guide/hw-vuln/spectre.rst>

-config SPECTRE_BHI_ON
- bool "on"
- help
- Equivalent to setting spectre_bhi=on command line parameter.
-config SPECTRE_BHI_OFF
- bool "off"
- help
- Equivalent to setting spectre_bhi=off command line parameter.
-
-endchoice
-
endif

config ARCH_HAS_ADD_PAGES
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 74ade6d7caa3..4c46fa2d08c2 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1628,7 +1628,7 @@ enum bhi_mitigations {
};

static enum bhi_mitigations bhi_mitigation __ro_after_init =
- IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
+ IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;

static int __init spectre_bhi_parse_cmdline(char *str)
{
--
2.44.0


2024-04-11 05:58:25

by Josh Poimboeuf

Subject: [PATCH 3/7] x86/bugs: Fix BHI handling of RRSBA

The ARCH_CAP_RRSBA check isn't correct: RRSBA may have already been
disabled by the Spectre v2 mitigation (or can otherwise be disabled by
the BHI mitigation itself if needed). In that case retpolines are fine.
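
A simplified sketch of the reworked, sticky logic (the probe functions
below are stubs standing in for the ARCH_CAP_RRSBA MSR bit and
X86_FEATURE_RRSBA_CTRL; the real code also sets SPEC_CTRL_RRSBA_DIS_S):

#include <stdbool.h>
#include <stdio.h>

static bool rrsba_disabled;             /* sticky: decided once, then trusted */

static bool cpu_has_rrsba_behavior(void) { return true; }  /* stub */
static bool cpu_has_rrsba_ctrl(void)     { return true; }  /* stub */

static void disable_kernel_rrsba(void)
{
        if (rrsba_disabled)
                return;

        if (!cpu_has_rrsba_behavior()) {  /* CPU never had RRSBA behavior */
                rrsba_disabled = true;
                return;
        }

        if (!cpu_has_rrsba_ctrl())        /* no control, can't turn it off */
                return;

        /* real code: x86_spec_ctrl_base |= SPEC_CTRL_RRSBA_DIS_S */
        rrsba_disabled = true;
}

int main(void)
{
        disable_kernel_rrsba();
        printf("retpoline covers BHI: %s\n", rrsba_disabled ? "yes" : "no");
        return 0;
}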

Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/kernel/cpu/bugs.c | 30 ++++++++++++++++++------------
1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 27d6d64eeec3..0755600d5d18 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1538,20 +1538,25 @@ static enum spectre_v2_mitigation __init spectre_v2_select_retpoline(void)
return SPECTRE_V2_RETPOLINE;
}

+static bool __ro_after_init rrsba_disabled;
+
/* Disable in-kernel use of non-RSB RET predictors */
static void __init spec_ctrl_disable_kernel_rrsba(void)
{
- u64 ia32_cap;
+ if (rrsba_disabled)
+ return;
+
+ if (!(ia32_cap & ARCH_CAP_RRSBA)) {
+ rrsba_disabled = true;
+ return;
+ }

if (!boot_cpu_has(X86_FEATURE_RRSBA_CTRL))
return;

- ia32_cap = x86_read_arch_cap_msr();
-
- if (ia32_cap & ARCH_CAP_RRSBA) {
- x86_spec_ctrl_base |= SPEC_CTRL_RRSBA_DIS_S;
- update_spec_ctrl(x86_spec_ctrl_base);
- }
+ x86_spec_ctrl_base |= SPEC_CTRL_RRSBA_DIS_S;
+ update_spec_ctrl(x86_spec_ctrl_base);
+ rrsba_disabled = true;
}

static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode)
@@ -1652,9 +1657,11 @@ static void __init bhi_select_mitigation(void)
return;

/* Retpoline mitigates against BHI unless the CPU has RRSBA behavior */
- if (cpu_feature_enabled(X86_FEATURE_RETPOLINE) &&
- !(x86_read_arch_cap_msr() & ARCH_CAP_RRSBA))
- return;
+ if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
+ spec_ctrl_disable_kernel_rrsba();
+ if (rrsba_disabled)
+ return;
+ }

if (spec_ctrl_bhi_dis())
return;
@@ -2809,8 +2816,7 @@ static const char * const spectre_bhi_state(void)
return "; BHI: BHI_DIS_S";
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP))
return "; BHI: SW loop, KVM: SW loop";
- else if (boot_cpu_has(X86_FEATURE_RETPOLINE) &&
- !(ia32_cap & ARCH_CAP_RRSBA))
+ else if (boot_cpu_has(X86_FEATURE_RETPOLINE) && rrsba_disabled)
return "; BHI: Retpoline";
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT))
return "; BHI: Syscall hardening, KVM: SW loop";
--
2.44.0


2024-04-11 05:59:25

by Josh Poimboeuf

Subject: [PATCH 6/7] x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto

Unlike most other mitigations' "auto" options, spectre_bhi=auto only
mitigates newer systems, which is confusing and not particularly useful.

Remove it.

Signed-off-by: Josh Poimboeuf <[email protected]>
---
Documentation/admin-guide/hw-vuln/spectre.rst | 4 ----
Documentation/admin-guide/kernel-parameters.txt | 3 ---
arch/x86/Kconfig | 5 -----
arch/x86/kernel/cpu/bugs.c | 10 +---------
4 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 5a39acf82483..25a04cda4c2c 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -669,10 +669,6 @@ kernel command line.
needed.
off
Disable the mitigation.
- auto
- Enable the HW mitigation if needed, but
- *don't* enable the SW mitigation except for KVM.
- The system may be vulnerable.

For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a3874cc97892..902ecd92a29f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6072,9 +6072,6 @@
on - (default) Enable the HW or SW mitigation
as needed.
off - Disable the mitigation.
- auto - Enable the HW mitigation if needed, but
- *don't* enable the SW mitigation except
- for KVM. The system may be vulnerable.

spectre_v2= [X86,EARLY] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 10a6251f58f3..b63b6767a63d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2651,11 +2651,6 @@ config SPECTRE_BHI_OFF
bool "off"
help
Equivalent to setting spectre_bhi=off command line parameter.
-config SPECTRE_BHI_AUTO
- bool "auto"
- depends on BROKEN
- help
- Equivalent to setting spectre_bhi=auto command line parameter.

endchoice

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index efffd87381b1..74ade6d7caa3 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1625,13 +1625,10 @@ static bool __init spec_ctrl_bhi_dis(void)
enum bhi_mitigations {
BHI_MITIGATION_OFF,
BHI_MITIGATION_ON,
- BHI_MITIGATION_AUTO,
};

static enum bhi_mitigations bhi_mitigation __ro_after_init =
- IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON :
- IS_ENABLED(CONFIG_SPECTRE_BHI_OFF) ? BHI_MITIGATION_OFF :
- BHI_MITIGATION_AUTO;
+ IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;

static int __init spectre_bhi_parse_cmdline(char *str)
{
@@ -1642,8 +1639,6 @@ static int __init spectre_bhi_parse_cmdline(char *str)
bhi_mitigation = BHI_MITIGATION_OFF;
else if (!strcmp(str, "on"))
bhi_mitigation = BHI_MITIGATION_ON;
- else if (!strcmp(str, "auto"))
- bhi_mitigation = BHI_MITIGATION_AUTO;
else
pr_err("Ignoring unknown spectre_bhi option (%s)", str);

@@ -1682,9 +1677,6 @@ static void __init bhi_select_mitigation(void)
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");

- if (bhi_mitigation == BHI_MITIGATION_AUTO)
- return;
-
/* Mitigate syscalls when the mitigation is forced =on */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP);
pr_info("Spectre BHI mitigation: SW BHB clearing on syscall\n");
--
2.44.0


2024-04-11 06:20:39

by Nikolay Borisov

Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed



On 11.04.24 at 8:40, Josh Poimboeuf wrote:
> Syscall hardening (i.e., converting the syscall indirect branch to a
> series of direct branches) may cause performance regressions in certain
> scenarios. Only use syscall hardening when indirect branches are
> considered unsafe.
>
> Fixes: 1e3ad78334a6 ("x86/syscall: Don't force use of indirect calls for system calls")
> Signed-off-by: Josh Poimboeuf <[email protected]>

Why fiddle with the syscall mechanism if the BHB scrubbing sequence
mitigates BHI? AFAIU (correct me if I'm wrong) the original idea was to
use syscall hardening instead of the BHB sequence, but since it became
clear that's not sufficient, doesn't the BHB scrubbing completely
subsume the direct branch approach in the syscall handler?

<snip>

2024-04-11 06:22:10

by Nikolay Borisov

Subject: Re: [PATCH 1/7] x86/bugs: BHI documentation fixes



On 11.04.24 at 8:40, Josh Poimboeuf wrote:
> Fix up some inaccuracies in the BHI documentation.
>
> Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
> Signed-off-by: Josh Poimboeuf <[email protected]>

Reviewed-by: Nikolay Borisov <[email protected]>

2024-04-11 06:22:21

by Nikolay Borisov

Subject: Re: [PATCH 2/7] x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES



On 11.04.24 at 8:40, Josh Poimboeuf wrote:
> There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and
> over. It's even read in the BHI sysfs function, which is a big no-no.
> Just read it once and cache it.
>
> Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
> Signed-off-by: Josh Poimboeuf <[email protected]>


Reviewed-by: Nikolay Borisov <[email protected]>

2024-04-11 06:23:21

by Nikolay Borisov

Subject: Re: [PATCH 6/7] x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto



On 11.04.24 at 8:40, Josh Poimboeuf wrote:
> Unlike most other mitigations' "auto" options, spectre_bhi=auto only
> mitigates newer systems, which is confusing and not particularly useful.
>
> Remove it.
>
> Signed-off-by: Josh Poimboeuf <[email protected]>

Reviewed-by: Nikolay Borisov <[email protected]>

2024-04-11 07:32:58

by Ingo Molnar

Subject: [PATCH 2b/7] x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'


* Josh Poimboeuf <[email protected]> wrote:

> There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and
> over. It's even read in the BHI sysfs function, which is a big no-no.
> Just read it once and cache it.
>
> Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
> Signed-off-by: Josh Poimboeuf <[email protected]>
> ---
> arch/x86/kernel/cpu/bugs.c | 22 +++++++---------------
> 1 file changed, 7 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 295463707e68..27d6d64eeec3 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -61,6 +61,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
> u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
> EXPORT_SYMBOL_GPL(x86_pred_cmd);
>
> +static u64 __ro_after_init ia32_cap;

So the 'ia32' there is a misnomer - one that already exists in a number
of other places but now becomes annoyingly common.

So I did the patch below on top to clean this up and rename it to
x86_arch_cap_msr.

Thanks,

Ingo

==================>
From: Ingo Molnar <[email protected]>
Date: Thu, 11 Apr 2024 09:25:36 +0200
Subject: [PATCH] x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'

So we are using the 'ia32_cap' value in a number of places,
which got its name from MSR_IA32_ARCH_CAPABILITIES MSR register.

But there's very little 'IA32' about it - this isn't 32-bit only
code, nor does it originate from there; it's just a historic
quirk that many Intel MSR names are prefixed with IA32_.

This is already clear from the helper method around the MSR:
x86_read_arch_cap_msr(), which doesn't have the IA32 prefix.

So rename 'ia32_cap' to 'x86_arch_cap_msr' to be consistent with
its role and with the naming of the helper function.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
---
arch/x86/kernel/apic/apic.c | 6 +++---
arch/x86/kernel/cpu/bugs.c | 30 +++++++++++++--------------
arch/x86/kernel/cpu/common.c | 48 ++++++++++++++++++++++----------------------
3 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a42d8a6f7149..c342c4aa9c68 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1687,11 +1687,11 @@ static int x2apic_state;

static bool x2apic_hw_locked(void)
{
- u64 ia32_cap;
+ u64 x86_arch_cap_msr;
u64 msr;

- ia32_cap = x86_read_arch_cap_msr();
- if (ia32_cap & ARCH_CAP_XAPIC_DISABLE) {
+ x86_arch_cap_msr = x86_read_arch_cap_msr();
+ if (x86_arch_cap_msr & ARCH_CAP_XAPIC_DISABLE) {
rdmsrl(MSR_IA32_XAPIC_DISABLE_STATUS, msr);
return (msr & LEGACY_XAPIC_DISABLED);
}
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 61614d69d71b..eb9972bf154a 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -61,7 +61,7 @@ EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
EXPORT_SYMBOL_GPL(x86_pred_cmd);

-static u64 __ro_after_init ia32_cap;
+static u64 __ro_after_init x86_arch_cap_msr;

static DEFINE_MUTEX(spec_ctrl_mutex);

@@ -146,7 +146,7 @@ void __init cpu_select_mitigations(void)
x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
}

- ia32_cap = x86_read_arch_cap_msr();
+ x86_arch_cap_msr = x86_read_arch_cap_msr();

/* Select the proper CPU mitigations before patching alternatives: */
spectre_v1_select_mitigation();
@@ -343,8 +343,8 @@ static void __init taa_select_mitigation(void)
* On MDS_NO=1 CPUs if ARCH_CAP_TSX_CTRL_MSR is not set, microcode
* update is required.
*/
- if ( (ia32_cap & ARCH_CAP_MDS_NO) &&
- !(ia32_cap & ARCH_CAP_TSX_CTRL_MSR))
+ if ( (x86_arch_cap_msr & ARCH_CAP_MDS_NO) &&
+ !(x86_arch_cap_msr & ARCH_CAP_TSX_CTRL_MSR))
taa_mitigation = TAA_MITIGATION_UCODE_NEEDED;

/*
@@ -434,7 +434,7 @@ static void __init mmio_select_mitigation(void)
* be propagated to uncore buffers, clearing the Fill buffers on idle
* is required irrespective of SMT state.
*/
- if (!(ia32_cap & ARCH_CAP_FBSDP_NO))
+ if (!(x86_arch_cap_msr & ARCH_CAP_FBSDP_NO))
static_branch_enable(&mds_idle_clear);

/*
@@ -444,10 +444,10 @@ static void __init mmio_select_mitigation(void)
* FB_CLEAR or by the presence of both MD_CLEAR and L1D_FLUSH on MDS
* affected systems.
*/
- if ((ia32_cap & ARCH_CAP_FB_CLEAR) ||
+ if ((x86_arch_cap_msr & ARCH_CAP_FB_CLEAR) ||
(boot_cpu_has(X86_FEATURE_MD_CLEAR) &&
boot_cpu_has(X86_FEATURE_FLUSH_L1D) &&
- !(ia32_cap & ARCH_CAP_MDS_NO)))
+ !(x86_arch_cap_msr & ARCH_CAP_MDS_NO)))
mmio_mitigation = MMIO_MITIGATION_VERW;
else
mmio_mitigation = MMIO_MITIGATION_UCODE_NEEDED;
@@ -505,7 +505,7 @@ static void __init rfds_select_mitigation(void)
if (rfds_mitigation == RFDS_MITIGATION_OFF)
return;

- if (ia32_cap & ARCH_CAP_RFDS_CLEAR)
+ if (x86_arch_cap_msr & ARCH_CAP_RFDS_CLEAR)
setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
else
rfds_mitigation = RFDS_MITIGATION_UCODE_NEEDED;
@@ -664,7 +664,7 @@ static void __init srbds_select_mitigation(void)
* are only exposed to SRBDS when TSX is enabled or when CPU is affected
* by Processor MMIO Stale Data vulnerability.
*/
- if ((ia32_cap & ARCH_CAP_MDS_NO) && !boot_cpu_has(X86_FEATURE_RTM) &&
+ if ((x86_arch_cap_msr & ARCH_CAP_MDS_NO) && !boot_cpu_has(X86_FEATURE_RTM) &&
!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA))
srbds_mitigation = SRBDS_MITIGATION_TSX_OFF;
else if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
@@ -807,7 +807,7 @@ static void __init gds_select_mitigation(void)
/* Will verify below that mitigation _can_ be disabled */

/* No microcode */
- if (!(ia32_cap & ARCH_CAP_GDS_CTRL)) {
+ if (!(x86_arch_cap_msr & ARCH_CAP_GDS_CTRL)) {
if (gds_mitigation == GDS_MITIGATION_FORCE) {
/*
* This only needs to be done on the boot CPU so do it
@@ -1541,14 +1541,14 @@ static enum spectre_v2_mitigation __init spectre_v2_select_retpoline(void)
/* Disable in-kernel use of non-RSB RET predictors */
static void __init spec_ctrl_disable_kernel_rrsba(void)
{
- u64 ia32_cap;
+ u64 x86_arch_cap_msr;

if (!boot_cpu_has(X86_FEATURE_RRSBA_CTRL))
return;

- ia32_cap = x86_read_arch_cap_msr();
+ x86_arch_cap_msr = x86_read_arch_cap_msr();

- if (ia32_cap & ARCH_CAP_RRSBA) {
+ if (x86_arch_cap_msr & ARCH_CAP_RRSBA) {
x86_spec_ctrl_base |= SPEC_CTRL_RRSBA_DIS_S;
update_spec_ctrl(x86_spec_ctrl_base);
}
@@ -1916,7 +1916,7 @@ static void update_mds_branch_idle(void)
if (sched_smt_active()) {
static_branch_enable(&mds_idle_clear);
} else if (mmio_mitigation == MMIO_MITIGATION_OFF ||
- (ia32_cap & ARCH_CAP_FBSDP_NO)) {
+ (x86_arch_cap_msr & ARCH_CAP_FBSDP_NO)) {
static_branch_disable(&mds_idle_clear);
}
}
@@ -2810,7 +2810,7 @@ static const char *spectre_bhi_state(void)
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP))
return "; BHI: SW loop, KVM: SW loop";
else if (boot_cpu_has(X86_FEATURE_RETPOLINE) &&
- !(ia32_cap & ARCH_CAP_RRSBA))
+ !(x86_arch_cap_msr & ARCH_CAP_RRSBA))
return "; BHI: Retpoline";
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT))
return "; BHI: Syscall hardening, KVM: SW loop";
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 754d91857d63..605c26c009c8 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1284,25 +1284,25 @@ static bool __init cpu_matches(const struct x86_cpu_id *table, unsigned long whi

u64 x86_read_arch_cap_msr(void)
{
- u64 ia32_cap = 0;
+ u64 x86_arch_cap_msr = 0;

if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
- rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
+ rdmsrl(MSR_IA32_ARCH_CAPABILITIES, x86_arch_cap_msr);

- return ia32_cap;
+ return x86_arch_cap_msr;
}

-static bool arch_cap_mmio_immune(u64 ia32_cap)
+static bool arch_cap_mmio_immune(u64 x86_arch_cap_msr)
{
- return (ia32_cap & ARCH_CAP_FBSDP_NO &&
- ia32_cap & ARCH_CAP_PSDP_NO &&
- ia32_cap & ARCH_CAP_SBDR_SSDP_NO);
+ return (x86_arch_cap_msr & ARCH_CAP_FBSDP_NO &&
+ x86_arch_cap_msr & ARCH_CAP_PSDP_NO &&
+ x86_arch_cap_msr & ARCH_CAP_SBDR_SSDP_NO);
}

-static bool __init vulnerable_to_rfds(u64 ia32_cap)
+static bool __init vulnerable_to_rfds(u64 x86_arch_cap_msr)
{
/* The "immunity" bit trumps everything else: */
- if (ia32_cap & ARCH_CAP_RFDS_NO)
+ if (x86_arch_cap_msr & ARCH_CAP_RFDS_NO)
return false;

/*
@@ -1310,7 +1310,7 @@ static bool __init vulnerable_to_rfds(u64 ia32_cap)
* indicate that mitigation is needed because guest is running on a
* vulnerable hardware or may migrate to such hardware:
*/
- if (ia32_cap & ARCH_CAP_RFDS_CLEAR)
+ if (x86_arch_cap_msr & ARCH_CAP_RFDS_CLEAR)
return true;

/* Only consult the blacklist when there is no enumeration: */
@@ -1319,11 +1319,11 @@ static bool __init vulnerable_to_rfds(u64 ia32_cap)

static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
{
- u64 ia32_cap = x86_read_arch_cap_msr();
+ u64 x86_arch_cap_msr = x86_read_arch_cap_msr();

/* Set ITLB_MULTIHIT bug if cpu is not in the whitelist and not mitigated */
if (!cpu_matches(cpu_vuln_whitelist, NO_ITLB_MULTIHIT) &&
- !(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO))
+ !(x86_arch_cap_msr & ARCH_CAP_PSCHANGE_MC_NO))
setup_force_cpu_bug(X86_BUG_ITLB_MULTIHIT);

if (cpu_matches(cpu_vuln_whitelist, NO_SPECULATION))
@@ -1335,7 +1335,7 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
setup_force_cpu_bug(X86_BUG_SPECTRE_V2);

if (!cpu_matches(cpu_vuln_whitelist, NO_SSB) &&
- !(ia32_cap & ARCH_CAP_SSB_NO) &&
+ !(x86_arch_cap_msr & ARCH_CAP_SSB_NO) &&
!cpu_has(c, X86_FEATURE_AMD_SSB_NO))
setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);

@@ -1346,17 +1346,17 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
* Don't use AutoIBRS when SNP is enabled because it degrades host
* userspace indirect branch performance.
*/
- if ((ia32_cap & ARCH_CAP_IBRS_ALL) ||
+ if ((x86_arch_cap_msr & ARCH_CAP_IBRS_ALL) ||
(cpu_has(c, X86_FEATURE_AUTOIBRS) &&
!cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {
setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
- !(ia32_cap & ARCH_CAP_PBRSB_NO))
+ !(x86_arch_cap_msr & ARCH_CAP_PBRSB_NO))
setup_force_cpu_bug(X86_BUG_EIBRS_PBRSB);
}

if (!cpu_matches(cpu_vuln_whitelist, NO_MDS) &&
- !(ia32_cap & ARCH_CAP_MDS_NO)) {
+ !(x86_arch_cap_msr & ARCH_CAP_MDS_NO)) {
setup_force_cpu_bug(X86_BUG_MDS);
if (cpu_matches(cpu_vuln_whitelist, MSBDS_ONLY))
setup_force_cpu_bug(X86_BUG_MSBDS_ONLY);
@@ -1375,9 +1375,9 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
* TSX_CTRL check alone is not sufficient for cases when the microcode
* update is not present or running as guest that don't get TSX_CTRL.
*/
- if (!(ia32_cap & ARCH_CAP_TAA_NO) &&
+ if (!(x86_arch_cap_msr & ARCH_CAP_TAA_NO) &&
(cpu_has(c, X86_FEATURE_RTM) ||
- (ia32_cap & ARCH_CAP_TSX_CTRL_MSR)))
+ (x86_arch_cap_msr & ARCH_CAP_TSX_CTRL_MSR)))
setup_force_cpu_bug(X86_BUG_TAA);

/*
@@ -1403,7 +1403,7 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
* Set X86_BUG_MMIO_UNKNOWN for CPUs that are neither in the blacklist,
* nor in the whitelist and also don't enumerate MSR ARCH_CAP MMIO bits.
*/
- if (!arch_cap_mmio_immune(ia32_cap)) {
+ if (!arch_cap_mmio_immune(x86_arch_cap_msr)) {
if (cpu_matches(cpu_vuln_blacklist, MMIO))
setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);
else if (!cpu_matches(cpu_vuln_whitelist, NO_MMIO))
@@ -1411,7 +1411,7 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
}

if (!cpu_has(c, X86_FEATURE_BTC_NO)) {
- if (cpu_matches(cpu_vuln_blacklist, RETBLEED) || (ia32_cap & ARCH_CAP_RSBA))
+ if (cpu_matches(cpu_vuln_blacklist, RETBLEED) || (x86_arch_cap_msr & ARCH_CAP_RSBA))
setup_force_cpu_bug(X86_BUG_RETBLEED);
}

@@ -1429,15 +1429,15 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
* disabling AVX2. The only way to do this in HW is to clear XCR0[2],
* which means that AVX will be disabled.
*/
- if (cpu_matches(cpu_vuln_blacklist, GDS) && !(ia32_cap & ARCH_CAP_GDS_NO) &&
+ if (cpu_matches(cpu_vuln_blacklist, GDS) && !(x86_arch_cap_msr & ARCH_CAP_GDS_NO) &&
boot_cpu_has(X86_FEATURE_AVX))
setup_force_cpu_bug(X86_BUG_GDS);

- if (vulnerable_to_rfds(ia32_cap))
+ if (vulnerable_to_rfds(x86_arch_cap_msr))
setup_force_cpu_bug(X86_BUG_RFDS);

/* When virtualized, eIBRS could be hidden, assume vulnerable */
- if (!(ia32_cap & ARCH_CAP_BHI_NO) &&
+ if (!(x86_arch_cap_msr & ARCH_CAP_BHI_NO) &&
!cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
(boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
boot_cpu_has(X86_FEATURE_HYPERVISOR)))
@@ -1447,7 +1447,7 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
return;

/* Rogue Data Cache Load? No! */
- if (ia32_cap & ARCH_CAP_RDCL_NO)
+ if (x86_arch_cap_msr & ARCH_CAP_RDCL_NO)
return;

setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);

2024-04-11 07:48:56

by Ingo Molnar

Subject: Re: [PATCH 7/7] x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI


* Josh Poimboeuf <[email protected]> wrote:

> For consistency with the other CONFIG_MITIGATION_* options, replace the
> CONFIG_SPECTRE_BHI_{ON,OFF} options with a single
> CONFIG_MITIGATION_SPECTRE_BHI option.
>
> Signed-off-by: Josh Poimboeuf <[email protected]>
> ---
> arch/x86/Kconfig | 17 +++--------------
> arch/x86/kernel/cpu/bugs.c | 2 +-
> 2 files changed, 4 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b63b6767a63d..4474bf32d0a4 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2633,27 +2633,16 @@ config MITIGATION_RFDS
> stored in floating point, vector and integer registers.
> See also <file:Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst>
>
> -choice
> - prompt "Clear branch history"
> +config MITIGATION_SPECTRE_BHI
> + bool "Mitigate Spectre-BHB (Branch History Injection)"
> depends on CPU_SUP_INTEL
> - default SPECTRE_BHI_ON
> + default y
> help
> Enable BHI mitigations. BHI attacks are a form of Spectre V2 attacks
> where the branch history buffer is poisoned to speculatively steer
> indirect branches.
> See <file:Documentation/admin-guide/hw-vuln/spectre.rst>
>
> -config SPECTRE_BHI_ON
> - bool "on"
> - help
> - Equivalent to setting spectre_bhi=on command line parameter.
> -config SPECTRE_BHI_OFF
> - bool "off"
> - help
> - Equivalent to setting spectre_bhi=off command line parameter.
> -
> -endchoice
> -
> endif
>
> config ARCH_HAS_ADD_PAGES
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 74ade6d7caa3..4c46fa2d08c2 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -1628,7 +1628,7 @@ enum bhi_mitigations {
> };
>
> static enum bhi_mitigations bhi_mitigation __ro_after_init =
> - IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
> + IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;

Uhm, after this patch there's no CONFIG_MITIGATION_SPECTRE_BHI_ON anymore,
which is kind of nasty, as IS_ENABLED() doesn't generate a build failure for
non-existent Kconfig variables IIRC ...

So AFAICT this patch turns on BHI unconditionally.
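
To make the failure mode concrete, here is a standalone sketch built from
the helper chain in include/linux/kconfig.h (simplified here to drop the
=m handling):

#include <stdio.h>

#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define __is_defined(x) ___is_defined(x)
#define IS_ENABLED(option) __is_defined(option)

#define CONFIG_EXISTS 1                 /* what Kconfig emits for a =y option */

int main(void)
{
        printf("%d\n", IS_ENABLED(CONFIG_EXISTS));              /* 1 */
        printf("%d\n", IS_ENABLED(CONFIG_TYPO_NEVER_DEFINED));  /* 0 */
        return 0;
}

Both lines compile without complaint; the nonexistent symbol silently
evaluates to 0.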

I've fixed as per the patch below, but please double check the end result
in tip:x86/urgent once I've pushed it out..

Thanks,

Ingo

=====================>
arch/x86/kernel/cpu/bugs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 80d9018da3d2..25111ad388d9 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1628,7 +1628,7 @@ enum bhi_mitigations {
};

static enum bhi_mitigations bhi_mitigation __ro_after_init =
- IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
+ IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;

static int __init spectre_bhi_parse_cmdline(char *str)
{

2024-04-11 08:18:30

by Ingo Molnar

Subject: Re: [PATCH 7/7] x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI


* Ingo Molnar <[email protected]> wrote:

> > static enum bhi_mitigations bhi_mitigation __ro_after_init =
> > - IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
> > + IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
>
> Uhm, after this patch there's no CONFIG_MITIGATION_SPECTRE_BHI_ON
> anymore, which is kind of nasty, as IS_ENABLED() doesn't generate a build
> failure for non-existent Kconfig variables IIRC ...
>
> So AFAICT this patch turns on BHI unconditionally.

BTW, this is why IS_ENABLED() is a bad primitive IMO:

kepler:~/tip> for N in $(git grep -w IS_ENABLED arch/x86/ | sed 's/.*IS_ENABLED(//g' | sed 's/).*//g' | sort | uniq | sed 's/^CONFIG_//g'); do printf "# %40s: " $N; git grep -E "^config $N$" -- '**Kconfig**' | wc -l; done | grep -w 0
# RESCTRL_RMID_DEPENDS_ON_CLOSID: 0
# NODE_NOT_IN_PAGE_FLAGS: 0

1)

CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID doesn't exist anymore, but is used
widely:

kepler:~/tip> git grep RESCTRL_RMID_DEPENDS_ON_CLOSID
arch/x86/kernel/cpu/resctrl/monitor.c: * Only allocated when CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is defined.
arch/x86/kernel/cpu/resctrl/monitor.c: if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
arch/x86/kernel/cpu/resctrl/monitor.c: if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
arch/x86/kernel/cpu/resctrl/monitor.c: if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
arch/x86/kernel/cpu/resctrl/monitor.c: if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
arch/x86/kernel/cpu/resctrl/monitor.c: if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
arch/x86/kernel/cpu/resctrl/monitor.c: if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
arch/x86/kernel/cpu/resctrl/rdtgroup.c: if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {

Each of those uses is bogus, as the Kconfig symbol doesn't exist. (!)

AFAICT CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID was never defined within the
upstream kernel (!!).

AFAICT the first bogus CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID use was
introduced in this recent commit:

b30a55df60c3 ("x86/resctrl: Track the number of dirty RMID a CLOSID has")

.. and was cargo-cult copied in other patches. It was never explained in
the changelog why it's used without a Kconfig entry anywhere.

Maybe in the future some other arch might (or might not) introduce
RESCTRL_RMID_DEPENDS_ON_CLOSID, but that doesn't justify this bad pattern
of dead code ...

2)

The NODE_NOT_IN_PAGE_FLAGS use is borderline correct:

kepler:~/tip> git grep -w NODE_NOT_IN_PAGE_FLAGS arch/x86
arch/x86/mm/numa.c: if (IS_ENABLED(NODE_NOT_IN_PAGE_FLAGS)) {


As NODE_NOT_IN_PAGE_FLAGS is not a Kconfig symbol, but a flag defined by
the VM:

include/linux/page-flags-layout.h:#define NODE_NOT_IN_PAGE_FLAGS 1

.. which happens to work with how IS_ENABLED() is defined. No other user
of NODE_NOT_IN_PAGE_FLAGS uses the IS_ENABLED() pattern.

Thanks,

Ingo


Subject: [tip: x86/urgent] x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: f3f51c5865a9ae1488a35d97338f9f3f548adfee
Gitweb: https://git.kernel.org/tip/f3f51c5865a9ae1488a35d97338f9f3f548adfee
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Wed, 10 Apr 2024 22:40:51 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 11 Apr 2024 10:30:34 +02:00

x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI

For consistency with the other CONFIG_MITIGATION_* options, replace the
CONFIG_SPECTRE_BHI_{ON,OFF} options with a single
CONFIG_MITIGATION_SPECTRE_BHI option.

[ mingo: Fix ]

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Link: https://lore.kernel.org/r/3833812ea63e7fdbe36bf8b932e63f70d18e2a2a.1712813475.git.jpoimboe@kernel.org
---
arch/x86/Kconfig | 17 +++--------------
arch/x86/kernel/cpu/bugs.c | 2 +-
2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b63b676..4474bf3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2633,27 +2633,16 @@ config MITIGATION_RFDS
stored in floating point, vector and integer registers.
See also <file:Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst>

-choice
- prompt "Clear branch history"
+config MITIGATION_SPECTRE_BHI
+ bool "Mitigate Spectre-BHB (Branch History Injection)"
depends on CPU_SUP_INTEL
- default SPECTRE_BHI_ON
+ default y
help
Enable BHI mitigations. BHI attacks are a form of Spectre V2 attacks
where the branch history buffer is poisoned to speculatively steer
indirect branches.
See <file:Documentation/admin-guide/hw-vuln/spectre.rst>

-config SPECTRE_BHI_ON
- bool "on"
- help
- Equivalent to setting spectre_bhi=on command line parameter.
-config SPECTRE_BHI_OFF
- bool "off"
- help
- Equivalent to setting spectre_bhi=off command line parameter.
-
-endchoice
-
endif

config ARCH_HAS_ADD_PAGES
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 0494787..25111ad 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1628,7 +1628,7 @@ enum bhi_mitigations {
};

static enum bhi_mitigations bhi_mitigation __ro_after_init =
- IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
+ IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;

static int __init spectre_bhi_parse_cmdline(char *str)
{

Subject: [tip: x86/urgent] x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 215bca01a67300ea884ce9302eaed53bcd1d996d
Gitweb: https://git.kernel.org/tip/215bca01a67300ea884ce9302eaed53bcd1d996d
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Wed, 10 Apr 2024 22:40:50 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 11 Apr 2024 10:30:34 +02:00

x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto

Unlike most other mitigations' "auto" options, spectre_bhi=auto only
mitigates newer systems, which is confusing and not particularly useful.

Remove it.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org
---
Documentation/admin-guide/hw-vuln/spectre.rst | 4 ----
Documentation/admin-guide/kernel-parameters.txt | 3 ---
arch/x86/Kconfig | 5 -----
arch/x86/kernel/cpu/bugs.c | 10 +---------
4 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 5a39acf..25a04cd 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -669,10 +669,6 @@ kernel command line.
needed.
off
Disable the mitigation.
- auto
- Enable the HW mitigation if needed, but
- *don't* enable the SW mitigation except for KVM.
- The system may be vulnerable.

For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a3874cc..902ecd9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6072,9 +6072,6 @@
on - (default) Enable the HW or SW mitigation
as needed.
off - Disable the mitigation.
- auto - Enable the HW mitigation if needed, but
- *don't* enable the SW mitigation except
- for KVM. The system may be vulnerable.

spectre_v2= [X86,EARLY] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 10a6251..b63b676 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2651,11 +2651,6 @@ config SPECTRE_BHI_OFF
bool "off"
help
Equivalent to setting spectre_bhi=off command line parameter.
-config SPECTRE_BHI_AUTO
- bool "auto"
- depends on BROKEN
- help
- Equivalent to setting spectre_bhi=auto command line parameter.

endchoice

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index b4ec0f0..0494787 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1625,13 +1625,10 @@ static bool __init spec_ctrl_bhi_dis(void)
enum bhi_mitigations {
BHI_MITIGATION_OFF,
BHI_MITIGATION_ON,
- BHI_MITIGATION_AUTO,
};

static enum bhi_mitigations bhi_mitigation __ro_after_init =
- IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON :
- IS_ENABLED(CONFIG_SPECTRE_BHI_OFF) ? BHI_MITIGATION_OFF :
- BHI_MITIGATION_AUTO;
+ IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;

static int __init spectre_bhi_parse_cmdline(char *str)
{
@@ -1642,8 +1639,6 @@ static int __init spectre_bhi_parse_cmdline(char *str)
bhi_mitigation = BHI_MITIGATION_OFF;
else if (!strcmp(str, "on"))
bhi_mitigation = BHI_MITIGATION_ON;
- else if (!strcmp(str, "auto"))
- bhi_mitigation = BHI_MITIGATION_AUTO;
else
pr_err("Ignoring unknown spectre_bhi option (%s)", str);

@@ -1682,9 +1677,6 @@ static void __init bhi_select_mitigation(void)
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");

- if (bhi_mitigation == BHI_MITIGATION_AUTO)
- return;
-
/* Mitigate syscalls when the mitigation is forced =on */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP);
pr_info("Spectre BHI mitigation: SW BHB clearing on syscall\n");

Subject: [tip: x86/urgent] x86/bugs: Clarify that syscall hardening isn't a BHI mitigation

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 5f882f3b0a8bf0788d5a0ee44b1191de5319bb8a
Gitweb: https://git.kernel.org/tip/5f882f3b0a8bf0788d5a0ee44b1191de5319bb8a
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Wed, 10 Apr 2024 22:40:48 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 11 Apr 2024 10:30:33 +02:00

x86/bugs: Clarify that syscall hardening isn't a BHI mitigation

While syscall hardening helps prevent some BHI attacks, there's still
other low-hanging fruit remaining. Don't classify it as a mitigation
and make it clear that the system may still be vulnerable if it doesn't
have a HW or SW mitigation enabled.

Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/b5951dae3fdee7f1520d5136a27be3bdfe95f88b.1712813475.git.jpoimboe@kernel.org
---
Documentation/admin-guide/hw-vuln/spectre.rst | 11 +++++------
Documentation/admin-guide/kernel-parameters.txt | 3 +--
arch/x86/kernel/cpu/bugs.c | 6 +++---
3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 3cf18e4..5a39acf 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -441,10 +441,10 @@ The possible values in this file are:
- System is protected by BHI_DIS_S
* - BHI: SW loop, KVM SW loop
- System is protected by software clearing sequence
- * - BHI: Syscall hardening
- - Syscalls are hardened against BHI
- * - BHI: Syscall hardening, KVM: SW loop
- - System is protected from userspace attacks by syscall hardening; KVM is protected by software clearing sequence
+ * - BHI: Vulnerable
+ - System is vulnerable to BHI
+ * - BHI: Vulnerable, KVM: SW loop
+ - System is vulnerable; KVM is protected by software clearing sequence

Full mitigation might require a microcode update from the CPU
vendor. When the necessary microcode is not available, the kernel will
@@ -661,8 +661,7 @@ kernel command line.
spectre_bhi=

[X86] Control mitigation of Branch History Injection
- (BHI) vulnerability. Syscalls are hardened against BHI
- regardless of this setting. This setting affects the deployment
+ (BHI) vulnerability. This setting affects the deployment
of the HW BHI control and the SW BHB clearing sequence.

on
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a029ad6..a3874cc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6065,8 +6065,7 @@
See Documentation/admin-guide/laptops/sonypi.rst

spectre_bhi= [X86] Control mitigation of Branch History Injection
- (BHI) vulnerability. Syscalls are hardened against BHI
- reglardless of this setting. This setting affects the
+ (BHI) vulnerability. This setting affects the
deployment of the HW BHI control and the SW BHB
clearing sequence.

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 08dfb94..9eeb60f 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2818,10 +2818,10 @@ static const char *spectre_bhi_state(void)
return "; BHI: SW loop, KVM: SW loop";
else if (boot_cpu_has(X86_FEATURE_RETPOLINE) && rrsba_disabled)
return "; BHI: Retpoline";
- else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT))
- return "; BHI: Syscall hardening, KVM: SW loop";
+ else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT))
+ return "; BHI: Vulnerable, KVM: SW loop";

- return "; BHI: Vulnerable (Syscall hardening enabled)";
+ return "; BHI: Vulnerable";
}

static ssize_t spectre_v2_show_state(char *buf)

Subject: [tip: x86/urgent] x86/bugs: Fix BHI handling of RRSBA

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 1cea8a280dfd1016148a3820676f2f03e3f5b898
Gitweb: https://git.kernel.org/tip/1cea8a280dfd1016148a3820676f2f03e3f5b898
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Wed, 10 Apr 2024 22:40:47 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 11 Apr 2024 10:30:33 +02:00

x86/bugs: Fix BHI handling of RRSBA

The ARCH_CAP_RRSBA check isn't correct: RRSBA may have already been
disabled by the Spectre v2 mitigation (or can otherwise be disabled by
the BHI mitigation itself if needed). In that case retpolines are fine.

Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/6f56f13da34a0834b69163467449be7f58f253dc.1712813475.git.jpoimboe@kernel.org
---
arch/x86/kernel/cpu/bugs.c | 30 ++++++++++++++++++------------
1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 1b0cfc1..08dfb94 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1538,20 +1538,25 @@ static enum spectre_v2_mitigation __init spectre_v2_select_retpoline(void)
return SPECTRE_V2_RETPOLINE;
}

+static bool __ro_after_init rrsba_disabled;
+
/* Disable in-kernel use of non-RSB RET predictors */
static void __init spec_ctrl_disable_kernel_rrsba(void)
{
- u64 x86_arch_cap_msr;
+ if (rrsba_disabled)
+ return;

- if (!boot_cpu_has(X86_FEATURE_RRSBA_CTRL))
+ if (!(x86_arch_cap_msr & ARCH_CAP_RRSBA)) {
+ rrsba_disabled = true;
return;
+ }

- x86_arch_cap_msr = x86_read_arch_cap_msr();
+ if (!boot_cpu_has(X86_FEATURE_RRSBA_CTRL))
+ return;

- if (x86_arch_cap_msr & ARCH_CAP_RRSBA) {
- x86_spec_ctrl_base |= SPEC_CTRL_RRSBA_DIS_S;
- update_spec_ctrl(x86_spec_ctrl_base);
- }
+ x86_spec_ctrl_base |= SPEC_CTRL_RRSBA_DIS_S;
+ update_spec_ctrl(x86_spec_ctrl_base);
+ rrsba_disabled = true;
}

static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode)
@@ -1652,9 +1657,11 @@ static void __init bhi_select_mitigation(void)
return;

/* Retpoline mitigates against BHI unless the CPU has RRSBA behavior */
- if (cpu_feature_enabled(X86_FEATURE_RETPOLINE) &&
- !(x86_read_arch_cap_msr() & ARCH_CAP_RRSBA))
- return;
+ if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
+ spec_ctrl_disable_kernel_rrsba();
+ if (rrsba_disabled)
+ return;
+ }

if (spec_ctrl_bhi_dis())
return;
@@ -2809,8 +2816,7 @@ static const char *spectre_bhi_state(void)
return "; BHI: BHI_DIS_S";
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP))
return "; BHI: SW loop, KVM: SW loop";
- else if (boot_cpu_has(X86_FEATURE_RETPOLINE) &&
- !(x86_arch_cap_msr & ARCH_CAP_RRSBA))
+ else if (boot_cpu_has(X86_FEATURE_RETPOLINE) && rrsba_disabled)
return "; BHI: Retpoline";
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT))
return "; BHI: Syscall hardening, KVM: SW loop";

Subject: [tip: x86/urgent] x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: cb2db5bb04d7f778fbc1a1ea2507aab436f1bff3
Gitweb: https://git.kernel.org/tip/cb2db5bb04d7f778fbc1a1ea2507aab436f1bff3
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Wed, 10 Apr 2024 22:40:46 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 11 Apr 2024 10:30:33 +02:00

x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES

There's no need to keep reading MSR_IA32_ARCH_CAPABILITIES over and
over. It's even read in the BHI sysfs function which is a big no-no.
Just read it once and cache it.

Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
---
arch/x86/kernel/cpu/bugs.c | 22 +++++++---------------
1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 27f5004..ff59fa8 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -61,6 +61,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
EXPORT_SYMBOL_GPL(x86_pred_cmd);

+static u64 __ro_after_init ia32_cap;
+
static DEFINE_MUTEX(spec_ctrl_mutex);

void (*x86_return_thunk)(void) __ro_after_init = __x86_return_thunk;
@@ -144,6 +146,8 @@ void __init cpu_select_mitigations(void)
x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
}

+ ia32_cap = x86_read_arch_cap_msr();
+
/* Select the proper CPU mitigations before patching alternatives: */
spectre_v1_select_mitigation();
spectre_v2_select_mitigation();
@@ -301,8 +305,6 @@ static const char * const taa_strings[] = {

static void __init taa_select_mitigation(void)
{
- u64 ia32_cap;
-
if (!boot_cpu_has_bug(X86_BUG_TAA)) {
taa_mitigation = TAA_MITIGATION_OFF;
return;
@@ -341,7 +343,6 @@ static void __init taa_select_mitigation(void)
* On MDS_NO=1 CPUs if ARCH_CAP_TSX_CTRL_MSR is not set, microcode
* update is required.
*/
- ia32_cap = x86_read_arch_cap_msr();
if ( (ia32_cap & ARCH_CAP_MDS_NO) &&
!(ia32_cap & ARCH_CAP_TSX_CTRL_MSR))
taa_mitigation = TAA_MITIGATION_UCODE_NEEDED;
@@ -401,8 +402,6 @@ static const char * const mmio_strings[] = {

static void __init mmio_select_mitigation(void)
{
- u64 ia32_cap;
-
if (!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA) ||
boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN) ||
cpu_mitigations_off()) {
@@ -413,8 +412,6 @@ static void __init mmio_select_mitigation(void)
if (mmio_mitigation == MMIO_MITIGATION_OFF)
return;

- ia32_cap = x86_read_arch_cap_msr();
-
/*
* Enable CPU buffer clear mitigation for host and VMM, if also affected
* by MDS or TAA. Otherwise, enable mitigation for VMM only.
@@ -508,7 +505,7 @@ static void __init rfds_select_mitigation(void)
if (rfds_mitigation == RFDS_MITIGATION_OFF)
return;

- if (x86_read_arch_cap_msr() & ARCH_CAP_RFDS_CLEAR)
+ if (ia32_cap & ARCH_CAP_RFDS_CLEAR)
setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
else
rfds_mitigation = RFDS_MITIGATION_UCODE_NEEDED;
@@ -659,8 +656,6 @@ void update_srbds_msr(void)

static void __init srbds_select_mitigation(void)
{
- u64 ia32_cap;
-
if (!boot_cpu_has_bug(X86_BUG_SRBDS))
return;

@@ -669,7 +664,6 @@ static void __init srbds_select_mitigation(void)
* are only exposed to SRBDS when TSX is enabled or when CPU is affected
* by Processor MMIO Stale Data vulnerability.
*/
- ia32_cap = x86_read_arch_cap_msr();
if ((ia32_cap & ARCH_CAP_MDS_NO) && !boot_cpu_has(X86_FEATURE_RTM) &&
!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA))
srbds_mitigation = SRBDS_MITIGATION_TSX_OFF;
@@ -813,7 +807,7 @@ static void __init gds_select_mitigation(void)
/* Will verify below that mitigation _can_ be disabled */

/* No microcode */
- if (!(x86_read_arch_cap_msr() & ARCH_CAP_GDS_CTRL)) {
+ if (!(ia32_cap & ARCH_CAP_GDS_CTRL)) {
if (gds_mitigation == GDS_MITIGATION_FORCE) {
/*
* This only needs to be done on the boot CPU so do it
@@ -1908,8 +1902,6 @@ static void update_indir_branch_cond(void)
/* Update the static key controlling the MDS CPU buffer clear in idle */
static void update_mds_branch_idle(void)
{
- u64 ia32_cap = x86_read_arch_cap_msr();
-
/*
* Enable the idle clearing if SMT is active on CPUs which are
* affected only by MSBDS and not any other MDS variant.
@@ -2818,7 +2810,7 @@ static const char *spectre_bhi_state(void)
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP))
return "; BHI: SW loop, KVM: SW loop";
else if (boot_cpu_has(X86_FEATURE_RETPOLINE) &&
- !(x86_read_arch_cap_msr() & ARCH_CAP_RRSBA))
+ !(ia32_cap & ARCH_CAP_RRSBA))
return "; BHI: Retpoline";
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT))
return "; BHI: Syscall hardening, KVM: SW loop";

Subject: [tip: x86/urgent] x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: d0485730d2189ffe5d986d4e9e191f1e4d5ffd24
Gitweb: https://git.kernel.org/tip/d0485730d2189ffe5d986d4e9e191f1e4d5ffd24
Author: Ingo Molnar <[email protected]>
AuthorDate: Thu, 11 Apr 2024 09:25:36 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 11 Apr 2024 10:30:33 +02:00

x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'

So we are using the 'ia32_cap' value in a number of places,
which got its name from MSR_IA32_ARCH_CAPABILITIES MSR register.

But there's very little 'IA32' about it - this isn't 32-bit only
code, nor does it originate from there, it's just a historic
quirk that many Intel MSR names are prefixed with IA32_.

This is already clear from the helper method around the MSR:
x86_read_arch_cap_msr(), which doesn't have the IA32 prefix.

So rename 'ia32_cap' to 'x86_arch_cap_msr' to be consistent with
its role and with the naming of the helper function.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/9592a18a814368e75f8f4b9d74d3883aa4fd1eaf.1712813475.git.jpoimboe@kernel.org
---
arch/x86/kernel/apic/apic.c | 6 ++--
arch/x86/kernel/cpu/bugs.c | 30 +++++++++++-----------
arch/x86/kernel/cpu/common.c | 48 +++++++++++++++++------------------
3 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a42d8a6..c342c4a 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1687,11 +1687,11 @@ static int x2apic_state;

static bool x2apic_hw_locked(void)
{
- u64 ia32_cap;
+ u64 x86_arch_cap_msr;
u64 msr;

- ia32_cap = x86_read_arch_cap_msr();
- if (ia32_cap & ARCH_CAP_XAPIC_DISABLE) {
+ x86_arch_cap_msr = x86_read_arch_cap_msr();
+ if (x86_arch_cap_msr & ARCH_CAP_XAPIC_DISABLE) {
rdmsrl(MSR_IA32_XAPIC_DISABLE_STATUS, msr);
return (msr & LEGACY_XAPIC_DISABLED);
}
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ff59fa8..1b0cfc1 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -61,7 +61,7 @@ EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
EXPORT_SYMBOL_GPL(x86_pred_cmd);

-static u64 __ro_after_init ia32_cap;
+static u64 __ro_after_init x86_arch_cap_msr;

static DEFINE_MUTEX(spec_ctrl_mutex);

@@ -146,7 +146,7 @@ void __init cpu_select_mitigations(void)
x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
}

- ia32_cap = x86_read_arch_cap_msr();
+ x86_arch_cap_msr = x86_read_arch_cap_msr();

/* Select the proper CPU mitigations before patching alternatives: */
spectre_v1_select_mitigation();
@@ -343,8 +343,8 @@ static void __init taa_select_mitigation(void)
* On MDS_NO=1 CPUs if ARCH_CAP_TSX_CTRL_MSR is not set, microcode
* update is required.
*/
- if ( (ia32_cap & ARCH_CAP_MDS_NO) &&
- !(ia32_cap & ARCH_CAP_TSX_CTRL_MSR))
+ if ( (x86_arch_cap_msr & ARCH_CAP_MDS_NO) &&
+ !(x86_arch_cap_msr & ARCH_CAP_TSX_CTRL_MSR))
taa_mitigation = TAA_MITIGATION_UCODE_NEEDED;

/*
@@ -434,7 +434,7 @@ static void __init mmio_select_mitigation(void)
* be propagated to uncore buffers, clearing the Fill buffers on idle
* is required irrespective of SMT state.
*/
- if (!(ia32_cap & ARCH_CAP_FBSDP_NO))
+ if (!(x86_arch_cap_msr & ARCH_CAP_FBSDP_NO))
static_branch_enable(&mds_idle_clear);

/*
@@ -444,10 +444,10 @@ static void __init mmio_select_mitigation(void)
* FB_CLEAR or by the presence of both MD_CLEAR and L1D_FLUSH on MDS
* affected systems.
*/
- if ((ia32_cap & ARCH_CAP_FB_CLEAR) ||
+ if ((x86_arch_cap_msr & ARCH_CAP_FB_CLEAR) ||
(boot_cpu_has(X86_FEATURE_MD_CLEAR) &&
boot_cpu_has(X86_FEATURE_FLUSH_L1D) &&
- !(ia32_cap & ARCH_CAP_MDS_NO)))
+ !(x86_arch_cap_msr & ARCH_CAP_MDS_NO)))
mmio_mitigation = MMIO_MITIGATION_VERW;
else
mmio_mitigation = MMIO_MITIGATION_UCODE_NEEDED;
@@ -505,7 +505,7 @@ static void __init rfds_select_mitigation(void)
if (rfds_mitigation == RFDS_MITIGATION_OFF)
return;

- if (ia32_cap & ARCH_CAP_RFDS_CLEAR)
+ if (x86_arch_cap_msr & ARCH_CAP_RFDS_CLEAR)
setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
else
rfds_mitigation = RFDS_MITIGATION_UCODE_NEEDED;
@@ -664,7 +664,7 @@ static void __init srbds_select_mitigation(void)
* are only exposed to SRBDS when TSX is enabled or when CPU is affected
* by Processor MMIO Stale Data vulnerability.
*/
- if ((ia32_cap & ARCH_CAP_MDS_NO) && !boot_cpu_has(X86_FEATURE_RTM) &&
+ if ((x86_arch_cap_msr & ARCH_CAP_MDS_NO) && !boot_cpu_has(X86_FEATURE_RTM) &&
!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA))
srbds_mitigation = SRBDS_MITIGATION_TSX_OFF;
else if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
@@ -807,7 +807,7 @@ static void __init gds_select_mitigation(void)
/* Will verify below that mitigation _can_ be disabled */

/* No microcode */
- if (!(ia32_cap & ARCH_CAP_GDS_CTRL)) {
+ if (!(x86_arch_cap_msr & ARCH_CAP_GDS_CTRL)) {
if (gds_mitigation == GDS_MITIGATION_FORCE) {
/*
* This only needs to be done on the boot CPU so do it
@@ -1541,14 +1541,14 @@ static enum spectre_v2_mitigation __init spectre_v2_select_retpoline(void)
/* Disable in-kernel use of non-RSB RET predictors */
static void __init spec_ctrl_disable_kernel_rrsba(void)
{
- u64 ia32_cap;
+ u64 x86_arch_cap_msr;

if (!boot_cpu_has(X86_FEATURE_RRSBA_CTRL))
return;

- ia32_cap = x86_read_arch_cap_msr();
+ x86_arch_cap_msr = x86_read_arch_cap_msr();

- if (ia32_cap & ARCH_CAP_RRSBA) {
+ if (x86_arch_cap_msr & ARCH_CAP_RRSBA) {
x86_spec_ctrl_base |= SPEC_CTRL_RRSBA_DIS_S;
update_spec_ctrl(x86_spec_ctrl_base);
}
@@ -1916,7 +1916,7 @@ static void update_mds_branch_idle(void)
if (sched_smt_active()) {
static_branch_enable(&mds_idle_clear);
} else if (mmio_mitigation == MMIO_MITIGATION_OFF ||
- (ia32_cap & ARCH_CAP_FBSDP_NO)) {
+ (x86_arch_cap_msr & ARCH_CAP_FBSDP_NO)) {
static_branch_disable(&mds_idle_clear);
}
}
@@ -2810,7 +2810,7 @@ static const char *spectre_bhi_state(void)
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP))
return "; BHI: SW loop, KVM: SW loop";
else if (boot_cpu_has(X86_FEATURE_RETPOLINE) &&
- !(ia32_cap & ARCH_CAP_RRSBA))
+ !(x86_arch_cap_msr & ARCH_CAP_RRSBA))
return "; BHI: Retpoline";
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT))
return "; BHI: Syscall hardening, KVM: SW loop";
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 754d918..605c26c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1284,25 +1284,25 @@ static bool __init cpu_matches(const struct x86_cpu_id *table, unsigned long whi

u64 x86_read_arch_cap_msr(void)
{
- u64 ia32_cap = 0;
+ u64 x86_arch_cap_msr = 0;

if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
- rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
+ rdmsrl(MSR_IA32_ARCH_CAPABILITIES, x86_arch_cap_msr);

- return ia32_cap;
+ return x86_arch_cap_msr;
}

-static bool arch_cap_mmio_immune(u64 ia32_cap)
+static bool arch_cap_mmio_immune(u64 x86_arch_cap_msr)
{
- return (ia32_cap & ARCH_CAP_FBSDP_NO &&
- ia32_cap & ARCH_CAP_PSDP_NO &&
- ia32_cap & ARCH_CAP_SBDR_SSDP_NO);
+ return (x86_arch_cap_msr & ARCH_CAP_FBSDP_NO &&
+ x86_arch_cap_msr & ARCH_CAP_PSDP_NO &&
+ x86_arch_cap_msr & ARCH_CAP_SBDR_SSDP_NO);
}

-static bool __init vulnerable_to_rfds(u64 ia32_cap)
+static bool __init vulnerable_to_rfds(u64 x86_arch_cap_msr)
{
/* The "immunity" bit trumps everything else: */
- if (ia32_cap & ARCH_CAP_RFDS_NO)
+ if (x86_arch_cap_msr & ARCH_CAP_RFDS_NO)
return false;

/*
@@ -1310,7 +1310,7 @@ static bool __init vulnerable_to_rfds(u64 ia32_cap)
* indicate that mitigation is needed because guest is running on a
* vulnerable hardware or may migrate to such hardware:
*/
- if (ia32_cap & ARCH_CAP_RFDS_CLEAR)
+ if (x86_arch_cap_msr & ARCH_CAP_RFDS_CLEAR)
return true;

/* Only consult the blacklist when there is no enumeration: */
@@ -1319,11 +1319,11 @@ static bool __init vulnerable_to_rfds(u64 ia32_cap)

static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
{
- u64 ia32_cap = x86_read_arch_cap_msr();
+ u64 x86_arch_cap_msr = x86_read_arch_cap_msr();

/* Set ITLB_MULTIHIT bug if cpu is not in the whitelist and not mitigated */
if (!cpu_matches(cpu_vuln_whitelist, NO_ITLB_MULTIHIT) &&
- !(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO))
+ !(x86_arch_cap_msr & ARCH_CAP_PSCHANGE_MC_NO))
setup_force_cpu_bug(X86_BUG_ITLB_MULTIHIT);

if (cpu_matches(cpu_vuln_whitelist, NO_SPECULATION))
@@ -1335,7 +1335,7 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
setup_force_cpu_bug(X86_BUG_SPECTRE_V2);

if (!cpu_matches(cpu_vuln_whitelist, NO_SSB) &&
- !(ia32_cap & ARCH_CAP_SSB_NO) &&
+ !(x86_arch_cap_msr & ARCH_CAP_SSB_NO) &&
!cpu_has(c, X86_FEATURE_AMD_SSB_NO))
setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);

@@ -1346,17 +1346,17 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
* Don't use AutoIBRS when SNP is enabled because it degrades host
* userspace indirect branch performance.
*/
- if ((ia32_cap & ARCH_CAP_IBRS_ALL) ||
+ if ((x86_arch_cap_msr & ARCH_CAP_IBRS_ALL) ||
(cpu_has(c, X86_FEATURE_AUTOIBRS) &&
!cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {
setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
- !(ia32_cap & ARCH_CAP_PBRSB_NO))
+ !(x86_arch_cap_msr & ARCH_CAP_PBRSB_NO))
setup_force_cpu_bug(X86_BUG_EIBRS_PBRSB);
}

if (!cpu_matches(cpu_vuln_whitelist, NO_MDS) &&
- !(ia32_cap & ARCH_CAP_MDS_NO)) {
+ !(x86_arch_cap_msr & ARCH_CAP_MDS_NO)) {
setup_force_cpu_bug(X86_BUG_MDS);
if (cpu_matches(cpu_vuln_whitelist, MSBDS_ONLY))
setup_force_cpu_bug(X86_BUG_MSBDS_ONLY);
@@ -1375,9 +1375,9 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
* TSX_CTRL check alone is not sufficient for cases when the microcode
* update is not present or running as guest that don't get TSX_CTRL.
*/
- if (!(ia32_cap & ARCH_CAP_TAA_NO) &&
+ if (!(x86_arch_cap_msr & ARCH_CAP_TAA_NO) &&
(cpu_has(c, X86_FEATURE_RTM) ||
- (ia32_cap & ARCH_CAP_TSX_CTRL_MSR)))
+ (x86_arch_cap_msr & ARCH_CAP_TSX_CTRL_MSR)))
setup_force_cpu_bug(X86_BUG_TAA);

/*
@@ -1403,7 +1403,7 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
* Set X86_BUG_MMIO_UNKNOWN for CPUs that are neither in the blacklist,
* nor in the whitelist and also don't enumerate MSR ARCH_CAP MMIO bits.
*/
- if (!arch_cap_mmio_immune(ia32_cap)) {
+ if (!arch_cap_mmio_immune(x86_arch_cap_msr)) {
if (cpu_matches(cpu_vuln_blacklist, MMIO))
setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);
else if (!cpu_matches(cpu_vuln_whitelist, NO_MMIO))
@@ -1411,7 +1411,7 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
}

if (!cpu_has(c, X86_FEATURE_BTC_NO)) {
- if (cpu_matches(cpu_vuln_blacklist, RETBLEED) || (ia32_cap & ARCH_CAP_RSBA))
+ if (cpu_matches(cpu_vuln_blacklist, RETBLEED) || (x86_arch_cap_msr & ARCH_CAP_RSBA))
setup_force_cpu_bug(X86_BUG_RETBLEED);
}

@@ -1429,15 +1429,15 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
* disabling AVX2. The only way to do this in HW is to clear XCR0[2],
* which means that AVX will be disabled.
*/
- if (cpu_matches(cpu_vuln_blacklist, GDS) && !(ia32_cap & ARCH_CAP_GDS_NO) &&
+ if (cpu_matches(cpu_vuln_blacklist, GDS) && !(x86_arch_cap_msr & ARCH_CAP_GDS_NO) &&
boot_cpu_has(X86_FEATURE_AVX))
setup_force_cpu_bug(X86_BUG_GDS);

- if (vulnerable_to_rfds(ia32_cap))
+ if (vulnerable_to_rfds(x86_arch_cap_msr))
setup_force_cpu_bug(X86_BUG_RFDS);

/* When virtualized, eIBRS could be hidden, assume vulnerable */
- if (!(ia32_cap & ARCH_CAP_BHI_NO) &&
+ if (!(x86_arch_cap_msr & ARCH_CAP_BHI_NO) &&
!cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
(boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
boot_cpu_has(X86_FEATURE_HYPERVISOR)))
@@ -1447,7 +1447,7 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
return;

/* Rogue Data Cache Load? No! */
- if (ia32_cap & ARCH_CAP_RDCL_NO)
+ if (x86_arch_cap_msr & ARCH_CAP_RDCL_NO)
return;

setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN);

Subject: [tip: x86/urgent] x86/bugs: Fix BHI documentation

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: dfe648903f42296866d79f10d03f8c85c9dfba30
Gitweb: https://git.kernel.org/tip/dfe648903f42296866d79f10d03f8c85c9dfba30
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Wed, 10 Apr 2024 22:40:45 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 11 Apr 2024 10:30:25 +02:00

x86/bugs: Fix BHI documentation

Fix up some inaccuracies in the BHI documentation.

Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/8c84f7451bfe0dd08543c6082a383f390d4aa7e2.1712813475.git.jpoimboe@kernel.org
---
Documentation/admin-guide/hw-vuln/spectre.rst | 15 ++++++++-------
Documentation/admin-guide/kernel-parameters.txt | 12 +++++++-----
2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index b70b1d8..3cf18e4 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -439,11 +439,11 @@ The possible values in this file are:
- System is protected by retpoline
* - BHI: BHI_DIS_S
- System is protected by BHI_DIS_S
- * - BHI: SW loop; KVM SW loop
+ * - BHI: SW loop, KVM SW loop
- System is protected by software clearing sequence
* - BHI: Syscall hardening
- Syscalls are hardened against BHI
- * - BHI: Syscall hardening; KVM: SW loop
+ * - BHI: Syscall hardening, KVM: SW loop
- System is protected from userspace attacks by syscall hardening; KVM is protected by software clearing sequence

Full mitigation might require a microcode update from the CPU
@@ -666,13 +666,14 @@ kernel command line.
of the HW BHI control and the SW BHB clearing sequence.

on
- unconditionally enable.
+ (default) Enable the HW or SW mitigation as
+ needed.
off
- unconditionally disable.
+ Disable the mitigation.
auto
- enable if hardware mitigation
- control(BHI_DIS_S) is available, otherwise
- enable alternate mitigation in KVM.
+ Enable the HW mitigation if needed, but
+ *don't* enable the SW mitigation except for KVM.
+ The system may be vulnerable.

For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 70046a0..a029ad6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3444,6 +3444,7 @@
retbleed=off [X86]
spec_rstack_overflow=off [X86]
spec_store_bypass_disable=off [X86,PPC]
+ spectre_bhi=off [X86]
spectre_v2_user=off [X86]
srbds=off [X86,INTEL]
ssbd=force-off [ARM64]
@@ -6069,11 +6070,12 @@
deployment of the HW BHI control and the SW BHB
clearing sequence.

- on - unconditionally enable.
- off - unconditionally disable.
- auto - (default) enable hardware mitigation
- (BHI_DIS_S) if available, otherwise enable
- alternate mitigation in KVM.
+ on - (default) Enable the HW or SW mitigation
+ as needed.
+ off - Disable the mitigation.
+ auto - Enable the HW mitigation if needed, but
+ *don't* enable the SW mitigation except
+ for KVM. The system may be vulnerable.

spectre_v2= [X86,EARLY] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability.

Subject: [tip: x86/urgent] x86/bugs: Only harden syscalls when needed

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 92dc47c57eac757c0987b0af0405d12192feff80
Gitweb: https://git.kernel.org/tip/92dc47c57eac757c0987b0af0405d12192feff80
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Wed, 10 Apr 2024 22:40:49 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 11 Apr 2024 10:30:33 +02:00

x86/bugs: Only harden syscalls when needed

Syscall hardening (i.e., converting the syscall indirect branch to a
series of direct branches) may cause performance regressions in certain
scenarios. Only use the syscall hardening when indirect branches are
considered unsafe.

Fixes: 1e3ad78334a6 ("x86/syscall: Don't force use of indirect calls for system calls")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/97befd7c1e008797734dee05181c49056ff6de57.1712813475.git.jpoimboe@kernel.org
---
arch/x86/entry/common.c | 30 ++++++++++++++++++++++++---
arch/x86/entry/syscall_32.c | 11 +----------
arch/x86/entry/syscall_64.c | 8 +-------
arch/x86/entry/syscall_x32.c | 7 +++++-
arch/x86/include/asm/cpufeatures.h | 1 +-
arch/x86/include/asm/syscall.h | 8 ++++++-
arch/x86/kernel/cpu/bugs.c | 32 ++++++++++++++++++++++++++++-
7 files changed, 74 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6de50b8..80d432d 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -39,6 +39,28 @@

#ifdef CONFIG_X86_64

+/*
+ * Do either a direct or an indirect call, depending on whether indirect calls
+ * are considered safe.
+ */
+#define __do_syscall(table, func_direct, nr, regs) \
+({ \
+ unsigned long __rax, __rdi, __rsi; \
+ \
+ asm_inline volatile( \
+ ALTERNATIVE("call " __stringify(func_direct) "\n\t", \
+ ANNOTATE_RETPOLINE_SAFE \
+ "call *%[func_ptr]\n\t", \
+ X86_FEATURE_INDIRECT_SAFE) \
+ : "=D" (__rdi), "=S" (__rsi), "=a" (__rax), \
+ ASM_CALL_CONSTRAINT \
+ : "0" (regs), "1" (nr), [func_ptr] "r" (table[nr]) \
+ : "rdx", "rcx", "r8", "r9", "r10", "r11", \
+ "cc", "memory"); \
+ \
+ __rax; \
+})
+
static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
{
/*
@@ -49,7 +71,7 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)

if (likely(unr < NR_syscalls)) {
unr = array_index_nospec(unr, NR_syscalls);
- regs->ax = x64_sys_call(regs, unr);
+ regs->ax = __do_syscall(sys_call_table, x64_sys_call, unr, regs);
return true;
}
return false;
@@ -66,7 +88,7 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)

if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
xnr = array_index_nospec(xnr, X32_NR_syscalls);
- regs->ax = x32_sys_call(regs, xnr);
+ regs->ax = __do_syscall(x32_sys_call_table, x32_sys_call, xnr, regs);
return true;
}
return false;
@@ -147,6 +169,8 @@ static int ia32_emulation_override_cmdline(char *arg)
return kstrtobool(arg, &__ia32_enabled);
}
early_param("ia32_emulation", ia32_emulation_override_cmdline);
+#else
+#define __do_syscall(table, func_direct, nr, regs) table[nr](regs)
#endif

/*
@@ -162,7 +186,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)

if (likely(unr < IA32_NR_syscalls)) {
unr = array_index_nospec(unr, IA32_NR_syscalls);
- regs->ax = ia32_sys_call(regs, unr);
+ regs->ax = __do_syscall(ia32_sys_call_table, ia32_sys_call, unr, regs);
} else if (nr != -1) {
regs->ax = __ia32_sys_ni_syscall(regs);
}
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index c2235ba..9185870 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -14,25 +14,16 @@
#endif

#define __SYSCALL(nr, sym) extern long __ia32_##sym(const struct pt_regs *);
-
#include <asm/syscalls_32.h>
#undef __SYSCALL

-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
-#ifdef CONFIG_X86_32
#define __SYSCALL(nr, sym) __ia32_##sym,
-const sys_call_ptr_t sys_call_table[] = {
+__visible const sys_call_ptr_t ia32_sys_call_table[] = {
#include <asm/syscalls_32.h>
};
#undef __SYSCALL
-#endif

#define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
-
long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 33b3f09..c368048 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -11,19 +11,13 @@
#include <asm/syscalls_64.h>
#undef __SYSCALL

-/*
- * The sys_call_table[] is no longer used for system calls, but
- * kernel/trace/trace_syscalls.c still wants to know the system
- * call address.
- */
#define __SYSCALL(nr, sym) __x64_##sym,
-const sys_call_ptr_t sys_call_table[] = {
+asmlinkage const sys_call_ptr_t sys_call_table[] = {
#include <asm/syscalls_64.h>
};
#undef __SYSCALL

#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
-
long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 03de4a9..89a7172 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -11,8 +11,13 @@
#include <asm/syscalls_x32.h>
#undef __SYSCALL

-#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
+#define __SYSCALL(nr, sym) __x64_##sym,
+asmlinkage const sys_call_ptr_t x32_sys_call_table[] = {
+#include <asm/syscalls_x32.h>
+};
+#undef __SYSCALL

+#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c74343..7c87fe8 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -470,6 +470,7 @@
#define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
#define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_INDIRECT_SAFE (21*32+ 5) /* "" Indirect branches aren't vulnerable to Spectre v2 */

/*
* BUG word(s)
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 2fc7bc3..dfb5952 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -16,14 +16,20 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>

-/* This is used purely for kernel/trace/trace_syscalls.c */
typedef long (*sys_call_ptr_t)(const struct pt_regs *);
extern const sys_call_ptr_t sys_call_table[];

+#if defined(CONFIG_X86_32)
+#define ia32_sys_call_table sys_call_table
+#else
/*
* These may not exist, but still put the prototypes in so we
* can use IS_ENABLED().
*/
+extern const sys_call_ptr_t ia32_sys_call_table[];
+extern const sys_call_ptr_t x32_sys_call_table[];
+#endif
+
extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 9eeb60f..b4ec0f0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1669,6 +1669,15 @@ static void __init bhi_select_mitigation(void)
if (!IS_ENABLED(CONFIG_X86_64))
return;

+ /*
+ * There's no hardware mitigation in place, so mark indirect branches
+ * as unsafe.
+ *
+ * One could argue the SW loop makes indirect branches safe again, but
+ * Linus prefers it this way.
+ */
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
+
/* Mitigate KVM by default */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
@@ -1687,6 +1696,21 @@ static void __init spectre_v2_select_mitigation(void)
enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;

/*
+ * X86_FEATURE_INDIRECT_SAFE indicates whether indirect calls can be
+ * considered safe. That means either:
+ *
+ * - the CPU isn't vulnerable to Spectre v2 or its variants;
+ *
+ * - a hardware mitigation is in place (e.g., IBRS, BHI_DIS_S); or
+ *
+ * - the user turned off mitigations altogether.
+ *
+ * Assume innocence until proven guilty: set the cap bit now, then
+ * clear it later if/when needed.
+ */
+ setup_force_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
+
+ /*
* If the CPU is not affected and the command line mode is NONE or AUTO
* then nothing to do.
*/
@@ -1720,6 +1744,7 @@ static void __init spectre_v2_select_mitigation(void)

case SPECTRE_V2_CMD_RETPOLINE_LFENCE:
pr_err(SPECTRE_V2_LFENCE_MSG);
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
mode = SPECTRE_V2_LFENCE;
break;

@@ -1772,11 +1797,16 @@ static void __init spectre_v2_select_mitigation(void)
break;

case SPECTRE_V2_LFENCE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
+ fallthrough;
case SPECTRE_V2_EIBRS_LFENCE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
- fallthrough;
+ setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+ break;

case SPECTRE_V2_RETPOLINE:
+ setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
+ fallthrough;
case SPECTRE_V2_EIBRS_RETPOLINE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
break;
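
In C terms, the __do_syscall() macro added above behaves like the following
sketch (illustrative only -- the real ALTERNATIVE() site is patched in place
at boot, so there is no runtime feature test):

	if (boot_cpu_has(X86_FEATURE_INDIRECT_SAFE))
		regs->ax = table[nr](regs);		/* plain indirect call */
	else
		regs->ax = func_direct(regs, nr);	/* direct call into the switch()-based dispatcher */

The inline asm exists so that both forms occupy the same patch site; the
output/input constraints and clobber list tell the compiler a function call
happens even though it can't see one.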

2024-04-11 10:03:10

by Andrew Cooper

[permalink] [raw]
Subject: Re: [PATCH 3/7] x86/bugs: Fix BHI handling of RRSBA

On 11/04/2024 6:40 am, Josh Poimboeuf wrote:
> The ARCH_CAP_RRSBA check isn't correct: RRSBA may have already been
> disabled by the Spectre v2 mitigation (or can otherwise be disabled by
> the BHI mitigation itself if needed). In that case retpolines are fine.
>
> Fixes: ec9404e40e8f ("x86/bhi: Add BHI mitigation knob")
> Signed-off-by: Josh Poimboeuf <[email protected]>
> ---
> arch/x86/kernel/cpu/bugs.c | 30 ++++++++++++++++++------------
> 1 file changed, 18 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 27d6d64eeec3..0755600d5d18 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -1538,20 +1538,25 @@ static enum spectre_v2_mitigation __init spectre_v2_select_retpoline(void)
> return SPECTRE_V2_RETPOLINE;
> }
>
> +static bool __ro_after_init rrsba_disabled;
> +
> /* Disable in-kernel use of non-RSB RET predictors */
> static void __init spec_ctrl_disable_kernel_rrsba(void)
> {
> - u64 ia32_cap;
> + if (rrsba_disabled)
> + return;
> +
> + if (!(ia32_cap & ARCH_CAP_RRSBA)) {
> + rrsba_disabled = true;
> + return;
> + }

You'll take this path if you have out-of-date microcode.

RRSBA is only enumerated from September last year, IIRC.  (Definitely
from this point on some CPUs.)

When RRSBA was introduced, I was under the (false) impression that all
eIBRS systems suffered RRSBA, but it turns out that select parts
(ICX,TGL,RKL) are non-RRSBA despite not having RRSBA_CTRL.

~Andrew

2024-04-11 10:17:53

by Andrew Cooper

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On 11/04/2024 6:40 am, Josh Poimboeuf wrote:
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 6de50b80702e..80d432d2fe44 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -39,6 +39,28 @@
>
> #ifdef CONFIG_X86_64
>
> +/*
> + * Do either a direct or an indirect call, depending on whether indirect calls
> + * are considered safe.
> + */
> +#define __do_syscall(table, func_direct, nr, regs) \
> +({ \
> + unsigned long __rax, __rdi, __rsi; \
> + \
> + asm_inline volatile( \
> + ALTERNATIVE("call " __stringify(func_direct) "\n\t", \
> + ANNOTATE_RETPOLINE_SAFE \
> + "call *%[func_ptr]\n\t", \

This wants to be a plain maybe-thunk'd indirect call, and without the
ANNOTATE_RETPOLINE_SAFE.

Or you're going to get into cases where some combinations of command
line options do unexpected things e.g. retpolining everything except the
syscall dispatch.

~Andrew

2024-04-11 15:09:06

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On Thu, Apr 11, 2024 at 09:20:17AM +0300, Nikolay Borisov wrote:
> On 11.04.24 at 8:40, Josh Poimboeuf wrote:
> > Syscall hardening (i.e., converting the syscall indirect branch to a
> > series of direct branches) may cause performance regressions in certain
> > scenarios. Only use the syscall hardening when indirect branches are
> > considered unsafe.
> >
> > Fixes: 1e3ad78334a6 ("x86/syscall: Don't force use of indirect calls for system calls")
> > Signed-off-by: Josh Poimboeuf <[email protected]>
>
> Why fiddle with the syscall mechanism if the bhb scrubbing sequence
> mitigates bhb? AFAIU (correct me if I'm wrong) the original idea was to
> use syscall hardening instead of the BHB sequence, but since it became
> clear that's not sufficient, doesn't bhb scrubbing completely subsume
> the direct branch approach in the syscall handler?

I agree, but I think Linus wanted it for some reason. I might not have
gotten the X86_FEATURE_INDIRECT_SAFE conditions right, maybe Linus can
clarify.

I'm going to experiment with having objtool find all indirect branches
reachable within 66 branches of syscall entry. If we converted all those to
direct branches then the SW loop wouldn't be needed.

But until then I don't see much point in the syscall direct branches.
We could just disable it completely until if/when it's really needed.

--
Josh

2024-04-11 15:28:08

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 7/7] x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI

On Thu, Apr 11, 2024 at 09:48:42AM +0200, Ingo Molnar wrote:
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -1628,7 +1628,7 @@ enum bhi_mitigations {
> > };
> >
> > static enum bhi_mitigations bhi_mitigation __ro_after_init =
> > - IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
> > + IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
>
> Uhm, after this patch there's no CONFIG_MITIGATION_SPECTRE_BHI_ON anymore,
> which is kind of nasty, as IS_ENABLED() doesn't generate a build failure for
> non-existent Kconfig variables IIRC ...
>
> So AFAICT this patch turns on BHI unconditionally.
>
> I've fixed as per the patch below, but please double check the end result
> in tip:x86/urgent once I've pushed it out..

Oof, thanks. The result in tip looks good.

We should add a permanent build time check to catch cases like this.
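
FWIW the tree already has a checker for this class of bug:
scripts/checkkconfigsymbols.py reports references to Kconfig symbols that
no Kconfig file defines. A permanent check could be as simple as running
it and failing on new hits, roughly (exact flags from memory, see the
script's --help):

	./scripts/checkkconfigsymbols.py		# whole-tree scan
	./scripts/checkkconfigsymbols.py -d v6.8..HEAD	# only symbols touched in a range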

--
Josh

2024-04-11 16:00:50

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 3/7] x86/bugs: Fix BHI handling of RRSBA

On Thu, Apr 11, 2024 at 11:02:42AM +0100, Andrew Cooper wrote:
> > /* Disable in-kernel use of non-RSB RET predictors */
> > static void __init spec_ctrl_disable_kernel_rrsba(void)
> > {
> > - u64 ia32_cap;
> > + if (rrsba_disabled)
> > + return;
> > +
> > + if (!(ia32_cap & ARCH_CAP_RRSBA)) {
> > + rrsba_disabled = true;
> > + return;
> > + }
>
> You'll take this path if you have out-of-date microcode.
>
> RRSBA is only enumerated from September last year, IIRC.  (Definitely
> from this point on some CPUs.)
>
> When RRSBA was introduced, I was under the (false) impression that all
> eIBRS systems suffered RRSBA, but it turns out that select parts
> (ICX,TGL,RKL) are non-RRSBA despite not having RRSBA_CTRL.

Hm, so the original code here had this problem too, right?

if (cpu_feature_enabled(X86_FEATURE_RETPOLINE) &&
!(x86_read_arch_cap_msr() & ARCH_CAP_RRSBA))
return;

At this point I'm having a hard time caring about 7 months out-of-date
microcode, but is there a reasonable way to check for that?

--
Josh

2024-04-11 16:38:48

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On Thu, Apr 11, 2024 at 11:06:37AM +0100, Andrew Cooper wrote:
> > +#define __do_syscall(table, func_direct, nr, regs) \
> > +({ \
> > + unsigned long __rax, __rdi, __rsi; \
> > + \
> > + asm_inline volatile( \
> > + ALTERNATIVE("call " __stringify(func_direct) "\n\t", \
> > + ANNOTATE_RETPOLINE_SAFE \
> > + "call *%[func_ptr]\n\t", \
>
> This wants to be a plain maybe-thunk'd indirect call, and without the
> ANNOTATE_RETPOLINE_SAFE.
>
> Or you're going to get into cases where some combinations of command
> line options do unexpected things e.g. retpolining everything except the
> syscall dispatch.

In that case won't X86_FEATURE_INDIRECT_SAFE get cleared, resulting in
the above using a direct call? Or did I miss something?

--
Josh

2024-04-12 00:15:37

by Pawan Gupta

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On Wed, Apr 10, 2024 at 10:40:49PM -0700, Josh Poimboeuf wrote:
> Syscall hardening (i.e., converting the syscall indirect branch to a
> series of direct branches) may cause performance regressions in certain
> scenarios. Only use the syscall hardening when indirect branches are
> considered unsafe.
>
> Fixes: 1e3ad78334a6 ("x86/syscall: Don't force use of indirect calls for system calls")
> Signed-off-by: Josh Poimboeuf <[email protected]>
> ---
> arch/x86/entry/common.c | 30 +++++++++++++++++++++++++---
> arch/x86/entry/syscall_32.c | 11 +---------
> arch/x86/entry/syscall_64.c | 8 +-------
> arch/x86/entry/syscall_x32.c | 7 ++++++-
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/syscall.h | 8 +++++++-
> arch/x86/kernel/cpu/bugs.c | 32 +++++++++++++++++++++++++++++-
> 7 files changed, 74 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 6de50b80702e..80d432d2fe44 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -39,6 +39,28 @@
>
> #ifdef CONFIG_X86_64
>
> +/*
> + * Do either a direct or an indirect call, depending on whether indirect calls
> + * are considered safe.
> + */
> +#define __do_syscall(table, func_direct, nr, regs) \
> +({ \
> + unsigned long __rax, __rdi, __rsi; \
> + \
> + asm_inline volatile( \
> + ALTERNATIVE("call " __stringify(func_direct) "\n\t", \
> + ANNOTATE_RETPOLINE_SAFE \
> + "call *%[func_ptr]\n\t", \

This will likely not insert the lfence before the indirect call in
spectre_v2=eibrs,lfence mode. As X86_FEATURE_INDIRECT_SAFE is not
cleared when eIBRS is enabled, this will not be converted to a direct
call.
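
Concretely, assuming X86_FEATURE_INDIRECT_SAFE stays set, the patched call
site ends up as roughly:

	call *%[func_ptr]	; bare indirect call: no lfence in front, and
				; ANNOTATE_RETPOLINE_SAFE keeps objtool from
				; flagging that it bypasses the thunk

while in this mode every compiler-generated indirect call goes through an
__x86_indirect_thunk_* that has been patched to do "lfence; jmp *reg".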

[...]
> @@ -1720,6 +1744,7 @@ static void __init spectre_v2_select_mitigation(void)
>
> case SPECTRE_V2_CMD_RETPOLINE_LFENCE:
> pr_err(SPECTRE_V2_LFENCE_MSG);
> + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);

I don't know if it is intentional, but this seems to be a duplicate of
the X86_FEATURE_INDIRECT_SAFE clearing done later for SPECTRE_V2_LFENCE
mode. Also it seems a bit odd to do this here in the SPECTRE_V2_CMD
handling.

> mode = SPECTRE_V2_LFENCE;
> break;
>
> @@ -1772,11 +1797,16 @@ static void __init spectre_v2_select_mitigation(void)
> break;
>
> case SPECTRE_V2_LFENCE:
> + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
> + fallthrough;
> case SPECTRE_V2_EIBRS_LFENCE:
> setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
> - fallthrough;
> + setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> + break;
>
> case SPECTRE_V2_RETPOLINE:
> + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
> + fallthrough;
> case SPECTRE_V2_EIBRS_RETPOLINE:
> setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> break;

2024-04-12 03:57:59

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On Thu, Apr 11, 2024 at 05:15:22PM -0700, Pawan Gupta wrote:
> > + * Do either a direct or an indirect call, depending on whether indirect calls
> > + * are considered safe.
> > + */
> > +#define __do_syscall(table, func_direct, nr, regs) \
> > +({ \
> > + unsigned long __rax, __rdi, __rsi; \
> > + \
> > + asm_inline volatile( \
> > + ALTERNATIVE("call " __stringify(func_direct) "\n\t", \
> > + ANNOTATE_RETPOLINE_SAFE \
> > + "call *%[func_ptr]\n\t", \
>
> This will likely not insert the lfence before the indirect call in
> spectre_v2=eibrs,lfence mode. As X86_FEATURE_INDIRECT_SAFE is not
> cleared when eIBRS is enabled, this will not be converted to a direct
> call.

Hm, I think the problem here is that SPECTRE_V2_EIBRS_LFENCE confusingly
sets X86_FEATURE_RETPOLINE. So the following bit unintentionally takes
effect:

/* Retpoline mitigates against BHI unless the CPU has RRSBA behavior */
if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
spec_ctrl_disable_kernel_rrsba();
if (rrsba_disabled)
return;
}

If RRSBA gets disabled (which is likely), bhi_select_mitigation()
returns early and X86_FEATURE_INDIRECT_SAFE doesn't get cleared.
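
So under spectre_v2=eibrs,lfence on a part with RRSBA_CTRL, the sequence is
(tracing the code above):

	SPECTRE_V2_EIBRS_LFENCE
	  -> setup_force_cpu_cap(X86_FEATURE_RETPOLINE)
	  -> bhi_select_mitigation() calls spec_ctrl_disable_kernel_rrsba()
	  -> rrsba_disabled = true, so bhi_select_mitigation() returns early
	  -> X86_FEATURE_INDIRECT_SAFE stays set, so syscall dispatch remains
	     a bare indirect call with no lfence in front of it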

"LFENCE; CALL" is most definitely not a retpoline, so it's weird for
SPECTRE_V2_EIBRS_LFENCE to be setting X86_FEATURE_RETPOLINE. We should
fix that.

Honestly, I think SPECTRE_V2_EIBRS_LFENCE is obsolete anyway. It was
originally intended to be a BHI mitigation, but the real-world
benchmarks I've seen are showing it to be quite a bit slower than the
actual BHI mitigations.

Plus it's only a partial fix because the speculative window after the
branch can still be big enough to do multiple loads.

For similar reasons I'm thinking we should also remove the non-eIBRS
version (SPECTRE_V2_LFENCE).

I'll make some patches to do that, with warnings printed if somebody
tries to use them. They can just fall back to the (more secure and
generally faster) defaults.

> [...]
> > @@ -1720,6 +1744,7 @@ static void __init spectre_v2_select_mitigation(void)
> >
> > case SPECTRE_V2_CMD_RETPOLINE_LFENCE:
> > pr_err(SPECTRE_V2_LFENCE_MSG);
> > + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
>
> I don't know if it is intentional, but this seems to be a duplicate of
> the X86_FEATURE_INDIRECT_SAFE clearing done later for SPECTRE_V2_LFENCE
> mode. Also it seems a bit odd to do this here in the SPECTRE_V2_CMD
> handling.

Yeah, I accidentally left that in from an earlier implementation. It's
harmless but I'll clean that up too with a new patch unless Ingo wants
to remove that line.

--
Josh

2024-04-12 04:17:39

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On Thu, Apr 11, 2024 at 08:57:42PM -0700, Josh Poimboeuf wrote:
> For similar reasons I'm thinking we should also remove the non-eIBRS
> version (SPECTRE_V2_LFENCE).

Actually I guess that's still the default mitigation for AMD so I'll
leave that one in.

--
Josh

2024-04-12 05:21:08

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On Thu, Apr 11, 2024 at 09:17:27PM -0700, Josh Poimboeuf wrote:
> On Thu, Apr 11, 2024 at 08:57:42PM -0700, Josh Poimboeuf wrote:
> > For similar reasons I'm thinking we should also remove the non-eIBRS
> > version (SPECTRE_V2_LFENCE).
>
> Actually I guess that's still the default mitigation for AMD so I'll
> leave that one in.

Never mind, I forgot that got deprecated for AMD.

--
Josh

2024-04-12 05:27:53

by Pawan Gupta

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On Thu, Apr 11, 2024 at 08:57:40PM -0700, Josh Poimboeuf wrote:
> On Thu, Apr 11, 2024 at 05:15:22PM -0700, Pawan Gupta wrote:
> > > + * Do either a direct or an indirect call, depending on whether indirect calls
> > > + * are considered safe.
> > > + */
> > > +#define __do_syscall(table, func_direct, nr, regs) \
> > > +({ \
> > > + unsigned long __rax, __rdi, __rsi; \
> > > + \
> > > + asm_inline volatile( \
> > > + ALTERNATIVE("call " __stringify(func_direct) "\n\t", \
> > > + ANNOTATE_RETPOLINE_SAFE \
> > > + "call *%[func_ptr]\n\t", \
> >
> > This will likely not insert the lfence before the indirect call in
> > spectre_v2=eibrs,lfence mode. As X86_FEATURE_INDIRECT_SAFE is not
> > cleared when eIBRS is enabled, this will not be converted to a direct
> > call.
>
> Hm, I think the problem here is that SPECTRE_V2_EIBRS_LFENCE confusingly
> sets X86_FEATURE_RETPOLINE. So the following bit unintentionally takes

I think it is intentional, more on it below.

> effect:
>
> /* Retpoline mitigates against BHI unless the CPU has RRSBA behavior */
> if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
> spec_ctrl_disable_kernel_rrsba();
> if (rrsba_disabled)
> return;
> }
>
> If RRSBA gets disabled (which is likely), bhi_select_mitigation()
> returns early and X86_FEATURE_INDIRECT_SAFE doesn't get cleared.
>
> "LFENCE; CALL" is most definitely not a retpoline, so it's weird for
> SPECTRE_V2_EIBRS_LFENCE to be setting X86_FEATURE_RETPOLINE. We should
> fix that.

I could be completely wrong here, but my guess is that it is needed
because the thunk call inserted by the compiler with
X86_FEATURE_RETPOLINE provides room for adding the extra lfence.

In order to prefix the lfence (3 bytes), the indirect call is first
converted to a call to __x86_indirect_thunk_reg, which has a 5-byte
opcode. At runtime, the thunk call is patched to "lfence; call *reg",
which is also 3+2=5 bytes.

The thunk call is needed anyway because some indirect call opcodes are
only 3 bytes long, e.g. call *%r8. So wherever possible lfence+call* is
inlined; otherwise the lfence is executed via a call to the thunk, which
then does jmp *%reg.
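
For concreteness, the encodings involved (a worked illustration; byte
counts per the Intel SDM):

	e8 xx xx xx xx		call __x86_indirect_thunk_rax	; 5 bytes
	0f ae e8		lfence				; 3 bytes
	ff d0			call *%rax			; 2 bytes (3+2=5 fits the slot)
	41 ff d0		call *%r8			; 3 bytes (3+3=6 doesn't fit)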

> Honestly, I think SPECTRE_V2_EIBRS_LFENCE is obsolete anyway. It was
> originally intended to be a BHI mitigation, but the real-world
> benchmarks I've seen are showing it to be quite a bit slower than the
> actual BHI mitigations.
>
> Plus it's only a partial fix because the speculative window after the
> branch can still be big enough to do multiple loads.

Thats fair.

2024-04-12 06:29:09

by Pawan Gupta

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On Wed, Apr 10, 2024 at 10:40:49PM -0700, Josh Poimboeuf wrote:
> Syscall hardening (i.e., converting the syscall indirect branch to a
> series of direct branches) may cause performance regressions in certain
> scenarios. Only use the syscall hardening when indirect branches are
> considered unsafe.
>
> Fixes: 1e3ad78334a6 ("x86/syscall: Don't force use of indirect calls for system calls")
> Signed-off-by: Josh Poimboeuf <[email protected]>
> ---
> arch/x86/entry/common.c | 30 +++++++++++++++++++++++++---
> arch/x86/entry/syscall_32.c | 11 +---------
> arch/x86/entry/syscall_64.c | 8 +-------
> arch/x86/entry/syscall_x32.c | 7 ++++++-
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/syscall.h | 8 +++++++-
> arch/x86/kernel/cpu/bugs.c | 32 +++++++++++++++++++++++++++++-
> 7 files changed, 74 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 6de50b80702e..80d432d2fe44 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -39,6 +39,28 @@
>
> #ifdef CONFIG_X86_64
>
> +/*
> + * Do either a direct or an indirect call, depending on whether indirect calls
> + * are considered safe.
> + */
> +#define __do_syscall(table, func_direct, nr, regs) \
> +({ \
> + unsigned long __rax, __rdi, __rsi; \
> + \
> + asm_inline volatile( \
> + ALTERNATIVE("call " __stringify(func_direct) "\n\t", \
> + ANNOTATE_RETPOLINE_SAFE \
> + "call *%[func_ptr]\n\t", \
> + X86_FEATURE_INDIRECT_SAFE) \
> + : "=D" (__rdi), "=S" (__rsi), "=a" (__rax), \
> + ASM_CALL_CONSTRAINT \
> + : "0" (regs), "1" (nr), [func_ptr] "r" (table[nr]) \
> + : "rdx", "rcx", "r8", "r9", "r10", "r11", \
> + "cc", "memory"); \
> + \
> + __rax; \
> +})

This is a nice implementation, but I think we can avoid the complexity
by using cpu_feature_enabled(). As cpu_feature_enabled() is also runtime
patched, at least the likely() path should be comparable to this. Please
let me know if you have any concerns with this approach.

---
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6de50b80702e..7c5332b83246 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -49,7 +49,11 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)

if (likely(unr < NR_syscalls)) {
unr = array_index_nospec(unr, NR_syscalls);
- regs->ax = x64_sys_call(regs, unr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_SAFE)))
+ regs->ax = sys_call_table[unr](regs);
+ else
+ regs->ax = x64_sys_call(regs, unr);
+
return true;
}
return false;
@@ -66,7 +70,11 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)

if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
xnr = array_index_nospec(xnr, X32_NR_syscalls);
- regs->ax = x32_sys_call(regs, xnr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_SAFE)))
+ regs->ax = x32_sys_call_table[xnr](regs);
+ else
+ regs->ax = x32_sys_call(regs, xnr);
+
return true;
}
return false;
@@ -162,7 +170,10 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)

if (likely(unr < IA32_NR_syscalls)) {
unr = array_index_nospec(unr, IA32_NR_syscalls);
- regs->ax = ia32_sys_call(regs, unr);
+ if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_SAFE)))
+ regs->ax = ia32_sys_call_table[unr](regs);
+ else
+ regs->ax = ia32_sys_call(regs, unr);
} else if (nr != -1) {
regs->ax = __ia32_sys_ni_syscall(regs);
}
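
For reference on why the likely() path stays cheap: cpu_feature_enabled()
folds to a constant for build-time-disabled features and otherwise compiles
down to a boot-time-patched branch. A condensed sketch of the pattern,
loosely paraphrased from arch/x86/include/asm/cpufeature.h (details elided):

/*
 * Sketch only: the real static_cpu_has() is an asm-goto ALTERNATIVE whose
 * default jump to the slow dynamic test is patched away once alternatives
 * are applied, so the steady-state cost is roughly one nop/jmp.
 */
#define cpu_feature_enabled(bit)					\
	(__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ?	\
	 0 : static_cpu_has(bit))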

2024-04-12 06:38:02

by Pawan Gupta

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On Thu, Apr 11, 2024 at 11:28:54PM -0700, Pawan Gupta wrote:
> > +#define __do_syscall(table, func_direct, nr, regs) \
> > +({ \
> > + unsigned long __rax, __rdi, __rsi; \
> > + \
> > + asm_inline volatile( \
> > + ALTERNATIVE("call " __stringify(func_direct) "\n\t", \
> > + ANNOTATE_RETPOLINE_SAFE \
> > + "call *%[func_ptr]\n\t", \
> > + X86_FEATURE_INDIRECT_SAFE) \
> > + : "=D" (__rdi), "=S" (__rsi), "=a" (__rax), \
> > + ASM_CALL_CONSTRAINT \
> > + : "0" (regs), "1" (nr), [func_ptr] "r" (table[nr]) \
> > + : "rdx", "rcx", "r8", "r9", "r10", "r11", \
> > + "cc", "memory"); \
> > + \
> > + __rax; \
> > +})
>
> This is a nice implementation, but I think we can avoid the complexity
> by using cpu_feature_enabled(). As cpu_feature_enabled() is also
> runtime-patched, at least the likely() path should be comparable to
> this. Please let me know if you have any concerns with this approach.
>
> ---
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 6de50b80702e..7c5332b83246 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -49,7 +49,11 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
>
> if (likely(unr < NR_syscalls)) {
> unr = array_index_nospec(unr, NR_syscalls);
> - regs->ax = x64_sys_call(regs, unr);
> + if (likely(cpu_feature_enabled(X86_FEATURE_INDIRECT_SAFE)))
> + regs->ax = sys_call_table[unr](regs);
> + else
> + regs->ax = x64_sys_call(regs, unr);

BTW, this also solves the missing lfence case before the indirect call.
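
To illustrate (a rough sketch, not disassembly of any particular build):
when built with retpolines enabled, the compiler emits the C indirect call
via a thunk, and the kernel patches it at boot (at the call site or in the
thunk body) to match the spectre_v2= mode:

	mov	sys_call_table(,%rax,8), %rax
	call	__x86_indirect_thunk_rax	# becomes a bare
						# 'call *%rax', a full
						# retpoline, or an
						# lfence'd call

whereas the raw "call *%[func_ptr]" in the proposed macro carries
ANNOTATE_RETPOLINE_SAFE and is deliberately exempt from all of that.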

2024-04-12 10:08:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed


* Josh Poimboeuf <[email protected]> wrote:

> > [...]
> > > @@ -1720,6 +1744,7 @@ static void __init spectre_v2_select_mitigation(void)
> > >
> > > case SPECTRE_V2_CMD_RETPOLINE_LFENCE:
> > > pr_err(SPECTRE_V2_LFENCE_MSG);
> > > + setup_clear_cpu_cap(X86_FEATURE_INDIRECT_SAFE);
> >
> > I don't know if it is intentional, but this seems to be a duplicate of
> > the X86_FEATURE_INDIRECT_SAFE clear later in SPECTRE_V2_LFENCE mode. It
> > also seems a bit odd to do this here in SPECTRE_V2_CMD handling.
>
> Yeah, I accidentally left that in from an earlier implementation. It's
> harmless but I'll clean that up too with a new patch unless Ingo wants to
> remove that line.

Lemme remove it entirely from x86/urgent, so that you can submit an updated
patch with all feedback included.

In addition to the above line, Pawan's suggestion of doing it in C via
cpu_feature_enabled() looks quite a bit simpler and easier to read & argue
about, right?

Thanks,

Ingo

Subject: [tip: x86/urgent] x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 4f511739c54b549061993b53fc0380f48dfca23b
Gitweb: https://git.kernel.org/tip/4f511739c54b549061993b53fc0380f48dfca23b
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Wed, 10 Apr 2024 22:40:51 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 12 Apr 2024 12:05:54 +02:00

x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI

For consistency with the other CONFIG_MITIGATION_* options, replace the
CONFIG_SPECTRE_BHI_{ON,OFF} options with a single
CONFIG_MITIGATION_SPECTRE_BHI option.

[ mingo: Fix ]

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Link: https://lore.kernel.org/r/3833812ea63e7fdbe36bf8b932e63f70d18e2a2a.1712813475.git.jpoimboe@kernel.org
---
arch/x86/Kconfig | 17 +++--------------
arch/x86/kernel/cpu/bugs.c | 2 +-
2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b63b676..4474bf3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2633,27 +2633,16 @@ config MITIGATION_RFDS
stored in floating point, vector and integer registers.
See also <file:Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst>

-choice
- prompt "Clear branch history"
+config MITIGATION_SPECTRE_BHI
+ bool "Mitigate Spectre-BHB (Branch History Injection)"
depends on CPU_SUP_INTEL
- default SPECTRE_BHI_ON
+ default y
help
Enable BHI mitigations. BHI attacks are a form of Spectre V2 attacks
where the branch history buffer is poisoned to speculatively steer
indirect branches.
See <file:Documentation/admin-guide/hw-vuln/spectre.rst>

-config SPECTRE_BHI_ON
- bool "on"
- help
- Equivalent to setting spectre_bhi=on command line parameter.
-config SPECTRE_BHI_OFF
- bool "off"
- help
- Equivalent to setting spectre_bhi=off command line parameter.
-
-endchoice
-
endif

config ARCH_HAS_ADD_PAGES
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 6af4780..ca295b0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1628,7 +1628,7 @@ enum bhi_mitigations {
};

static enum bhi_mitigations bhi_mitigation __ro_after_init =
- IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
+ IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;

static int __init spectre_bhi_parse_cmdline(char *str)
{

Subject: [tip: x86/urgent] x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 36d4fe147c870f6d3f6602befd7ef44393a1c87a
Gitweb: https://git.kernel.org/tip/36d4fe147c870f6d3f6602befd7ef44393a1c87a
Author: Josh Poimboeuf <[email protected]>
AuthorDate: Wed, 10 Apr 2024 22:40:50 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 12 Apr 2024 12:05:54 +02:00

x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto

Unlike most other mitigations' "auto" options, spectre_bhi=auto only
mitigates newer systems, which is confusing and not particularly useful.

Remove it.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/412e9dc87971b622bbbaf64740ebc1f140bff343.1712813475.git.jpoimboe@kernel.org
---
Documentation/admin-guide/hw-vuln/spectre.rst | 4 ----
Documentation/admin-guide/kernel-parameters.txt | 3 ---
arch/x86/Kconfig | 5 -----
arch/x86/kernel/cpu/bugs.c | 10 +---------
4 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 5a39acf..25a04cd 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -669,10 +669,6 @@ kernel command line.
needed.
off
Disable the mitigation.
- auto
- Enable the HW mitigation if needed, but
- *don't* enable the SW mitigation except for KVM.
- The system may be vulnerable.

For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a3874cc..902ecd9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6072,9 +6072,6 @@
on - (default) Enable the HW or SW mitigation
as needed.
off - Disable the mitigation.
- auto - Enable the HW mitigation if needed, but
- *don't* enable the SW mitigation except
- for KVM. The system may be vulnerable.

spectre_v2= [X86,EARLY] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 10a6251..b63b676 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2651,11 +2651,6 @@ config SPECTRE_BHI_OFF
bool "off"
help
Equivalent to setting spectre_bhi=off command line parameter.
-config SPECTRE_BHI_AUTO
- bool "auto"
- depends on BROKEN
- help
- Equivalent to setting spectre_bhi=auto command line parameter.

endchoice

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 9eeb60f..6af4780 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1625,13 +1625,10 @@ static bool __init spec_ctrl_bhi_dis(void)
enum bhi_mitigations {
BHI_MITIGATION_OFF,
BHI_MITIGATION_ON,
- BHI_MITIGATION_AUTO,
};

static enum bhi_mitigations bhi_mitigation __ro_after_init =
- IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON :
- IS_ENABLED(CONFIG_SPECTRE_BHI_OFF) ? BHI_MITIGATION_OFF :
- BHI_MITIGATION_AUTO;
+ IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;

static int __init spectre_bhi_parse_cmdline(char *str)
{
@@ -1642,8 +1639,6 @@ static int __init spectre_bhi_parse_cmdline(char *str)
bhi_mitigation = BHI_MITIGATION_OFF;
else if (!strcmp(str, "on"))
bhi_mitigation = BHI_MITIGATION_ON;
- else if (!strcmp(str, "auto"))
- bhi_mitigation = BHI_MITIGATION_AUTO;
else
pr_err("Ignoring unknown spectre_bhi option (%s)", str);

@@ -1673,9 +1668,6 @@ static void __init bhi_select_mitigation(void)
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");

- if (bhi_mitigation == BHI_MITIGATION_AUTO)
- return;
-
/* Mitigate syscalls when the mitigation is forced =on */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP);
pr_info("Spectre BHI mitigation: SW BHB clearing on syscall\n");

2024-04-12 10:25:07

by Andrew Cooper

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On 11/04/2024 4:38 pm, Josh Poimboeuf wrote:
> On Thu, Apr 11, 2024 at 11:06:37AM +0100, Andrew Cooper wrote:
>>> +#define __do_syscall(table, func_direct, nr, regs) \
>>> +({ \
>>> + unsigned long __rax, __rdi, __rsi; \
>>> + \
>>> + asm_inline volatile( \
>>> + ALTERNATIVE("call " __stringify(func_direct) "\n\t", \
>>> + ANNOTATE_RETPOLINE_SAFE \
>>> + "call *%[func_ptr]\n\t", \
>> This wants to be a plain maybe-thunk'd indirect call, and without the
>> ANNOTATE_RETPOLINE_SAFE.
>>
>> Or you're going to get into cases where some combinations of command
>> line options do unexpected things e.g. retpolining everything except the
>> syscall dispatch.
> In that case won't X86_FEATURE_INDIRECT_SAFE get cleared, resulting in
> the above using a direct call? Or did I miss something?

That works until the next time anyone touches the logic.  Then it's a
latent vulnerability, or an incorrect trial of a non-default option.

I guarantee you'll save someone (probably someone on this CC list) a
headache in the future by not introducing an unnecessary special case here.

~Andrew

2024-04-12 10:38:18

by Andrew Cooper

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On 12/04/2024 6:20 am, Josh Poimboeuf wrote:
> On Thu, Apr 11, 2024 at 09:17:27PM -0700, Josh Poimboeuf wrote:
>> On Thu, Apr 11, 2024 at 08:57:42PM -0700, Josh Poimboeuf wrote:
>>> For similar reasons I'm thinking we should also remove the non-eIBRS
>>> version (SPECTRE_V2_LFENCE).
>> Actually I guess that's still the default mitigation for AMD so I'll
>> leave that one in.
> Never mind, I forgot that got deprecated for AMD.

And then became necessary on two Atoms, although I can't for the life of
me find Intel's footnote about this in the maze of speculation docs...

~Andrew

2024-04-12 20:24:50

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH 5/7] x86/bugs: Only harden syscalls when needed

On Fri, Apr 12, 2024 at 11:36:04AM +0100, Andrew Cooper wrote:
> On 12/04/2024 6:20 am, Josh Poimboeuf wrote:
> > On Thu, Apr 11, 2024 at 09:17:27PM -0700, Josh Poimboeuf wrote:
> >> On Thu, Apr 11, 2024 at 08:57:42PM -0700, Josh Poimboeuf wrote:
> >>> For similar reasons I'm thinking we should also remove the non-eIBRS
> >>> version (SPECTRE_V2_LFENCE).
> >> Actually I guess that's still the default mitigation for AMD so I'll
> >> leave that one in.
> > Never mind, I forgot that got deprecated for AMD.
>
> And then became necessary on two Atoms, although I can't for the life of
> me find Intel's footnote about this in the maze of speculation docs...

Found it on this page [1], but it doesn't seem to be a very confident
endorsement. And Linux doesn't seem to enable it for those parts
regardless.

Intel® Atom Goldmont Plus and Tremont Mitigation

Retpoline may not be a fully effective branch target injection
mitigation on processors which are based on Intel Atom
microarchitectures code named Goldmont Plus and Tremont, as documented
in our existing guidance. On such processors, an LFENCE;JMP sequence may
be an alternative for retpoline, although this is not architecturally
guaranteed. Instructions may still be speculatively executed at the
predicted near JMP target, which can allow some forms of shallow gadgets
(for example, revealing register values) to be transiently executed.

Intel is not currently evaluating LFENCE;JMP as an option other than for
processors based on Goldmont Plus and Tremont microarchitectures, given
the possibility of a sufficiently large transient window to execute a
disclosure gadget.

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
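
For reference, the two constructions being compared (a sketch of the
well-known sequences, not lifted from any particular kernel tree):

	/* Retpoline: bad speculation is trapped in the pause/lfence loop */
__x86_indirect_thunk_rax:
	call	1f
2:	pause
	lfence
	jmp	2b
1:	mov	%rax, (%rsp)
	ret

	/* LFENCE;JMP: cheaper, but only narrows the speculation window */
	lfence
	jmp	*%rax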

--
Josh

2024-04-17 05:36:28

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH 7/7] x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI

Hi Ingo,

On 4/11/2024 1:18 AM, Ingo Molnar wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
>>> static enum bhi_mitigations bhi_mitigation __ro_after_init =
>>> - IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
>>> + IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON : BHI_MITIGATION_OFF;
>>
>> Uhm, after this patch there's no CONFIG_MITIGATION_SPECTRE_BHI_ON
>> anymore, which is kindof nasty, as IS_ENABLED() doesn't generate a build
>> failure for non-existent Kconfig variables IIRC ...
>>
>> So AFAICT this patch turns on BHI unconditionally.
>
> BTW., this is why IS_ENABLED() is a bad primitive IMO:
>
> kepler:~/tip> for N in $(git grep -w IS_ENABLED arch/x86/ | sed 's/.*IS_ENABLED(//g' | sed 's/).*//g' | sort | uniq | sed 's/^CONFIG_//g'); do printf "# %40s: " $N; git grep -E "^config $N$" -- '**Kconfig**' | wc -l; done | grep -w 0
> # RESCTRL_RMID_DEPENDS_ON_CLOSID: 0
> # NODE_NOT_IN_PAGE_FLAGS: 0
>
> 1)
>
> CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID doesn't exist anymore, but is used
> widely:
>
> kepler:~/tip> git grep RESCTRL_RMID_DEPENDS_ON_CLOSID
> arch/x86/kernel/cpu/resctrl/monitor.c: * Only allocated when CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is defined.
> arch/x86/kernel/cpu/resctrl/monitor.c: if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
> arch/x86/kernel/cpu/resctrl/monitor.c: if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
> arch/x86/kernel/cpu/resctrl/monitor.c: if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
> arch/x86/kernel/cpu/resctrl/monitor.c: if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
> arch/x86/kernel/cpu/resctrl/monitor.c: if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
> arch/x86/kernel/cpu/resctrl/monitor.c: if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c: if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
>
> Each of those uses is bogus, as the Kconfig symbol doesn't exist. (!)
>
> AFAICT CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID was never defined within the
> upstream kernel (!!).
>
> AFAICT the first bogus CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID use was
> introduced in this recent commit:
>
> b30a55df60c3 ("x86/resctrl: Track the number of dirty RMID a CLOSID has")
>
> ... and was cargo-cult copied in other patches. It was never explained in
> the changelog why it's used without a Kconfig entry anywhere.
>
> Maybe in the future some other arch might (or might not) introduce
> RESCTRL_RMID_DEPENDS_ON_CLOSID, but that doesn't justify this bad pattern
> of dead code ...
>
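
The silent zero comes from the token-pasting trick IS_ENABLED() is built
on. A condensed sketch of the include/linux/kconfig.h machinery, slightly
simplified (the real IS_ENABLED() also handles the =m case):

#define __ARG_PLACEHOLDER_1		0,
#define __take_second_arg(__ignored, val, ...)	val
#define IS_ENABLED(option)		__is_defined(option)
#define __is_defined(x)			___is_defined(x)
#define ___is_defined(val)		____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk)	__take_second_arg(arg1_or_junk 1, 0)

A symbol Kconfig defines to 1 pastes into __ARG_PLACEHOLDER_1, i.e. "0,",
which shifts the 1 into the second argument slot. A misspelled or removed
CONFIG_* pastes into an undefined token, the argument list becomes
"junk 1, 0", and the whole expression quietly evaluates to 0 rather than
failing the build.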

We are in the final stage [1] of untangling the resctrl filesystem code from the
x86-specific code, with the goal that the resctrl interface can also be used for
Arm's MPAM (Memory System Resource Partitioning and Monitoring), which is similar
enough to AMD's Platform Quality of Service (PQOS) and Intel's Resource Director
Technology (RDT), for which resctrl is currently used.

During this work we incrementally separated the resctrl filesystem from the x86-
specific code. RESCTRL_RMID_DEPENDS_ON_CLOSID was introduced during this effort
to prepare the resctrl filesystem for a fundamental difference between x86 and
Arm, and you are correct that it is dead code until the Arm support arrives. The
Kconfig entry is currently being reviewed [2] as part of the portion where the
resctrl filesystem structure starts to take shape, but it remains without a user
until the Arm support arrives in the next part of this effort.

I do believe there is precedent for subsystems gaining support for features a
couple of kernel versions before users of those features appear. With Arm MPAM
support requiring this capability from the new "resctrl filesystem", it did seem
reasonable at the time to add the capability in preparation for Arm support.

Reinette

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/