2022-07-12 19:37:02

by Greg Kroah-Hartman

Subject: [PATCH 5.15 00/78] 5.15.55-rc1 review

This is the start of the stable review cycle for the 5.15.55 release.
There are 78 patches in this series, all of which will be posted as
responses to this one. If anyone has any issues with these being applied,
please let me know.

Responses should be made by Thu, 14 Jul 2022 18:32:19 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.55-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 5.15.55-rc1

Thomas Gleixner <[email protected]>
x86/static_call: Serialize __static_call_fixup() properly

Pawan Gupta <[email protected]>
x86/speculation: Disable RRSBA behavior

Konrad Rzeszutek Wilk <[email protected]>
x86/kexec: Disable RET on kexec

Thadeu Lima de Souza Cascardo <[email protected]>
x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported

Peter Zijlstra <[email protected]>
x86/entry: Move PUSH_AND_CLEAR_REGS() back into error_entry

Pawan Gupta <[email protected]>
x86/bugs: Add Cannon lake to RETBleed affected CPU list

Peter Zijlstra <[email protected]>
x86/retbleed: Add fine grained Kconfig knobs

Andrew Cooper <[email protected]>
x86/cpu/amd: Enumerate BTC_NO

Peter Zijlstra <[email protected]>
x86/common: Stamp out the stepping madness

Josh Poimboeuf <[email protected]>
x86/speculation: Fill RSB on vmexit for IBRS

Josh Poimboeuf <[email protected]>
KVM: VMX: Fix IBRS handling after vmexit

Josh Poimboeuf <[email protected]>
KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS

Josh Poimboeuf <[email protected]>
KVM: VMX: Convert launched argument to flags

Josh Poimboeuf <[email protected]>
KVM: VMX: Flatten __vmx_vcpu_run()

Josh Poimboeuf <[email protected]>
objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}

Josh Poimboeuf <[email protected]>
x86/speculation: Remove x86_spec_ctrl_mask

Josh Poimboeuf <[email protected]>
x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit

Josh Poimboeuf <[email protected]>
x86/speculation: Fix SPEC_CTRL write on SMT state change

Josh Poimboeuf <[email protected]>
x86/speculation: Fix firmware entry SPEC_CTRL handling

Josh Poimboeuf <[email protected]>
x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n

Peter Zijlstra <[email protected]>
x86/cpu/amd: Add Spectral Chicken

Peter Zijlstra <[email protected]>
objtool: Add entry UNRET validation

Josh Poimboeuf <[email protected]>
x86/bugs: Do IBPB fallback check only once

Peter Zijlstra <[email protected]>
x86/bugs: Add retbleed=ibpb

Peter Zijlstra <[email protected]>
x86/xen: Add UNTRAIN_RET

Peter Zijlstra <[email protected]>
x86/xen: Rename SYS* entry points

Peter Zijlstra <[email protected]>
objtool: Update Retpoline validation

Peter Zijlstra <[email protected]>
intel_idle: Disable IBRS during long idle

Peter Zijlstra <[email protected]>
x86/bugs: Report Intel retbleed vulnerability

Peter Zijlstra <[email protected]>
x86/bugs: Split spectre_v2_select_mitigation() and spectre_v2_user_select_mitigation()

Pawan Gupta <[email protected]>
x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS

Peter Zijlstra <[email protected]>
x86/bugs: Optimize SPEC_CTRL MSR writes

Thadeu Lima de Souza Cascardo <[email protected]>
x86/entry: Add kernel IBRS implementation

Peter Zijlstra <[email protected]>
x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value

Kim Phillips <[email protected]>
x86/bugs: Enable STIBP for JMP2RET

Alexandre Chartre <[email protected]>
x86/bugs: Add AMD retbleed= boot parameter

Alexandre Chartre <[email protected]>
x86/bugs: Report AMD retbleed vulnerability

Peter Zijlstra <[email protected]>
x86: Add magic AMD return-thunk

Peter Zijlstra <[email protected]>
objtool: Treat .text.__x86.* as noinstr

Peter Zijlstra <[email protected]>
x86/entry: Avoid very early RET

Peter Zijlstra <[email protected]>
x86: Use return-thunk in asm code

Kim Phillips <[email protected]>
x86/sev: Avoid using __x86_return_thunk

Peter Zijlstra <[email protected]>
x86/vsyscall_emu/64: Don't use RET in vsyscall emulation

Peter Zijlstra <[email protected]>
x86/kvm: Fix SETcc emulation for return thunks

Peter Zijlstra <[email protected]>
x86/bpf: Use alternative RET encoding

Peter Zijlstra <[email protected]>
x86/ftrace: Use alternative RET encoding

Peter Zijlstra <[email protected]>
x86,static_call: Use alternative RET encoding

Thadeu Lima de Souza Cascardo <[email protected]>
objtool: skip non-text sections when adding return-thunk sites

Peter Zijlstra <[email protected]>
x86,objtool: Create .return_sites

Peter Zijlstra <[email protected]>
x86: Undo return-thunk damage

Peter Zijlstra <[email protected]>
x86/retpoline: Use -mfunction-return

Peter Zijlstra <[email protected]>
x86/retpoline: Swizzle retpoline thunk

Peter Zijlstra <[email protected]>
x86/retpoline: Cleanup some #ifdefery

Peter Zijlstra <[email protected]>
x86/cpufeatures: Move RETPOLINE flags to word 11

Peter Zijlstra <[email protected]>
x86/kvm/vmx: Make noinstr clean

Thadeu Lima de Souza Cascardo <[email protected]>
x86/realmode: build with -D__DISABLE_EXPORTS

Peter Zijlstra <[email protected]>
x86/entry: Remove skip_r11rcx

Peter Zijlstra <[email protected]>
objtool: Default ignore INT3 for unreachable

Peter Zijlstra <[email protected]>
bpf,x86: Respect X86_FEATURE_RETPOLINE*

Peter Zijlstra <[email protected]>
bpf,x86: Simplify computing label offsets

Peter Zijlstra <[email protected]>
x86/alternative: Add debug prints to apply_retpolines()

Peter Zijlstra <[email protected]>
x86/alternative: Try inline spectre_v2=retpoline,amd

Peter Zijlstra <[email protected]>
x86/alternative: Handle Jcc __x86_indirect_thunk_\reg

Peter Zijlstra <[email protected]>
x86/alternative: Implement .retpoline_sites support

Peter Zijlstra <[email protected]>
x86/retpoline: Create a retpoline thunk array

Peter Zijlstra <[email protected]>
x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h

Peter Zijlstra <[email protected]>
x86/asm: Fixup odd GEN-for-each-reg.h usage

Peter Zijlstra <[email protected]>
x86/asm: Fix register order

Peter Zijlstra <[email protected]>
x86/retpoline: Remove unused replacement symbols

Peter Zijlstra <[email protected]>
objtool: Introduce CFI hash

Peter Zijlstra <[email protected]>
objtool,x86: Replace alternatives with .retpoline_sites

Peter Zijlstra <[email protected]>
objtool: Shrink struct instruction

Peter Zijlstra <[email protected]>
objtool: Explicitly avoid self modifying code in .altinstr_replacement

Peter Zijlstra <[email protected]>
objtool: Classify symbols

Lai Jiangshan <[email protected]>
x86/entry: Don't call error_entry() for XENPV

Lai Jiangshan <[email protected]>
x86/entry: Move PUSH_AND_CLEAR_REGS out of error_entry()

Lai Jiangshan <[email protected]>
x86/entry: Switch the stack after error_entry() returns

Lai Jiangshan <[email protected]>
x86/traps: Use pt_regs directly in fixup_bad_iret()


-------------

Diffstat:

Documentation/admin-guide/kernel-parameters.txt | 25 +
Makefile | 10 +-
arch/um/kernel/um_arch.c | 4 +
arch/x86/Kconfig | 103 +++-
arch/x86/Makefile | 2 +-
arch/x86/entry/Makefile | 2 +-
arch/x86/entry/calling.h | 72 ++-
arch/x86/entry/entry.S | 22 +
arch/x86/entry/entry_32.S | 2 -
arch/x86/entry/entry_64.S | 88 ++-
arch/x86/entry/entry_64_compat.S | 21 +-
arch/x86/entry/vdso/Makefile | 1 +
arch/x86/entry/vsyscall/vsyscall_emu_64.S | 9 +-
arch/x86/include/asm/GEN-for-each-reg.h | 14 +-
arch/x86/include/asm/alternative.h | 2 +
arch/x86/include/asm/asm-prototypes.h | 18 -
arch/x86/include/asm/cpufeatures.h | 12 +-
arch/x86/include/asm/disabled-features.h | 21 +-
arch/x86/include/asm/linkage.h | 8 +
arch/x86/include/asm/msr-index.h | 13 +
arch/x86/include/asm/nospec-branch.h | 133 ++---
arch/x86/include/asm/static_call.h | 17 +
arch/x86/include/asm/traps.h | 2 +-
arch/x86/include/asm/unwind_hints.h | 14 +-
arch/x86/kernel/alternative.c | 260 ++++++++-
arch/x86/kernel/cpu/amd.c | 46 +-
arch/x86/kernel/cpu/bugs.c | 475 +++++++++++++---
arch/x86/kernel/cpu/common.c | 61 ++-
arch/x86/kernel/cpu/cpu.h | 2 +
arch/x86/kernel/cpu/hygon.c | 6 +
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/ftrace.c | 7 +-
arch/x86/kernel/head_64.S | 5 +
arch/x86/kernel/module.c | 15 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/relocate_kernel_32.S | 25 +-
arch/x86/kernel/relocate_kernel_64.S | 23 +-
arch/x86/kernel/static_call.c | 49 +-
arch/x86/kernel/traps.c | 19 +-
arch/x86/kernel/vmlinux.lds.S | 23 +-
arch/x86/kvm/emulate.c | 26 +-
arch/x86/kvm/svm/vmenter.S | 18 +
arch/x86/kvm/vmx/nested.c | 2 +-
arch/x86/kvm/vmx/run_flags.h | 8 +
arch/x86/kvm/vmx/vmenter.S | 164 +++---
arch/x86/kvm/vmx/vmx.c | 76 ++-
arch/x86/kvm/vmx/vmx.h | 6 +-
arch/x86/kvm/x86.c | 4 +-
arch/x86/lib/memmove_64.S | 7 +-
arch/x86/lib/retpoline.S | 133 +++--
arch/x86/mm/mem_encrypt_boot.S | 10 +-
arch/x86/net/bpf_jit_comp.c | 179 +++---
arch/x86/net/bpf_jit_comp32.c | 22 +-
arch/x86/xen/setup.c | 6 +-
arch/x86/xen/xen-asm.S | 30 +-
arch/x86/xen/xen-head.S | 1 +
arch/x86/xen/xen-ops.h | 6 +-
drivers/base/cpu.c | 8 +
drivers/idle/intel_idle.c | 43 +-
include/linux/cpu.h | 2 +
include/linux/kvm_host.h | 2 +-
include/linux/objtool.h | 9 +-
scripts/Makefile.build | 1 +
scripts/link-vmlinux.sh | 3 +
security/Kconfig | 11 -
tools/arch/x86/include/asm/msr-index.h | 9 +
tools/include/linux/objtool.h | 9 +-
tools/objtool/arch/x86/decode.c | 145 +----
tools/objtool/builtin-check.c | 4 +-
tools/objtool/check.c | 701 ++++++++++++++++++++----
tools/objtool/elf.c | 84 ---
tools/objtool/include/objtool/arch.h | 3 +-
tools/objtool/include/objtool/builtin.h | 2 +-
tools/objtool/include/objtool/cfi.h | 2 +
tools/objtool/include/objtool/check.h | 10 +-
tools/objtool/include/objtool/elf.h | 9 +-
tools/objtool/include/objtool/objtool.h | 1 +
tools/objtool/objtool.c | 1 +
tools/objtool/orc_gen.c | 15 +-
tools/objtool/special.c | 8 -
80 files changed, 2470 insertions(+), 944 deletions(-)



2022-07-12 19:37:40

by Greg Kroah-Hartman

Subject: [PATCH 5.15 13/78] x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h

From: Peter Zijlstra <[email protected]>

commit 6fda8a38865607db739be3e567a2387376222dbd upstream.

Because it makes no sense to split the retpoline gunk over multiple
headers.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/asm-prototypes.h | 8 --------
arch/x86/include/asm/nospec-branch.h | 7 +++++++
arch/x86/net/bpf_jit_comp.c | 1 -
3 files changed, 7 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -17,11 +17,3 @@
extern void cmpxchg8b_emu(void);
#endif

-#ifdef CONFIG_RETPOLINE
-
-#define GEN(reg) \
- extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
-#include <asm/GEN-for-each-reg.h>
-#undef GEN
-
-#endif /* CONFIG_RETPOLINE */
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -5,6 +5,7 @@

#include <linux/static_key.h>
#include <linux/objtool.h>
+#include <linux/linkage.h>

#include <asm/alternative.h>
#include <asm/cpufeatures.h>
@@ -118,6 +119,12 @@
".popsection\n\t"

#ifdef CONFIG_RETPOLINE
+
+#define GEN(reg) \
+ extern asmlinkage void __x86_indirect_thunk_ ## reg (void);
+#include <asm/GEN-for-each-reg.h>
+#undef GEN
+
#ifdef CONFIG_X86_64

/*
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -15,7 +15,6 @@
#include <asm/set_memory.h>
#include <asm/nospec-branch.h>
#include <asm/text-patching.h>
-#include <asm/asm-prototypes.h>

static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
{


2022-07-12 19:37:41

by Greg Kroah-Hartman

Subject: [PATCH 5.15 55/78] x86/bugs: Add retbleed=ibpb

From: Peter Zijlstra <[email protected]>

commit 3ebc170068885b6fc7bedda6c667bb2c4d533159 upstream.

jmp2ret mitigates the easy-to-attack case at relatively low overhead.
It mitigates the long speculation windows after a mispredicted RET, but
it does not mitigate the short speculation window from arbitrary
instruction boundaries.

On Zen2, there is a chicken bit which needs setting, which mitigates
"arbitrary instruction boundaries" down to just "basic block boundaries".

But there is no fix for the short speculation window on basic block
boundaries, other than to flush the entire BTB to evict all attacker
predictions.

On the spectrum of "fast & blurry" -> "safe", there is (on top of STIBP
or no-SMT):

  1) Nothing               System wide open
  2) jmp2ret               May stop a script kiddy
  3) jmp2ret+chickenbit    Raises the bar rather further
  4) IBPB                  Only thing which can count as "safe".

Tentative numbers put IBPB-on-entry at a 2.5x hit on Zen2, and a 10x hit
on Zen1 according to lmbench.

[ bp: Fixup feature bit comments, document option, 32-bit build fix. ]

Suggested-by: Andrew Cooper <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 3 +
arch/x86/entry/Makefile | 2 -
arch/x86/entry/entry.S | 22 ++++++++++++
arch/x86/include/asm/cpufeatures.h | 2 -
arch/x86/include/asm/nospec-branch.h | 8 +++-
arch/x86/kernel/cpu/bugs.c | 43 ++++++++++++++++++------
6 files changed, 67 insertions(+), 13 deletions(-)
create mode 100644 arch/x86/entry/entry.S

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4978,6 +4978,9 @@
disabling SMT if necessary for
the full mitigation (only on Zen1
and older without STIBP).
+ ibpb - mitigate short speculation windows on
+ basic block boundaries too. Safe, highest
+ perf impact.
unret - force enable untrained return thunks,
only effective on AMD f15h-f17h
based systems.
--- a/arch/x86/entry/Makefile
+++ b/arch/x86/entry/Makefile
@@ -11,7 +11,7 @@ CFLAGS_REMOVE_common.o = $(CC_FLAGS_FTR

CFLAGS_common.o += -fno-stack-protector

-obj-y := entry_$(BITS).o thunk_$(BITS).o syscall_$(BITS).o
+obj-y := entry.o entry_$(BITS).o thunk_$(BITS).o syscall_$(BITS).o
obj-y += common.o

obj-y += vdso/
--- /dev/null
+++ b/arch/x86/entry/entry.S
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Common place for both 32- and 64-bit entry routines.
+ */
+
+#include <linux/linkage.h>
+#include <asm/export.h>
+#include <asm/msr-index.h>
+
+.pushsection .noinstr.text, "ax"
+
+SYM_FUNC_START(entry_ibpb)
+ movl $MSR_IA32_PRED_CMD, %ecx
+ movl $PRED_CMD_IBPB, %eax
+ xorl %edx, %edx
+ wrmsr
+ RET
+SYM_FUNC_END(entry_ibpb)
+/* For KVM */
+EXPORT_SYMBOL_GPL(entry_ibpb);
+
+.popsection
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -294,7 +294,7 @@
#define X86_FEATURE_PER_THREAD_MBA (11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
#define X86_FEATURE_SGX1 (11*32+ 8) /* "" Basic SGX */
#define X86_FEATURE_SGX2 (11*32+ 9) /* "" SGX Enclave Dynamic Memory Management (EDMM) */
-/* FREE! (11*32+10) */
+#define X86_FEATURE_ENTRY_IBPB (11*32+10) /* "" Issue an IBPB on kernel entry */
/* FREE! (11*32+11) */
#define X86_FEATURE_RETPOLINE (11*32+12) /* "" Generic Retpoline mitigation for Spectre variant 2 */
#define X86_FEATURE_RETPOLINE_LFENCE (11*32+13) /* "" Use LFENCE for Spectre variant 2 */
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -123,14 +123,17 @@
* return thunk isn't mapped into the userspace tables (then again, AMD
* typically has NO_MELTDOWN).
*
- * Doesn't clobber any registers but does require a stable stack.
+ * While zen_untrain_ret() doesn't clobber anything but requires stack,
+ * entry_ibpb() will clobber AX, CX, DX.
*
* As such, this must be placed after every *SWITCH_TO_KERNEL_CR3 at a point
* where we have a stack but before any RET instruction.
*/
.macro UNTRAIN_RET
#ifdef CONFIG_RETPOLINE
- ALTERNATIVE "", "call zen_untrain_ret", X86_FEATURE_UNRET
+ ALTERNATIVE_2 "", \
+ "call zen_untrain_ret", X86_FEATURE_UNRET, \
+ "call entry_ibpb", X86_FEATURE_ENTRY_IBPB
#endif
.endm

@@ -144,6 +147,7 @@

extern void __x86_return_thunk(void);
extern void zen_untrain_ret(void);
+extern void entry_ibpb(void);

#ifdef CONFIG_RETPOLINE

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -798,6 +798,7 @@ static enum spectre_v2_mitigation spectr
enum retbleed_mitigation {
RETBLEED_MITIGATION_NONE,
RETBLEED_MITIGATION_UNRET,
+ RETBLEED_MITIGATION_IBPB,
RETBLEED_MITIGATION_IBRS,
RETBLEED_MITIGATION_EIBRS,
};
@@ -806,11 +807,13 @@ enum retbleed_mitigation_cmd {
RETBLEED_CMD_OFF,
RETBLEED_CMD_AUTO,
RETBLEED_CMD_UNRET,
+ RETBLEED_CMD_IBPB,
};

const char * const retbleed_strings[] = {
[RETBLEED_MITIGATION_NONE] = "Vulnerable",
[RETBLEED_MITIGATION_UNRET] = "Mitigation: untrained return thunk",
+ [RETBLEED_MITIGATION_IBPB] = "Mitigation: IBPB",
[RETBLEED_MITIGATION_IBRS] = "Mitigation: IBRS",
[RETBLEED_MITIGATION_EIBRS] = "Mitigation: Enhanced IBRS",
};
@@ -840,6 +843,8 @@ static int __init retbleed_parse_cmdline
retbleed_cmd = RETBLEED_CMD_AUTO;
} else if (!strcmp(str, "unret")) {
retbleed_cmd = RETBLEED_CMD_UNRET;
+ } else if (!strcmp(str, "ibpb")) {
+ retbleed_cmd = RETBLEED_CMD_IBPB;
} else if (!strcmp(str, "nosmt")) {
retbleed_nosmt = true;
} else {
@@ -854,11 +859,13 @@ static int __init retbleed_parse_cmdline
early_param("retbleed", retbleed_parse_cmdline);

#define RETBLEED_UNTRAIN_MSG "WARNING: BTB untrained return thunk mitigation is only effective on AMD/Hygon!\n"
-#define RETBLEED_COMPILER_MSG "WARNING: kernel not compiled with RETPOLINE or -mfunction-return capable compiler!\n"
+#define RETBLEED_COMPILER_MSG "WARNING: kernel not compiled with RETPOLINE or -mfunction-return capable compiler; falling back to IBPB!\n"
#define RETBLEED_INTEL_MSG "WARNING: Spectre v2 mitigation leaves CPU vulnerable to RETBleed attacks, data leaks possible!\n"

static void __init retbleed_select_mitigation(void)
{
+ bool mitigate_smt = false;
+
if (!boot_cpu_has_bug(X86_BUG_RETBLEED) || cpu_mitigations_off())
return;

@@ -870,11 +877,21 @@ static void __init retbleed_select_mitig
retbleed_mitigation = RETBLEED_MITIGATION_UNRET;
break;

+ case RETBLEED_CMD_IBPB:
+ retbleed_mitigation = RETBLEED_MITIGATION_IBPB;
+ break;
+
case RETBLEED_CMD_AUTO:
default:
if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
- boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
- retbleed_mitigation = RETBLEED_MITIGATION_UNRET;
+ boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
+
+ if (IS_ENABLED(CONFIG_RETPOLINE) &&
+ IS_ENABLED(CONFIG_CC_HAS_RETURN_THUNK))
+ retbleed_mitigation = RETBLEED_MITIGATION_UNRET;
+ else
+ retbleed_mitigation = RETBLEED_MITIGATION_IBPB;
+ }

/*
* The Intel mitigation (IBRS) was already selected in
@@ -890,26 +907,34 @@ static void __init retbleed_select_mitig
if (!IS_ENABLED(CONFIG_RETPOLINE) ||
!IS_ENABLED(CONFIG_CC_HAS_RETURN_THUNK)) {
pr_err(RETBLEED_COMPILER_MSG);
- retbleed_mitigation = RETBLEED_MITIGATION_NONE;
- break;
+ retbleed_mitigation = RETBLEED_MITIGATION_IBPB;
+ goto retbleed_force_ibpb;
}

setup_force_cpu_cap(X86_FEATURE_RETHUNK);
setup_force_cpu_cap(X86_FEATURE_UNRET);

- if (!boot_cpu_has(X86_FEATURE_STIBP) &&
- (retbleed_nosmt || cpu_mitigations_auto_nosmt()))
- cpu_smt_disable(false);
-
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
pr_err(RETBLEED_UNTRAIN_MSG);
+
+ mitigate_smt = true;
+ break;
+
+ case RETBLEED_MITIGATION_IBPB:
+retbleed_force_ibpb:
+ setup_force_cpu_cap(X86_FEATURE_ENTRY_IBPB);
+ mitigate_smt = true;
break;

default:
break;
}

+ if (mitigate_smt && !boot_cpu_has(X86_FEATURE_STIBP) &&
+ (retbleed_nosmt || cpu_mitigations_auto_nosmt()))
+ cpu_smt_disable(false);
+
/*
* Let IBRS trump all on Intel without affecting the effects of the
* retbleed= cmdline option.


2022-07-12 19:38:08

by Greg Kroah-Hartman

Subject: [PATCH 5.15 50/78] x86/bugs: Report Intel retbleed vulnerability

From: Peter Zijlstra <[email protected]>

commit 6ad0ad2bf8a67e27d1f9d006a1dabb0e1c360cc3 upstream.

Skylake suffers from RSB underflow speculation issues; report this
vulnerability and its mitigation (spectre_v2=ibrs).

[jpoimboe: cleanups, eibrs]

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/bugs.c | 39 +++++++++++++++++++++++++++++++++------
arch/x86/kernel/cpu/common.c | 24 ++++++++++++------------
3 files changed, 46 insertions(+), 18 deletions(-)

--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -91,6 +91,7 @@
#define MSR_IA32_ARCH_CAPABILITIES 0x0000010a
#define ARCH_CAP_RDCL_NO BIT(0) /* Not susceptible to Meltdown */
#define ARCH_CAP_IBRS_ALL BIT(1) /* Enhanced IBRS support */
+#define ARCH_CAP_RSBA BIT(2) /* RET may use alternative branch predictors */
#define ARCH_CAP_SKIP_VMENTRY_L1DFLUSH BIT(3) /* Skip L1D flush on vmentry */
#define ARCH_CAP_SSB_NO BIT(4) /*
* Not susceptible to Speculative Store Bypass
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -783,12 +783,17 @@ static int __init nospectre_v1_cmdline(c
}
early_param("nospectre_v1", nospectre_v1_cmdline);

+static enum spectre_v2_mitigation spectre_v2_enabled __ro_after_init =
+ SPECTRE_V2_NONE;
+
#undef pr_fmt
#define pr_fmt(fmt) "RETBleed: " fmt

enum retbleed_mitigation {
RETBLEED_MITIGATION_NONE,
RETBLEED_MITIGATION_UNRET,
+ RETBLEED_MITIGATION_IBRS,
+ RETBLEED_MITIGATION_EIBRS,
};

enum retbleed_mitigation_cmd {
@@ -800,6 +805,8 @@ enum retbleed_mitigation_cmd {
const char * const retbleed_strings[] = {
[RETBLEED_MITIGATION_NONE] = "Vulnerable",
[RETBLEED_MITIGATION_UNRET] = "Mitigation: untrained return thunk",
+ [RETBLEED_MITIGATION_IBRS] = "Mitigation: IBRS",
+ [RETBLEED_MITIGATION_EIBRS] = "Mitigation: Enhanced IBRS",
};

static enum retbleed_mitigation retbleed_mitigation __ro_after_init =
@@ -842,6 +849,7 @@ early_param("retbleed", retbleed_parse_c

#define RETBLEED_UNTRAIN_MSG "WARNING: BTB untrained return thunk mitigation is only effective on AMD/Hygon!\n"
#define RETBLEED_COMPILER_MSG "WARNING: kernel not compiled with RETPOLINE or -mfunction-return capable compiler!\n"
+#define RETBLEED_INTEL_MSG "WARNING: Spectre v2 mitigation leaves CPU vulnerable to RETBleed attacks, data leaks possible!\n"

static void __init retbleed_select_mitigation(void)
{
@@ -858,12 +866,15 @@ static void __init retbleed_select_mitig

case RETBLEED_CMD_AUTO:
default:
- if (!boot_cpu_has_bug(X86_BUG_RETBLEED))
- break;
-
if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
retbleed_mitigation = RETBLEED_MITIGATION_UNRET;
+
+ /*
+ * The Intel mitigation (IBRS) was already selected in
+ * spectre_v2_select_mitigation().
+ */
+
break;
}

@@ -893,15 +904,31 @@ static void __init retbleed_select_mitig
break;
}

+ /*
+ * Let IBRS trump all on Intel without affecting the effects of the
+ * retbleed= cmdline option.
+ */
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) {
+ switch (spectre_v2_enabled) {
+ case SPECTRE_V2_IBRS:
+ retbleed_mitigation = RETBLEED_MITIGATION_IBRS;
+ break;
+ case SPECTRE_V2_EIBRS:
+ case SPECTRE_V2_EIBRS_RETPOLINE:
+ case SPECTRE_V2_EIBRS_LFENCE:
+ retbleed_mitigation = RETBLEED_MITIGATION_EIBRS;
+ break;
+ default:
+ pr_err(RETBLEED_INTEL_MSG);
+ }
+ }
+
pr_info("%s\n", retbleed_strings[retbleed_mitigation]);
}

#undef pr_fmt
#define pr_fmt(fmt) "Spectre V2 : " fmt

-static enum spectre_v2_mitigation spectre_v2_enabled __ro_after_init =
- SPECTRE_V2_NONE;
-
static enum spectre_v2_user_mitigation spectre_v2_user_stibp __ro_after_init =
SPECTRE_V2_USER_NONE;
static enum spectre_v2_user_mitigation spectre_v2_user_ibpb __ro_after_init =
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1127,24 +1127,24 @@ static const struct x86_cpu_id cpu_vuln_
VULNBL_INTEL_STEPPINGS(BROADWELL_G, X86_STEPPING_ANY, SRBDS),
VULNBL_INTEL_STEPPINGS(BROADWELL_X, X86_STEPPING_ANY, MMIO),
VULNBL_INTEL_STEPPINGS(BROADWELL, X86_STEPPING_ANY, SRBDS),
- VULNBL_INTEL_STEPPINGS(SKYLAKE_L, X86_STEPPINGS(0x3, 0x3), SRBDS | MMIO),
+ VULNBL_INTEL_STEPPINGS(SKYLAKE_L, X86_STEPPINGS(0x3, 0x3), SRBDS | MMIO | RETBLEED),
VULNBL_INTEL_STEPPINGS(SKYLAKE_L, X86_STEPPING_ANY, SRBDS),
VULNBL_INTEL_STEPPINGS(SKYLAKE_X, BIT(3) | BIT(4) | BIT(6) |
- BIT(7) | BIT(0xB), MMIO),
- VULNBL_INTEL_STEPPINGS(SKYLAKE, X86_STEPPINGS(0x3, 0x3), SRBDS | MMIO),
+ BIT(7) | BIT(0xB), MMIO | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(SKYLAKE, X86_STEPPINGS(0x3, 0x3), SRBDS | MMIO | RETBLEED),
VULNBL_INTEL_STEPPINGS(SKYLAKE, X86_STEPPING_ANY, SRBDS),
- VULNBL_INTEL_STEPPINGS(KABYLAKE_L, X86_STEPPINGS(0x9, 0xC), SRBDS | MMIO),
+ VULNBL_INTEL_STEPPINGS(KABYLAKE_L, X86_STEPPINGS(0x9, 0xC), SRBDS | MMIO | RETBLEED),
VULNBL_INTEL_STEPPINGS(KABYLAKE_L, X86_STEPPINGS(0x0, 0x8), SRBDS),
- VULNBL_INTEL_STEPPINGS(KABYLAKE, X86_STEPPINGS(0x9, 0xD), SRBDS | MMIO),
+ VULNBL_INTEL_STEPPINGS(KABYLAKE, X86_STEPPINGS(0x9, 0xD), SRBDS | MMIO | RETBLEED),
VULNBL_INTEL_STEPPINGS(KABYLAKE, X86_STEPPINGS(0x0, 0x8), SRBDS),
- VULNBL_INTEL_STEPPINGS(ICELAKE_L, X86_STEPPINGS(0x5, 0x5), MMIO | MMIO_SBDS),
+ VULNBL_INTEL_STEPPINGS(ICELAKE_L, X86_STEPPINGS(0x5, 0x5), MMIO | MMIO_SBDS | RETBLEED),
VULNBL_INTEL_STEPPINGS(ICELAKE_D, X86_STEPPINGS(0x1, 0x1), MMIO),
VULNBL_INTEL_STEPPINGS(ICELAKE_X, X86_STEPPINGS(0x4, 0x6), MMIO),
- VULNBL_INTEL_STEPPINGS(COMETLAKE, BIT(2) | BIT(3) | BIT(5), MMIO | MMIO_SBDS),
- VULNBL_INTEL_STEPPINGS(COMETLAKE_L, X86_STEPPINGS(0x1, 0x1), MMIO | MMIO_SBDS),
- VULNBL_INTEL_STEPPINGS(COMETLAKE_L, X86_STEPPINGS(0x0, 0x0), MMIO),
- VULNBL_INTEL_STEPPINGS(LAKEFIELD, X86_STEPPINGS(0x1, 0x1), MMIO | MMIO_SBDS),
- VULNBL_INTEL_STEPPINGS(ROCKETLAKE, X86_STEPPINGS(0x1, 0x1), MMIO),
+ VULNBL_INTEL_STEPPINGS(COMETLAKE, BIT(2) | BIT(3) | BIT(5), MMIO | MMIO_SBDS | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(COMETLAKE_L, X86_STEPPINGS(0x1, 0x1), MMIO | MMIO_SBDS | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(COMETLAKE_L, X86_STEPPINGS(0x0, 0x0), MMIO | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(LAKEFIELD, X86_STEPPINGS(0x1, 0x1), MMIO | MMIO_SBDS | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(ROCKETLAKE, X86_STEPPINGS(0x1, 0x1), MMIO | RETBLEED),
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT, X86_STEPPINGS(0x1, 0x1), MMIO | MMIO_SBDS),
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_D, X86_STEPPING_ANY, MMIO),
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_L, X86_STEPPINGS(0x0, 0x0), MMIO | MMIO_SBDS),
@@ -1254,7 +1254,7 @@ static void __init cpu_set_bug_bits(stru
!arch_cap_mmio_immune(ia32_cap))
setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);

- if (cpu_matches(cpu_vuln_blacklist, RETBLEED))
+ if ((cpu_matches(cpu_vuln_blacklist, RETBLEED) || (ia32_cap & ARCH_CAP_RSBA)))
setup_force_cpu_bug(X86_BUG_RETBLEED);

if (cpu_matches(cpu_vuln_whitelist, NO_MELTDOWN))


2022-07-12 19:38:15

by Greg Kroah-Hartman

Subject: [PATCH 5.15 69/78] x86/speculation: Fill RSB on vmexit for IBRS

From: Josh Poimboeuf <[email protected]>

commit 9756bba28470722dacb79ffce554336dd1f6a6cd upstream.

Prevent RSB underflow/poisoning attacks with RSB filling. While at it,
add a bunch of comments to attempt to document the current state of
tribal knowledge about RSB attacks and what exactly is being mitigated.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 2 -
arch/x86/kernel/cpu/bugs.c | 63 ++++++++++++++++++++++++++++++++++---
arch/x86/kvm/vmx/vmenter.S | 6 +--
3 files changed, 62 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -204,7 +204,7 @@
/* FREE! ( 7*32+10) */
#define X86_FEATURE_PTI ( 7*32+11) /* Kernel Page Table Isolation enabled */
#define X86_FEATURE_KERNEL_IBRS ( 7*32+12) /* "" Set/clear IBRS on kernel entry/exit */
-/* FREE! ( 7*32+13) */
+#define X86_FEATURE_RSB_VMEXIT ( 7*32+13) /* "" Fill RSB on VM-Exit */
#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
#define X86_FEATURE_CDP_L2 ( 7*32+15) /* Code and Data Prioritization L2 */
#define X86_FEATURE_MSR_SPEC_CTRL ( 7*32+16) /* "" MSR SPEC_CTRL is implemented */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1394,17 +1394,70 @@ static void __init spectre_v2_select_mit
pr_info("%s\n", spectre_v2_strings[mode]);

/*
- * If spectre v2 protection has been enabled, unconditionally fill
- * RSB during a context switch; this protects against two independent
- * issues:
+ * If Spectre v2 protection has been enabled, fill the RSB during a
+ * context switch. In general there are two types of RSB attacks
+ * across context switches, for which the CALLs/RETs may be unbalanced.
*
- * - RSB underflow (and switch to BTB) on Skylake+
- * - SpectreRSB variant of spectre v2 on X86_BUG_SPECTRE_V2 CPUs
+ * 1) RSB underflow
+ *
+ * Some Intel parts have "bottomless RSB". When the RSB is empty,
+ * speculated return targets may come from the branch predictor,
+ * which could have a user-poisoned BTB or BHB entry.
+ *
+ * AMD has it even worse: *all* returns are speculated from the BTB,
+ * regardless of the state of the RSB.
+ *
+ * When IBRS or eIBRS is enabled, the "user -> kernel" attack
+ * scenario is mitigated by the IBRS branch prediction isolation
+ * properties, so the RSB buffer filling wouldn't be necessary to
+ * protect against this type of attack.
+ *
+ * The "user -> user" attack scenario is mitigated by RSB filling.
+ *
+ * 2) Poisoned RSB entry
+ *
+ * If the 'next' in-kernel return stack is shorter than 'prev',
+ * 'next' could be tricked into speculating with a user-poisoned RSB
+ * entry.
+ *
+ * The "user -> kernel" attack scenario is mitigated by SMEP and
+ * eIBRS.
+ *
+ * The "user -> user" scenario, also known as SpectreBHB, requires
+ * RSB clearing.
+ *
+ * So to mitigate all cases, unconditionally fill RSB on context
+ * switches.
+ *
+ * FIXME: Is this pointless for retbleed-affected AMD?
*/
setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");

/*
+ * Similar to context switches, there are two types of RSB attacks
+ * after vmexit:
+ *
+ * 1) RSB underflow
+ *
+ * 2) Poisoned RSB entry
+ *
+ * When retpoline is enabled, both are mitigated by filling/clearing
+ * the RSB.
+ *
+ * When IBRS is enabled, while #1 would be mitigated by the IBRS branch
+ * prediction isolation protections, RSB still needs to be cleared
+ * because of #2. Note that SMEP provides no protection here, unlike
+ * user-space-poisoned RSB entries.
+ *
+ * eIBRS, on the other hand, has RSB-poisoning protections, so it
+ * doesn't need RSB clearing after vmexit.
+ */
+ if (boot_cpu_has(X86_FEATURE_RETPOLINE) ||
+ boot_cpu_has(X86_FEATURE_KERNEL_IBRS))
+ setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
+
+ /*
* Retpoline protects the kernel, but doesn't protect firmware. IBRS
* and Enhanced IBRS protect firmware too, so enable IBRS around
* firmware calls only when IBRS / Enhanced IBRS aren't otherwise
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -193,15 +193,15 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL
* IMPORTANT: RSB filling and SPEC_CTRL handling must be done before
* the first unbalanced RET after vmexit!
*
- * For retpoline, RSB filling is needed to prevent poisoned RSB entries
- * and (in some cases) RSB underflow.
+ * For retpoline or IBRS, RSB filling is needed to prevent poisoned RSB
+ * entries and (in some cases) RSB underflow.
*
* eIBRS has its own protection against poisoned RSB, so it doesn't
* need the RSB filling sequence. But it does need to be enabled
* before the first unbalanced RET.
*/

- FILL_RETURN_BUFFER %_ASM_CX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE
+ FILL_RETURN_BUFFER %_ASM_CX, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT

pop %_ASM_ARG2 /* @flags */
pop %_ASM_ARG1 /* @vmx */


2022-07-12 19:38:49

by Greg Kroah-Hartman

Subject: [PATCH 5.15 78/78] x86/static_call: Serialize __static_call_fixup() properly

From: Thomas Gleixner <[email protected]>

commit c27c753ea6fd1237f4f96abf8b623d7bab505513 upstream.

__static_call_fixup() invokes __static_call_transform() without holding
text_mutex, which causes lockdep to complain in text_poke_bp().

Adding the proper locking cures that, but as this is either used during
early boot or during module finalizing, it's not required to use
text_poke_bp(). Add an argument to __static_call_transform() which tells
it to use text_poke_early() for it.

Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding")
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/static_call.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -25,7 +25,8 @@ static const u8 xor5rax[] = { 0x2e, 0x2e

static const u8 retinsn[] = { RET_INSN_OPCODE, 0xcc, 0xcc, 0xcc, 0xcc };

-static void __ref __static_call_transform(void *insn, enum insn_type type, void *func)
+static void __ref __static_call_transform(void *insn, enum insn_type type,
+ void *func, bool modinit)
{
const void *emulate = NULL;
int size = CALL_INSN_SIZE;
@@ -60,7 +61,7 @@ static void __ref __static_call_transfor
if (memcmp(insn, code, size) == 0)
return;

- if (unlikely(system_state == SYSTEM_BOOTING))
+ if (system_state == SYSTEM_BOOTING || modinit)
return text_poke_early(insn, code, size);

text_poke_bp(insn, code, size, emulate);
@@ -108,12 +109,12 @@ void arch_static_call_transform(void *si

if (tramp) {
__static_call_validate(tramp, true);
- __static_call_transform(tramp, __sc_insn(!func, true), func);
+ __static_call_transform(tramp, __sc_insn(!func, true), func, false);
}

if (IS_ENABLED(CONFIG_HAVE_STATIC_CALL_INLINE) && site) {
__static_call_validate(site, tail);
- __static_call_transform(site, __sc_insn(!func, tail), func);
+ __static_call_transform(site, __sc_insn(!func, tail), func, false);
}

mutex_unlock(&text_mutex);
@@ -139,8 +140,10 @@ bool __static_call_fixup(void *tramp, u8
return false;
}

+ mutex_lock(&text_mutex);
if (op == RET_INSN_OPCODE || dest == &__x86_return_thunk)
- __static_call_transform(tramp, RET, NULL);
+ __static_call_transform(tramp, RET, NULL, true);
+ mutex_unlock(&text_mutex);

return true;
}


2022-07-12 19:38:58

by Greg Kroah-Hartman

Subject: [PATCH 5.15 73/78] x86/bugs: Add Cannon lake to RETBleed affected CPU list

From: Pawan Gupta <[email protected]>

commit f54d45372c6ac9c993451de5e51312485f7d10bc upstream.

Cannon Lake is also affected by RETBleed; add it to the list.

Fixes: 6ad0ad2bf8a6 ("x86/bugs: Report Intel retbleed vulnerability")
Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/cpu/common.c | 1 +
1 file changed, 1 insertion(+)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1132,6 +1132,7 @@ static const struct x86_cpu_id cpu_vuln_
VULNBL_INTEL_STEPPINGS(SKYLAKE, X86_STEPPING_ANY, SRBDS | MMIO | RETBLEED),
VULNBL_INTEL_STEPPINGS(KABYLAKE_L, X86_STEPPING_ANY, SRBDS | MMIO | RETBLEED),
VULNBL_INTEL_STEPPINGS(KABYLAKE, X86_STEPPING_ANY, SRBDS | MMIO | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(CANNONLAKE_L, X86_STEPPING_ANY, RETBLEED),
VULNBL_INTEL_STEPPINGS(ICELAKE_L, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RETBLEED),
VULNBL_INTEL_STEPPINGS(ICELAKE_D, X86_STEPPING_ANY, MMIO),
VULNBL_INTEL_STEPPINGS(ICELAKE_X, X86_STEPPING_ANY, MMIO),


2022-07-12 19:39:02

by Greg Kroah-Hartman

Subject: [PATCH 5.15 77/78] x86/speculation: Disable RRSBA behavior

From: Pawan Gupta <[email protected]>

commit 4ad3278df6fe2b0852b00d5757fc2ccd8e92c26e upstream.

Some Intel processors may use alternate predictors for RETs on
RSB-underflow. This condition may be vulnerable to Branch History
Injection (BHI) and intramode-BTI.

The kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines,
eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against
such attacks. However, on RSB-underflow, RET target prediction may fall
back to alternate predictors. As a result, RET's predicted target may get
influenced by branch history.

A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback
behavior when in kernel mode. When set, RETs will not take predictions
from alternate predictors, hence mitigating RETs as well. Support for
this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2).

For spectre v2 mitigation, when a user selects a mitigation that
protects indirect CALLs and JMPs against BHI and intramode-BTI, set
RRSBA_DIS_S also to protect RETs for the RSB-underflow case.
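
As a rough, userspace-only illustration of the enumeration described above
(not part of the patch; the RRSBA_DIS_S write itself is kernel-only and also
depends on the selected spectre_v2 mode), the CPUID bit can be checked like
so:

/* Sketch only: CPUID.(EAX=7,ECX=2):EDX[2] is RRSBA_CTRL, the bit the
 * scattered.c hunk below picks up. */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 2, &eax, &ebx, &ecx, &edx)) {
		puts("CPUID leaf 7 not supported");
		return 0;
	}
	printf("RRSBA_CTRL: %s\n", (edx & (1u << 2)) ? "enumerated" : "not enumerated");
	return 0;
}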

Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
[cascardo: no X86_FEATURE_INTEL_PPIN]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/msr-index.h | 9 +++++++++
arch/x86/kernel/cpu/bugs.c | 26 ++++++++++++++++++++++++++
arch/x86/kernel/cpu/scattered.c | 1 +
tools/arch/x86/include/asm/msr-index.h | 9 +++++++++
5 files changed, 46 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -295,7 +295,7 @@
#define X86_FEATURE_SGX1 (11*32+ 8) /* "" Basic SGX */
#define X86_FEATURE_SGX2 (11*32+ 9) /* "" SGX Enclave Dynamic Memory Management (EDMM) */
#define X86_FEATURE_ENTRY_IBPB (11*32+10) /* "" Issue an IBPB on kernel entry */
-/* FREE! (11*32+11) */
+#define X86_FEATURE_RRSBA_CTRL (11*32+11) /* "" RET prediction control */
#define X86_FEATURE_RETPOLINE (11*32+12) /* "" Generic Retpoline mitigation for Spectre variant 2 */
#define X86_FEATURE_RETPOLINE_LFENCE (11*32+13) /* "" Use LFENCE for Spectre variant 2 */
#define X86_FEATURE_RETHUNK (11*32+14) /* "" Use REturn THUNK */
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -51,6 +51,8 @@
#define SPEC_CTRL_STIBP BIT(SPEC_CTRL_STIBP_SHIFT) /* STIBP mask */
#define SPEC_CTRL_SSBD_SHIFT 2 /* Speculative Store Bypass Disable bit */
#define SPEC_CTRL_SSBD BIT(SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */
+#define SPEC_CTRL_RRSBA_DIS_S_SHIFT 6 /* Disable RRSBA behavior */
+#define SPEC_CTRL_RRSBA_DIS_S BIT(SPEC_CTRL_RRSBA_DIS_S_SHIFT)

#define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */
#define PRED_CMD_IBPB BIT(0) /* Indirect Branch Prediction Barrier */
@@ -139,6 +141,13 @@
* bit available to control VERW
* behavior.
*/
+#define ARCH_CAP_RRSBA BIT(19) /*
+ * Indicates RET may use predictors
+ * other than the RSB. With eIBRS
+ * enabled predictions in kernel mode
+ * are restricted to targets in
+ * kernel.
+ */

#define MSR_IA32_FLUSH_CMD 0x0000010b
#define L1D_FLUSH BIT(0) /*
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1311,6 +1311,22 @@ static enum spectre_v2_mitigation __init
return SPECTRE_V2_RETPOLINE;
}

+/* Disable in-kernel use of non-RSB RET predictors */
+static void __init spec_ctrl_disable_kernel_rrsba(void)
+{
+ u64 ia32_cap;
+
+ if (!boot_cpu_has(X86_FEATURE_RRSBA_CTRL))
+ return;
+
+ ia32_cap = x86_read_arch_cap_msr();
+
+ if (ia32_cap & ARCH_CAP_RRSBA) {
+ x86_spec_ctrl_base |= SPEC_CTRL_RRSBA_DIS_S;
+ write_spec_ctrl_current(x86_spec_ctrl_base, true);
+ }
+}
+
static void __init spectre_v2_select_mitigation(void)
{
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@@ -1405,6 +1421,16 @@ static void __init spectre_v2_select_mit
break;
}

+ /*
+ * Disable alternate RSB predictions in kernel when indirect CALLs and
+ * JMPs gets protection against BHI and Intramode-BTI, but RET
+ * prediction from a non-RSB predictor is still a risk.
+ */
+ if (mode == SPECTRE_V2_EIBRS_LFENCE ||
+ mode == SPECTRE_V2_EIBRS_RETPOLINE ||
+ mode == SPECTRE_V2_RETPOLINE)
+ spec_ctrl_disable_kernel_rrsba();
+
spectre_v2_enabled = mode;
pr_info("%s\n", spectre_v2_strings[mode]);

--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -26,6 +26,7 @@ struct cpuid_bit {
static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
+ { X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
{ X86_FEATURE_CQM_LLC, CPUID_EDX, 1, 0x0000000f, 0 },
{ X86_FEATURE_CQM_OCCUP_LLC, CPUID_EDX, 0, 0x0000000f, 1 },
{ X86_FEATURE_CQM_MBM_TOTAL, CPUID_EDX, 1, 0x0000000f, 1 },
--- a/tools/arch/x86/include/asm/msr-index.h
+++ b/tools/arch/x86/include/asm/msr-index.h
@@ -51,6 +51,8 @@
#define SPEC_CTRL_STIBP BIT(SPEC_CTRL_STIBP_SHIFT) /* STIBP mask */
#define SPEC_CTRL_SSBD_SHIFT 2 /* Speculative Store Bypass Disable bit */
#define SPEC_CTRL_SSBD BIT(SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */
+#define SPEC_CTRL_RRSBA_DIS_S_SHIFT 6 /* Disable RRSBA behavior */
+#define SPEC_CTRL_RRSBA_DIS_S BIT(SPEC_CTRL_RRSBA_DIS_S_SHIFT)

#define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */
#define PRED_CMD_IBPB BIT(0) /* Indirect Branch Prediction Barrier */
@@ -138,6 +140,13 @@
* bit available to control VERW
* behavior.
*/
+#define ARCH_CAP_RRSBA BIT(19) /*
+ * Indicates RET may use predictors
+ * other than the RSB. With eIBRS
+ * enabled predictions in kernel mode
+ * are restricted to targets in
+ * kernel.
+ */

#define MSR_IA32_FLUSH_CMD 0x0000010b
#define L1D_FLUSH BIT(0) /*


2022-07-12 19:39:27

by Greg Kroah-Hartman

Subject: [PATCH 5.15 54/78] x86/xen: Add UNTRAIN_RET

From: Peter Zijlstra <[email protected]>

commit d147553b64bad34d2f92cb7d8ba454ae95c3baac upstream.

Ensure the Xen entry also passes through UNTRAIN_RET.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/entry/entry_64.S | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -320,6 +320,12 @@ SYM_CODE_END(ret_from_fork)
#endif
.endm

+SYM_CODE_START_LOCAL(xen_error_entry)
+ UNWIND_HINT_FUNC
+ UNTRAIN_RET
+ RET
+SYM_CODE_END(xen_error_entry)
+
/**
* idtentry_body - Macro to emit code calling the C function
* @cfunc: C function to be called
@@ -339,7 +345,7 @@ SYM_CODE_END(ret_from_fork)
* switch the CR3. So it can skip invoking error_entry().
*/
ALTERNATIVE "call error_entry; movq %rax, %rsp", \
- "", X86_FEATURE_XENPV
+ "call xen_error_entry", X86_FEATURE_XENPV

ENCODE_FRAME_POINTER
UNWIND_HINT_REGS


2022-07-12 19:39:33

by Greg Kroah-Hartman

Subject: [PATCH 5.15 71/78] x86/cpu/amd: Enumerate BTC_NO

From: Andrew Cooper <[email protected]>

commit 26aae8ccbc1972233afd08fb3f368947c0314265 upstream.

BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.

Zen3 CPUs don't suffer BTC.

Hypervisors are expected to synthesise BTC_NO when it is appropriate
given the migration pool, to prevent kernels using heuristics.
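
A quick userspace check for the bit (illustrative only, not part of the
patch) is sketched below; note that early Zen3 parts predating the bit's
allocation will read it as clear even though they are not affected, which is
exactly what the amd.c fixup below compensates for:

/* Sketch only: BTC_NO is CPUID.80000008H:EBX[29] (word 13, bit 29 above). */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)) {
		puts("CPUID 0x80000008 not supported");
		return 0;
	}
	printf("BTC_NO: %s\n", (ebx & (1u << 29)) ? "set" : "clear");
	return 0;
}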

[ bp: Massage. ]

Signed-off-by: Andrew Cooper <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
[cascardo: no X86_FEATURE_BRS]
[cascardo: no X86_FEATURE_CPPC]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/amd.c | 21 +++++++++++++++------
arch/x86/kernel/cpu/common.c | 6 ++++--
3 files changed, 20 insertions(+), 8 deletions(-)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -319,6 +319,7 @@
#define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */
#define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
#define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
+#define X86_FEATURE_BTC_NO (13*32+29) /* "" Not vulnerable to Branch Type Confusion */

/* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -914,12 +914,21 @@ static void init_amd_zn(struct cpuinfo_x
node_reclaim_distance = 32;
#endif

- /*
- * Fix erratum 1076: CPB feature bit not being set in CPUID.
- * Always set it, except when running under a hypervisor.
- */
- if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && !cpu_has(c, X86_FEATURE_CPB))
- set_cpu_cap(c, X86_FEATURE_CPB);
+ /* Fix up CPUID bits, but only if not virtualised. */
+ if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) {
+
+ /* Erratum 1076: CPB feature bit not being set in CPUID. */
+ if (!cpu_has(c, X86_FEATURE_CPB))
+ set_cpu_cap(c, X86_FEATURE_CPB);
+
+ /*
+ * Zen3 (Fam19 model < 0x10) parts are not susceptible to
+ * Branch Type Confusion, but predate the allocation of the
+ * BTC_NO bit.
+ */
+ if (c->x86 == 0x19 && !cpu_has(c, X86_FEATURE_BTC_NO))
+ set_cpu_cap(c, X86_FEATURE_BTC_NO);
+ }
}

static void init_amd(struct cpuinfo_x86 *c)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1249,8 +1249,10 @@ static void __init cpu_set_bug_bits(stru
!arch_cap_mmio_immune(ia32_cap))
setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);

- if ((cpu_matches(cpu_vuln_blacklist, RETBLEED) || (ia32_cap & ARCH_CAP_RSBA)))
- setup_force_cpu_bug(X86_BUG_RETBLEED);
+ if (!cpu_has(c, X86_FEATURE_BTC_NO)) {
+ if (cpu_matches(cpu_vuln_blacklist, RETBLEED) || (ia32_cap & ARCH_CAP_RSBA))
+ setup_force_cpu_bug(X86_BUG_RETBLEED);
+ }

if (cpu_matches(cpu_vuln_whitelist, NO_MELTDOWN))
return;


2022-07-12 19:39:48

by Greg Kroah-Hartman

Subject: [PATCH 5.15 45/78] x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value

From: Peter Zijlstra <[email protected]>

commit caa0ff24d5d0e02abce5e65c3d2b7f20a6617be5 upstream.

Due to TIF_SSBD and TIF_SPEC_IB the actual IA32_SPEC_CTRL value can
differ from x86_spec_ctrl_base. As such, keep a per-CPU value
reflecting the current task's MSR content.
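
The added helper is essentially "cache the last value written on this CPU
and skip the (slow) WRMSR when nothing changed"; a trivial standalone sketch
of that pattern, with a simulated MSR write standing in for wrmsrl():

#include <stdio.h>
#include <stdint.h>

static uint64_t cached_spec_ctrl;	/* stand-in for the per-CPU variable */
static int wrmsr_count;

static void wrmsr_sim(uint64_t val)	/* stand-in for wrmsrl(MSR_IA32_SPEC_CTRL, val) */
{
	wrmsr_count++;
	(void)val;
}

static void write_spec_ctrl_current_sim(uint64_t val)
{
	if (cached_spec_ctrl == val)
		return;			/* skip the redundant MSR write */
	cached_spec_ctrl = val;
	wrmsr_sim(val);
}

int main(void)
{
	write_spec_ctrl_current_sim(0x4);	/* e.g. SSBD (bit 2) set */
	write_spec_ctrl_current_sim(0x4);	/* unchanged: no WRMSR */
	write_spec_ctrl_current_sim(0x0);
	printf("MSR writes issued: %d\n", wrmsr_count);
	return 0;
}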

[jpoimboe: rename]

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 1 +
arch/x86/kernel/cpu/bugs.c | 28 +++++++++++++++++++++++-----
arch/x86/kernel/process.c | 2 +-
3 files changed, 25 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -254,6 +254,7 @@ static inline void indirect_branch_predi

/* The Intel SPEC CTRL MSR base value cache */
extern u64 x86_spec_ctrl_base;
+extern void write_spec_ctrl_current(u64 val);

/*
* With retpoline, we must use IBRS to restrict branch prediction
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -49,12 +49,30 @@ static void __init mmio_select_mitigatio
static void __init srbds_select_mitigation(void);
static void __init l1d_flush_select_mitigation(void);

-/* The base value of the SPEC_CTRL MSR that always has to be preserved. */
+/* The base value of the SPEC_CTRL MSR without task-specific bits set */
u64 x86_spec_ctrl_base;
EXPORT_SYMBOL_GPL(x86_spec_ctrl_base);
+
+/* The current value of the SPEC_CTRL MSR with task-specific bits set */
+DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
+EXPORT_SYMBOL_GPL(x86_spec_ctrl_current);
+
static DEFINE_MUTEX(spec_ctrl_mutex);

/*
+ * Keep track of the SPEC_CTRL MSR value for the current task, which may differ
+ * from x86_spec_ctrl_base due to STIBP/SSB in __speculation_ctrl_update().
+ */
+void write_spec_ctrl_current(u64 val)
+{
+ if (this_cpu_read(x86_spec_ctrl_current) == val)
+ return;
+
+ this_cpu_write(x86_spec_ctrl_current, val);
+ wrmsrl(MSR_IA32_SPEC_CTRL, val);
+}
+
+/*
* The vendor and possibly platform specific bits which can be modified in
* x86_spec_ctrl_base.
*/
@@ -1272,7 +1290,7 @@ static void __init spectre_v2_select_mit
if (spectre_v2_in_eibrs_mode(mode)) {
/* Force it so VMEXIT will restore correctly */
x86_spec_ctrl_base |= SPEC_CTRL_IBRS;
- wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
+ write_spec_ctrl_current(x86_spec_ctrl_base);
}

switch (mode) {
@@ -1327,7 +1345,7 @@ static void __init spectre_v2_select_mit

static void update_stibp_msr(void * __unused)
{
- wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
+ write_spec_ctrl_current(x86_spec_ctrl_base);
}

/* Update x86_spec_ctrl_base in case SMT state changed. */
@@ -1570,7 +1588,7 @@ static enum ssb_mitigation __init __ssb_
x86_amd_ssb_disable();
} else {
x86_spec_ctrl_base |= SPEC_CTRL_SSBD;
- wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
+ write_spec_ctrl_current(x86_spec_ctrl_base);
}
}

@@ -1821,7 +1839,7 @@ int arch_prctl_spec_ctrl_get(struct task
void x86_spec_ctrl_setup_ap(void)
{
if (boot_cpu_has(X86_FEATURE_MSR_SPEC_CTRL))
- wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
+ write_spec_ctrl_current(x86_spec_ctrl_base);

if (ssb_mode == SPEC_STORE_BYPASS_DISABLE)
x86_amd_ssb_disable();
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -584,7 +584,7 @@ static __always_inline void __speculatio
}

if (updmsr)
- wrmsrl(MSR_IA32_SPEC_CTRL, msr);
+ write_spec_ctrl_current(msr);
}

static unsigned long speculation_ctrl_update_tif(struct task_struct *tsk)


2022-07-12 19:40:00

by Greg Kroah-Hartman

Subject: [PATCH 5.15 20/78] bpf,x86: Respect X86_FEATURE_RETPOLINE*

From: Peter Zijlstra <[email protected]>

commit 87c87ecd00c54ecd677798cb49ef27329e0fab41 upstream.

Current BPF codegen doesn't respect X86_FEATURE_RETPOLINE* flags and
unconditionally emits a thunk call; this is sub-optimal and doesn't
match the regular, compiler-generated code.

Update the i386 JIT to emit code equal to what the compiler emits for
the regular kernel text (IOW. a plain THUNK call).

Update the x86_64 JIT to emit code similar to the result of compiler and
kernel rewrites according to the X86_FEATURE_RETPOLINE* flags: inline
RETPOLINE_AMD (lfence; jmp *%reg) and !RETPOLINE (jmp *%reg), while doing
a THUNK call for RETPOLINE.

This removes the hard-coded retpoline thunks and shrinks the generated
code. Leaving a single retpoline thunk definition in the kernel.
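
To make the three cases concrete, here is a small standalone sketch (not
part of the patch) printing the byte sequences the JIT now chooses between
for an indirect jump through %rcx; the rel32 to the thunk is left zeroed
since the real JIT computes it per call site:

#include <stdio.h>

static void show(const char *mode, const unsigned char *b, int n)
{
	printf("%-24s", mode);
	for (int i = 0; i < n; i++)
		printf(" %02x", b[i]);
	putchar('\n');
}

int main(void)
{
	/* jmp *%rcx: opcode 0xff, ModRM 0xe0 + reg (rcx == 1) */
	const unsigned char plain[]  = { 0xff, 0xe1 };
	/* lfence; jmp *%rcx */
	const unsigned char lfence[] = { 0x0f, 0xae, 0xe8, 0xff, 0xe1 };
	/* jmp __x86_indirect_thunk_rcx: 0xe9 + rel32 (placeholder zero) */
	const unsigned char thunk[]  = { 0xe9, 0x00, 0x00, 0x00, 0x00 };

	show("!RETPOLINE:", plain, sizeof(plain));
	show("RETPOLINE_LFENCE:", lfence, sizeof(lfence));
	show("RETPOLINE (thunk):", thunk, sizeof(thunk));
	return 0;
}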

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[cascardo: RETPOLINE_AMD was renamed to RETPOLINE_LFENCE]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 59 -----------------------------------
arch/x86/net/bpf_jit_comp.c | 46 +++++++++++++--------------
arch/x86/net/bpf_jit_comp32.c | 22 +++++++++++--
3 files changed, 41 insertions(+), 86 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -320,63 +320,4 @@ static inline void mds_idle_clear_cpu_bu

#endif /* __ASSEMBLY__ */

-/*
- * Below is used in the eBPF JIT compiler and emits the byte sequence
- * for the following assembly:
- *
- * With retpolines configured:
- *
- * callq do_rop
- * spec_trap:
- * pause
- * lfence
- * jmp spec_trap
- * do_rop:
- * mov %rcx,(%rsp) for x86_64
- * mov %edx,(%esp) for x86_32
- * retq
- *
- * Without retpolines configured:
- *
- * jmp *%rcx for x86_64
- * jmp *%edx for x86_32
- */
-#ifdef CONFIG_RETPOLINE
-# ifdef CONFIG_X86_64
-# define RETPOLINE_RCX_BPF_JIT_SIZE 17
-# define RETPOLINE_RCX_BPF_JIT() \
-do { \
- EMIT1_off32(0xE8, 7); /* callq do_rop */ \
- /* spec_trap: */ \
- EMIT2(0xF3, 0x90); /* pause */ \
- EMIT3(0x0F, 0xAE, 0xE8); /* lfence */ \
- EMIT2(0xEB, 0xF9); /* jmp spec_trap */ \
- /* do_rop: */ \
- EMIT4(0x48, 0x89, 0x0C, 0x24); /* mov %rcx,(%rsp) */ \
- EMIT1(0xC3); /* retq */ \
-} while (0)
-# else /* !CONFIG_X86_64 */
-# define RETPOLINE_EDX_BPF_JIT() \
-do { \
- EMIT1_off32(0xE8, 7); /* call do_rop */ \
- /* spec_trap: */ \
- EMIT2(0xF3, 0x90); /* pause */ \
- EMIT3(0x0F, 0xAE, 0xE8); /* lfence */ \
- EMIT2(0xEB, 0xF9); /* jmp spec_trap */ \
- /* do_rop: */ \
- EMIT3(0x89, 0x14, 0x24); /* mov %edx,(%esp) */ \
- EMIT1(0xC3); /* ret */ \
-} while (0)
-# endif
-#else /* !CONFIG_RETPOLINE */
-# ifdef CONFIG_X86_64
-# define RETPOLINE_RCX_BPF_JIT_SIZE 2
-# define RETPOLINE_RCX_BPF_JIT() \
- EMIT2(0xFF, 0xE1); /* jmp *%rcx */
-# else /* !CONFIG_X86_64 */
-# define RETPOLINE_EDX_BPF_JIT() \
- EMIT2(0xFF, 0xE2) /* jmp *%edx */
-# endif
-#endif
-
#endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -387,6 +387,25 @@ int bpf_arch_text_poke(void *ip, enum bp
return __bpf_arch_text_poke(ip, t, old_addr, new_addr, true);
}

+#define EMIT_LFENCE() EMIT3(0x0F, 0xAE, 0xE8)
+
+static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
+{
+ u8 *prog = *pprog;
+
+#ifdef CONFIG_RETPOLINE
+ if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_LFENCE)) {
+ EMIT_LFENCE();
+ EMIT2(0xFF, 0xE0 + reg);
+ } else if (cpu_feature_enabled(X86_FEATURE_RETPOLINE)) {
+ emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
+ } else
+#endif
+ EMIT2(0xFF, 0xE0 + reg);
+
+ *pprog = prog;
+}
+
/*
* Generate the following code:
*
@@ -468,7 +487,7 @@ static void emit_bpf_tail_call_indirect(
* rdi == ctx (1st arg)
* rcx == prog->bpf_func + X86_TAIL_CALL_OFFSET
*/
- RETPOLINE_RCX_BPF_JIT();
+ emit_indirect_jump(&prog, 1 /* rcx */, ip + (prog - start));

/* out: */
ctx->tail_call_indirect_label = prog - start;
@@ -1185,8 +1204,7 @@ static int do_jit(struct bpf_prog *bpf_p
/* speculation barrier */
case BPF_ST | BPF_NOSPEC:
if (boot_cpu_has(X86_FEATURE_XMM2))
- /* Emit 'lfence' */
- EMIT3(0x0F, 0xAE, 0xE8);
+ EMIT_LFENCE();
break;

/* ST: *(u8*)(dst_reg + off) = imm */
@@ -2122,24 +2140,6 @@ cleanup:
return ret;
}

-static int emit_fallback_jump(u8 **pprog)
-{
- u8 *prog = *pprog;
- int err = 0;
-
-#ifdef CONFIG_RETPOLINE
- /* Note that this assumes the the compiler uses external
- * thunks for indirect calls. Both clang and GCC use the same
- * naming convention for external thunks.
- */
- err = emit_jump(&prog, __x86_indirect_thunk_rdx, prog);
-#else
- EMIT2(0xFF, 0xE2); /* jmp rdx */
-#endif
- *pprog = prog;
- return err;
-}
-
static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
{
u8 *jg_reloc, *prog = *pprog;
@@ -2161,9 +2161,7 @@ static int emit_bpf_dispatcher(u8 **ppro
if (err)
return err;

- err = emit_fallback_jump(&prog); /* jmp thunk/indirect */
- if (err)
- return err;
+ emit_indirect_jump(&prog, 2 /* rdx */, prog);

*pprog = prog;
return 0;
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -15,6 +15,7 @@
#include <asm/cacheflush.h>
#include <asm/set_memory.h>
#include <asm/nospec-branch.h>
+#include <asm/asm-prototypes.h>
#include <linux/bpf.h>

/*
@@ -1267,6 +1268,21 @@ static void emit_epilogue(u8 **pprog, u3
*pprog = prog;
}

+static int emit_jmp_edx(u8 **pprog, u8 *ip)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+
+#ifdef CONFIG_RETPOLINE
+ EMIT1_off32(0xE9, (u8 *)__x86_indirect_thunk_edx - (ip + 5));
+#else
+ EMIT2(0xFF, 0xE2);
+#endif
+ *pprog = prog;
+
+ return cnt;
+}
+
/*
* Generate the following code:
* ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
@@ -1280,7 +1296,7 @@ static void emit_epilogue(u8 **pprog, u3
* goto *(prog->bpf_func + prologue_size);
* out:
*/
-static void emit_bpf_tail_call(u8 **pprog)
+static void emit_bpf_tail_call(u8 **pprog, u8 *ip)
{
u8 *prog = *pprog;
int cnt = 0;
@@ -1362,7 +1378,7 @@ static void emit_bpf_tail_call(u8 **ppro
* eax == ctx (1st arg)
* edx == prog->bpf_func + prologue_size
*/
- RETPOLINE_EDX_BPF_JIT();
+ cnt += emit_jmp_edx(&prog, ip + cnt);

if (jmp_label1 == -1)
jmp_label1 = cnt;
@@ -2122,7 +2138,7 @@ static int do_jit(struct bpf_prog *bpf_p
break;
}
case BPF_JMP | BPF_TAIL_CALL:
- emit_bpf_tail_call(&prog);
+ emit_bpf_tail_call(&prog, image + addrs[i - 1]);
break;

/* cond jump */
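
A note on the encodings above (illustrative, not part of the patch): with
retpolines disabled, an indirect jump through one of the low eight GP
registers is simply 0xFF followed by 0xE0 plus the register number, which
is what the EMIT2() fallback in emit_indirect_jump() produces:

  /* reg == 1 (%rcx), tail-call path:  FF E1   jmp *%rcx
   * reg == 2 (%rdx), dispatcher path: FF E2   jmp *%rdx
   * the lfence variant merely prefixes 0F AE E8 (lfence) */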


2022-07-12 19:40:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 11/78] x86/asm: Fix register order

From: Peter Zijlstra <[email protected]>

commit a92ede2d584a2e070def59c7e47e6b6f6341c55c upstream.

Ensure the register order is correct; this allows for easy translation
between register number and trampoline and vice-versa.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/GEN-for-each-reg.h | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/GEN-for-each-reg.h
+++ b/arch/x86/include/asm/GEN-for-each-reg.h
@@ -1,11 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * These are in machine order; things rely on that.
+ */
#ifdef CONFIG_64BIT
GEN(rax)
-GEN(rbx)
GEN(rcx)
GEN(rdx)
+GEN(rbx)
+GEN(rsp)
+GEN(rbp)
GEN(rsi)
GEN(rdi)
-GEN(rbp)
GEN(r8)
GEN(r9)
GEN(r10)
@@ -16,10 +21,11 @@ GEN(r14)
GEN(r15)
#else
GEN(eax)
-GEN(ebx)
GEN(ecx)
GEN(edx)
+GEN(ebx)
+GEN(esp)
+GEN(ebp)
GEN(esi)
GEN(edi)
-GEN(ebp)
#endif
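
For reference (illustrative, not part of the patch): the new order is the
hardware register-encoding order, which is what makes translating between
a register number and its trampoline trivial:

  /*
   * 0 rax, 1 rcx, 2 rdx, 3 rbx, 4 rsp, 5 rbp, 6 rsi, 7 rdi, 8-15 r8-r15
   * (eax..edi on 32-bit). With GEN() expanded in this order, a register
   * number indexes straight into the generated thunks, e.g.
   * __x86_indirect_thunk_array[1] is the %rcx thunk used by the BPF JIT.
   */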


2022-07-12 19:40:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 33/78] x86/ftrace: Use alternative RET encoding

From: Peter Zijlstra <[email protected]>

commit 1f001e9da6bbf482311e45e48f53c2bd2179e59c upstream.

Use the return thunk in ftrace trampolines, if needed.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
[cascardo: still copy return from ftrace_stub]
[cascardo: use memcpy(text_gen_insn) as there is no __text_gen_insn]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/ftrace.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -308,7 +308,7 @@ union ftrace_op_code_union {
} __attribute__((packed));
};

-#define RET_SIZE 1 + IS_ENABLED(CONFIG_SLS)
+#define RET_SIZE (IS_ENABLED(CONFIG_RETPOLINE) ? 5 : 1 + IS_ENABLED(CONFIG_SLS))

static unsigned long
create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
@@ -367,7 +367,10 @@ create_trampoline(struct ftrace_ops *ops

/* The trampoline ends with ret(q) */
retq = (unsigned long)ftrace_stub;
- ret = copy_from_kernel_nofault(ip, (void *)retq, RET_SIZE);
+ if (cpu_feature_enabled(X86_FEATURE_RETHUNK))
+ memcpy(ip, text_gen_insn(JMP32_INSN_OPCODE, ip, &__x86_return_thunk), JMP32_INSN_SIZE);
+ else
+ ret = copy_from_kernel_nofault(ip, (void *)retq, RET_SIZE);
if (WARN_ON(ret < 0))
goto fail;
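
To make the RET_SIZE change concrete (sketch only, not part of the patch),
the trampoline tail now looks like one of:

  /*
   * X86_FEATURE_RETHUNK set:  e9 <rel32>   jmp __x86_return_thunk  (5 bytes)
   * otherwise:                c3 [cc]      ret (+ int3 with CONFIG_SLS)
   * hence RET_SIZE becoming 5 under CONFIG_RETPOLINE above.
   */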



2022-07-12 19:40:33

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 29/78] x86: Undo return-thunk damage

From: Peter Zijlstra <[email protected]>

commit 15e67227c49a57837108acfe1c80570e1bd9f962 upstream.

Introduce X86_FEATURE_RETHUNK for those afflicted with needing this.

[ bp: Do only INT3 padding - simpler. ]

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
[cascardo: CONFIG_STACK_VALIDATION vs CONFIG_OBJTOOL]
[cascardo: no IBT support]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/alternative.h | 1
arch/x86/include/asm/cpufeatures.h | 1
arch/x86/include/asm/disabled-features.h | 3 +
arch/x86/kernel/alternative.c | 60 +++++++++++++++++++++++++++++++
arch/x86/kernel/module.c | 8 +++-
arch/x86/kernel/vmlinux.lds.S | 7 +++
6 files changed, 78 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -76,6 +76,7 @@ extern int alternatives_patched;
extern void alternative_instructions(void);
extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
extern void apply_retpolines(s32 *start, s32 *end);
+extern void apply_returns(s32 *start, s32 *end);

struct module;

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -298,6 +298,7 @@
/* FREE! (11*32+11) */
#define X86_FEATURE_RETPOLINE (11*32+12) /* "" Generic Retpoline mitigation for Spectre variant 2 */
#define X86_FEATURE_RETPOLINE_LFENCE (11*32+13) /* "" Use LFENCE for Spectre variant 2 */
+#define X86_FEATURE_RETHUNK (11*32+14) /* "" Use REturn THUNK */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -60,7 +60,8 @@
# define DISABLE_RETPOLINE 0
#else
# define DISABLE_RETPOLINE ((1 << (X86_FEATURE_RETPOLINE & 31)) | \
- (1 << (X86_FEATURE_RETPOLINE_LFENCE & 31)))
+ (1 << (X86_FEATURE_RETPOLINE_LFENCE & 31)) | \
+ (1 << (X86_FEATURE_RETHUNK & 31)))
#endif

/* Force disable because it's broken beyond repair */
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -115,6 +115,7 @@ static void __init_or_module add_nops(vo
}

extern s32 __retpoline_sites[], __retpoline_sites_end[];
+extern s32 __return_sites[], __return_sites_end[];
extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
extern s32 __smp_locks[], __smp_locks_end[];
void text_poke_early(void *addr, const void *opcode, size_t len);
@@ -506,9 +507,67 @@ void __init_or_module noinline apply_ret
}
}

+/*
+ * Rewrite the compiler generated return thunk tail-calls.
+ *
+ * For example, convert:
+ *
+ * JMP __x86_return_thunk
+ *
+ * into:
+ *
+ * RET
+ */
+static int patch_return(void *addr, struct insn *insn, u8 *bytes)
+{
+ int i = 0;
+
+ if (cpu_feature_enabled(X86_FEATURE_RETHUNK))
+ return -1;
+
+ bytes[i++] = RET_INSN_OPCODE;
+
+ for (; i < insn->length;)
+ bytes[i++] = INT3_INSN_OPCODE;
+
+ return i;
+}
+
+void __init_or_module noinline apply_returns(s32 *start, s32 *end)
+{
+ s32 *s;
+
+ for (s = start; s < end; s++) {
+ void *addr = (void *)s + *s;
+ struct insn insn;
+ int len, ret;
+ u8 bytes[16];
+ u8 op1;
+
+ ret = insn_decode_kernel(&insn, addr);
+ if (WARN_ON_ONCE(ret < 0))
+ continue;
+
+ op1 = insn.opcode.bytes[0];
+ if (WARN_ON_ONCE(op1 != JMP32_INSN_OPCODE))
+ continue;
+
+ DPRINTK("return thunk at: %pS (%px) len: %d to: %pS",
+ addr, addr, insn.length,
+ addr + insn.length + insn.immediate.value);
+
+ len = patch_return(addr, &insn, bytes);
+ if (len == insn.length) {
+ DUMP_BYTES(((u8*)addr), len, "%px: orig: ", addr);
+ DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
+ text_poke_early(addr, bytes, len);
+ }
+ }
+}
#else /* !RETPOLINES || !CONFIG_STACK_VALIDATION */

void __init_or_module noinline apply_retpolines(s32 *start, s32 *end) { }
+void __init_or_module noinline apply_returns(s32 *start, s32 *end) { }

#endif /* CONFIG_RETPOLINE && CONFIG_STACK_VALIDATION */

@@ -824,6 +883,7 @@ void __init alternative_instructions(voi
* those can rewrite the retpoline thunks.
*/
apply_retpolines(__retpoline_sites, __retpoline_sites_end);
+ apply_returns(__return_sites, __return_sites_end);

/*
* Then patch alternatives, such that those paravirt calls that are in
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -253,7 +253,7 @@ int module_finalize(const Elf_Ehdr *hdr,
{
const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
*para = NULL, *orc = NULL, *orc_ip = NULL,
- *retpolines = NULL;
+ *retpolines = NULL, *returns = NULL;
char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;

for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -271,6 +271,8 @@ int module_finalize(const Elf_Ehdr *hdr,
orc_ip = s;
if (!strcmp(".retpoline_sites", secstrings + s->sh_name))
retpolines = s;
+ if (!strcmp(".return_sites", secstrings + s->sh_name))
+ returns = s;
}

/*
@@ -285,6 +287,10 @@ int module_finalize(const Elf_Ehdr *hdr,
void *rseg = (void *)retpolines->sh_addr;
apply_retpolines(rseg, rseg + retpolines->sh_size);
}
+ if (returns) {
+ void *rseg = (void *)returns->sh_addr;
+ apply_returns(rseg, rseg + returns->sh_size);
+ }
if (alt) {
/* patch .altinstructions */
void *aseg = (void *)alt->sh_addr;
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -284,6 +284,13 @@ SECTIONS
*(.retpoline_sites)
__retpoline_sites_end = .;
}
+
+ . = ALIGN(8);
+ .return_sites : AT(ADDR(.return_sites) - LOAD_OFFSET) {
+ __return_sites = .;
+ *(.return_sites)
+ __return_sites_end = .;
+ }
#endif

/*
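
As an illustration (not part of the patch), each .return_sites entry
points at a compiler-generated tail jump to the return thunk; when
X86_FEATURE_RETHUNK is not set, patch_return() rewrites it in place
roughly like this:

  /*
   * before: e9 xx xx xx xx   jmp __x86_return_thunk
   * after:  c3 cc cc cc cc   ret; int3 padding
   * (RET_INSN_OPCODE == 0xc3, INT3_INSN_OPCODE == 0xcc)
   */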


2022-07-12 19:41:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 61/78] x86/speculation: Fix SPEC_CTRL write on SMT state change

From: Josh Poimboeuf <[email protected]>

commit 56aa4d221f1ee2c3a49b45b800778ec6e0ab73c5 upstream.

If the SMT state changes, SSBD might get accidentally disabled. Fix
that.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/cpu/bugs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1451,7 +1451,8 @@ static void __init spectre_v2_select_mit

static void update_stibp_msr(void * __unused)
{
- write_spec_ctrl_current(x86_spec_ctrl_base, true);
+ u64 val = spec_ctrl_current() | (x86_spec_ctrl_base & SPEC_CTRL_STIBP);
+ write_spec_ctrl_current(val, true);
}

/* Update x86_spec_ctrl_base in case SMT state changed. */
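
A worked example of what the merge above preserves (illustrative values;
IBRS, STIBP and SSBD are bits 0, 1 and 2 of the SPEC_CTRL MSR):

  /*
   * spec_ctrl_current()                  = 0x4  (task has SSBD enabled)
   * x86_spec_ctrl_base & SPEC_CTRL_STIBP = 0x2
   * value written to MSR_IA32_SPEC_CTRL  = 0x6  (STIBP refreshed, SSBD kept)
   *
   * Writing x86_spec_ctrl_base directly, as before, would have silently
   * dropped the live SSBD bit on an SMT state change.
   */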


2022-07-12 19:41:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 48/78] x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS

From: Pawan Gupta <[email protected]>

commit 7c693f54c873691a4b7da05c7e0f74e67745d144 upstream.

Extend spectre_v2= boot option with Kernel IBRS.

[jpoimboe: no STIBP with IBRS]

Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 1
arch/x86/include/asm/nospec-branch.h | 1
arch/x86/kernel/cpu/bugs.c | 66 ++++++++++++++++++------
3 files changed, 54 insertions(+), 14 deletions(-)

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5335,6 +5335,7 @@
eibrs - enhanced IBRS
eibrs,retpoline - enhanced IBRS + Retpolines
eibrs,lfence - enhanced IBRS + LFENCE
+ ibrs - use IBRS to protect kernel

Not specifying this option is equivalent to
spectre_v2=auto.
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -212,6 +212,7 @@ enum spectre_v2_mitigation {
SPECTRE_V2_EIBRS,
SPECTRE_V2_EIBRS_RETPOLINE,
SPECTRE_V2_EIBRS_LFENCE,
+ SPECTRE_V2_IBRS,
};

/* The indirect branch speculation control variants */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -965,6 +965,7 @@ enum spectre_v2_mitigation_cmd {
SPECTRE_V2_CMD_EIBRS,
SPECTRE_V2_CMD_EIBRS_RETPOLINE,
SPECTRE_V2_CMD_EIBRS_LFENCE,
+ SPECTRE_V2_CMD_IBRS,
};

enum spectre_v2_user_cmd {
@@ -1037,11 +1038,12 @@ spectre_v2_parse_user_cmdline(enum spect
return SPECTRE_V2_USER_CMD_AUTO;
}

-static inline bool spectre_v2_in_eibrs_mode(enum spectre_v2_mitigation mode)
+static inline bool spectre_v2_in_ibrs_mode(enum spectre_v2_mitigation mode)
{
- return (mode == SPECTRE_V2_EIBRS ||
- mode == SPECTRE_V2_EIBRS_RETPOLINE ||
- mode == SPECTRE_V2_EIBRS_LFENCE);
+ return mode == SPECTRE_V2_IBRS ||
+ mode == SPECTRE_V2_EIBRS ||
+ mode == SPECTRE_V2_EIBRS_RETPOLINE ||
+ mode == SPECTRE_V2_EIBRS_LFENCE;
}

static void __init
@@ -1106,12 +1108,12 @@ spectre_v2_user_select_mitigation(enum s
}

/*
- * If no STIBP, enhanced IBRS is enabled or SMT impossible, STIBP is not
- * required.
+ * If no STIBP, IBRS or enhanced IBRS is enabled, or SMT impossible,
+ * STIBP is not required.
*/
if (!boot_cpu_has(X86_FEATURE_STIBP) ||
!smt_possible ||
- spectre_v2_in_eibrs_mode(spectre_v2_enabled))
+ spectre_v2_in_ibrs_mode(spectre_v2_enabled))
return;

/*
@@ -1143,6 +1145,7 @@ static const char * const spectre_v2_str
[SPECTRE_V2_EIBRS] = "Mitigation: Enhanced IBRS",
[SPECTRE_V2_EIBRS_LFENCE] = "Mitigation: Enhanced IBRS + LFENCE",
[SPECTRE_V2_EIBRS_RETPOLINE] = "Mitigation: Enhanced IBRS + Retpolines",
+ [SPECTRE_V2_IBRS] = "Mitigation: IBRS",
};

static const struct {
@@ -1160,6 +1163,7 @@ static const struct {
{ "eibrs,lfence", SPECTRE_V2_CMD_EIBRS_LFENCE, false },
{ "eibrs,retpoline", SPECTRE_V2_CMD_EIBRS_RETPOLINE, false },
{ "auto", SPECTRE_V2_CMD_AUTO, false },
+ { "ibrs", SPECTRE_V2_CMD_IBRS, false },
};

static void __init spec_v2_print_cond(const char *reason, bool secure)
@@ -1222,6 +1226,24 @@ static enum spectre_v2_mitigation_cmd __
return SPECTRE_V2_CMD_AUTO;
}

+ if (cmd == SPECTRE_V2_CMD_IBRS && boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) {
+ pr_err("%s selected but not Intel CPU. Switching to AUTO select\n",
+ mitigation_options[i].option);
+ return SPECTRE_V2_CMD_AUTO;
+ }
+
+ if (cmd == SPECTRE_V2_CMD_IBRS && !boot_cpu_has(X86_FEATURE_IBRS)) {
+ pr_err("%s selected but CPU doesn't have IBRS. Switching to AUTO select\n",
+ mitigation_options[i].option);
+ return SPECTRE_V2_CMD_AUTO;
+ }
+
+ if (cmd == SPECTRE_V2_CMD_IBRS && boot_cpu_has(X86_FEATURE_XENPV)) {
+ pr_err("%s selected but running as XenPV guest. Switching to AUTO select\n",
+ mitigation_options[i].option);
+ return SPECTRE_V2_CMD_AUTO;
+ }
+
spec_v2_print_cond(mitigation_options[i].option,
mitigation_options[i].secure);
return cmd;
@@ -1261,6 +1283,14 @@ static void __init spectre_v2_select_mit
break;
}

+ if (boot_cpu_has_bug(X86_BUG_RETBLEED) &&
+ retbleed_cmd != RETBLEED_CMD_OFF &&
+ boot_cpu_has(X86_FEATURE_IBRS) &&
+ boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) {
+ mode = SPECTRE_V2_IBRS;
+ break;
+ }
+
mode = spectre_v2_select_retpoline();
break;

@@ -1277,6 +1307,10 @@ static void __init spectre_v2_select_mit
mode = spectre_v2_select_retpoline();
break;

+ case SPECTRE_V2_CMD_IBRS:
+ mode = SPECTRE_V2_IBRS;
+ break;
+
case SPECTRE_V2_CMD_EIBRS:
mode = SPECTRE_V2_EIBRS;
break;
@@ -1293,7 +1327,7 @@ static void __init spectre_v2_select_mit
if (mode == SPECTRE_V2_EIBRS && unprivileged_ebpf_enabled())
pr_err(SPECTRE_V2_EIBRS_EBPF_MSG);

- if (spectre_v2_in_eibrs_mode(mode)) {
+ if (spectre_v2_in_ibrs_mode(mode)) {
/* Force it so VMEXIT will restore correctly */
x86_spec_ctrl_base |= SPEC_CTRL_IBRS;
write_spec_ctrl_current(x86_spec_ctrl_base, true);
@@ -1304,6 +1338,10 @@ static void __init spectre_v2_select_mit
case SPECTRE_V2_EIBRS:
break;

+ case SPECTRE_V2_IBRS:
+ setup_force_cpu_cap(X86_FEATURE_KERNEL_IBRS);
+ break;
+
case SPECTRE_V2_LFENCE:
case SPECTRE_V2_EIBRS_LFENCE:
setup_force_cpu_cap(X86_FEATURE_RETPOLINE_LFENCE);
@@ -1330,17 +1368,17 @@ static void __init spectre_v2_select_mit
pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");

/*
- * Retpoline means the kernel is safe because it has no indirect
- * branches. Enhanced IBRS protects firmware too, so, enable restricted
- * speculation around firmware calls only when Enhanced IBRS isn't
- * supported.
+ * Retpoline protects the kernel, but doesn't protect firmware. IBRS
+ * and Enhanced IBRS protect firmware too, so enable IBRS around
+ * firmware calls only when IBRS / Enhanced IBRS aren't otherwise
+ * enabled.
*
* Use "mode" to check Enhanced IBRS instead of boot_cpu_has(), because
* the user might select retpoline on the kernel command line and if
* the CPU supports Enhanced IBRS, kernel might un-intentionally not
* enable IBRS around firmware calls.
*/
- if (boot_cpu_has(X86_FEATURE_IBRS) && !spectre_v2_in_eibrs_mode(mode)) {
+ if (boot_cpu_has(X86_FEATURE_IBRS) && !spectre_v2_in_ibrs_mode(mode)) {
setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
pr_info("Enabling Restricted Speculation for firmware calls\n");
}
@@ -2082,7 +2120,7 @@ static ssize_t mmio_stale_data_show_stat

static char *stibp_state(void)
{
- if (spectre_v2_in_eibrs_mode(spectre_v2_enabled))
+ if (spectre_v2_in_ibrs_mode(spectre_v2_enabled))
return "";

switch (spectre_v2_user_stibp) {
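
Usage note (illustrative): the new mode is requested by booting with

  spectre_v2=ibrs

on the kernel command line. Per the checks added above it is only honoured
on Intel CPUs with X86_FEATURE_IBRS and not under Xen PV (otherwise the
selection falls back to auto); once active,
/sys/devices/system/cpu/vulnerabilities/spectre_v2 reports
"Mitigation: IBRS".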


2022-07-12 19:41:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 39/78] x86/entry: Avoid very early RET

From: Peter Zijlstra <[email protected]>

commit 7c81c0c9210c9bfab2bae76aab2999de5bad27db upstream.

Commit

ee774dac0da1 ("x86/entry: Move PUSH_AND_CLEAR_REGS out of error_entry()")

manages to introduce a CALL/RET pair that is before SWITCH_TO_KERNEL_CR3,
which means it is before RETBleed can be mitigated.

Revert to an earlier version of the commit in Fixes. The downside is that
this will bloat .text size somewhat. The alternative is fully reverting
it.

The purpose of this patch was to allow migrating error_entry() to C,
including the whole of kPTI. Much care needs to be taken moving that
forward to not re-introduce this problem of early RETs.

Fixes: ee774dac0da1 ("x86/entry: Move PUSH_AND_CLEAR_REGS out of error_entry()")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/entry/entry_64.S | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -314,14 +314,6 @@ SYM_CODE_END(ret_from_fork)
#endif
.endm

-/* Save all registers in pt_regs */
-SYM_CODE_START_LOCAL(push_and_clear_regs)
- UNWIND_HINT_FUNC
- PUSH_AND_CLEAR_REGS save_ret=1
- ENCODE_FRAME_POINTER 8
- RET
-SYM_CODE_END(push_and_clear_regs)
-
/**
* idtentry_body - Macro to emit code calling the C function
* @cfunc: C function to be called
@@ -329,8 +321,8 @@ SYM_CODE_END(push_and_clear_regs)
*/
.macro idtentry_body cfunc has_error_code:req

- call push_and_clear_regs
- UNWIND_HINT_REGS
+ PUSH_AND_CLEAR_REGS
+ ENCODE_FRAME_POINTER

/*
* Call error_entry() and switch to the task stack if from userspace.


2022-07-12 19:41:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 46/78] x86/entry: Add kernel IBRS implementation

From: Thadeu Lima de Souza Cascardo <[email protected]>

commit 2dbb887e875b1de3ca8f40ddf26bcfe55798c609 upstream.

Implement Kernel IBRS - currently the only known option to mitigate RSB
underflow speculation issues on Skylake hardware.

Note: since IBRS_ENTER requires fuller context established than
UNTRAIN_RET, it would naturally be placed after it. However, since
UNTRAIN_RET itself implies a RET, UNTRAIN_RET must come after IBRS_ENTER.
This means adding IBRS_ENTER also means moving UNTRAIN_RET.

Note 2: KERNEL_IBRS is sub-optimal for XenPV.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
[cascardo: conflict at arch/x86/entry/entry_64_compat.S]
[cascardo: conflict fixups, no ANNOTATE_NOENDBR]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/entry/calling.h | 58 +++++++++++++++++++++++++++++++++++++
arch/x86/entry/entry_64.S | 44 ++++++++++++++++++++++++----
arch/x86/entry/entry_64_compat.S | 17 ++++++++--
arch/x86/include/asm/cpufeatures.h | 2 -
4 files changed, 111 insertions(+), 10 deletions(-)

--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -7,6 +7,8 @@
#include <asm/asm-offsets.h>
#include <asm/processor-flags.h>
#include <asm/ptrace-abi.h>
+#include <asm/msr.h>
+#include <asm/nospec-branch.h>

/*

@@ -282,6 +284,62 @@ For 32-bit we have the following convent
#endif

/*
+ * IBRS kernel mitigation for Spectre_v2.
+ *
+ * Assumes full context is established (PUSH_REGS, CR3 and GS) and it clobbers
+ * the regs it uses (AX, CX, DX). Must be called before the first RET
+ * instruction (NOTE! UNTRAIN_RET includes a RET instruction)
+ *
+ * The optional argument is used to save/restore the current value,
+ * which is used on the paranoid paths.
+ *
+ * Assumes x86_spec_ctrl_{base,current} to have SPEC_CTRL_IBRS set.
+ */
+.macro IBRS_ENTER save_reg
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_KERNEL_IBRS
+ movl $MSR_IA32_SPEC_CTRL, %ecx
+
+.ifnb \save_reg
+ rdmsr
+ shl $32, %rdx
+ or %rdx, %rax
+ mov %rax, \save_reg
+ test $SPEC_CTRL_IBRS, %eax
+ jz .Ldo_wrmsr_\@
+ lfence
+ jmp .Lend_\@
+.Ldo_wrmsr_\@:
+.endif
+
+ movq PER_CPU_VAR(x86_spec_ctrl_current), %rdx
+ movl %edx, %eax
+ shr $32, %rdx
+ wrmsr
+.Lend_\@:
+.endm
+
+/*
+ * Similar to IBRS_ENTER, requires KERNEL GS,CR3 and clobbers (AX, CX, DX)
+ * regs. Must be called after the last RET.
+ */
+.macro IBRS_EXIT save_reg
+ ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_KERNEL_IBRS
+ movl $MSR_IA32_SPEC_CTRL, %ecx
+
+.ifnb \save_reg
+ mov \save_reg, %rdx
+.else
+ movq PER_CPU_VAR(x86_spec_ctrl_current), %rdx
+ andl $(~SPEC_CTRL_IBRS), %edx
+.endif
+
+ movl %edx, %eax
+ shr $32, %rdx
+ wrmsr
+.Lend_\@:
+.endm
+
+/*
* Mitigate Spectre v1 for conditional swapgs code paths.
*
* FENCE_SWAPGS_USER_ENTRY is used in the user entry swapgs code path, to
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -94,7 +94,6 @@ SYM_CODE_START(entry_SYSCALL_64)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp

SYM_INNER_LABEL(entry_SYSCALL_64_safe_stack, SYM_L_GLOBAL)
- UNTRAIN_RET

/* Construct struct pt_regs on stack */
pushq $__USER_DS /* pt_regs->ss */
@@ -111,6 +110,11 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_h
movq %rsp, %rdi
/* Sign extend the lower 32bit as syscall numbers are treated as int */
movslq %eax, %rsi
+
+ /* clobbers %rax, make sure it is after saving the syscall nr */
+ IBRS_ENTER
+ UNTRAIN_RET
+
call do_syscall_64 /* returns with IRQs disabled */

/*
@@ -190,6 +194,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_h
* perf profiles. Nothing jumps here.
*/
syscall_return_via_sysret:
+ IBRS_EXIT
POP_REGS pop_rdi=0

/*
@@ -582,6 +587,7 @@ __irqentry_text_end:

SYM_CODE_START_LOCAL(common_interrupt_return)
SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
+ IBRS_EXIT
#ifdef CONFIG_DEBUG_ENTRY
/* Assert that pt_regs indicates user mode. */
testb $3, CS(%rsp)
@@ -861,6 +867,9 @@ SYM_CODE_END(xen_failsafe_callback)
* 1 -> no SWAPGS on exit
*
* Y GSBASE value at entry, must be restored in paranoid_exit
+ *
+ * R14 - old CR3
+ * R15 - old SPEC_CTRL
*/
SYM_CODE_START_LOCAL(paranoid_entry)
UNWIND_HINT_FUNC
@@ -884,7 +893,6 @@ SYM_CODE_START_LOCAL(paranoid_entry)
* be retrieved from a kernel internal table.
*/
SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
- UNTRAIN_RET

/*
* Handling GSBASE depends on the availability of FSGSBASE.
@@ -906,7 +914,7 @@ SYM_CODE_START_LOCAL(paranoid_entry)
* is needed here.
*/
SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
- RET
+ jmp .Lparanoid_gsbase_done

.Lparanoid_entry_checkgs:
/* EBX = 1 -> kernel GSBASE active, no restore required */
@@ -925,8 +933,16 @@ SYM_CODE_START_LOCAL(paranoid_entry)
xorl %ebx, %ebx
swapgs
.Lparanoid_kernel_gsbase:
-
FENCE_SWAPGS_KERNEL_ENTRY
+.Lparanoid_gsbase_done:
+
+ /*
+ * Once we have CR3 and %GS setup save and set SPEC_CTRL. Just like
+ * CR3 above, keep the old value in a callee saved register.
+ */
+ IBRS_ENTER save_reg=%r15
+ UNTRAIN_RET
+
RET
SYM_CODE_END(paranoid_entry)

@@ -948,9 +964,19 @@ SYM_CODE_END(paranoid_entry)
* 1 -> no SWAPGS on exit
*
* Y User space GSBASE, must be restored unconditionally
+ *
+ * R14 - old CR3
+ * R15 - old SPEC_CTRL
*/
SYM_CODE_START_LOCAL(paranoid_exit)
UNWIND_HINT_REGS
+
+ /*
+ * Must restore IBRS state before both CR3 and %GS since we need access
+ * to the per-CPU x86_spec_ctrl_shadow variable.
+ */
+ IBRS_EXIT save_reg=%r15
+
/*
* The order of operations is important. RESTORE_CR3 requires
* kernel GSBASE.
@@ -995,10 +1021,12 @@ SYM_CODE_START_LOCAL(error_entry)
FENCE_SWAPGS_USER_ENTRY
/* We have user CR3. Change to kernel CR3. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+ IBRS_ENTER
UNTRAIN_RET

leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
.Lerror_entry_from_usermode_after_swapgs:
+
/* Put us onto the real thread stack. */
call sync_regs
RET
@@ -1048,6 +1076,7 @@ SYM_CODE_START_LOCAL(error_entry)
SWAPGS
FENCE_SWAPGS_USER_ENTRY
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
+ IBRS_ENTER
UNTRAIN_RET

/*
@@ -1143,7 +1172,6 @@ SYM_CODE_START(asm_exc_nmi)
movq %rsp, %rdx
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
UNWIND_HINT_IRET_REGS base=%rdx offset=8
- UNTRAIN_RET
pushq 5*8(%rdx) /* pt_regs->ss */
pushq 4*8(%rdx) /* pt_regs->rsp */
pushq 3*8(%rdx) /* pt_regs->flags */
@@ -1154,6 +1182,9 @@ SYM_CODE_START(asm_exc_nmi)
PUSH_AND_CLEAR_REGS rdx=(%rdx)
ENCODE_FRAME_POINTER

+ IBRS_ENTER
+ UNTRAIN_RET
+
/*
* At this point we no longer need to worry about stack damage
* due to nesting -- we're on the normal thread stack and we're
@@ -1376,6 +1407,9 @@ end_repeat_nmi:
movq $-1, %rsi
call exc_nmi

+ /* Always restore stashed SPEC_CTRL value (see paranoid_entry) */
+ IBRS_EXIT save_reg=%r15
+
/* Always restore stashed CR3 value (see paranoid_entry) */
RESTORE_CR3 scratch_reg=%r15 save_reg=%r14

--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -4,7 +4,6 @@
*
* Copyright 2000-2002 Andi Kleen, SuSE Labs.
*/
-#include "calling.h"
#include <asm/asm-offsets.h>
#include <asm/current.h>
#include <asm/errno.h>
@@ -18,6 +17,8 @@
#include <linux/linkage.h>
#include <linux/err.h>

+#include "calling.h"
+
.section .entry.text, "ax"

/*
@@ -72,7 +73,6 @@ SYM_CODE_START(entry_SYSENTER_compat)
pushq $__USER32_CS /* pt_regs->cs */
pushq $0 /* pt_regs->ip = 0 (placeholder) */
SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
- UNTRAIN_RET

/*
* User tracing code (ptrace or signal handlers) might assume that
@@ -114,6 +114,9 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_af

cld

+ IBRS_ENTER
+ UNTRAIN_RET
+
/*
* SYSENTER doesn't filter flags, so we need to clear NT and AC
* ourselves. To save a few cycles, we can check whether
@@ -213,7 +216,6 @@ SYM_CODE_START(entry_SYSCALL_compat)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp

SYM_INNER_LABEL(entry_SYSCALL_compat_safe_stack, SYM_L_GLOBAL)
- UNTRAIN_RET

/* Construct struct pt_regs on stack */
pushq $__USER32_DS /* pt_regs->ss */
@@ -255,6 +257,9 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_aft

UNWIND_HINT_REGS

+ IBRS_ENTER
+ UNTRAIN_RET
+
movq %rsp, %rdi
call do_fast_syscall_32
/* XEN PV guests always use IRET path */
@@ -269,6 +274,8 @@ sysret32_from_system_call:
*/
STACKLEAK_ERASE

+ IBRS_EXIT
+
movq RBX(%rsp), %rbx /* pt_regs->rbx */
movq RBP(%rsp), %rbp /* pt_regs->rbp */
movq EFLAGS(%rsp), %r11 /* pt_regs->flags (in r11) */
@@ -380,7 +387,6 @@ SYM_CODE_START(entry_INT80_compat)
pushq (%rdi) /* pt_regs->di */
.Lint80_keep_stack:

- UNTRAIN_RET
pushq %rsi /* pt_regs->si */
xorl %esi, %esi /* nospec si */
pushq %rdx /* pt_regs->dx */
@@ -413,6 +419,9 @@ SYM_CODE_START(entry_INT80_compat)

cld

+ IBRS_ENTER
+ UNTRAIN_RET
+
movq %rsp, %rdi
call do_int80_syscall_32
jmp swapgs_restore_regs_and_return_to_usermode
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -203,7 +203,7 @@
#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
/* FREE! ( 7*32+10) */
#define X86_FEATURE_PTI ( 7*32+11) /* Kernel Page Table Isolation enabled */
-/* FREE! ( 7*32+12) */
+#define X86_FEATURE_KERNEL_IBRS ( 7*32+12) /* "" Set/clear IBRS on kernel entry/exit */
/* FREE! ( 7*32+13) */
#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
#define X86_FEATURE_CDP_L2 ( 7*32+15) /* Code and Data Prioritization L2 */
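
For readers less familiar with the MSR dance above, a rough C equivalent
of the IBRS_ENTER/IBRS_EXIT fast paths (sketch only, assuming wrmsrl() and
this_cpu_read(); the real code must stay in asm since it runs before full
C context is usable):

  static __always_inline void ibrs_enter_sketch(void)
  {
  	/* x86_spec_ctrl_current already has SPEC_CTRL_IBRS set */
  	wrmsrl(MSR_IA32_SPEC_CTRL, this_cpu_read(x86_spec_ctrl_current));
  }

  static __always_inline void ibrs_exit_sketch(void)
  {
  	/* drop IBRS again before returning to user space */
  	wrmsrl(MSR_IA32_SPEC_CTRL,
  	       this_cpu_read(x86_spec_ctrl_current) & ~SPEC_CTRL_IBRS);
  }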


2022-07-12 19:41:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 09/78] objtool: Introduce CFI hash

From: Peter Zijlstra <[email protected]>

commit 8b946cc38e063f0f7bb67789478c38f6d7d457c9 upstream.

Andi reported that objtool on vmlinux.o consumes more memory than his
system has, leading to horrific performance.

This is in part because we keep a struct instruction for every
instruction in the file in-memory. Shrink struct instruction by
removing the CFI state (which includes full register state) from it
and demand allocating it.

Given most instructions don't actually change CFI state, there's lots
of repetition there, so add a hash table to find previous CFI
instances.

Reduces memory consumption (and runtime) for processing an
x86_64-allyesconfig:

pre: 4:40.84 real, 143.99 user, 44.18 sys, 30624988 mem
post: 2:14.61 real, 108.58 user, 25.04 sys, 16396184 mem

Suggested-by: Andi Kleen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
tools/objtool/arch/x86/decode.c | 20 +---
tools/objtool/check.c | 154 ++++++++++++++++++++++++++++++----
tools/objtool/include/objtool/arch.h | 2
tools/objtool/include/objtool/cfi.h | 2
tools/objtool/include/objtool/check.h | 2
tools/objtool/orc_gen.c | 15 ++-
6 files changed, 160 insertions(+), 35 deletions(-)

--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -684,34 +684,32 @@ const char *arch_ret_insn(int len)
return ret[len-1];
}

-int arch_decode_hint_reg(struct instruction *insn, u8 sp_reg)
+int arch_decode_hint_reg(u8 sp_reg, int *base)
{
- struct cfi_reg *cfa = &insn->cfi.cfa;
-
switch (sp_reg) {
case ORC_REG_UNDEFINED:
- cfa->base = CFI_UNDEFINED;
+ *base = CFI_UNDEFINED;
break;
case ORC_REG_SP:
- cfa->base = CFI_SP;
+ *base = CFI_SP;
break;
case ORC_REG_BP:
- cfa->base = CFI_BP;
+ *base = CFI_BP;
break;
case ORC_REG_SP_INDIRECT:
- cfa->base = CFI_SP_INDIRECT;
+ *base = CFI_SP_INDIRECT;
break;
case ORC_REG_R10:
- cfa->base = CFI_R10;
+ *base = CFI_R10;
break;
case ORC_REG_R13:
- cfa->base = CFI_R13;
+ *base = CFI_R13;
break;
case ORC_REG_DI:
- cfa->base = CFI_DI;
+ *base = CFI_DI;
break;
case ORC_REG_DX:
- cfa->base = CFI_DX;
+ *base = CFI_DX;
break;
default:
return -1;
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -6,6 +6,7 @@
#include <string.h>
#include <stdlib.h>
#include <inttypes.h>
+#include <sys/mman.h>

#include <arch/elf.h>
#include <objtool/builtin.h>
@@ -27,7 +28,11 @@ struct alternative {
bool skip_orig;
};

-struct cfi_init_state initial_func_cfi;
+static unsigned long nr_cfi, nr_cfi_reused, nr_cfi_cache;
+
+static struct cfi_init_state initial_func_cfi;
+static struct cfi_state init_cfi;
+static struct cfi_state func_cfi;

struct instruction *find_insn(struct objtool_file *file,
struct section *sec, unsigned long offset)
@@ -267,6 +272,78 @@ static void init_insn_state(struct insn_
state->noinstr = sec->noinstr;
}

+static struct cfi_state *cfi_alloc(void)
+{
+ struct cfi_state *cfi = calloc(sizeof(struct cfi_state), 1);
+ if (!cfi) {
+ WARN("calloc failed");
+ exit(1);
+ }
+ nr_cfi++;
+ return cfi;
+}
+
+static int cfi_bits;
+static struct hlist_head *cfi_hash;
+
+static inline bool cficmp(struct cfi_state *cfi1, struct cfi_state *cfi2)
+{
+ return memcmp((void *)cfi1 + sizeof(cfi1->hash),
+ (void *)cfi2 + sizeof(cfi2->hash),
+ sizeof(struct cfi_state) - sizeof(struct hlist_node));
+}
+
+static inline u32 cfi_key(struct cfi_state *cfi)
+{
+ return jhash((void *)cfi + sizeof(cfi->hash),
+ sizeof(*cfi) - sizeof(cfi->hash), 0);
+}
+
+static struct cfi_state *cfi_hash_find_or_add(struct cfi_state *cfi)
+{
+ struct hlist_head *head = &cfi_hash[hash_min(cfi_key(cfi), cfi_bits)];
+ struct cfi_state *obj;
+
+ hlist_for_each_entry(obj, head, hash) {
+ if (!cficmp(cfi, obj)) {
+ nr_cfi_cache++;
+ return obj;
+ }
+ }
+
+ obj = cfi_alloc();
+ *obj = *cfi;
+ hlist_add_head(&obj->hash, head);
+
+ return obj;
+}
+
+static void cfi_hash_add(struct cfi_state *cfi)
+{
+ struct hlist_head *head = &cfi_hash[hash_min(cfi_key(cfi), cfi_bits)];
+
+ hlist_add_head(&cfi->hash, head);
+}
+
+static void *cfi_hash_alloc(unsigned long size)
+{
+ cfi_bits = max(10, ilog2(size));
+ cfi_hash = mmap(NULL, sizeof(struct hlist_head) << cfi_bits,
+ PROT_READ|PROT_WRITE,
+ MAP_PRIVATE|MAP_ANON, -1, 0);
+ if (cfi_hash == (void *)-1L) {
+ WARN("mmap fail cfi_hash");
+ cfi_hash = NULL;
+ } else if (stats) {
+ printf("cfi_bits: %d\n", cfi_bits);
+ }
+
+ return cfi_hash;
+}
+
+static unsigned long nr_insns;
+static unsigned long nr_insns_visited;
+
/*
* Call the arch-specific instruction decoder for all the instructions and add
* them to the global instruction list.
@@ -277,7 +354,6 @@ static int decode_instructions(struct ob
struct symbol *func;
unsigned long offset;
struct instruction *insn;
- unsigned long nr_insns = 0;
int ret;

for_each_sec(file, sec) {
@@ -303,7 +379,6 @@ static int decode_instructions(struct ob
memset(insn, 0, sizeof(*insn));
INIT_LIST_HEAD(&insn->alts);
INIT_LIST_HEAD(&insn->stack_ops);
- init_cfi_state(&insn->cfi);

insn->sec = sec;
insn->offset = offset;
@@ -1239,7 +1314,6 @@ static int handle_group_alt(struct objto
memset(nop, 0, sizeof(*nop));
INIT_LIST_HEAD(&nop->alts);
INIT_LIST_HEAD(&nop->stack_ops);
- init_cfi_state(&nop->cfi);

nop->sec = special_alt->new_sec;
nop->offset = special_alt->new_off + special_alt->new_len;
@@ -1648,10 +1722,11 @@ static void set_func_state(struct cfi_st

static int read_unwind_hints(struct objtool_file *file)
{
+ struct cfi_state cfi = init_cfi;
struct section *sec, *relocsec;
- struct reloc *reloc;
struct unwind_hint *hint;
struct instruction *insn;
+ struct reloc *reloc;
int i;

sec = find_section_by_name(file->elf, ".discard.unwind_hints");
@@ -1689,19 +1764,24 @@ static int read_unwind_hints(struct objt
insn->hint = true;

if (hint->type == UNWIND_HINT_TYPE_FUNC) {
- set_func_state(&insn->cfi);
+ insn->cfi = &func_cfi;
continue;
}

- if (arch_decode_hint_reg(insn, hint->sp_reg)) {
+ if (insn->cfi)
+ cfi = *(insn->cfi);
+
+ if (arch_decode_hint_reg(hint->sp_reg, &cfi.cfa.base)) {
WARN_FUNC("unsupported unwind_hint sp base reg %d",
insn->sec, insn->offset, hint->sp_reg);
return -1;
}

- insn->cfi.cfa.offset = bswap_if_needed(hint->sp_offset);
- insn->cfi.type = hint->type;
- insn->cfi.end = hint->end;
+ cfi.cfa.offset = bswap_if_needed(hint->sp_offset);
+ cfi.type = hint->type;
+ cfi.end = hint->end;
+
+ insn->cfi = cfi_hash_find_or_add(&cfi);
}

return 0;
@@ -2552,13 +2632,18 @@ static int propagate_alt_cfi(struct objt
if (!insn->alt_group)
return 0;

+ if (!insn->cfi) {
+ WARN("CFI missing");
+ return -1;
+ }
+
alt_cfi = insn->alt_group->cfi;
group_off = insn->offset - insn->alt_group->first_insn->offset;

if (!alt_cfi[group_off]) {
- alt_cfi[group_off] = &insn->cfi;
+ alt_cfi[group_off] = insn->cfi;
} else {
- if (memcmp(alt_cfi[group_off], &insn->cfi, sizeof(struct cfi_state))) {
+ if (cficmp(alt_cfi[group_off], insn->cfi)) {
WARN_FUNC("stack layout conflict in alternatives",
insn->sec, insn->offset);
return -1;
@@ -2609,9 +2694,14 @@ static int handle_insn_ops(struct instru

static bool insn_cfi_match(struct instruction *insn, struct cfi_state *cfi2)
{
- struct cfi_state *cfi1 = &insn->cfi;
+ struct cfi_state *cfi1 = insn->cfi;
int i;

+ if (!cfi1) {
+ WARN("CFI missing");
+ return false;
+ }
+
if (memcmp(&cfi1->cfa, &cfi2->cfa, sizeof(cfi1->cfa))) {

WARN_FUNC("stack state mismatch: cfa1=%d%+d cfa2=%d%+d",
@@ -2796,7 +2886,7 @@ static int validate_branch(struct objtoo
struct instruction *insn, struct insn_state state)
{
struct alternative *alt;
- struct instruction *next_insn;
+ struct instruction *next_insn, *prev_insn = NULL;
struct section *sec;
u8 visited;
int ret;
@@ -2825,15 +2915,25 @@ static int validate_branch(struct objtoo

if (insn->visited & visited)
return 0;
+ } else {
+ nr_insns_visited++;
}

if (state.noinstr)
state.instr += insn->instr;

- if (insn->hint)
- state.cfi = insn->cfi;
- else
- insn->cfi = state.cfi;
+ if (insn->hint) {
+ state.cfi = *insn->cfi;
+ } else {
+ /* XXX track if we actually changed state.cfi */
+
+ if (prev_insn && !cficmp(prev_insn->cfi, &state.cfi)) {
+ insn->cfi = prev_insn->cfi;
+ nr_cfi_reused++;
+ } else {
+ insn->cfi = cfi_hash_find_or_add(&state.cfi);
+ }
+ }

insn->visited |= visited;

@@ -2997,6 +3097,7 @@ static int validate_branch(struct objtoo
return 1;
}

+ prev_insn = insn;
insn = next_insn;
}

@@ -3252,10 +3353,20 @@ int check(struct objtool_file *file)
int ret, warnings = 0;

arch_initial_func_cfi_state(&initial_func_cfi);
+ init_cfi_state(&init_cfi);
+ init_cfi_state(&func_cfi);
+ set_func_state(&func_cfi);
+
+ if (!cfi_hash_alloc(1UL << (file->elf->symbol_bits - 3)))
+ goto out;
+
+ cfi_hash_add(&init_cfi);
+ cfi_hash_add(&func_cfi);

ret = decode_sections(file);
if (ret < 0)
goto out;
+
warnings += ret;

if (list_empty(&file->insn_list))
@@ -3313,6 +3424,13 @@ int check(struct objtool_file *file)
warnings += ret;
}

+ if (stats) {
+ printf("nr_insns_visited: %ld\n", nr_insns_visited);
+ printf("nr_cfi: %ld\n", nr_cfi);
+ printf("nr_cfi_reused: %ld\n", nr_cfi_reused);
+ printf("nr_cfi_cache: %ld\n", nr_cfi_cache);
+ }
+
out:
/*
* For now, don't fail the kernel build on fatal warnings. These
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -85,7 +85,7 @@ unsigned long arch_dest_reloc_offset(int
const char *arch_nop_insn(int len);
const char *arch_ret_insn(int len);

-int arch_decode_hint_reg(struct instruction *insn, u8 sp_reg);
+int arch_decode_hint_reg(u8 sp_reg, int *base);

bool arch_is_retpoline(struct symbol *sym);

--- a/tools/objtool/include/objtool/cfi.h
+++ b/tools/objtool/include/objtool/cfi.h
@@ -7,6 +7,7 @@
#define _OBJTOOL_CFI_H

#include <arch/cfi_regs.h>
+#include <linux/list.h>

#define CFI_UNDEFINED -1
#define CFI_CFA -2
@@ -24,6 +25,7 @@ struct cfi_init_state {
};

struct cfi_state {
+ struct hlist_node hash; /* must be first, cficmp() */
struct cfi_reg regs[CFI_NUM_REGS];
struct cfi_reg vals[CFI_NUM_REGS];
struct cfi_reg cfa;
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -59,7 +59,7 @@ struct instruction {
struct list_head alts;
struct symbol *func;
struct list_head stack_ops;
- struct cfi_state cfi;
+ struct cfi_state *cfi;
};

static inline bool is_static_jump(struct instruction *insn)
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -13,13 +13,19 @@
#include <objtool/warn.h>
#include <objtool/endianness.h>

-static int init_orc_entry(struct orc_entry *orc, struct cfi_state *cfi)
+static int init_orc_entry(struct orc_entry *orc, struct cfi_state *cfi,
+ struct instruction *insn)
{
- struct instruction *insn = container_of(cfi, struct instruction, cfi);
struct cfi_reg *bp = &cfi->regs[CFI_BP];

memset(orc, 0, sizeof(*orc));

+ if (!cfi) {
+ orc->end = 0;
+ orc->sp_reg = ORC_REG_UNDEFINED;
+ return 0;
+ }
+
orc->end = cfi->end;

if (cfi->cfa.base == CFI_UNDEFINED) {
@@ -162,7 +168,7 @@ int orc_create(struct objtool_file *file
int i;

if (!alt_group) {
- if (init_orc_entry(&orc, &insn->cfi))
+ if (init_orc_entry(&orc, insn->cfi, insn))
return -1;
if (!memcmp(&prev_orc, &orc, sizeof(orc)))
continue;
@@ -186,7 +192,8 @@ int orc_create(struct objtool_file *file
struct cfi_state *cfi = alt_group->cfi[i];
if (!cfi)
continue;
- if (init_orc_entry(&orc, cfi))
+ /* errors are reported on the original insn */
+ if (init_orc_entry(&orc, cfi, insn))
return -1;
if (!memcmp(&prev_orc, &orc, sizeof(orc)))
continue;
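
The subtle detail above is that 'hash' must stay the first member of
struct cfi_state so that cficmp() and cfi_key() can skip it by offsetting
with sizeof(cfi->hash). A minimal standalone sketch of that pattern
(illustrative, not kernel code):

  #include <string.h>

  struct node { struct node *next, **pprev; };	/* stand-in for hlist_node */

  struct state {
  	struct node hash;	/* must be first, see state_cmp() */
  	int regs[16];
  	int type;
  };

  /* compare everything except the bookkeeping 'hash' member */
  static int state_cmp(const struct state *a, const struct state *b)
  {
  	return memcmp((const char *)a + sizeof(a->hash),
  		      (const char *)b + sizeof(b->hash),
  		      sizeof(struct state) - sizeof(struct node));
  }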


2022-07-12 19:41:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 58/78] x86/cpu/amd: Add Spectral Chicken

From: Peter Zijlstra <[email protected]>

commit d7caac991feeef1b871ee6988fd2c9725df09039 upstream.

Zen2 uarchs have an undocumented, unnamed MSR that contains a chicken
bit for some speculation behaviour. It needs setting.

Note: very belatedly AMD released naming; it's now officially called
MSR_AMD64_DE_CFG2 and MSR_AMD64_DE_CFG2_SUPPRESS_NOBR_PRED_BIT
but shall remain the SPECTRAL CHICKEN.

Suggested-by: Andrew Cooper <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/msr-index.h | 3 +++
arch/x86/kernel/cpu/amd.c | 23 ++++++++++++++++++++++-
arch/x86/kernel/cpu/cpu.h | 2 ++
arch/x86/kernel/cpu/hygon.c | 6 ++++++
4 files changed, 33 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -515,6 +515,9 @@
/* Fam 17h MSRs */
#define MSR_F17H_IRPERF 0xc00000e9

+#define MSR_ZEN2_SPECTRAL_CHICKEN 0xc00110e3
+#define MSR_ZEN2_SPECTRAL_CHICKEN_BIT BIT_ULL(1)
+
/* Fam 16h MSRs */
#define MSR_F16H_L2I_PERF_CTL 0xc0010230
#define MSR_F16H_L2I_PERF_CTR 0xc0010231
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -886,6 +886,26 @@ static void init_amd_bd(struct cpuinfo_x
clear_rdrand_cpuid_bit(c);
}

+void init_spectral_chicken(struct cpuinfo_x86 *c)
+{
+ u64 value;
+
+ /*
+ * On Zen2 we offer this chicken (bit) on the altar of Speculation.
+ *
+ * This suppresses speculation from the middle of a basic block, i.e. it
+ * suppresses non-branch predictions.
+ *
+ * We use STIBP as a heuristic to filter out Zen2 from the rest of F17H
+ */
+ if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && cpu_has(c, X86_FEATURE_AMD_STIBP)) {
+ if (!rdmsrl_safe(MSR_ZEN2_SPECTRAL_CHICKEN, &value)) {
+ value |= MSR_ZEN2_SPECTRAL_CHICKEN_BIT;
+ wrmsrl_safe(MSR_ZEN2_SPECTRAL_CHICKEN, value);
+ }
+ }
+}
+
static void init_amd_zn(struct cpuinfo_x86 *c)
{
set_cpu_cap(c, X86_FEATURE_ZEN);
@@ -931,7 +951,8 @@ static void init_amd(struct cpuinfo_x86
case 0x12: init_amd_ln(c); break;
case 0x15: init_amd_bd(c); break;
case 0x16: init_amd_jg(c); break;
- case 0x17: fallthrough;
+ case 0x17: init_spectral_chicken(c);
+ fallthrough;
case 0x19: init_amd_zn(c); break;
}

--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -61,6 +61,8 @@ static inline void tsx_init(void) { }
static inline void tsx_ap_init(void) { }
#endif /* CONFIG_CPU_SUP_INTEL */

+extern void init_spectral_chicken(struct cpuinfo_x86 *c);
+
extern void get_cpu_cap(struct cpuinfo_x86 *c);
extern void get_cpu_address_sizes(struct cpuinfo_x86 *c);
extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -302,6 +302,12 @@ static void init_hygon(struct cpuinfo_x8
/* get apicid instead of initial apic id from cpuid */
c->apicid = hard_smp_processor_id();

+ /*
+ * XXX someone from Hygon needs to confirm this DTRT
+ *
+ init_spectral_chicken(c);
+ */
+
set_cpu_cap(c, X86_FEATURE_ZEN);
set_cpu_cap(c, X86_FEATURE_CPB);



2022-07-12 19:42:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 42/78] x86/bugs: Report AMD retbleed vulnerability

From: Alexandre Chartre <[email protected]>

commit 6b80b59b3555706508008f1f127b5412c89c7fd8 upstream.

Report that AMD x86 CPUs are vulnerable to the RETBleed (Arbitrary
Speculative Code Execution with Return Instructions) attack.

[peterz: add hygon]
[kim: invert parity; fam15h]

Co-developed-by: Kim Phillips <[email protected]>
Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Alexandre Chartre <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/bugs.c | 13 +++++++++++++
arch/x86/kernel/cpu/common.c | 19 +++++++++++++++++++
drivers/base/cpu.c | 8 ++++++++
include/linux/cpu.h | 2 ++
5 files changed, 43 insertions(+)

--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -443,5 +443,6 @@
#define X86_BUG_ITLB_MULTIHIT X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */
#define X86_BUG_SRBDS X86_BUG(24) /* CPU may leak RNG bits if not mitigated */
#define X86_BUG_MMIO_STALE_DATA X86_BUG(25) /* CPU is affected by Processor MMIO Stale Data vulnerabilities */
+#define X86_BUG_RETBLEED X86_BUG(26) /* CPU is affected by RETBleed */

#endif /* _ASM_X86_CPUFEATURES_H */
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1987,6 +1987,11 @@ static ssize_t srbds_show_state(char *bu
return sprintf(buf, "%s\n", srbds_strings[srbds_mitigation]);
}

+static ssize_t retbleed_show_state(char *buf)
+{
+ return sprintf(buf, "Vulnerable\n");
+}
+
static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr,
char *buf, unsigned int bug)
{
@@ -2032,6 +2037,9 @@ static ssize_t cpu_show_common(struct de
case X86_BUG_MMIO_STALE_DATA:
return mmio_stale_data_show_state(buf);

+ case X86_BUG_RETBLEED:
+ return retbleed_show_state(buf);
+
default:
break;
}
@@ -2088,4 +2096,9 @@ ssize_t cpu_show_mmio_stale_data(struct
{
return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_STALE_DATA);
}
+
+ssize_t cpu_show_retbleed(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ return cpu_show_common(dev, attr, buf, X86_BUG_RETBLEED);
+}
#endif
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1095,16 +1095,27 @@ static const __initconst struct x86_cpu_
{}
};

+#define VULNBL(vendor, family, model, blacklist) \
+ X86_MATCH_VENDOR_FAM_MODEL(vendor, family, model, blacklist)
+
#define VULNBL_INTEL_STEPPINGS(model, steppings, issues) \
X86_MATCH_VENDOR_FAM_MODEL_STEPPINGS_FEATURE(INTEL, 6, \
INTEL_FAM6_##model, steppings, \
X86_FEATURE_ANY, issues)

+#define VULNBL_AMD(family, blacklist) \
+ VULNBL(AMD, family, X86_MODEL_ANY, blacklist)
+
+#define VULNBL_HYGON(family, blacklist) \
+ VULNBL(HYGON, family, X86_MODEL_ANY, blacklist)
+
#define SRBDS BIT(0)
/* CPU is affected by X86_BUG_MMIO_STALE_DATA */
#define MMIO BIT(1)
/* CPU is affected by Shared Buffers Data Sampling (SBDS), a variant of X86_BUG_MMIO_STALE_DATA */
#define MMIO_SBDS BIT(2)
+/* CPU is affected by RETbleed, speculating where you would not expect it */
+#define RETBLEED BIT(3)

static const struct x86_cpu_id cpu_vuln_blacklist[] __initconst = {
VULNBL_INTEL_STEPPINGS(IVYBRIDGE, X86_STEPPING_ANY, SRBDS),
@@ -1137,6 +1148,11 @@ static const struct x86_cpu_id cpu_vuln_
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT, X86_STEPPINGS(0x1, 0x1), MMIO | MMIO_SBDS),
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_D, X86_STEPPING_ANY, MMIO),
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_L, X86_STEPPINGS(0x0, 0x0), MMIO | MMIO_SBDS),
+
+ VULNBL_AMD(0x15, RETBLEED),
+ VULNBL_AMD(0x16, RETBLEED),
+ VULNBL_AMD(0x17, RETBLEED),
+ VULNBL_HYGON(0x18, RETBLEED),
{}
};

@@ -1238,6 +1254,9 @@ static void __init cpu_set_bug_bits(stru
!arch_cap_mmio_immune(ia32_cap))
setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);

+ if (cpu_matches(cpu_vuln_blacklist, RETBLEED))
+ setup_force_cpu_bug(X86_BUG_RETBLEED);
+
if (cpu_matches(cpu_vuln_whitelist, NO_MELTDOWN))
return;

--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -570,6 +570,12 @@ ssize_t __weak cpu_show_mmio_stale_data(
return sysfs_emit(buf, "Not affected\n");
}

+ssize_t __weak cpu_show_retbleed(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "Not affected\n");
+}
+
static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
@@ -580,6 +586,7 @@ static DEVICE_ATTR(tsx_async_abort, 0444
static DEVICE_ATTR(itlb_multihit, 0444, cpu_show_itlb_multihit, NULL);
static DEVICE_ATTR(srbds, 0444, cpu_show_srbds, NULL);
static DEVICE_ATTR(mmio_stale_data, 0444, cpu_show_mmio_stale_data, NULL);
+static DEVICE_ATTR(retbleed, 0444, cpu_show_retbleed, NULL);

static struct attribute *cpu_root_vulnerabilities_attrs[] = {
&dev_attr_meltdown.attr,
@@ -592,6 +599,7 @@ static struct attribute *cpu_root_vulner
&dev_attr_itlb_multihit.attr,
&dev_attr_srbds.attr,
&dev_attr_mmio_stale_data.attr,
+ &dev_attr_retbleed.attr,
NULL
};

--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -68,6 +68,8 @@ extern ssize_t cpu_show_srbds(struct dev
extern ssize_t cpu_show_mmio_stale_data(struct device *dev,
struct device_attribute *attr,
char *buf);
+extern ssize_t cpu_show_retbleed(struct device *dev,
+ struct device_attribute *attr, char *buf);

extern __printf(4, 5)
struct device *cpu_device_create(struct device *parent, void *drvdata,
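
Usage note (illustrative): the new entry can be read like the other
vulnerability files; the blacklisted AMD/Hygon families above report the
"Vulnerable" string, while the weak default yields "Not affected":

  $ cat /sys/devices/system/cpu/vulnerabilities/retbleed
  Vulnerable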


2022-07-12 19:42:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 52/78] objtool: Update Retpoline validation

From: Peter Zijlstra <[email protected]>

commit 9bb2ec608a209018080ca262f771e6a9ff203b6f upstream.

Update retpoline validation with the new CONFIG_RETPOLINE requirement of
not having bare naked RET instructions.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
[cascardo: conflict fixup at arch/x86/xen/xen-head.S]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 6 ++++++
arch/x86/mm/mem_encrypt_boot.S | 2 ++
arch/x86/xen/xen-head.S | 1 +
tools/objtool/check.c | 19 +++++++++++++------
4 files changed, 22 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -76,6 +76,12 @@
.endm

/*
+ * (ab)use RETPOLINE_SAFE on RET to annotate away 'bare' RET instructions
+ * vs RETBleed validation.
+ */
+#define ANNOTATE_UNRET_SAFE ANNOTATE_RETPOLINE_SAFE
+
+/*
* JMP_NOSPEC and CALL_NOSPEC macros can be used instead of a simple
* indirect jmp/call which may be susceptible to the Spectre variant 2
* attack.
--- a/arch/x86/mm/mem_encrypt_boot.S
+++ b/arch/x86/mm/mem_encrypt_boot.S
@@ -66,6 +66,7 @@ SYM_FUNC_START(sme_encrypt_execute)
pop %rbp

/* Offset to __x86_return_thunk would be wrong here */
+ ANNOTATE_UNRET_SAFE
ret
int3
SYM_FUNC_END(sme_encrypt_execute)
@@ -154,6 +155,7 @@ SYM_FUNC_START(__enc_copy)
pop %r15

/* Offset to __x86_return_thunk would be wrong here */
+ ANNOTATE_UNRET_SAFE
ret
int3
.L__enc_copy_end:
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -70,6 +70,7 @@ SYM_CODE_START(hypercall_page)
.rept (PAGE_SIZE / 32)
UNWIND_HINT_FUNC
.skip 31, 0x90
+ ANNOTATE_UNRET_SAFE
RET
.endr

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1894,8 +1894,9 @@ static int read_retpoline_hints(struct o
}

if (insn->type != INSN_JUMP_DYNAMIC &&
- insn->type != INSN_CALL_DYNAMIC) {
- WARN_FUNC("retpoline_safe hint not an indirect jump/call",
+ insn->type != INSN_CALL_DYNAMIC &&
+ insn->type != INSN_RETURN) {
+ WARN_FUNC("retpoline_safe hint not an indirect jump/call/ret",
insn->sec, insn->offset);
return -1;
}
@@ -3229,7 +3230,8 @@ static int validate_retpoline(struct obj

for_each_insn(file, insn) {
if (insn->type != INSN_JUMP_DYNAMIC &&
- insn->type != INSN_CALL_DYNAMIC)
+ insn->type != INSN_CALL_DYNAMIC &&
+ insn->type != INSN_RETURN)
continue;

if (insn->retpoline_safe)
@@ -3244,9 +3246,14 @@ static int validate_retpoline(struct obj
if (!strcmp(insn->sec->name, ".init.text") && !module)
continue;

- WARN_FUNC("indirect %s found in RETPOLINE build",
- insn->sec, insn->offset,
- insn->type == INSN_JUMP_DYNAMIC ? "jump" : "call");
+ if (insn->type == INSN_RETURN) {
+ WARN_FUNC("'naked' return found in RETPOLINE build",
+ insn->sec, insn->offset);
+ } else {
+ WARN_FUNC("indirect %s found in RETPOLINE build",
+ insn->sec, insn->offset,
+ insn->type == INSN_JUMP_DYNAMIC ? "jump" : "call");
+ }

warnings++;
}


2022-07-12 19:44:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 57/78] objtool: Add entry UNRET validation

From: Peter Zijlstra <[email protected]>

commit a09a6e2399ba0595c3042b3164f3ca68a3cff33e upstream.

Since entry asm is tricky, add a validation pass that ensures the
retbleed mitigation has been done before the first actual RET
instruction.

Entry points are those that either have UNWIND_HINT_ENTRY, which acts
as UNWIND_HINT_EMPTY but marks the instruction as an entry point, or
those that have UNWIND_HINT_IRET_REGS at +0.

This is basically a variant of validate_branch() that is
intra-function; it simply follows all branches from marked entry
points and ensures that all paths lead to ANNOTATE_UNRET_END.

If a path hits a RET or an indirect branch, the path fails validation
and is reported.

There are 3 ANNOTATE_UNRET_END instances:

- UNTRAIN_RET itself
- exception from-kernel; this path doesn't need UNTRAIN_RET
- all early exceptions; these also don't need UNTRAIN_RET

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
[cascardo: tools/objtool/builtin-check.c no link option validation]
[cascardo: tools/objtool/check.c opts.ibt is ibt]
[cascardo: tools/objtool/include/objtool/builtin.h leave unret option as bool, no struct opts]
[cascardo: objtool is still called from scripts/link-vmlinux.sh]
[cascardo: no IBT support]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/entry/entry_64.S | 3
arch/x86/entry/entry_64_compat.S | 6 -
arch/x86/include/asm/nospec-branch.h | 12 ++
arch/x86/include/asm/unwind_hints.h | 4
arch/x86/kernel/head_64.S | 5
arch/x86/xen/xen-asm.S | 10 -
include/linux/objtool.h | 3
scripts/link-vmlinux.sh | 3
tools/include/linux/objtool.h | 3
tools/objtool/builtin-check.c | 3
tools/objtool/check.c | 172 +++++++++++++++++++++++++++++++-
tools/objtool/include/objtool/builtin.h | 2
tools/objtool/include/objtool/check.h | 6 +
13 files changed, 217 insertions(+), 15 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -85,7 +85,7 @@
*/

SYM_CODE_START(entry_SYSCALL_64)
- UNWIND_HINT_EMPTY
+ UNWIND_HINT_ENTRY

swapgs
/* tss.sp2 is scratch space. */
@@ -1067,6 +1067,7 @@ SYM_CODE_START_LOCAL(error_entry)
.Lerror_entry_done_lfence:
FENCE_SWAPGS_KERNEL_ENTRY
leaq 8(%rsp), %rax /* return pt_regs pointer */
+ ANNOTATE_UNRET_END
RET

.Lbstep_iret:
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -49,7 +49,7 @@
* 0(%ebp) arg6
*/
SYM_CODE_START(entry_SYSENTER_compat)
- UNWIND_HINT_EMPTY
+ UNWIND_HINT_ENTRY
/* Interrupts are off on entry. */
SWAPGS

@@ -202,7 +202,7 @@ SYM_CODE_END(entry_SYSENTER_compat)
* 0(%esp) arg6
*/
SYM_CODE_START(entry_SYSCALL_compat)
- UNWIND_HINT_EMPTY
+ UNWIND_HINT_ENTRY
/* Interrupts are off on entry. */
swapgs

@@ -349,7 +349,7 @@ SYM_CODE_END(entry_SYSCALL_compat)
* ebp arg6
*/
SYM_CODE_START(entry_INT80_compat)
- UNWIND_HINT_EMPTY
+ UNWIND_HINT_ENTRY
/*
* Interrupts are off on entry.
*/
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -82,6 +82,17 @@
#define ANNOTATE_UNRET_SAFE ANNOTATE_RETPOLINE_SAFE

/*
+ * Abuse ANNOTATE_RETPOLINE_SAFE on a NOP to indicate UNRET_END, should
+ * eventually turn into it's own annotation.
+ */
+.macro ANNOTATE_UNRET_END
+#ifdef CONFIG_DEBUG_ENTRY
+ ANNOTATE_RETPOLINE_SAFE
+ nop
+#endif
+.endm
+
+/*
* JMP_NOSPEC and CALL_NOSPEC macros can be used instead of a simple
* indirect jmp/call which may be susceptible to the Spectre variant 2
* attack.
@@ -131,6 +142,7 @@
*/
.macro UNTRAIN_RET
#ifdef CONFIG_RETPOLINE
+ ANNOTATE_UNRET_END
ALTERNATIVE_2 "", \
"call zen_untrain_ret", X86_FEATURE_UNRET, \
"call entry_ibpb", X86_FEATURE_ENTRY_IBPB
--- a/arch/x86/include/asm/unwind_hints.h
+++ b/arch/x86/include/asm/unwind_hints.h
@@ -11,6 +11,10 @@
UNWIND_HINT sp_reg=ORC_REG_UNDEFINED type=UNWIND_HINT_TYPE_CALL end=1
.endm

+.macro UNWIND_HINT_ENTRY
+ UNWIND_HINT sp_reg=ORC_REG_UNDEFINED type=UNWIND_HINT_TYPE_ENTRY end=1
+.endm
+
.macro UNWIND_HINT_REGS base=%rsp offset=0 indirect=0 extra=1 partial=0
.if \base == %rsp
.if \indirect
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -312,6 +312,8 @@ SYM_CODE_END(start_cpu0)
SYM_CODE_START_NOALIGN(vc_boot_ghcb)
UNWIND_HINT_IRET_REGS offset=8

+ ANNOTATE_UNRET_END
+
/* Build pt_regs */
PUSH_AND_CLEAR_REGS

@@ -369,6 +371,7 @@ SYM_CODE_START(early_idt_handler_array)
SYM_CODE_END(early_idt_handler_array)

SYM_CODE_START_LOCAL(early_idt_handler_common)
+ ANNOTATE_UNRET_END
/*
* The stack is the hardware frame, an error code or zero, and the
* vector number.
@@ -415,6 +418,8 @@ SYM_CODE_END(early_idt_handler_common)
SYM_CODE_START_NOALIGN(vc_no_ghcb)
UNWIND_HINT_IRET_REGS offset=8

+ ANNOTATE_UNRET_END
+
/* Build pt_regs */
PUSH_AND_CLEAR_REGS

--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -120,7 +120,7 @@ SYM_FUNC_END(xen_read_cr2_direct);

.macro xen_pv_trap name
SYM_CODE_START(xen_\name)
- UNWIND_HINT_EMPTY
+ UNWIND_HINT_ENTRY
pop %rcx
pop %r11
jmp \name
@@ -228,7 +228,7 @@ SYM_CODE_END(xenpv_restore_regs_and_retu

/* Normal 64-bit system call target */
SYM_CODE_START(xen_entry_SYSCALL_64)
- UNWIND_HINT_EMPTY
+ UNWIND_HINT_ENTRY
popq %rcx
popq %r11

@@ -247,7 +247,7 @@ SYM_CODE_END(xen_entry_SYSCALL_64)

/* 32-bit compat syscall target */
SYM_CODE_START(xen_entry_SYSCALL_compat)
- UNWIND_HINT_EMPTY
+ UNWIND_HINT_ENTRY
popq %rcx
popq %r11

@@ -264,7 +264,7 @@ SYM_CODE_END(xen_entry_SYSCALL_compat)

/* 32-bit compat sysenter target */
SYM_CODE_START(xen_entry_SYSENTER_compat)
- UNWIND_HINT_EMPTY
+ UNWIND_HINT_ENTRY
/*
* NB: Xen is polite and clears TF from EFLAGS for us. This means
* that we don't need to guard against single step exceptions here.
@@ -287,7 +287,7 @@ SYM_CODE_END(xen_entry_SYSENTER_compat)

SYM_CODE_START(xen_entry_SYSCALL_compat)
SYM_CODE_START(xen_entry_SYSENTER_compat)
- UNWIND_HINT_EMPTY
+ UNWIND_HINT_ENTRY
lea 16(%rsp), %rsp /* strip %rcx, %r11 */
mov $-ENOSYS, %rax
pushq $0
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -32,11 +32,14 @@ struct unwind_hint {
*
* UNWIND_HINT_FUNC: Generate the unwind metadata of a callable function.
* Useful for code which doesn't have an ELF function annotation.
+ *
+ * UNWIND_HINT_ENTRY: machine entry without stack, SYSCALL/SYSENTER etc.
*/
#define UNWIND_HINT_TYPE_CALL 0
#define UNWIND_HINT_TYPE_REGS 1
#define UNWIND_HINT_TYPE_REGS_PARTIAL 2
#define UNWIND_HINT_TYPE_FUNC 3
+#define UNWIND_HINT_TYPE_ENTRY 4

#ifdef CONFIG_STACK_VALIDATION

--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -120,6 +120,9 @@ objtool_link()

if [ -n "${CONFIG_VMLINUX_VALIDATION}" ]; then
objtoolopt="${objtoolopt} --noinstr"
+ if is_enabled CONFIG_RETPOLINE; then
+ objtoolopt="${objtoolopt} --unret"
+ fi
fi

if [ -n "${objtoolopt}" ]; then
--- a/tools/include/linux/objtool.h
+++ b/tools/include/linux/objtool.h
@@ -32,11 +32,14 @@ struct unwind_hint {
*
* UNWIND_HINT_FUNC: Generate the unwind metadata of a callable function.
* Useful for code which doesn't have an ELF function annotation.
+ *
+ * UNWIND_HINT_ENTRY: machine entry without stack, SYSCALL/SYSENTER etc.
*/
#define UNWIND_HINT_TYPE_CALL 0
#define UNWIND_HINT_TYPE_REGS 1
#define UNWIND_HINT_TYPE_REGS_PARTIAL 2
#define UNWIND_HINT_TYPE_FUNC 3
+#define UNWIND_HINT_TYPE_ENTRY 4

#ifdef CONFIG_STACK_VALIDATION

--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -20,7 +20,7 @@
#include <objtool/objtool.h>

bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
- validate_dup, vmlinux, mcount, noinstr, backup, sls;
+ validate_dup, vmlinux, mcount, noinstr, backup, sls, unret;

static const char * const check_usage[] = {
"objtool check [<options>] file.o",
@@ -36,6 +36,7 @@ const struct option check_options[] = {
OPT_BOOLEAN('f', "no-fp", &no_fp, "Skip frame pointer validation"),
OPT_BOOLEAN('u', "no-unreachable", &no_unreachable, "Skip 'unreachable instruction' warnings"),
OPT_BOOLEAN('r', "retpoline", &retpoline, "Validate retpoline assumptions"),
+ OPT_BOOLEAN(0, "unret", &unret, "validate entry unret placement"),
OPT_BOOLEAN('m', "module", &module, "Indicates the object will be part of a kernel module"),
OPT_BOOLEAN('b', "backtrace", &backtrace, "unwind on error"),
OPT_BOOLEAN('a', "uaccess", &uaccess, "enable uaccess checking"),
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1847,6 +1847,19 @@ static int read_unwind_hints(struct objt

insn->hint = true;

+ if (hint->type == UNWIND_HINT_TYPE_REGS_PARTIAL) {
+ struct symbol *sym = find_symbol_by_offset(insn->sec, insn->offset);
+
+ if (sym && sym->bind == STB_GLOBAL) {
+ insn->entry = 1;
+ }
+ }
+
+ if (hint->type == UNWIND_HINT_TYPE_ENTRY) {
+ hint->type = UNWIND_HINT_TYPE_CALL;
+ insn->entry = 1;
+ }
+
if (hint->type == UNWIND_HINT_TYPE_FUNC) {
insn->cfi = &func_cfi;
continue;
@@ -1895,8 +1908,9 @@ static int read_retpoline_hints(struct o

if (insn->type != INSN_JUMP_DYNAMIC &&
insn->type != INSN_CALL_DYNAMIC &&
- insn->type != INSN_RETURN) {
- WARN_FUNC("retpoline_safe hint not an indirect jump/call/ret",
+ insn->type != INSN_RETURN &&
+ insn->type != INSN_NOP) {
+ WARN_FUNC("retpoline_safe hint not an indirect jump/call/ret/nop",
insn->sec, insn->offset);
return -1;
}
@@ -2996,8 +3010,8 @@ static int validate_branch(struct objtoo
return 1;
}

- visited = 1 << state.uaccess;
- if (insn->visited) {
+ visited = VISITED_BRANCH << state.uaccess;
+ if (insn->visited & VISITED_BRANCH_MASK) {
if (!insn->hint && !insn_cfi_match(insn, &state.cfi))
return 1;

@@ -3223,6 +3237,145 @@ static int validate_unwind_hints(struct
return warnings;
}

+/*
+ * Validate rethunk entry constraint: must untrain RET before the first RET.
+ *
+ * Follow every branch (intra-function) and ensure ANNOTATE_UNRET_END comes
+ * before an actual RET instruction.
+ */
+static int validate_entry(struct objtool_file *file, struct instruction *insn)
+{
+ struct instruction *next, *dest;
+ int ret, warnings = 0;
+
+ for (;;) {
+ next = next_insn_to_validate(file, insn);
+
+ if (insn->visited & VISITED_ENTRY)
+ return 0;
+
+ insn->visited |= VISITED_ENTRY;
+
+ if (!insn->ignore_alts && !list_empty(&insn->alts)) {
+ struct alternative *alt;
+ bool skip_orig = false;
+
+ list_for_each_entry(alt, &insn->alts, list) {
+ if (alt->skip_orig)
+ skip_orig = true;
+
+ ret = validate_entry(file, alt->insn);
+ if (ret) {
+ if (backtrace)
+ BT_FUNC("(alt)", insn);
+ return ret;
+ }
+ }
+
+ if (skip_orig)
+ return 0;
+ }
+
+ switch (insn->type) {
+
+ case INSN_CALL_DYNAMIC:
+ case INSN_JUMP_DYNAMIC:
+ case INSN_JUMP_DYNAMIC_CONDITIONAL:
+ WARN_FUNC("early indirect call", insn->sec, insn->offset);
+ return 1;
+
+ case INSN_JUMP_UNCONDITIONAL:
+ case INSN_JUMP_CONDITIONAL:
+ if (!is_sibling_call(insn)) {
+ if (!insn->jump_dest) {
+ WARN_FUNC("unresolved jump target after linking?!?",
+ insn->sec, insn->offset);
+ return -1;
+ }
+ ret = validate_entry(file, insn->jump_dest);
+ if (ret) {
+ if (backtrace) {
+ BT_FUNC("(branch%s)", insn,
+ insn->type == INSN_JUMP_CONDITIONAL ? "-cond" : "");
+ }
+ return ret;
+ }
+
+ if (insn->type == INSN_JUMP_UNCONDITIONAL)
+ return 0;
+
+ break;
+ }
+
+ /* fallthrough */
+ case INSN_CALL:
+ dest = find_insn(file, insn->call_dest->sec,
+ insn->call_dest->offset);
+ if (!dest) {
+ WARN("Unresolved function after linking!?: %s",
+ insn->call_dest->name);
+ return -1;
+ }
+
+ ret = validate_entry(file, dest);
+ if (ret) {
+ if (backtrace)
+ BT_FUNC("(call)", insn);
+ return ret;
+ }
+ /*
+ * If a call returns without error, it must have seen UNTRAIN_RET.
+ * Therefore any non-error return is a success.
+ */
+ return 0;
+
+ case INSN_RETURN:
+ WARN_FUNC("RET before UNTRAIN", insn->sec, insn->offset);
+ return 1;
+
+ case INSN_NOP:
+ if (insn->retpoline_safe)
+ return 0;
+ break;
+
+ default:
+ break;
+ }
+
+ if (!next) {
+ WARN_FUNC("teh end!", insn->sec, insn->offset);
+ return -1;
+ }
+ insn = next;
+ }
+
+ return warnings;
+}
+
+/*
+ * Validate that all branches starting at 'insn->entry' encounter UNRET_END
+ * before RET.
+ */
+static int validate_unret(struct objtool_file *file)
+{
+ struct instruction *insn;
+ int ret, warnings = 0;
+
+ for_each_insn(file, insn) {
+ if (!insn->entry)
+ continue;
+
+ ret = validate_entry(file, insn);
+ if (ret < 0) {
+ WARN_FUNC("Failed UNRET validation", insn->sec, insn->offset);
+ return ret;
+ }
+ warnings += ret;
+ }
+
+ return warnings;
+}
+
static int validate_retpoline(struct objtool_file *file)
{
struct instruction *insn;
@@ -3490,6 +3643,17 @@ int check(struct objtool_file *file)
goto out;
warnings += ret;

+ if (unret) {
+ /*
+ * Must be after validate_branch() and friends, it plays
+ * further games with insn->visited.
+ */
+ ret = validate_unret(file);
+ if (ret < 0)
+ return ret;
+ warnings += ret;
+ }
+
if (!warnings) {
ret = validate_reachable_instructions(file);
if (ret < 0)
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -9,7 +9,7 @@

extern const struct option check_options[];
extern bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats,
- validate_dup, vmlinux, mcount, noinstr, backup, sls;
+ validate_dup, vmlinux, mcount, noinstr, backup, sls, unret;

extern int cmd_parse_options(int argc, const char **argv, const char * const usage[]);

--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -48,6 +48,7 @@ struct instruction {
bool dead_end, ignore, ignore_alts;
bool hint;
bool retpoline_safe;
+ bool entry;
s8 instr;
u8 visited;
struct alt_group *alt_group;
@@ -62,6 +63,11 @@ struct instruction {
struct cfi_state *cfi;
};

+#define VISITED_BRANCH 0x01
+#define VISITED_BRANCH_UACCESS 0x02
+#define VISITED_BRANCH_MASK 0x03
+#define VISITED_ENTRY 0x04
+
static inline bool is_static_jump(struct instruction *insn)
{
return insn->type == INSN_JUMP_CONDITIONAL ||


2022-07-12 20:05:41

by Greg Kroah-Hartman

Subject: [PATCH 5.15 17/78] x86/alternative: Try inline spectre_v2=retpoline,amd

From: Peter Zijlstra <[email protected]>

commit bbe2df3f6b6da7848398d55b1311d58a16ec21e4 upstream.

Try and replace retpoline thunk calls with:

LFENCE
CALL *%\reg

for spectre_v2=retpoline,amd.

Specifically, the sequence above is 5 bytes for the low 8 registers,
but 6 bytes for the high 8 registers. This means that, unless the
compilers pad the call site, the replacement will not fit for calls
through the higher registers.

Luckily GCC strongly favours RAX for the indirect calls and most (95%+
for defconfig-x86_64) will be converted. OTOH clang strongly favours
R11 and almost nothing gets converted.

Note: it will also generate a correct replacement for the Jcc.d32
case, but unless the compilers start to pad that instruction it will
never fit. Specifically:

Jncc.d8 1f
LFENCE
JMP *%\reg
1:

is 7-8 bytes long, where the original instruction in unpadded form is
only 6 bytes.
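
To make the size constraint concrete (the encodings are written out
here for illustration; they are not taken from the patch):

	call __x86_indirect_thunk_rax	e8 xx xx xx xx		5 bytes
	lfence; call *%rax		0f ae e8 ff d0		5 bytes, fits
	lfence; call *%r11		0f ae e8 41 ff d3	6 bytes, too big

Only call sites that go through the low eight registers can therefore
be rewritten in place, which is why the GCC vs clang register
preference above matters so much.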

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[cascardo: RETPOLINE_AMD was renamed to RETPOLINE_LFENCE]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/alternative.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -389,6 +389,7 @@ static int emit_indirect(int op, int reg
*
* CALL *%\reg
*
+ * It also tries to inline spectre_v2=retpoline,amd when size permits.
*/
static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
{
@@ -405,7 +406,8 @@ static int patch_retpoline(void *addr, s
/* If anyone ever does: CALL/JMP *%rsp, we're in deep trouble. */
BUG_ON(reg == 4);

- if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
+ if (cpu_feature_enabled(X86_FEATURE_RETPOLINE) &&
+ !cpu_feature_enabled(X86_FEATURE_RETPOLINE_LFENCE))
return -1;

op = insn->opcode.bytes[0];
@@ -418,8 +420,9 @@ static int patch_retpoline(void *addr, s
* into:
*
* Jncc.d8 1f
+ * [ LFENCE ]
* JMP *%\reg
- * NOP
+ * [ NOP ]
* 1:
*/
/* Jcc.d32 second opcode byte is in the range: 0x80-0x8f */
@@ -434,6 +437,15 @@ static int patch_retpoline(void *addr, s
op = JMP32_INSN_OPCODE;
}

+ /*
+ * For RETPOLINE_AMD: prepend the indirect CALL/JMP with an LFENCE.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_RETPOLINE_LFENCE)) {
+ bytes[i++] = 0x0f;
+ bytes[i++] = 0xae;
+ bytes[i++] = 0xe8; /* LFENCE */
+ }
+
ret = emit_indirect(op, reg, bytes + i);
if (ret < 0)
return ret;


2022-07-12 20:07:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 44/78] x86/bugs: Enable STIBP for JMP2RET

From: Kim Phillips <[email protected]>

commit e8ec1b6e08a2102d8755ccb06fa26d540f26a2fa upstream.

For untrained return thunks to be fully effective, STIBP must be enabled
or SMT disabled.
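
As a usage note (mine, not part of the patch): with the parsing
changes below, the options can be combined on the kernel command line,
for example

	retbleed=unret,nosmt

which force-enables the untrained return thunks and disables SMT when
STIBP is not available.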

Co-developed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 16 ++++--
arch/x86/kernel/cpu/bugs.c | 58 +++++++++++++++++++-----
2 files changed, 57 insertions(+), 17 deletions(-)

--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4972,11 +4972,17 @@
Speculative Code Execution with Return Instructions)
vulnerability.

- off - unconditionally disable
- auto - automatically select a migitation
- unret - force enable untrained return thunks,
- only effective on AMD Zen {1,2}
- based systems.
+ off - no mitigation
+ auto - automatically select a migitation
+ auto,nosmt - automatically select a mitigation,
+ disabling SMT if necessary for
+ the full mitigation (only on Zen1
+ and older without STIBP).
+ unret - force enable untrained return thunks,
+ only effective on AMD f15h-f17h
+ based systems.
+ unret,nosmt - like unret, will disable SMT when STIBP
+ is not available.

Selecting 'auto' will choose a mitigation method at run
time according to the CPU.
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -776,19 +776,34 @@ static enum retbleed_mitigation retbleed
static enum retbleed_mitigation_cmd retbleed_cmd __ro_after_init =
RETBLEED_CMD_AUTO;

+static int __ro_after_init retbleed_nosmt = false;
+
static int __init retbleed_parse_cmdline(char *str)
{
if (!str)
return -EINVAL;

- if (!strcmp(str, "off"))
- retbleed_cmd = RETBLEED_CMD_OFF;
- else if (!strcmp(str, "auto"))
- retbleed_cmd = RETBLEED_CMD_AUTO;
- else if (!strcmp(str, "unret"))
- retbleed_cmd = RETBLEED_CMD_UNRET;
- else
- pr_err("Unknown retbleed option (%s). Defaulting to 'auto'\n", str);
+ while (str) {
+ char *next = strchr(str, ',');
+ if (next) {
+ *next = 0;
+ next++;
+ }
+
+ if (!strcmp(str, "off")) {
+ retbleed_cmd = RETBLEED_CMD_OFF;
+ } else if (!strcmp(str, "auto")) {
+ retbleed_cmd = RETBLEED_CMD_AUTO;
+ } else if (!strcmp(str, "unret")) {
+ retbleed_cmd = RETBLEED_CMD_UNRET;
+ } else if (!strcmp(str, "nosmt")) {
+ retbleed_nosmt = true;
+ } else {
+ pr_err("Ignoring unknown retbleed option (%s).", str);
+ }
+
+ str = next;
+ }

return 0;
}
@@ -834,6 +849,10 @@ static void __init retbleed_select_mitig
setup_force_cpu_cap(X86_FEATURE_RETHUNK);
setup_force_cpu_cap(X86_FEATURE_UNRET);

+ if (!boot_cpu_has(X86_FEATURE_STIBP) &&
+ (retbleed_nosmt || cpu_mitigations_auto_nosmt()))
+ cpu_smt_disable(false);
+
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
pr_err(RETBLEED_UNTRAIN_MSG);
@@ -1080,6 +1099,13 @@ spectre_v2_user_select_mitigation(enum s
boot_cpu_has(X86_FEATURE_AMD_STIBP_ALWAYS_ON))
mode = SPECTRE_V2_USER_STRICT_PREFERRED;

+ if (retbleed_mitigation == RETBLEED_MITIGATION_UNRET) {
+ if (mode != SPECTRE_V2_USER_STRICT &&
+ mode != SPECTRE_V2_USER_STRICT_PREFERRED)
+ pr_info("Selecting STIBP always-on mode to complement retbleed mitigation'\n");
+ mode = SPECTRE_V2_USER_STRICT_PREFERRED;
+ }
+
spectre_v2_user_stibp = mode;

set_mode:
@@ -2090,10 +2116,18 @@ static ssize_t srbds_show_state(char *bu

static ssize_t retbleed_show_state(char *buf)
{
- if (retbleed_mitigation == RETBLEED_MITIGATION_UNRET &&
- (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
- boot_cpu_data.x86_vendor != X86_VENDOR_HYGON))
- return sprintf(buf, "Vulnerable: untrained return thunk on non-Zen uarch\n");
+ if (retbleed_mitigation == RETBLEED_MITIGATION_UNRET) {
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
+ boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
+ return sprintf(buf, "Vulnerable: untrained return thunk on non-Zen uarch\n");
+
+ return sprintf(buf, "%s; SMT %s\n",
+ retbleed_strings[retbleed_mitigation],
+ !sched_smt_active() ? "disabled" :
+ spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
+ spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED ?
+ "enabled with STIBP protection" : "vulnerable");
+ }

return sprintf(buf, "%s\n", retbleed_strings[retbleed_mitigation]);
}


2022-07-12 20:08:41

by Greg Kroah-Hartman

Subject: [PATCH 5.15 03/78] x86/entry: Move PUSH_AND_CLEAR_REGS out of error_entry()

From: Lai Jiangshan <[email protected]>

commit ee774dac0da1543376a69fd90840af6aa86879b3 upstream.

The macro idtentry() (through idtentry_body()) calls error_entry()
unconditionally even on XENPV. But XENPV needs to only push and clear
regs.

PUSH_AND_CLEAR_REGS in error_entry() makes the stack not return to its
original place when the function returns, which means it is not possible
to convert it to a C function.

Carve PUSH_AND_CLEAR_REGS out of error_entry() into a separate
function and call it before error_entry(), in order to avoid calling
error_entry() on XENPV.

It will also allow for error_entry() to be converted to C code that can
use inlined sync_regs() and save a function call.

[ bp: Massage commit message. ]
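
In short, the resulting division of labour (paraphrasing the hunks
below) is:

	idtentry_body:
		call push_and_clear_regs	/* save registers in pt_regs */
		call error_entry		/* switch GS and CR3 if needed */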

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/entry/entry_64.S | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -315,6 +315,14 @@ SYM_CODE_END(ret_from_fork)
#endif
.endm

+/* Save all registers in pt_regs */
+SYM_CODE_START_LOCAL(push_and_clear_regs)
+ UNWIND_HINT_FUNC
+ PUSH_AND_CLEAR_REGS save_ret=1
+ ENCODE_FRAME_POINTER 8
+ RET
+SYM_CODE_END(push_and_clear_regs)
+
/**
* idtentry_body - Macro to emit code calling the C function
* @cfunc: C function to be called
@@ -322,6 +330,9 @@ SYM_CODE_END(ret_from_fork)
*/
.macro idtentry_body cfunc has_error_code:req

+ call push_and_clear_regs
+ UNWIND_HINT_REGS
+
call error_entry
movq %rax, %rsp /* switch to the task stack if from userspace */
ENCODE_FRAME_POINTER
@@ -965,13 +976,11 @@ SYM_CODE_START_LOCAL(paranoid_exit)
SYM_CODE_END(paranoid_exit)

/*
- * Save all registers in pt_regs, and switch GS if needed.
+ * Switch GS and CR3 if needed.
*/
SYM_CODE_START_LOCAL(error_entry)
UNWIND_HINT_FUNC
cld
- PUSH_AND_CLEAR_REGS save_ret=1
- ENCODE_FRAME_POINTER 8
testb $3, CS+8(%rsp)
jz .Lerror_kernelspace



2022-07-12 20:10:18

by Greg Kroah-Hartman

Subject: [PATCH 5.15 18/78] x86/alternative: Add debug prints to apply_retpolines()

From: Peter Zijlstra <[email protected]>

commit d4b5a5c993009ffeb5febe3b701da3faab6adb96 upstream.

Make sure we can see the text changes when booting with
'debug-alternative'.

Example output:

[ ] SMP alternatives: retpoline at: __traceiter_initcall_level+0x1f/0x30 (ffffffff8100066f) len: 5 to: __x86_indirect_thunk_rax+0x0/0x20
[ ] SMP alternatives: ffffffff82603e58: [2:5) optimized NOPs: ff d0 0f 1f 00
[ ] SMP alternatives: ffffffff8100066f: orig: e8 cc 30 00 01
[ ] SMP alternatives: ffffffff8100066f: repl: ff d0 0f 1f 00

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/alternative.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -492,9 +492,15 @@ void __init_or_module noinline apply_ret
continue;
}

+ DPRINTK("retpoline at: %pS (%px) len: %d to: %pS",
+ addr, addr, insn.length,
+ addr + insn.length + insn.immediate.value);
+
len = patch_retpoline(addr, &insn, bytes);
if (len == insn.length) {
optimize_nops(bytes, len);
+ DUMP_BYTES(((u8*)addr), len, "%px: orig: ", addr);
+ DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
text_poke_early(addr, bytes, len);
}
}


2022-07-12 20:10:55

by Greg Kroah-Hartman

Subject: [PATCH 5.15 70/78] x86/common: Stamp out the stepping madness

From: Peter Zijlstra <[email protected]>

commit 7a05bc95ed1c5a59e47aaade9fb4083c27de9e62 upstream.

The whole MMIO/RETBLEED enumeration went overboard on steppings. Get
rid of all that and simply use ANY.

If a future stepping of these models would not be affected, it had
better set the relevant ARCH_CAP_$FOO_NO bit in
IA32_ARCH_CAPABILITIES.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/cpu/common.c | 37 ++++++++++++++++---------------------
1 file changed, 16 insertions(+), 21 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1122,32 +1122,27 @@ static const struct x86_cpu_id cpu_vuln_
VULNBL_INTEL_STEPPINGS(HASWELL, X86_STEPPING_ANY, SRBDS),
VULNBL_INTEL_STEPPINGS(HASWELL_L, X86_STEPPING_ANY, SRBDS),
VULNBL_INTEL_STEPPINGS(HASWELL_G, X86_STEPPING_ANY, SRBDS),
- VULNBL_INTEL_STEPPINGS(HASWELL_X, BIT(2) | BIT(4), MMIO),
- VULNBL_INTEL_STEPPINGS(BROADWELL_D, X86_STEPPINGS(0x3, 0x5), MMIO),
+ VULNBL_INTEL_STEPPINGS(HASWELL_X, X86_STEPPING_ANY, MMIO),
+ VULNBL_INTEL_STEPPINGS(BROADWELL_D, X86_STEPPING_ANY, MMIO),
VULNBL_INTEL_STEPPINGS(BROADWELL_G, X86_STEPPING_ANY, SRBDS),
VULNBL_INTEL_STEPPINGS(BROADWELL_X, X86_STEPPING_ANY, MMIO),
VULNBL_INTEL_STEPPINGS(BROADWELL, X86_STEPPING_ANY, SRBDS),
- VULNBL_INTEL_STEPPINGS(SKYLAKE_L, X86_STEPPINGS(0x3, 0x3), SRBDS | MMIO | RETBLEED),
- VULNBL_INTEL_STEPPINGS(SKYLAKE_L, X86_STEPPING_ANY, SRBDS),
- VULNBL_INTEL_STEPPINGS(SKYLAKE_X, BIT(3) | BIT(4) | BIT(6) |
- BIT(7) | BIT(0xB), MMIO | RETBLEED),
- VULNBL_INTEL_STEPPINGS(SKYLAKE, X86_STEPPINGS(0x3, 0x3), SRBDS | MMIO | RETBLEED),
- VULNBL_INTEL_STEPPINGS(SKYLAKE, X86_STEPPING_ANY, SRBDS),
- VULNBL_INTEL_STEPPINGS(KABYLAKE_L, X86_STEPPINGS(0x9, 0xC), SRBDS | MMIO | RETBLEED),
- VULNBL_INTEL_STEPPINGS(KABYLAKE_L, X86_STEPPINGS(0x0, 0x8), SRBDS),
- VULNBL_INTEL_STEPPINGS(KABYLAKE, X86_STEPPINGS(0x9, 0xD), SRBDS | MMIO | RETBLEED),
- VULNBL_INTEL_STEPPINGS(KABYLAKE, X86_STEPPINGS(0x0, 0x8), SRBDS),
- VULNBL_INTEL_STEPPINGS(ICELAKE_L, X86_STEPPINGS(0x5, 0x5), MMIO | MMIO_SBDS | RETBLEED),
- VULNBL_INTEL_STEPPINGS(ICELAKE_D, X86_STEPPINGS(0x1, 0x1), MMIO),
- VULNBL_INTEL_STEPPINGS(ICELAKE_X, X86_STEPPINGS(0x4, 0x6), MMIO),
- VULNBL_INTEL_STEPPINGS(COMETLAKE, BIT(2) | BIT(3) | BIT(5), MMIO | MMIO_SBDS | RETBLEED),
- VULNBL_INTEL_STEPPINGS(COMETLAKE_L, X86_STEPPINGS(0x1, 0x1), MMIO | MMIO_SBDS | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(SKYLAKE_L, X86_STEPPING_ANY, SRBDS | MMIO | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(SKYLAKE_X, X86_STEPPING_ANY, MMIO | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(SKYLAKE, X86_STEPPING_ANY, SRBDS | MMIO | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(KABYLAKE_L, X86_STEPPING_ANY, SRBDS | MMIO | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(KABYLAKE, X86_STEPPING_ANY, SRBDS | MMIO | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(ICELAKE_L, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(ICELAKE_D, X86_STEPPING_ANY, MMIO),
+ VULNBL_INTEL_STEPPINGS(ICELAKE_X, X86_STEPPING_ANY, MMIO),
+ VULNBL_INTEL_STEPPINGS(COMETLAKE, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RETBLEED),
VULNBL_INTEL_STEPPINGS(COMETLAKE_L, X86_STEPPINGS(0x0, 0x0), MMIO | RETBLEED),
- VULNBL_INTEL_STEPPINGS(LAKEFIELD, X86_STEPPINGS(0x1, 0x1), MMIO | MMIO_SBDS | RETBLEED),
- VULNBL_INTEL_STEPPINGS(ROCKETLAKE, X86_STEPPINGS(0x1, 0x1), MMIO | RETBLEED),
- VULNBL_INTEL_STEPPINGS(ATOM_TREMONT, X86_STEPPINGS(0x1, 0x1), MMIO | MMIO_SBDS),
+ VULNBL_INTEL_STEPPINGS(COMETLAKE_L, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(LAKEFIELD, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(ROCKETLAKE, X86_STEPPING_ANY, MMIO | RETBLEED),
+ VULNBL_INTEL_STEPPINGS(ATOM_TREMONT, X86_STEPPING_ANY, MMIO | MMIO_SBDS),
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_D, X86_STEPPING_ANY, MMIO),
- VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_L, X86_STEPPINGS(0x0, 0x0), MMIO | MMIO_SBDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_L, X86_STEPPING_ANY, MMIO | MMIO_SBDS),

VULNBL_AMD(0x15, RETBLEED),
VULNBL_AMD(0x16, RETBLEED),


2022-07-12 20:10:56

by Greg Kroah-Hartman

Subject: [PATCH 5.15 53/78] x86/xen: Rename SYS* entry points

From: Peter Zijlstra <[email protected]>

commit b75b7f8ef1148be1b9321ffc2f6c19238904b438 upstream.

Native SYS{CALL,ENTER} entry points are called
entry_SYS{CALL,ENTER}_{64,compat}; make sure the Xen versions are
named consistently.
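
For reference, the renames (as summarized from the diff below) are:

	xen_sysenter_target   -> xen_entry_SYSENTER_compat
	xen_syscall_target    -> xen_entry_SYSCALL_64
	xen_syscall32_target  -> xen_entry_SYSCALL_compat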

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/xen/setup.c | 6 +++---
arch/x86/xen/xen-asm.S | 20 ++++++++++----------
arch/x86/xen/xen-ops.h | 6 +++---
3 files changed, 16 insertions(+), 16 deletions(-)

--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -922,7 +922,7 @@ void xen_enable_sysenter(void)
if (!boot_cpu_has(sysenter_feature))
return;

- ret = register_callback(CALLBACKTYPE_sysenter, xen_sysenter_target);
+ ret = register_callback(CALLBACKTYPE_sysenter, xen_entry_SYSENTER_compat);
if(ret != 0)
setup_clear_cpu_cap(sysenter_feature);
}
@@ -931,7 +931,7 @@ void xen_enable_syscall(void)
{
int ret;

- ret = register_callback(CALLBACKTYPE_syscall, xen_syscall_target);
+ ret = register_callback(CALLBACKTYPE_syscall, xen_entry_SYSCALL_64);
if (ret != 0) {
printk(KERN_ERR "Failed to set syscall callback: %d\n", ret);
/* Pretty fatal; 64-bit userspace has no other
@@ -940,7 +940,7 @@ void xen_enable_syscall(void)

if (boot_cpu_has(X86_FEATURE_SYSCALL32)) {
ret = register_callback(CALLBACKTYPE_syscall32,
- xen_syscall32_target);
+ xen_entry_SYSCALL_compat);
if (ret != 0)
setup_clear_cpu_cap(X86_FEATURE_SYSCALL32);
}
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -227,7 +227,7 @@ SYM_CODE_END(xenpv_restore_regs_and_retu
*/

/* Normal 64-bit system call target */
-SYM_CODE_START(xen_syscall_target)
+SYM_CODE_START(xen_entry_SYSCALL_64)
UNWIND_HINT_EMPTY
popq %rcx
popq %r11
@@ -241,12 +241,12 @@ SYM_CODE_START(xen_syscall_target)
movq $__USER_CS, 1*8(%rsp)

jmp entry_SYSCALL_64_after_hwframe
-SYM_CODE_END(xen_syscall_target)
+SYM_CODE_END(xen_entry_SYSCALL_64)

#ifdef CONFIG_IA32_EMULATION

/* 32-bit compat syscall target */
-SYM_CODE_START(xen_syscall32_target)
+SYM_CODE_START(xen_entry_SYSCALL_compat)
UNWIND_HINT_EMPTY
popq %rcx
popq %r11
@@ -260,10 +260,10 @@ SYM_CODE_START(xen_syscall32_target)
movq $__USER32_CS, 1*8(%rsp)

jmp entry_SYSCALL_compat_after_hwframe
-SYM_CODE_END(xen_syscall32_target)
+SYM_CODE_END(xen_entry_SYSCALL_compat)

/* 32-bit compat sysenter target */
-SYM_CODE_START(xen_sysenter_target)
+SYM_CODE_START(xen_entry_SYSENTER_compat)
UNWIND_HINT_EMPTY
/*
* NB: Xen is polite and clears TF from EFLAGS for us. This means
@@ -281,18 +281,18 @@ SYM_CODE_START(xen_sysenter_target)
movq $__USER32_CS, 1*8(%rsp)

jmp entry_SYSENTER_compat_after_hwframe
-SYM_CODE_END(xen_sysenter_target)
+SYM_CODE_END(xen_entry_SYSENTER_compat)

#else /* !CONFIG_IA32_EMULATION */

-SYM_CODE_START(xen_syscall32_target)
-SYM_CODE_START(xen_sysenter_target)
+SYM_CODE_START(xen_entry_SYSCALL_compat)
+SYM_CODE_START(xen_entry_SYSENTER_compat)
UNWIND_HINT_EMPTY
lea 16(%rsp), %rsp /* strip %rcx, %r11 */
mov $-ENOSYS, %rax
pushq $0
jmp hypercall_iret
-SYM_CODE_END(xen_sysenter_target)
-SYM_CODE_END(xen_syscall32_target)
+SYM_CODE_END(xen_entry_SYSENTER_compat)
+SYM_CODE_END(xen_entry_SYSCALL_compat)

#endif /* CONFIG_IA32_EMULATION */
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -10,10 +10,10 @@
/* These are code, but not functions. Defined in entry.S */
extern const char xen_failsafe_callback[];

-void xen_sysenter_target(void);
+void xen_entry_SYSENTER_compat(void);
#ifdef CONFIG_X86_64
-void xen_syscall_target(void);
-void xen_syscall32_target(void);
+void xen_entry_SYSCALL_64(void);
+void xen_entry_SYSCALL_compat(void);
#endif

extern void *xen_initial_gdt;


2022-07-12 20:19:44

by Greg Kroah-Hartman

Subject: [PATCH 5.15 76/78] x86/kexec: Disable RET on kexec

From: Konrad Rzeszutek Wilk <[email protected]>

commit 697977d8415d61f3acbc4ee6d564c9dcf0309507 upstream.

All the invocations unroll to __x86_return_thunk, but this file must
be position-independent (PIC) code and cannot jump to the return
thunk.

This fixes kexec on 64-bit AMD boxes.

[ bp: Fix 32-bit build. ]
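
Concretely, every RET in the relocation code becomes the open-coded
sequence (see the hunks below):

	ANNOTATE_UNRET_SAFE
	ret
	int3

so that no jump to the return thunk is emitted in code that is copied
around at kexec time and must stay self-contained.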

Reported-by: Edward Tran <[email protected]>
Reported-by: Awais Tanveer <[email protected]>
Suggested-by: Ankur Arora <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Alexandre Chartre <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/relocate_kernel_32.S | 25 +++++++++++++++++++------
arch/x86/kernel/relocate_kernel_64.S | 23 +++++++++++++++++------
2 files changed, 36 insertions(+), 12 deletions(-)

--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -7,10 +7,12 @@
#include <linux/linkage.h>
#include <asm/page_types.h>
#include <asm/kexec.h>
+#include <asm/nospec-branch.h>
#include <asm/processor-flags.h>

/*
- * Must be relocatable PIC code callable as a C function
+ * Must be relocatable PIC code callable as a C function, in particular
+ * there must be a plain RET and not jump to return thunk.
*/

#define PTR(x) (x << 2)
@@ -91,7 +93,9 @@ SYM_CODE_START_NOALIGN(relocate_kernel)
movl %edi, %eax
addl $(identity_mapped - relocate_kernel), %eax
pushl %eax
- RET
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
SYM_CODE_END(relocate_kernel)

SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
@@ -159,12 +163,15 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_ma
xorl %edx, %edx
xorl %esi, %esi
xorl %ebp, %ebp
- RET
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
1:
popl %edx
movl CP_PA_SWAP_PAGE(%edi), %esp
addl $PAGE_SIZE, %esp
2:
+ ANNOTATE_RETPOLINE_SAFE
call *%edx

/* get the re-entry point of the peer system */
@@ -190,7 +197,9 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_ma
movl %edi, %eax
addl $(virtual_mapped - relocate_kernel), %eax
pushl %eax
- RET
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
SYM_CODE_END(identity_mapped)

SYM_CODE_START_LOCAL_NOALIGN(virtual_mapped)
@@ -208,7 +217,9 @@ SYM_CODE_START_LOCAL_NOALIGN(virtual_map
popl %edi
popl %esi
popl %ebx
- RET
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
SYM_CODE_END(virtual_mapped)

/* Do the copies */
@@ -271,7 +282,9 @@ SYM_CODE_START_LOCAL_NOALIGN(swap_pages)
popl %edi
popl %ebx
popl %ebp
- RET
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
SYM_CODE_END(swap_pages)

.globl kexec_control_code_size
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -13,7 +13,8 @@
#include <asm/unwind_hints.h>

/*
- * Must be relocatable PIC code callable as a C function
+ * Must be relocatable PIC code callable as a C function, in particular
+ * there must be a plain RET and not jump to return thunk.
*/

#define PTR(x) (x << 3)
@@ -104,7 +105,9 @@ SYM_CODE_START_NOALIGN(relocate_kernel)
/* jump to identity mapped page */
addq $(identity_mapped - relocate_kernel), %r8
pushq %r8
- RET
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
SYM_CODE_END(relocate_kernel)

SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
@@ -191,7 +194,9 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_ma
xorl %r14d, %r14d
xorl %r15d, %r15d

- RET
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3

1:
popq %rdx
@@ -210,7 +215,9 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_ma
call swap_pages
movq $virtual_mapped, %rax
pushq %rax
- RET
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
SYM_CODE_END(identity_mapped)

SYM_CODE_START_LOCAL_NOALIGN(virtual_mapped)
@@ -231,7 +238,9 @@ SYM_CODE_START_LOCAL_NOALIGN(virtual_map
popq %r12
popq %rbp
popq %rbx
- RET
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
SYM_CODE_END(virtual_mapped)

/* Do the copies */
@@ -288,7 +297,9 @@ SYM_CODE_START_LOCAL_NOALIGN(swap_pages)
lea PAGE_SIZE(%rax), %rsi
jmp 0b
3:
- RET
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
SYM_CODE_END(swap_pages)

.globl kexec_control_code_size


2022-07-12 20:20:10

by Greg Kroah-Hartman

Subject: [PATCH 5.15 30/78] x86,objtool: Create .return_sites

From: Peter Zijlstra <[email protected]>

commit d9e9d2300681d68a775c28de6aa6e5290ae17796 upstream.

Find all the return-thunk sites and record them in a .return_sites
section such that the kernel can later undo the compiler-generated
return thunks at those sites.
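
Each .return_sites entry is a 32-bit PC-relative offset to a call/jmp
targeting __x86_return_thunk (note the R_X86_64_PC32 relocations
emitted below). The kernel side, which is not part of this patch, can
then recover each site with the usual idiom, roughly:

	void *addr = (void *)s + *s;	/* s walks the s32 entries */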

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
[cascardo: conflict fixup because of functions added to support IBT]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
tools/objtool/arch/x86/decode.c | 5 ++
tools/objtool/check.c | 75 ++++++++++++++++++++++++++++++++
tools/objtool/include/objtool/arch.h | 1
tools/objtool/include/objtool/elf.h | 1
tools/objtool/include/objtool/objtool.h | 1
tools/objtool/objtool.c | 1
6 files changed, 84 insertions(+)

--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -722,3 +722,8 @@ bool arch_is_retpoline(struct symbol *sy
{
return !strncmp(sym->name, "__x86_indirect_", 15);
}
+
+bool arch_is_rethunk(struct symbol *sym)
+{
+ return !strcmp(sym->name, "__x86_return_thunk");
+}
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -654,6 +654,52 @@ static int create_retpoline_sites_sectio
return 0;
}

+static int create_return_sites_sections(struct objtool_file *file)
+{
+ struct instruction *insn;
+ struct section *sec;
+ int idx;
+
+ sec = find_section_by_name(file->elf, ".return_sites");
+ if (sec) {
+ WARN("file already has .return_sites, skipping");
+ return 0;
+ }
+
+ idx = 0;
+ list_for_each_entry(insn, &file->return_thunk_list, call_node)
+ idx++;
+
+ if (!idx)
+ return 0;
+
+ sec = elf_create_section(file->elf, ".return_sites", 0,
+ sizeof(int), idx);
+ if (!sec) {
+ WARN("elf_create_section: .return_sites");
+ return -1;
+ }
+
+ idx = 0;
+ list_for_each_entry(insn, &file->return_thunk_list, call_node) {
+
+ int *site = (int *)sec->data->d_buf + idx;
+ *site = 0;
+
+ if (elf_add_reloc_to_insn(file->elf, sec,
+ idx * sizeof(int),
+ R_X86_64_PC32,
+ insn->sec, insn->offset)) {
+ WARN("elf_add_reloc_to_insn: .return_sites");
+ return -1;
+ }
+
+ idx++;
+ }
+
+ return 0;
+}
+
static int create_mcount_loc_sections(struct objtool_file *file)
{
struct section *sec;
@@ -932,6 +978,11 @@ __weak bool arch_is_retpoline(struct sym
return false;
}

+__weak bool arch_is_rethunk(struct symbol *sym)
+{
+ return false;
+}
+
#define NEGATIVE_RELOC ((void *)-1L)

static struct reloc *insn_reloc(struct objtool_file *file, struct instruction *insn)
@@ -1092,6 +1143,19 @@ static void add_retpoline_call(struct ob

annotate_call_site(file, insn, false);
}
+
+static void add_return_call(struct objtool_file *file, struct instruction *insn)
+{
+ /*
+ * Return thunk tail calls are really just returns in disguise,
+ * so convert them accordingly.
+ */
+ insn->type = INSN_RETURN;
+ insn->retpoline_safe = true;
+
+ list_add_tail(&insn->call_node, &file->return_thunk_list);
+}
+
/*
* Find the destination instructions for all jumps.
*/
@@ -1116,6 +1180,9 @@ static int add_jump_destinations(struct
} else if (reloc->sym->retpoline_thunk) {
add_retpoline_call(file, insn);
continue;
+ } else if (reloc->sym->return_thunk) {
+ add_return_call(file, insn);
+ continue;
} else if (insn->func) {
/* internal or external sibling call (with reloc) */
add_call_dest(file, insn, reloc->sym, true);
@@ -1937,6 +2004,9 @@ static int classify_symbols(struct objto
if (arch_is_retpoline(func))
func->retpoline_thunk = true;

+ if (arch_is_rethunk(func))
+ func->return_thunk = true;
+
if (!strcmp(func->name, "__fentry__"))
func->fentry = true;

@@ -3413,6 +3483,11 @@ int check(struct objtool_file *file)
if (ret < 0)
goto out;
warnings += ret;
+
+ ret = create_return_sites_sections(file);
+ if (ret < 0)
+ goto out;
+ warnings += ret;
}

if (mcount) {
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -88,6 +88,7 @@ const char *arch_ret_insn(int len);
int arch_decode_hint_reg(u8 sp_reg, int *base);

bool arch_is_retpoline(struct symbol *sym);
+bool arch_is_rethunk(struct symbol *sym);

int arch_rewrite_retpolines(struct objtool_file *file);

--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -57,6 +57,7 @@ struct symbol {
u8 uaccess_safe : 1;
u8 static_call_tramp : 1;
u8 retpoline_thunk : 1;
+ u8 return_thunk : 1;
u8 fentry : 1;
u8 kcov : 1;
};
--- a/tools/objtool/include/objtool/objtool.h
+++ b/tools/objtool/include/objtool/objtool.h
@@ -19,6 +19,7 @@ struct objtool_file {
struct list_head insn_list;
DECLARE_HASHTABLE(insn_hash, 20);
struct list_head retpoline_call_list;
+ struct list_head return_thunk_list;
struct list_head static_call_list;
struct list_head mcount_loc_list;
bool ignore_unreachables, c_file, hints, rodata;
--- a/tools/objtool/objtool.c
+++ b/tools/objtool/objtool.c
@@ -126,6 +126,7 @@ struct objtool_file *objtool_open_read(c
INIT_LIST_HEAD(&file.insn_list);
hash_init(file.insn_hash);
INIT_LIST_HEAD(&file.retpoline_call_list);
+ INIT_LIST_HEAD(&file.return_thunk_list);
INIT_LIST_HEAD(&file.static_call_list);
INIT_LIST_HEAD(&file.mcount_loc_list);
file.c_file = !vmlinux && find_section_by_name(file.elf, ".comment");


2022-07-12 20:20:26

by Greg Kroah-Hartman

Subject: [PATCH 5.15 15/78] x86/alternative: Implement .retpoline_sites support

From: Peter Zijlstra <[email protected]>

commit 7508500900814d14e2e085cdc4e28142721abbdf upstream.

Rewrite retpoline thunk call sites to be indirect calls for
spectre_v2=off. This ensures spectre_v2=off is as near to a
RETPOLINE=n build as possible.

This replaces the earlier scheme where objtool wrote alternative
entries to achieve the same, and it keeps feature parity with that
approach.

One noteworthy feature is that it relies on the thunks to be in
machine order to compute the register index.

Specifically, this does not yet address the Jcc __x86_indirect_thunk_*
calls generated by clang; a future patch will add this.
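
For example (illustration only, not taken from the patch), a
compiler-emitted

	call __x86_indirect_thunk_r11		e8 xx xx xx xx	(5 bytes)

is rewritten for spectre_v2=off into

	call *%r11				41 ff d3
	nop; nop				single-byte NOP padding

The register index falls out of the thunk's position in
__x86_indirect_thunk_array, which is why the thunks have to stay in
machine (register) order.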

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Tested-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[cascardo: small conflict fixup at arch/x86/kernel/module.c]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/um/kernel/um_arch.c | 4 +
arch/x86/include/asm/alternative.h | 1
arch/x86/kernel/alternative.c | 141 +++++++++++++++++++++++++++++++++++--
arch/x86/kernel/module.c | 9 ++
4 files changed, 150 insertions(+), 5 deletions(-)

--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -421,6 +421,10 @@ void __init check_bugs(void)
os_check_bugs();
}

+void apply_retpolines(s32 *start, s32 *end)
+{
+}
+
void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
{
}
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -75,6 +75,7 @@ extern int alternatives_patched;

extern void alternative_instructions(void);
extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
+extern void apply_retpolines(s32 *start, s32 *end);

struct module;

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -29,6 +29,7 @@
#include <asm/io.h>
#include <asm/fixmap.h>
#include <asm/paravirt.h>
+#include <asm/asm-prototypes.h>

int __read_mostly alternatives_patched;

@@ -113,6 +114,7 @@ static void __init_or_module add_nops(vo
}
}

+extern s32 __retpoline_sites[], __retpoline_sites_end[];
extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
extern s32 __smp_locks[], __smp_locks_end[];
void text_poke_early(void *addr, const void *opcode, size_t len);
@@ -221,7 +223,7 @@ static __always_inline int optimize_nops
* "noinline" to cause control flow change and thus invalidate I$ and
* cause refetch after modification.
*/
-static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *instr)
+static void __init_or_module noinline optimize_nops(u8 *instr, size_t len)
{
struct insn insn;
int i = 0;
@@ -239,11 +241,11 @@ static void __init_or_module noinline op
* optimized.
*/
if (insn.length == 1 && insn.opcode.bytes[0] == 0x90)
- i += optimize_nops_range(instr, a->instrlen, i);
+ i += optimize_nops_range(instr, len, i);
else
i += insn.length;

- if (i >= a->instrlen)
+ if (i >= len)
return;
}
}
@@ -331,10 +333,135 @@ void __init_or_module noinline apply_alt
text_poke_early(instr, insn_buff, insn_buff_sz);

next:
- optimize_nops(a, instr);
+ optimize_nops(instr, a->instrlen);
}
}

+#if defined(CONFIG_RETPOLINE) && defined(CONFIG_STACK_VALIDATION)
+
+/*
+ * CALL/JMP *%\reg
+ */
+static int emit_indirect(int op, int reg, u8 *bytes)
+{
+ int i = 0;
+ u8 modrm;
+
+ switch (op) {
+ case CALL_INSN_OPCODE:
+ modrm = 0x10; /* Reg = 2; CALL r/m */
+ break;
+
+ case JMP32_INSN_OPCODE:
+ modrm = 0x20; /* Reg = 4; JMP r/m */
+ break;
+
+ default:
+ WARN_ON_ONCE(1);
+ return -1;
+ }
+
+ if (reg >= 8) {
+ bytes[i++] = 0x41; /* REX.B prefix */
+ reg -= 8;
+ }
+
+ modrm |= 0xc0; /* Mod = 3 */
+ modrm += reg;
+
+ bytes[i++] = 0xff; /* opcode */
+ bytes[i++] = modrm;
+
+ return i;
+}
+
+/*
+ * Rewrite the compiler generated retpoline thunk calls.
+ *
+ * For spectre_v2=off (!X86_FEATURE_RETPOLINE), rewrite them into immediate
+ * indirect instructions, avoiding the extra indirection.
+ *
+ * For example, convert:
+ *
+ * CALL __x86_indirect_thunk_\reg
+ *
+ * into:
+ *
+ * CALL *%\reg
+ *
+ */
+static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
+{
+ retpoline_thunk_t *target;
+ int reg, i = 0;
+
+ target = addr + insn->length + insn->immediate.value;
+ reg = target - __x86_indirect_thunk_array;
+
+ if (WARN_ON_ONCE(reg & ~0xf))
+ return -1;
+
+ /* If anyone ever does: CALL/JMP *%rsp, we're in deep trouble. */
+ BUG_ON(reg == 4);
+
+ if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
+ return -1;
+
+ i = emit_indirect(insn->opcode.bytes[0], reg, bytes);
+ if (i < 0)
+ return i;
+
+ for (; i < insn->length;)
+ bytes[i++] = BYTES_NOP1;
+
+ return i;
+}
+
+/*
+ * Generated by 'objtool --retpoline'.
+ */
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
+{
+ s32 *s;
+
+ for (s = start; s < end; s++) {
+ void *addr = (void *)s + *s;
+ struct insn insn;
+ int len, ret;
+ u8 bytes[16];
+ u8 op1, op2;
+
+ ret = insn_decode_kernel(&insn, addr);
+ if (WARN_ON_ONCE(ret < 0))
+ continue;
+
+ op1 = insn.opcode.bytes[0];
+ op2 = insn.opcode.bytes[1];
+
+ switch (op1) {
+ case CALL_INSN_OPCODE:
+ case JMP32_INSN_OPCODE:
+ break;
+
+ default:
+ WARN_ON_ONCE(1);
+ continue;
+ }
+
+ len = patch_retpoline(addr, &insn, bytes);
+ if (len == insn.length) {
+ optimize_nops(bytes, len);
+ text_poke_early(addr, bytes, len);
+ }
+ }
+}
+
+#else /* !RETPOLINES || !CONFIG_STACK_VALIDATION */
+
+void __init_or_module noinline apply_retpolines(s32 *start, s32 *end) { }
+
+#endif /* CONFIG_RETPOLINE && CONFIG_STACK_VALIDATION */
+
#ifdef CONFIG_SMP
static void alternatives_smp_lock(const s32 *start, const s32 *end,
u8 *text, u8 *text_end)
@@ -643,6 +770,12 @@ void __init alternative_instructions(voi
apply_paravirt(__parainstructions, __parainstructions_end);

/*
+ * Rewrite the retpolines, must be done before alternatives since
+ * those can rewrite the retpoline thunks.
+ */
+ apply_retpolines(__retpoline_sites, __retpoline_sites_end);
+
+ /*
* Then patch alternatives, such that those paravirt calls that are in
* alternatives can be overwritten by their immediate fragments.
*/
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -252,7 +252,8 @@ int module_finalize(const Elf_Ehdr *hdr,
struct module *me)
{
const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
- *para = NULL, *orc = NULL, *orc_ip = NULL;
+ *para = NULL, *orc = NULL, *orc_ip = NULL,
+ *retpolines = NULL;
char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;

for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -268,6 +269,8 @@ int module_finalize(const Elf_Ehdr *hdr,
orc = s;
if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
orc_ip = s;
+ if (!strcmp(".retpoline_sites", secstrings + s->sh_name))
+ retpolines = s;
}

/*
@@ -278,6 +281,10 @@ int module_finalize(const Elf_Ehdr *hdr,
void *pseg = (void *)para->sh_addr;
apply_paravirt(pseg, pseg + para->sh_size);
}
+ if (retpolines) {
+ void *rseg = (void *)retpolines->sh_addr;
+ apply_retpolines(rseg, rseg + retpolines->sh_size);
+ }
if (alt) {
/* patch .altinstructions */
void *aseg = (void *)alt->sh_addr;


2022-07-12 20:20:26

by Greg Kroah-Hartman

Subject: [PATCH 5.15 51/78] intel_idle: Disable IBRS during long idle

From: Peter Zijlstra <[email protected]>

commit bf5835bcdb9635c97f85120dba9bfa21e111130f upstream.

Having IBRS enabled while the SMT sibling is idle unnecessarily slows
down the running sibling. OTOH, disabling IBRS around idle takes two
MSR writes, which will increase the idle latency.

Therefore, only disable IBRS around deeper idle states. Shallow idle
states are bounded by the tick in duration, since NOHZ is not allowed
for them by virtue of their short target residency.

Only do this for mwait-driven idle, since that keeps interrupts disabled
across idle, which makes disabling IBRS vs IRQ-entry a non-issue.

Note: C6 is a random threshold; most importantly, C1 probably
shouldn't disable IBRS. Benchmarking is needed.

Suggested-by: Tim Chen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
[cascardo: no CPUIDLE_FLAG_IRQ_ENABLE]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/include/asm/nospec-branch.h | 1
arch/x86/kernel/cpu/bugs.c | 6 ++++
drivers/idle/intel_idle.c | 43 ++++++++++++++++++++++++++++++-----
3 files changed, 44 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -256,6 +256,7 @@ static inline void indirect_branch_predi
/* The Intel SPEC CTRL MSR base value cache */
extern u64 x86_spec_ctrl_base;
extern void write_spec_ctrl_current(u64 val, bool force);
+extern u64 spec_ctrl_current(void);

/*
* With retpoline, we must use IBRS to restrict branch prediction
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -79,6 +79,12 @@ void write_spec_ctrl_current(u64 val, bo
wrmsrl(MSR_IA32_SPEC_CTRL, val);
}

+u64 spec_ctrl_current(void)
+{
+ return this_cpu_read(x86_spec_ctrl_current);
+}
+EXPORT_SYMBOL_GPL(spec_ctrl_current);
+
/*
* The vendor and possibly platform specific bits which can be modified in
* x86_spec_ctrl_base.
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -47,11 +47,13 @@
#include <linux/tick.h>
#include <trace/events/power.h>
#include <linux/sched.h>
+#include <linux/sched/smt.h>
#include <linux/notifier.h>
#include <linux/cpu.h>
#include <linux/moduleparam.h>
#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
+#include <asm/nospec-branch.h>
#include <asm/mwait.h>
#include <asm/msr.h>

@@ -94,6 +96,12 @@ static unsigned int mwait_substates __in
#define CPUIDLE_FLAG_ALWAYS_ENABLE BIT(15)

/*
+ * Disable IBRS across idle (when KERNEL_IBRS), is exclusive vs IRQ_ENABLE
+ * above.
+ */
+#define CPUIDLE_FLAG_IBRS BIT(16)
+
+/*
* MWAIT takes an 8-bit "hint" in EAX "suggesting"
* the C-state (top nibble) and sub-state (bottom nibble)
* 0x00 means "MWAIT(C1)", 0x10 means "MWAIT(C2)" etc.
@@ -132,6 +140,24 @@ static __cpuidle int intel_idle(struct c
return index;
}

+static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
+{
+ bool smt_active = sched_smt_active();
+ u64 spec_ctrl = spec_ctrl_current();
+ int ret;
+
+ if (smt_active)
+ wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
+ ret = intel_idle(dev, drv, index);
+
+ if (smt_active)
+ wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
+
+ return ret;
+}
+
/**
* intel_idle_s2idle - Ask the processor to enter the given idle state.
* @dev: cpuidle device of the target CPU.
@@ -653,7 +679,7 @@ static struct cpuidle_state skl_cstates[
{
.name = "C6",
.desc = "MWAIT 0x20",
- .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS,
.exit_latency = 85,
.target_residency = 200,
.enter = &intel_idle,
@@ -661,7 +687,7 @@ static struct cpuidle_state skl_cstates[
{
.name = "C7s",
.desc = "MWAIT 0x33",
- .flags = MWAIT2flg(0x33) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .flags = MWAIT2flg(0x33) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS,
.exit_latency = 124,
.target_residency = 800,
.enter = &intel_idle,
@@ -669,7 +695,7 @@ static struct cpuidle_state skl_cstates[
{
.name = "C8",
.desc = "MWAIT 0x40",
- .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS,
.exit_latency = 200,
.target_residency = 800,
.enter = &intel_idle,
@@ -677,7 +703,7 @@ static struct cpuidle_state skl_cstates[
{
.name = "C9",
.desc = "MWAIT 0x50",
- .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS,
.exit_latency = 480,
.target_residency = 5000,
.enter = &intel_idle,
@@ -685,7 +711,7 @@ static struct cpuidle_state skl_cstates[
{
.name = "C10",
.desc = "MWAIT 0x60",
- .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS,
.exit_latency = 890,
.target_residency = 5000,
.enter = &intel_idle,
@@ -714,7 +740,7 @@ static struct cpuidle_state skx_cstates[
{
.name = "C6",
.desc = "MWAIT 0x20",
- .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
+ .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS,
.exit_latency = 133,
.target_residency = 600,
.enter = &intel_idle,
@@ -1574,6 +1600,11 @@ static void __init intel_idle_init_cstat
/* Structure copy. */
drv->states[drv->state_count] = cpuidle_state_table[cstate];

+ if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
+ cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IBRS) {
+ drv->states[drv->state_count].enter = intel_idle_ibrs;
+ }
+
if ((disabled_states_mask & BIT(drv->state_count)) ||
((icpu->use_acpi || force_use_acpi) &&
intel_idle_off_by_default(mwait_hint) &&


2022-07-12 20:20:33

by Greg Kroah-Hartman

Subject: [PATCH 5.15 24/78] x86/kvm/vmx: Make noinstr clean

From: Peter Zijlstra <[email protected]>

commit 742ab6df974ae8384a2dd213db1a3a06cf6d8936 upstream.

The recent mmio_stale_data fixes broke the noinstr constraints:

vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x15b: call to wrmsrl.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x1bf: call to kvm_arch_has_assigned_device() leaves .noinstr.text section

Make it all happy again.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 6 +++---
arch/x86/kvm/x86.c | 4 ++--
include/linux/kvm_host.h | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)

--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -380,9 +380,9 @@ static __always_inline void vmx_disable_
if (!vmx->disable_fb_clear)
return;

- rdmsrl(MSR_IA32_MCU_OPT_CTRL, msr);
+ msr = __rdmsr(MSR_IA32_MCU_OPT_CTRL);
msr |= FB_CLEAR_DIS;
- wrmsrl(MSR_IA32_MCU_OPT_CTRL, msr);
+ native_wrmsrl(MSR_IA32_MCU_OPT_CTRL, msr);
/* Cache the MSR value to avoid reading it later */
vmx->msr_ia32_mcu_opt_ctrl = msr;
}
@@ -393,7 +393,7 @@ static __always_inline void vmx_enable_f
return;

vmx->msr_ia32_mcu_opt_ctrl &= ~FB_CLEAR_DIS;
- wrmsrl(MSR_IA32_MCU_OPT_CTRL, vmx->msr_ia32_mcu_opt_ctrl);
+ native_wrmsrl(MSR_IA32_MCU_OPT_CTRL, vmx->msr_ia32_mcu_opt_ctrl);
}

static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12177,9 +12177,9 @@ void kvm_arch_end_assignment(struct kvm
}
EXPORT_SYMBOL_GPL(kvm_arch_end_assignment);

-bool kvm_arch_has_assigned_device(struct kvm *kvm)
+bool noinstr kvm_arch_has_assigned_device(struct kvm *kvm)
{
- return atomic_read(&kvm->arch.assigned_device_count);
+ return arch_atomic_read(&kvm->arch.assigned_device_count);
}
EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);

--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1233,7 +1233,7 @@ static inline void kvm_arch_end_assignme
{
}

-static inline bool kvm_arch_has_assigned_device(struct kvm *kvm)
+static __always_inline bool kvm_arch_has_assigned_device(struct kvm *kvm)
{
return false;
}


2022-07-12 20:20:48

by Greg Kroah-Hartman

Subject: [PATCH 5.15 62/78] x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit

From: Josh Poimboeuf <[email protected]>

commit bbb69e8bee1bd882784947095ffb2bfe0f7c9470 upstream.

There's no need to recalculate the host value for every entry/exit.
Just use the cached value in spec_ctrl_current().

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/cpu/bugs.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -208,7 +208,7 @@ void __init check_bugs(void)
void
x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest)
{
- u64 msrval, guestval, hostval = x86_spec_ctrl_base;
+ u64 msrval, guestval, hostval = spec_ctrl_current();
struct thread_info *ti = current_thread_info();

/* Is MSR_SPEC_CTRL implemented ? */
@@ -221,15 +221,6 @@ x86_virt_spec_ctrl(u64 guest_spec_ctrl,
guestval = hostval & ~x86_spec_ctrl_mask;
guestval |= guest_spec_ctrl & x86_spec_ctrl_mask;

- /* SSBD controlled in MSR_SPEC_CTRL */
- if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) ||
- static_cpu_has(X86_FEATURE_AMD_SSBD))
- hostval |= ssbd_tif_to_spec_ctrl(ti->flags);
-
- /* Conditional STIBP enabled? */
- if (static_branch_unlikely(&switch_to_cond_stibp))
- hostval |= stibp_tif_to_spec_ctrl(ti->flags);
-
if (hostval != guestval) {
msrval = setguest ? guestval : hostval;
wrmsrl(MSR_IA32_SPEC_CTRL, msrval);
@@ -1390,7 +1381,6 @@ static void __init spectre_v2_select_mit
pr_err(SPECTRE_V2_EIBRS_EBPF_MSG);

if (spectre_v2_in_ibrs_mode(mode)) {
- /* Force it so VMEXIT will restore correctly */
x86_spec_ctrl_base |= SPEC_CTRL_IBRS;
write_spec_ctrl_current(x86_spec_ctrl_base, true);
}


2022-07-12 20:21:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 5.15 22/78] x86/entry: Remove skip_r11rcx

From: Peter Zijlstra <[email protected]>

commit 1b331eeea7b8676fc5dbdf80d0a07e41be226177 upstream.

Yes, r11 and rcx have been restored previously, but since they're being
popped anyway (into rsi) might as well pop them into their own regs --
setting them to the value they already are.

Less magical code.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/entry/calling.h | 10 +---------
arch/x86/entry/entry_64.S | 3 +--
2 files changed, 2 insertions(+), 11 deletions(-)

--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -119,27 +119,19 @@ For 32-bit we have the following convent
CLEAR_REGS
.endm

-.macro POP_REGS pop_rdi=1 skip_r11rcx=0
+.macro POP_REGS pop_rdi=1
popq %r15
popq %r14
popq %r13
popq %r12
popq %rbp
popq %rbx
- .if \skip_r11rcx
- popq %rsi
- .else
popq %r11
- .endif
popq %r10
popq %r9
popq %r8
popq %rax
- .if \skip_r11rcx
- popq %rsi
- .else
popq %rcx
- .endif
popq %rdx
popq %rsi
.if \pop_rdi
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -189,8 +189,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_h
* perf profiles. Nothing jumps here.
*/
syscall_return_via_sysret:
- /* rcx and r11 are already restored (see code above) */
- POP_REGS pop_rdi=0 skip_r11rcx=1
+ POP_REGS pop_rdi=0

/*
* Now all regs are restored except RSP and RDI.


2022-07-12 23:46:45

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On 7/12/22 11:38, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.55 release.
> There are 78 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu, 14 Jul 2022 18:32:19 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.55-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels:

Tested-by: Florian Fainelli <[email protected]>
--
Florian

2022-07-13 03:32:38

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On 7/12/22 12:38 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.55 release.
> There are 78 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu, 14 Jul 2022 18:32:19 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.55-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

Tested-by: Shuah Khan <[email protected]>

thanks,
-- Shuah

2022-07-13 03:33:33

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Tue, Jul 12, 2022 at 08:38:30PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.55 release.
> There are 78 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>

Successfully cross-compiled for arm64 (bcm2711_defconfig, GCC 10.2.0)
and powerpc (ps3_defconfig, GCC 12.1.0).

Tested-by: Bagas Sanjaya <[email protected]>

--
An old man doll... just what I always wanted! - Clara

2022-07-13 10:31:25

by Sudip Mukherjee

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

Hi Greg,

On Tue, Jul 12, 2022 at 08:38:30PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.55 release.
> There are 78 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu, 14 Jul 2022 18:32:19 +0000.
> Anything received after that time might be too late.

Build test (gcc version 11.3.1 20220706):
mips: 62 configs -> no failure
arm: 99 configs -> no failure
arm64: 3 configs -> no failure
x86_64: 4 configs -> no failure
alpha allmodconfig -> no failure
csky allmodconfig -> no failure
powerpc allmodconfig -> no failure
riscv allmodconfig -> no failure
s390 allmodconfig -> no failure
xtensa allmodconfig -> no failure

Boot test:
x86_64: Booted on my test laptop. No regression.
x86_64: Booted on qemu. No regression. [1]
arm64: Booted on rpi4b (4GB model). No regression. [2]
mips: Booted on ci20 board. No regression. [3]

[1]. https://openqa.qa.codethink.co.uk/tests/1508
[2]. https://openqa.qa.codethink.co.uk/tests/1511
[3]. https://openqa.qa.codethink.co.uk/tests/1514

Tested-by: Sudip Mukherjee <[email protected]>

--
Regards
Sudip

2022-07-13 13:01:38

by Naresh Kamboju

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Wed, 13 Jul 2022 at 00:17, Greg Kroah-Hartman
<[email protected]> wrote:
>
> This is the start of the stable review cycle for the 5.15.55 release.
> There are 78 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu, 14 Jul 2022 18:32:19 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.55-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
Regressions on x86_64.

Reported-by: Linux Kernel Functional Testing <[email protected]>

1) Kernel panic noticed on an x86_64 device while running kvm-unit-tests.
- APIC base relocation is unsupported by KVM

2) While booting qemu-x86_64, the following warning was noticed.
- WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:557
apply_returns+0x19c/0x1d0

kernel panic log:
- https://lkft.validation.linaro.org/scheduler/job/5278112#L1703
TESTNAME=emulator TIMEOUT=90s ACCEL= ./x86/run x86/emulator.flat -smp 1
[ 67.774572] APIC base relocation is unsupported by KVM
[ 105.643057] kvm: emulating exchange as write
[ 105.653717] int3: 0000 [#1] SMP PTI
[ 105.653720] CPU: 3 PID: 3747 Comm: qemu-system-x86 Not tainted 5.15.55-rc1 #1
[ 105.653721] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[ 105.653722] RIP: 0010:xaddw_ax_dx+0x9/0x10
[ 105.653727] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc
cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc
cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3
cc cc
[ 105.653728] RSP: 0018:ffffb98bc5157ce8 EFLAGS: 00000206
[ 105.653729] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
[ 105.653730] RDX: 0000000076543210 RSI: ffffffff8ea56000 RDI: 0000000000000204
[ 105.653731] RBP: ffffb98bc5157cf0 R08: ffffa306c6cf5df0 R09: 0000000000000002
[ 105.653732] R10: ffffa306c6cf5df0 R11: 0000000000000000 R12: ffffa306c6cf5df0
[ 105.653733] R13: ffffffff900090c0 R14: 0000000000000000 R15: 0000000000000000
[ 105.653734] FS: 00007f30ab0df700(0000) GS:ffffa30a1fd80000(0000)
knlGS:0000000000000000
[ 105.653735] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 105.653736] CR2: 0000000000000000 CR3: 000000014e0d0003 CR4: 00000000003726e0
[ 105.653736] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 105.653737] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 105.653738] Call Trace:
[ 105.653738] <TASK>
[ 105.653739] ? fastop+0x5d/0xa0
[ 105.653741] x86_emulate_insn+0x7c9/0xf20
[ 105.653743] x86_emulate_instruction+0x2e3/0x790
[ 105.653746] complete_emulated_mmio+0x238/0x310
[ 105.653748] kvm_arch_vcpu_ioctl_run+0x11ba/0x1a70
[ 105.653750] ? vfs_writev+0xcb/0x140
[ 105.653753] kvm_vcpu_ioctl+0x281/0x6b0
[ 105.653755] ? clockevents_program_event+0x98/0x100
[ 105.653757] ? selinux_file_ioctl+0xae/0x140
[ 105.653760] ? selinux_file_ioctl+0xae/0x140
[ 105.653762] __x64_sys_ioctl+0x95/0xd0
[ 105.653764] do_syscall_64+0x3b/0x90
[ 105.653767] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 105.653769] RIP: 0033:0x7f30aca698f7
[ 105.653770] Code: b3 66 90 48 8b 05 a1 35 2c 00 64 c7 00 26 00 00
00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 71 35 2c 00 f7 d8 64 89
01 48
[ 105.653771] RSP: 002b:00007f30ab0dea28 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 105.653772] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f30aca698f7
[ 105.653773] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000f
[ 105.653774] RBP: 000055fb7fb6faf0 R08: 000055fb7d67c450 R09: 00000000ffffffff
[ 105.653775] R10: 00000000000216a8 R11: 0000000000000246 R12: 0000000000000000
[ 105.653775] R13: 00007f30aed99000 R14: 0000000000000006 R15: 000055fb7fb6faf0
[ 105.653776] </TASK>
[ 105.653777] Modules linked in: x86_pkg_temp_thermal
[ 105.902123] ---[ end trace cec99cae36bcbfd7 ]---
[ 105.902124] RIP: 0010:xaddw_ax_dx+0x9/0x10
[ 105.902126] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc
cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc
cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3
cc cc
[ 105.902127] RSP: 0018:ffffb98bc5157ce8 EFLAGS: 00000206
[ 105.902127] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
[ 105.902128] RDX: 0000000076543210 RSI: ffffffff8ea56000 RDI: 0000000000000204
[ 105.902129] RBP: ffffb98bc5157cf0 R08: ffffa306c6cf5df0 R09: 0000000000000002
[ 105.902129] R10: ffffa306c6cf5df0 R11: 0000000000000000 R12: ffffa306c6cf5df0
[ 105.902130] R13: ffffffff900090c0 R14: 0000000000000000 R15: 0000000000000000
[ 105.902130] FS: 00007f30ab0df700(0000) GS:ffffa30a1fd80000(0000)
knlGS:0000000000000000
[ 105.902131] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 105.902132] CR2: 0000000000000000 CR3: 000000014e0d0003 CR4: 00000000003726e0
[ 105.902133] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 105.902133] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 105.902134] Kernel panic - not syncing: Fatal exception in interrupt
[ 105.902170] Kernel Offset: 0xda00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 106.022663] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt ]---
[ 106.030224] ------------[ cut here ]------------
[ 106.030224] sched: Unexpected reschedule of offline CPU#0!
[ 106.030226] WARNING: CPU: 3 PID: 3747 at
arch/x86/kernel/apic/ipi.c:68 native_smp_send_reschedule+0x3e/0x50
[ 106.030229] Modules linked in: x86_pkg_temp_thermal
[ 106.030230] CPU: 3 PID: 3747 Comm: qemu-system-x86 Tainted: G
D 5.15.55-rc1 #1
[ 106.030231] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[ 106.030232] RIP: 0010:native_smp_send_reschedule+0x3e/0x50
[ 106.030234] Code: 1b 48 8b 05 d4 70 a6 01 be fd 00 00 00 48 8b 40
30 e8 66 dc 31 01 5d c3 cc cc cc cc 89 fe 48 c7 c7 a0 e9 41 90 e8 1e
c3 ea 00 <0f> 0b 5d c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00
[ 106.030234] RSP: 0018:ffffb98bc0160c60 EFLAGS: 00010086
[ 106.030235] RAX: 0000000000000000 RBX: ffffa30a1fc29b00 RCX: 0000000000000027
[ 106.030236] RDX: ffffa30a1fd9b4b8 RSI: 0000000000000001 RDI: ffffa30a1fd9b4b0
[ 106.030237] RBP: ffffb98bc0160c60 R08: ffffffff90b65665 R09: 0000000000000000
[ 106.030237] R10: 0000000000000030 R11: ffffffff90b65665 R12: ffffa306c097c100
[ 106.030238] R13: ffffb98bc0160d00 R14: ffffb98bc0160d00 R15: 0000000000000009
[ 106.030239] FS: 00007f30ab0df700(0000) GS:ffffa30a1fd80000(0000)
knlGS:0000000000000000
[ 106.030239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 106.030240] CR2: 0000000000000000 CR3: 000000014e0d0003 CR4: 00000000003726e0
[ 106.030241] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 106.030241] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 106.030242] Call Trace:
[ 106.030242] <IRQ>
[ 106.030243] resched_curr+0x52/0xb0
[ 106.030245] check_preempt_curr+0x3b/0x70
[ 106.030247] ttwu_do_wakeup+0x1c/0x160
[ 106.030249] ttwu_do_activate+0x94/0x190
[ 106.030251] try_to_wake_up+0x1c4/0x480
[ 106.030253] default_wake_function+0x1a/0x40
[ 106.030254] autoremove_wake_function+0x12/0x40
[ 106.030256] __wake_up_common+0x7d/0x140
[ 106.030258] __wake_up_common_lock+0x7c/0xc0
[ 106.030261] __wake_up+0x13/0x20
[ 106.030263] wake_up_klogd_work_func+0x7b/0x90
[ 106.030265] irq_work_single+0x46/0x80
[ 106.030267] irq_work_run_list+0x2a/0x40
[ 106.030269] irq_work_tick+0x3b/0x50
[ 106.030270] update_process_times+0xba/0xd0
[ 106.030272] tick_sched_handle+0x38/0x50
[ 106.030274] tick_sched_timer+0x8c/0xc0
[ 106.030276] ? can_stop_idle_tick+0xa0/0xa0
[ 106.030278] __hrtimer_run_queues+0xa6/0x2b0
[ 106.030280] hrtimer_interrupt+0x101/0x220
[ 106.030281] __sysvec_apic_timer_interrupt+0x61/0xe0
[ 106.030283] sysvec_apic_timer_interrupt+0x7b/0x90
[ 106.030285] </IRQ>
[ 106.030285] <TASK>
[ 106.030286] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 106.030288] RIP: 0010:panic+0x277/0x2b6
[ 106.030290] Code: e8 65 ac 25 ff 48 c7 c6 80 04 b5 90 48 c7 c7 80
3f 42 90 e8 6a 58 00 00 c7 05 cc 8e f5 00 01 00 00 00 e8 d3 80 33 ff
fb 31 db <4c> 39 eb 7c 1d 41 83 f4 01 48 8b 05 a0 d2 1b 01 44 89 e7 e8
18 16
[ 106.030291] RSP: 0018:ffffb98bc5157b10 EFLAGS: 00000246
[ 106.030292] RAX: ffffb98bc5157b80 RBX: 0000000000000000 RCX: 0000000000000027
[ 106.030292] RDX: 0000000000000000 RSI: ffffffff8f98de8f RDI: ffffffff8f99314d
[ 106.030293] RBP: ffffb98bc5157b80 R08: ffffffff90b655fa R09: 0000000090b655d6
[ 106.030293] R10: ffffffffffffffff R11: ffffffffffffffff R12: 0000000000000000
[ 106.030294] R13: 0000000000000000 R14: ffffffff904131e0 R15: 0000000000000000
[ 106.030295] ? oops_end.cold+0xc/0x18
[ 106.030297] ? panic+0x274/0x2b6
[ 106.030299] oops_end.cold+0xc/0x18
[ 106.030300] die+0x43/0x60
[ 106.030302] exc_int3+0x137/0x160
[ 106.030303] asm_exc_int3+0x39/0x40
[ 106.030305] RIP: 0010:xaddw_ax_dx+0x9/0x10
[ 106.030307] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc
cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc
cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3
cc cc
[ 106.030308] RSP: 0018:ffffb98bc5157ce8 EFLAGS: 00000206
[ 106.030308] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
[ 106.030309] RDX: 0000000076543210 RSI: ffffffff8ea56000 RDI: 0000000000000204
[ 106.030310] RBP: ffffb98bc5157cf0 R08: ffffa306c6cf5df0 R09: 0000000000000002
[ 106.030310] R10: ffffa306c6cf5df0 R11: 0000000000000000 R12: ffffa306c6cf5df0
[ 106.030311] R13: ffffffff900090c0 R14: 0000000000000000 R15: 0000000000000000
[ 106.030312] ? xaddw_ax_dx+0x8/0x10
[ 106.030314] ? xaddw_ax_dx+0x9/0x10
[ 106.030315] ? xaddw_ax_dx+0x8/0x10
[ 106.030317] ? fastop+0x5d/0xa0
[ 106.030319] x86_emulate_insn+0x7c9/0xf20
[ 106.030321] x86_emulate_instruction+0x2e3/0x790
[ 106.030323] complete_emulated_mmio+0x238/0x310
[ 106.030325] kvm_arch_vcpu_ioctl_run+0x11ba/0x1a70
[ 106.030327] ? vfs_writev+0xcb/0x140
[ 106.030330] kvm_vcpu_ioctl+0x281/0x6b0
[ 106.030331] ? clockevents_program_event+0x98/0x100
[ 106.030333] ? selinux_file_ioctl+0xae/0x140
[ 106.030335] ? selinux_file_ioctl+0xae/0x140
[ 106.030337] __x64_sys_ioctl+0x95/0xd0
[ 106.030339] do_syscall_64+0x3b/0x90
[ 106.030340] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 106.030342] RIP: 0033:0x7f30aca698f7
[ 106.030343] Code: b3 66 90 48 8b 05 a1 35 2c 00 64 c7 00 26 00 00
00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 71 35 2c 00 f7 d8 64 89
01 48
[ 106.030343] RSP: 002b:00007f30ab0dea28 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 106.030344] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f30aca698f7
[ 106.030345] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000f
[ 106.030346] RBP: 000055fb7fb6faf0 R08: 000055fb7d67c450 R09: 00000000ffffffff
[ 106.030346] R10: 00000000000216a8 R11: 0000000000000246 R12: 0000000000000000
[ 106.030347] R13: 00007f30aed99000 R14: 0000000000000006 R15: 000055fb7fb6faf0
[ 106.030348] </TASK>
[ 106.030348] ---[ end trace cec99cae36bcbfd8 ]---
[ 106.030350] ------------[ cut here ]------------
[ 106.030350] sched: Unexpected reschedule of offline CPU#2!
[ 106.030351] WARNING: CPU: 3 PID: 3747 at
arch/x86/kernel/apic/ipi.c:68 native_smp_send_reschedule+0x3e/0x50
[ 106.030354] Modules linked in: x86_pkg_temp_thermal
[ 106.030354] CPU: 3 PID: 3747 Comm: qemu-system-x86 Tainted: G
D W 5.15.55-rc1 #1
[ 106.030355] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[ 106.030356] RIP: 0010:native_smp_send_reschedule+0x3e/0x50
[ 106.030357] Code: 1b 48 8b 05 d4 70 a6 01 be fd 00 00 00 48 8b 40
30 e8 66 dc 31 01 5d c3 cc cc cc cc 89 fe 48 c7 c7 a0 e9 41 90 e8 1e
c3 ea 00 <0f> 0b 5d c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00
[ 106.030358] RSP: 0018:ffffb98bc0160b40 EFLAGS: 00010086
[ 106.030359] RAX: 0000000000000000 RBX: ffffa30a1fd29b00 RCX: 0000000000000027
[ 106.030360] RDX: ffffa30a1fd9b4b8 RSI: 0000000000000001 RDI: ffffa30a1fd9b4b0
[ 106.030360] RBP: ffffb98bc0160b40 R08: ffffffff90b66d1d R09: 0000000000000000
[ 106.030361] R10: 0000000000000030 R11: ffffffff90b66d1d R12: ffffa306c2216180
[ 106.030361] R13: ffffb98bc0160be0 R14: ffffb98bc0160be0 R15: 0000000000000009
[ 106.030362] FS: 00007f30ab0df700(0000) GS:ffffa30a1fd80000(0000)
knlGS:0000000000000000
[ 106.030363] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 106.030364] CR2: 0000000000000000 CR3: 000000014e0d0003 CR4: 00000000003726e0
[ 106.030364] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 106.030365] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 106.030365] Call Trace:
[ 106.030366] <IRQ>
[ 106.030366] resched_curr+0x52/0xb0
[ 106.030368] check_preempt_curr+0x3b/0x70
[ 106.030369] ttwu_do_wakeup+0x1c/0x160
[ 106.030371] ttwu_do_activate+0x94/0x190
[ 106.030373] try_to_wake_up+0x1c4/0x480
[ 106.030375] default_wake_function+0x1a/0x40
[ 106.030377] autoremove_wake_function+0x12/0x40
[ 106.030378] __wake_up_common+0x7d/0x140
[ 106.030380] __wake_up_common_lock+0x7c/0xc0
[ 106.030382] __wake_up+0x13/0x20
[ 106.030384] ep_poll_callback+0x10e/0x290
[ 106.030386] __wake_up_common+0x7d/0x140
[ 106.030389] __wake_up_common_lock+0x7c/0xc0
[ 106.030391] __wake_up+0x13/0x20
[ 106.030393] wake_up_klogd_work_func+0x7b/0x90
[ 106.030395] irq_work_single+0x46/0x80
[ 106.030397] irq_work_run_list+0x2a/0x40
[ 106.030398] irq_work_tick+0x3b/0x50
[ 106.030400] update_process_times+0xba/0xd0
[ 106.030401] tick_sched_handle+0x38/0x50
[ 106.030403] tick_sched_timer+0x8c/0xc0
[ 106.030405] ? can_stop_idle_tick+0xa0/0xa0
[ 106.030407] __hrtimer_run_queues+0xa6/0x2b0
[ 106.030408] hrtimer_interrupt+0x101/0x220
[ 106.030410] __sysvec_apic_timer_interrupt+0x61/0xe0
[ 106.030411] sysvec_apic_timer_interrupt+0x7b/0x90
[ 106.030413] </IRQ>
[ 106.030413] <TASK>
[ 106.030414] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 106.030416] RIP: 0010:panic+0x277/0x2b6
[ 106.030417] Code: e8 65 ac 25 ff 48 c7 c6 80 04 b5 90 48 c7 c7 80
3f 42 90 e8 6a 58 00 00 c7 05 cc 8e f5 00 01 00 00 00 e8 d3 80 33 ff
fb 31 db <4c> 39 eb 7c 1d 41 83 f4 01 48 8b 05 a0 d2 1b 01 44 89 e7 e8
18 16
[ 106.030418] RSP: 0018:ffffb98bc5157b10 EFLAGS: 00000246
[ 106.030419] RAX: ffffb98bc5157b80 RBX: 0000000000000000 RCX: 0000000000000027
[ 106.030419] RDX: 0000000000000000 RSI: ffffffff8f98de8f RDI: ffffffff8f99314d
[ 106.030420] RBP: ffffb98bc5157b80 R08: ffffffff90b655fa R09: 0000000090b655d6
[ 106.030420] R10: ffffffffffffffff R11: ffffffffffffffff R12: 0000000000000000
[ 106.030421] R13: 0000000000000000 R14: ffffffff904131e0 R15: 0000000000000000
[ 106.030422] ? oops_end.cold+0xc/0x18
[ 106.030423] ? panic+0x274/0x2b6
[ 106.030425] oops_end.cold+0xc/0x18
[ 106.030427] die+0x43/0x60
[ 106.030428] exc_int3+0x137/0x160
[ 106.030430] asm_exc_int3+0x39/0x40
[ 106.030431] RIP: 0010:xaddw_ax_dx+0x9/0x10
[ 106.030433] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc
cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc
cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3
cc cc
[ 106.030434] RSP: 0018:ffffb98bc5157ce8 EFLAGS: 00000206
[ 106.030434] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
[ 106.030435] RDX: 0000000076543210 RSI: ffffffff8ea56000 RDI: 0000000000000204
[ 106.030436] RBP: ffffb98bc5157cf0 R08: ffffa306c6cf5df0 R09: 0000000000000002
[ 106.030436] R10: ffffa306c6cf5df0 R11: 0000000000000000 R12: ffffa306c6cf5df0
[ 106.030437] R13: ffffffff900090c0 R14: 0000000000000000 R15: 0000000000000000
[ 106.030437] ? xaddw_ax_dx+0x8/0x10
[ 106.030439] ? xaddw_ax_dx+0x9/0x10
[ 106.030441] ? xaddw_ax_dx+0x8/0x10
[ 106.030443] ? fastop+0x5d/0xa0
[ 106.030445] x86_emulate_insn+0x7c9/0xf20
[ 106.030446] x86_emulate_instruction+0x2e3/0x790
[ 106.030449] complete_emulated_mmio+0x238/0x310
[ 106.030450] kvm_arch_vcpu_ioctl_run+0x11ba/0x1a70
[ 106.030453] ? vfs_writev+0xcb/0x140
[ 106.030455] kvm_vcpu_ioctl+0x281/0x6b0
[ 106.030456] ? clockevents_program_event+0x98/0x100
[ 106.030458] ? selinux_file_ioctl+0xae/0x140
[ 106.030460] ? selinux_file_ioctl+0xae/0x140
[ 106.030462] __x64_sys_ioctl+0x95/0xd0
[ 106.030464] do_syscall_64+0x3b/0x90
[ 106.030465] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 106.030467] RIP: 0033:0x7f30aca698f7
[ 106.030468] Code: b3 66 90 48 8b 05 a1 35 2c 00 64 c7 00 26 00 00
00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 71 35 2c 00 f7 d8 64 89
01 48
[ 106.030468] RSP: 002b:00007f30ab0dea28 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 106.030469] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f30aca698f7
[ 106.030470] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000f
[ 106.030470] RBP: 000055fb7fb6faf0 R08: 000055fb7d67c450 R09: 00000000ffffffff
[ 106.030471] R10: 00000000000216a8 R11: 0000000000000246 R12: 0000000000000000
[ 106.030472] R13: 00007f30aed99000 R14: 0000000000000006 R15: 000055fb7fb6faf0
[ 106.030473] </TASK>
[ 106.030473] ---[ end trace cec99cae36bcbfd9 ]---

2. Kernel warning
- https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.15.y/build/v5.15.54-79-g0fe4fdf9b1da/testrun/10799646/suite/log-parser-test/tests/
<6>[ 0.571674] Speculative Store Bypass: Vulnerable
<6>[ 0.573329] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
<4>[ 0.942716] ------------[ cut here ]------------
<4>[ 0.944263] WARNING: CPU: 0 PID: 0 at
arch/x86/kernel/alternative.c:557 apply_returns+0x19c/0x1d0
<4>[ 0.945106] Modules linked in:
<4>[ 0.946322] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.55-rc1 #1
<4>[ 0.947052] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS 1.14.0-2 04/01/2014
<4>[ 0.947844] RIP: 0010:apply_returns+0x19c/0x1d0
<4>[ 0.948453] Code: 8d 7d c8 4c 89 75 c1 4c 89 74 08 f8 48 29 f8
8d 0c 02 4c 89 f0 c1 e9 03 f3 48 ab 8b 05 9d ba 52 02 85 c0 74 a0 e9
45 32 1a 01 <0f> 0b 48 83 c3 04 49 39 dd 0f 87 96 fe ff ff e9 0f ff ff
ff c7 45
<4>[ 0.950059] RSP: 0000:ffffffff89803d78 EFLAGS: 00000206
<4>[ 0.950484] RAX: 0000000000000000 RBX: ffffffff89dbb174 RCX:
0000000000000000
<4>[ 0.951033] RDX: ffffffff891011b5 RSI: 00000000000000e9 RDI:
ffffffff891011b0
<4>[ 0.951488] RBP: ffffffff89803e40 R08: 0000000000000000 R09:
0000000000000001
<4>[ 0.952032] R10: 0000000000000029 R11: 0000000000000000 R12:
ffffffff891011b0
<4>[ 0.952488] R13: ffffffff89ddb184 R14: cccccccccccccccc R15:
ffffffff891011b5
<4>[ 0.953125] FS: 0000000000000000(0000)
GS:ffff90f63fc00000(0000) knlGS:0000000000000000
<4>[ 0.953634] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 0.954031] CR2: ffff90f63fa01000 CR3: 000000003e826000 CR4:
00000000000006f0
<4>[ 0.954669] Call Trace:
<4>[ 0.955846] <TASK>
<4>[ 0.957183] alternative_instructions+0x7d/0x146
<4>[ 0.957673] check_bugs+0xf16/0xf57
<4>[ 0.958114] start_kernel+0x6b6/0x6ef
<4>[ 0.958446] x86_64_start_reservations+0x24/0x2a
<4>[ 0.958764] x86_64_start_kernel+0x8d/0x95
<4>[ 0.959035] secondary_startup_64_no_verify+0xc2/0xcb
<4>[ 0.959591] </TASK>
<4>[ 0.960146] irq event stamp: 121171
<4>[ 0.960419] hardirqs last enabled at (121179):
[<ffffffff87a7247c>] __up_console_sem+0x5c/0x70
<4>[ 0.961036] hardirqs last disabled at (121188):
[<ffffffff87a72461>] __up_console_sem+0x41/0x70
<4>[ 0.961577] softirqs last enabled at (1484):
[<ffffffff87ad3f20>] cgroup_idr_alloc.constprop.0+0x60/0xe0
<4>[ 0.962031] softirqs last disabled at (1482):
[<ffffffff87ad3ef3>] cgroup_idr_alloc.constprop.0+0x33/0xe0
<4>[ 0.963358] ---[ end trace 7af0f35d34a8be8b ]---
<6>[ 1.100332] Freeing SMP alternatives memory: 52K
<6>[ 1.233988] smpboot: CPU0: Intel Core i7 9xx (Nehalem Class Core
i7) (family: 0x6, model: 0x1a, stepping: 0x3)
<6>[ 1.266761] Running RCU-tasks wait API self tests
<6>[ 1.271070] Performance Events: unsupported p6 CPU model 26 no
PMU driver, software events only.
<6>[ 1.276776] rcu: Hierarchical SRCU implementation.
<6>[ 1.281153] Callback from call_rcu_tasks_trace() invoked.
<6>[ 1.308215] smp: Bringing up secondary CPUs ...
<6>[ 1.322732] x86: Booting SMP configuration:
<6>[ 1.323121] .... node #0, CPUs: #1
<6>[ 1.361239] smp: Brought up 1 node, 2 CPUs


## Build
* kernel: 5.15.55-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-5.15.y
* git commit: 0fe4fdf9b1dac90e23465eefeff45e529adcf3c6
* git describe: v5.15.54-79-g0fe4fdf9b1da
* test details:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.15.y/build/v5.15.54-79-g0fe4fdf9b1da

## Test Regressions (compared to v5.15.53-227-g71721f5974f2)
kernel crash regression found.

## Metric Regressions (compared to v5.15.53-227-g71721f5974f2)
No metric regressions found.

## Test Fixes (compared to v5.15.53-227-g71721f5974f2)
No test fixes found.

## Metric Fixes (compared to v5.15.53-227-g71721f5974f2)
No metric fixes found.

## Test result summary
total: 137556, pass: 124337, fail: 404, skip: 12121, xfail: 694

## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 308 total, 308 passed, 0 failed
* arm64: 62 total, 62 passed, 0 failed
* i386: 52 total, 49 passed, 3 failed
* mips: 48 total, 48 passed, 0 failed
* parisc: 12 total, 12 passed, 0 failed
* powerpc: 54 total, 52 passed, 2 failed
* riscv: 22 total, 22 passed, 0 failed
* s390: 21 total, 21 passed, 0 failed
* sh: 24 total, 24 passed, 0 failed
* sparc: 12 total, 12 passed, 0 failed
* x86_64: 56 total, 55 passed, 1 failed

## Test suites summary
* fwts
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-test
* ltp-cap_bounds
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-filecaps
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-fsx
* ltp-hugetlb
* ltp-io
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-open-posix-tests
* ltp-pty
* ltp-sched
* ltp-securebits
* ltp-smoke
* ltp-syscalls
* ltp-tracing
* network-basic-tests
* packetdrill
* perf
* perf/Zstd-perf.data-compression
* rcutorture
* ssuite
* v4l2-compliance
* vdso

--
Linaro LKFT
https://lkft.linaro.org

2022-07-13 13:39:55

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On 7/13/22 05:52, Naresh Kamboju wrote:
> On Wed, 13 Jul 2022 at 00:17, Greg Kroah-Hartman
> <[email protected]> wrote:
>>
>> This is the start of the stable review cycle for the 5.15.55 release.
>> There are 78 patches in this series, all will be posted as a response
>> to this one. If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Thu, 14 Jul 2022 18:32:19 +0000.
>> Anything received after that time might be too late.
>>
>> The whole patch series can be found in one patch at:
>> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.55-rc1.gz
>> or in the git tree and branch at:
>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
>> and the diffstat can be found below.
>>
>> thanks,
>>
>> greg k-h
>
> Results from Linaro’s test farm.
> Regressions on x86_64.
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
>
> 1) Kernel panic noticed on device x86_6 while running kvm-unit-tests.
> - APIC base relocation is unsupported by KVM
>

Looking into the log, I don't think that message is related to the crash.

[ 67.774572] APIC base relocation is unsupported by KVM
[ 105.643057] kvm: emulating exchange as write <--- warning
[ 105.653717] int3: 0000 [#1] SMP PTI
[ 105.653720] CPU: 3 PID: 3747 Comm: qemu-system-x86 Not tainted 5.15.55-rc1 #1
[ 105.653721] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[ 105.653722] RIP: 0010:xaddw_ax_dx+0x9/0x10
...
[ 105.653777] Modules linked in: x86_pkg_temp_thermal
[ 105.902123] ---[ end trace cec99cae36bcbfd7 ]---
[ 105.902124] RIP: 0010:xaddw_ax_dx+0x9/0x10 <--- crash
[ 105.902126] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc

Guenter

2022-07-13 16:55:18

by Ron Economos

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On 7/12/22 11:38 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.55 release.
> There are 78 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu, 14 Jul 2022 18:32:19 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.55-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Built and booted successfully on RISC-V RV64 (HiFive Unmatched).

Tested-by: Ron Economos <[email protected]>

2022-07-13 18:44:38

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Wed, Jul 13, 2022 at 11:33 AM Linus Torvalds
<[email protected]> wrote:
>
> So I think that that is where the "xaddw_ax_dx+8" comes from: some
> code assumes that FASTOP_SIZE is 8, but that xaddw_ax_dx case was
> actually 9 bytes and thus got that "int3 + padding" in the next 8
> bytes.
>
> The whole kvm x86 emulation thing is quite complicated and has lots
> of instruction size #defines and magic.
>
> I'm not familiar enough with it to go "Ahh, it's obviously XYZ", but
> I'm sure PeterZ and Borislav know exactly what's going on.

And I see that Thadeu already figured it out:

https://lore.kernel.org/all/[email protected]/

So presumably we need that patch everywhere.

Linus

2022-07-13 19:11:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Wed, Jul 13, 2022 at 6:34 AM Guenter Roeck <[email protected]> wrote:
>
> Looking into the log, I don't think that message is related to the crash.
>
> ...
> [ 105.653777] Modules linked in: x86_pkg_temp_thermal
> [ 105.902123] ---[ end trace cec99cae36bcbfd7 ]---
> [ 105.902124] RIP: 0010:xaddw_ax_dx+0x9/0x10 <--- crash
> [ 105.902126] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc

Yeah, the code you snipped shows

20: 66 0f c1 d0 xadd %dx,%ax
24: c3 ret
25: cc int3
26: cc int3
27: cc int3
28: cc int3
29:* 0f 1f 80 00 00 00 00 nopl 0x0(%rax) <-- trapping instruction
30: 0f c1 d0 xadd %edx,%eax
33: c3 ret
34: cc int3
35: cc int3
36: cc int3
37: cc int3
38: 48 0f c1 d0 xadd %rdx,%rax
3c: c3 ret
3d: cc int3

and that's a bit odd.

It says "xaddw_ax_dx+0x9/0x10", but I think somebody jumped to
"xaddw_ax_dx+8", hit the 'int3', and the RIP points to the next
instruction (because that's how int3 works).

And the fastop code says:

* fastop functions have a special calling convention:
...
* Moreover, they are all exactly FASTOP_SIZE bytes long,

but that is clearly *NOT* the case for xaddw_ax_dx, because it's 16
bytes in size, and the other ones are 8 bytes. That's where the "nopl"
comes from: it's the alignment instruction to the next fastop
function.

Compare that to the 32-bit 'xaddl' case right afterwards: that one
*is* just 8 bytes in size, so the 64-bit 'xaddq' comes 8 bytes after
it, and there's no 7-byte padding nop-instruction.

So I think that that is where the "xaddw_ax_dx+8" comes from: some
code assumes that FASTOP_SIZE is 8, but that xaddw_ax_dx case was
actually 9 bytes and thus got that "int3 + padding" in the next 8
bytes.
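
Just to make the arithmetic concrete, here is a stand-alone user-space
sketch (not the kernel's fastop macros; the 8-byte slot size and the
per-entry sizes are simply read off the decode above) showing how one
oversized, realigned entry shifts every later entry away from where the
dispatch expects it:

#include <stdio.h>

#define FASTOP_SIZE 8	/* slot size the dispatch code assumes */

int main(void)
{
	/* per-entry sizes as seen in the decode: insn + ret + int3
	 * padding, realigned to the next FASTOP_SIZE boundary; the
	 * 66-prefixed xaddw entry spills into a second slot */
	const char *name[] = { "xaddb", "xaddw", "xaddl", "xaddq" };
	const int  size[]  = { 8, 16, 8, 8 };
	int actual = 0;

	for (int i = 0; i < 4; i++) {
		int expected = i * FASTOP_SIZE;
		printf("%s: expected +0x%02x, actual +0x%02x%s\n",
		       name[i], expected, actual,
		       expected != actual ? "  <-- mismatch" : "");
		actual += size[i];
	}
	return 0;
}

The first mismatch (xaddl expected at +0x10 but actually sitting at
+0x18) is exactly the jump to xaddw_ax_dx+8 that hits the int3 padding
in the trace.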

The whole kvm x86 emulation thing is quite complicated and has lots
of instruction size #defines and magic.

I'm not familiar enough with it to go "Ahh, it's obviously XYZ", but
I'm sure PeterZ and Borislav know exactly what's going on.

Linus

2022-07-13 23:42:44

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Tue, Jul 12, 2022 at 08:38:30PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.15.55 release.
> There are 78 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu, 14 Jul 2022 18:32:19 +0000.
> Anything received after that time might be too late.
>

Build results:
total: 159 pass: 158 fail: 1
Failed builds:
um:defconfig
Qemu test results:
total: 488 pass: 480 fail: 8
Failed tests:
x86_64:q35:SandyBridge:defconfig:smp4:net,ne2k_pci:efi32:mem1G:usb:hd
x86_64:q35:Conroe:defconfig:smp4:net,tulip:efi32:mem256:scsi[DC395]:hd
x86_64:q35:Skylake-Server:defconfig:smp4:net,e1000-82544gc:efi32:mem2G:scsi[53C895A]:hd
x86_64:q35:Opteron_G5:defconfig:smp4:net,i82559c:efi32:mem256:scsi[MEGASAS2]:hd
x86_64:pc:Opteron_G2:defconfig:smp:net,usb:efi32:mem2G:scsi[virtio-pci]:hd
x86_64:q35:Nehalem:defconfig:smp2:net,i82558a:efi32:mem1G:virtio:hd
x86_64:q35:Skylake-Client-IBRS:defconfig:preempt:smp2:net,i82558b:efi32:mem1G:sdhci:mmc:hd
x86_64:q35:Haswell-noTSX-IBRS:defconfig:nosmp:net,pcnet:efi32:mem2G:ata:hd

Guenter

2022-07-14 09:46:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Wed, Jul 13, 2022 at 11:40:03AM -0700, Linus Torvalds wrote:
> And I see that Thadeau already figured it out:
>
> https://lore.kernel.org/all/[email protected]/
>
> So presumably we need that patch everywhere.

Right, I've queued it along with other fallout fixes. Will do some
testing before I send them to you on Sunday.

I'm guessing you're thinking of cutting an -rc7 so that people can test
the whole retbleed mitigation disaster an additional week?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-14 09:53:46

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Wed, 2022-07-13 at 18:22 +0530, Naresh Kamboju wrote:
> On Wed, 13 Jul 2022 at 00:17, Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > This is the start of the stable review cycle for the 5.15.55 release.
> > There are 78 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Thu, 14 Jul 2022 18:32:19 +0000.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> >         https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.55-rc1.gz
> > or in the git tree and branch at:
> >         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
>
> Results from Linaro’s test farm.
> Regressions on x86_64.
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
>
> 1) Kernel panic noticed on device x86_6 while running kvm-unit-tests.
>    - APIC base relocation is unsupported by KVM

My 0.2 cent:

The APIC base relocation warning is harmless, and I removed it in the 5.19 kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.19-rc6&id=3743c2f0251743b8ae968329708bbbeefff244cf

The 'emulating exchange as write' message is also something that KVM unit tests
trigger normally, although this warning recently did signal a real and very
nasty bug, which I fixed in this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.19-rc6&id=33fbe6befa622c082f7d417896832856814bdde0


FYI, another 'pr_info_ratelimited' message from KVM that you will see when running the unit tests is this:
"kvm: vcpu 0: requested 2000 ns lapic timer period limited to 200000 ns"

It is also harmless, but I do wonder how much value it adds.

And the panic I guess was already figured out.

BTW, there is a script in the kernel source called decode_stacktrace
(./scripts/decode_stacktrace.sh).

It is very useful for figuring out which source line the panic/oops was
triggered on; you might want to consider using it in the report.
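
A typical invocation (the vmlinux path and log file name here are just
examples) is:

  ./scripts/decode_stacktrace.sh vmlinux < panic.log

It reads the oops on stdin and, given the vmlinux (with debug info) from
the exact failing build, prints the same trace with file:line information
on stdout.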

(I do wish the kernel had a debug option along the lines of 'I don't care about
kernel address/info leaks, please print all the info you can in the stack trace'.)


Best regards,
Maxim Levitsky

>
> 2) while booting qemu-x86_64 the following warning noticed.
>   - WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:557
> apply_returns+0x19c/0x1d0
>
> kernel panic log:
>    -  https://lkft.validation.linaro.org/scheduler/job/5278112#L1703
> TESTNAME=emulator TIMEOUT=90s ACCEL= ./x86/run x86/emulator.flat -smp 1
> [   67.774572] APIC base relocation is unsupported by KVM
> [  105.643057] kvm: emulating exchange as write
> [  105.653717] int3: 0000 [#1] SMP PTI
> [  105.653720] CPU: 3 PID: 3747 Comm: qemu-system-x86 Not tainted 5.15.55-rc1 #1
> [  105.653721] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> 2.0b 07/27/2017
> [  105.653722] RIP: 0010:xaddw_ax_dx+0x9/0x10
> [  105.653727] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc
> cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc
> cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3
> cc cc
> [  105.653728] RSP: 0018:ffffb98bc5157ce8 EFLAGS: 00000206
> [  105.653729] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
> [  105.653730] RDX: 0000000076543210 RSI: ffffffff8ea56000 RDI: 0000000000000204
> [  105.653731] RBP: ffffb98bc5157cf0 R08: ffffa306c6cf5df0 R09: 0000000000000002
> [  105.653732] R10: ffffa306c6cf5df0 R11: 0000000000000000 R12: ffffa306c6cf5df0
> [  105.653733] R13: ffffffff900090c0 R14: 0000000000000000 R15: 0000000000000000
> [  105.653734] FS:  00007f30ab0df700(0000) GS:ffffa30a1fd80000(0000)
> knlGS:0000000000000000
> [  105.653735] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  105.653736] CR2: 0000000000000000 CR3: 000000014e0d0003 CR4: 00000000003726e0
> [  105.653736] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  105.653737] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  105.653738] Call Trace:
> [  105.653738]  <TASK>
> [  105.653739]  ? fastop+0x5d/0xa0
> [  105.653741]  x86_emulate_insn+0x7c9/0xf20
> [  105.653743]  x86_emulate_instruction+0x2e3/0x790
> [  105.653746]  complete_emulated_mmio+0x238/0x310
> [  105.653748]  kvm_arch_vcpu_ioctl_run+0x11ba/0x1a70
> [  105.653750]  ? vfs_writev+0xcb/0x140
> [  105.653753]  kvm_vcpu_ioctl+0x281/0x6b0
> [  105.653755]  ? clockevents_program_event+0x98/0x100
> [  105.653757]  ? selinux_file_ioctl+0xae/0x140
> [  105.653760]  ? selinux_file_ioctl+0xae/0x140
> [  105.653762]  __x64_sys_ioctl+0x95/0xd0
> [  105.653764]  do_syscall_64+0x3b/0x90
> [  105.653767]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
> [  105.653769] RIP: 0033:0x7f30aca698f7
> [  105.653770] Code: b3 66 90 48 8b 05 a1 35 2c 00 64 c7 00 26 00 00
> 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 71 35 2c 00 f7 d8 64 89
> 01 48
> [  105.653771] RSP: 002b:00007f30ab0dea28 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  105.653772] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f30aca698f7
> [  105.653773] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000f
> [  105.653774] RBP: 000055fb7fb6faf0 R08: 000055fb7d67c450 R09: 00000000ffffffff
> [  105.653775] R10: 00000000000216a8 R11: 0000000000000246 R12: 0000000000000000
> [  105.653775] R13: 00007f30aed99000 R14: 0000000000000006 R15: 000055fb7fb6faf0
> [  105.653776]  </TASK>
> [  105.653777] Modules linked in: x86_pkg_temp_thermal
> [  105.902123] ---[ end trace cec99cae36bcbfd7 ]---
> [  105.902124] RIP: 0010:xaddw_ax_dx+0x9/0x10
> [  105.902126] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc
> cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc
> cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3
> cc cc
> [  105.902127] RSP: 0018:ffffb98bc5157ce8 EFLAGS: 00000206
> [  105.902127] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
> [  105.902128] RDX: 0000000076543210 RSI: ffffffff8ea56000 RDI: 0000000000000204
> [  105.902129] RBP: ffffb98bc5157cf0 R08: ffffa306c6cf5df0 R09: 0000000000000002
> [  105.902129] R10: ffffa306c6cf5df0 R11: 0000000000000000 R12: ffffa306c6cf5df0
> [  105.902130] R13: ffffffff900090c0 R14: 0000000000000000 R15: 0000000000000000
> [  105.902130] FS:  00007f30ab0df700(0000) GS:ffffa30a1fd80000(0000)
> knlGS:0000000000000000
> [  105.902131] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  105.902132] CR2: 0000000000000000 CR3: 000000014e0d0003 CR4: 00000000003726e0
> [  105.902133] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  105.902133] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  105.902134] Kernel panic - not syncing: Fatal exception in interrupt
> [  105.902170] Kernel Offset: 0xda00000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [  106.022663] ---[ end Kernel panic - not syncing: Fatal exception in
> interrupt ]---
> [  106.030224] ------------[ cut here ]------------
> [  106.030224] sched: Unexpected reschedule of offline CPU#0!
> [  106.030226] WARNING: CPU: 3 PID: 3747 at
> arch/x86/kernel/apic/ipi.c:68 native_smp_send_reschedule+0x3e/0x50
> [  106.030229] Modules linked in: x86_pkg_temp_thermal
> [  106.030230] CPU: 3 PID: 3747 Comm: qemu-system-x86 Tainted: G
> D           5.15.55-rc1 #1
> [  106.030231] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> 2.0b 07/27/2017
> [  106.030232] RIP: 0010:native_smp_send_reschedule+0x3e/0x50
> [  106.030234] Code: 1b 48 8b 05 d4 70 a6 01 be fd 00 00 00 48 8b 40
> 30 e8 66 dc 31 01 5d c3 cc cc cc cc 89 fe 48 c7 c7 a0 e9 41 90 e8 1e
> c3 ea 00 <0f> 0b 5d c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> 44 00
> [  106.030234] RSP: 0018:ffffb98bc0160c60 EFLAGS: 00010086
> [  106.030235] RAX: 0000000000000000 RBX: ffffa30a1fc29b00 RCX: 0000000000000027
> [  106.030236] RDX: ffffa30a1fd9b4b8 RSI: 0000000000000001 RDI: ffffa30a1fd9b4b0
> [  106.030237] RBP: ffffb98bc0160c60 R08: ffffffff90b65665 R09: 0000000000000000
> [  106.030237] R10: 0000000000000030 R11: ffffffff90b65665 R12: ffffa306c097c100
> [  106.030238] R13: ffffb98bc0160d00 R14: ffffb98bc0160d00 R15: 0000000000000009
> [  106.030239] FS:  00007f30ab0df700(0000) GS:ffffa30a1fd80000(0000)
> knlGS:0000000000000000
> [  106.030239] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  106.030240] CR2: 0000000000000000 CR3: 000000014e0d0003 CR4: 00000000003726e0
> [  106.030241] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  106.030241] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  106.030242] Call Trace:
> [  106.030242]  <IRQ>
> [  106.030243]  resched_curr+0x52/0xb0
> [  106.030245]  check_preempt_curr+0x3b/0x70
> [  106.030247]  ttwu_do_wakeup+0x1c/0x160
> [  106.030249]  ttwu_do_activate+0x94/0x190
> [  106.030251]  try_to_wake_up+0x1c4/0x480
> [  106.030253]  default_wake_function+0x1a/0x40
> [  106.030254]  autoremove_wake_function+0x12/0x40
> [  106.030256]  __wake_up_common+0x7d/0x140
> [  106.030258]  __wake_up_common_lock+0x7c/0xc0
> [  106.030261]  __wake_up+0x13/0x20
> [  106.030263]  wake_up_klogd_work_func+0x7b/0x90
> [  106.030265]  irq_work_single+0x46/0x80
> [  106.030267]  irq_work_run_list+0x2a/0x40
> [  106.030269]  irq_work_tick+0x3b/0x50
> [  106.030270]  update_process_times+0xba/0xd0
> [  106.030272]  tick_sched_handle+0x38/0x50
> [  106.030274]  tick_sched_timer+0x8c/0xc0
> [  106.030276]  ? can_stop_idle_tick+0xa0/0xa0
> [  106.030278]  __hrtimer_run_queues+0xa6/0x2b0
> [  106.030280]  hrtimer_interrupt+0x101/0x220
> [  106.030281]  __sysvec_apic_timer_interrupt+0x61/0xe0
> [  106.030283]  sysvec_apic_timer_interrupt+0x7b/0x90
> [  106.030285]  </IRQ>
> [  106.030285]  <TASK>
> [  106.030286]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [  106.030288] RIP: 0010:panic+0x277/0x2b6
> [  106.030290] Code: e8 65 ac 25 ff 48 c7 c6 80 04 b5 90 48 c7 c7 80
> 3f 42 90 e8 6a 58 00 00 c7 05 cc 8e f5 00 01 00 00 00 e8 d3 80 33 ff
> fb 31 db <4c> 39 eb 7c 1d 41 83 f4 01 48 8b 05 a0 d2 1b 01 44 89 e7 e8
> 18 16
> [  106.030291] RSP: 0018:ffffb98bc5157b10 EFLAGS: 00000246
> [  106.030292] RAX: ffffb98bc5157b80 RBX: 0000000000000000 RCX: 0000000000000027
> [  106.030292] RDX: 0000000000000000 RSI: ffffffff8f98de8f RDI: ffffffff8f99314d
> [  106.030293] RBP: ffffb98bc5157b80 R08: ffffffff90b655fa R09: 0000000090b655d6
> [  106.030293] R10: ffffffffffffffff R11: ffffffffffffffff R12: 0000000000000000
> [  106.030294] R13: 0000000000000000 R14: ffffffff904131e0 R15: 0000000000000000
> [  106.030295]  ? oops_end.cold+0xc/0x18
> [  106.030297]  ? panic+0x274/0x2b6
> [  106.030299]  oops_end.cold+0xc/0x18
> [  106.030300]  die+0x43/0x60
> [  106.030302]  exc_int3+0x137/0x160
> [  106.030303]  asm_exc_int3+0x39/0x40
> [  106.030305] RIP: 0010:xaddw_ax_dx+0x9/0x10
> [  106.030307] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc
> cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc
> cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3
> cc cc
> [  106.030308] RSP: 0018:ffffb98bc5157ce8 EFLAGS: 00000206
> [  106.030308] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
> [  106.030309] RDX: 0000000076543210 RSI: ffffffff8ea56000 RDI: 0000000000000204
> [  106.030310] RBP: ffffb98bc5157cf0 R08: ffffa306c6cf5df0 R09: 0000000000000002
> [  106.030310] R10: ffffa306c6cf5df0 R11: 0000000000000000 R12: ffffa306c6cf5df0
> [  106.030311] R13: ffffffff900090c0 R14: 0000000000000000 R15: 0000000000000000
> [  106.030312]  ? xaddw_ax_dx+0x8/0x10
> [  106.030314]  ? xaddw_ax_dx+0x9/0x10
> [  106.030315]  ? xaddw_ax_dx+0x8/0x10
> [  106.030317]  ? fastop+0x5d/0xa0
> [  106.030319]  x86_emulate_insn+0x7c9/0xf20
> [  106.030321]  x86_emulate_instruction+0x2e3/0x790
> [  106.030323]  complete_emulated_mmio+0x238/0x310
> [  106.030325]  kvm_arch_vcpu_ioctl_run+0x11ba/0x1a70
> [  106.030327]  ? vfs_writev+0xcb/0x140
> [  106.030330]  kvm_vcpu_ioctl+0x281/0x6b0
> [  106.030331]  ? clockevents_program_event+0x98/0x100
> [  106.030333]  ? selinux_file_ioctl+0xae/0x140
> [  106.030335]  ? selinux_file_ioctl+0xae/0x140
> [  106.030337]  __x64_sys_ioctl+0x95/0xd0
> [  106.030339]  do_syscall_64+0x3b/0x90
> [  106.030340]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
> [  106.030342] RIP: 0033:0x7f30aca698f7
> [  106.030343] Code: b3 66 90 48 8b 05 a1 35 2c 00 64 c7 00 26 00 00
> 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 71 35 2c 00 f7 d8 64 89
> 01 48
> [  106.030343] RSP: 002b:00007f30ab0dea28 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  106.030344] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f30aca698f7
> [  106.030345] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000f
> [  106.030346] RBP: 000055fb7fb6faf0 R08: 000055fb7d67c450 R09: 00000000ffffffff
> [  106.030346] R10: 00000000000216a8 R11: 0000000000000246 R12: 0000000000000000
> [  106.030347] R13: 00007f30aed99000 R14: 0000000000000006 R15: 000055fb7fb6faf0
> [  106.030348]  </TASK>
> [  106.030348] ---[ end trace cec99cae36bcbfd8 ]---
> [  106.030350] ------------[ cut here ]------------
> [  106.030350] sched: Unexpected reschedule of offline CPU#2!
> [  106.030351] WARNING: CPU: 3 PID: 3747 at
> arch/x86/kernel/apic/ipi.c:68 native_smp_send_reschedule+0x3e/0x50
> [  106.030354] Modules linked in: x86_pkg_temp_thermal
> [  106.030354] CPU: 3 PID: 3747 Comm: qemu-system-x86 Tainted: G
> D W         5.15.55-rc1 #1
> [  106.030355] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> 2.0b 07/27/2017
> [  106.030356] RIP: 0010:native_smp_send_reschedule+0x3e/0x50
> [  106.030357] Code: 1b 48 8b 05 d4 70 a6 01 be fd 00 00 00 48 8b 40
> 30 e8 66 dc 31 01 5d c3 cc cc cc cc 89 fe 48 c7 c7 a0 e9 41 90 e8 1e
> c3 ea 00 <0f> 0b 5d c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> 44 00
> [  106.030358] RSP: 0018:ffffb98bc0160b40 EFLAGS: 00010086
> [  106.030359] RAX: 0000000000000000 RBX: ffffa30a1fd29b00 RCX: 0000000000000027
> [  106.030360] RDX: ffffa30a1fd9b4b8 RSI: 0000000000000001 RDI: ffffa30a1fd9b4b0
> [  106.030360] RBP: ffffb98bc0160b40 R08: ffffffff90b66d1d R09: 0000000000000000
> [  106.030361] R10: 0000000000000030 R11: ffffffff90b66d1d R12: ffffa306c2216180
> [  106.030361] R13: ffffb98bc0160be0 R14: ffffb98bc0160be0 R15: 0000000000000009
> [  106.030362] FS:  00007f30ab0df700(0000) GS:ffffa30a1fd80000(0000)
> knlGS:0000000000000000
> [  106.030363] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  106.030364] CR2: 0000000000000000 CR3: 000000014e0d0003 CR4: 00000000003726e0
> [  106.030364] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  106.030365] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  106.030365] Call Trace:
> [  106.030366]  <IRQ>
> [  106.030366]  resched_curr+0x52/0xb0
> [  106.030368]  check_preempt_curr+0x3b/0x70
> [  106.030369]  ttwu_do_wakeup+0x1c/0x160
> [  106.030371]  ttwu_do_activate+0x94/0x190
> [  106.030373]  try_to_wake_up+0x1c4/0x480
> [  106.030375]  default_wake_function+0x1a/0x40
> [  106.030377]  autoremove_wake_function+0x12/0x40
> [  106.030378]  __wake_up_common+0x7d/0x140
> [  106.030380]  __wake_up_common_lock+0x7c/0xc0
> [  106.030382]  __wake_up+0x13/0x20
> [  106.030384]  ep_poll_callback+0x10e/0x290
> [  106.030386]  __wake_up_common+0x7d/0x140
> [  106.030389]  __wake_up_common_lock+0x7c/0xc0
> [  106.030391]  __wake_up+0x13/0x20
> [  106.030393]  wake_up_klogd_work_func+0x7b/0x90
> [  106.030395]  irq_work_single+0x46/0x80
> [  106.030397]  irq_work_run_list+0x2a/0x40
> [  106.030398]  irq_work_tick+0x3b/0x50
> [  106.030400]  update_process_times+0xba/0xd0
> [  106.030401]  tick_sched_handle+0x38/0x50
> [  106.030403]  tick_sched_timer+0x8c/0xc0
> [  106.030405]  ? can_stop_idle_tick+0xa0/0xa0
> [  106.030407]  __hrtimer_run_queues+0xa6/0x2b0
> [  106.030408]  hrtimer_interrupt+0x101/0x220
> [  106.030410]  __sysvec_apic_timer_interrupt+0x61/0xe0
> [  106.030411]  sysvec_apic_timer_interrupt+0x7b/0x90
> [  106.030413]  </IRQ>
> [  106.030413]  <TASK>
> [  106.030414]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [  106.030416] RIP: 0010:panic+0x277/0x2b6
> [  106.030417] Code: e8 65 ac 25 ff 48 c7 c6 80 04 b5 90 48 c7 c7 80
> 3f 42 90 e8 6a 58 00 00 c7 05 cc 8e f5 00 01 00 00 00 e8 d3 80 33 ff
> fb 31 db <4c> 39 eb 7c 1d 41 83 f4 01 48 8b 05 a0 d2 1b 01 44 89 e7 e8
> 18 16
> [  106.030418] RSP: 0018:ffffb98bc5157b10 EFLAGS: 00000246
> [  106.030419] RAX: ffffb98bc5157b80 RBX: 0000000000000000 RCX: 0000000000000027
> [  106.030419] RDX: 0000000000000000 RSI: ffffffff8f98de8f RDI: ffffffff8f99314d
> [  106.030420] RBP: ffffb98bc5157b80 R08: ffffffff90b655fa R09: 0000000090b655d6
> [  106.030420] R10: ffffffffffffffff R11: ffffffffffffffff R12: 0000000000000000
> [  106.030421] R13: 0000000000000000 R14: ffffffff904131e0 R15: 0000000000000000
> [  106.030422]  ? oops_end.cold+0xc/0x18
> [  106.030423]  ? panic+0x274/0x2b6
> [  106.030425]  oops_end.cold+0xc/0x18
> [  106.030427]  die+0x43/0x60
> [  106.030428]  exc_int3+0x137/0x160
> [  106.030430]  asm_exc_int3+0x39/0x40
> [  106.030431] RIP: 0010:xaddw_ax_dx+0x9/0x10
> [  106.030433] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc
> cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc
> cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3
> cc cc
> [  106.030434] RSP: 0018:ffffb98bc5157ce8 EFLAGS: 00000206
> [  106.030434] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000
> [  106.030435] RDX: 0000000076543210 RSI: ffffffff8ea56000 RDI: 0000000000000204
> [  106.030436] RBP: ffffb98bc5157cf0 R08: ffffa306c6cf5df0 R09: 0000000000000002
> [  106.030436] R10: ffffa306c6cf5df0 R11: 0000000000000000 R12: ffffa306c6cf5df0
> [  106.030437] R13: ffffffff900090c0 R14: 0000000000000000 R15: 0000000000000000
> [  106.030437]  ? xaddw_ax_dx+0x8/0x10
> [  106.030439]  ? xaddw_ax_dx+0x9/0x10
> [  106.030441]  ? xaddw_ax_dx+0x8/0x10
> [  106.030443]  ? fastop+0x5d/0xa0
> [  106.030445]  x86_emulate_insn+0x7c9/0xf20
> [  106.030446]  x86_emulate_instruction+0x2e3/0x790
> [  106.030449]  complete_emulated_mmio+0x238/0x310
> [  106.030450]  kvm_arch_vcpu_ioctl_run+0x11ba/0x1a70
> [  106.030453]  ? vfs_writev+0xcb/0x140
> [  106.030455]  kvm_vcpu_ioctl+0x281/0x6b0
> [  106.030456]  ? clockevents_program_event+0x98/0x100
> [  106.030458]  ? selinux_file_ioctl+0xae/0x140
> [  106.030460]  ? selinux_file_ioctl+0xae/0x140
> [  106.030462]  __x64_sys_ioctl+0x95/0xd0
> [  106.030464]  do_syscall_64+0x3b/0x90
> [  106.030465]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
> [  106.030467] RIP: 0033:0x7f30aca698f7
> [  106.030468] Code: b3 66 90 48 8b 05 a1 35 2c 00 64 c7 00 26 00 00
> 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 71 35 2c 00 f7 d8 64 89
> 01 48
> [  106.030468] RSP: 002b:00007f30ab0dea28 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  106.030469] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f30aca698f7
> [  106.030470] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000f
> [  106.030470] RBP: 000055fb7fb6faf0 R08: 000055fb7d67c450 R09: 00000000ffffffff
> [  106.030471] R10: 00000000000216a8 R11: 0000000000000246 R12: 0000000000000000
> [  106.030472] R13: 00007f30aed99000 R14: 0000000000000006 R15: 000055fb7fb6faf0
> [  106.030473]  </TASK>
> [  106.030473] ---[ end trace cec99cae36bcbfd9 ]---
>
> 2. Kernel warning
>     - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.15.y/build/v5.15.54-79-g0fe4fdf9b1da/testrun/10799646/suite/log-parser-test/tests/
> <6>[    0.571674] Speculative Store Bypass: Vulnerable
> <6>[    0.573329] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
> <4>[    0.942716] ------------[ cut here ]------------
> <4>[    0.944263] WARNING: CPU: 0 PID: 0 at
> arch/x86/kernel/alternative.c:557 apply_returns+0x19c/0x1d0
> <4>[    0.945106] Modules linked in:
> <4>[    0.946322] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.55-rc1 #1
> <4>[    0.947052] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.14.0-2 04/01/2014
> <4>[    0.947844] RIP: 0010:apply_returns+0x19c/0x1d0
> <4>[    0.948453] Code: 8d 7d c8 4c 89 75 c1 4c 89 74 08 f8 48 29 f8
> 8d 0c 02 4c 89 f0 c1 e9 03 f3 48 ab 8b 05 9d ba 52 02 85 c0 74 a0 e9
> 45 32 1a 01 <0f> 0b 48 83 c3 04 49 39 dd 0f 87 96 fe ff ff e9 0f ff ff
> ff c7 45
> <4>[    0.950059] RSP: 0000:ffffffff89803d78 EFLAGS: 00000206
> <4>[    0.950484] RAX: 0000000000000000 RBX: ffffffff89dbb174 RCX:
> 0000000000000000
> <4>[    0.951033] RDX: ffffffff891011b5 RSI: 00000000000000e9 RDI:
> ffffffff891011b0
> <4>[    0.951488] RBP: ffffffff89803e40 R08: 0000000000000000 R09:
> 0000000000000001
> <4>[    0.952032] R10: 0000000000000029 R11: 0000000000000000 R12:
> ffffffff891011b0
> <4>[    0.952488] R13: ffffffff89ddb184 R14: cccccccccccccccc R15:
> ffffffff891011b5
> <4>[    0.953125] FS:  0000000000000000(0000)
> GS:ffff90f63fc00000(0000) knlGS:0000000000000000
> <4>[    0.953634] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[    0.954031] CR2: ffff90f63fa01000 CR3: 000000003e826000 CR4:
> 00000000000006f0
> <4>[    0.954669] Call Trace:
> <4>[    0.955846]  <TASK>
> <4>[    0.957183]  alternative_instructions+0x7d/0x146
> <4>[    0.957673]  check_bugs+0xf16/0xf57
> <4>[    0.958114]  start_kernel+0x6b6/0x6ef
> <4>[    0.958446]  x86_64_start_reservations+0x24/0x2a
> <4>[    0.958764]  x86_64_start_kernel+0x8d/0x95
> <4>[    0.959035]  secondary_startup_64_no_verify+0xc2/0xcb
> <4>[    0.959591]  </TASK>
> <4>[    0.960146] irq event stamp: 121171
> <4>[    0.960419] hardirqs last  enabled at (121179):
> [<ffffffff87a7247c>] __up_console_sem+0x5c/0x70
> <4>[    0.961036] hardirqs last disabled at (121188):
> [<ffffffff87a72461>] __up_console_sem+0x41/0x70
> <4>[    0.961577] softirqs last  enabled at (1484):
> [<ffffffff87ad3f20>] cgroup_idr_alloc.constprop.0+0x60/0xe0
> <4>[    0.962031] softirqs last disabled at (1482):
> [<ffffffff87ad3ef3>] cgroup_idr_alloc.constprop.0+0x33/0xe0
> <4>[    0.963358] ---[ end trace 7af0f35d34a8be8b ]---
> <6>[    1.100332] Freeing SMP alternatives memory: 52K
> <6>[    1.233988] smpboot: CPU0: Intel Core i7 9xx (Nehalem Class Core
> i7) (family: 0x6, model: 0x1a, stepping: 0x3)
> <6>[    1.266761] Running RCU-tasks wait API self tests
> <6>[    1.271070] Performance Events: unsupported p6 CPU model 26 no
> PMU driver, software events only.
> <6>[    1.276776] rcu: Hierarchical SRCU implementation.
> <6>[    1.281153] Callback from call_rcu_tasks_trace() invoked.
> <6>[    1.308215] smp: Bringing up secondary CPUs ...
> <6>[    1.322732] x86: Booting SMP configuration:
> <6>[    1.323121] .... node  #0, CPUs:      #1
> <6>[    1.361239] smp: Brought up 1 node, 2 CPUs
>
>
> ## Build
> * kernel: 5.15.55-rc1
> * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
> * git branch: linux-5.15.y
> * git commit: 0fe4fdf9b1dac90e23465eefeff45e529adcf3c6
> * git describe: v5.15.54-79-g0fe4fdf9b1da
> * test details:
> https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.15.y/build/v5.15.54-79-g0fe4fdf9b1da
>
> ## Test Regressions (compared to v5.15.53-227-g71721f5974f2)
> kernel crash regression found.
>
> ## Metric Regressions (compared to v5.15.53-227-g71721f5974f2)
> No metric regressions found.
>
> ## Test Fixes (compared to v5.15.53-227-g71721f5974f2)
> No test fixes found.
>
> ## Metric Fixes (compared to v5.15.53-227-g71721f5974f2)
> No metric fixes found.
>
> ## Test result summary
> total: 137556, pass: 124337, fail: 404, skip: 12121, xfail: 694
>
> ## Build Summary
> * arc: 10 total, 10 passed, 0 failed
> * arm: 308 total, 308 passed, 0 failed
> * arm64: 62 total, 62 passed, 0 failed
> * i386: 52 total, 49 passed, 3 failed
> * mips: 48 total, 48 passed, 0 failed
> * parisc: 12 total, 12 passed, 0 failed
> * powerpc: 54 total, 52 passed, 2 failed
> * riscv: 22 total, 22 passed, 0 failed
> * s390: 21 total, 21 passed, 0 failed
> * sh: 24 total, 24 passed, 0 failed
> * sparc: 12 total, 12 passed, 0 failed
> * x86_64: 56 total, 55 passed, 1 failed
>
> ## Test suites summary
> * fwts
> * kunit
> * kvm-unit-tests
> * libgpiod
> * libhugetlbfs
> * log-parser-boot
> * log-parser-test
> * ltp-cap_bounds
> * ltp-commands
> * ltp-containers
> * ltp-controllers
> * ltp-cpuhotplug
> * ltp-crypto
> * ltp-cve
> * ltp-dio
> * ltp-fcntl-locktests
> * ltp-filecaps
> * ltp-fs
> * ltp-fs_bind
> * ltp-fs_perms_simple
> * ltp-fsx
> * ltp-hugetlb
> * ltp-io
> * ltp-ipc
> * ltp-math
> * ltp-mm
> * ltp-nptl
> * ltp-open-posix-tests
> * ltp-pty
> * ltp-sched
> * ltp-securebits
> * ltp-smoke
> * ltp-syscalls
> * ltp-tracing
> * network-basic-tests
> * packetdrill
> * perf
> * perf/Zstd-perf.data-compression
> * rcutorture
> * ssuite
> * v4l2-compliance
> * vdso
>
> --
> Linaro LKFT
> https://lkft.linaro.org
>


2022-07-14 10:16:48

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Thu, Jul 14, 2022 at 12:50:10PM +0300, Maxim Levitsky wrote:
> On Wed, 2022-07-13 at 18:22 +0530, Naresh Kamboju wrote:
> > On Wed, 13 Jul 2022 at 00:17, Greg Kroah-Hartman
> > <[email protected]> wrote:
> > >
> > > This is the start of the stable review cycle for the 5.15.55 release.
> > > There are 78 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Thu, 14 Jul 2022 18:32:19 +0000.
> > > Anything received after that time might be too late.
> > >
> > > The whole patch series can be found in one patch at:
> > >         https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.55-rc1.gz
> > > or in the git tree and branch at:
> > >         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> > > and the diffstat can be found below.
> > >
> > > thanks,
> > >
> > > greg k-h
> >
> > Results from Linaro’s test farm.
> > Regressions on x86_64.
> >
> > Reported-by: Linux Kernel Functional Testing <[email protected]>
> >
> > 1) Kernel panic noticed on device x86_64 while running kvm-unit-tests.
> >    - APIC base relocation is unsupported by KVM
>
> My 0.2 cent:
>
> APIC base relocation warning is harmless, and I removed it in the 5.19 kernel:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.19-rc6&id=3743c2f0251743b8ae968329708bbbeefff244cf

Nice, but doesn't look relevant for stable trees.

> The 'emulating exchange as write' is also something that KVM unit tests trigger
> normally although this warning recently did signal a real and very nasty bug, which I fixed in this commit:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.19-rc6&id=33fbe6befa622c082f7d417896832856814bdde0

Already in the 5.18.2 release, so it doesn't look all that relevant for 5.15;
odd that it is showing up there.

thanks,

greg k-h

2022-07-14 11:10:43

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Thu, 2022-07-14 at 12:04 +0200, Greg Kroah-Hartman wrote:
> On Thu, Jul 14, 2022 at 12:50:10PM +0300, Maxim Levitsky wrote:
> > On Wed, 2022-07-13 at 18:22 +0530, Naresh Kamboju wrote:
> > > On Wed, 13 Jul 2022 at 00:17, Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > >
> > > > This is the start of the stable review cycle for the 5.15.55 release.
> > > > There are 78 patches in this series, all will be posted as a response
> > > > to this one.  If anyone has any issues with these being applied, please
> > > > let me know.
> > > >
> > > > Responses should be made by Thu, 14 Jul 2022 18:32:19 +0000.
> > > > Anything received after that time might be too late.
> > > >
> > > > The whole patch series can be found in one patch at:
> > > >         https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.55-rc1.gz
> > > > or in the git tree and branch at:
> > > >         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> > > > and the diffstat can be found below.
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > >
> > > Results from Linaro’s test farm.
> > > Regressions on x86_64.
> > >
> > > Reported-by: Linux Kernel Functional Testing <[email protected]>
> > >
> > > 1) Kernel panic noticed on device x86_64 while running kvm-unit-tests.
> > >    - APIC base relocation is unsupported by KVM
> >
> > My 0.2 cent:
> >
> > APIC base relocation warning is harmless, and I removed it in the 5.19 kernel:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.19-rc6&id=3743c2f0251743b8ae968329708bbbeefff244cf
>
> Nice, but doesn't look relevant for stable trees.
>
> > The 'emulating exchange as write' is also something that KVM unit tests trigger
> > normally although this warning recently did signal a real and very nasty bug, which I fixed in this commit:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.19-rc6&id=33fbe6befa622c082f7d417896832856814bdde0
>
> Already in the 5.18.2 release, so it doesn't look all that relevant for 5.15;
> odd that it is showing up there.

Yep, I also think so - I just wanted to point out the source of these warnings.

Best regards,
Maxim Levitsky

>
> thanks,
>
> greg k-h
>


2022-07-14 14:13:25

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On 7/14/22 11:01, Borislav Petkov wrote:
> On Wed, Jul 13, 2022 at 11:40:03AM -0700, Linus Torvalds wrote:
>> And I see that Thadeau already figured it out:
>>
>> https://lore.kernel.org/all/[email protected]/
>>
>> So presumably we need that patch everywhere.
> Right, I've queued it along with other fallout fixes. Will do some
> testing before I send them to you on Sunday.
>
> I'm guessing you're thinking of cutting an -rc7 so that people can test
> the whole retbleed mitigation disaster an additional week?

Please leave that one out as Peter suggested a better fix and I have
that queued for Linus.

(If you don't, no big deal; the conflict will be very clear, but it will
be a bit more work for everyone).

Paolo

2022-07-14 14:54:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On July 14, 2022 1:46:53 PM UTC, Paolo Bonzini <[email protected]> wrote:
>Please leave that one out as Peter suggested a better fix and I have that queued for Linus.

Already zapped.

Thx.

--
Sent from a small device: formatting sux and brevity is inevitable.

2022-07-14 17:06:08

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Thu, Jul 14, 2022 at 09:51:40AM -0700, Linus Torvalds wrote:
> Oh, absolutely. Doing an -rc7 is normal.

Good. I'm gathering all the fallout fixes and will send them to you on
Sunday, if nothing unexpected happens.

> Right now the question isn't whether an rc7 happens, but whether we'll
> need an rc8. We'll see.

Right, we'll see what additional fallout happens next week. I'll try to
Cc you on such reports so that you're aware.

> Oh, I do hate the hw-embargoed stuff that doesn't get all the usual
> testing in all our automation.

Tell me about it.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-07-14 17:07:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Thu, Jul 14, 2022 at 7:46 AM Boris Petkov <[email protected]> wrote:
>
> On July 14, 2022 1:46:53 PM UTC, Paolo Bonzini <[email protected]> wrote:
> >Please leave that one out as Peter suggested a better fix and I have that queued for Linus.
>
> Already zapped.

I like Peter's more obvious use of FASTOP_LENGTH, but this is just disgusting:

#define FASTOP_SIZE (8 << ((FASTOP_LENGTH > 8) & 1) <<
((FASTOP_LENGTH > 16) & 1))

I mean, I understand what it's doing, but just two lines above it the
code has a "ilog2()" use that already depends on the fact that you can
use ilog2() as a constant compile-time expression.

And guess what? The code could just use roundup_pow_of_two(), which is
designed exactly like ilog2() to be used for compile-time constant
values.

So the code should just use

#define FASTOP_SIZE roundup_pow_of_two(FASTOP_LENGTH)

and be a lot more legible, wouldn't it?

Because I don't think there is anything magical about the length
"8/16/32". It's purely "aligned and big enough to contain
FASTOP_LENGTH".

And then the point of that

static_assert(FASTOP_LENGTH <= FASTOP_SIZE);

just goes away, because there are no subtle math issues there any more.

In fact, the remaining question is just "where did the 7 come from" in

#define FASTOP_LENGTH (7 + ENDBR_INSN_SIZE + RET_LENGTH)

because other than that it all looks fairly straightforward.

Linus

2022-07-14 17:22:51

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On 7/14/22 19:02, Linus Torvalds wrote:
> And guess what? The code could just use roundup_pow_of_two(), which is
> designed exactly like ilog2() to be used for compile-time constant
> values.
>
> So the code should just use
>
> #define FASTOP_SIZE roundup_pow_of_two(FASTOP_LENGTH)
>
> and be a lot more legible, wouldn't it?
>
> Because I don't think there is anything magical about the length
> "8/16/32". It's purely "aligned and big enough to contain
> FASTOP_LENGTH".

roundup_pow_of_two unfortunately is not enough for stringizing
FASTOP_SIZE into an asm statement. :(

#define __FOP_FUNC(name) \
	".align " __stringify(FASTOP_SIZE) " \n\t" \
	".type " name ", @function \n\t" \
	name ":\n\t" \
	ASM_ENDBR

The shifts are what we came up with for the SETCC thunks when ENDBR and
SLS made them grow beyond 4 bytes; Peter's patch is reusing the trick
for the fastop thunks.
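
A tiny userspace illustration of the problem (simplified: FASTOP_LENGTH is
pinned to a made-up value and the stringify helpers are copied in the style of
linux/stringify.h). __stringify() pastes the expression *text* into the .align
directive, so the assembler, not the compiler, has to evaluate it; the shift
expression is plain integer arithmetic gas understands, whereas
roundup_pow_of_two() would drag its __builtin_constant_p()/ilog2() machinery
into the string, which gas cannot parse:

#include <stdio.h>

#define __stringify_1(x...)	#x
#define __stringify(x...)	__stringify_1(x)

#define FASTOP_LENGTH	14	/* made-up value, just for the demo */
#define FASTOP_SIZE	(8 << ((FASTOP_LENGTH > 8) & 1) << ((FASTOP_LENGTH > 16) & 1))

int main(void)
{
	/* prints: .align (8 << ((14 > 8) & 1) << ((14 > 16) & 1))
	 * which is an expression GNU as can evaluate on its own */
	printf(".align %s\n", __stringify(FASTOP_SIZE));
	return 0;
}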

> Because I don't think there is anything magical about the length
> "8/16/32". It's purely "aligned and big enough to contain
> FASTOP_LENGTH".

I agree with that, it's only limited to 8/16/32 to keep the macro to a
decent size.

> And then the point of that
>
> static_assert(FASTOP_LENGTH <= FASTOP_SIZE);
>
> just goes away, because there are no subtle math issues there any more.
>
> In fact, the remaining question is just "where did the 7 come from" in
>
> #define FASTOP_LENGTH (7 + ENDBR_INSN_SIZE + RET_LENGTH)

The 7 is an upper limit to the length of the code between endbr and ret.
There's no particular reason to limit to 7, but it allows using an
alignment of 8 in the smallest case (no thunks, no SLS, no endbr) where
you just have ".align 8; ...; ret".
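
Spelled out with the macro Linus quoted, under the assumptions of that minimal
configuration (values filled in for illustration, not taken from emulate.c):
no ENDBR, a bare 1-byte ret and no SLS padding give 7 payload bytes plus 1,
i.e. exactly one 8-byte slot.

/* minimal configuration: no IBT endbr, bare 1-byte ret, no SLS int3 */
#define ENDBR_INSN_SIZE	0
#define RET_LENGTH	1
#define FASTOP_LENGTH	(7 + ENDBR_INSN_SIZE + RET_LENGTH)

_Static_assert(FASTOP_LENGTH == 8, "smallest case fills one .align 8 slot");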

Paolo

2022-07-14 17:35:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Thu, Jul 14, 2022 at 2:02 AM Borislav Petkov <[email protected]> wrote:
>
> I'm guessing you're thinking of cutting an -rc7 so that people can test
> the whole retbleed mitigation disaster an additional week?

Oh, absolutely. Doing an -rc7 is normal.

Right now the question isn't whether an rc7 happens, but whether we'll
need an rc8. We'll see.

Oh, I do hate the hw-embargoed stuff that doesn't get all the usual
testing in all our automation.

Linus

2022-07-14 18:05:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Thu, Jul 14, 2022 at 10:02:57AM -0700, Linus Torvalds wrote:

> I like Peter's more obvious use of FASTOP_LENGTH, but this is just disgusting:
>
> #define FASTOP_SIZE (8 << ((FASTOP_LENGTH > 8) & 1) <<
> ((FASTOP_LENGTH > 16) & 1))
>
> I mean, I understand what it's doing, but just two lines above it the
> code has a "ilog2()" use that already depends on the fact that you can
> use ilog2() as a constant compile-time expression.
>
> And guess what? The code could just use roundup_pow_of_two(), which is
> designed exactly like ilog2() to be used for compile-time constant
> values.

But NR_FASTOP isn't used in ASM.

> So the code should just use
>
> #define FASTOP_SIZE roundup_pow_of_two(FASTOP_LENGTH)
>
> and be a lot more legible, wouldn't it?

If only :/ FASTOP_SIZE is used in ASM, which means we've got to play by
GNU-as rules, and they are awful.

2022-07-14 18:45:07

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Thu, Jul 14, 2022 at 10:22 AM Peter Zijlstra <[email protected]> wrote:
>
> If only :/ FASTOP_SIZE is used in ASM, which means we've got to play by
> GNU-as rules, and they are awful.

Oh Gods. Yes they are.

Linus

2022-07-14 18:48:15

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

Oh, btw, how bad would it be to just do

#define FASTOP_SIZE 16
static_assert(FASTOP_SIZE >= FASTOP_LENGTH)

and leave it at that?

Afaik both gcc and clang default to -falign-functions=16 *anyway*, and
while on 32-bit x86 we have options to minimize alignment, we don't do
that on x86-64 afaik.

In fact, we have an option to force *bigger* alignment
(DEBUG_FORCE_FUNCTION_ALIGN_64B) but not any way to make it less.

And we use

.p2align 4

in most of our asm, along with

#define __ALIGN .p2align 4, 0x90

So all the *normal* functions already get 16-byte alignment anyway.

So yeah, it would be less dense, but do we care? Wouldn't the "this is
really simple" be a nice thing? It's not like there are a ton of those
fastop functions anyway. 128 of them? Plus 16 of the "setCC" ones?

Linus

2022-07-14 19:38:52

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On July 14, 2022 6:16:40 PM UTC, Linus Torvalds <[email protected]> wrote:
>So yeah, it would be less dense, but do we care? Wouldn't the "this is
>really simple" be a nice thing? It's not like there are a ton of those
>fastop functions anyway. 128 of them? Plus 16 of the "setCC" ones?

I definitely like simple.

Along with a comment why we have this magic 16 there.

--
Sent from a small device: formatting sux and brevity is inevitable.

2022-07-14 20:58:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Thu, Jul 14, 2022 at 10:02 AM Borislav Petkov <[email protected]> wrote:
>
> On Thu, Jul 14, 2022 at 09:51:40AM -0700, Linus Torvalds wrote:
> > Oh, absolutely. Doing an -rc7 is normal.
>
> Good. I'm gathering all the fallout fixes and will send them to you on
> Sunday, if nothing unexpected happens.

Btw, I assume that includes the clang fix for the
x86_spec_ctrl_current section attribute.

That's kind of personally embarrassing that it slipped through: I do
all my normal test builds that I actually *boot* with clang.

But since I kept all of the embargoed stuff outside my normal trees,
it also meant that the test builds I did didn't have my "this is my
clang tree" stuff in it.

And so I - like apparently everybody else - only did those builds with gcc.

And gcc for some reason doesn't care about this whole "you redeclared
that variable with a different attribute" thing.

And sadly, our percpu accessor functions don't verify these things
either, so you can write code like this:

unsigned long myvariable;

unsigned long test_fn(void)
{
	return this_cpu_read(myvariable);
}

and the compiler will not complain about anything at all, and happily
generate completely nonsensical code like

movq %gs:myvariable(%rip), %rax

for it, which will do entirely the wrong thing because 'myvariable'
wasn't allocated in the percpu section.
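
For contrast, a sketch of what the declaration has to look like for that
accessor to make sense (standard per-CPU plumbing, not the actual bugs.c hunk):

#include <linux/percpu.h>

/* definition: the variable actually lands in the per-CPU section */
DEFINE_PER_CPU(unsigned long, myvariable);

/* and in a header, other files would see:
 * DECLARE_PER_CPU(unsigned long, myvariable); */

unsigned long test_fn(void)
{
	/* now the %gs-relative access generated here is valid */
	return this_cpu_read(myvariable);
}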

In the 'x86_spec_ctrl_current' case, that nonsensical code _worked_
(with gcc), because despite the declaration being for a regular
variable, the actual definition was in the proper segment.

But that 'myvariable' thing above does end up being another example of
how we are clearly missing some type checking in this area.

I'm not sure if there's any way to get that section mismatch at
compile-time at all. For the static declarations, we could just make
DECLARE_PER_CPU() add some prefix/postfix to the name (and obviously
then do it at use time too).

We have that '__pcpu_scope_##name' thing to make sure of globally
unique naming due to the whole weak type thing. I wonder if we could
do something similar to verify that "yes, this has been declared as a
percpu variable" at use time?

Linus

2022-07-15 11:45:54

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On 7/14/22 20:16, Linus Torvalds wrote:
> Oh, btw, how bad would it be to just do
>
> #define FASTOP_SIZE 16
> static_assert(FASTOP_SIZE >= FASTOP_LENGTH)

Yeah, for 32 I might have some (probably irrational) qualms, but 16 is
not worth the trouble.

Given 3 bytes for ENDBR, 5 for the return thunk and 1 for the straight
line speculation INT3, there are 7 bytes left and only 4 are currently
used (for instructions encoded as "66 0f xx xx"). So FASTOP_SIZE and
SETCC_ALIGN can indeed be 16 unconditionally.

Paolo

2022-07-15 14:21:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 5.15 00/78] 5.15.55-rc1 review

On Thu, Jul 14, 2022 at 01:39:25PM -0700, Linus Torvalds wrote:
> On Thu, Jul 14, 2022 at 10:02 AM Borislav Petkov <[email protected]> wrote:
> >
> > On Thu, Jul 14, 2022 at 09:51:40AM -0700, Linus Torvalds wrote:
> > > Oh, absolutely. Doing an -rc7 is normal.
> >
> > Good. I'm gathering all the fallout fixes and will send them to you on
> > Sunday, if nothing unexpected happens.
>
> Btw, I assume that includes the clang fix for the
> x86_spec_ctrl_current section attribute.

Yap. Here's the current lineup:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/urgent

> That's kind of personally embarrassing that it slipped through: I do
> all my normal test builds that I actually *boot* with clang.
>
> But since I kept all of the embargoed stuff outside my normal trees,
> it also meant that the test builds I did didn't have my "this is my
> clang tree" stuff in it.
>
> And so I - like apparently everybody else - only did those builds with gcc.
>
> And gcc for some reason doesn't care about this whole "you redeclared
> that variable with a different attribute" thing.

... so why does clang care? Or, why doesn't gcc care?

I guess I need to talk to gcc folks again.

> In the 'x86_spec_ctrl_current' case, that nonsensical code _worked_
> (with gcc), because despite the declaration being for a regular
> variable, the actual definition was in the proper segment.

I'm guessing this is the reason why gcc doesn't fail - it probably looks
at the declaration but doesn't care too much about it. And it is the
definition that matters.

While clang goes, uh, ah, declaration and definition mismatch, I better
warn.

> But that 'myvariable' thing above does end up being another example of
> how we are clearly missing some type checkng in this area.
>
> I'm not sure if there's any way to get that section mismatch at
> compile-time at all.

Well, apparently, clang can:

arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection]

so there's a -Wsection warning which gcc could implement too.
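
A minimal reproduction of the class of mismatch that diagnostic is about
(illustrative only, not the exact kernel declarations):

/* some header declares the variable as a plain global ... */
extern unsigned long long x86_spec_ctrl_current;

/* ... but the definition redeclares it with a section attribute, which
 * is roughly what DEFINE_PER_CPU boils down to.  clang's -Wsection
 * fires here; gcc currently accepts the mismatch silently. */
unsigned long long x86_spec_ctrl_current
	__attribute__((__section__(".data..percpu")));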

> For the static declarations, we could just make DECLARE_PER_CPU() add
> some prefix/postfix to the name (and obviously then do it at use time
> too).
>
> We have that '__pcpu_scope_##name' thing to make sure of globally
> unique naming due to the whole weak type thing. I wonder if we could
> do something similar to verify that "yes, this has been declared as a
> percpu variable" at use time?

But how?

We need to save the info on how a var has been declared and then use that
info at access time.

Yeah, lemme bother compiler guys a bit...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette