2023-09-14 11:11:15

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 00/38] x86: enable FRED for x86-64

This patch set enables the Intel flexible return and event delivery
(FRED) architecture for x86-64.

The FRED architecture defines simple new transitions that change
privilege level (ring transitions). The FRED architecture was
designed with the following goals:

1) Improve overall performance and response time by replacing event
delivery through the interrupt descriptor table (IDT event
delivery) and event return by the IRET instruction with lower
latency transitions.

2) Improve software robustness by ensuring that event delivery
establishes the full supervisor context and that event return
establishes the full user context.

The new transitions defined by the FRED architecture are FRED event
delivery and, for returning from events, two FRED return instructions.
FRED event delivery can effect a transition from ring 3 to ring 0, but
it is used also to deliver events incident to ring 0. One FRED
instruction (ERETU) effects a return from ring 0 to ring 3, while the
other (ERETS) returns while remaining in ring 0. Collectively, FRED
event delivery and the FRED return instructions are FRED transitions.

Search for the latest FRED spec in most search engines with this search pattern:

site:intel.com FRED (flexible return and event delivery) specification

As of now there is no publicly avaiable CPU supporting FRED, thus the Intel
Simics® Simulator is used as software development and testing vehicles. And
it can be downloaded from:
https://www.intel.com/content/www/us/en/developer/articles/tool/simics-simulator.html

To enable FRED, the Simics package 8112 QSP-CPU needs to be installed with CPU
model configured as:
$cpu_comp_class = "x86-experimental-fred"


Changes since v9:
* Set unused sysvec table entries to fred_handle_spurious_interrupt()
in fred_complete_exception_setup() (Thomas Gleixner).
* Shove the whole thing into arch/x86/entry/entry_64_fred.S for invoking
external_interrupt() and fred_exc_nmi() (Sean Christopherson).
* Correct and improve a few comments (Sean Christopherson).
* Merge the two IRQ/NMI asm entries into one as it's fine to invoke
noinstr code from regular code (Thomas Gleixner).
* Setup the long mode and NMI flags in the augmented SS field of FRED
stack frame in C instead of asm (Thomas Gleixner).
* Don't use jump tables, indirect jumps are expensive (Thomas Gleixner).
* Except #NMI/#DB/#MCE, FRED really can share the exception handlers
with IDT (Thomas Gleixner).
* Avoid the sysvec_* idt_entry muck, do it at a central place, reuse code
instead of blindly copying it, which breaks the performance optimized
sysvec entries like reschedule_ipi (Thomas Gleixner).
* Add asm_ prefix to FRED asm entry points (Thomas Gleixner).
* Disable #DB to avoid endless recursion and stack overflow when a
watchpoint/breakpoint is set in the code path which is executed by
#DB handler (Thomas Gleixner).
* Introduce a new structure fred_ss to denote the FRED flags above SS
selector, which avoids FRED_SSX_ macros and makes the code simpler
and easier to read (Thomas Gleixner).
* Use type u64 to define FRED bit fields instead of type unsigned int
(Thomas Gleixner).
* Avoid a type cast by defining X86_CR4_FRED as 0 on 32-bit (Thomas
Gleixner).
* Add the WRMSRNS instruction support (Thomas Gleixner).

Changes since v8:
* Move the FRED initialization patch after all required changes are in
place (Thomas Gleixner).
* Don't do syscall early out in fred_entry_from_user() before there are
proper performance numbers and justifications (Thomas Gleixner).
* Add the control exception handler to the FRED exception handler table
(Thomas Gleixner).
* Introduce a macro sysvec_install() to derive the asm handler name from
a C handler, which simplifies the code and avoids an ugly typecast
(Thomas Gleixner).
* Remove junk code that assumes no local APIC on x86_64 (Thomas Gleixner).
* Put IDTENTRY changes in a separate patch (Thomas Gleixner).
* Use high-order 48 bits above the lowest 16 bit SS only when FRED is
enabled (Thomas Gleixner).
* Explain why writing directly to the IA32_KERNEL_GS_BASE MSR is
doing the right thing (Thomas Gleixner).
* Reword some patch descriptions (Thomas Gleixner).
* Add a new macro VMX_DO_FRED_EVENT_IRQOFF for FRED instead of
refactoring VMX_DO_EVENT_IRQOFF (Sean Christopherson).
* Do NOT use a trampoline, just LEA+PUSH the return RIP, PUSH the error
code, and jump to the FRED kernel entry point for NMI or call
external_interrupt() for IRQs (Sean Christopherson).
* Call external_interrupt() only when FRED is enabled, and convert the
non-FRED handling to external_interrupt() after FRED lands (Sean
Christopherson).
* Use __packed instead of __attribute__((__packed__)) (Borislav Petkov).
* Put all comments above the members, like the rest of the file does
(Borislav Petkov).
* Reflect the FRED spec 5.0 change that ERETS and ERETU add 8 to %rsp
before popping the return context from the stack.
* Reflect stack frame definition changes from FRED spec 3.0 to 5.0.
* Add ENDBR to the FRED_ENTER asm macro after kernel IBT is added to
FRED base line in FRED spec 5.0.
* Add a document which briefly introduces FRED features.
* Remove 2 patches, "allow FRED systems to use interrupt vectors
0x10-0x1f" and "allow dynamic stack frame size", from this patch set,
as they are "optimizations" only.
* Send 2 patches, "header file for event types" and "do not modify the
DPL bits for a null selector", as pre-FRED patches.

Changes since v7:
* Always call external_interrupt() for VMX IRQ handling on x86_64, thus avoid
re-entering the noinstr code.
* Create a FRED stack frame when FRED is compiled-in but not enabled, which
uses some extra stack space but simplifies the code.
* Add a log message when FRED is enabled.

Changes since v6:
* Add a comment to explain why it is safe to write to a previous FRED stack
frame. (Lai Jiangshan).
* Export fred_entrypoint_kernel(), required when kvm-intel built as a module.
* Reserve a REDZONE for CALL emulation and Align RSP to a 64-byte boundary
before pushing a new FRED stack frame.
* Replace pt_regs csx flags prefix FRED_CSL_ with FRED_CSX_.

Changes since v5:
* Initialize system_interrupt_handlers with dispatch_table_spurious_interrupt()
instead of NULL to get rid of a branch (Peter Zijlstra).
* Disallow #DB inside #MCE for robustness sake (Peter Zijlstra).
* Add a comment for FRED stack level settings (Lai Jiangshan).
* Move the NMI bit from an invalid stack frame, which caused ERETU to fault,
to the fault handler's stack frame, thus to unblock NMI ASAP if NMI is blocked
(Lai Jiangshan).
* Refactor VMX_DO_EVENT_IRQOFF to handle IRQ/NMI in IRQ/NMI induced VM exits
when FRED is enabled (Sean Christopherson).

Changes since v4:
* Do NOT use the term "injection", which in the KVM context means to
reinject an event into the guest (Sean Christopherson).
* Add the explanation of why to execute "int $2" to invoke the NMI handler
in NMI caused VM exits (Sean Christopherson).
* Use cs/ss instead of csx/ssx when initializing the pt_regs structure
for calling external_interrupt(), otherwise it breaks i386 build.

Changes since v3:
* Call external_interrupt() to handle IRQ in IRQ caused VM exits.
* Execute "int $2" to handle NMI in NMI caused VM exits.
* Rename csl/ssl of the pt_regs structure to csx/ssx (x for extended)
(Andrew Cooper).

Changes since v2:
* Improve comments for changes in arch/x86/include/asm/idtentry.h.

Changes since v1:
* call irqentry_nmi_{enter,exit}() in both IDT and FRED debug fault kernel
handler (Peter Zijlstra).
* Initialize a FRED exception handler to fred_bad_event() instead of NULL
if no FRED handler defined for an exception vector (Peter Zijlstra).
* Push calling irqentry_{enter,exit}() and instrumentation_{begin,end}()
down into individual FRED exception handlers, instead of in the dispatch
framework (Peter Zijlstra).


H. Peter Anvin (Intel) (22):
x86/fred: Add Kconfig option for FRED (CONFIG_X86_FRED)
x86/cpufeatures: Add the cpu feature bit for FRED
x86/fred: Disable FRED support if CONFIG_X86_FRED is disabled
x86/opcode: Add ERET[US] instructions to the x86 opcode map
x86/objtool: Teach objtool about ERET[US]
x86/cpu: Add X86_CR4_FRED macro
x86/cpu: Add MSR numbers for FRED configuration
x86/fred: Add a new header file for FRED definitions
x86/fred: Reserve space for the FRED stack frame
x86/fred: Update MSR_IA32_FRED_RSP0 during task switch
x86/fred: Disallow the swapgs instruction when FRED is enabled
x86/fred: No ESPFIX needed when FRED is enabled
x86/fred: Allow single-step trap and NMI when starting a new task
x86/fred: Make exc_page_fault() work for FRED
x86/fred: Add a debug fault entry stub for FRED
x86/fred: Add a NMI entry stub for FRED
x86/fred: FRED entry/exit and dispatch code
x86/traps: Add sysvec_install() to install a system interrupt handler
x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED
is enabled
x86/fred: Add fred_syscall_init()
x86/fred: Add FRED initialization functions
x86/fred: Invoke FRED initialization code to enable FRED

Peter Zijlstra (Intel) (1):
x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual
entry code

Xin Li (15):
x86/cpufeatures: Add the cpu feature bit for WRMSRNS
x86/opcode: Add the WRMSRNS instruction to the x86 opcode map
x86/msr: Add the WRMSRNS instruction support
x86/entry: Remove idtentry_sysvec from entry_{32,64}.S
x86/trapnr: Add event type macros to <asm/trapnr.h>
Documentation/x86/64: Add a documentation for FRED
x86/fred: Disable FRED by default in its early stage
x86/ptrace: Cleanup the definition of the pt_regs structure
x86/ptrace: Add FRED additional information to the pt_regs structure
x86/idtentry: Incorporate definitions/declarations of the FRED entries
x86/fred: Add a machine check entry stub for FRED
x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user
x86/entry: Add fred_entry_from_kvm() for VMX to handle IRQ/NMI
KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling
x86/syscall: Split IDT syscall setup code into idt_syscall_init()

.../admin-guide/kernel-parameters.txt | 3 +
Documentation/arch/x86/x86_64/fred.rst | 98 ++++++
Documentation/arch/x86/x86_64/index.rst | 1 +
arch/x86/Kconfig | 9 +
arch/x86/entry/Makefile | 5 +-
arch/x86/entry/calling.h | 15 +-
arch/x86/entry/entry_32.S | 4 -
arch/x86/entry/entry_64.S | 14 +-
arch/x86/entry/entry_64_fred.S | 129 ++++++++
arch/x86/entry/entry_fred.c | 279 ++++++++++++++++++
arch/x86/entry/vsyscall/vsyscall_64.c | 2 +-
arch/x86/include/asm/asm-prototypes.h | 1 +
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/desc.h | 2 -
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/extable_fixup_types.h | 4 +-
arch/x86/include/asm/fred.h | 97 ++++++
arch/x86/include/asm/idtentry.h | 88 +++++-
arch/x86/include/asm/msr-index.h | 13 +-
arch/x86/include/asm/msr.h | 18 ++
arch/x86/include/asm/ptrace.h | 85 +++++-
arch/x86/include/asm/switch_to.h | 8 +-
arch/x86/include/asm/thread_info.h | 12 +-
arch/x86/include/asm/trapnr.h | 12 +
arch/x86/include/asm/vmx.h | 17 +-
arch/x86/include/uapi/asm/processor-flags.h | 7 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/cpu/acrn.c | 4 +-
arch/x86/kernel/cpu/common.c | 53 +++-
arch/x86/kernel/cpu/cpuid-deps.c | 2 +
arch/x86/kernel/cpu/mce/core.c | 26 ++
arch/x86/kernel/cpu/mshyperv.c | 15 +-
arch/x86/kernel/espfix_64.c | 8 +
arch/x86/kernel/fred.c | 59 ++++
arch/x86/kernel/idt.c | 4 +-
arch/x86/kernel/irqinit.c | 7 +-
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kernel/nmi.c | 28 ++
arch/x86/kernel/process_64.c | 67 ++++-
arch/x86/kernel/traps.c | 48 ++-
arch/x86/kvm/vmx/vmx.c | 12 +-
arch/x86/lib/x86-opcode-map.txt | 4 +-
arch/x86/mm/extable.c | 79 +++++
arch/x86/mm/fault.c | 5 +-
drivers/xen/events/events_base.c | 2 +-
tools/arch/x86/include/asm/cpufeatures.h | 2 +
.../arch/x86/include/asm/disabled-features.h | 8 +-
tools/arch/x86/include/asm/msr-index.h | 13 +-
tools/arch/x86/lib/x86-opcode-map.txt | 4 +-
tools/objtool/arch/x86/decode.c | 19 +-
50 files changed, 1291 insertions(+), 114 deletions(-)
create mode 100644 Documentation/arch/x86/x86_64/fred.rst
create mode 100644 arch/x86/entry/entry_64_fred.S
create mode 100644 arch/x86/entry/entry_fred.c
create mode 100644 arch/x86/include/asm/fred.h
create mode 100644 arch/x86/kernel/fred.c

--
2.34.1


2023-09-14 11:27:00

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 37/38] x86/fred: Add FRED initialization functions

From: "H. Peter Anvin (Intel)" <[email protected]>

Add cpu_init_fred_exceptions() to:
- Set FRED entrypoints for events happening in ring 0 and 3.
- Specify the stack level for IRQs occurred ring 0.
- Specify dedicated event stacks for #DB/NMI/#MCE/#DF.
- Enable FRED and invalidtes IDT.
- Force 32-bit system calls to use "int $0x80" only.

Add fred_complete_exception_setup() to:
- Initialize system_vectors as done for IDT systems.
- Set unused sysvec_table entries to fred_handle_spurious_interrupt().

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Co-developed-by: Xin Li <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---

Changes since v9:
* Set unused sysvec table entries to fred_handle_spurious_interrupt()
in fred_complete_exception_setup() (Thomas Gleixner).

Changes since v5:
* Add a comment for FRED stack level settings (Lai Jiangshan).
* Define NMI/#DB/#MCE/#DF stack levels using macros.
---
arch/x86/entry/entry_fred.c | 21 +++++++++++++
arch/x86/include/asm/fred.h | 5 ++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/fred.c | 59 +++++++++++++++++++++++++++++++++++++
4 files changed, 86 insertions(+)
create mode 100644 arch/x86/kernel/fred.c

diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
index f8774611af80..7d507f59315d 100644
--- a/arch/x86/entry/entry_fred.c
+++ b/arch/x86/entry/entry_fred.c
@@ -140,6 +140,27 @@ void __init fred_install_sysvec(unsigned int sysvec, idtentry_t handler)
sysvec_table[sysvec - FIRST_SYSTEM_VECTOR] = handler;
}

+static noinstr void fred_handle_spurious_interrupt(struct pt_regs *regs)
+{
+ spurious_interrupt(regs, regs->fred_ss.vector);
+}
+
+void __init fred_complete_exception_setup(void)
+{
+ unsigned int vector;
+
+ for (vector = 0; vector < FIRST_EXTERNAL_VECTOR; vector++)
+ set_bit(vector, system_vectors);
+
+ for (vector = 0; vector < NR_SYSTEM_VECTORS; vector++) {
+ if (sysvec_table[vector])
+ set_bit(vector + FIRST_SYSTEM_VECTOR, system_vectors);
+ else
+ sysvec_table[vector] = fred_handle_spurious_interrupt;
+ }
+ fred_setup_done = true;
+}
+
static noinstr void fred_extint(struct pt_regs *regs)
{
unsigned int vector = regs->fred_ss.vector;
diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 2fa9f34e5c95..e86c7ba32435 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -83,8 +83,13 @@ static __always_inline void fred_entry_from_kvm(unsigned int type, unsigned int
asm_fred_entry_from_kvm(ss);
}

+void cpu_init_fred_exceptions(void);
+void fred_complete_exception_setup(void);
+
#else /* CONFIG_X86_FRED */
static __always_inline unsigned long fred_event_data(struct pt_regs *regs) { return 0; }
+static inline void cpu_init_fred_exceptions(void) { }
+static inline void fred_complete_exception_setup(void) { }
static __always_inline void fred_entry_from_kvm(unsigned int type, unsigned int vector) { }
#endif /* CONFIG_X86_FRED */
#endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3269a0e23d3a..8dfdae4111bb 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -47,6 +47,7 @@ obj-y += platform-quirks.o
obj-y += process_$(BITS).o signal.o signal_$(BITS).o
obj-y += traps.o idt.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
obj-y += time.o ioport.o dumpstack.o nmi.o
+obj-$(CONFIG_X86_FRED) += fred.o
obj-$(CONFIG_MODIFY_LDT_SYSCALL) += ldt.o
obj-$(CONFIG_X86_KERNEL_IBT) += ibt_selftest.o
obj-y += setup.o x86_init.o i8259.o irqinit.o
diff --git a/arch/x86/kernel/fred.c b/arch/x86/kernel/fred.c
new file mode 100644
index 000000000000..4bcd8791ad96
--- /dev/null
+++ b/arch/x86/kernel/fred.c
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/kernel.h>
+
+#include <asm/desc.h>
+#include <asm/fred.h>
+#include <asm/tlbflush.h>
+#include <asm/traps.h>
+
+/* #DB in the kernel would imply the use of a kernel debugger. */
+#define FRED_DB_STACK_LEVEL 1UL
+#define FRED_NMI_STACK_LEVEL 2UL
+#define FRED_MC_STACK_LEVEL 2UL
+/*
+ * #DF is the highest level because a #DF means "something went wrong
+ * *while delivering an exception*." The number of cases for which that
+ * can happen with FRED is drastically reduced and basically amounts to
+ * "the stack you pointed me to is broken." Thus, always change stacks
+ * on #DF, which means it should be at the highest level.
+ */
+#define FRED_DF_STACK_LEVEL 3UL
+
+#define FRED_STKLVL(vector, lvl) ((lvl) << (2 * (vector)))
+
+void cpu_init_fred_exceptions(void)
+{
+ /* When FRED is enabled by default, remove this log message */
+ pr_info("Initialize FRED on CPU%d\n", smp_processor_id());
+
+ wrmsrl(MSR_IA32_FRED_CONFIG,
+ /* Reserve for CALL emulation */
+ FRED_CONFIG_REDZONE |
+ FRED_CONFIG_INT_STKLVL(0) |
+ FRED_CONFIG_ENTRYPOINT(asm_fred_entrypoint_user));
+
+ /*
+ * The purpose of separate stacks for NMI, #DB and #MC *in the kernel*
+ * (remember that user space faults are always taken on stack level 0)
+ * is to avoid overflowing the kernel stack.
+ */
+ wrmsrl(MSR_IA32_FRED_STKLVLS,
+ FRED_STKLVL(X86_TRAP_DB, FRED_DB_STACK_LEVEL) |
+ FRED_STKLVL(X86_TRAP_NMI, FRED_NMI_STACK_LEVEL) |
+ FRED_STKLVL(X86_TRAP_MC, FRED_MC_STACK_LEVEL) |
+ FRED_STKLVL(X86_TRAP_DF, FRED_DF_STACK_LEVEL));
+
+ /* The FRED equivalents to IST stacks... */
+ wrmsrl(MSR_IA32_FRED_RSP1, __this_cpu_ist_top_va(DB));
+ wrmsrl(MSR_IA32_FRED_RSP2, __this_cpu_ist_top_va(NMI));
+ wrmsrl(MSR_IA32_FRED_RSP3, __this_cpu_ist_top_va(DF));
+
+ /* Enable FRED */
+ cr4_set_bits(X86_CR4_FRED);
+ /* Any further IDT use is a bug */
+ idt_invalidate();
+
+ /* Use int $0x80 for 32-bit system calls in FRED mode */
+ setup_clear_cpu_cap(X86_FEATURE_SYSENTER32);
+ setup_clear_cpu_cap(X86_FEATURE_SYSCALL32);
+}
--
2.34.1

2023-09-14 11:31:32

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 32/38] x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual entry code

From: "Peter Zijlstra (Intel)" <[email protected]>

PUSH_AND_CLEAR_REGS could be used besides actual entry code; in that case
%rbp shouldn't be cleared (otherwise the frame pointer is destroyed) and
UNWIND_HINT shouldn't be added.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/entry/calling.h | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index f6907627172b..eb57c023d5df 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -65,7 +65,7 @@ For 32-bit we have the following conventions - kernel is built with
* for assembly code:
*/

-.macro PUSH_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0
+.macro PUSH_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0 unwind_hint=1
.if \save_ret
pushq %rsi /* pt_regs->si */
movq 8(%rsp), %rsi /* temporarily store the return address in %rsi */
@@ -87,14 +87,17 @@ For 32-bit we have the following conventions - kernel is built with
pushq %r13 /* pt_regs->r13 */
pushq %r14 /* pt_regs->r14 */
pushq %r15 /* pt_regs->r15 */
+
+ .if \unwind_hint
UNWIND_HINT_REGS
+ .endif

.if \save_ret
pushq %rsi /* return address on top of stack */
.endif
.endm

-.macro CLEAR_REGS
+.macro CLEAR_REGS clear_bp=1
/*
* Sanitize registers of values that a speculation attack might
* otherwise want to exploit. The lower registers are likely clobbered
@@ -109,7 +112,9 @@ For 32-bit we have the following conventions - kernel is built with
xorl %r10d, %r10d /* nospec r10 */
xorl %r11d, %r11d /* nospec r11 */
xorl %ebx, %ebx /* nospec rbx */
+ .if \clear_bp
xorl %ebp, %ebp /* nospec rbp */
+ .endif
xorl %r12d, %r12d /* nospec r12 */
xorl %r13d, %r13d /* nospec r13 */
xorl %r14d, %r14d /* nospec r14 */
@@ -117,9 +122,9 @@ For 32-bit we have the following conventions - kernel is built with

.endm

-.macro PUSH_AND_CLEAR_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0
- PUSH_REGS rdx=\rdx, rcx=\rcx, rax=\rax, save_ret=\save_ret
- CLEAR_REGS
+.macro PUSH_AND_CLEAR_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0 clear_bp=1 unwind_hint=1
+ PUSH_REGS rdx=\rdx, rcx=\rcx, rax=\rax, save_ret=\save_ret unwind_hint=\unwind_hint
+ CLEAR_REGS clear_bp=\clear_bp
.endm

.macro POP_REGS pop_rdi=1
--
2.34.1

2023-09-14 11:47:43

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 14/38] x86/cpu: Add MSR numbers for FRED configuration

From: "H. Peter Anvin (Intel)" <[email protected]>

Add MSR numbers for the FRED configuration registers per FRED spec 5.0.

Originally-by: Megha Dey <[email protected]>
Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/msr-index.h | 13 ++++++++++++-
tools/arch/x86/include/asm/msr-index.h | 13 ++++++++++++-
2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1d111350197f..972d15404420 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -36,8 +36,19 @@
#define EFER_FFXSR (1<<_EFER_FFXSR)
#define EFER_AUTOIBRS (1<<_EFER_AUTOIBRS)

-/* Intel MSRs. Some also available on other CPUs */
+/* FRED MSRs */
+#define MSR_IA32_FRED_RSP0 0x1cc /* Level 0 stack pointer */
+#define MSR_IA32_FRED_RSP1 0x1cd /* Level 1 stack pointer */
+#define MSR_IA32_FRED_RSP2 0x1ce /* Level 2 stack pointer */
+#define MSR_IA32_FRED_RSP3 0x1cf /* Level 3 stack pointer */
+#define MSR_IA32_FRED_STKLVLS 0x1d0 /* Exception stack levels */
+#define MSR_IA32_FRED_SSP0 MSR_IA32_PL0_SSP /* Level 0 shadow stack pointer */
+#define MSR_IA32_FRED_SSP1 0x1d1 /* Level 1 shadow stack pointer */
+#define MSR_IA32_FRED_SSP2 0x1d2 /* Level 2 shadow stack pointer */
+#define MSR_IA32_FRED_SSP3 0x1d3 /* Level 3 shadow stack pointer */
+#define MSR_IA32_FRED_CONFIG 0x1d4 /* Entrypoint and interrupt stack level */

+/* Intel MSRs. Some also available on other CPUs */
#define MSR_TEST_CTRL 0x00000033
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
index a00a53e15ab7..fc75e3ca47d9 100644
--- a/tools/arch/x86/include/asm/msr-index.h
+++ b/tools/arch/x86/include/asm/msr-index.h
@@ -36,8 +36,19 @@
#define EFER_FFXSR (1<<_EFER_FFXSR)
#define EFER_AUTOIBRS (1<<_EFER_AUTOIBRS)

-/* Intel MSRs. Some also available on other CPUs */
+/* FRED MSRs */
+#define MSR_IA32_FRED_RSP0 0x1cc /* Level 0 stack pointer */
+#define MSR_IA32_FRED_RSP1 0x1cd /* Level 1 stack pointer */
+#define MSR_IA32_FRED_RSP2 0x1ce /* Level 2 stack pointer */
+#define MSR_IA32_FRED_RSP3 0x1cf /* Level 3 stack pointer */
+#define MSR_IA32_FRED_STKLVLS 0x1d0 /* Exception stack levels */
+#define MSR_IA32_FRED_SSP0 MSR_IA32_PL0_SSP /* Level 0 shadow stack pointer */
+#define MSR_IA32_FRED_SSP1 0x1d1 /* Level 1 shadow stack pointer */
+#define MSR_IA32_FRED_SSP2 0x1d2 /* Level 2 shadow stack pointer */
+#define MSR_IA32_FRED_SSP3 0x1d3 /* Level 3 shadow stack pointer */
+#define MSR_IA32_FRED_CONFIG 0x1d4 /* Entrypoint and interrupt stack level */

+/* Intel MSRs. Some also available on other CPUs */
#define MSR_TEST_CTRL 0x00000033
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
--
2.34.1

2023-09-14 12:13:13

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 10/38] x86/fred: Disable FRED by default in its early stage

To enable FRED, a new kernel command line option "fred" needs to be added.

Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 3 +++
arch/x86/kernel/cpu/common.c | 3 +++
2 files changed, 6 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0a1731a0f0ef..42def5cc7552 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1525,6 +1525,9 @@
Warning: use of this parameter will taint the kernel
and may cause unknown problems.

+ fred [X86-64]
+ Enable flexible return and event delivery
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 382d4e6b848d..317b4877e9c7 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1486,6 +1486,9 @@ static void __init cpu_parse_early_param(void)
char *argptr = arg, *opt;
int arglen, taint = 0;

+ if (!cmdline_find_option_bool(boot_command_line, "fred"))
+ setup_clear_cpu_cap(X86_FEATURE_FRED);
+
#ifdef CONFIG_X86_32
if (cmdline_find_option_bool(boot_command_line, "no387"))
#ifdef CONFIG_MATH_EMULATION
--
2.34.1

2023-09-14 13:24:35

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 35/38] x86/syscall: Split IDT syscall setup code into idt_syscall_init()

Split IDT syscall setup code into idt_syscall_init() to make it
cleaner to add FRED syscall setup code.

Suggested-by: Thomas Gleixner <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/kernel/cpu/common.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 42511209469b..d960b7276008 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2070,10 +2070,8 @@ static void wrmsrl_cstar(unsigned long val)
wrmsrl(MSR_CSTAR, val);
}

-/* May not be marked __init: used by software suspend */
-void syscall_init(void)
+static inline void idt_syscall_init(void)
{
- wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);

#ifdef CONFIG_IA32_EMULATION
@@ -2107,6 +2105,15 @@ void syscall_init(void)
X86_EFLAGS_AC|X86_EFLAGS_ID);
}

+/* May not be marked __init: used by software suspend */
+void syscall_init(void)
+{
+ /* The default user and kernel segments */
+ wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
+
+ idt_syscall_init();
+}
+
#else /* CONFIG_X86_64 */

#ifdef CONFIG_STACKPROTECTOR
--
2.34.1

2023-09-14 13:24:41

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 09/38] x86/fred: Disable FRED support if CONFIG_X86_FRED is disabled

From: "H. Peter Anvin (Intel)" <[email protected]>

Add CONFIG_X86_FRED to <asm/disabled-features.h> to make
cpu_feature_enabled() work correctly with FRED.

Originally-by: Megha Dey <[email protected]>
Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/disabled-features.h | 8 +++++++-
tools/arch/x86/include/asm/disabled-features.h | 8 +++++++-
2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 702d93fdd10e..3cde57cb5093 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -117,6 +117,12 @@
#define DISABLE_IBT (1 << (X86_FEATURE_IBT & 31))
#endif

+#ifdef CONFIG_X86_FRED
+# define DISABLE_FRED 0
+#else
+# define DISABLE_FRED (1 << (X86_FEATURE_FRED & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -134,7 +140,7 @@
#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
#define DISABLED_MASK12 (DISABLE_LAM)
-#define DISABLED_MASK13 0
+#define DISABLED_MASK13 (DISABLE_FRED)
#define DISABLED_MASK14 0
#define DISABLED_MASK15 0
#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
diff --git a/tools/arch/x86/include/asm/disabled-features.h b/tools/arch/x86/include/asm/disabled-features.h
index fafe9be7a6f4..d540ecdd8812 100644
--- a/tools/arch/x86/include/asm/disabled-features.h
+++ b/tools/arch/x86/include/asm/disabled-features.h
@@ -105,6 +105,12 @@
# define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
#endif

+#ifdef CONFIG_X86_FRED
+# define DISABLE_FRED 0
+#else
+# define DISABLE_FRED (1 << (X86_FEATURE_FRED & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -122,7 +128,7 @@
#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
DISABLE_CALL_DEPTH_TRACKING)
#define DISABLED_MASK12 (DISABLE_LAM)
-#define DISABLED_MASK13 0
+#define DISABLED_MASK13 (DISABLE_FRED)
#define DISABLED_MASK14 0
#define DISABLED_MASK15 0
#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
--
2.34.1

2023-09-14 13:59:18

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 22/38] x86/fred: Allow single-step trap and NMI when starting a new task

From: "H. Peter Anvin (Intel)" <[email protected]>

Entering a new task is logically speaking a return from a system call
(exec, fork, clone, etc.). As such, if ptrace enables single stepping
a single step exception should be allowed to trigger immediately upon
entering user space. This is not optional.

NMI should *never* be disabled in user space. As such, this is an
optional, opportunistic way to catch errors.

Allow single-step trap and NMI when starting a new task, thus once
the new task enters user space, single-step trap and NMI are both
enabled immediately.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---

Changes since v8:
* Use high-order 48 bits above the lowest 16 bit SS only when FRED
is enabled (Thomas Gleixner).
---
arch/x86/kernel/process_64.c | 38 ++++++++++++++++++++++++++++++------
1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4f87f5987ae8..c075591b7b46 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -56,6 +56,7 @@
#include <asm/resctrl.h>
#include <asm/unistd.h>
#include <asm/fsgsbase.h>
+#include <asm/fred.h>
#ifdef CONFIG_IA32_EMULATION
/* Not included via unistd.h */
#include <asm/unistd_32_ia32.h>
@@ -528,7 +529,7 @@ void x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase)
static void
start_thread_common(struct pt_regs *regs, unsigned long new_ip,
unsigned long new_sp,
- unsigned int _cs, unsigned int _ss, unsigned int _ds)
+ u16 _cs, u16 _ss, u16 _ds)
{
WARN_ON_ONCE(regs != current_pt_regs());

@@ -545,11 +546,36 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
loadsegment(ds, _ds);
load_gs_index(0);

- regs->ip = new_ip;
- regs->sp = new_sp;
- regs->cs = _cs;
- regs->ss = _ss;
- regs->flags = X86_EFLAGS_IF;
+ regs->ip = new_ip;
+ regs->sp = new_sp;
+ regs->csx = _cs;
+ regs->ssx = _ss;
+ /*
+ * Allow single-step trap and NMI when starting a new task, thus
+ * once the new task enters user space, single-step trap and NMI
+ * are both enabled immediately.
+ *
+ * Entering a new task is logically speaking a return from a
+ * system call (exec, fork, clone, etc.). As such, if ptrace
+ * enables single stepping a single step exception should be
+ * allowed to trigger immediately upon entering user space.
+ * This is not optional.
+ *
+ * NMI should *never* be disabled in user space. As such, this
+ * is an optional, opportunistic way to catch errors.
+ *
+ * Paranoia: High-order 48 bits above the lowest 16 bit SS are
+ * discarded by the legacy IRET instruction on all Intel, AMD,
+ * and Cyrix/Centaur/VIA CPUs, thus can be set unconditionally,
+ * even when FRED is not enabled. But we choose the safer side
+ * to use these bits only when FRED is enabled.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_FRED)) {
+ regs->fred_ss.swevent = true;
+ regs->fred_ss.nmi = true;
+ }
+
+ regs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
}

void
--
2.34.1

2023-09-14 14:39:23

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 29/38] x86/traps: Add sysvec_install() to install a system interrupt handler

From: "H. Peter Anvin (Intel)" <[email protected]>

Add sysvec_install() to install a system interrupt handler into the IDT
or the FRED system interrupt handler table.

Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---

Changes since v8:
* Introduce a macro sysvec_install() to derive the asm handler name from
a C handler, which simplifies the code and avoids an ugly typecast
(Thomas Gleixner).
---
arch/x86/entry/entry_fred.c | 14 ++++++++++++++
arch/x86/include/asm/desc.h | 2 --
arch/x86/include/asm/idtentry.h | 15 +++++++++++++++
arch/x86/kernel/cpu/acrn.c | 4 ++--
arch/x86/kernel/cpu/mshyperv.c | 15 +++++++--------
arch/x86/kernel/idt.c | 4 ++--
arch/x86/kernel/kvm.c | 2 +-
drivers/xen/events/events_base.c | 2 +-
8 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
index dc645f5819a9..2fd3e421e066 100644
--- a/arch/x86/entry/entry_fred.c
+++ b/arch/x86/entry/entry_fred.c
@@ -126,6 +126,20 @@ static idtentry_t sysvec_table[NR_SYSTEM_VECTORS] __ro_after_init = {
SYSVEC(POSTED_INTR_NESTED_VECTOR, kvm_posted_intr_nested_ipi),
};

+static bool fred_setup_done __initdata;
+
+void __init fred_install_sysvec(unsigned int sysvec, idtentry_t handler)
+{
+ if (WARN_ON_ONCE(sysvec < FIRST_SYSTEM_VECTOR))
+ return;
+
+ if (WARN_ON_ONCE(fred_setup_done))
+ return;
+
+ if (!WARN_ON_ONCE(sysvec_table[sysvec - FIRST_SYSTEM_VECTOR]))
+ sysvec_table[sysvec - FIRST_SYSTEM_VECTOR] = handler;
+}
+
static noinstr void fred_extint(struct pt_regs *regs)
{
unsigned int vector = regs->fred_ss.vector;
diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index ab97b22ac04a..ec95fe44fa3a 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -402,8 +402,6 @@ static inline void set_desc_limit(struct desc_struct *desc, unsigned long limit)
desc->limit1 = (limit >> 16) & 0xf;
}

-void alloc_intr_gate(unsigned int n, const void *addr);
-
static inline void init_idt_data(struct idt_data *data, unsigned int n,
const void *addr)
{
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 4f26ee9b8b74..650c98160152 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -459,6 +459,21 @@ __visible noinstr void func(struct pt_regs *regs, \
#define DEFINE_FREDENTRY_DEBUG DEFINE_FREDENTRY_RAW
#endif

+void idt_install_sysvec(unsigned int n, const void *function);
+
+#ifdef CONFIG_X86_FRED
+void fred_install_sysvec(unsigned int vector, const idtentry_t function);
+#else
+static inline void fred_install_sysvec(unsigned int vector, const idtentry_t function) { }
+#endif
+
+#define sysvec_install(vector, function) { \
+ if (cpu_feature_enabled(X86_FEATURE_FRED)) \
+ fred_install_sysvec(vector, function); \
+ else \
+ idt_install_sysvec(vector, asm_##function); \
+}
+
#else /* !__ASSEMBLY__ */

/*
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index bfeb18fad63f..2c5b51aad91a 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -26,8 +26,8 @@ static u32 __init acrn_detect(void)

static void __init acrn_init_platform(void)
{
- /* Setup the IDT for ACRN hypervisor callback */
- alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_acrn_hv_callback);
+ /* Install system interrupt handler for ACRN hypervisor callback */
+ sysvec_install(HYPERVISOR_CALLBACK_VECTOR, sysvec_acrn_hv_callback);

x86_platform.calibrate_tsc = acrn_get_tsc_khz;
x86_platform.calibrate_cpu = acrn_get_tsc_khz;
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e6bba12c759c..3403880c3e09 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -536,19 +536,18 @@ static void __init ms_hyperv_init_platform(void)
*/
x86_platform.apic_post_init = hyperv_init;
hyperv_setup_mmu_ops();
- /* Setup the IDT for hypervisor callback */
- alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_hyperv_callback);

- /* Setup the IDT for reenlightenment notifications */
+ /* Install system interrupt handler for hypervisor callback */
+ sysvec_install(HYPERVISOR_CALLBACK_VECTOR, sysvec_hyperv_callback);
+
+ /* Install system interrupt handler for reenlightenment notifications */
if (ms_hyperv.features & HV_ACCESS_REENLIGHTENMENT) {
- alloc_intr_gate(HYPERV_REENLIGHTENMENT_VECTOR,
- asm_sysvec_hyperv_reenlightenment);
+ sysvec_install(HYPERV_REENLIGHTENMENT_VECTOR, sysvec_hyperv_reenlightenment);
}

- /* Setup the IDT for stimer0 */
+ /* Install system interrupt handler for stimer0 */
if (ms_hyperv.misc_features & HV_STIMER_DIRECT_MODE_AVAILABLE) {
- alloc_intr_gate(HYPERV_STIMER0_VECTOR,
- asm_sysvec_hyperv_stimer0);
+ sysvec_install(HYPERV_STIMER0_VECTOR, sysvec_hyperv_stimer0);
}

# ifdef CONFIG_SMP
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index b786d48f5a0f..0db7d92de48a 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -330,7 +330,7 @@ void idt_invalidate(void)
load_idt(&idt);
}

-void __init alloc_intr_gate(unsigned int n, const void *addr)
+void __init idt_install_sysvec(unsigned int n, const void *function)
{
if (WARN_ON(n < FIRST_SYSTEM_VECTOR))
return;
@@ -339,5 +339,5 @@ void __init alloc_intr_gate(unsigned int n, const void *addr)
return;

if (!WARN_ON(test_and_set_bit(n, system_vectors)))
- set_intr_gate(n, addr);
+ set_intr_gate(n, function);
}
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b8ab9ee5896c..eabf03813a5c 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -829,7 +829,7 @@ static void __init kvm_guest_init(void)

if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT) && kvmapf) {
static_branch_enable(&kvm_async_pf_enabled);
- alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_kvm_asyncpf_interrupt);
+ sysvec_install(HYPERVISOR_CALLBACK_VECTOR, sysvec_kvm_asyncpf_interrupt);
}

#ifdef CONFIG_SMP
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 3bdd5b59661d..c54123ca7b1a 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -2243,7 +2243,7 @@ static __init void xen_alloc_callback_vector(void)
return;

pr_info("Xen HVM callback vector for event delivery is enabled\n");
- alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_xen_hvm_callback);
+ sysvec_install(HYPERVISOR_CALLBACK_VECTOR, sysvec_xen_hvm_callback);
}
#else
void xen_setup_callback_vector(void) {}
--
2.34.1

2023-09-14 14:44:40

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 12/38] x86/objtool: Teach objtool about ERET[US]

From: "H. Peter Anvin (Intel)" <[email protected]>

Update the objtool decoder to know about the ERET[US] instructions
(type INSN_CONTEXT_SWITCH).

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
tools/objtool/arch/x86/decode.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index c0f25d00181e..6999f478c155 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -509,11 +509,20 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec

if (op2 == 0x01) {

- if (modrm == 0xca)
- insn->type = INSN_CLAC;
- else if (modrm == 0xcb)
- insn->type = INSN_STAC;
-
+ switch (insn_last_prefix_id(&ins)) {
+ case INAT_PFX_REPE:
+ case INAT_PFX_REPNE:
+ if (modrm == 0xca)
+ /* eretu/erets */
+ insn->type = INSN_CONTEXT_SWITCH;
+ break;
+ default:
+ if (modrm == 0xca)
+ insn->type = INSN_CLAC;
+ else if (modrm == 0xcb)
+ insn->type = INSN_STAC;
+ break;
+ }
} else if (op2 >= 0x80 && op2 <= 0x8f) {

insn->type = INSN_JUMP_CONDITIONAL;
--
2.34.1

2023-09-14 14:49:59

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 21/38] x86/fred: No ESPFIX needed when FRED is enabled

From: "H. Peter Anvin (Intel)" <[email protected]>

Because FRED always restores the full value of %rsp, ESPFIX is
no longer needed when it's enabled.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/kernel/espfix_64.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
index 16f9814c9be0..6726e0473d0b 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -106,6 +106,10 @@ void __init init_espfix_bsp(void)
pgd_t *pgd;
p4d_t *p4d;

+ /* FRED systems always restore the full value of %rsp */
+ if (cpu_feature_enabled(X86_FEATURE_FRED))
+ return;
+
/* Install the espfix pud into the kernel page directory */
pgd = &init_top_pgt[pgd_index(ESPFIX_BASE_ADDR)];
p4d = p4d_alloc(&init_mm, pgd, ESPFIX_BASE_ADDR);
@@ -129,6 +133,10 @@ void init_espfix_ap(int cpu)
void *stack_page;
pteval_t ptemask;

+ /* FRED systems always restore the full value of %rsp */
+ if (cpu_feature_enabled(X86_FEATURE_FRED))
+ return;
+
/* We only have to do this once... */
if (likely(per_cpu(espfix_stack, cpu)))
return; /* Already initialized */
--
2.34.1

2023-09-14 14:57:18

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 04/38] x86/entry: Remove idtentry_sysvec from entry_{32,64}.S

idtentry_sysvec is really just DECLARE_IDTENTRY defined in
<asm/idtentry.h>, no need to define it separately.

Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/entry/entry_32.S | 4 ----
arch/x86/entry/entry_64.S | 8 --------
arch/x86/include/asm/idtentry.h | 2 +-
3 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 6e6af42e044a..e0f22ad8ff7e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -649,10 +649,6 @@ SYM_CODE_START_LOCAL(asm_\cfunc)
SYM_CODE_END(asm_\cfunc)
.endm

-.macro idtentry_sysvec vector cfunc
- idtentry \vector asm_\cfunc \cfunc has_error_code=0
-.endm
-
/*
* Include the defines which emit the idt entries which are shared
* shared between 32 and 64 bit and emit the __irqentry_text_* markers
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 43606de22511..9cdb61ea91de 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -432,14 +432,6 @@ SYM_CODE_END(\asmsym)
idtentry \vector asm_\cfunc \cfunc has_error_code=1
.endm

-/*
- * System vectors which invoke their handlers directly and are not
- * going through the regular common device interrupt handling code.
- */
-.macro idtentry_sysvec vector cfunc
- idtentry \vector asm_\cfunc \cfunc has_error_code=0
-.endm
-
/**
* idtentry_mce_db - Macro to generate entry stubs for #MC and #DB
* @vector: Vector number
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 05fd175cec7d..cfca68f6cb84 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -447,7 +447,7 @@ __visible noinstr void func(struct pt_regs *regs, \

/* System vector entries */
#define DECLARE_IDTENTRY_SYSVEC(vector, func) \
- idtentry_sysvec vector func
+ DECLARE_IDTENTRY(vector, func)

#ifdef CONFIG_X86_64
# define DECLARE_IDTENTRY_MCE(vector, func) \
--
2.34.1

2023-09-14 14:59:52

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 36/38] x86/fred: Add fred_syscall_init()

From: "H. Peter Anvin (Intel)" <[email protected]>

Add a syscall initialization function fred_syscall_init() for FRED,
and this is really just to skip setting up SYSCALL/SYSENTER related
MSRs, e.g., MSR_LSTAR and invalidate SYSENTER configurations, because
FRED uses the ring 3 FRED entrypoint for SYSCALL and SYSENTER, and
ERETU is the only legit instruction to return to ring 3 per FRED spec
5.0.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/kernel/cpu/common.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d960b7276008..4cb36e241c9a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2105,6 +2105,23 @@ static inline void idt_syscall_init(void)
X86_EFLAGS_AC|X86_EFLAGS_ID);
}

+static inline void fred_syscall_init(void)
+{
+ /*
+ * Per FRED spec 5.0, FRED uses the ring 3 FRED entrypoint for SYSCALL
+ * and SYSENTER, and ERETU is the only legit instruction to return to
+ * ring 3, as a result there is _no_ need to setup the SYSCALL and
+ * SYSENTER MSRs.
+ *
+ * Note, both sysexit and sysret cause #UD when FRED is enabled.
+ */
+ wrmsrl(MSR_LSTAR, 0ULL);
+ wrmsrl_cstar(0ULL);
+ wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
+ wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
+ wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
+}
+
/* May not be marked __init: used by software suspend */
void syscall_init(void)
{
--
2.34.1

2023-09-14 15:00:08

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 25/38] x86/fred: Add a debug fault entry stub for FRED

From: "H. Peter Anvin (Intel)" <[email protected]>

When occurred on different ring level, i.e., from user or kernel context,
#DB needs to be handled on different stack: User #DB on current task
stack, while kernel #DB on a dedicated stack. This is exactly how FRED
event delivery invokes an exception handler: ring 3 event on level 0
stack, i.e., current task stack; ring 0 event on the #DB dedicated stack
specified in the IA32_FRED_STKLVLS MSR. So unlike IDT, the FRED debug
exception entry stub doesn't do stack switch.

On a FRED system, the debug trap status information (DR6) is passed on
the stack, to avoid the problem of transient state. Furthermore, FRED
transitions avoid a lot of ugly corner cases the handling of which can,
and should be, skipped.

The FRED debug trap status information saved on the stack differs from
DR6 in both stickiness and polarity; it is exactly in the format which
debug_read_clear_dr6() returns for the IDT entry points.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---

Changes since v9:
* Disable #DB to avoid endless recursion and stack overflow when a
watchpoint/breakpoint is set in the code path which is executed by
#DB handler (Thomas Gleixner).

Changes since v1:
* call irqentry_nmi_{enter,exit}() in both IDT and FRED debug fault kernel
handler (Peter Zijlstra).
---
arch/x86/kernel/traps.c | 43 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c876f1d36a81..848c85208a57 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -50,6 +50,7 @@
#include <asm/ftrace.h>
#include <asm/traps.h>
#include <asm/desc.h>
+#include <asm/fred.h>
#include <asm/fpu/api.h>
#include <asm/cpu.h>
#include <asm/cpu_entry_area.h>
@@ -934,8 +935,7 @@ static bool notify_debug(struct pt_regs *regs, unsigned long *dr6)
return false;
}

-static __always_inline void exc_debug_kernel(struct pt_regs *regs,
- unsigned long dr6)
+static noinstr void exc_debug_kernel(struct pt_regs *regs, unsigned long dr6)
{
/*
* Disable breakpoints during exception handling; recursive exceptions
@@ -947,6 +947,11 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
*
* Entry text is excluded for HW_BP_X and cpu_entry_area, which
* includes the entry stack is excluded for everything.
+ *
+ * For FRED, nested #DB should just work fine. But when a watchpoint or
+ * breakpoint is set in the code path which is executed by #DB handler,
+ * it results in an endless recursion and stack overflow. Thus we stay
+ * with the IDT approach, i.e., save DR7 and disable #DB.
*/
unsigned long dr7 = local_db_save();
irqentry_state_t irq_state = irqentry_nmi_enter(regs);
@@ -976,7 +981,8 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
* Catch SYSENTER with TF set and clear DR_STEP. If this hit a
* watchpoint at the same time then that will still be handled.
*/
- if ((dr6 & DR_STEP) && is_sysenter_singlestep(regs))
+ if (!cpu_feature_enabled(X86_FEATURE_FRED) &&
+ (dr6 & DR_STEP) && is_sysenter_singlestep(regs))
dr6 &= ~DR_STEP;

/*
@@ -1008,8 +1014,7 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
local_db_restore(dr7);
}

-static __always_inline void exc_debug_user(struct pt_regs *regs,
- unsigned long dr6)
+static noinstr void exc_debug_user(struct pt_regs *regs, unsigned long dr6)
{
bool icebp;

@@ -1093,6 +1098,34 @@ DEFINE_IDTENTRY_DEBUG_USER(exc_debug)
{
exc_debug_user(regs, debug_read_clear_dr6());
}
+
+#ifdef CONFIG_X86_FRED
+/*
+ * When occurred on different ring level, i.e., from user or kernel
+ * context, #DB needs to be handled on different stack: User #DB on
+ * current task stack, while kernel #DB on a dedicated stack.
+ *
+ * This is exactly how FRED event delivery invokes an exception
+ * handler: ring 3 event on level 0 stack, i.e., current task stack;
+ * ring 0 event on the #DB dedicated stack specified in the
+ * IA32_FRED_STKLVLS MSR. So unlike IDT, the FRED debug exception
+ * entry stub doesn't do stack switch.
+ */
+DEFINE_FREDENTRY_DEBUG(exc_debug)
+{
+ /*
+ * FRED #DB stores DR6 on the stack in the format which
+ * debug_read_clear_dr6() returns for the IDT entry points.
+ */
+ unsigned long dr6 = fred_event_data(regs);
+
+ if (user_mode(regs))
+ exc_debug_user(regs, dr6);
+ else
+ exc_debug_kernel(regs, dr6);
+}
+#endif /* CONFIG_X86_FRED */
+
#else
/* 32 bit does not have separate entry points. */
DEFINE_IDTENTRY_RAW(exc_debug)
--
2.34.1

2023-09-14 15:02:54

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 20/38] x86/fred: Disallow the swapgs instruction when FRED is enabled

From: "H. Peter Anvin (Intel)" <[email protected]>

SWAPGS is no longer needed thus NOT allowed with FRED because FRED
transitions ensure that an operating system can _always_ operate
with its own GS base address:
- For events that occur in ring 3, FRED event delivery swaps the GS
base address with the IA32_KERNEL_GS_BASE MSR.
- ERETU (the FRED transition that returns to ring 3) also swaps the
GS base address with the IA32_KERNEL_GS_BASE MSR.

And the operating system can still setup the GS segment for a user
thread without the need of loading a user thread GS with:
- Using LKGS, available with FRED, to modify other attributes of the
GS segment without compromising its ability always to operate with
its own GS base address.
- Accessing the GS segment base address for a user thread as before
using RDMSR or WRMSR on the IA32_KERNEL_GS_BASE MSR.

Note, LKGS loads the GS base address into the IA32_KERNEL_GS_BASE MSR
instead of the GS segment’s descriptor cache. As such, the operating
system never changes its runtime GS base address.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---

Changes since v8:
* Explain why writing directly to the IA32_KERNEL_GS_BASE MSR is
doing the right thing (Thomas Gleixner).
---
arch/x86/kernel/process_64.c | 27 +++++++++++++++++++++++++--
1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 0f78b58021bb..4f87f5987ae8 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -166,7 +166,29 @@ static noinstr unsigned long __rdgsbase_inactive(void)

lockdep_assert_irqs_disabled();

- if (!cpu_feature_enabled(X86_FEATURE_XENPV)) {
+ /*
+ * SWAPGS is no longer needed thus NOT allowed with FRED because
+ * FRED transitions ensure that an operating system can _always_
+ * operate with its own GS base address:
+ * - For events that occur in ring 3, FRED event delivery swaps
+ * the GS base address with the IA32_KERNEL_GS_BASE MSR.
+ * - ERETU (the FRED transition that returns to ring 3) also swaps
+ * the GS base address with the IA32_KERNEL_GS_BASE MSR.
+ *
+ * And the operating system can still setup the GS segment for a
+ * user thread without the need of loading a user thread GS with:
+ * - Using LKGS, available with FRED, to modify other attributes
+ * of the GS segment without compromising its ability always to
+ * operate with its own GS base address.
+ * - Accessing the GS segment base address for a user thread as
+ * before using RDMSR or WRMSR on the IA32_KERNEL_GS_BASE MSR.
+ *
+ * Note, LKGS loads the GS base address into the IA32_KERNEL_GS_BASE
+ * MSR instead of the GS segment’s descriptor cache. As such, the
+ * operating system never changes its runtime GS base address.
+ */
+ if (!cpu_feature_enabled(X86_FEATURE_FRED) &&
+ !cpu_feature_enabled(X86_FEATURE_XENPV)) {
native_swapgs();
gsbase = rdgsbase();
native_swapgs();
@@ -191,7 +213,8 @@ static noinstr void __wrgsbase_inactive(unsigned long gsbase)
{
lockdep_assert_irqs_disabled();

- if (!cpu_feature_enabled(X86_FEATURE_XENPV)) {
+ if (!cpu_feature_enabled(X86_FEATURE_FRED) &&
+ !cpu_feature_enabled(X86_FEATURE_XENPV)) {
native_swapgs();
wrgsbase(gsbase);
native_swapgs();
--
2.34.1

2023-09-14 15:11:18

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 38/38] x86/fred: Invoke FRED initialization code to enable FRED

From: "H. Peter Anvin (Intel)" <[email protected]>

Let cpu_init_exception_handling() call cpu_init_fred_exceptions() to
initialize FRED. However if FRED is unavailable or disabled, it falls
back to set up TSS IST and initialize IDT.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Co-developed-by: Xin Li <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---

Changes since v8:
* Move this patch after all required changes are in place (Thomas
Gleixner).
---
arch/x86/kernel/cpu/common.c | 17 ++++++++++++-----
arch/x86/kernel/irqinit.c | 7 ++++++-
arch/x86/kernel/traps.c | 5 ++++-
3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4cb36e241c9a..e230d3f4c556 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -61,6 +61,7 @@
#include <asm/microcode.h>
#include <asm/intel-family.h>
#include <asm/cpu_device_id.h>
+#include <asm/fred.h>
#include <asm/uv/uv.h>
#include <asm/set_memory.h>
#include <asm/traps.h>
@@ -2128,7 +2129,10 @@ void syscall_init(void)
/* The default user and kernel segments */
wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);

- idt_syscall_init();
+ if (cpu_feature_enabled(X86_FEATURE_FRED))
+ fred_syscall_init();
+ else
+ idt_syscall_init();
}

#else /* CONFIG_X86_64 */
@@ -2244,8 +2248,9 @@ void cpu_init_exception_handling(void)
/* paranoid_entry() gets the CPU number from the GDT */
setup_getcpu(cpu);

- /* IST vectors need TSS to be set up. */
- tss_setup_ist(tss);
+ /* For IDT mode, IST vectors need to be set in TSS. */
+ if (!cpu_feature_enabled(X86_FEATURE_FRED))
+ tss_setup_ist(tss);
tss_setup_io_bitmap(tss);
set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);

@@ -2254,8 +2259,10 @@ void cpu_init_exception_handling(void)
/* GHCB needs to be setup to handle #VC. */
setup_ghcb();

- /* Finally load the IDT */
- load_current_idt();
+ if (cpu_feature_enabled(X86_FEATURE_FRED))
+ cpu_init_fred_exceptions();
+ else
+ load_current_idt();
}

/*
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index c683666876f1..f79c5edc0b89 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -28,6 +28,7 @@
#include <asm/setup.h>
#include <asm/i8259.h>
#include <asm/traps.h>
+#include <asm/fred.h>
#include <asm/prom.h>

/*
@@ -96,7 +97,11 @@ void __init native_init_IRQ(void)
/* Execute any quirks before the call gates are initialised: */
x86_init.irqs.pre_vector_init();

- idt_setup_apic_and_irq_gates();
+ if (cpu_feature_enabled(X86_FEATURE_FRED))
+ fred_complete_exception_setup();
+ else
+ idt_setup_apic_and_irq_gates();
+
lapic_assign_system_vectors();

if (!acpi_ioapic && !of_ioapic && nr_legacy_irqs()) {
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 848c85208a57..0ee78a30e14a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1411,7 +1411,10 @@ void __init trap_init(void)

/* Initialize TSS before setting up traps so ISTs work */
cpu_init_exception_handling();
+
/* Setup traps as cpu_init() might #GP */
- idt_setup_traps();
+ if (!cpu_feature_enabled(X86_FEATURE_FRED))
+ idt_setup_traps();
+
cpu_init();
}
--
2.34.1

2023-09-14 15:32:08

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 15/38] x86/ptrace: Cleanup the definition of the pt_regs structure

struct pt_regs is hard to read because the member or section related
comments are not aligned with the members.

The 'cs' and 'ss' members of pt_regs are type of 'unsigned long' while
in reality they are only 16-bit wide. This works so far as the
remaining space is unused, but FRED will use the remaining bits for
other purposes.

To prepare for FRED:

- Cleanup the formatting
- Convert 'cs' and 'ss' to u16 and embed them into an union
with a u64
- Fixup the related printk() format strings

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 2 +-
arch/x86/include/asm/ptrace.h | 44 +++++++++++++++++++--------
arch/x86/kernel/process_64.c | 2 +-
3 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index e0ca8120aea8..a3c0df11d0e6 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -76,7 +76,7 @@ static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
if (!show_unhandled_signals)
return;

- printk_ratelimited("%s%s[%d] %s ip:%lx cs:%lx sp:%lx ax:%lx si:%lx di:%lx\n",
+ printk_ratelimited("%s%s[%d] %s ip:%lx cs:%x sp:%lx ax:%lx si:%lx di:%lx\n",
level, current->comm, task_pid_nr(current),
message, regs->ip, regs->cs,
regs->sp, regs->ax, regs->si, regs->di);
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index f4db78b09c8f..f08ea073edd6 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -57,17 +57,19 @@ struct pt_regs {
#else /* __i386__ */

struct pt_regs {
-/*
- * C ABI says these regs are callee-preserved. They aren't saved on kernel entry
- * unless syscall needs a complete, fully filled "struct pt_regs".
- */
+ /*
+ * C ABI says these regs are callee-preserved. They aren't saved on
+ * kernel entry unless syscall needs a complete, fully filled
+ * "struct pt_regs".
+ */
unsigned long r15;
unsigned long r14;
unsigned long r13;
unsigned long r12;
unsigned long bp;
unsigned long bx;
-/* These regs are callee-clobbered. Always saved on kernel entry. */
+
+ /* These regs are callee-clobbered. Always saved on kernel entry. */
unsigned long r11;
unsigned long r10;
unsigned long r9;
@@ -77,18 +79,34 @@ struct pt_regs {
unsigned long dx;
unsigned long si;
unsigned long di;
-/*
- * On syscall entry, this is syscall#. On CPU exception, this is error code.
- * On hw interrupt, it's IRQ number:
- */
+
+ /*
+ * orig_ax is used on entry for:
+ * - the syscall number (syscall, sysenter, int80)
+ * - error_code stored by the CPU on traps and exceptions
+ * - the interrupt number for device interrupts
+ */
unsigned long orig_ax;
-/* Return frame for iretq */
+
+ /* The IRETQ return frame starts here */
unsigned long ip;
- unsigned long cs;
+
+ union {
+ u64 csx; // The full 64-bit data slot containing CS
+ u16 cs; // CS selector
+ };
+
unsigned long flags;
unsigned long sp;
- unsigned long ss;
-/* top of stack page */
+
+ union {
+ u64 ssx; // The full 64-bit data slot containing SS
+ u16 ss; // SS selector
+ };
+
+ /*
+ * Top of stack on IDT systems.
+ */
};

#endif /* !__i386__ */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 33b268747bb7..0f78b58021bb 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -117,7 +117,7 @@ void __show_regs(struct pt_regs *regs, enum show_regs_mode mode,

printk("%sFS: %016lx(%04x) GS:%016lx(%04x) knlGS:%016lx\n",
log_lvl, fs, fsindex, gs, gsindex, shadowgs);
- printk("%sCS: %04lx DS: %04x ES: %04x CR0: %016lx\n",
+ printk("%sCS: %04x DS: %04x ES: %04x CR0: %016lx\n",
log_lvl, regs->cs, ds, es, cr0);
printk("%sCR2: %016lx CR3: %016lx CR4: %016lx\n",
log_lvl, cr2, cr3, cr4);
--
2.34.1

2023-09-14 16:05:22

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 07/38] x86/fred: Add Kconfig option for FRED (CONFIG_X86_FRED)

From: "H. Peter Anvin (Intel)" <[email protected]>

Add the configuration option CONFIG_X86_FRED to enable FRED.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/Kconfig | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3b3594f96330..cae126417427 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -496,6 +496,15 @@ config X86_CPU_RESCTRL

Say N if unsure.

+config X86_FRED
+ bool "Flexible Return and Event Delivery"
+ depends on X86_64
+ help
+ When enabled, try to use Flexible Return and Event Delivery
+ instead of the legacy SYSCALL/SYSENTER/IDT architecture for
+ ring transitions and exception/interrupt handling if the
+ system supports.
+
if X86_32
config X86_BIGSMP
bool "Support for big SMP systems with more than 8 CPUs"
--
2.34.1

2023-09-14 16:10:45

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 30/38] x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled

From: "H. Peter Anvin (Intel)" <[email protected]>

Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled,
otherwise the existing IDT code is chosen.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/entry/entry_64.S | 6 ++++++
arch/x86/entry/entry_64_fred.S | 1 +
2 files changed, 7 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9cdb61ea91de..c9e14617dd3f 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -309,7 +309,13 @@ SYM_CODE_START(ret_from_fork_asm)
* and unwind should work normally.
*/
UNWIND_HINT_REGS
+
+#ifdef CONFIG_X86_FRED
+ ALTERNATIVE "jmp swapgs_restore_regs_and_return_to_usermode", \
+ "jmp asm_fred_exit_user", X86_FEATURE_FRED
+#else
jmp swapgs_restore_regs_and_return_to_usermode
+#endif
SYM_CODE_END(ret_from_fork_asm)
.popsection

diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 37a1dd5e8ace..5781c3411b44 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -32,6 +32,7 @@
SYM_CODE_START_NOALIGN(asm_fred_entrypoint_user)
FRED_ENTER
call fred_entry_from_user
+SYM_INNER_LABEL(asm_fred_exit_user, SYM_L_GLOBAL)
FRED_EXIT
ERETU
SYM_CODE_END(asm_fred_entrypoint_user)
--
2.34.1

2023-09-14 17:25:45

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 13/38] x86/cpu: Add X86_CR4_FRED macro

From: "H. Peter Anvin (Intel)" <[email protected]>

Add X86_CR4_FRED macro for the FRED bit in %cr4. This bit must not be
changed after initialization, so add it to the pinned CR4 bits.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---

Changes since v9:
* Avoid a type cast by defining X86_CR4_FRED as 0 on 32-bit (Thomas
Gleixner).
---
arch/x86/include/uapi/asm/processor-flags.h | 7 +++++++
arch/x86/kernel/cpu/common.c | 5 ++---
2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index d898432947ff..f1a4adc78272 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -139,6 +139,13 @@
#define X86_CR4_LAM_SUP_BIT 28 /* LAM for supervisor pointers */
#define X86_CR4_LAM_SUP _BITUL(X86_CR4_LAM_SUP_BIT)

+#ifdef __x86_64__
+#define X86_CR4_FRED_BIT 32 /* enable FRED kernel entry */
+#define X86_CR4_FRED _BITUL(X86_CR4_FRED_BIT)
+#else
+#define X86_CR4_FRED (0)
+#endif
+
/*
* x86-64 Task Priority Register, CR8
*/
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 317b4877e9c7..42511209469b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -400,9 +400,8 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
}

/* These bits should not change their value after CPU init is finished. */
-static const unsigned long cr4_pinned_mask =
- X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
- X86_CR4_FSGSBASE | X86_CR4_CET;
+static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
+ X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED;
static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
static unsigned long cr4_pinned_bits __ro_after_init;

--
2.34.1

2023-09-14 19:46:03

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 11/38] x86/opcode: Add ERET[US] instructions to the x86 opcode map

From: "H. Peter Anvin (Intel)" <[email protected]>

ERETU returns from an event handler while making a transition to ring 3,
and ERETS returns from an event handler while staying in ring 0.

Add instruction opcodes used by ERET[US] to the x86 opcode map; opcode
numbers are per FRED spec v5.0.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 2 +-
tools/arch/x86/lib/x86-opcode-map.txt | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 1efe1d9bf5ce..12af572201a2 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -1052,7 +1052,7 @@ EndTable

GrpTable: Grp7
0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B)
-1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B)
+1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) | ERETU (F3),(010),(11B) | ERETS (F2),(010),(11B)
2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
3: LIDT Ms
4: SMSW Mw/Rv
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 1efe1d9bf5ce..12af572201a2 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -1052,7 +1052,7 @@ EndTable

GrpTable: Grp7
0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B)
-1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B)
+1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) | ERETU (F3),(010),(11B) | ERETS (F2),(010),(11B)
2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
3: LIDT Ms
4: SMSW Mw/Rv
--
2.34.1

2023-09-14 19:48:39

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 26/38] x86/fred: Add a NMI entry stub for FRED

From: "H. Peter Anvin (Intel)" <[email protected]>

On a FRED system, NMIs nest both with themselves and faults, transient
information is saved into the stack frame, and NMI unblocking only
happens when the stack frame indicates that so should happen.

Thus, the NMI entry stub for FRED is really quite small...

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/kernel/nmi.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index a0c551846b35..58843fdf5cd0 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -34,6 +34,7 @@
#include <asm/cache.h>
#include <asm/nospec-branch.h>
#include <asm/sev.h>
+#include <asm/fred.h>

#define CREATE_TRACE_POINTS
#include <trace/events/nmi.h>
@@ -643,6 +644,33 @@ void nmi_backtrace_stall_check(const struct cpumask *btp)

#endif

+#ifdef CONFIG_X86_FRED
+/*
+ * With FRED, CR2/DR6 is pushed to #PF/#DB stack frame during FRED
+ * event delivery, i.e., there is no problem of transient states.
+ * And NMI unblocking only happens when the stack frame indicates
+ * that so should happen.
+ *
+ * Thus, the NMI entry stub for FRED is really straightforward and
+ * as simple as most exception handlers. As such, #DB is allowed
+ * during NMI handling.
+ */
+DEFINE_FREDENTRY_NMI(exc_nmi)
+{
+ irqentry_state_t irq_state;
+
+ if (IS_ENABLED(CONFIG_SMP) && arch_cpu_is_offline(smp_processor_id()))
+ return;
+
+ irq_state = irqentry_nmi_enter(regs);
+
+ inc_irq_stat(__nmi_count);
+ default_do_nmi(regs);
+
+ irqentry_nmi_exit(regs, irq_state);
+}
+#endif
+
void stop_nmi(void)
{
ignore_nmis++;
--
2.34.1

2023-09-14 22:06:54

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 08/38] x86/cpufeatures: Add the cpu feature bit for FRED

From: "H. Peter Anvin (Intel)" <[email protected]>

Any FRED CPU will always have the following features as its baseline:
1) LKGS, load attributes of the GS segment but the base address into
the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor
cache.
2) WRMSRNS, non-serializing WRMSR for faster MSR writes.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 2 ++
tools/arch/x86/include/asm/cpufeatures.h | 1 +
3 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 330876d34b68..57ae93dc1e52 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -321,6 +321,7 @@
#define X86_FEATURE_FZRM (12*32+10) /* "" Fast zero-length REP MOVSB */
#define X86_FEATURE_FSRS (12*32+11) /* "" Fast short REP STOSB */
#define X86_FEATURE_FSRC (12*32+12) /* "" Fast short REP {CMPSB,SCASB} */
+#define X86_FEATURE_FRED (12*32+17) /* Flexible Return and Event Delivery */
#define X86_FEATURE_LKGS (12*32+18) /* "" Load "kernel" (userspace) GS */
#define X86_FEATURE_WRMSRNS (12*32+19) /* "" Non-Serializing Write to Model Specific Register instruction */
#define X86_FEATURE_AMX_FP16 (12*32+21) /* "" AMX fp16 Support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index e462c1d3800a..b7174209d855 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -82,6 +82,8 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_XFD, X86_FEATURE_XGETBV1 },
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
{ X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
+ { X86_FEATURE_FRED, X86_FEATURE_LKGS },
+ { X86_FEATURE_FRED, X86_FEATURE_WRMSRNS },
{}
};

diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index 1b9d86ba5bc2..18bab7987d7f 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -317,6 +317,7 @@
#define X86_FEATURE_FZRM (12*32+10) /* "" Fast zero-length REP MOVSB */
#define X86_FEATURE_FSRS (12*32+11) /* "" Fast short REP STOSB */
#define X86_FEATURE_FSRC (12*32+12) /* "" Fast short REP {CMPSB,SCASB} */
+#define X86_FEATURE_FRED (12*32+17) /* Flexible Return and Event Delivery */
#define X86_FEATURE_LKGS (12*32+18) /* "" Load "kernel" (userspace) GS */
#define X86_FEATURE_WRMSRNS (12*32+19) /* "" Non-Serializing Write to Model Specific Register instruction */
#define X86_FEATURE_AMX_FP16 (12*32+21) /* "" AMX fp16 Support */
--
2.34.1

2023-09-14 22:25:48

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 23/38] x86/fred: Make exc_page_fault() work for FRED

From: "H. Peter Anvin (Intel)" <[email protected]>

On a FRED system, the faulting address (CR2) is passed on the stack,
to avoid the problem of transient state. Thus we get the page fault
address from the stack instead of CR2.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/mm/fault.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index ab778eac1952..7675bc067153 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -34,6 +34,7 @@
#include <asm/kvm_para.h> /* kvm_handle_async_pf */
#include <asm/vdso.h> /* fixup_vdso_exception() */
#include <asm/irq_stack.h>
+#include <asm/fred.h>

#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -1516,8 +1517,10 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,

DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
{
- unsigned long address = read_cr2();
irqentry_state_t state;
+ unsigned long address;
+
+ address = cpu_feature_enabled(X86_FEATURE_FRED) ? fred_event_data(regs) : read_cr2();

prefetchw(&current->mm->mmap_lock);

--
2.34.1

2023-09-15 04:44:12

by Li, Xin3

[permalink] [raw]
Subject: [PATCH v10 17/38] x86/fred: Add a new header file for FRED definitions

From: "H. Peter Anvin (Intel)" <[email protected]>

Add a header file for FRED prototypes and definitions.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---

Changes since v6:
* Replace pt_regs csx flags prefix FRED_CSL_ with FRED_CSX_.
---
arch/x86/include/asm/fred.h | 68 +++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)
create mode 100644 arch/x86/include/asm/fred.h

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
new file mode 100644
index 000000000000..f514fdb5a39f
--- /dev/null
+++ b/arch/x86/include/asm/fred.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Macros for Flexible Return and Event Delivery (FRED)
+ */
+
+#ifndef ASM_X86_FRED_H
+#define ASM_X86_FRED_H
+
+#include <linux/const.h>
+
+#include <asm/asm.h>
+
+/*
+ * FRED event return instruction opcodes for ERET{S,U}; supported in
+ * binutils >= 2.41.
+ */
+#define ERETS _ASM_BYTES(0xf2,0x0f,0x01,0xca)
+#define ERETU _ASM_BYTES(0xf3,0x0f,0x01,0xca)
+
+/*
+ * RSP is aligned to a 64-byte boundary before used to push a new stack frame
+ */
+#define FRED_STACK_FRAME_RSP_MASK _AT(unsigned long, (~0x3f))
+
+/*
+ * Used for the return address for call emulation during code patching,
+ * and measured in 64-byte cache lines.
+ */
+#define FRED_CONFIG_REDZONE_AMOUNT 1
+#define FRED_CONFIG_REDZONE (_AT(unsigned long, FRED_CONFIG_REDZONE_AMOUNT) << 6)
+#define FRED_CONFIG_INT_STKLVL(l) (_AT(unsigned long, l) << 9)
+#define FRED_CONFIG_ENTRYPOINT(p) _AT(unsigned long, (p))
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_X86_FRED
+#include <linux/kernel.h>
+
+#include <asm/ptrace.h>
+
+struct fred_info {
+ /* Event data: CR2, DR6, ... */
+ unsigned long edata;
+ unsigned long resv;
+};
+
+/* Full format of the FRED stack frame */
+struct fred_frame {
+ struct pt_regs regs;
+ struct fred_info info;
+};
+
+static __always_inline struct fred_info *fred_info(struct pt_regs *regs)
+{
+ return &container_of(regs, struct fred_frame, regs)->info;
+}
+
+static __always_inline unsigned long fred_event_data(struct pt_regs *regs)
+{
+ return fred_info(regs)->edata;
+}
+
+#else /* CONFIG_X86_FRED */
+static __always_inline unsigned long fred_event_data(struct pt_regs *regs) { return 0; }
+#endif /* CONFIG_X86_FRED */
+#endif /* !__ASSEMBLY__ */
+
+#endif /* ASM_X86_FRED_H */
--
2.34.1

2023-09-20 07:43:00

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v10 36/38] x86/fred: Add fred_syscall_init()

On Wed, Sep 13 2023 at 21:48, Xin Li wrote:
> +static inline void fred_syscall_init(void)
> +{
> + /*
> + * Per FRED spec 5.0, FRED uses the ring 3 FRED entrypoint for SYSCALL
> + * and SYSENTER, and ERETU is the only legit instruction to return to
> + * ring 3, as a result there is _no_ need to setup the SYSCALL and
> + * SYSENTER MSRs.
> + *
> + * Note, both sysexit and sysret cause #UD when FRED is enabled.
> + */
> + wrmsrl(MSR_LSTAR, 0ULL);
> + wrmsrl_cstar(0ULL);

That write is pointless. See the comment in wrmsrl_cstar().

Thanks,

tglx

2023-09-20 12:41:19

by Li, Xin3

[permalink] [raw]
Subject: RE: [PATCH v10 36/38] x86/fred: Add fred_syscall_init()

> > +static inline void fred_syscall_init(void) {
> > + /*
> > + * Per FRED spec 5.0, FRED uses the ring 3 FRED entrypoint for SYSCALL
> > + * and SYSENTER, and ERETU is the only legit instruction to return to
> > + * ring 3, as a result there is _no_ need to setup the SYSCALL and
> > + * SYSENTER MSRs.
> > + *
> > + * Note, both sysexit and sysret cause #UD when FRED is enabled.
> > + */
> > + wrmsrl(MSR_LSTAR, 0ULL);
> > + wrmsrl_cstar(0ULL);
>
> That write is pointless. See the comment in wrmsrl_cstar().

What I heard is that AMD is going to support FRED.

Both LSTAR and CSTAR have no function when FRED is enabled, so maybe
just do NOT write to them?

Thanks!
Xin

2023-09-20 18:07:13

by Thomas Gleixner

[permalink] [raw]
Subject: RE: [PATCH v10 36/38] x86/fred: Add fred_syscall_init()

On Wed, Sep 20 2023 at 04:33, Li, Xin3 wrote:
>> > +static inline void fred_syscall_init(void) {
>> > + /*
>> > + * Per FRED spec 5.0, FRED uses the ring 3 FRED entrypoint for SYSCALL
>> > + * and SYSENTER, and ERETU is the only legit instruction to return to
>> > + * ring 3, as a result there is _no_ need to setup the SYSCALL and
>> > + * SYSENTER MSRs.
>> > + *
>> > + * Note, both sysexit and sysret cause #UD when FRED is enabled.
>> > + */
>> > + wrmsrl(MSR_LSTAR, 0ULL);
>> > + wrmsrl_cstar(0ULL);
>>
>> That write is pointless. See the comment in wrmsrl_cstar().
>
> What I heard is that AMD is going to support FRED.
>
> Both LSTAR and CSTAR have no function when FRED is enabled, so maybe
> just do NOT write to them?

Right. If AMD needs to clear it then it's trivial enough to add a
wrmsrl_cstar(0) to it.

2023-09-20 20:04:27

by Nikolay Borisov

[permalink] [raw]
Subject: Re: [PATCH v10 09/38] x86/fred: Disable FRED support if CONFIG_X86_FRED is disabled



On 14.09.23 г. 7:47 ч., Xin Li wrote:
> From: "H. Peter Anvin (Intel)" <[email protected]>
>
> Add CONFIG_X86_FRED to <asm/disabled-features.h> to make
> cpu_feature_enabled() work correctly with FRED.
>
> Originally-by: Megha Dey <[email protected]>
> Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
> Tested-by: Shan Kang <[email protected]>
> Signed-off-by: Xin Li <[email protected]>
> ---
> arch/x86/include/asm/disabled-features.h | 8 +++++++-
> tools/arch/x86/include/asm/disabled-features.h | 8 +++++++-
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 702d93fdd10e..3cde57cb5093 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -117,6 +117,12 @@
> #define DISABLE_IBT (1 << (X86_FEATURE_IBT & 31))
> #endif
>
> +#ifdef CONFIG_X86_FRED
> +# define DISABLE_FRED 0
> +#else
> +# define DISABLE_FRED (1 << (X86_FEATURE_FRED & 31))
> +#endif
> +
> /*
> * Make sure to add features to the correct mask
> */
> @@ -134,7 +140,7 @@
> #define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
> DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
> #define DISABLED_MASK12 (DISABLE_LAM)
> -#define DISABLED_MASK13 0
> +#define DISABLED_MASK13 (DISABLE_FRED)
> #define DISABLED_MASK14 0
> #define DISABLED_MASK15 0
> #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
> diff --git a/tools/arch/x86/include/asm/disabled-features.h b/tools/arch/x86/include/asm/disabled-features.h
> index fafe9be7a6f4..d540ecdd8812 100644
> --- a/tools/arch/x86/include/asm/disabled-features.h
> +++ b/tools/arch/x86/include/asm/disabled-features.h
> @@ -105,6 +105,12 @@
> # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
> #endif
>
> +#ifdef CONFIG_X86_FRED
> +# define DISABLE_FRED 0
> +#else
> +# define DISABLE_FRED (1 << (X86_FEATURE_FRED & 31))
> +#endif
> +
> /*
> * Make sure to add features to the correct mask
> */
> @@ -122,7 +128,7 @@
> #define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
> DISABLE_CALL_DEPTH_TRACKING)
> #define DISABLED_MASK12 (DISABLE_LAM)
> -#define DISABLED_MASK13 0
> +#define DISABLED_MASK13 (DISABLE_FRED)

FRED feature is defined in cpuid word 12, not 13

> #define DISABLED_MASK14 0
> #define DISABLED_MASK15 0
> #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \

2023-09-20 20:20:53

by Nikolay Borisov

[permalink] [raw]
Subject: Re: [PATCH v10 13/38] x86/cpu: Add X86_CR4_FRED macro



On 14.09.23 г. 7:47 ч., Xin Li wrote:
> From: "H. Peter Anvin (Intel)" <[email protected]>
>
> Add X86_CR4_FRED macro for the FRED bit in %cr4. This bit must not be
> changed after initialization, so add it to the pinned CR4 bits.
>
> Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
> Tested-by: Shan Kang <[email protected]>
> Signed-off-by: Xin Li <[email protected]>
> ---
>
> Changes since v9:
> * Avoid a type cast by defining X86_CR4_FRED as 0 on 32-bit (Thomas
> Gleixner).
> ---
> arch/x86/include/uapi/asm/processor-flags.h | 7 +++++++
> arch/x86/kernel/cpu/common.c | 5 ++---
> 2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
> index d898432947ff..f1a4adc78272 100644
> --- a/arch/x86/include/uapi/asm/processor-flags.h
> +++ b/arch/x86/include/uapi/asm/processor-flags.h
> @@ -139,6 +139,13 @@
> #define X86_CR4_LAM_SUP_BIT 28 /* LAM for supervisor pointers */
> #define X86_CR4_LAM_SUP _BITUL(X86_CR4_LAM_SUP_BIT)
>
> +#ifdef __x86_64__
> +#define X86_CR4_FRED_BIT 32 /* enable FRED kernel entry */
> +#define X86_CR4_FRED _BITUL(X86_CR4_FRED_BIT)

nit: s/BITUL/BITULL I guess if __x86_64__ is defined then we are
guaranteed that unsigned long will be a 64 bit, but for the sake of
clarity I'd rather have this spelled out explicitly by using BITULL


> +#else
> +#define X86_CR4_FRED (0)
> +#endif
> +
> /*
> * x86-64 Task Priority Register, CR8
> */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 317b4877e9c7..42511209469b 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -400,9 +400,8 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
> }
>
> /* These bits should not change their value after CPU init is finished. */
> -static const unsigned long cr4_pinned_mask =
> - X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
> - X86_CR4_FSGSBASE | X86_CR4_CET;
> +static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
> + X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED;
> static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
> static unsigned long cr4_pinned_bits __ro_after_init;
>

2023-09-21 05:31:57

by H. Peter Anvin

[permalink] [raw]
Subject: RE: [PATCH v10 36/38] x86/fred: Add fred_syscall_init()

On September 20, 2023 1:18:14 AM PDT, Thomas Gleixner <[email protected]> wrote:
>On Wed, Sep 20 2023 at 04:33, Li, Xin3 wrote:
>>> > +static inline void fred_syscall_init(void) {
>>> > + /*
>>> > + * Per FRED spec 5.0, FRED uses the ring 3 FRED entrypoint for SYSCALL
>>> > + * and SYSENTER, and ERETU is the only legit instruction to return to
>>> > + * ring 3, as a result there is _no_ need to setup the SYSCALL and
>>> > + * SYSENTER MSRs.
>>> > + *
>>> > + * Note, both sysexit and sysret cause #UD when FRED is enabled.
>>> > + */
>>> > + wrmsrl(MSR_LSTAR, 0ULL);
>>> > + wrmsrl_cstar(0ULL);
>>>
>>> That write is pointless. See the comment in wrmsrl_cstar().
>>
>> What I heard is that AMD is going to support FRED.
>>
>> Both LSTAR and CSTAR have no function when FRED is enabled, so maybe
>> just do NOT write to them?
>
>Right. If AMD needs to clear it then it's trivial enough to add a
>wrmsrl_cstar(0) to it.

Just to clarify: the only reason I added the writes here was to possibly make bugs easier to track down. There is indeed no functional reason.

2023-09-21 06:41:48

by Li, Xin3

[permalink] [raw]
Subject: RE: [PATCH v10 13/38] x86/cpu: Add X86_CR4_FRED macro

> > +#ifdef __x86_64__
> > +#define X86_CR4_FRED_BIT 32 /* enable FRED kernel entry */
> > +#define X86_CR4_FRED _BITUL(X86_CR4_FRED_BIT)
>
> nit: s/BITUL/BITULL I guess if __x86_64__ is defined then we are
> guaranteed that unsigned long will be a 64 bit, but for the sake of
> clarity I'd rather have this spelled out explicitly by using BITULL
>

UL is better because CR4 is a machine word.

>
>
> > +#else
> > +#define X86_CR4_FRED (0)
> > +#endif