2022-12-20 07:23:16

by Li, Xin3

Subject: [RFC PATCH 00/32] x86: enable FRED for x86-64

This patch set enables FRED for x86-64, and it's based on the previous LKGS
patch set.

The Intel flexible return and event delivery (FRED) architecture defines simple
new transitions that change privilege level (ring transitions). The FRED
architecture was designed with the following goals:
1) Improve overall performance and response time by replacing event delivery
through the interrupt descriptor table (IDT event delivery) and event return by
the IRET instruction with lower latency transitions.
2) Improve software robustness by ensuring that event delivery establishes the
full supervisor context and that event return establishes the full user context.

The new transitions defined by the FRED architecture are FRED event delivery and,
for returning from events, two FRED return instructions. FRED event delivery can
effect a transition from ring 3 to ring 0, but it is also used to deliver events
incident to ring 0. One FRED return instruction (ERETU) effects a return from
ring 0 to ring 3, while the other (ERETS) returns while remaining in ring 0.

In addition to these transitions, the FRED architecture defines a new instruction
(LKGS) for managing the state of the GS segment register. The LKGS instruction
can be used by 64-bit operating systems that do not use the new FRED transitions.

The Intel FRED architecture spec can be downloaded from:
https://cdrdv2.intel.com/v1/dl/getContent/678938

As of now no publicly available CPU supports FRED, so the Intel Simics®
Simulator is used as the software development and testing vehicle.
To enable FRED, the Simics package 8112 QSP-CPU needs to be installed with the
CPU model configured as:
$cpu_comp_class = "x86-experimental-fred"

Longer term, we should move the code shared by FRED and IDT into common files,
and put IDT-only code under a new config option, CONFIG_X86_IDT.

TODO: call exc_raise_irq() to reinject IRQ in KVM VMX.

H. Peter Anvin (Intel) (24):
x86/traps: let common_interrupt() handle IRQ_MOVE_CLEANUP_VECTOR
x86/traps: add a system interrupt table for system interrupt dispatch
x86/traps: add external_interrupt() to dispatch external interrupts
x86/cpufeature: add the cpu feature bit for FRED
x86/opcode: add ERETU, ERETS instructions to x86-opcode-map
x86/objtool: teach objtool about ERETU and ERETS
x86/cpu: add X86_CR4_FRED macro
x86/fred: add Kconfig option for FRED (CONFIG_X86_FRED)
x86/fred: if CONFIG_X86_FRED is disabled, disable FRED support
x86/cpu: add MSR numbers for FRED configuration
x86/fred: header file with FRED definitions
x86/fred: make unions for the cs and ss fields in struct pt_regs
x86/fred: reserve space for the FRED stack frame
x86/fred: add a page fault entry stub for FRED
x86/fred: add a debug fault entry stub for FRED
x86/fred: add a NMI entry stub for FRED
x86/fred: FRED entry/exit and dispatch code
x86/fred: FRED initialization code
x86/fred: update MSR_IA32_FRED_RSP0 during task switch
x86/fred: let ret_from_fork() jmp to fred_exit_user when FRED is enabled
x86/fred: disallow the swapgs instruction when FRED is enabled
x86/fred: no ESPFIX needed when FRED is enabled
x86/fred: allow single-step trap and NMI when starting a new thread
x86/fred: allow FRED systems to use interrupt vectors 0x10-0x1f

Xin Li (8):
x86/traps: add install_system_interrupt_handler()
x86/traps: add exc_raise_irq() for VMX IRQ reinjection
x86/fred: header file for event types
x86/fred: add a machine check entry stub for FRED
x86/fred: fixup fault on ERETU by jumping to fred_entrypoint_user
x86/ia32: do not modify the DPL bits for a null selector
x86/fred: allow dynamic stack frame size
x86/fred: disable FRED by default in its early stage

.../admin-guide/kernel-parameters.txt | 4 +
arch/x86/Kconfig | 9 +
arch/x86/entry/Makefile | 5 +-
arch/x86/entry/entry_32.S | 2 +-
arch/x86/entry/entry_64.S | 5 +
arch/x86/entry/entry_64_fred.S | 59 ++++
arch/x86/entry/entry_fred.c | 272 ++++++++++++++++++
arch/x86/entry/vsyscall/vsyscall_64.c | 2 +-
arch/x86/ia32/ia32_signal.c | 21 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/entry-common.h | 3 +
arch/x86/include/asm/event-type.h | 17 ++
arch/x86/include/asm/extable_fixup_types.h | 4 +
arch/x86/include/asm/fred.h | 129 +++++++++
arch/x86/include/asm/idtentry.h | 62 +++-
arch/x86/include/asm/irq.h | 5 +
arch/x86/include/asm/irq_vectors.h | 15 +-
arch/x86/include/asm/msr-index.h | 12 +-
arch/x86/include/asm/processor.h | 12 +-
arch/x86/include/asm/ptrace.h | 36 ++-
arch/x86/include/asm/switch_to.h | 7 +-
arch/x86/include/asm/thread_info.h | 35 +--
arch/x86/include/asm/traps.h | 15 +
arch/x86/include/asm/vmx.h | 17 +-
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/apic/apic.c | 11 +-
arch/x86/kernel/apic/vector.c | 8 +-
arch/x86/kernel/cpu/acrn.c | 7 +-
arch/x86/kernel/cpu/common.c | 88 ++++--
arch/x86/kernel/cpu/mce/core.c | 11 +
arch/x86/kernel/cpu/mshyperv.c | 22 +-
arch/x86/kernel/espfix_64.c | 8 +
arch/x86/kernel/fred.c | 73 +++++
arch/x86/kernel/head_32.S | 3 +-
arch/x86/kernel/idt.c | 6 +-
arch/x86/kernel/irq.c | 6 +-
arch/x86/kernel/irqinit.c | 7 +-
arch/x86/kernel/kvm.c | 4 +-
arch/x86/kernel/nmi.c | 28 ++
arch/x86/kernel/process.c | 5 +
arch/x86/kernel/process_64.c | 21 +-
arch/x86/kernel/traps.c | 180 ++++++++++--
arch/x86/lib/x86-opcode-map.txt | 2 +-
arch/x86/mm/extable.c | 28 ++
arch/x86/mm/fault.c | 20 +-
drivers/xen/events/events_base.c | 5 +-
kernel/fork.c | 6 +
tools/arch/x86/include/asm/cpufeatures.h | 1 +
.../arch/x86/include/asm/disabled-features.h | 8 +-
tools/arch/x86/include/asm/msr-index.h | 12 +-
tools/arch/x86/lib/x86-opcode-map.txt | 2 +-
tools/objtool/arch/x86/decode.c | 22 +-
54 files changed, 1180 insertions(+), 174 deletions(-)
create mode 100644 arch/x86/entry/entry_64_fred.S
create mode 100644 arch/x86/entry/entry_fred.c
create mode 100644 arch/x86/include/asm/event-type.h
create mode 100644 arch/x86/include/asm/fred.h
create mode 100644 arch/x86/kernel/fred.c

--
2.34.1


2022-12-20 07:23:27

by Li, Xin3

Subject: [RFC PATCH 07/32] x86/opcode: add ERETU, ERETS instructions to x86-opcode-map

From: "H. Peter Anvin (Intel)" <[email protected]>

Add the opcodes of the two instructions used by FRED, ERETU and ERETS.
The opcode numbers follow the public FRED draft spec v3.0:
https://cdrdv2.intel.com/v1/dl/getContent/678938.
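For reference, the Grp7 notation in the hunk below maps onto concrete bytes:
row "1:" gives ModRM.reg = 001, "(010)" gives ModRM.rm, "(11B)" gives
ModRM.mod, and "(F3)"/"(F2)" name the mandatory prefix. A minimal sketch,
assuming only the standard ModRM layout (mod in bits 7:6, reg in 5:3, rm in
2:0); grp7_encode() is a hypothetical helper, not a kernel function:

```c
#include <stddef.h>
#include <stdint.h>

/* Build a ModRM byte from its mod (2 bits), reg (3 bits) and rm (3 bits) fields. */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
	return (uint8_t)(((mod & 3u) << 6) | ((reg & 7u) << 3) | (rm & 7u));
}

/*
 * Hypothetical helper: encode a Grp7 (0F 01 /reg) register-form instruction
 * with an optional mandatory prefix (0 = none) into buf. Returns the length.
 */
static size_t grp7_encode(uint8_t prefix, unsigned mod, unsigned reg,
			  unsigned rm, uint8_t buf[4])
{
	size_t n = 0;

	if (prefix)
		buf[n++] = prefix;
	buf[n++] = 0x0f;
	buf[n++] = 0x01;
	buf[n++] = modrm(mod, reg, rm);
	return n;
}
```

With these fields ERETU encodes as F3 0F 01 CA and ERETS as F2 0F 01 CA;
without a prefix the same 0F 01 CA bytes are CLAC, which is why a decoder has
to look at the prefix to tell them apart.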

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 2 +-
tools/arch/x86/lib/x86-opcode-map.txt | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..7a269e269dc0 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -1052,7 +1052,7 @@ EndTable

GrpTable: Grp7
0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B)
-1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B)
+1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) | ERETU (F3),(010),(11B) | ERETS (F2),(010),(11B)
2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
3: LIDT Ms
4: SMSW Mw/Rv
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..7a269e269dc0 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -1052,7 +1052,7 @@ EndTable

GrpTable: Grp7
0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B)
-1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B)
+1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) | ERETU (F3),(010),(11B) | ERETS (F2),(010),(11B)
2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
3: LIDT Ms
4: SMSW Mw/Rv
--
2.34.1

2022-12-20 07:23:56

by Li, Xin3

Subject: [RFC PATCH 31/32] x86/fred: allow dynamic stack frame size

A FRED stack frame can contain a different amount of information for
different event types, or perhaps even for different instances of the
same event type. To allow such a dynamic stack frame size, we need to
remove any dependence on knowing the stack frame size in advance.

Implement this as follows:
1) add a new field, user_pt_regs, to thread_info, and initialize it
with a pointer to a virtual pt_regs structure at the top of the
thread stack.
2) in fred_entry_from_user(), save the pointer to the user-space
pt_regs structure created by fred_entrypoint_user() in user_pt_regs.
3) initialize init_thread_info's user_pt_regs with a pointer to
a virtual pt_regs structure at the top of the init stack.

This approach also works for IDT, so the IDT and FRED code is unified.
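The difference between the old fixed-offset task_pt_regs() and the new
pointer-based one can be modelled in user space. This is a minimal sketch
under stated assumptions: struct pt_regs_model, struct thread_info_model and
the helper functions are hypothetical simplifications, and the "extra words"
stand in for event-specific data that the hardware may place above pt_regs.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct pt_regs. */
struct pt_regs_model {
	unsigned long ip, cs, flags, sp, ss;
};

/* Hypothetical stand-in for thread_info carrying the new pointer field. */
struct thread_info_model {
	struct pt_regs_model *user_pt_regs;
};

#define STACK_WORDS 128

/* Default: a "virtual" pt_regs just below the top of the stack. */
static void init_user_pt_regs(struct thread_info_model *ti, unsigned long *stack)
{
	ti->user_pt_regs = (struct pt_regs_model *)(stack + STACK_WORDS) - 1;
}

/*
 * Model of an entry from user space: the frame size is only known at run
 * time (extra_words of data above pt_regs), so the entry code records
 * where pt_regs actually ended up instead of computing a fixed offset.
 */
static struct pt_regs_model *enter_from_user(struct thread_info_model *ti,
					     unsigned long *stack,
					     size_t extra_words)
{
	struct pt_regs_model *regs =
		(struct pt_regs_model *)(stack + STACK_WORDS - extra_words) - 1;

	ti->user_pt_regs = regs;
	return regs;
}

/* task_pt_regs() now just follows the recorded pointer. */
static struct pt_regs_model *task_pt_regs_model(const struct thread_info_model *ti)
{
	return ti->user_pt_regs;
}
```

With extra_words = 2 the model reproduces the frame that the removed
TOP_OF_KERNEL_STACK_PADDING (2*8) accounted for on FRED; the point of the
change is that no such constant has to be known at build time.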

Suggested-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/entry/entry_32.S | 2 +-
arch/x86/entry/entry_fred.c | 2 ++
arch/x86/include/asm/entry-common.h | 3 +++
arch/x86/include/asm/processor.h | 12 +++------
arch/x86/include/asm/switch_to.h | 3 +--
arch/x86/include/asm/thread_info.h | 41 ++++-------------------------
arch/x86/kernel/head_32.S | 3 +--
arch/x86/kernel/process.c | 5 ++++
kernel/fork.c | 6 +++++
9 files changed, 27 insertions(+), 50 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index e309e7156038..d98cc64ca82b 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1244,7 +1244,7 @@ SYM_CODE_START(rewind_stack_and_make_dead)
xorl %ebp, %ebp

movl PER_CPU_VAR(cpu_current_top_of_stack), %esi
- leal -TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE(%esi), %esp
+ leal -PTREGS_SIZE(%esi), %esp

call make_task_dead
1: jmp 1b
diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
index 56814ab0b825..140d9110bc39 100644
--- a/arch/x86/entry/entry_fred.c
+++ b/arch/x86/entry/entry_fred.c
@@ -216,6 +216,8 @@ __visible noinstr void fred_entry_from_user(struct pt_regs *regs)
[EVENT_TYPE_OTHER] = fred_syscall_slow
};

+ current->thread_info.user_pt_regs = regs;
+
/*
* FRED employs a two-level event dispatch mechanism, with
* the first-level on the type of an event and the second-level
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 674ed46d3ced..21e1e3ef9e33 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -12,6 +12,9 @@
/* Check that the stack and regs on entry from user mode are sane. */
static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs)
{
+ if (!cpu_feature_enabled(X86_FEATURE_FRED))
+ current->thread_info.user_pt_regs = regs;
+
if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) {
/*
* Make sure that the entry code gave us a sensible EFLAGS
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 67c9d73b31fa..6d573eeea074 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -747,17 +747,11 @@ static inline void spin_lock_prefetch(const void *x)
prefetchw(x);
}

-#define TOP_OF_INIT_STACK ((unsigned long)&init_stack + sizeof(init_stack) - \
- TOP_OF_KERNEL_STACK_PADDING)
+#define TOP_OF_INIT_STACK ((unsigned long)&init_stack + sizeof(init_stack))

-#define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1))
+#define task_top_of_stack(task) ((unsigned long)task_stack_page(task) + THREAD_SIZE)

-#define task_pt_regs(task) \
-({ \
- unsigned long __ptr = (unsigned long)task_stack_page(task); \
- __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \
- ((struct pt_regs *)__ptr) - 1; \
-})
+#define task_pt_regs(task) ((task)->thread_info.user_pt_regs)

#ifdef CONFIG_X86_32
#define INIT_THREAD { \
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index c28170d4fbba..8ad5788da416 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -72,8 +72,7 @@ static inline void update_task_stack(struct task_struct *task)
this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
#else
if (cpu_feature_enabled(X86_FEATURE_FRED)) {
- wrmsrl(MSR_IA32_FRED_RSP0,
- task_top_of_stack(task) + TOP_OF_KERNEL_STACK_PADDING);
+ wrmsrl(MSR_IA32_FRED_RSP0, task_top_of_stack(task));
} else if (static_cpu_has(X86_FEATURE_XENPV)) {
/* Xen PV enters the kernel on the thread stack. */
load_sp0(task_top_of_stack(task));
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index fea0e69fc3d4..9b88b7a04fda 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -13,42 +13,6 @@
#include <asm/percpu.h>
#include <asm/types.h>

-/*
- * TOP_OF_KERNEL_STACK_PADDING is a number of unused bytes that we
- * reserve at the top of the kernel stack. We do it because of a nasty
- * 32-bit corner case. On x86_32, the hardware stack frame is
- * variable-length. Except for vm86 mode, struct pt_regs assumes a
- * maximum-length frame. If we enter from CPL 0, the top 8 bytes of
- * pt_regs don't actually exist. Ordinarily this doesn't matter, but it
- * does in at least one case:
- *
- * If we take an NMI early enough in SYSENTER, then we can end up with
- * pt_regs that extends above sp0. On the way out, in the espfix code,
- * we can read the saved SS value, but that value will be above sp0.
- * Without this offset, that can result in a page fault. (We are
- * careful that, in this case, the value we read doesn't matter.)
- *
- * In vm86 mode, the hardware frame is much longer still, so add 16
- * bytes to make room for the real-mode segments.
- *
- * x86-64 has a fixed-length stack frame, but it depends on whether
- * or not FRED is enabled. Future versions of FRED might make this
- * dynamic, but for now it is always 2 words longer.
- */
-#ifdef CONFIG_X86_32
-# ifdef CONFIG_VM86
-# define TOP_OF_KERNEL_STACK_PADDING 16
-# else
-# define TOP_OF_KERNEL_STACK_PADDING 8
-# endif
-#else /* x86-64 */
-# ifdef CONFIG_X86_FRED
-# define TOP_OF_KERNEL_STACK_PADDING (2*8)
-# else
-# define TOP_OF_KERNEL_STACK_PADDING 0
-# endif
-#endif
-
/*
* low level task data that entry.S needs immediate access to
* - this struct should fit entirely inside of one cache line
@@ -56,6 +20,7 @@
*/
#ifndef __ASSEMBLY__
struct task_struct;
+struct pt_regs;
#include <asm/cpufeature.h>
#include <linux/atomic.h>

@@ -66,11 +31,14 @@ struct thread_info {
#ifdef CONFIG_SMP
u32 cpu; /* current CPU */
#endif
+ struct pt_regs *user_pt_regs;
};

+#define INIT_TASK_PT_REGS ((struct pt_regs *)TOP_OF_INIT_STACK - 1)
#define INIT_THREAD_INFO(tsk) \
{ \
.flags = 0, \
+ .user_pt_regs = INIT_TASK_PT_REGS, \
}

#else /* !__ASSEMBLY__ */
@@ -235,6 +203,7 @@ static inline int arch_within_stack_frames(const void * const stack,

extern void arch_task_cache_init(void);
extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
+extern void arch_init_user_pt_regs(struct task_struct *tsk);
extern void arch_release_task_struct(struct task_struct *tsk);
extern void arch_setup_new_exec(void);
#define arch_setup_new_exec arch_setup_new_exec
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 9b7acc9c7874..8961946f1418 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -539,8 +539,7 @@ SYM_DATA_END(initial_page_table)
* reliably detect the end of the stack.
*/
SYM_DATA(initial_stack,
- .long init_thread_union + THREAD_SIZE -
- SIZEOF_PTREGS - TOP_OF_KERNEL_STACK_PADDING)
+ .long init_thread_union + THREAD_SIZE - SIZEOF_PTREGS)

__INITRODATA
int_msg:
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index e436c9c1ef3b..6294d41f7691 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -97,6 +97,11 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
return 0;
}

+void arch_init_user_pt_regs(struct task_struct *tsk)
+{
+ tsk->thread_info.user_pt_regs = (struct pt_regs *)task_top_of_stack(tsk)- 1;
+}
+
#ifdef CONFIG_X86_64
void arch_release_task_struct(struct task_struct *tsk)
{
diff --git a/kernel/fork.c b/kernel/fork.c
index 08969f5aa38d..00bd585a4e07 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -948,6 +948,10 @@ int __weak arch_dup_task_struct(struct task_struct *dst,
return 0;
}

+void __weak arch_init_user_pt_regs(struct task_struct *tsk)
+{
+}
+
void set_task_stack_end_magic(struct task_struct *tsk)
{
unsigned long *stackend;
@@ -975,6 +979,8 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
if (err)
goto free_tsk;

+ arch_init_user_pt_regs(tsk);
+
#ifdef CONFIG_THREAD_INFO_IN_TASK
refcount_set(&tsk->stack_refcount, 1);
#endif
--
2.34.1

2022-12-20 07:24:04

by Li, Xin3

Subject: [RFC PATCH 28/32] x86/fred: fixup fault on ERETU by jumping to fred_entrypoint_user

If the stack frame contains an invalid user context (e.g. an invalid SS or
a non-canonical RIP), the ERETU instruction will fault (#SS or #GP).

From a Linux point of view, this really should be considered a user space
failure, so use the standard fault fixup mechanism to intercept the fault,
fix up the exception frame, and redirect execution to fred_entrypoint_user.
The end result is that it appears just as if the hardware had taken the
exception immediately after completing the transition to user space.
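The fixup relies on the kernel's exception table: each entry maps the address
of an instruction that may fault to a fixup address and a handler type. A
much-simplified model of the lookup, assuming a linear search (the kernel
searches a sorted table with a binary search) and hypothetical *_model names:

```c
#include <stddef.h>

/* Hypothetical, simplified exception-table entry. */
struct extable_entry_model {
	unsigned long insn;	/* address of the instruction that may fault */
	unsigned long fixup;	/* where to resume instead */
	int type;		/* handler type */
};

#define EX_TYPE_NONE_MODEL  0
#define EX_TYPE_ERETU_MODEL 21	/* same number the patch gives EX_TYPE_ERETU */

static const struct extable_entry_model *
search_extable_model(const struct extable_entry_model *tbl, size_t n,
		     unsigned long ip)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == ip)
			return &tbl[i];
	return NULL;
}

/*
 * Returns the matched type (EX_TYPE_NONE_MODEL if nothing matched) and,
 * on a match, rewrites *ip so execution resumes at the fixup target.
 */
static int fixup_exception_model(const struct extable_entry_model *tbl,
				 size_t n, unsigned long *ip)
{
	const struct extable_entry_model *e = search_extable_model(tbl, n, *ip);

	if (!e)
		return EX_TYPE_NONE_MODEL;
	*ip = e->fixup;
	return e->type;
}
```

In the real code the type selects ex_handler_eretu(), which also rebuilds the
user pt_regs from the faulting frame before resuming at fred_entrypoint_user;
the model only shows the redirect.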

Suggested-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/entry/entry_64_fred.S | 8 +++++--
arch/x86/include/asm/extable_fixup_types.h | 4 ++++
arch/x86/mm/extable.c | 28 ++++++++++++++++++++++
3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 1fb765fd3871..027ef8f1e600 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -5,8 +5,10 @@
* The actual FRED entry points.
*/
#include <linux/linkage.h>
-#include <asm/errno.h>
+#include <asm/asm.h>
#include <asm/asm-offsets.h>
+#include <asm/errno.h>
+#include <asm/export.h>
#include <asm/fred.h>

#include "calling.h"
@@ -38,7 +40,9 @@ SYM_CODE_START_NOALIGN(fred_entrypoint_user)
call fred_entry_from_user
SYM_INNER_LABEL(fred_exit_user, SYM_L_GLOBAL)
FRED_EXIT
- ERETU
+1: ERETU
+
+ _ASM_EXTABLE_TYPE(1b, fred_entrypoint_user, EX_TYPE_ERETU)
SYM_CODE_END(fred_entrypoint_user)

/*
diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
index 991e31cfde94..ddebd5b8b340 100644
--- a/arch/x86/include/asm/extable_fixup_types.h
+++ b/arch/x86/include/asm/extable_fixup_types.h
@@ -66,4 +66,8 @@

#define EX_TYPE_ZEROPAD 20 /* longword load with zeropad on fault */

+#ifdef CONFIG_X86_FRED
+#define EX_TYPE_ERETU 21
+#endif
+
#endif
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 60814e110a54..be9d75358f50 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -6,6 +6,7 @@
#include <xen/xen.h>

#include <asm/fpu/api.h>
+#include <asm/fred.h>
#include <asm/sev.h>
#include <asm/traps.h>
#include <asm/kdebug.h>
@@ -195,6 +196,29 @@ static bool ex_handler_ucopy_len(const struct exception_table_entry *fixup,
return ex_handler_uaccess(fixup, regs, trapnr);
}

+#ifdef CONFIG_X86_FRED
+static bool ex_handler_eretu(const struct exception_table_entry *fixup,
+ struct pt_regs *regs, unsigned long error_code)
+{
+ struct pt_regs *uregs = (struct pt_regs *)(regs->sp - offsetof(struct pt_regs, ip));
+ unsigned short ss = uregs->ss;
+ unsigned short cs = uregs->cs;
+
+ fred_info(uregs)->edata = fred_event_data(regs);
+ uregs->ssl = regs->ssl;
+ uregs->ss = ss;
+ uregs->csl = regs->csl;
+ uregs->current_stack_level = 0;
+ uregs->cs = cs;
+ uregs->orig_ax = error_code;
+
+ /* drop error code */
+ regs->sp -= 8;
+
+ return ex_handler_default(fixup, regs);
+}
+#endif
+
int ex_get_fixup_type(unsigned long ip)
{
const struct exception_table_entry *e = search_exception_tables(ip);
@@ -272,6 +296,10 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
return ex_handler_ucopy_len(e, regs, trapnr, reg, imm);
case EX_TYPE_ZEROPAD:
return ex_handler_zeropad(e, regs, fault_addr);
+#ifdef CONFIG_X86_FRED
+ case EX_TYPE_ERETU:
+ return ex_handler_eretu(e, regs, error_code);
+#endif
}
BUG();
}
--
2.34.1

2022-12-20 07:24:27

by Li, Xin3

Subject: [RFC PATCH 27/32] x86/fred: allow single-step trap and NMI when starting a new thread

From: "H. Peter Anvin (Intel)" <[email protected]>

Allow single-step traps and NMIs when starting a new thread, so that
both are enabled as soon as the new thread returns to ring 3.

The legacy IRET instruction discards the high-order 48 bits above the
16-bit CS selector, so these bits can be set unconditionally, even when
FRED is not enabled.
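A small model of what start_thread_common() now stores in the CS slot, and of
why legacy IRET does not care: IRET only consumes the low 16 bits. The
single-step and interrupt-shadow bit positions come from the fred.h hunk
below; the value of FRED_CSL_ENABLE_NMI is not shown in this excerpt, so
bit 28 is an assumption used purely for illustration.

```c
#include <stdint.h>

/* Bit positions from the fred.h hunk in this patch. */
#define FRED_CSL_ALLOW_SINGLE_STEP (1UL << 25)
#define FRED_CSL_INTERRUPT_SHADOW  (1UL << 24)
/* ASSUMPTION: NMI-enable bit position chosen for illustration only. */
#define FRED_CSL_ENABLE_NMI_ASSUMED (1UL << 28)

#define CSL_PROCESS_START_MODEL \
	(FRED_CSL_ENABLE_NMI_ASSUMED | FRED_CSL_ALLOW_SINGLE_STEP)

/* x86-64 user code segment selector: GDT entry 6, RPL 3. */
#define USER_CS_64 0x33u

/* What the new start_thread_common() stores: selector plus FRED flags. */
static uint64_t make_start_csl(uint16_t cs)
{
	return (uint64_t)cs | CSL_PROCESS_START_MODEL;
}

/* What a legacy IRET takes from the same slot: only the low 16 bits. */
static uint16_t iret_visible_cs(uint64_t csl)
{
	return (uint16_t)csl;
}
```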

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/fred.h | 11 +++++++++++
arch/x86/kernel/process_64.c | 13 +++++++------
2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index b6308e351e14..730e69d2bb87 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -50,6 +50,14 @@
#define FRED_CSL_ALLOW_SINGLE_STEP _BITUL(25)
#define FRED_CSL_INTERRUPT_SHADOW _BITUL(24)

+/*
+ * High-order 48 bits above the lowest 16 bit CS are discarded by the
+ * legacy IRET instruction, thus can be set unconditionally, even when
+ * FRED is not enabled.
+ */
+#define CSL_PROCESS_START \
+ (FRED_CSL_ENABLE_NMI | FRED_CSL_ALLOW_SINGLE_STEP)
+
#ifndef __ASSEMBLY__

#include <linux/kernel.h>
@@ -113,6 +121,9 @@ void fred_setup_apic(void);
#else
#define cpu_init_fred_exceptions() BUG()
#define fred_setup_apic() BUG()
+
+#define CSL_PROCESS_START 0
+
#endif /* CONFIG_X86_FRED */

#endif /* ASM_X86_FRED_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 5b6cfd2ca630..128dafc04acf 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -55,6 +55,7 @@
#include <asm/resctrl.h>
#include <asm/unistd.h>
#include <asm/fsgsbase.h>
+#include <asm/fred.h>
#ifdef CONFIG_IA32_EMULATION
/* Not included via unistd.h */
#include <asm/unistd_32_ia32.h>
@@ -506,7 +507,7 @@ void x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase)
static void
start_thread_common(struct pt_regs *regs, unsigned long new_ip,
unsigned long new_sp,
- unsigned int _cs, unsigned int _ss, unsigned int _ds)
+ u16 _cs, u16 _ss, u16 _ds)
{
WARN_ON_ONCE(regs != current_pt_regs());

@@ -521,11 +522,11 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
loadsegment(ds, _ds);
load_gs_index(0);

- regs->ip = new_ip;
- regs->sp = new_sp;
- regs->cs = _cs;
- regs->ss = _ss;
- regs->flags = X86_EFLAGS_IF;
+ regs->ip = new_ip;
+ regs->sp = new_sp;
+ regs->csl = _cs | CSL_PROCESS_START;
+ regs->ssl = _ss;
+ regs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
}

void
--
2.34.1

2022-12-20 07:24:45

by Li, Xin3

Subject: [RFC PATCH 08/32] x86/objtool: teach objtool about ERETU and ERETS

From: "H. Peter Anvin (Intel)" <[email protected]>

Update the objtool decoder to recognize the ERETU and ERETS
instructions (type INSN_CONTEXT_SWITCH).
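The prefix-sensitive decode can be exercised outside objtool. This sketch
mirrors the new switch statement, using hypothetical enum values (INSN_*_M)
instead of objtool's own types and taking the last legacy prefix byte directly
instead of going through insn_last_prefix_id() and INAT_PFX_*:

```c
#include <stdint.h>

/* Hypothetical stand-ins for objtool's instruction types. */
enum insn_type_model {
	INSN_OTHER_M,
	INSN_CLAC_M,
	INSN_STAC_M,
	INSN_CONTEXT_SWITCH_M,
};

/*
 * Classify a 0F 01 /modrm instruction given its last legacy prefix byte
 * (0 if none), following the same branches as the patched decoder.
 */
static enum insn_type_model classify_0f01(uint8_t last_prefix, uint8_t modrm)
{
	switch (last_prefix) {
	case 0xf3:	/* REPE prefix: F3 0F 01 CA is ERETU */
	case 0xf2:	/* REPNE prefix: F2 0F 01 CA is ERETS */
		return modrm == 0xca ? INSN_CONTEXT_SWITCH_M : INSN_OTHER_M;
	default:
		if (modrm == 0xca)
			return INSN_CLAC_M;
		if (modrm == 0xcb)
			return INSN_STAC_M;
		return INSN_OTHER_M;
	}
}
```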

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
tools/objtool/arch/x86/decode.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 1c253b4b7ce0..fbfe0a39599a 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -480,12 +480,22 @@ int arch_decode_instruction(struct objtool_file *file, const struct section *sec
case 0x0f:

if (op2 == 0x01) {
-
- if (modrm == 0xca)
- *type = INSN_CLAC;
- else if (modrm == 0xcb)
- *type = INSN_STAC;
-
+ switch (insn_last_prefix_id(&insn)) {
+ case INAT_PFX_REPE:
+ case INAT_PFX_REPNE:
+ if (modrm == 0xca) {
+ /* eretu/erets */
+ *type = INSN_CONTEXT_SWITCH;
+ }
+ break;
+ default:
+ if (modrm == 0xca) {
+ *type = INSN_CLAC;
+ } else if (modrm == 0xcb) {
+ *type = INSN_STAC;
+ }
+ break;
+ }
} else if (op2 >= 0x80 && op2 <= 0x8f) {

*type = INSN_JUMP_CONDITIONAL;
--
2.34.1

2022-12-20 07:25:17

by Li, Xin3

Subject: [RFC PATCH 20/32] x86/fred: add a machine check entry stub for FRED

Add a machine check entry stub for FRED.

Unlike the IDT machine check handler, the FRED machine check handler does
not need to save/restore DR7.
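The FRED handler picks the user or kernel path from the saved CS, as the IDT
code does, but without wrapping the call in local_db_save()/local_db_restore().
A minimal model, assuming x86-64, where user_mode() tests the RPL bits of CS
(0x33 is the 64-bit user CS and 0x10 the kernel CS); all *_model names are
hypothetical:

```c
#include <stdbool.h>

/* Minimal stand-in for struct pt_regs: only the CS selector matters here. */
struct regs_model {
	unsigned long cs;
};

/* x86-64 user_mode(): nonzero RPL in the saved CS means user mode. */
static bool user_mode_model(const struct regs_model *regs)
{
	return (regs->cs & 3) != 0;
}

/* Counters standing in for exc_machine_check_user()/_kernel(). */
static int mce_user_calls, mce_kernel_calls;

static void mce_user_model(struct regs_model *regs)
{
	(void)regs;
	mce_user_calls++;
}

static void mce_kernel_model(struct regs_model *regs)
{
	(void)regs;
	mce_kernel_calls++;
}

/* Shape of fred_exc_machine_check(): no DR7 save/restore around the call. */
static void fred_mce_model(struct regs_model *regs)
{
	if (user_mode_model(regs))
		mce_user_model(regs);
	else
		mce_kernel_model(regs);
}
```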

Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/fred.h | 1 +
arch/x86/kernel/cpu/mce/core.c | 11 +++++++++++
2 files changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 66c274a12e26..01678ced5451 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -95,6 +95,7 @@ typedef DECLARE_FRED_HANDLER((*fred_handler));
DECLARE_FRED_HANDLER(fred_exc_nmi);
DECLARE_FRED_HANDLER(fred_exc_debug);
DECLARE_FRED_HANDLER(fred_exc_page_fault);
+DECLARE_FRED_HANDLER(fred_exc_machine_check);

#endif /* __ASSEMBLY__ */

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c8ec5c71712..0186c9b39f5f 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -52,6 +52,7 @@
#include <asm/mce.h>
#include <asm/msr.h>
#include <asm/reboot.h>
+#include <asm/fred.h>

#include "internal.h"

@@ -2121,6 +2122,16 @@ DEFINE_IDTENTRY_MCE_USER(exc_machine_check)
exc_machine_check_user(regs);
local_db_restore(dr7);
}
+
+#ifdef CONFIG_X86_FRED
+DEFINE_FRED_HANDLER(fred_exc_machine_check)
+{
+ if (user_mode(regs))
+ exc_machine_check_user(regs);
+ else
+ exc_machine_check_kernel(regs);
+}
+#endif
#else
/* 32bit unified entry point */
DEFINE_IDTENTRY_RAW(exc_machine_check)
--
2.34.1

2022-12-20 07:26:00

by Li, Xin3

Subject: [RFC PATCH 01/32] x86/traps: let common_interrupt() handle IRQ_MOVE_CLEANUP_VECTOR

From: "H. Peter Anvin (Intel)" <[email protected]>

IRQ_MOVE_CLEANUP_VECTOR is the only one of the system IRQ vectors that
is *below* FIRST_SYSTEM_VECTOR. It is a slow path, so simply handle it
in common_interrupt(), just before the spurious interrupt handling.
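The resulting dispatch order in common_interrupt() can be modelled as a
three-way decision. This sketch is hypothetical: the table is a plain array
rather than the per-CPU vector_irq[], a NULL entry stands in for
IS_ERR_OR_NULL(), and IRQ_MOVE_CLEANUP_VECTOR is assumed to be 0x20
(FIRST_EXTERNAL_VECTOR in kernels of this era):

```c
#include <stddef.h>

#define NR_VECTORS_MODEL 256
#define IRQ_MOVE_CLEANUP_VECTOR_MODEL 0x20	/* assumed value */

enum outcome_model { HANDLED_IRQ, HANDLED_MOVE_CLEANUP, HANDLED_SPURIOUS };

/* Stand-in for vector_irq[]: non-NULL means a descriptor is installed. */
static const void *vector_irq_model[NR_VECTORS_MODEL];

static enum outcome_model common_interrupt_model(unsigned int vector)
{
	const void *desc = vector < NR_VECTORS_MODEL ? vector_irq_model[vector] : NULL;

	if (desc)
		return HANDLED_IRQ;		/* fast path: handle_irq() */
	if (vector == IRQ_MOVE_CLEANUP_VECTOR_MODEL)
		return HANDLED_MOVE_CLEANUP;	/* slow path: sysvec_irq_move_cleanup() */
	return HANDLED_SPURIOUS;		/* ack_APIC_irq() and spurious handling */
}
```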

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/kernel/irq.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 766ffe3ba313..7e125fff45ab 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -248,6 +248,10 @@ DEFINE_IDTENTRY_IRQ(common_interrupt)
desc = __this_cpu_read(vector_irq[vector]);
if (likely(!IS_ERR_OR_NULL(desc))) {
handle_irq(desc, regs);
+#ifdef CONFIG_SMP
+ } else if (vector == IRQ_MOVE_CLEANUP_VECTOR) {
+ sysvec_irq_move_cleanup(regs);
+#endif
} else {
ack_APIC_irq();

--
2.34.1

2022-12-20 07:26:46

by Li, Xin3

Subject: [RFC PATCH 24/32] x86/fred: let ret_from_fork() jmp to fred_exit_user when FRED is enabled

From: "H. Peter Anvin (Intel)" <[email protected]>

Let ret_from_fork() jmp to fred_exit_user when FRED is enabled;
otherwise, keep using the existing IDT exit path.
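ALTERNATIVE patches the jmp target once at boot, based on X86_FEATURE_FRED.
As a rough user-space analogy only (the kernel rewrites the instruction bytes
and does not use a function pointer), the selection logic looks like this;
all names are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two return-to-user paths. */
static const char *exit_idt(void)  { return "swapgs_restore_regs_and_return_to_usermode"; }
static const char *exit_fred(void) { return "fred_exit_user"; }

typedef const char *(*exit_path_fn)(void);

/* Default is the IDT path, like the unpatched jmp. */
static exit_path_fn ret_from_fork_exit = exit_idt;

/* Chosen once, analogous to alternatives being applied at boot. */
static void select_exit_path(bool cpu_has_fred, bool config_fred)
{
	/* Without CONFIG_X86_FRED the FRED path is not built at all. */
	ret_from_fork_exit = (config_fred && cpu_has_fred) ? exit_fred : exit_idt;
}
```

The analogy would add an indirect branch at run time; patching the jmp
directly avoids that cost entirely.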

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/entry/entry_64.S | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e0c48998d2fb..cdb696cbb2a0 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -297,7 +297,12 @@ SYM_CODE_START(ret_from_fork)
UNWIND_HINT_REGS
movq %rsp, %rdi
call syscall_exit_to_user_mode /* returns with IRQs disabled */
+#ifdef CONFIG_X86_FRED
+ ALTERNATIVE "jmp swapgs_restore_regs_and_return_to_usermode", \
+ "jmp fred_exit_user", X86_FEATURE_FRED
+#else
jmp swapgs_restore_regs_and_return_to_usermode
+#endif

1:
/* kernel thread */
--
2.34.1

2022-12-20 07:45:13

by Li, Xin3

Subject: [RFC PATCH 18/32] x86/fred: add a debug fault entry stub for FRED

From: "H. Peter Anvin (Intel)" <[email protected]>

Add a debug fault entry stub for FRED.

On a FRED system, the debug trap status information (DR6) is passed
on the stack to avoid the problem of transient state. Furthermore,
FRED transitions avoid many ugly corner cases, whose handling can,
and should, be skipped.

The FRED debug trap status information saved on the stack differs from DR6
in both stickiness and polarity; it is exactly what debug_read_clear_dr6()
returns, and exc_debug_user()/exc_debug_kernel() expect.
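The polarity difference is easy to see on concrete values.
debug_read_clear_dr6() XORs the raw DR6 value with DR6_RESERVED (0xFFFF0FF0)
so that every reported condition reads as a 1 bit; per the text above, FRED
delivers the value already in that form. A sketch of just that flip (the
helper name is hypothetical; DR_STEP is 0x4000 as in the kernel headers):

```c
#include <stdint.h>

#define DR6_RESERVED 0xFFFF0FF0UL	/* bits that read as 1 (or are active-low) in raw DR6 */
#define DR_STEP      0x4000UL		/* BS: single-step */

/* Convert raw DR6 to the positive-polarity form used by the handlers. */
static unsigned long dr6_positive(unsigned long raw_dr6)
{
	return raw_dr6 ^ DR6_RESERVED;
}
```

For example, a raw single-step value of 0xFFFF4FF0 becomes DR_STEP, and a raw
value with bit 16 (RTM, active-low) cleared shows up with bit 16 set.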

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/fred.h | 1 +
arch/x86/kernel/traps.c | 61 ++++++++++++++++++++++++++-----------
2 files changed, 45 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 38a90eae7c0f..3089d1c70771 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -92,6 +92,7 @@ static __always_inline unsigned long fred_event_data(struct pt_regs *regs)
#define DEFINE_FRED_HANDLER(f) noinstr DECLARE_FRED_HANDLER(f)
typedef DECLARE_FRED_HANDLER((*fred_handler));

+DECLARE_FRED_HANDLER(fred_exc_debug);
DECLARE_FRED_HANDLER(fred_exc_page_fault);

#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 99386836b02e..b0ee83bab9e6 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -47,6 +47,7 @@
#include <asm/debugreg.h>
#include <asm/realmode.h>
#include <asm/text-patching.h>
+#include <asm/fred.h>
#include <asm/ftrace.h>
#include <asm/traps.h>
#include <asm/desc.h>
@@ -1020,22 +1021,9 @@ static bool notify_debug(struct pt_regs *regs, unsigned long *dr6)
return false;
}

-static __always_inline void exc_debug_kernel(struct pt_regs *regs,
- unsigned long dr6)
+static __always_inline void debug_kernel_common(struct pt_regs *regs,
+ unsigned long dr6)
{
- /*
- * Disable breakpoints during exception handling; recursive exceptions
- * are exceedingly 'fun'.
- *
- * Since this function is NOKPROBE, and that also applies to
- * HW_BREAKPOINT_X, we can't hit a breakpoint before this (XXX except a
- * HW_BREAKPOINT_W on our stack)
- *
- * Entry text is excluded for HW_BP_X and cpu_entry_area, which
- * includes the entry stack is excluded for everything.
- */
- unsigned long dr7 = local_db_save();
- irqentry_state_t irq_state = irqentry_nmi_enter(regs);
instrumentation_begin();

/*
@@ -1062,7 +1050,8 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
* Catch SYSENTER with TF set and clear DR_STEP. If this hit a
* watchpoint at the same time then that will still be handled.
*/
- if ((dr6 & DR_STEP) && is_sysenter_singlestep(regs))
+ if (!cpu_feature_enabled(X86_FEATURE_FRED) &&
+ (dr6 & DR_STEP) && is_sysenter_singlestep(regs))
dr6 &= ~DR_STEP;

/*
@@ -1089,8 +1078,28 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
regs->flags &= ~X86_EFLAGS_TF;
out:
instrumentation_end();
- irqentry_nmi_exit(regs, irq_state);
+}
+
+static __always_inline void exc_debug_kernel(struct pt_regs *regs,
+ unsigned long dr6)
+{
+ /*
+ * Disable breakpoints during exception handling; recursive exceptions
+ * are exceedingly 'fun'.
+ *
+ * Since this function is NOKPROBE, and that also applies to
+ * HW_BREAKPOINT_X, we can't hit a breakpoint before this (XXX except a
+ * HW_BREAKPOINT_W on our stack)
+ *
+ * Entry text is excluded for HW_BP_X and cpu_entry_area, which
+ * includes the entry stack is excluded for everything.
+ */
+ unsigned long dr7 = local_db_save();
+ irqentry_state_t irq_state = irqentry_nmi_enter(regs);
+
+ debug_kernel_common(regs, dr6);

+ irqentry_nmi_exit(regs, irq_state);
local_db_restore(dr7);
}

@@ -1179,6 +1188,24 @@ DEFINE_IDTENTRY_DEBUG_USER(exc_debug)
{
exc_debug_user(regs, debug_read_clear_dr6());
}
+
+# ifdef CONFIG_X86_FRED
+DEFINE_FRED_HANDLER(fred_exc_debug)
+{
+ /*
+ * The FRED debug information saved onto stack differs from
+ * DR6 in both stickiness and polarity; it is exactly what
+ * debug_read_clear_dr6() returns.
+ */
+ unsigned long dr6 = fred_event_data(regs);
+
+ if (user_mode(regs))
+ exc_debug_user(regs, dr6);
+ else
+ debug_kernel_common(regs, dr6);
+}
+# endif /* CONFIG_X86_FRED */
+
#else
/* 32 bit does not have separate entry points. */
DEFINE_IDTENTRY_RAW(exc_debug)
--
2.34.1

2022-12-20 07:45:19

by Li, Xin3

Subject: [RFC PATCH 04/32] x86/traps: add external_interrupt() to dispatch external interrupts

From: "H. Peter Anvin (Intel)" <[email protected]>

Add external_interrupt() to dispatch external interrupts to their
handlers. If an external interrupt is a system interrupt, dispatch
it through system_interrupt_handlers[]; otherwise, call
dispatch_common_interrupt().
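A minimal userspace sketch of this dispatch logic follows. It mainly shows the unsigned-wraparound range check on `vector - FIRST_SYSTEM_VECTOR`. The vector constants are illustrative. The table and handler functions are hypothetical stand-ins for system_interrupt_handlers[], dispatch_spurious_interrupt() and dispatch_common_interrupt(). The upper-bound check is an extra guard added only for the sketch.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for the kernel's vector constants. */
#define FIRST_EXTERNAL_VECTOR 0x20
#define FIRST_SYSTEM_VECTOR   0xec
#define NR_VECTORS            256
#define NR_SYSTEM_VECTORS     (NR_VECTORS - FIRST_SYSTEM_VECTOR)

static_assert(FIRST_EXTERNAL_VECTOR < FIRST_SYSTEM_VECTOR,
              "system vectors sit above device vectors");

enum dispatch_path { PATH_NONE, PATH_SYSTEM, PATH_SPURIOUS, PATH_COMMON };
static enum dispatch_path last_path = PATH_NONE;

typedef void (*sysvec_handler_t)(unsigned int vector);

/* Hypothetical table playing the role of system_interrupt_handlers[]. */
static sysvec_handler_t sysvec_handlers[NR_SYSTEM_VECTORS];

static void handle_system(unsigned int v)   { (void)v; last_path = PATH_SYSTEM; }
static void handle_spurious(unsigned int v) { (void)v; last_path = PATH_SPURIOUS; }
static void handle_common(unsigned int v)   { (void)v; last_path = PATH_COMMON; }

static void install_sysvec(unsigned int vector, sysvec_handler_t fn)
{
    assert(vector >= FIRST_SYSTEM_VECTOR && vector < NR_VECTORS);
    sysvec_handlers[vector - FIRST_SYSTEM_VECTOR] = fn;
}

static int dispatch_external(unsigned int vector)
{
    /*
     * Unsigned arithmetic: any vector below FIRST_SYSTEM_VECTOR wraps
     * around to a huge value, so one comparison selects the system range.
     */
    unsigned int sysvec = vector - FIRST_SYSTEM_VECTOR;

    if (vector < FIRST_EXTERNAL_VECTOR || vector >= NR_VECTORS)
        return -EINVAL;

    if (sysvec < NR_SYSTEM_VECTORS) {
        if (sysvec_handlers[sysvec])
            sysvec_handlers[sysvec](vector);
        else
            handle_spurious(vector);   /* system vector with no handler */
    } else {
        handle_common(vector);         /* ordinary device interrupt */
    }
    return 0;
}
```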

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Co-developed-by: Xin Li <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/kernel/traps.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 2b8530235e47..c35dd2b4d146 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1499,6 +1499,43 @@ void __init install_system_interrupt_handler(unsigned int n, const void *asm_add
alloc_intr_gate(n, asm_addr);
}

+#ifndef CONFIG_X86_LOCAL_APIC
+DEFINE_IDTENTRY_IRQ(spurious_interrupt)
+{
+ pr_info("Spurious interrupt (vector 0x%x) on CPU#%d, should never happen.\n",
+ vector, smp_processor_id());
+}
+#endif
+
+/*
+ * External interrupt dispatch function.
+ *
+ * Until/unless dispatch_common_interrupt() can be taught to deal with the
+ * special system vectors, split the dispatch.
+ *
+ * Note: dispatch_common_interrupt() already deals with IRQ_MOVE_CLEANUP_VECTOR.
+ */
+int external_interrupt(struct pt_regs *regs, unsigned int vector)
+{
+ unsigned int sysvec = vector - FIRST_SYSTEM_VECTOR;
+
+ if (vector < FIRST_EXTERNAL_VECTOR) {
+ pr_err("invalid external interrupt vector %u\n", vector);
+ return -EINVAL;
+ }
+
+ if (sysvec < NR_SYSTEM_VECTORS) {
+ if (system_interrupt_handlers[sysvec])
+ system_interrupt_handlers[sysvec](regs);
+ else
+ dispatch_spurious_interrupt(regs, vector);
+ } else {
+ dispatch_common_interrupt(regs, vector);
+ }
+
+ return 0;
+}
+
void __init trap_init(void)
{
/* Init cpu_entry_area before IST entries are set up */
--
2.34.1

2022-12-20 07:45:39

by Li, Xin3

Subject: [RFC PATCH 13/32] x86/fred: header file for event types

FRED inherits the event classification of Intel VT-x, with a two-level
event dispatch logic: the first-level dispatch is on the event type,
not on the event vector as in the IDT architecture. This also means
that vectors of different event types are orthogonal; e.g., vectors
0x10-0x1f become available for hardware interrupts.

Add a header file for event types, and also use it in <asm/vmx.h>.
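The two-level dispatch can be pictured with a hypothetical userspace sketch. It switches on the event type first and only then looks at the vector. It also decodes a VMX interruption-information word (vector in bits 7:0, type in bits 10:8, valid in bit 31) the same way. The handler classes are invented for illustration and do not correspond to kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Event type codes, as in the new <asm/event-type.h>. */
#define EVENT_TYPE_HWINT    0
#define EVENT_TYPE_RESERVED 1
#define EVENT_TYPE_NMI      2
#define EVENT_TYPE_HWFAULT  3
#define EVENT_TYPE_SWINT    4
#define EVENT_TYPE_PRIVSW   5
#define EVENT_TYPE_SWFAULT  6
#define EVENT_TYPE_OTHER    7

/* VMX interruption-information fields: vector 7:0, type 10:8, valid 31. */
#define INTR_INFO_VECTOR_MASK    0xffu
#define INTR_INFO_TYPE_SHIFT     8
#define INTR_INFO_TYPE_MASK      (7u << INTR_INFO_TYPE_SHIFT)
#define INTR_INFO_VALID_MASK     (1u << 31)
#define INTR_TYPE_HARD_EXCEPTION (EVENT_TYPE_HWFAULT << INTR_INFO_TYPE_SHIFT)

static_assert(INTR_TYPE_HARD_EXCEPTION == (3 << 8), "encoding unchanged");

enum handler_class { H_INVALID, H_IRQ, H_NMI, H_EXCEPTION, H_SOFTINT, H_OTHER };

/* First level: the event type; second level: the vector within it. */
static enum handler_class dispatch_event(unsigned int type, unsigned int vector)
{
    switch (type) {
    case EVENT_TYPE_HWINT:
        return H_IRQ;                 /* any vector, including 0x10-0x1f */
    case EVENT_TYPE_NMI:
        return vector == 2 ? H_NMI : H_INVALID;
    case EVENT_TYPE_HWFAULT:
    case EVENT_TYPE_PRIVSW:
    case EVENT_TYPE_SWFAULT:
        return vector < 32 ? H_EXCEPTION : H_INVALID;
    case EVENT_TYPE_SWINT:
        return H_SOFTINT;
    case EVENT_TYPE_OTHER:
        return H_OTHER;               /* e.g. SYSCALL/SYSENTER under FRED */
    default:
        return H_INVALID;             /* EVENT_TYPE_RESERVED */
    }
}

/* Decode a VMX interruption-information word and dispatch it. */
static enum handler_class dispatch_vmx_info(uint32_t info)
{
    if (!(info & INTR_INFO_VALID_MASK))
        return H_INVALID;
    return dispatch_event((info & INTR_INFO_TYPE_MASK) >> INTR_INFO_TYPE_SHIFT,
                          info & INTR_INFO_VECTOR_MASK);
}
```

Vector 0x13 is an ordinary interrupt when delivered as `EVENT_TYPE_HWINT`, but an exception when delivered as `EVENT_TYPE_HWFAULT`.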

Suggested-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/event-type.h | 17 +++++++++++++++++
arch/x86/include/asm/vmx.h | 17 +++++++++--------
2 files changed, 26 insertions(+), 8 deletions(-)
create mode 100644 arch/x86/include/asm/event-type.h

diff --git a/arch/x86/include/asm/event-type.h b/arch/x86/include/asm/event-type.h
new file mode 100644
index 000000000000..fedaa0e492c5
--- /dev/null
+++ b/arch/x86/include/asm/event-type.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_EVENT_TYPE_H
+#define _ASM_X86_EVENT_TYPE_H
+
+/*
+ * Event type codes: these are the same as those used by VT-x.
+ */
+#define EVENT_TYPE_HWINT 0 /* Maskable external interrupt */
+#define EVENT_TYPE_RESERVED 1
+#define EVENT_TYPE_NMI 2 /* Non-maskable interrupt */
+#define EVENT_TYPE_HWFAULT 3 /* Hardware exceptions (e.g., page fault) */
+#define EVENT_TYPE_SWINT 4 /* Software interrupt (INT n) */
+#define EVENT_TYPE_PRIVSW 5 /* INT1 (ICEBP) */
+#define EVENT_TYPE_SWFAULT 6 /* Software exception (INT3 or INTO) */
+#define EVENT_TYPE_OTHER 7 /* FRED: SYSCALL/SYSENTER */
+
+#endif
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 498dc600bd5c..8d9b8b0d8e56 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -15,6 +15,7 @@
#include <linux/bitops.h>
#include <linux/types.h>
#include <uapi/asm/vmx.h>
+#include <asm/event-type.h>
#include <asm/vmxfeatures.h>

#define VMCS_CONTROL_BIT(x) BIT(VMX_FEATURE_##x & 0x1f)
@@ -372,14 +373,14 @@ enum vmcs_field {
#define VECTORING_INFO_DELIVER_CODE_MASK INTR_INFO_DELIVER_CODE_MASK
#define VECTORING_INFO_VALID_MASK INTR_INFO_VALID_MASK

-#define INTR_TYPE_EXT_INTR (0 << 8) /* external interrupt */
-#define INTR_TYPE_RESERVED (1 << 8) /* reserved */
-#define INTR_TYPE_NMI_INTR (2 << 8) /* NMI */
-#define INTR_TYPE_HARD_EXCEPTION (3 << 8) /* processor exception */
-#define INTR_TYPE_SOFT_INTR (4 << 8) /* software interrupt */
-#define INTR_TYPE_PRIV_SW_EXCEPTION (5 << 8) /* ICE breakpoint - undocumented */
-#define INTR_TYPE_SOFT_EXCEPTION (6 << 8) /* software exception */
-#define INTR_TYPE_OTHER_EVENT (7 << 8) /* other event */
+#define INTR_TYPE_EXT_INTR (EVENT_TYPE_HWINT << 8) /* external interrupt */
+#define INTR_TYPE_RESERVED (EVENT_TYPE_RESERVED << 8) /* reserved */
+#define INTR_TYPE_NMI_INTR (EVENT_TYPE_NMI << 8) /* NMI */
+#define INTR_TYPE_HARD_EXCEPTION (EVENT_TYPE_HWFAULT << 8) /* processor exception */
+#define INTR_TYPE_SOFT_INTR (EVENT_TYPE_SWINT << 8) /* software interrupt */
+#define INTR_TYPE_PRIV_SW_EXCEPTION (EVENT_TYPE_PRIVSW << 8) /* ICE breakpoint - undocumented */
+#define INTR_TYPE_SOFT_EXCEPTION (EVENT_TYPE_SWFAULT << 8) /* software exception */
+#define INTR_TYPE_OTHER_EVENT (EVENT_TYPE_OTHER << 8) /* other event */

/* GUEST_INTERRUPTIBILITY_INFO flags. */
#define GUEST_INTR_STATE_STI 0x00000001
--
2.34.1

2022-12-20 07:45:57

by Li, Xin3

Subject: [RFC PATCH 11/32] x86/fred: if CONFIG_X86_FRED is disabled, disable FRED support

From: "H. Peter Anvin (Intel)" <[email protected]>

Add CONFIG_X86_FRED to <asm/disabled-features.h> to make
cpu_feature_enabled() work correctly with FRED.
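A simplified model shows why the mask matters. cpu_feature_enabled() reports a feature only if the feature is not compiled out and the CPU enumerates it. In the kernel, the disabled-mask test folds to a compile-time constant. This sketch, with hypothetical array and function names, does the test at run time for clarity. Built without -DCONFIG_X86_FRED, it reports FRED as disabled even when the simulated CPU has it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_FEATURE_FRED (12*32 + 17)

/* Build with -DCONFIG_X86_FRED to mimic CONFIG_X86_FRED=y. */
#ifdef CONFIG_X86_FRED
# define DISABLE_FRED 0
#else
# define DISABLE_FRED (1u << (X86_FEATURE_FRED & 31))
#endif

#define DISABLED_MASK12 (DISABLE_FRED)
static_assert(X86_FEATURE_FRED / 32 == 12, "FRED lives in capability word 12");

/* Compile-time disabled bits, one 32-bit mask per capability word. */
static const uint32_t disabled_mask[20] = { [12] = DISABLED_MASK12 };

/* What CPUID reported at boot (filled in by the caller in this sketch). */
static uint32_t boot_caps[20];

static bool feature_bit(const uint32_t *words, unsigned int f)
{
    return words[f / 32] & (1u << (f % 32));
}

static bool sketch_cpu_feature_enabled(unsigned int f)
{
    /* Usable only if not compiled out AND present on the CPU. */
    return !feature_bit(disabled_mask, f) && feature_bit(boot_caps, f);
}
```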

Originally-by: Megha Dey <[email protected]>
Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/disabled-features.h | 8 +++++++-
tools/arch/x86/include/asm/disabled-features.h | 8 +++++++-
2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 33d2cd04d254..3a2d0ad63332 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -87,6 +87,12 @@
# define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
#endif

+#ifdef CONFIG_X86_FRED
+# define DISABLE_FRED 0
+#else
+# define DISABLE_FRED (1 << (X86_FEATURE_FRED & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -102,7 +108,7 @@
#define DISABLED_MASK9 (DISABLE_SGX)
#define DISABLED_MASK10 0
#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET)
-#define DISABLED_MASK12 0
+#define DISABLED_MASK12 (DISABLE_FRED)
#define DISABLED_MASK13 0
#define DISABLED_MASK14 0
#define DISABLED_MASK15 0
diff --git a/tools/arch/x86/include/asm/disabled-features.h b/tools/arch/x86/include/asm/disabled-features.h
index 33d2cd04d254..3a2d0ad63332 100644
--- a/tools/arch/x86/include/asm/disabled-features.h
+++ b/tools/arch/x86/include/asm/disabled-features.h
@@ -87,6 +87,12 @@
# define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
#endif

+#ifdef CONFIG_X86_FRED
+# define DISABLE_FRED 0
+#else
+# define DISABLE_FRED (1 << (X86_FEATURE_FRED & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -102,7 +108,7 @@
#define DISABLED_MASK9 (DISABLE_SGX)
#define DISABLED_MASK10 0
#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET)
-#define DISABLED_MASK12 0
+#define DISABLED_MASK12 (DISABLE_FRED)
#define DISABLED_MASK13 0
#define DISABLED_MASK14 0
#define DISABLED_MASK15 0
--
2.34.1

2022-12-20 07:45:58

by Li, Xin3

Subject: [RFC PATCH 32/32] x86/fred: disable FRED by default in its early stage

Disable FRED by default while it is in an early stage.

To enable FRED, add the new kernel command line option "fred".
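The default-off policy amounts to this: keep the FRED capability only if the CPU has it and "fred" appears on the command line. The helper below is a hypothetical, simplified stand-in for cmdline_find_option_bool() and does not support quoting. It treats only a bare, whole word as a match, so "nofred" and "fred=1" do not count in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical: true if @opt appears as a whole, bare word in @cmdline. */
static bool cmdline_has_bool(const char *cmdline, const char *opt)
{
    size_t len = strlen(opt);

    assert(len > 0);
    while (*cmdline) {
        while (is_blank(*cmdline))
            cmdline++;
        const char *word = cmdline;          /* start of the next word */
        while (*cmdline && !is_blank(*cmdline))
            cmdline++;
        if ((size_t)(cmdline - word) == len && !strncmp(word, opt, len))
            return true;
    }
    return false;
}

/* Mirrors the patch: the FRED capability survives only if "fred" is given. */
static bool fred_usable(const char *cmdline, bool cpu_has_fred)
{
    return cpu_has_fred && cmdline_has_bool(cmdline, "fred");
}
```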

Signed-off-by: Xin Li <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 4 ++++
arch/x86/kernel/cpu/common.c | 3 +++
2 files changed, 7 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 42af9ca0127e..0bc76d926dd4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1506,6 +1506,10 @@
Warning: use of this parameter will taint the kernel
and may cause unknown problems.

+ fred
+ Forcefully enable flexible return and event delivery,
+ which is otherwise disabled by default.
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 5de68356fe62..1a160337ad41 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1434,6 +1434,9 @@ static void __init cpu_parse_early_param(void)
char *argptr = arg, *opt;
int arglen, taint = 0;

+ if (!cmdline_find_option_bool(boot_command_line, "fred"))
+ setup_clear_cpu_cap(X86_FEATURE_FRED);
+
#ifdef CONFIG_X86_32
if (cmdline_find_option_bool(boot_command_line, "no387"))
#ifdef CONFIG_MATH_EMULATION
--
2.34.1

2022-12-20 07:46:13

by Li, Xin3

Subject: [RFC PATCH 22/32] x86/fred: FRED initialization code

From: "H. Peter Anvin (Intel)" <[email protected]>

Add the code to initialize FRED when it is available and _not_ disabled.

cpu_init_fred_exceptions() is the core function to initialize FRED. It:
1. Sets up FRED entrypoints for events happening in ring 0 and ring 3.
2. Sets up a default stack for event handling.
3. Sets up dedicated event stacks for DB/NMI/MC/DF, equivalent to
the IDT IST stacks.
4. Forces 32-bit system calls to use "int $0x80" only.
5. Enables FRED and invalidates the IDT.

When FRED is used, cpu_init_exception_handling() initializes FRED by
calling cpu_init_fred_exceptions(); otherwise, it sets up the TSS IST
and loads the IDT.

As FRED uses the ring 3 FRED entrypoint for SYSCALL and SYSENTER,
it skips setting up SYSCALL/SYSENTER related MSRs, e.g., MSR_LSTAR.
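Here is a worked example of the MSR_IA32_FRED_STKLVLS value that cpu_init_fred_exceptions() programs. The sketch assumes that each exception vector v gets a 2-bit stack level at bits 2v+1:2v. That assumption should be checked against the FRED specification linked in the cover letter. The FRED_STKLVL() below encodes this assumption and is not copied from <asm/fred.h>. Note that #MC shares level 2, and thus the MSR_IA32_FRED_RSP2 stack, with NMI.

```c
#include <assert.h>
#include <stdint.h>

/* Exception vector numbers (as in the kernel's <asm/trapnr.h>). */
#define X86_TRAP_DB   1
#define X86_TRAP_NMI  2
#define X86_TRAP_DF   8
#define X86_TRAP_MC  18

/* Assumed encoding: two bits per exception vector v, at bits 2v+1:2v. */
#define FRED_STKLVL(v, l) ((uint64_t)(l) << (2 * (v)))

#define FRED_STKLVLS_VALUE (FRED_STKLVL(X86_TRAP_DB, 1)  | \
                            FRED_STKLVL(X86_TRAP_NMI, 2) | \
                            FRED_STKLVL(X86_TRAP_MC, 2)  | \
                            FRED_STKLVL(X86_TRAP_DF, 3))

static_assert(FRED_STKLVLS_VALUE == 0x2000030024ULL, "DB=1, NMI=2, DF=3, MC=2");

/* Decode the stack level on which an exception vector would be delivered. */
static unsigned int fred_stklvl_of(uint64_t stklvls, unsigned int vector)
{
    assert(vector < 32);
    return (unsigned int)((stklvls >> (2 * vector)) & 3);
}
```

Vectors that are not listed, such as #PF (14), stay at level 0.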

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Co-developed-by: Xin Li <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/fred.h | 14 +++++++
arch/x86/include/asm/traps.h | 2 +
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/cpu/common.c | 74 +++++++++++++++++++++++-------------
arch/x86/kernel/fred.c | 73 +++++++++++++++++++++++++++++++++++
arch/x86/kernel/irqinit.c | 7 +++-
arch/x86/kernel/traps.c | 16 +++++++-
7 files changed, 157 insertions(+), 30 deletions(-)
create mode 100644 arch/x86/kernel/fred.c

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 01678ced5451..b6308e351e14 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -97,8 +97,22 @@ DECLARE_FRED_HANDLER(fred_exc_debug);
DECLARE_FRED_HANDLER(fred_exc_page_fault);
DECLARE_FRED_HANDLER(fred_exc_machine_check);

+/*
+ * The actual assembly entry and exit points
+ */
+extern __visible void fred_entrypoint_user(void);
+
+/*
+ * Initialization
+ */
+void cpu_init_fred_exceptions(void);
+void fred_setup_apic(void);
+
#endif /* __ASSEMBLY__ */

+#else
+#define cpu_init_fred_exceptions() BUG()
+#define fred_setup_apic() BUG()
#endif /* CONFIG_X86_FRED */

#endif /* ASM_X86_FRED_H */
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 77ffc580e821..963c51e680bd 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -56,6 +56,8 @@ void __noreturn handle_stack_overflow(struct pt_regs *regs,
void f (struct pt_regs *regs)
typedef DECLARE_SYSTEM_INTERRUPT_HANDLER((*system_interrupt_handler));

+system_interrupt_handler get_system_interrupt_handler(unsigned int i);
+
int exc_raise_irq(struct pt_regs *regs, u32 vector);

int external_interrupt(struct pt_regs *regs, unsigned int vector);
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index f901658d9f7c..1d9e669e288b 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -48,6 +48,7 @@ obj-y += process_$(BITS).o signal.o
obj-$(CONFIG_COMPAT) += signal_compat.o
obj-y += traps.o idt.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
obj-y += time.o ioport.o dumpstack.o nmi.o
+obj-$(CONFIG_X86_FRED) += fred.o
obj-$(CONFIG_MODIFY_LDT_SYSCALL) += ldt.o
obj-y += setup.o x86_init.o i8259.o irqinit.o
obj-$(CONFIG_JUMP_LABEL) += jump_label.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 05a5538052ad..5de68356fe62 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -57,6 +57,7 @@
#include <asm/microcode_intel.h>
#include <asm/intel-family.h>
#include <asm/cpu_device_id.h>
+#include <asm/fred.h>
#include <asm/uv/uv.h>
#include <asm/sigframe.h>
#include <asm/traps.h>
@@ -2034,28 +2035,6 @@ static void wrmsrl_cstar(unsigned long val)
/* May not be marked __init: used by software suspend */
void syscall_init(void)
{
- wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
- wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
-
-#ifdef CONFIG_IA32_EMULATION
- wrmsrl_cstar((unsigned long)entry_SYSCALL_compat);
- /*
- * This only works on Intel CPUs.
- * On AMD CPUs these MSRs are 32-bit, CPU truncates MSR_IA32_SYSENTER_EIP.
- * This does not cause SYSENTER to jump to the wrong location, because
- * AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit).
- */
- wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
- wrmsrl_safe(MSR_IA32_SYSENTER_ESP,
- (unsigned long)(cpu_entry_stack(smp_processor_id()) + 1));
- wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
-#else
- wrmsrl_cstar((unsigned long)ignore_sysret);
- wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
- wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
- wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
-#endif
-
/*
* Flags to clear on syscall; clear as much as possible
* to minimize user space-kernel interference.
@@ -2066,6 +2045,41 @@ void syscall_init(void)
X86_EFLAGS_IF|X86_EFLAGS_DF|X86_EFLAGS_OF|
X86_EFLAGS_IOPL|X86_EFLAGS_NT|X86_EFLAGS_RF|
X86_EFLAGS_AC|X86_EFLAGS_ID);
+
+ /*
+ * The default user and kernel segments
+ */
+ wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
+
+ if (cpu_feature_enabled(X86_FEATURE_FRED)) {
+ /* Both sysexit and sysret cause #UD when FRED is enabled */
+ wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
+ wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
+ wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
+ } else {
+ wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
+
+#ifdef CONFIG_IA32_EMULATION
+ wrmsrl_cstar((unsigned long)entry_SYSCALL_compat);
+ /*
+ * This only works on Intel CPUs.
+ * On AMD CPUs these MSRs are 32-bit, CPU truncates
+ * MSR_IA32_SYSENTER_EIP.
+ * This does not cause SYSENTER to jump to the wrong
+ * location, because AMD doesn't allow SYSENTER in
+ * long mode (either 32- or 64-bit).
+ */
+ wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS);
+ wrmsrl_safe(MSR_IA32_SYSENTER_ESP,
+ (unsigned long)(cpu_entry_stack(smp_processor_id()) + 1));
+ wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat);
+#else
+ wrmsrl_cstar((unsigned long)ignore_sysret);
+ wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
+ wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
+ wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
+#endif
+ }
}

#else /* CONFIG_X86_64 */
@@ -2214,18 +2228,24 @@ void cpu_init_exception_handling(void)
/* paranoid_entry() gets the CPU number from the GDT */
setup_getcpu(cpu);

- /* IST vectors need TSS to be set up. */
- tss_setup_ist(tss);
+ /* Set up the TSS */
tss_setup_io_bitmap(tss);
set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
-
load_TR_desc();

/* GHCB needs to be setup to handle #VC. */
setup_ghcb();

- /* Finally load the IDT */
- load_current_idt();
+ if (cpu_feature_enabled(X86_FEATURE_FRED)) {
+ /* Set up FRED exception handling */
+ cpu_init_fred_exceptions();
+ } else {
+ /* IST vectors need TSS to be set up. */
+ tss_setup_ist(tss);
+
+ /* Finally load the IDT */
+ load_current_idt();
+ }
}

/*
diff --git a/arch/x86/kernel/fred.c b/arch/x86/kernel/fred.c
new file mode 100644
index 000000000000..827b58fd98d4
--- /dev/null
+++ b/arch/x86/kernel/fred.c
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/kernel.h>
+#include <asm/desc.h>
+#include <asm/fred.h>
+#include <asm/tlbflush.h> /* For cr4_set_bits() */
+#include <asm/traps.h>
+
+/*
+ * Initialize FRED on this CPU. This cannot be __init as it is called
+ * during CPU hotplug.
+ */
+void cpu_init_fred_exceptions(void)
+{
+ wrmsrl(MSR_IA32_FRED_CONFIG,
+ FRED_CONFIG_ENTRYPOINT(fred_entrypoint_user) |
+ FRED_CONFIG_REDZONE(8) | /* Reserve for CALL emulation */
+ FRED_CONFIG_INT_STKLVL(0));
+
+ wrmsrl(MSR_IA32_FRED_STKLVLS,
+ FRED_STKLVL(X86_TRAP_DB, 1) |
+ FRED_STKLVL(X86_TRAP_NMI, 2) |
+ FRED_STKLVL(X86_TRAP_MC, 2) |
+ FRED_STKLVL(X86_TRAP_DF, 3));
+
+ /* The FRED equivalents to IST stacks... */
+ wrmsrl(MSR_IA32_FRED_RSP1, __this_cpu_ist_top_va(DB));
+ wrmsrl(MSR_IA32_FRED_RSP2, __this_cpu_ist_top_va(NMI));
+ wrmsrl(MSR_IA32_FRED_RSP3, __this_cpu_ist_top_va(DF));
+
+ /* Not used with FRED */
+ wrmsrl(MSR_LSTAR, 0ULL);
+ wrmsrl(MSR_CSTAR, 0ULL);
+ wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
+ wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
+ wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
+
+ /* Enable FRED */
+ cr4_set_bits(X86_CR4_FRED);
+ idt_invalidate(); /* Any further IDT use is a bug */
+
+ /* Use int $0x80 for 32-bit system calls in FRED mode */
+ setup_clear_cpu_cap(X86_FEATURE_SYSENTER32);
+ setup_clear_cpu_cap(X86_FEATURE_SYSCALL32);
+}
+
+/*
+ * Initialize system vectors from a FRED perspective, so
+ * lapic_assign_system_vectors() can do its job.
+ */
+void __init fred_setup_apic(void)
+{
+ int i;
+
+ for (i = 0; i < FIRST_EXTERNAL_VECTOR; i++)
+ set_bit(i, system_vectors);
+
+ /*
+ * Don't set the non assigned system vectors in the
+ * system_vectors bitmap. Otherwise they show up in
+ * /proc/interrupts.
+ */
+#ifdef CONFIG_SMP
+ set_bit(IRQ_MOVE_CLEANUP_VECTOR, system_vectors);
+#endif
+
+ for (i = 0; i < NR_SYSTEM_VECTORS; i++) {
+ if (get_system_interrupt_handler(i) != NULL) {
+ set_bit(i + FIRST_SYSTEM_VECTOR, system_vectors);
+ }
+ }
+
+ /* The rest are fair game... */
+}
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index beb1bada1b0a..bb59661f0278 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -28,6 +28,7 @@
#include <asm/setup.h>
#include <asm/i8259.h>
#include <asm/traps.h>
+#include <asm/fred.h>
#include <asm/prom.h>

/*
@@ -94,7 +95,11 @@ void __init native_init_IRQ(void)
/* Execute any quirks before the call gates are initialised: */
x86_init.irqs.pre_vector_init();

- idt_setup_apic_and_irq_gates();
+ if (cpu_feature_enabled(X86_FEATURE_FRED))
+ fred_setup_apic();
+ else
+ idt_setup_apic_and_irq_gates();
+
lapic_assign_system_vectors();

if (!acpi_ioapic && !of_ioapic && nr_legacy_irqs()) {
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b0ee83bab9e6..36a15df9b5e5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1518,12 +1518,21 @@ static system_interrupt_handler system_interrupt_handlers[NR_SYSTEM_VECTORS] = {

#undef SYSV

+system_interrupt_handler get_system_interrupt_handler(unsigned int i)
+{
+ if (i >= NR_SYSTEM_VECTORS)
+ return NULL;
+
+ return system_interrupt_handlers[i];
+}
+
void __init install_system_interrupt_handler(unsigned int n, const void *asm_addr, const void *addr)
{
BUG_ON(n < FIRST_SYSTEM_VECTOR);

system_interrupt_handlers[n - FIRST_SYSTEM_VECTOR] = (system_interrupt_handler)addr;
- alloc_intr_gate(n, asm_addr);
+ if (!cpu_feature_enabled(X86_FEATURE_FRED))
+ alloc_intr_gate(n, asm_addr);
}

#ifndef CONFIG_X86_LOCAL_APIC
@@ -1591,7 +1600,10 @@ void __init trap_init(void)

/* Initialize TSS before setting up traps so ISTs work */
cpu_init_exception_handling();
+
/* Setup traps as cpu_init() might #GP */
- idt_setup_traps();
+ if (!cpu_feature_enabled(X86_FEATURE_FRED))
+ idt_setup_traps();
+
cpu_init();
}
--
2.34.1

2022-12-20 07:47:32

by Li, Xin3

Subject: [RFC PATCH 10/32] x86/fred: add Kconfig option for FRED (CONFIG_X86_FRED)

From: "H. Peter Anvin (Intel)" <[email protected]>

Add the configuration option CONFIG_X86_FRED to enable FRED.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/Kconfig | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 67745ceab0db..1155d2e06fd1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -500,6 +500,15 @@ config X86_CPU_RESCTRL

Say N if unsure.

+config X86_FRED
+ bool "Flexible Return and Event Delivery"
+ depends on X86_64
+ help
+ When enabled, try to use Flexible Return and Event Delivery
+ instead of the legacy SYSCALL/SYSENTER/IDT architecture for
+ ring transitions and exception/interrupt handling if the
+ system supports it.
+
if X86_32
config X86_BIGSMP
bool "Support for big SMP systems with more than 8 CPUs"
--
2.34.1

2022-12-20 07:48:29

by Li, Xin3

Subject: [RFC PATCH 16/32] x86/fred: reserve space for the FRED stack frame

From: "H. Peter Anvin (Intel)" <[email protected]>

When using FRED, reserve space at the top of the stack frame, just
like i386 does. A future version of FRED might have dynamic frame
sizes, though, in which case it might be necessary to make
TOP_OF_KERNEL_STACK_PADDING a variable instead of a constant.
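For a sense of the numbers, here is a sketch of where pt_regs lands on the stack. It loosely follows the shape of task_pt_regs(): the top of the stack, minus the padding, minus one pt_regs. THREAD_SIZE = 16 KiB and the 21-slot stand-in struct are assumptions for a typical x86-64 build. With FRED, the frame moves down by 2*8 = 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define THREAD_SIZE (16 * 1024)   /* assumed: 16 KiB kernel stack */

/* Stand-in for x86-64 struct pt_regs: 21 eight-byte slots, r15 ... ss. */
struct sketch_pt_regs { uint64_t slot[21]; };
static_assert(sizeof(struct sketch_pt_regs) == 168, "21 * 8 bytes");

/* Offset of pt_regs from the stack base: top, minus padding, minus frame. */
static size_t pt_regs_offset(size_t top_padding)
{
    return THREAD_SIZE - top_padding - sizeof(struct sketch_pt_regs);
}
```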

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/thread_info.h | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index f0cb881c1d69..fea0e69fc3d4 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -31,7 +31,9 @@
* In vm86 mode, the hardware frame is much longer still, so add 16
* bytes to make room for the real-mode segments.
*
- * x86_64 has a fixed-length stack frame.
+ * x86-64 has a fixed-length stack frame, but it depends on whether
+ * or not FRED is enabled. Future versions of FRED might make this
+ * dynamic, but for now it is always 2 words longer.
*/
#ifdef CONFIG_X86_32
# ifdef CONFIG_VM86
@@ -39,8 +41,12 @@
# else
# define TOP_OF_KERNEL_STACK_PADDING 8
# endif
-#else
-# define TOP_OF_KERNEL_STACK_PADDING 0
+#else /* x86-64 */
+# ifdef CONFIG_X86_FRED
+# define TOP_OF_KERNEL_STACK_PADDING (2*8)
+# else
+# define TOP_OF_KERNEL_STACK_PADDING 0
+# endif
#endif

/*
--
2.34.1

2022-12-20 07:49:36

by Li, Xin3

Subject: [RFC PATCH 17/32] x86/fred: add a page fault entry stub for FRED

From: "H. Peter Anvin (Intel)" <[email protected]>

Add a page fault entry stub for FRED.

On a FRED system, the faulting address (CR2) is passed on the stack,
to avoid the problem of transient state. Thus we get the page fault
address from the stack instead of CR2.
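The following toy model illustrates the refactoring. Both entry points feed page_fault_common(). The IDT path samples a live CR2 after entry, while the FRED path uses the address saved in the frame at delivery time. All names here are hypothetical. The "nested fault" is simulated by overwriting the fake CR2.

```c
#include <assert.h>   /* assert(), used by the checks after this block */
#include <stdint.h>

/* Hypothetical, heavily simplified model of the two entry paths. */
struct sketch_regs {
    unsigned long orig_ax;   /* error code saved by the entry code */
    uint64_t event_data;     /* FRED: faulting address saved in the frame */
};

static uint64_t sketch_cr2;  /* models the live CR2 register */
static uint64_t seen_address;
static unsigned long seen_error_code;

static void page_fault_common(unsigned long error_code, uint64_t address)
{
    seen_error_code = error_code;
    seen_address = address;
}

/* IDT path: CR2 is read after entry, so it reflects the most recent fault. */
static void idt_page_fault(struct sketch_regs *regs)
{
    page_fault_common(regs->orig_ax, sketch_cr2);
}

/* FRED path: the address was captured at delivery time. */
static void fred_page_fault(struct sketch_regs *regs)
{
    page_fault_common(regs->orig_ax, regs->event_data);
}
```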

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/fred.h | 2 ++
arch/x86/mm/fault.c | 20 ++++++++++++++++++--
2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 6292b28d461d..38a90eae7c0f 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -92,6 +92,8 @@ static __always_inline unsigned long fred_event_data(struct pt_regs *regs)
#define DEFINE_FRED_HANDLER(f) noinstr DECLARE_FRED_HANDLER(f)
typedef DECLARE_FRED_HANDLER((*fred_handler));

+DECLARE_FRED_HANDLER(fred_exc_page_fault);
+
#endif /* __ASSEMBLY__ */

#endif /* CONFIG_X86_FRED */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7b0d4ab894c8..f31053f32048 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -33,6 +33,7 @@
#include <asm/kvm_para.h> /* kvm_handle_async_pf */
#include <asm/vdso.h> /* fixup_vdso_exception() */
#include <asm/irq_stack.h>
+#include <asm/fred.h> /* fred_event_data() */

#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -1528,9 +1529,10 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
}
}

-DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
+static __always_inline void page_fault_common(struct pt_regs *regs,
+ unsigned int error_code,
+ unsigned long address)
{
- unsigned long address = read_cr2();
irqentry_state_t state;

prefetchw(&current->mm->mmap_lock);
@@ -1577,3 +1579,17 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)

irqentry_exit(regs, state);
}
+
+DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
+{
+ page_fault_common(regs, error_code, read_cr2());
+}
+
+#ifdef CONFIG_X86_FRED
+
+DEFINE_FRED_HANDLER(fred_exc_page_fault)
+{
+ page_fault_common(regs, regs->orig_ax, fred_event_data(regs));
+}
+
+#endif /* CONFIG_X86_FRED */
--
2.34.1

2022-12-20 07:50:30

by Li, Xin3

Subject: [RFC PATCH 06/32] x86/cpufeature: add the cpu feature bit for FRED

From: "H. Peter Anvin (Intel)" <[email protected]>

Add the CPU feature bit for FRED (Flexible Return and Event Delivery).

The Intel flexible return and event delivery (FRED) architecture defines simple
new transitions that change privilege level (ring transitions). The FRED
architecture was designed with the following goals:
1) Improve overall performance and response time by replacing event delivery
through the interrupt descriptor table (IDT event delivery) and event return by
the IRET instruction with lower latency transitions.
2) Improve software robustness by ensuring that event delivery establishes the
full supervisor context and that event return establishes the full user context.

The new transitions defined by the FRED architecture are FRED event delivery and,
for returning from events, two FRED return instructions. FRED event delivery can
effect a transition from ring 3 to ring 0, but it is used also to deliver events
incident to ring 0. One FRED instruction (ERETU) effects a return from ring 0 to
ring 3, while the other (ERETS) returns while remaining in ring 0.

The Intel FRED architecture spec can be downloaded from:
https://cdrdv2.intel.com/v1/dl/getContent/678938
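For reference, here is a userspace sketch that tests the enumeration bit the kernel maps to X86_FEATURE_FRED. That bit is CPUID.(EAX=7,ECX=1):EAX bit 17, which is word 12 in the kernel's numbering. The sketch assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid_count(). It reports false on non-x86 builds. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define X86_FEATURE_FRED (12*32 + 17)   /* word 12 = CPUID.(EAX=7,ECX=1):EAX */
static_assert(X86_FEATURE_FRED % 32 == 17, "FRED is bit 17 of its word");

static bool fred_bit_set(uint32_t leaf7_sub1_eax)
{
    return leaf7_sub1_eax & (1u << (X86_FEATURE_FRED % 32));
}

static bool cpu_reports_fred(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count() returns 0 if leaf 7 is not supported. */
    if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
        return false;
    return fred_bit_set(eax);
#else
    return false;   /* not an x86 machine */
#endif
}
```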

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
tools/arch/x86/include/asm/cpufeatures.h | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 29f53b31056e..6148e8a94d24 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -312,6 +312,7 @@
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_FRED (12*32+ 17) /* Flexible Return and Event Delivery */
#define X86_FEATURE_LKGS (12*32+ 18) /* "" Load "kernel" (userspace) gs */

/* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
#define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index 3dc1a48c2796..41d1e1b4a6cb 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@
/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_FRED (12*32+ 17) /* Flexible Return and Event Delivery */
#define X86_FEATURE_LKGS (12*32+ 18) /* "" Load "kernel" (userspace) gs */

/* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
--
2.34.1

2022-12-20 07:51:08

by Li, Xin3

Subject: [RFC PATCH 12/32] x86/cpu: add MSR numbers for FRED configuration

From: "H. Peter Anvin (Intel)" <[email protected]>

Add MSR numbers for the FRED configuration registers.
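The MSR numbers are laid out so that the per-level registers are contiguous. The hypothetical helpers below map a stack level to its RSP or SSP MSR. The list has no SSP MSR for level 0, so the SSP helper accepts only levels 1 to 3.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_FRED_RSP0    0x1cc
#define MSR_IA32_FRED_STKLVLS 0x1d0
#define MSR_IA32_FRED_SSP1    0x1d1
#define MSR_IA32_FRED_CONFIG  0x1d4

static_assert(MSR_IA32_FRED_RSP0 + 3 < MSR_IA32_FRED_STKLVLS,
              "RSP0..RSP3 precede STKLVLS");
static_assert(MSR_IA32_FRED_SSP1 + 2 < MSR_IA32_FRED_CONFIG,
              "SSP1..SSP3 precede CONFIG");

/* Hypothetical: MSR holding the stack pointer for stack level @level. */
static uint32_t fred_rsp_msr(unsigned int level)
{
    assert(level <= 3);
    return MSR_IA32_FRED_RSP0 + level;
}

/* Hypothetical: MSR holding the shadow stack pointer for levels 1-3. */
static uint32_t fred_ssp_msr(unsigned int level)
{
    assert(level >= 1 && level <= 3);
    return MSR_IA32_FRED_SSP1 + (level - 1);
}
```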

Originally-by: Megha Dey <[email protected]>
Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/msr-index.h | 12 +++++++++++-
tools/arch/x86/include/asm/msr-index.h | 12 +++++++++++-
2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 4a2af82553e4..dea9223ec9ba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -39,8 +39,18 @@
#define EFER_LMSLE (1<<_EFER_LMSLE)
#define EFER_FFXSR (1<<_EFER_FFXSR)

-/* Intel MSRs. Some also available on other CPUs */
+/* FRED MSRs */
+#define MSR_IA32_FRED_RSP0 0x1cc /* Level 0 stack pointer */
+#define MSR_IA32_FRED_RSP1 0x1cd /* Level 1 stack pointer */
+#define MSR_IA32_FRED_RSP2 0x1ce /* Level 2 stack pointer */
+#define MSR_IA32_FRED_RSP3 0x1cf /* Level 3 stack pointer */
+#define MSR_IA32_FRED_STKLVLS 0x1d0 /* Exception stack levels */
+#define MSR_IA32_FRED_SSP1 0x1d1 /* Level 1 shadow stack pointer */
+#define MSR_IA32_FRED_SSP2 0x1d2 /* Level 2 shadow stack pointer */
+#define MSR_IA32_FRED_SSP3 0x1d3 /* Level 3 shadow stack pointer */
+#define MSR_IA32_FRED_CONFIG 0x1d4 /* Entrypoint and interrupt stack level */

+/* Intel MSRs. Some also available on other CPUs */
#define MSR_TEST_CTRL 0x00000033
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
diff --git a/tools/arch/x86/include/asm/msr-index.h b/tools/arch/x86/include/asm/msr-index.h
index f17ade084720..5c9d9040dd04 100644
--- a/tools/arch/x86/include/asm/msr-index.h
+++ b/tools/arch/x86/include/asm/msr-index.h
@@ -39,8 +39,18 @@
#define EFER_LMSLE (1<<_EFER_LMSLE)
#define EFER_FFXSR (1<<_EFER_FFXSR)

-/* Intel MSRs. Some also available on other CPUs */
+/* FRED MSRs */
+#define MSR_IA32_FRED_RSP0 0x1cc /* Level 0 stack pointer */
+#define MSR_IA32_FRED_RSP1 0x1cd /* Level 1 stack pointer */
+#define MSR_IA32_FRED_RSP2 0x1ce /* Level 2 stack pointer */
+#define MSR_IA32_FRED_RSP3 0x1cf /* Level 3 stack pointer */
+#define MSR_IA32_FRED_STKLVLS 0x1d0 /* Exception stack levels */
+#define MSR_IA32_FRED_SSP1 0x1d1 /* Level 1 shadow stack pointer */
+#define MSR_IA32_FRED_SSP2 0x1d2 /* Level 2 shadow stack pointer */
+#define MSR_IA32_FRED_SSP3 0x1d3 /* Level 3 shadow stack pointer */
+#define MSR_IA32_FRED_CONFIG 0x1d4 /* Entrypoint and interrupt stack level */

+/* Intel MSRs. Some also available on other CPUs */
#define MSR_TEST_CTRL 0x00000033
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT 29
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
--
2.34.1

2022-12-20 07:52:46

by Li, Xin3

Subject: [RFC PATCH 15/32] x86/fred: make unions for the cs and ss fields in struct pt_regs

From: "H. Peter Anvin (Intel)" <[email protected]>

Make the cs and ss fields in struct pt_regs unions between the actual
selector and the unsigned long stack slot. FRED uses this space to
store additional flags.

The printk format changes are needed only because the cs and ss fields
are now unsigned short rather than unsigned long.
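To make the new layout concrete, here is a user-space sketch that decodes the CS and SS stack slots with shifts and masks instead of bitfields. The bit positions are derived from the packed structs in the ptrace.h hunk of this patch (x86-64, little-endian); the helper names are hypothetical and not kernel API. Bits 24 and 28 of the CS slot line up with FRED_CSL_INTERRUPT_SHADOW and FRED_CSL_ENABLE_NMI from patch 14.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical decoders for the FRED-augmented CS/SS stack slots.
 * Bit positions follow the packed bitfield structs in struct pt_regs.
 */

/* CS slot: selector in bits 15:0, event flags above it. */
static inline uint16_t csl_selector(uint64_t csl)      { return csl & 0xffff; }
static inline unsigned csl_stack_level(uint64_t csl)   { return (csl >> 16) & 0x3; }
static inline unsigned csl_intr_shadowed(uint64_t csl) { return (csl >> 24) & 0x1; }
static inline unsigned csl_sw_initiated(uint64_t csl)  { return (csl >> 25) & 0x1; }
static inline unsigned csl_nmi(uint64_t csl)           { return (csl >> 28) & 0x1; }

/* SS slot: selector in bits 15:0, event information above it. */
static inline uint16_t ssl_selector(uint64_t ssl)  { return ssl & 0xffff; }
static inline unsigned ssl_vector(uint64_t ssl)    { return (ssl >> 32) & 0xff; }
static inline unsigned ssl_type(uint64_t ssl)      { return (ssl >> 48) & 0xf; }
static inline unsigned ssl_enclv(uint64_t ssl)     { return (ssl >> 56) & 0x1; }
static inline unsigned ssl_long_mode(uint64_t ssl) { return (ssl >> 57) & 0x1; }
static inline unsigned ssl_nested(uint64_t ssl)    { return (ssl >> 58) & 0x1; }
static inline unsigned ssl_instr_len(uint64_t ssl) { return (ssl >> 60) & 0xf; }
```

Shifts and masks give the same answer on any compiler, whereas the bitfield union relies on GCC's packed layout; the kernel can rely on that layout, a portable test harness cannot.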

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 2 +-
arch/x86/include/asm/ptrace.h | 36 ++++++++++++++++++++++++---
arch/x86/kernel/process_64.c | 2 +-
3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 4af81df133ee..6349c818d20a 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -76,7 +76,7 @@ static void warn_bad_vsyscall(const char *level, struct pt_regs *regs,
if (!show_unhandled_signals)
return;

- printk_ratelimited("%s%s[%d] %s ip:%lx cs:%lx sp:%lx ax:%lx si:%lx di:%lx\n",
+ printk_ratelimited("%s%s[%d] %s ip:%lx cs:%x sp:%lx ax:%lx si:%lx di:%lx\n",
level, current->comm, task_pid_nr(current),
message, regs->ip, regs->cs,
regs->sp, regs->ax, regs->si, regs->di);
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index f4db78b09c8f..341e44847cc1 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -82,13 +82,41 @@ struct pt_regs {
* On hw interrupt, it's IRQ number:
*/
unsigned long orig_ax;
-/* Return frame for iretq */
+
+ /* Return frame for iretq/eretu/erets */
unsigned long ip;
- unsigned long cs;
+ union {
+ unsigned long csl; /* CS + any fields above it */
+ struct __attribute__((__packed__)) {
+ unsigned short cs; /* CS selector proper */
+ unsigned int current_stack_level: 2;
+ unsigned int __csl_resv1 : 6;
+ unsigned int interrupt_shadowed : 1;
+ unsigned int software_initiated : 1;
+ unsigned int __csl_resv2 : 2;
+ unsigned int nmi : 1;
+ unsigned int __csl_resv3 : 3;
+ unsigned int __csl_resv4 : 32;
+ };
+ };
unsigned long flags;
unsigned long sp;
- unsigned long ss;
-/* top of stack page */
+ union {
+ unsigned long ssl; /* SS + any fields above it */
+ struct __attribute__((__packed__)) {
+ unsigned short ss; /* SS selector proper */
+ unsigned int __ssl_resv1: 16;
+ unsigned int vector : 8;
+ unsigned int __ssl_resv2: 8;
+ unsigned int type : 4;
+ unsigned int __ssl_resv3: 4;
+ unsigned int enclv : 1;
+ unsigned int long_mode : 1;
+ unsigned int nested : 1;
+ unsigned int __ssl_resv4: 1;
+ unsigned int instr_len : 4;
+ };
+ };
};

#endif /* !__i386__ */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6b3418bff326..bfe6179b7a17 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -116,7 +116,7 @@ void __show_regs(struct pt_regs *regs, enum show_regs_mode mode,

printk("%sFS: %016lx(%04x) GS:%016lx(%04x) knlGS:%016lx\n",
log_lvl, fs, fsindex, gs, gsindex, shadowgs);
- printk("%sCS: %04lx DS: %04x ES: %04x CR0: %016lx\n",
+ printk("%sCS: %04x DS: %04x ES: %04x CR0: %016lx\n",
log_lvl, regs->cs, ds, es, cr0);
printk("%sCR2: %016lx CR3: %016lx CR4: %016lx\n",
log_lvl, cr2, cr3, cr4);
--
2.34.1

2022-12-20 07:53:23

by Li, Xin3

Subject: [RFC PATCH 03/32] x86/traps: add install_system_interrupt_handler()

Some kernel components install system interrupt handlers into the IDT,
and we need to do the same for system_interrupt_handlers. A new function
install_system_interrupt_handler() is added to install a system interrupt
handler into both the IDT and system_interrupt_handlers.
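A minimal user-space model of the table half of this function follows. It assumes FIRST_SYSTEM_VECTOR is 0xec and NR_VECTORS is 256 purely for illustration, leaves out the IDT half (alloc_intr_gate()), returns -1 where the kernel would BUG_ON(), and adds a hypothetical dispatch_system_interrupt() and demo_handler() to show how an entry path could index the table directly by vector.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the real ones come from asm/irq_vectors.h. */
#define NR_VECTORS          256
#define FIRST_SYSTEM_VECTOR 0xec
#define NR_SYSTEM_VECTORS   (NR_VECTORS - FIRST_SYSTEM_VECTOR)

struct pt_regs;  /* opaque in this sketch */
typedef void (*system_interrupt_handler)(struct pt_regs *regs);

/* One slot per system vector, indexed by (vector - FIRST_SYSTEM_VECTOR). */
static system_interrupt_handler system_interrupt_handlers[NR_SYSTEM_VECTORS];

/* Table half of install_system_interrupt_handler(); no IDT here. */
static int install_handler(unsigned int n, system_interrupt_handler h)
{
	assert(h != NULL);
	if (n < FIRST_SYSTEM_VECTOR || n >= NR_VECTORS)
		return -1;  /* the kernel uses BUG_ON() instead */
	system_interrupt_handlers[n - FIRST_SYSTEM_VECTOR] = h;
	return 0;
}

/* Hypothetical dispatcher: a direct table lookup by vector number. */
static int dispatch_system_interrupt(unsigned int vector, struct pt_regs *regs)
{
	system_interrupt_handler h;

	if (vector < FIRST_SYSTEM_VECTOR || vector >= NR_VECTORS)
		return -1;
	h = system_interrupt_handlers[vector - FIRST_SYSTEM_VECTOR];
	if (h == NULL)
		return -1;  /* nothing installed for this vector */
	h(regs);
	return 0;
}

/* Example handler that just counts invocations. */
static int demo_calls;
static void demo_handler(struct pt_regs *regs)
{
	(void)regs;
	demo_calls++;
}
```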

Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/traps.h | 2 ++
arch/x86/kernel/cpu/acrn.c | 7 +++++--
arch/x86/kernel/cpu/mshyperv.c | 22 ++++++++++++++--------
arch/x86/kernel/kvm.c | 4 +++-
arch/x86/kernel/traps.c | 8 ++++++++
drivers/xen/events/events_base.c | 5 ++++-
6 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 28c8ba5fd81c..46f5e4e2a346 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -41,6 +41,8 @@ void math_emulate(struct math_emu_info *);

bool fault_in_kernel_space(unsigned long address);

+void install_system_interrupt_handler(unsigned int n, const void *asm_addr, const void *addr);
+
#ifdef CONFIG_VMAP_STACK
void __noreturn handle_stack_overflow(struct pt_regs *regs,
unsigned long fault_address,
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index 485441b7f030..9351bf183a9e 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -18,6 +18,7 @@
#include <asm/hypervisor.h>
#include <asm/idtentry.h>
#include <asm/irq_regs.h>
+#include <asm/traps.h>

static u32 __init acrn_detect(void)
{
@@ -26,8 +27,10 @@ static u32 __init acrn_detect(void)

static void __init acrn_init_platform(void)
{
- /* Setup the IDT for ACRN hypervisor callback */
- alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_acrn_hv_callback);
+ /* Install system interrupt handler for ACRN hypervisor callback */
+ install_system_interrupt_handler(HYPERVISOR_CALLBACK_VECTOR,
+ asm_sysvec_acrn_hv_callback,
+ sysvec_acrn_hv_callback);

x86_platform.calibrate_tsc = acrn_get_tsc_khz;
x86_platform.calibrate_cpu = acrn_get_tsc_khz;
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 831613959a92..144b4a622188 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -29,6 +29,7 @@
#include <asm/i8259.h>
#include <asm/apic.h>
#include <asm/timer.h>
+#include <asm/traps.h>
#include <asm/reboot.h>
#include <asm/nmi.h>
#include <clocksource/hyperv_timer.h>
@@ -415,19 +416,24 @@ static void __init ms_hyperv_init_platform(void)
*/
x86_platform.apic_post_init = hyperv_init;
hyperv_setup_mmu_ops();
- /* Setup the IDT for hypervisor callback */
- alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_hyperv_callback);

- /* Setup the IDT for reenlightenment notifications */
+ /* Install system interrupt handler for hypervisor callback */
+ install_system_interrupt_handler(HYPERVISOR_CALLBACK_VECTOR,
+ asm_sysvec_hyperv_callback,
+ sysvec_hyperv_callback);
+
+ /* Install system interrupt handler for reenlightenment notifications */
if (ms_hyperv.features & HV_ACCESS_REENLIGHTENMENT) {
- alloc_intr_gate(HYPERV_REENLIGHTENMENT_VECTOR,
- asm_sysvec_hyperv_reenlightenment);
+ install_system_interrupt_handler(HYPERV_REENLIGHTENMENT_VECTOR,
+ asm_sysvec_hyperv_reenlightenment,
+ sysvec_hyperv_reenlightenment);
}

- /* Setup the IDT for stimer0 */
+ /* Install system interrupt handler for stimer0 */
if (ms_hyperv.misc_features & HV_STIMER_DIRECT_MODE_AVAILABLE) {
- alloc_intr_gate(HYPERV_STIMER0_VECTOR,
- asm_sysvec_hyperv_stimer0);
+ install_system_interrupt_handler(HYPERV_STIMER0_VECTOR,
+ asm_sysvec_hyperv_stimer0,
+ sysvec_hyperv_stimer0);
}

# ifdef CONFIG_SMP
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index d4e48b4a438b..b7388ed2a980 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -835,7 +835,9 @@ static void __init kvm_guest_init(void)

if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT) && kvmapf) {
static_branch_enable(&kvm_async_pf_enabled);
- alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_kvm_asyncpf_interrupt);
+ install_system_interrupt_handler(HYPERVISOR_CALLBACK_VECTOR,
+ asm_sysvec_kvm_asyncpf_interrupt,
+ sysvec_kvm_asyncpf_interrupt);
}

#ifdef CONFIG_SMP
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8f751c06c052..2b8530235e47 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1491,6 +1491,14 @@ static system_interrupt_handler system_interrupt_handlers[NR_SYSTEM_VECTORS] = {

#undef SYSV

+void __init install_system_interrupt_handler(unsigned int n, const void *asm_addr, const void *addr)
+{
+ BUG_ON(n < FIRST_SYSTEM_VECTOR);
+
+ system_interrupt_handlers[n - FIRST_SYSTEM_VECTOR] = (system_interrupt_handler)addr;
+ alloc_intr_gate(n, asm_addr);
+}
+
void __init trap_init(void)
{
/* Init cpu_entry_area before IST entries are set up */
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index c443f04aaad7..1a9eaf417acc 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -45,6 +45,7 @@
#include <asm/irq.h>
#include <asm/io_apic.h>
#include <asm/i8259.h>
+#include <asm/traps.h>
#include <asm/xen/cpuid.h>
#include <asm/xen/pci.h>
#endif
@@ -2246,7 +2247,9 @@ static __init void xen_alloc_callback_vector(void)
return;

pr_info("Xen HVM callback vector for event delivery is enabled\n");
- alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_xen_hvm_callback);
+ install_system_interrupt_handler(HYPERVISOR_CALLBACK_VECTOR,
+ asm_sysvec_xen_hvm_callback,
+ sysvec_xen_hvm_callback);
}
#else
void xen_setup_callback_vector(void) {}
--
2.34.1

2022-12-20 07:58:30

by Li, Xin3

Subject: [RFC PATCH 09/32] x86/cpu: add X86_CR4_FRED macro

From: "H. Peter Anvin (Intel)" <[email protected]>

Add X86_CR4_FRED macro for the FRED bit in %cr4. This bit should be a
pinned bit, not to be changed after initialization.
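The pinning check can be modelled in user space as below, loosely following the mask-and-restore logic around CR4 writes in native_write_cr4(): any pinned bit a write tries to change is put back. The function names are hypothetical. uint64_t is used because bit 32 does not fit in a 32-bit unsigned long, which is why the cr4_pinned_mask hunk needs an explicit cast on i386.

```c
#include <assert.h>
#include <stdint.h>

/* CR4 bit positions; FRED's bit 32 comes from the patch above. */
#define X86_CR4_UMIP     (1ULL << 11)
#define X86_CR4_FSGSBASE (1ULL << 16)
#define X86_CR4_SMEP     (1ULL << 20)
#define X86_CR4_SMAP     (1ULL << 21)
#define X86_CR4_CET      (1ULL << 23)
#define X86_CR4_FRED     (1ULL << 32)  /* needs a 64-bit type */

static const uint64_t cr4_pinned_mask =
	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
	X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED;

/* Pinned bits that a write of val would flip (0 if none). */
static uint64_t cr4_pinned_bits_changed(uint64_t val, uint64_t pinned_bits)
{
	return (val & cr4_pinned_mask) ^ pinned_bits;
}

/* Value actually written: pinned bits restored, other bits kept. */
static uint64_t cr4_enforce_pinning(uint64_t val, uint64_t pinned_bits)
{
	/* pinned_bits is the boot-time CR4 value masked by cr4_pinned_mask */
	assert((pinned_bits & ~cr4_pinned_mask) == 0);
	if (cr4_pinned_bits_changed(val, pinned_bits))
		val = (val & ~cr4_pinned_mask) | pinned_bits;
	return val;
}
```

For example, with SMEP and FRED pinned, a write that clears FRED gets it forced back on, and cr4_pinned_bits_changed() reports exactly X86_CR4_FRED.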

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/uapi/asm/processor-flags.h | 2 ++
arch/x86/kernel/cpu/common.c | 11 ++++++++---
2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index c47cc7f2feeb..a90933f1ff41 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -132,6 +132,8 @@
#define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT)
#define X86_CR4_CET_BIT 23 /* enable Control-flow Enforcement Technology */
#define X86_CR4_CET _BITUL(X86_CR4_CET_BIT)
+#define X86_CR4_FRED_BIT 32 /* enable FRED kernel entry */
+#define X86_CR4_FRED _BITULL(X86_CR4_FRED_BIT)

/*
* x86-64 Task Priority Register, CR8
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d6eb4f60b47d..05a5538052ad 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -411,10 +411,15 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
cr4_clear_bits(X86_CR4_UMIP);
}

-/* These bits should not change their value after CPU init is finished. */
+/*
+ * These bits should not change their value after CPU init is
+ * finished. The explicit cast to unsigned long suppresses a warning
+ * on i386 for x86-64 only feature bits >= 32.
+ */
static const unsigned long cr4_pinned_mask =
- X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
- X86_CR4_FSGSBASE | X86_CR4_CET;
+ (unsigned long)
+ (X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
+ X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED);
static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
static unsigned long cr4_pinned_bits __ro_after_init;

--
2.34.1

2022-12-20 08:18:55

by Li, Xin3

Subject: [RFC PATCH 26/32] x86/fred: no ESPFIX needed when FRED is enabled

From: "H. Peter Anvin (Intel)" <[email protected]>

Because FRED always restores the full value of %rsp, ESPFIX is
no longer needed when it's enabled.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/kernel/espfix_64.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
index 9417d5aa7305..b594fcc0a4b7 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -116,6 +116,10 @@ void __init init_espfix_bsp(void)
pgd_t *pgd;
p4d_t *p4d;

+ /* FRED systems don't need ESPFIX */
+ if (cpu_feature_enabled(X86_FEATURE_FRED))
+ return;
+
/* Install the espfix pud into the kernel page directory */
pgd = &init_top_pgt[pgd_index(ESPFIX_BASE_ADDR)];
p4d = p4d_alloc(&init_mm, pgd, ESPFIX_BASE_ADDR);
@@ -139,6 +143,10 @@ void init_espfix_ap(int cpu)
void *stack_page;
pteval_t ptemask;

+ /* FRED systems don't need ESPFIX */
+ if (cpu_feature_enabled(X86_FEATURE_FRED))
+ return;
+
/* We only have to do this once... */
if (likely(per_cpu(espfix_stack, cpu)))
return; /* Already initialized */
--
2.34.1

2022-12-20 08:20:37

by Li, Xin3

Subject: [RFC PATCH 29/32] x86/ia32: do not modify the DPL bits for a null selector

When a null selector is to be loaded into a segment register,
reload_segments() sets its RPL bits (the low two bits of the
selector) to 3. Later, when IRET returns to ring 3, it zeros the
segment register. The two operations cancel each other out, so
the net effect is a no-op.

Unlike IRET, ERETU does not make any of DS, ES, FS, or GS null
if it is found to have DPL < 3. It is expected that a FRED-enabled
operating system will return to ring 3 (in compatibility mode)
only when those segments all have DPL = 3.

Thus when FRED is enabled, we end up with 3 in a segment
register even when it was initially set to 0.

Fix it by not modifying the RPL bits of a null selector.
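The effect of the new helper can be checked in isolation. The sketch below compares the removed "sel | 3" expression (wrapped here in a hypothetical usrseg_old()) with the patched usrseg(). Selector values 0 to 3 are the null selectors (GDT index 0, TI = 0), so they now pass through unchanged, while ordinary selectors such as 0x28 still get RPL 3.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;

/* Hypothetical name for the expression the patch removes. */
static inline u16 usrseg_old(u16 sel)
{
	return sel | 3;  /* a null selector 0 becomes 3 */
}

/* The helper added by the patch: null selectors (0-3) pass through. */
static inline u16 usrseg(u16 sel)
{
	return sel <= 3 ? sel : sel | 3;
}
```

Note that selector 4 (LDT index 0, TI = 1) is not null, so usrseg(4) still yields 7.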

Signed-off-by: Xin Li <[email protected]>
---
arch/x86/ia32/ia32_signal.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 14c739303099..31f5bbb59441 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -36,22 +36,27 @@
#include <asm/smap.h>
#include <asm/gsseg.h>

+static inline u16 usrseg(u16 sel)
+{
+ return sel <= 3 ? sel : sel | 3;
+}
+
static inline void reload_segments(struct sigcontext_32 *sc)
{
unsigned int cur;

savesegment(gs, cur);
- if ((sc->gs | 0x03) != cur)
- load_gs_index(sc->gs | 0x03);
+ if (usrseg(sc->gs) != cur)
+ load_gs_index(usrseg(sc->gs));
savesegment(fs, cur);
- if ((sc->fs | 0x03) != cur)
- loadsegment(fs, sc->fs | 0x03);
+ if (usrseg(sc->fs) != cur)
+ loadsegment(fs, usrseg(sc->fs));
savesegment(ds, cur);
- if ((sc->ds | 0x03) != cur)
- loadsegment(ds, sc->ds | 0x03);
+ if (usrseg(sc->ds) != cur)
+ loadsegment(ds, usrseg(sc->ds));
savesegment(es, cur);
- if ((sc->es | 0x03) != cur)
- loadsegment(es, sc->es | 0x03);
+ if (usrseg(sc->es) != cur)
+ loadsegment(es, usrseg(sc->es));
}

/*
--
2.34.1

2022-12-20 08:26:14

by Li, Xin3

Subject: [RFC PATCH 14/32] x86/fred: header file with FRED definitions

From: "H. Peter Anvin (Intel)" <[email protected]>

Add a header file for FRED prototypes and definitions.

Signed-off-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Xin Li <[email protected]>
---
arch/x86/include/asm/fred.h | 99 +++++++++++++++++++++++++++++++++++++
1 file changed, 99 insertions(+)
create mode 100644 arch/x86/include/asm/fred.h

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
new file mode 100644
index 000000000000..6292b28d461d
--- /dev/null
+++ b/arch/x86/include/asm/fred.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * arch/x86/include/asm/fred.h
+ *
+ * Macros for Flexible Return and Event Delivery (FRED)
+ */
+
+#ifndef ASM_X86_FRED_H
+#define ASM_X86_FRED_H
+
+#ifdef CONFIG_X86_FRED
+
+#include <linux/const.h>
+#include <asm/asm.h>
+
+/*
+ * FRED return instructions
+ *
+ * Replace with "ERETS"/"ERETU" once binutils support FRED return instructions.
+ */
+#define ERETS _ASM_BYTES(0xf2,0x0f,0x01,0xca)
+#define ERETU _ASM_BYTES(0xf3,0x0f,0x01,0xca)
+
+/*
+ * Event stack level macro for the FRED_STKLVLS MSR.
+ * Usage example: FRED_STKLVL(X86_TRAP_DF, 3)
+ * Multiple values can be ORd together.
+ */
+#define FRED_STKLVL(v,l) (_AT(unsigned long, l) << (2*(v)))
+
+/* FRED_CONFIG MSR */
+#define FRED_CONFIG_CSL_MASK 0x3
+#define FRED_CONFIG_SHADOW_STACK_SPACE _BITUL(3)
+#define FRED_CONFIG_REDZONE(b) __ALIGN_KERNEL_MASK((b), _UL(0x3f))
+#define FRED_CONFIG_INT_STKLVL(l) (_AT(unsigned long, l) << 9)
+#define FRED_CONFIG_ENTRYPOINT(p) _AT(unsigned long, (p))
+
+/* FRED event type and vector bit width and counts */
+#define FRED_EVENT_TYPE_BITS 3 /* only 3 bits used in FRED 3.0 */
+#define FRED_EVENT_TYPE_COUNT _BITUL(FRED_EVENT_TYPE_BITS)
+#define FRED_EVENT_VECTOR_BITS 8
+#define FRED_EVENT_VECTOR_COUNT _BITUL(FRED_EVENT_VECTOR_BITS)
+
+/* FRED EVENT_TYPE_OTHER vector numbers */
+#define FRED_SYSCALL 1
+#define FRED_SYSENTER 2
+
+/* Flags above the CS selector (regs->csl) */
+#define FRED_CSL_ENABLE_NMI _BITUL(28)
+#define FRED_CSL_ALLOW_SINGLE_STEP _BITUL(25)
+#define FRED_CSL_INTERRUPT_SHADOW _BITUL(24)
+
+#ifndef __ASSEMBLY__
+
+#include <linux/kernel.h>
+#include <asm/ptrace.h>
+
+/* FRED stack frame information */
+struct fred_info {
+ unsigned long edata; /* Event data: CR2, DR6, ... */
+ unsigned long resv;
+};
+
+/* Full format of the FRED stack frame */
+struct fred_frame {
+ struct pt_regs regs;
+ struct fred_info info;
+};
+
+/* Getting the FRED frame information from a pt_regs pointer */
+static __always_inline struct fred_info *fred_info(struct pt_regs *regs)
+{
+ return &container_of(regs, struct fred_frame, regs)->info;
+}
+
+static __always_inline unsigned long fred_event_data(struct pt_regs *regs)
+{
+ return fred_info(regs)->edata;
+}
+
+/*
+ * How FRED event handlers are called.
+ *
+ * FRED event delivery establishes the full supervisor context
+ * by pushing everything related to the event being delivered
+ * to the FRED stack frame, e.g., the faulting linear address
+ * of a #PF is pushed as event data of the FRED #PF stack frame.
+ * Thus a struct pt_regs has everything needed and it's the only
+ * input parameter required for a FRED event handler.
+ */
+#define DECLARE_FRED_HANDLER(f) void f (struct pt_regs *regs)
+#define DEFINE_FRED_HANDLER(f) noinstr DECLARE_FRED_HANDLER(f)
+typedef DECLARE_FRED_HANDLER((*fred_handler));
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* CONFIG_X86_FRED */
+
+#endif /* ASM_X86_FRED_H */
--
2.34.1
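As a worked example of the FRED_STKLVL() and FRED_CONFIG_* macros in this header, the sketch below re-expresses them with uint64_t so they build outside the kernel, and computes the MSR_IA32_FRED_STKLVLS value written in patch 22 (#DB at level 1, NMI and #MC at level 2, #DF at level 3). The vector numbers are the standard X86_TRAP_* values; fred_stklvl_of() is a hypothetical decoder. With 2 bits per vector, only vectors 0 to 31 fit in the 64-bit MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Vector numbers as in the kernel's X86_TRAP_* enum. */
#define X86_TRAP_DB   1
#define X86_TRAP_NMI  2
#define X86_TRAP_DF   8
#define X86_TRAP_MC  18

/* Same arithmetic as the header, with uint64_t in place of _AT()/_BITUL(). */
#define FRED_STKLVL(v, l)         ((uint64_t)(l) << (2 * (v)))
#define FRED_CONFIG_REDZONE(b)    (((uint64_t)(b) + 0x3f) & ~(uint64_t)0x3f)
#define FRED_CONFIG_INT_STKLVL(l) ((uint64_t)(l) << 9)
#define FRED_CONFIG_ENTRYPOINT(p) ((uint64_t)(p))

/* Stack level that a STKLVLS value assigns to vector v (2 bits each). */
static unsigned fred_stklvl_of(uint64_t stklvls, unsigned v)
{
	assert(v < 32);
	return (stklvls >> (2 * v)) & 0x3;
}

/* The value patch 22 writes: works out to 0x2000030024. */
static const uint64_t example_stklvls =
	FRED_STKLVL(X86_TRAP_DB, 1) | FRED_STKLVL(X86_TRAP_NMI, 2) |
	FRED_STKLVL(X86_TRAP_MC, 2) | FRED_STKLVL(X86_TRAP_DF, 3);
```

FRED_CONFIG_REDZONE() rounds a byte count up to a multiple of 64, so REDZONE(8) and REDZONE(64) both give 64 while REDZONE(65) gives 128. Any vector not listed (for example #PF, vector 14) stays at level 0.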

2022-12-20 09:33:54

by Peter Zijlstra

Subject: Re: [RFC PATCH 14/32] x86/fred: header file with FRED definitions


> +/*
> + * FRED return instructions
> + *
> + * Replace with "ERETS"/"ERETU" once binutils support FRED return instructions.

binutils version that supports these instructions goes here...

> + */
> +#define ERETS _ASM_BYTES(0xf2,0x0f,0x01,0xca)
> +#define ERETU _ASM_BYTES(0xf3,0x0f,0x01,0xca)
> +
> +/*
> + * Event stack level macro for the FRED_STKLVLS MSR.
> + * Usage example: FRED_STKLVL(X86_TRAP_DF, 3)
> + * Multiple values can be ORd together.
> + */
> +#define FRED_STKLVL(v,l) (_AT(unsigned long, l) << (2*(v)))
> +
> +/* FRED_CONFIG MSR */
> +#define FRED_CONFIG_CSL_MASK 0x3
> +#define FRED_CONFIG_SHADOW_STACK_SPACE _BITUL(3)
> +#define FRED_CONFIG_REDZONE(b) __ALIGN_KERNEL_MASK((b), _UL(0x3f))
> +#define FRED_CONFIG_INT_STKLVL(l) (_AT(unsigned long, l) << 9)
> +#define FRED_CONFIG_ENTRYPOINT(p) _AT(unsigned long, (p))
> +
> +/* FRED event type and vector bit width and counts */
> +#define FRED_EVENT_TYPE_BITS 3 /* only 3 bits used in FRED 3.0 */
> +#define FRED_EVENT_TYPE_COUNT _BITUL(FRED_EVENT_TYPE_BITS)
> +#define FRED_EVENT_VECTOR_BITS 8
> +#define FRED_EVENT_VECTOR_COUNT _BITUL(FRED_EVENT_VECTOR_BITS)
> +
> +/* FRED EVENT_TYPE_OTHER vector numbers */
> +#define FRED_SYSCALL 1
> +#define FRED_SYSENTER 2
> +
> +/* Flags above the CS selector (regs->csl) */
> +#define FRED_CSL_ENABLE_NMI _BITUL(28)
> +#define FRED_CSL_ALLOW_SINGLE_STEP _BITUL(25)
> +#define FRED_CSL_INTERRUPT_SHADOW _BITUL(24)

What's the state of IBT WAIT-FOR-ENDBR vs this? That really should also
get a high CS bit.

2022-12-20 09:34:23

by Peter Zijlstra

Subject: Re: [RFC PATCH 18/32] x86/fred: add a debug fault entry stub for FRED

On Mon, Dec 19, 2022 at 10:36:44PM -0800, Xin Li wrote:

> +static __always_inline void debug_kernel_common(struct pt_regs *regs,
> + unsigned long dr6)
> {
> - /*
> - * Disable breakpoints during exception handling; recursive exceptions
> - * are exceedingly 'fun'.
> - *
> - * Since this function is NOKPROBE, and that also applies to
> - * HW_BREAKPOINT_X, we can't hit a breakpoint before this (XXX except a
> - * HW_BREAKPOINT_W on our stack)
> - *
> - * Entry text is excluded for HW_BP_X and cpu_entry_area, which
> - * includes the entry stack is excluded for everything.
> - */
> - unsigned long dr7 = local_db_save();
> - irqentry_state_t irq_state = irqentry_nmi_enter(regs);
> instrumentation_begin();
>
> /*
> @@ -1062,7 +1050,8 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
> * Catch SYSENTER with TF set and clear DR_STEP. If this hit a
> * watchpoint at the same time then that will still be handled.
> */
> - if ((dr6 & DR_STEP) && is_sysenter_singlestep(regs))
> + if (!cpu_feature_enabled(X86_FEATURE_FRED) &&
> + (dr6 & DR_STEP) && is_sysenter_singlestep(regs))
> dr6 &= ~DR_STEP;
>
> /*
> @@ -1089,8 +1078,28 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
> regs->flags &= ~X86_EFLAGS_TF;
> out:
> instrumentation_end();
> - irqentry_nmi_exit(regs, irq_state);
> +}

Why doesn't the FRED handler get to use irqentry_nmi_{enter,exit}()?

> @@ -1179,6 +1188,24 @@ DEFINE_IDTENTRY_DEBUG_USER(exc_debug)
> {
> exc_debug_user(regs, debug_read_clear_dr6());
> }
> +
> +# ifdef CONFIG_X86_FRED
> +DEFINE_FRED_HANDLER(fred_exc_debug)
> +{
> + /*
> + * The FRED debug information saved onto stack differs from
> + * DR6 in both stickiness and polarity; it is exactly what
> + * debug_read_clear_dr6() returns.
> + */
> + unsigned long dr6 = fred_event_data(regs);
> +
> + if (user_mode(regs))
> + exc_debug_user(regs, dr6);
> + else
> + debug_kernel_common(regs, dr6);
> +}
> +# endif /* CONFIG_X86_FRED */
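For reference, the lookup that fred_event_data() performs can be reproduced outside the kernel. The sketch below uses a cut-down stand-in for struct pt_regs and a simplified container_of() without the kernel's type checking; demo_frame and its event-data value (0x4000, the DR_STEP bit) are illustrative only. Because the event data sits directly above the return frame, the handler needs nothing but the pt_regs pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(), without the kernel's type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Cut-down stand-in for struct pt_regs; only its position matters here. */
struct pt_regs {
	unsigned long ip, csl, flags, sp, ssl;
};

struct fred_info {
	unsigned long edata;  /* event data: CR2 for #PF, DR6-style bits for #DB */
	unsigned long resv;
};

struct fred_frame {
	struct pt_regs regs;
	struct fred_info info;
};

/* Step back from the embedded pt_regs to the enclosing frame. */
static inline struct fred_info *fred_info(struct pt_regs *regs)
{
	return &container_of(regs, struct fred_frame, regs)->info;
}

static inline unsigned long fred_event_data(struct pt_regs *regs)
{
	return fred_info(regs)->edata;
}

/* Illustrative frame as it might look for a single-step #DB. */
static struct fred_frame demo_frame = { .info = { .edata = 0x4000 } };
```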

2022-12-20 10:21:09

by Peter Zijlstra

Subject: Re: [RFC PATCH 22/32] x86/fred: FRED initialization code

On Tue, Dec 20, 2022 at 09:55:31AM +0000, Andrew Cooper wrote:
> On 20/12/2022 9:45 am, Peter Zijlstra wrote:
> > On Mon, Dec 19, 2022 at 10:36:48PM -0800, Xin Li wrote:
> >
> >> + wrmsrl(MSR_IA32_FRED_STKLVLS,
> >> + FRED_STKLVL(X86_TRAP_DB, 1) |
> >> + FRED_STKLVL(X86_TRAP_NMI, 2) |
> >> + FRED_STKLVL(X86_TRAP_MC, 2) |
> >> + FRED_STKLVL(X86_TRAP_DF, 3));
> >> +
> >> + /* The FRED equivalents to IST stacks... */
> >> + wrmsrl(MSR_IA32_FRED_RSP1, __this_cpu_ist_top_va(DB));
> >> + wrmsrl(MSR_IA32_FRED_RSP2, __this_cpu_ist_top_va(NMI));
> >> + wrmsrl(MSR_IA32_FRED_RSP3, __this_cpu_ist_top_va(DF));
> > Not quite.. IIRC fred only switches to another stack when the level of
> > the exception is higher. Specifically, if we trigger #DB while inside
> > #NMI we will not switch to the #DB stack (since 1 < 2).
>
> There needs to be a new stack for #DF, and just possibly one for #MC.
> NMI and #DB do not need separate stacks under FRED.

True, there is very little need to use additional stacks with FRED.

> > Now, as mentioned elsewhere, it all nests a lot saner, but stack
> > exhaustion is still a thing, given the above, what happens when a #DB
> > hits an #NMI which tickles a #VE or something?
> >
> > I don't think we've increased the exception stack size, but perhaps we
> > should for FRED?
>
> Not sure if it matters too much - it doesn't seem usefully different to
> IDT delivery.  #DB shouldn't get too deep, and NMI gets properly
> inhibited now.

Both #DB and #NMI can end up in perf, and all that goes quite deep :/

2022-12-20 10:26:48

by Peter Zijlstra

Subject: Re: [RFC PATCH 22/32] x86/fred: FRED initialization code

On Mon, Dec 19, 2022 at 10:36:48PM -0800, Xin Li wrote:

> + wrmsrl(MSR_IA32_FRED_STKLVLS,
> + FRED_STKLVL(X86_TRAP_DB, 1) |
> + FRED_STKLVL(X86_TRAP_NMI, 2) |
> + FRED_STKLVL(X86_TRAP_MC, 2) |
> + FRED_STKLVL(X86_TRAP_DF, 3));
> +
> + /* The FRED equivalents to IST stacks... */
> + wrmsrl(MSR_IA32_FRED_RSP1, __this_cpu_ist_top_va(DB));
> + wrmsrl(MSR_IA32_FRED_RSP2, __this_cpu_ist_top_va(NMI));
> + wrmsrl(MSR_IA32_FRED_RSP3, __this_cpu_ist_top_va(DF));

Not quite.. IIRC fred only switches to another stack when the level of
the exception is higher. Specifically, if we trigger #DB while inside
#NMI we will not switch to the #DB stack (since 1 < 2).

Now, as mentioned elsewhere, it all nests a lot saner, but stack
exhaustion is still a thing, given the above, what happens when a #DB
hits an #NMI which tickles a #VE or something?

I don't think we've increased the exception stack size, but perhaps we
should for FRED?

2022-12-21 03:45:02

by Li, Xin3

Subject: RE: [RFC PATCH 14/32] x86/fred: header file with FRED definitions

> > +/*
> > + * FRED return instructions
> > + *
> > + * Replace with "ERETS"/"ERETU" once binutils support FRED return
> instructions.
>
> binutils version that supports these instructions goes here...
>
> > + */
> > +#define ERETS _ASM_BYTES(0xf2,0x0f,0x01,0xca)
> > +#define ERETU _ASM_BYTES(0xf3,0x0f,0x01,0xca)
> > +
> > +/*
> > + * Event stack level macro for the FRED_STKLVLS MSR.
> > + * Usage example: FRED_STKLVL(X86_TRAP_DF, 3)
> > + * Multiple values can be ORd together.
> > + */
> > +#define FRED_STKLVL(v,l) (_AT(unsigned long, l) << (2*(v)))
> > +
> > +/* FRED_CONFIG MSR */
> > +#define FRED_CONFIG_CSL_MASK 0x3
> > +#define FRED_CONFIG_SHADOW_STACK_SPACE _BITUL(3)
> > +#define FRED_CONFIG_REDZONE(b) __ALIGN_KERNEL_MASK((b),
> _UL(0x3f))
> > +#define FRED_CONFIG_INT_STKLVL(l) (_AT(unsigned long, l) << 9)
> > +#define FRED_CONFIG_ENTRYPOINT(p) _AT(unsigned long, (p))
> > +
> > +/* FRED event type and vector bit width and counts */
> > +#define FRED_EVENT_TYPE_BITS 3 /* only 3 bits used in FRED 3.0
> */
> > +#define FRED_EVENT_TYPE_COUNT _BITUL(FRED_EVENT_TYPE_BITS)
> > +#define FRED_EVENT_VECTOR_BITS 8
> > +#define FRED_EVENT_VECTOR_COUNT
> _BITUL(FRED_EVENT_VECTOR_BITS)
> > +
> > +/* FRED EVENT_TYPE_OTHER vector numbers */
> > +#define FRED_SYSCALL 1
> > +#define FRED_SYSENTER 2
> > +
> > +/* Flags above the CS selector (regs->csl) */
> > +#define FRED_CSL_ENABLE_NMI _BITUL(28)
> > +#define FRED_CSL_ALLOW_SINGLE_STEP _BITUL(25)
> > +#define FRED_CSL_INTERRUPT_SHADOW _BITUL(24)
>
> What's the state of IBT WAIT-FOR-ENDBR vs this? That really should also get a
> high CS bit.

FRED does provide more possibilities :)

2022-12-21 05:52:43

by Li, Xin3

Subject: RE: [RFC PATCH 22/32] x86/fred: FRED initialization code

> > >> + wrmsrl(MSR_IA32_FRED_STKLVLS,
> > >> + FRED_STKLVL(X86_TRAP_DB, 1) |
> > >> + FRED_STKLVL(X86_TRAP_NMI, 2) |
> > >> + FRED_STKLVL(X86_TRAP_MC, 2) |
> > >> + FRED_STKLVL(X86_TRAP_DF, 3));
> > >> +
> > >> + /* The FRED equivalents to IST stacks... */
> > >> + wrmsrl(MSR_IA32_FRED_RSP1, __this_cpu_ist_top_va(DB));
> > >> + wrmsrl(MSR_IA32_FRED_RSP2, __this_cpu_ist_top_va(NMI));
> > >> + wrmsrl(MSR_IA32_FRED_RSP3, __this_cpu_ist_top_va(DF));
> > > Not quite.. IIRC fred only switches to another stack when the level
> > > of the exception is higher. Specifically, if we trigger #DB while
> > > inside #NMI we will not switch to the #DB stack (since 1 < 2).

Yes, current stack level can only grow higher.

> >
> > There needs to be a new stack for #DF, and just possibly one for #MC.
> > NMI and #DB do not need separate stacks under FRED.
>
> True, there is very little need to use additional stacks with FRED.

Pretty much.

#DB/NMI from a ring 3 context uses CSL 0, and their CSLs increase only
when happening from a ring 0 context.
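The stack-selection behaviour described in this exchange can be modelled as follows. This is a sketch of the rule as stated in the thread, not a transcription of the FRED specification: it applies the "ring 3 uses CSL 0" rule to every event, which is an assumption (the spec may treat some events differently), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct fred_stack_choice {
	unsigned int new_level;  /* CSL after delivery */
	bool switch_stack;       /* load RSP from FRED_RSP<new_level>? */
};

static struct fred_stack_choice
fred_pick_stack(bool from_ring3, unsigned int cur_level, unsigned int event_level)
{
	struct fred_stack_choice c;

	assert(cur_level <= 3 && event_level <= 3);
	if (from_ring3) {
		/* Ring 3 -> ring 0 always moves to the level 0 kernel stack. */
		c.new_level = 0;
		c.switch_stack = true;
		return c;
	}
	/* In ring 0 the level never drops; switch only if it rises. */
	c.new_level = event_level > cur_level ? event_level : cur_level;
	c.switch_stack = c.new_level > cur_level;
	return c;
}
```

With the patch 22 configuration, a #DB (level 1) taken inside an NMI handler (level 2) stays on the NMI stack, which is the sharing Peter points out, while a #DF (level 3) in the same situation switches stacks.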

>
> > > Now, as mentioned elsewhere, it all nests a lot saner, but stack
> > > exhaustion is still a thing, given the above, what happens when a
> > > #DB hits an #NMI which tickles a #VE or something?
> > >
> > > I don't think we've increased the exception stack size, but perhaps
> > > we should for FRED?
> >
> > Not sure if it matters too much - it doesn't seem usefully different
> > to IDT delivery.  #DB shouldn't get too deep, and NMI gets properly
> > inhibited now.
>
> Both #DB and #NMI can end up in perf, and all that goes quite deep :/

Can you please elaborate a bit?

2022-12-21 05:59:40

by H. Peter Anvin

Subject: RE: [RFC PATCH 22/32] x86/fred: FRED initialization code

On December 20, 2022 9:28:52 PM PST, "Li, Xin3" <[email protected]> wrote:
>> > >> + wrmsrl(MSR_IA32_FRED_STKLVLS,
>> > >> + FRED_STKLVL(X86_TRAP_DB, 1) |
>> > >> + FRED_STKLVL(X86_TRAP_NMI, 2) |
>> > >> + FRED_STKLVL(X86_TRAP_MC, 2) |
>> > >> + FRED_STKLVL(X86_TRAP_DF, 3));
>> > >> +
>> > >> + /* The FRED equivalents to IST stacks... */
>> > >> + wrmsrl(MSR_IA32_FRED_RSP1, __this_cpu_ist_top_va(DB));
>> > >> + wrmsrl(MSR_IA32_FRED_RSP2, __this_cpu_ist_top_va(NMI));
>> > >> + wrmsrl(MSR_IA32_FRED_RSP3, __this_cpu_ist_top_va(DF));
>> > > Not quite.. IIRC fred only switches to another stack when the level
>> > > of the exception is higher. Specifically, if we trigger #DB while
>> > > inside #NMI we will not switch to the #DB stack (since 1 < 2).
>
>Yes, current stack level can only grow higher.
>
>> >
>> > There needs to be a new stack for #DF, and just possibly one for #MC.
>> > NMI and #DB do not need separate stacks under FRED.
>>
>> True, there is very little need to use additional stacks with FRED.
>
>Pretty much.
>
>#DB/NMI from a ring 3 context uses CSL 0, and their CSLs increase only
>when happening from a ring 0 context.
>
>>
>> > > Now, as mentioned elsewhere, it all nests a lot saner, but stack
>> > > exhaustion is still a thing, given the above, what happens when a
>> > > #DB hits an #NMI which tickles a #VE or something?
>> > >
>> > > I don't think we've increased the exception stack size, but perhaps
>> > > we should for FRED?
>> >
>> > Not sure if it matters too much - it doesn't seem usefully different
>> > to IDT delivery.  #DB shouldn't get too deep, and NMI gets properly
>> > inhibited now.
>>
>> Both #DB and #NMI can end up in perf, and all that goes quite deep :/
>
>Can you please elaborate a bit?
>

Right, this is one major reason for putting #DB/NMI in a separate stack level.

2022-12-22 13:21:20

by Peter Zijlstra

Subject: Re: [RFC PATCH 22/32] x86/fred: FRED initialization code

On Tue, Dec 20, 2022 at 09:44:33PM -0800, H. Peter Anvin wrote:

> Right, this is one major reason for putting #DB/NMI in a separate stack level.

But with the IST they each have their own stack, with FRED they'll share
this small stack.

2022-12-22 13:23:24

by Peter Zijlstra

Subject: Re: [RFC PATCH 14/32] x86/fred: header file with FRED definitions

On Wed, Dec 21, 2022 at 02:58:06AM +0000, Li, Xin3 wrote:

> > > +/* Flags above the CS selector (regs->csl) */
> > > +#define FRED_CSL_ENABLE_NMI _BITUL(28)
> > > +#define FRED_CSL_ALLOW_SINGLE_STEP _BITUL(25)
> > > +#define FRED_CSL_INTERRUPT_SHADOW _BITUL(24)
> >
> > What's the state of IBT WAIT-FOR-ENDBR vs this? That really should also get a
> > high CS bit.
>
> FRED does provide more possibilities :)

That's not an answer. IBT has a clear defect and FRED *should* fix it.

2022-12-23 19:45:58

by H. Peter Anvin

Subject: Re: [RFC PATCH 22/32] x86/fred: FRED initialization code

On December 20, 2022 1:55:31 AM PST, Andrew Cooper <[email protected]> wrote:
>On 20/12/2022 9:45 am, Peter Zijlstra wrote:
>> On Mon, Dec 19, 2022 at 10:36:48PM -0800, Xin Li wrote:
>>
>>> + wrmsrl(MSR_IA32_FRED_STKLVLS,
>>> + FRED_STKLVL(X86_TRAP_DB, 1) |
>>> + FRED_STKLVL(X86_TRAP_NMI, 2) |
>>> + FRED_STKLVL(X86_TRAP_MC, 2) |
>>> + FRED_STKLVL(X86_TRAP_DF, 3));
>>> +
>>> + /* The FRED equivalents to IST stacks... */
>>> + wrmsrl(MSR_IA32_FRED_RSP1, __this_cpu_ist_top_va(DB));
>>> + wrmsrl(MSR_IA32_FRED_RSP2, __this_cpu_ist_top_va(NMI));
>>> + wrmsrl(MSR_IA32_FRED_RSP3, __this_cpu_ist_top_va(DF));
>> Not quite.. IIRC fred only switches to another stack when the level of
>> the exception is higher. Specifically, if we trigger #DB while inside
>> #NMI we will not switch to the #DB stack (since 1 < 2).
>
>There needs to be a new stack for #DF, and just possibly one for #MC. 
>NMI and #DB do not need separate stacks under FRED.
>
>> Now, as mentioned elsewhere, it all nests a lot saner, but stack
>> exhaustion is still a thing, given the above, what happens when a #DB
>> hits an #NMI which tickles a #VE or something?
>>
>> I don't think we've increased the exception stack size, but perhaps we
>> should for FRED?
>
>Not sure if it matters too much - it doesn't seem usefully different to
>IDT delivery.  #DB shouldn't get too deep, and NMI gets properly
>inhibited now.
>
>~Andrew
>

I still don't think you want to take #DB or – especially – NMI on the task stack while in the kernel. In fact, the plan is to get rid of the software irqstack handling, too, but at tglx's request that will be a later changeset (correctness first, then optimization).

2022-12-23 19:55:47

by H. Peter Anvin

Subject: Re: [RFC PATCH 22/32] x86/fred: FRED initialization code

On December 22, 2022 5:09:28 AM PST, Peter Zijlstra <[email protected]> wrote:
>On Tue, Dec 20, 2022 at 09:44:33PM -0800, H. Peter Anvin wrote:
>
>> Right, this is one major reason for putting #DB/NMI in a separate stack level.
>
>But with the IST they each have their own stack, with FRED they'll share
>this small stack.

So make the stack larger if you think we need it. You still end up using less total stack space with FRED.

2022-12-23 20:06:36

by H. Peter Anvin

Subject: Re: [RFC PATCH 14/32] x86/fred: header file with FRED definitions

On December 22, 2022 5:03:57 AM PST, Peter Zijlstra <[email protected]> wrote:
>On Wed, Dec 21, 2022 at 02:58:06AM +0000, Li, Xin3 wrote:
>
>> > > +/* Flags above the CS selector (regs->csl) */
>> > > +#define FRED_CSL_ENABLE_NMI _BITUL(28)
>> > > +#define FRED_CSL_ALLOW_SINGLE_STEP _BITUL(25)
>> > > +#define FRED_CSL_INTERRUPT_SHADOW _BITUL(24)
>> >
>> > What's the state of IBT WAIT-FOR-ENDBR vs this? That really should also get a
>> > high CS bit.
>>
>> FRED does provide more possibilities :)
>
>That's not an answer. IBT has a clear defect and FRED *should* fix it.

You are not wrong, of course. That being said, we have not wanted to hitch too many things to the FRED baseline, lest it end up delayed for implementation/validation reasons. The important thing is that FRED *does* provide the mechanism for addressing that even if it does not make the first implementation.