2021-09-26 15:10:28

by Lai Jiangshan

Subject: [PATCH V2 00/41] x86/entry/64: Convert a bunch of ASM entry code into C code

From: Lai Jiangshan <[email protected]>

Much of the ASM code in entry_64.S can be rewritten in C, provided it is
written to be non-instrumentable and is called in the right order with
respect to whether CR3/gsbase has already been switched to the kernel
CR3/gsbase.

This patchset converts some of it to C code.

Patch 16 converts error_entry() to C code, and patches 1-15 are
preparation for it.

Patches 17-37 convert the IST entry code to C code. Many of them are
preparation for the actual conversion.

Patch 41 converts a small part of the syscall ASM code to C code: the
check for whether sysret can be used to return to userspace.

Some other paths could also be moved to C code, for example the error
exit and the syscall entry/exit: their PTI handling could be done in C.
But that would require pt_regs to be copied/pushed to the entry stack,
which would make the C code less efficient.

When converting ASM to C, most of the effort went into keeping the two
the same. Almost no creativity was involved. The C code is kept as close
to the ASM as possible, and no functional change is intended unless a
misunderstanding of the ASM code on my part crept in. The functions called
by the C entry code have been checked to ensure they are noinstr or
__always_inline. Some of them have more than one definition and need extra
care from reviewers. The comments in the ASM are also copied to the right
places in the C code.

Changed from V1:
Add a fix as patch 1, found while trying to apply Peterz's
suggestion to patch 11.
The whole error_entry() is converted to C instead of only a part of it.
The whole paranoid_entry() is converted to C instead of only a part of it.
The ASM sequence "paranoid_entry() cfunc() paranoid_exit()" is
converted to C as suggested by Peterz.
Add entry64.c rather than moving traps.c to arch/x86/entry/.
The order of some commits is changed.
Remove two cleanups.

[V1]: https://lore.kernel.org/all/[email protected]/

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Joerg Roedel <[email protected]>

Lai Jiangshan (41):
x86/entry: Fix swapgs fence
x86/traps: Remove stack-protector from traps.c
compiler_types.h: Add __noinstr_section() for noinstr
x86/entry: Introduce __entry_text for entry code written in C
x86/entry: Move PTI_USER_* to arch/x86/include/asm/processor-flags.h
x86: Mark __native_read_cr3() & native_write_cr3() as __always_inline
x86/traps: Move the declaration of native_irq_return_iret into proto.h
x86/entry: Add arch/x86/entry/entry64.c for C entry code
x86/entry: Expose the address of .Lgs_change to entry64.c
x86/entry: Add C version of SWITCH_TO_KERNEL_CR3 as
switch_to_kernel_cr3()
x86/entry: Add C user_entry_swapgs_and_fence() and
kernel_entry_fence_no_swapgs()
x86/traps: Move pt_regs only in fixup_bad_iret()
x86/entry: Switch the stack after error_entry() returns
x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry
objtool: Allow .entry.text function using CLD instruction
x86/entry: Implement the whole error_entry() as C code
x86/entry: Make paranoid_exit() callable
x86/entry: Call paranoid_exit() in asm_exc_nmi()
x86/entry: move PUSH_AND_CLEAR_REGS out of paranoid_entry
x86/entry: Add the C version ist_switch_to_kernel_cr3()
x86/entry: Add the C version ist_restore_cr3()
x86/entry: Add the C version get_percpu_base()
x86/entry: Add the C version ist_switch_to_kernel_gsbase()
x86/entry: Implement the C version ist_paranoid_entry()
x86/entry: Implement the C version ist_paranoid_exit()
x86/entry: Add a C macro to define the function body for IST in
.entry.text
x86/mce: Remove stack protector from mce/core.c
x86/debug, mce: Use C entry code
x86/idtentry.h: Move the definitions *IDTENTRY_{MCE|DEBUG}* up
x86/nmi: Use DEFINE_IDTENTRY_NMI for nmi
x86/nmi: Remove stack protector from nmi.c
x86/nmi: Use C entry code
x86/entry: Add a C macro to define the function body for IST in
.entry.text with an error code
x86/doublefault: Use C entry code
x86/sev: Add and use ist_vc_switch_off_ist()
x86/sev: Remove stack protector from sev.c
x86/sev: Use C entry code
x86/entry: Remove ASM function paranoid_entry() and paranoid_exit()
x86/entry: Remove the unused ASM macros
x86/entry: Remove save_ret from PUSH_AND_CLEAR_REGS
x86/syscall/64: Move the checking for sysret to C code

arch/x86/entry/Makefile | 5 +-
arch/x86/entry/calling.h | 142 +--------
arch/x86/entry/common.c | 73 ++++-
arch/x86/entry/entry64.c | 354 ++++++++++++++++++++++
arch/x86/entry/entry_64.S | 403 +++----------------------
arch/x86/include/asm/idtentry.h | 64 +++-
arch/x86/include/asm/processor-flags.h | 15 +
arch/x86/include/asm/proto.h | 1 +
arch/x86/include/asm/special_insns.h | 4 +-
arch/x86/include/asm/syscall.h | 2 +-
arch/x86/include/asm/traps.h | 6 +-
arch/x86/kernel/Makefile | 7 +
arch/x86/kernel/cpu/mce/Makefile | 4 +
arch/x86/kernel/nmi.c | 2 +-
arch/x86/kernel/traps.c | 33 +-
include/linux/compiler_types.h | 6 +-
tools/objtool/check.c | 2 +-
17 files changed, 580 insertions(+), 543 deletions(-)
create mode 100644 arch/x86/entry/entry64.c

--
2.19.1.6.gb485710b


2021-09-26 15:10:53

by Lai Jiangshan

Subject: [PATCH V2 02/41] x86/traps: Remove stack-protector from traps.c

From: Lai Jiangshan <[email protected]>

When stack-protector is enabled, the compiler adds instrumentation code
at the beginning and the end of some functions. Many functions in traps.c
must be non-instrumentable. Moreover, the stack-protector code at the
beginning of an affected function accesses the canary, which might be
watched by hardware breakpoints; this also violates the non-instrumentable
nature of those functions and can cause infinite #DB recursion, because
the canary is accessed before DR7 is reset.

So it is better to remove stack-protector from traps.c.

This also prepares for later patches that move some entry code into
traps.c, some of which can NOT use per-CPU data until the GS base is
properly switched; and stack-protector depends on the per-CPU area to
work.
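
As a rough illustration (a hypothetical sketch, not actual compiler
output; read_gs_canary() below is not a real kernel API), a
stack-protector-instrumented function behaves roughly as follows, with
the canary loaded through the per-CPU segment that entry code cannot
rely on before GSBASE is switched:

	/* Hypothetical sketch of compiler-inserted canary handling. */
	extern unsigned long read_gs_canary(void);	/* stands in for the %gs-relative canary load */
	extern void __stack_chk_fail(void);		/* called on canary mismatch */

	void stack_protected_example(void)
	{
		unsigned long canary = read_gs_canary();	/* prologue: load canary via %gs */

		/* ... function body with local buffers ... */

		if (canary != read_gs_canary())			/* epilogue: re-check the canary */
			__stack_chk_fail();
	}

A hardware breakpoint on the canary would thus fire in the prologue,
before the #DB handler has reset DR7, which is how the recursion
described above can occur.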

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kernel/Makefile | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8f4e8fa6ed75..0e054e2304c6 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -48,6 +48,9 @@ KCOV_INSTRUMENT := n

CFLAGS_head$(BITS).o += -fno-stack-protector

+CFLAGS_REMOVE_traps.o = -fstack-protector -fstack-protector-strong
+CFLAGS_traps.o += -fno-stack-protector
+
CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace

obj-y := process_$(BITS).o signal.o
--
2.19.1.6.gb485710b

2021-09-26 15:11:14

by Lai Jiangshan

Subject: [PATCH V2 03/41] compiler_types.h: Add __noinstr_section() for noinstr

From: Lai Jiangshan <[email protected]>

Factor the attribute list out into __noinstr_section(), which takes the
section name as a parameter. It will also be used for C entry code in a
later patch.

Signed-off-by: Lai Jiangshan <[email protected]>
---
include/linux/compiler_types.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index b6ff83a714ca..3c77631c68bd 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -208,10 +208,12 @@ struct ftrace_likely_data {
#endif

/* Section for code which can't be instrumented at all */
-#define noinstr \
- noinline notrace __attribute((__section__(".noinstr.text"))) \
+#define __noinstr_section(section) \
+ noinline notrace __attribute((__section__(section))) \
__no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage

+#define noinstr __noinstr_section(".noinstr.text")
+
#endif /* __KERNEL__ */

#endif /* __ASSEMBLY__ */
--
2.19.1.6.gb485710b

2021-09-26 15:11:25

by Lai Jiangshan

Subject: [PATCH V2 04/41] x86/entry: Introduce __entry_text for entry code written in C

From: Lai Jiangshan <[email protected]>

Some entry code will be implemented in C files. Add __entry_text to
place such functions in the .entry.text section. __entry_text disables
instrumentation like noinstr does, but it doesn't disable the stack
protector, since not all compilers supported by the kernel provide a
function-level attribute to disable it. The stack protector is instead
disabled at the C-file level.
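
As an illustration (a sketch only, with a hypothetical function name;
not part of this patch), a C entry helper would then be marked like this
so that it is placed in .entry.text and carries no instrumentation:

	__visible __entry_text
	struct pt_regs *example_entry_helper(struct pt_regs *eregs)
	{
		/* low-level work that must run before instrumentation is safe */
		return eregs;
	}

The stack protector for such a function is handled separately, by
disabling it for the whole containing C file in its Makefile.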

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/idtentry.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e9902..6779def97591 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -11,6 +11,9 @@

#include <asm/irq_stack.h>

+/* Entry code written in C. */
+#define __entry_text __noinstr_section(".entry.text")
+
/**
* DECLARE_IDTENTRY - Declare functions for simple IDT entry points
* No error code pushed by hardware
--
2.19.1.6.gb485710b

2021-09-26 15:11:31

by Lai Jiangshan

Subject: [PATCH V2 05/41] x86/entry: Move PTI_USER_* to arch/x86/include/asm/processor-flags.h

From: Lai Jiangshan <[email protected]>

These constants will also be used in C files, so move them to
arch/x86/include/asm/processor-flags.h, which already has the related
X86_CR3_PTI_PCID_USER_BIT defined in it.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/calling.h | 10 ----------
arch/x86/include/asm/processor-flags.h | 15 +++++++++++++++
2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index a4c061fb7c6e..996b041e92d2 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -149,16 +149,6 @@ For 32-bit we have the following conventions - kernel is built with

#ifdef CONFIG_PAGE_TABLE_ISOLATION

-/*
- * PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two
- * halves:
- */
-#define PTI_USER_PGTABLE_BIT PAGE_SHIFT
-#define PTI_USER_PGTABLE_MASK (1 << PTI_USER_PGTABLE_BIT)
-#define PTI_USER_PCID_BIT X86_CR3_PTI_PCID_USER_BIT
-#define PTI_USER_PCID_MASK (1 << PTI_USER_PCID_BIT)
-#define PTI_USER_PGTABLE_AND_PCID_MASK (PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)
-
.macro SET_NOFLUSH_BIT reg:req
bts $X86_CR3_PCID_NOFLUSH_BIT, \reg
.endm
diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index 02c2cbda4a74..4dd2fbbc861a 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -4,6 +4,7 @@

#include <uapi/asm/processor-flags.h>
#include <linux/mem_encrypt.h>
+#include <asm/page_types.h>

#ifdef CONFIG_VM86
#define X86_VM_MASK X86_EFLAGS_VM
@@ -50,7 +51,21 @@
#endif

#ifdef CONFIG_PAGE_TABLE_ISOLATION
+
# define X86_CR3_PTI_PCID_USER_BIT 11
+
+#ifdef CONFIG_X86_64
+/*
+ * PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two
+ * halves:
+ */
+#define PTI_USER_PGTABLE_BIT PAGE_SHIFT
+#define PTI_USER_PGTABLE_MASK (1 << PTI_USER_PGTABLE_BIT)
+#define PTI_USER_PCID_BIT X86_CR3_PTI_PCID_USER_BIT
+#define PTI_USER_PCID_MASK (1 << PTI_USER_PCID_BIT)
+#define PTI_USER_PGTABLE_AND_PCID_MASK (PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)
+#endif
+
#endif

#endif /* _ASM_X86_PROCESSOR_FLAGS_H */
--
2.19.1.6.gb485710b

2021-09-26 15:11:44

by Lai Jiangshan

Subject: [PATCH V2 06/41] x86: Mark __native_read_cr3() & native_write_cr3() as __always_inline

From: Lai Jiangshan <[email protected]>

__native_read_cr3() & native_write_cr3() need to be guaranteed noinstr.

This prepares for later patches which implement entry code in a C file.
Some of that code needs to handle KPTI and has to read/write CR3.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/special_insns.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index f3fbb84ff8a7..058995bb153c 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -42,14 +42,14 @@ static __always_inline void native_write_cr2(unsigned long val)
asm volatile("mov %0,%%cr2": : "r" (val) : "memory");
}

-static inline unsigned long __native_read_cr3(void)
+static __always_inline unsigned long __native_read_cr3(void)
{
unsigned long val;
asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : __FORCE_ORDER);
return val;
}

-static inline void native_write_cr3(unsigned long val)
+static __always_inline void native_write_cr3(unsigned long val)
{
asm volatile("mov %0,%%cr3": : "r" (val) : "memory");
}
--
2.19.1.6.gb485710b

2021-09-26 15:11:53

by Lai Jiangshan

Subject: [PATCH V2 08/41] x86/entry: Add arch/x86/entry/entry64.c for C entry code

From: Lai Jiangshan <[email protected]>

Add a C file, "entry64.c", to hold C entry code for traps and faults,
which will follow the same logic as the existing ASM code in entry_64.S.

The file is as low level as entry_64.S and its code may run in
environments where the GS base is a user-controlled value, or where the
CR3 is the PTI user CR3, or both.

None of the code in this file may be instrumented. Many instrumentation
facilities can be disabled via the per-function attributes included in
__noinstr_section. But the stack protector cannot be disabled at function
granularity by many GCC versions that are supported for compiling the
kernel, so the stack protector is disabled for the whole file in the
Makefile.

This prepares for later patches that implement the C version of the entry
code in entry64.c.

Suggested-by: Joerg Roedel <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/Makefile | 5 ++++-
arch/x86/entry/entry64.c | 11 +++++++++++
2 files changed, 15 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/entry/entry64.c

diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile
index 7fec5dcf6438..492e0b113bd0 100644
--- a/arch/x86/entry/Makefile
+++ b/arch/x86/entry/Makefile
@@ -11,12 +11,15 @@ CFLAGS_REMOVE_common.o = $(CC_FLAGS_FTRACE)

CFLAGS_common.o += -fno-stack-protector

+CFLAGS_REMOVE_entry64.o = -fstack-protector -fstack-protector-strong
+CFLAGS_entry64.o += -fno-stack-protector
+
obj-y := entry_$(BITS).o thunk_$(BITS).o syscall_$(BITS).o
obj-y += common.o
+obj-$(CONFIG_X86_64) += entry64.o

obj-y += vdso/
obj-y += vsyscall/

obj-$(CONFIG_IA32_EMULATION) += entry_64_compat.o syscall_32.o
obj-$(CONFIG_X86_X32_ABI) += syscall_x32.o
-
diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
new file mode 100644
index 000000000000..3a6d70367940
--- /dev/null
+++ b/arch/x86/entry/entry64.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 1991, 1992 Linus Torvalds
+ * Copyright (C) 2000, 2001, 2002 Andi Kleen SuSE Labs
+ * Copyright (C) 2000 Pavel Machek <[email protected]>
+ * Copyright (C) 2021 Lai Jiangshan, Alibaba
+ *
+ * Handle entries and exits for hardware traps and faults.
+ *
+ * It is as low level as entry_64.S and its code can be running in the
+ * environments that the GS base is user controlled value, or the CR3
+ * is PTI user CR3 or both.
+ */
+#include <asm/traps.h>
--
2.19.1.6.gb485710b

2021-09-26 15:12:08

by Lai Jiangshan

Subject: [PATCH V2 09/41] x86/entry: Expose the address of .Lgs_change to entry64.c

From: Lai Jiangshan <[email protected]>

The address of .Lgs_change will be used in entry64.c in a later patch
when some entry code is implemented there, so expose it to entry64.c in
preparation.

The label .Lgs_change is still needed in the ASM code for the extable
entry, because the extable can not use asm_load_gs_index_gs_change.
Otherwise:

warning: objtool: __ex_table+0x0: don't know how to handle
non-section reloc symbol asm_load_gs_index_gs_change

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry64.c | 2 ++
arch/x86/entry/entry_64.S | 3 ++-
2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 3a6d70367940..7272266a3726 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -9,3 +9,5 @@
* is PTI user CR3 or both.
*/
#include <asm/traps.h>
+
+extern unsigned char asm_load_gs_index_gs_change[];
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 95d85b16710b..291732f571a7 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -729,6 +729,7 @@ _ASM_NOKPROBE(common_interrupt_return)
SYM_FUNC_START(asm_load_gs_index)
FRAME_BEGIN
swapgs
+SYM_INNER_LABEL(asm_load_gs_index_gs_change, SYM_L_GLOBAL)
.Lgs_change:
movl %edi, %gs
2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE
@@ -1006,7 +1007,7 @@ SYM_CODE_START_LOCAL(error_entry)
movl %ecx, %eax /* zero extend */
cmpq %rax, RIP+8(%rsp)
je .Lbstep_iret
- cmpq $.Lgs_change, RIP+8(%rsp)
+ cmpq $asm_load_gs_index_gs_change, RIP+8(%rsp)
jne .Lerror_entry_done_lfence

/*
--
2.19.1.6.gb485710b

2021-09-26 15:12:12

by Lai Jiangshan

Subject: [PATCH V2 10/41] x86/entry: Add C version of SWITCH_TO_KERNEL_CR3 as switch_to_kernel_cr3()

From: Lai Jiangshan <[email protected]>

The C version switch_to_kernel_cr3() implements SWITCH_TO_KERNEL_CR3().

No functional difference intended.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry64.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 7272266a3726..77838e19f1ac 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -11,3 +11,27 @@
#include <asm/traps.h>

extern unsigned char asm_load_gs_index_gs_change[];
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+static __always_inline void pti_switch_to_kernel_cr3(unsigned long user_cr3)
+{
+ /*
+ * Clear PCID and "PAGE_TABLE_ISOLATION bit", point CR3
+ * at kernel pagetables:
+ */
+ unsigned long cr3 = user_cr3 & ~PTI_USER_PGTABLE_AND_PCID_MASK;
+
+ if (static_cpu_has(X86_FEATURE_PCID))
+ cr3 |= X86_CR3_PCID_NOFLUSH;
+
+ native_write_cr3(cr3);
+}
+
+static __always_inline void switch_to_kernel_cr3(void)
+{
+ if (static_cpu_has(X86_FEATURE_PTI))
+ pti_switch_to_kernel_cr3(__native_read_cr3());
+}
+#else
+static __always_inline void switch_to_kernel_cr3(void) {}
+#endif
--
2.19.1.6.gb485710b

2021-09-26 15:12:39

by Lai Jiangshan

Subject: [PATCH V2 12/41] x86/traps: Move pt_regs only in fixup_bad_iret()

From: Lai Jiangshan <[email protected]>

Make fixup_bad_iret() work like sync_regs(), which doesn't
move the return address of error_entry().

This prepares for a later patch which implements the body of
error_entry() in C code: fixup_bad_iret() can't handle the return
address when it is called from C code.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 5 ++++-
arch/x86/include/asm/traps.h | 2 +-
arch/x86/kernel/traps.c | 17 ++++++-----------
3 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 291732f571a7..9921a823b2c6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1037,9 +1037,12 @@ SYM_CODE_START_LOCAL(error_entry)
* Pretend that the exception came from user mode: set up pt_regs
* as if we faulted immediately after IRET.
*/
- mov %rsp, %rdi
+ popq %r12 /* save return addr in %12 */
+ movq %rsp, %rdi /* arg0 = pt_regs pointer */
call fixup_bad_iret
mov %rax, %rsp
+ ENCODE_FRAME_POINTER
+ pushq %r12
jmp .Lerror_entry_from_usermode_after_swapgs
SYM_CODE_END(error_entry)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 6221be7cafc3..1cdd7e8bcba7 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -13,7 +13,7 @@
#ifdef CONFIG_X86_64
asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs);
asmlinkage __visible notrace
-struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s);
+struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
void __init trap_init(void);
asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
#endif
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index cf852b5e347f..0afa16ea3702 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -759,13 +759,8 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *r
}
#endif

-struct bad_iret_stack {
- void *error_entry_ret;
- struct pt_regs regs;
-};
-
asmlinkage __visible noinstr
-struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
+struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs)
{
/*
* This is called from entry_64.S early in handling a fault
@@ -775,19 +770,19 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s)
* just below the IRET frame) and we want to pretend that the
* exception came from the IRET target.
*/
- struct bad_iret_stack tmp, *new_stack =
- (struct bad_iret_stack *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
+ struct pt_regs tmp, *new_stack =
+ (struct pt_regs *)__this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;

/* Copy the IRET target to the temporary storage. */
- __memcpy(&tmp.regs.ip, (void *)s->regs.sp, 5*8);
+ __memcpy(&tmp.ip, (void *)bad_regs->sp, 5*8);

/* Copy the remainder of the stack from the current stack. */
- __memcpy(&tmp, s, offsetof(struct bad_iret_stack, regs.ip));
+ __memcpy(&tmp, bad_regs, offsetof(struct pt_regs, ip));

/* Update the entry stack */
__memcpy(new_stack, &tmp, sizeof(tmp));

- BUG_ON(!user_mode(&new_stack->regs));
+ BUG_ON(!user_mode(new_stack));
return new_stack;
}
#endif
--
2.19.1.6.gb485710b

2021-09-26 15:12:52

by Lai Jiangshan

Subject: [PATCH V2 13/41] x86/entry: Switch the stack after error_entry() returns

From: Lai Jiangshan <[email protected]>

error_entry() calls sync_regs() to settle/copy the pt_regs and switches
the stack directly after sync_regs(). But because error_entry() is itself
entered via a CALL, the stack switch also has to take care of the return
address, which makes the behavior tangled.

Switching the stack after error_entry() returns makes the code simpler
and more intuitive.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9921a823b2c6..dd453a8e7317 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -323,6 +323,8 @@ SYM_CODE_END(ret_from_fork)
.macro idtentry_body cfunc has_error_code:req

call error_entry
+ movq %rax, %rsp /* switch stack settled by sync_regs() */
+ ENCODE_FRAME_POINTER
UNWIND_HINT_REGS

movq %rsp, %rdi /* pt_regs pointer into 1st argument*/
@@ -979,19 +981,16 @@ SYM_CODE_START_LOCAL(error_entry)
/* We have user CR3. Change to kernel CR3. */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rax

+ leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
.Lerror_entry_from_usermode_after_swapgs:
/* Put us onto the real thread stack. */
- popq %r12 /* save return addr in %12 */
- movq %rsp, %rdi /* arg0 = pt_regs pointer */
call sync_regs
- movq %rax, %rsp /* switch stack */
- ENCODE_FRAME_POINTER
- pushq %r12
ret

.Lerror_entry_done_lfence:
FENCE_SWAPGS_KERNEL_ENTRY
.Lerror_entry_done:
+ leaq 8(%rsp), %rax /* return pt_regs pointer */
ret

/*
@@ -1037,12 +1036,9 @@ SYM_CODE_START_LOCAL(error_entry)
* Pretend that the exception came from user mode: set up pt_regs
* as if we faulted immediately after IRET.
*/
- popq %r12 /* save return addr in %12 */
- movq %rsp, %rdi /* arg0 = pt_regs pointer */
+ leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
call fixup_bad_iret
- mov %rax, %rsp
- ENCODE_FRAME_POINTER
- pushq %r12
+ mov %rax, %rdi
jmp .Lerror_entry_from_usermode_after_swapgs
SYM_CODE_END(error_entry)

--
2.19.1.6.gb485710b

2021-09-26 15:13:11

by Lai Jiangshan

Subject: [PATCH V2 15/41] objtool: Allow .entry.text function using CLD instruction

From: Lai Jiangshan <[email protected]>

The whole error_entry() will be implemented in C, and it contains a CLD
instruction.

Signed-off-by: Lai Jiangshan <[email protected]>
---
tools/objtool/check.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 84e59a97bab6..2c775317b864 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3103,7 +3103,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
break;

case INSN_CLD:
- if (!state.df && func) {
+ if (!state.df && func && strcmp(sec->name, ".entry.text")) {
WARN_FUNC("redundant CLD", sec, insn->offset);
return 1;
}
--
2.19.1.6.gb485710b

2021-09-26 15:13:26

by Lai Jiangshan

Subject: [PATCH V2 26/41] x86/entry: Add a C macro to define the function body for IST in .entry.text

From: Lai Jiangshan <[email protected]>

Add the DEFINE_IDTENTRY_IST_ETNRY() macro to define the C code that
implements the ASM sequence which calls paranoid_entry(), cfunc() and
paranoid_exit() in series for IST exceptions without an error code.

No functional difference intended.
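
For example, for the machine-check handler the macro invocation
DEFINE_IDTENTRY_IST_ETNRY(exc_machine_check) would expand to roughly the
following (an illustrative sketch of the expansion, not new code in this
patch):

	__visible __entry_text void ist_exc_machine_check(struct pt_regs *regs)
	{
		unsigned long cr3, gsbase;

		ist_paranoid_entry(&cr3, &gsbase);	/* switch to kernel CR3/GSBASE, save old state */
		exc_machine_check(regs);		/* the actual #MC handler */
		ist_paranoid_exit(cr3, gsbase);		/* restore the saved CR3/GSBASE state */
	}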

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/idtentry.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index b144ea05b859..b33e96e983c0 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -323,6 +323,20 @@ void ist_paranoid_exit(unsigned long cr3, unsigned long gsbase);
__visible noinstr void kernel_##func(struct pt_regs *regs, unsigned long error_code); \
__visible noinstr void user_##func(struct pt_regs *regs, unsigned long error_code)

+/**
+ * DEFINE_IDTENTRY_IST_ENTRY - Emit __entry_text code for IST entry points
+ * @func: Function name of the entry point
+ */
+#define DEFINE_IDTENTRY_IST_ETNRY(func) \
+__visible __entry_text void ist_##func(struct pt_regs *regs) \
+{ \
+ unsigned long cr3, gsbase; \
+ \
+ ist_paranoid_entry(&cr3, &gsbase); \
+ func(regs); \
+ ist_paranoid_exit(cr3, gsbase); \
+}
+
/**
* DEFINE_IDTENTRY_IST - Emit code for IST entry points
* @func: Function name of the entry point
--
2.19.1.6.gb485710b

2021-09-26 15:13:28

by Lai Jiangshan

Subject: [PATCH V2 19/41] x86/entry: move PUSH_AND_CLEAR_REGS out of paranoid_entry

From: Lai Jiangshan <[email protected]>

This prepares for converting the whole paranoid_entry() into C code.

No functional change intended.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a0d73dc0d2f3..bd6bce341360 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -322,9 +322,6 @@ SYM_CODE_END(ret_from_fork)
*/
.macro idtentry_body cfunc has_error_code:req

- PUSH_AND_CLEAR_REGS
- ENCODE_FRAME_POINTER
-
movq %rsp, %rdi
call error_entry
movq %rax, %rsp /* switch stack settled by sync_regs() */
@@ -376,6 +373,9 @@ SYM_CODE_START(\asmsym)
.Lfrom_usermode_no_gap_\@:
.endif

+ PUSH_AND_CLEAR_REGS
+ ENCODE_FRAME_POINTER
+
idtentry_body \cfunc \has_error_code

_ASM_NOKPROBE(\asmsym)
@@ -427,11 +427,14 @@ SYM_CODE_START(\asmsym)

pushq $-1 /* ORIG_RAX: no syscall to restart */

+ PUSH_AND_CLEAR_REGS
+ ENCODE_FRAME_POINTER
+
/*
* If the entry is from userspace, switch stacks and treat it as
* a normal entry.
*/
- testb $3, CS-ORIG_RAX(%rsp)
+ testb $3, CS(%rsp)
jnz .Lfrom_usermode_switch_stack_\@

/* paranoid_entry returns GS information for paranoid_exit in EBX. */
@@ -481,11 +484,14 @@ SYM_CODE_START(\asmsym)
UNWIND_HINT_IRET_REGS
ASM_CLAC

+ PUSH_AND_CLEAR_REGS
+ ENCODE_FRAME_POINTER
+
/*
* If the entry is from userspace, switch stacks and treat it as
* a normal entry.
*/
- testb $3, CS-ORIG_RAX(%rsp)
+ testb $3, CS(%rsp)
jnz .Lfrom_usermode_switch_stack_\@

/*
@@ -543,6 +549,9 @@ SYM_CODE_START(\asmsym)
UNWIND_HINT_IRET_REGS offset=8
ASM_CLAC

+ PUSH_AND_CLEAR_REGS
+ ENCODE_FRAME_POINTER
+
/* paranoid_entry returns GS information for paranoid_exit in EBX. */
call paranoid_entry
UNWIND_HINT_REGS
@@ -855,8 +864,6 @@ SYM_CODE_END(xen_failsafe_callback)
SYM_CODE_START_LOCAL(paranoid_entry)
UNWIND_HINT_FUNC
cld
- PUSH_AND_CLEAR_REGS save_ret=1
- ENCODE_FRAME_POINTER 8

/*
* Always stash CR3 in %r14. This value will be restored,
@@ -1269,6 +1276,9 @@ end_repeat_nmi:
*/
pushq $-1 /* ORIG_RAX: no syscall to restart */

+ PUSH_AND_CLEAR_REGS
+ ENCODE_FRAME_POINTER
+
/*
* Use paranoid_entry to handle SWAPGS and CR3.
*/
--
2.19.1.6.gb485710b

2021-09-26 15:13:35

by Lai Jiangshan

Subject: [PATCH V2 11/41] x86/entry: Add C user_entry_swapgs_and_fence() and kernel_entry_fence_no_swapgs()

From: Lai Jiangshan <[email protected]>

The C user_entry_swapgs_and_fence() implements the ASM code:
swapgs
FENCE_SWAPGS_USER_ENTRY

It will be used in the user entry swapgs code path, doing the swapgs and
lfence to prevent a speculative swapgs when coming from kernel space.

The C kernel_entry_fence_no_swapgs() implements the ASM code:
FENCE_SWAPGS_KERNEL_ENTRY

It will be used in the kernel entry non-swapgs code path to prevent the
swapgs from getting speculatively skipped when coming from user space.

Cc: Josh Poimboeuf <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry64.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 77838e19f1ac..dafae60d31f9 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -35,3 +35,24 @@ static __always_inline void switch_to_kernel_cr3(void)
#else
static __always_inline void switch_to_kernel_cr3(void) {}
#endif
+
+/*
+ * Mitigate Spectre v1 for conditional swapgs code paths.
+ *
+ * user_entry_swapgs_and_fence is used in the user entry swapgs code path,
+ * to prevent a speculative swapgs when coming from kernel space.
+ *
+ * kernel_entry_fence_no_swapgs is used in the kernel entry non-swapgs code
+ * path, to prevent the swapgs from getting speculatively skipped when coming
+ * from user space.
+ */
+static __always_inline void user_entry_swapgs_and_fence(void)
+{
+ native_swapgs();
+ alternative("", "lfence", X86_FEATURE_FENCE_SWAPGS_USER);
+}
+
+static __always_inline void kernel_entry_fence_no_swapgs(void)
+{
+ alternative("", "lfence", X86_FEATURE_FENCE_SWAPGS_KERNEL);
+}
--
2.19.1.6.gb485710b

2021-09-26 15:13:43

by Lai Jiangshan

Subject: [PATCH V2 16/41] x86/entry: Implement the whole error_entry() as C code

From: Lai Jiangshan <[email protected]>

All the needed facilities are now in place in entry64.c, so the whole
error_entry() can be implemented in C there. The C version is generally
more readable and easier to update/improve.

No functional change intended, except that a check for X86_FEATURE_XENPV
is added because the new error_entry() does not use the paravirt SWAPGS
but rather native_swapgs(). For XENPV, error_entry() has nothing to do,
so it can return directly.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry64.c | 76 ++++++++++++++++++++++++++++++++++
arch/x86/entry/entry_64.S | 80 +-----------------------------------
arch/x86/include/asm/traps.h | 1 +
3 files changed, 78 insertions(+), 79 deletions(-)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index dafae60d31f9..5f2be4c3f333 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -56,3 +56,78 @@ static __always_inline void kernel_entry_fence_no_swapgs(void)
{
alternative("", "lfence", X86_FEATURE_FENCE_SWAPGS_KERNEL);
}
+
+/*
+ * Put pt_regs onto the task stack and switch GS and CR3 if needed.
+ * The actual stack switch is done in entry_64.S.
+ *
+ * Becareful, it might be in the user CR3 and user GS base at the start
+ * of the function.
+ */
+asmlinkage __visible __entry_text
+struct pt_regs *error_entry(struct pt_regs *eregs)
+{
+ unsigned long iret_ip = (unsigned long)native_irq_return_iret;
+
+ asm volatile ("cld");
+
+ /*
+ * When XENPV, it is already in the task stack, and it can't fault
+ * from native_irq_return_iret and asm_load_gs_index_gs_change()
+ * since XENPV uses its own pvops for iret and load_gs_index, and
+ * also it doesn't use PTI. So it can directly return and
+ * native_swapgs() can be used in the following code.
+ */
+ if (static_cpu_has(X86_FEATURE_XENPV))
+ return eregs;
+
+ if (user_mode(eregs)) {
+ /*
+ * We entered from user mode.
+ * Switch to kernel gsbase and CR3.
+ */
+ user_entry_swapgs_and_fence();
+ switch_to_kernel_cr3();
+
+ /* Put pt_regs onto the task stack. */
+ return sync_regs(eregs);
+ }
+
+ /*
+ * There are two places in the kernel that can potentially fault with
+ * usergs. Handle them here. B stepping K8s sometimes report a
+ * truncated RIP for IRET exceptions returning to compat mode. Check
+ * for these here too.
+ */
+ if ((eregs->ip == iret_ip) || (eregs->ip == (unsigned int)iret_ip)) {
+ eregs->ip = iret_ip; /* Fix truncated RIP */
+
+ /*
+ * We came from an IRET to user mode, so we have user
+ * gsbase and CR3. Switch to kernel gsbase and CR3:
+ */
+ user_entry_swapgs_and_fence();
+ switch_to_kernel_cr3();
+
+ /*
+ * Pretend that the exception came from user mode: set up
+ * pt_regs as if we faulted immediately after IRET and put
+ * pt_regs onto the real task stack.
+ */
+ return sync_regs(fixup_bad_iret(eregs));
+ }
+
+ /*
+ * Hack: asm_load_gs_index_gs_change can fail with user gsbase.
+ * If this happens, fix up gsbase and proceed. We'll fix up the
+ * exception and land in asm_load_gs_index_gs_change's error
+ * handler with kernel gsbase.
+ */
+ if (eregs->ip == (unsigned long)asm_load_gs_index_gs_change)
+ user_entry_swapgs_and_fence();
+ else
+ kernel_entry_fence_no_swapgs();
+
+ /* Enter from kernel, don't move pt_regs */
+ return eregs;
+}
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 757e7155670e..169ee14cc2d6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -325,6 +325,7 @@ SYM_CODE_END(ret_from_fork)
PUSH_AND_CLEAR_REGS
ENCODE_FRAME_POINTER

+ movq %rsp, %rdi
call error_entry
movq %rax, %rsp /* switch stack settled by sync_regs() */
ENCODE_FRAME_POINTER
@@ -964,85 +965,6 @@ SYM_CODE_START_LOCAL(paranoid_exit)
jmp restore_regs_and_return_to_kernel
SYM_CODE_END(paranoid_exit)

-/*
- * Save all registers in pt_regs, and switch GS if needed.
- */
-SYM_CODE_START_LOCAL(error_entry)
- UNWIND_HINT_FUNC
- cld
- testb $3, CS+8(%rsp)
- jz .Lerror_kernelspace
-
- /*
- * We entered from user mode or we're pretending to have entered
- * from user mode due to an IRET fault.
- */
- SWAPGS
- FENCE_SWAPGS_USER_ENTRY
- /* We have user CR3. Change to kernel CR3. */
- SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
-
- leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
-.Lerror_entry_from_usermode_after_swapgs:
- /* Put us onto the real thread stack. */
- call sync_regs
- ret
-
-.Lerror_entry_done_lfence:
- FENCE_SWAPGS_KERNEL_ENTRY
-.Lerror_entry_done:
- leaq 8(%rsp), %rax /* return pt_regs pointer */
- ret
-
- /*
- * There are two places in the kernel that can potentially fault with
- * usergs. Handle them here. B stepping K8s sometimes report a
- * truncated RIP for IRET exceptions returning to compat mode. Check
- * for these here too.
- */
-.Lerror_kernelspace:
- leaq native_irq_return_iret(%rip), %rcx
- cmpq %rcx, RIP+8(%rsp)
- je .Lerror_bad_iret
- movl %ecx, %eax /* zero extend */
- cmpq %rax, RIP+8(%rsp)
- je .Lbstep_iret
- cmpq $asm_load_gs_index_gs_change, RIP+8(%rsp)
- jne .Lerror_entry_done_lfence
-
- /*
- * hack: .Lgs_change can fail with user gsbase. If this happens, fix up
- * gsbase and proceed. We'll fix up the exception and land in
- * .Lgs_change's error handler with kernel gsbase.
- */
- SWAPGS
- FENCE_SWAPGS_USER_ENTRY
- jmp .Lerror_entry_done
-
-.Lbstep_iret:
- /* Fix truncated RIP */
- movq %rcx, RIP+8(%rsp)
- /* fall through */
-
-.Lerror_bad_iret:
- /*
- * We came from an IRET to user mode, so we have user
- * gsbase and CR3. Switch to kernel gsbase and CR3:
- */
- SWAPGS
- FENCE_SWAPGS_USER_ENTRY
- SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
-
- /*
- * Pretend that the exception came from user mode: set up pt_regs
- * as if we faulted immediately after IRET.
- */
- leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
- call fixup_bad_iret
- mov %rax, %rdi
- jmp .Lerror_entry_from_usermode_after_swapgs
-SYM_CODE_END(error_entry)
-
SYM_CODE_START_LOCAL(error_return)
UNWIND_HINT_REGS
DEBUG_ENTRY_ASSERT_IRQS_OFF
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 1cdd7e8bcba7..686461ac9803 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -14,6 +14,7 @@
asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs);
asmlinkage __visible notrace
struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
+asmlinkage __visible notrace struct pt_regs *error_entry(struct pt_regs *eregs);
void __init trap_init(void);
asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
#endif
--
2.19.1.6.gb485710b

2021-09-26 15:13:47

by Lai Jiangshan

Subject: [PATCH V2 21/41] x86/entry: Add the C version ist_restore_cr3()

From: Lai Jiangshan <[email protected]>

ist_restore_cr3() implements the C version of RESTORE_CR3().

No functional difference intended, except that the ASM code uses bit
test-and-clear operations while the C version uses mask checks and 'AND'
operations. The resulting assembly of the two versions is very similar.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry64.c | 46 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index faee44a3d1d8..2db9ae3508f1 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -8,6 +8,7 @@
* environments that the GS base is user controlled value, or the CR3
* is PTI user CR3 or both.
*/
+#include <asm/tlbflush.h>
#include <asm/traps.h>

extern unsigned char asm_load_gs_index_gs_change[];
@@ -27,6 +28,26 @@ static __always_inline void pti_switch_to_kernel_cr3(unsigned long user_cr3)
native_write_cr3(cr3);
}

+static __always_inline void pti_switch_to_user_cr3(unsigned long user_cr3)
+{
+#define KERN_PCID_MASK (CR3_PCID_MASK & ~PTI_USER_PCID_MASK)
+
+ if (static_cpu_has(X86_FEATURE_PCID)) {
+ int pcid = user_cr3 & KERN_PCID_MASK;
+ unsigned short pcid_mask = 1ull << pcid;
+
+ /*
+ * Check if there's a pending flush for the user ASID we're
+ * about to set.
+ */
+ if (!(this_cpu_read(cpu_tlbstate.user_pcid_flush_mask) & pcid_mask))
+ user_cr3 |= X86_CR3_PCID_NOFLUSH;
+ else
+ this_cpu_and(cpu_tlbstate.user_pcid_flush_mask, ~pcid_mask);
+ }
+ native_write_cr3(user_cr3);
+}
+
static __always_inline void switch_to_kernel_cr3(void)
{
if (static_cpu_has(X86_FEATURE_PTI))
@@ -46,9 +67,34 @@ static __always_inline unsigned long ist_switch_to_kernel_cr3(void)

return cr3;
}
+
+static __always_inline void ist_restore_cr3(unsigned long cr3)
+{
+ if (!static_cpu_has(X86_FEATURE_PTI))
+ return;
+
+ if (unlikely(cr3 & PTI_USER_PGTABLE_MASK)) {
+ pti_switch_to_user_cr3(cr3);
+ return;
+ }
+
+ /*
+ * KERNEL pages can always resume with NOFLUSH as we do
+ * explicit flushes.
+ */
+ if (static_cpu_has(X86_FEATURE_PCID))
+ cr3 |= X86_CR3_PCID_NOFLUSH;
+
+ /*
+ * The CR3 write could be avoided when not changing its value,
+ * but would require a CR3 read.
+ */
+ native_write_cr3(cr3);
+}
#else
static __always_inline void switch_to_kernel_cr3(void) {}
static __always_inline unsigned long ist_switch_to_kernel_cr3(void) { return 0; }
+static __always_inline void ist_restore_cr3(unsigned long cr3) {}
#endif

/*
--
2.19.1.6.gb485710b

2021-09-26 15:14:04

by Lai Jiangshan

Subject: [PATCH V2 14/41] x86/entry: move PUSH_AND_CLEAR_REGS out of error_entry

From: Lai Jiangshan <[email protected]>

Moving PUSH_AND_CLEAR_REGS out of error_entry doesn't change any
functionality, but it does enlarge the object size:

size arch/x86/entry/entry_64.o.before:
text data bss dec hex filename
17916 384 0 18300 477c arch/x86/entry/entry_64.o

size --format=SysV arch/x86/entry/entry_64.o.before:
.entry.text 5528 0
.orc_unwind 6456 0
.orc_unwind_ip 4304 0

size arch/x86/entry/entry_64.o.after:
text data bss dec hex filename
26868 384 0 27252 6a74 arch/x86/entry/entry_64.o

size --format=SysV arch/x86/entry/entry_64.o.after:
.entry.text 8200 0
.orc_unwind 10224 0
.orc_unwind_ip 6816 0

But .entry.text on x86_64 is 2MB aligned, so growing it from roughly
5.5KB to roughly 8.2KB does not enlarge the final text size: both values
are far below the 2MB alignment padding.

The .orc_unwind[_ip] tables grow because many PUSH instructions are added.

This prepares for converting the whole error_entry into C code.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index dd453a8e7317..757e7155670e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -322,6 +322,9 @@ SYM_CODE_END(ret_from_fork)
*/
.macro idtentry_body cfunc has_error_code:req

+ PUSH_AND_CLEAR_REGS
+ ENCODE_FRAME_POINTER
+
call error_entry
movq %rax, %rsp /* switch stack settled by sync_regs() */
ENCODE_FRAME_POINTER
@@ -967,8 +970,6 @@ SYM_CODE_END(paranoid_exit)
SYM_CODE_START_LOCAL(error_entry)
UNWIND_HINT_FUNC
cld
- PUSH_AND_CLEAR_REGS save_ret=1
- ENCODE_FRAME_POINTER 8
testb $3, CS+8(%rsp)
jz .Lerror_kernelspace

--
2.19.1.6.gb485710b

2021-09-26 15:14:07

by Lai Jiangshan

Subject: [PATCH V2 07/41] x86/traps: Move the declaration of native_irq_return_iret into proto.h

From: Lai Jiangshan <[email protected]>

The declaration of native_irq_return_iret is currently used only in
exc_double_fault(). It will be used elsewhere later, so move the
declaration to a header file in preparation.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/proto.h | 1 +
arch/x86/kernel/traps.c | 2 --
2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index 8c5d1910a848..ee07b3cae213 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -13,6 +13,7 @@ void syscall_init(void);
#ifdef CONFIG_X86_64
void entry_SYSCALL_64(void);
void entry_SYSCALL_64_safe_stack(void);
+extern unsigned char native_irq_return_iret[];
long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2);
#endif

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index cc6de3a01293..cf852b5e347f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -359,8 +359,6 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
#endif

#ifdef CONFIG_X86_ESPFIX64
- extern unsigned char native_irq_return_iret[];
-
/*
* If IRET takes a non-IST fault on the espfix64 stack, then we
* end up promoting it to a doublefault. In that case, take
--
2.19.1.6.gb485710b

2021-09-26 15:14:26

by Lai Jiangshan

Subject: [PATCH V2 27/41] x86/mce: Remove stack protector from mce/core.c

From: Lai Jiangshan <[email protected]>

mce/core.c is going to contain __entry_text code which must not be
instrumented by the stack protector.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kernel/cpu/mce/Makefile | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/Makefile b/arch/x86/kernel/cpu/mce/Makefile
index 015856abdbb1..ce192c5344fc 100644
--- a/arch/x86/kernel/cpu/mce/Makefile
+++ b/arch/x86/kernel/cpu/mce/Makefile
@@ -1,4 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
+
+CFLAGS_REMOVE_core.o = -fstack-protector -fstack-protector-strong
+CFLAGS_core.o += -fno-stack-protector
+
obj-y = core.o severity.o genpool.o

obj-$(CONFIG_X86_ANCIENT_MCE) += winchip.o p5.o
--
2.19.1.6.gb485710b

2021-09-26 15:14:29

by Lai Jiangshan

Subject: [PATCH V2 24/41] x86/entry: Implement the C version ist_paranoid_entry()

From: Lai Jiangshan <[email protected]>

ist_paranoid_entry() implements the whole ASM version of paranoid_entry().

No functional difference intended.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry64.c | 39 +++++++++++++++++++++++++++++++++
arch/x86/include/asm/idtentry.h | 3 +++
2 files changed, 42 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 1a0d5d703ad6..67f13aebd948 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -273,3 +273,42 @@ static __always_inline unsigned long ist_switch_to_kernel_gsbase(void)
/* SWAPGS required on exit */
return 0;
}
+
+/*
+ * Switch and save CR3 in *@cr3 if PTI enabled. Return GSBASE related
+ * information in *@gsbase depending on the availability of the FSGSBASE
+ * instructions:
+ *
+ * FSGSBASE *@gsbase
+ * N 0 -> SWAPGS on exit
+ * 1 -> no SWAPGS on exit
+ *
+ * Y GSBASE value at entry, must be restored in ist_paranoid_exit
+ */
+__visible __entry_text
+void ist_paranoid_entry(unsigned long *cr3, unsigned long *gsbase)
+{
+ asm volatile ("cld");
+
+ /*
+ * Always stash CR3 in *@cr3. This value will be restored,
+ * verbatim, at exit. Needed if ist_paranoid_entry interrupted
+ * another entry that already switched to the user CR3 value
+ * but has not yet returned to userspace.
+ *
+ * This is also why CS (stashed in the "iret frame" by the
+ * hardware at entry) can not be used: this may be a return
+ * to kernel code, but with a user CR3 value.
+ *
+ * Switching CR3 does not depend on kernel GSBASE so it can
+ * be done before switching to the kernel GSBASE. This is
+ * required for FSGSBASE because the kernel GSBASE has to
+ * be retrieved from a kernel internal table.
+ */
+ *cr3 = ist_switch_to_kernel_cr3();
+
+ barrier();
+
+ /* Handle GSBASE, store the return value in *@gsbase for exit. */
+ *gsbase = ist_switch_to_kernel_gsbase();
+}
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 6779def97591..fa8d73cfd8d6 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -293,6 +293,9 @@ static __always_inline void __##func(struct pt_regs *regs)
DECLARE_IDTENTRY(vector, func)

#ifdef CONFIG_X86_64
+__visible __entry_text
+void ist_paranoid_entry(unsigned long *cr3, unsigned long *gsbase);
+
/**
* DECLARE_IDTENTRY_IST - Declare functions for IST handling IDT entry points
* @vector: Vector number (ignored for C)
--
2.19.1.6.gb485710b

2021-09-26 15:14:38

by Lai Jiangshan

Subject: [PATCH V2 29/41] x86/idtentry.h: Move the definitions *IDTENTRY_{MCE|DEBUG}* up

From: Lai Jiangshan <[email protected]>

Move them closer to the related definitions and remove one #ifdef block.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/idtentry.h | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index babe530cfa77..49c0ebe374ae 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -358,6 +358,14 @@ __visible __entry_text void ist_##func(struct pt_regs *regs) \
#define DEFINE_IDTENTRY_NOIST(func) \
DEFINE_IDTENTRY_RAW(noist_##func)

+#define DECLARE_IDTENTRY_MCE DECLARE_IDTENTRY_IST
+#define DEFINE_IDTENTRY_MCE DEFINE_IDTENTRY_IST
+#define DEFINE_IDTENTRY_MCE_USER DEFINE_IDTENTRY_NOIST
+
+#define DECLARE_IDTENTRY_DEBUG DECLARE_IDTENTRY_IST
+#define DEFINE_IDTENTRY_DEBUG DEFINE_IDTENTRY_IST
+#define DEFINE_IDTENTRY_DEBUG_USER DEFINE_IDTENTRY_NOIST
+
/**
* DECLARE_IDTENTRY_DF - Declare functions for double fault
* @vector: Vector number (ignored for C)
@@ -432,16 +440,6 @@ __visible noinstr void func(struct pt_regs *regs, \
#define DECLARE_IDTENTRY_NMI DECLARE_IDTENTRY_RAW
#define DEFINE_IDTENTRY_NMI DEFINE_IDTENTRY_RAW

-#ifdef CONFIG_X86_64
-#define DECLARE_IDTENTRY_MCE DECLARE_IDTENTRY_IST
-#define DEFINE_IDTENTRY_MCE DEFINE_IDTENTRY_IST
-#define DEFINE_IDTENTRY_MCE_USER DEFINE_IDTENTRY_NOIST
-
-#define DECLARE_IDTENTRY_DEBUG DECLARE_IDTENTRY_IST
-#define DEFINE_IDTENTRY_DEBUG DEFINE_IDTENTRY_IST
-#define DEFINE_IDTENTRY_DEBUG_USER DEFINE_IDTENTRY_NOIST
-#endif
-
#else /* !__ASSEMBLY__ */

/*
--
2.19.1.6.gb485710b

2021-09-26 15:14:49

by Lai Jiangshan

Subject: [PATCH V2 30/41] x86/nmi: Use DEFINE_IDTENTRY_NMI for nmi

From: Lai Jiangshan <[email protected]>

DEFINE_IDTENTRY_NMI is defined but not used; it is better to use it.

This also prepares for a later patch which defines DEFINE_IDTENTRY_NMI
differently for 32-bit and 64-bit.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kernel/nmi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 4bce802d25fb..44c3adb68282 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -473,7 +473,7 @@ static DEFINE_PER_CPU(enum nmi_states, nmi_state);
static DEFINE_PER_CPU(unsigned long, nmi_cr2);
static DEFINE_PER_CPU(unsigned long, nmi_dr7);

-DEFINE_IDTENTRY_RAW(exc_nmi)
+DEFINE_IDTENTRY_NMI(exc_nmi)
{
irqentry_state_t irq_state;

--
2.19.1.6.gb485710b

2021-09-26 15:14:50

by Lai Jiangshan

Subject: [PATCH V2 33/41] x86/entry: Add a C macro to define the function body for IST in .entry.text with an error code

From: Lai Jiangshan <[email protected]>

Add the DEFINE_IDTENTRY_IST_ETNRY_ERRORCODE() macro to define the C code
that implements the ASM sequence which calls paranoid_entry(), modifies
orig_ax, and calls cfunc() and paranoid_exit() in series for IST
exceptions with an error code.

No functional difference intended.
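
For example, for the double-fault handler the macro invocation
DEFINE_IDTENTRY_IST_ETNRY_ERRORCODE(exc_double_fault) would expand to
roughly the following (an illustrative sketch of the expansion, not new
code in this patch):

	__visible __entry_text void ist_exc_double_fault(struct pt_regs *regs)
	{
		unsigned long cr3, gsbase, error_code = regs->orig_ax;

		ist_paranoid_entry(&cr3, &gsbase);	/* switch to kernel CR3/GSBASE, save old state */
		regs->orig_ax = -1;			/* no syscall to restart */
		exc_double_fault(regs, error_code);	/* the actual #DF handler */
		ist_paranoid_exit(cr3, gsbase);		/* restore the saved CR3/GSBASE state */
	}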

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/idtentry.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index c99c58bc179a..7935b0abc65d 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -337,6 +337,22 @@ __visible __entry_text void ist_##func(struct pt_regs *regs) \
ist_paranoid_exit(cr3, gsbase); \
}

+/**
+ * DEFINE_IDTENTRY_IST_ENTRY_ERRORCODE - Emit __entry_text code for IST
+ * entry points with an error code
+ * @func: Function name of the entry point
+ */
+#define DEFINE_IDTENTRY_IST_ETNRY_ERRORCODE(func) \
+__visible __entry_text void ist_##func(struct pt_regs *regs) \
+{ \
+ unsigned long cr3, gsbase, error_code = regs->orig_ax; \
+ \
+ ist_paranoid_entry(&cr3, &gsbase); \
+ regs->orig_ax = -1; /* no syscall to restart */ \
+ func(regs, error_code); \
+ ist_paranoid_exit(cr3, gsbase); \
+}
+
/**
* DEFINE_IDTENTRY_IST - Emit code for IST entry points
* @func: Function name of the entry point
--
2.19.1.6.gb485710b

2021-09-26 15:14:57

by Lai Jiangshan

Subject: [PATCH V2 31/41] x86/nmi: Remove stack protector from nmi.c

From: Lai Jiangshan <[email protected]>

nmi.c is going to contain __entry_text code which must not be
instrumented by the stack protector.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kernel/Makefile | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0e054e2304c6..f56e8088c85d 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -50,6 +50,8 @@ CFLAGS_head$(BITS).o += -fno-stack-protector

CFLAGS_REMOVE_traps.o = -fstack-protector -fstack-protector-strong
CFLAGS_traps.o += -fno-stack-protector
+CFLAGS_REMOVE_nmi.o = -fstack-protector -fstack-protector-strong
+CFLAGS_nmi.o += -fno-stack-protector

CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace

--
2.19.1.6.gb485710b

2021-09-26 15:15:25

by Lai Jiangshan

Subject: [PATCH V2 18/41] x86/entry: Call paranoid_exit() in asm_exc_nmi()

From: Lai Jiangshan <[email protected]>

The code between "call exc_nmi" and nmi_restore is the same as
paranoid_exit(), so just call paranoid_exit() instead of keeping the
open-coded duplicate.

No functional change intended.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 34 +++++-----------------------------
1 file changed, 5 insertions(+), 29 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 202253c9a4f2..a0d73dc0d2f3 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -922,8 +922,7 @@ SYM_CODE_END(paranoid_entry)

/*
* "Paranoid" exit path from exception stack. This is invoked
- * only on return from non-NMI IST interrupts that came
- * from kernel space.
+ * only on return from IST interrupts that came from kernel space.
*
* We may be returning to very strange contexts (e.g. very early
* in syscall entry), so checking for preemption here would
@@ -1271,11 +1270,7 @@ end_repeat_nmi:
pushq $-1 /* ORIG_RAX: no syscall to restart */

/*
- * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
- * as we should not be calling schedule in NMI context.
- * Even with normal interrupts enabled. An NMI should not be
- * setting NEED_RESCHED or anything that normal interrupts and
- * exceptions might do.
+ * Use paranoid_entry to handle SWAPGS and CR3.
*/
call paranoid_entry
UNWIND_HINT_REGS
@@ -1284,31 +1279,12 @@ end_repeat_nmi:
movq $-1, %rsi
call exc_nmi

- /* Always restore stashed CR3 value (see paranoid_entry) */
- RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
-
/*
- * The above invocation of paranoid_entry stored the GSBASE
- * related information in R/EBX depending on the availability
- * of FSGSBASE.
- *
- * If FSGSBASE is enabled, restore the saved GSBASE value
- * unconditionally, otherwise take the conditional SWAPGS path.
+ * Use paranoid_exit to handle SWAPGS and CR3, but no need to use
+ * restore_regs_and_return_to_kernel as we must handle nested NMI.
*/
- ALTERNATIVE "jmp nmi_no_fsgsbase", "", X86_FEATURE_FSGSBASE
-
- wrgsbase %rbx
- jmp nmi_restore
-
-nmi_no_fsgsbase:
- /* EBX == 0 -> invoke SWAPGS */
- testl %ebx, %ebx
- jnz nmi_restore
-
-nmi_swapgs:
- swapgs
+ call paranoid_exit

-nmi_restore:
POP_REGS

/*
--
2.19.1.6.gb485710b

2021-09-26 15:15:25

by Lai Jiangshan

Subject: [PATCH V2 32/41] x86/nmi: Use C entry code

From: Lai Jiangshan <[email protected]>

Use DEFINE_IDTENTRY_IST_ETNRY to emit the C entry function and call that
function directly from entry_64.S.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 17 ++---------------
arch/x86/include/asm/idtentry.h | 5 ++++-
2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 0ba788bb9857..72a1610bb540 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1271,21 +1271,8 @@ end_repeat_nmi:
PUSH_AND_CLEAR_REGS
ENCODE_FRAME_POINTER

- /*
- * Use paranoid_entry to handle SWAPGS and CR3.
- */
- call paranoid_entry
- UNWIND_HINT_REGS
-
- movq %rsp, %rdi
- movq $-1, %rsi
- call exc_nmi
-
- /*
- * Use paranoid_exit to handle SWAPGS and CR3, but no need to use
- * restore_regs_and_return_to_kernel as we must handle nested NMI.
- */
- call paranoid_exit
+ movq %rsp, %rdi /* pt_regs pointer */
+ call ist_exc_nmi

POP_REGS

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 49c0ebe374ae..c99c58bc179a 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -358,6 +358,8 @@ __visible __entry_text void ist_##func(struct pt_regs *regs) \
#define DEFINE_IDTENTRY_NOIST(func) \
DEFINE_IDTENTRY_RAW(noist_##func)

+#define DEFINE_IDTENTRY_NMI DEFINE_IDTENTRY_IST
+
#define DECLARE_IDTENTRY_MCE DECLARE_IDTENTRY_IST
#define DEFINE_IDTENTRY_MCE DEFINE_IDTENTRY_IST
#define DEFINE_IDTENTRY_MCE_USER DEFINE_IDTENTRY_NOIST
@@ -407,6 +409,8 @@ __visible __entry_text void ist_##func(struct pt_regs *regs) \

#else /* CONFIG_X86_64 */

+#define DEFINE_IDTENTRY_NMI DEFINE_IDTENTRY_RAW
+
/**
* DECLARE_IDTENTRY_DF - Declare functions for double fault 32bit variant
* @vector: Vector number (ignored for C)
@@ -438,7 +442,6 @@ __visible noinstr void func(struct pt_regs *regs, \

/* C-Code mapping */
#define DECLARE_IDTENTRY_NMI DECLARE_IDTENTRY_RAW
-#define DEFINE_IDTENTRY_NMI DEFINE_IDTENTRY_RAW

#else /* !__ASSEMBLY__ */

--
2.19.1.6.gb485710b

2021-09-26 15:15:27

by Lai Jiangshan

Subject: [PATCH V2 17/41] x86/entry: Make paranoid_exit() callable

From: Lai Jiangshan <[email protected]>

Move the last JMP out of paranoid_exit() and make it callable.

This allows paranoid_exit() to be rewritten in C later and also allows
asm_exc_nmi() to call it, avoiding duplicated code.

No functional change intended.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 169ee14cc2d6..202253c9a4f2 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -443,7 +443,8 @@ SYM_CODE_START(\asmsym)

call \cfunc

- jmp paranoid_exit
+ call paranoid_exit
+ jmp restore_regs_and_return_to_kernel

/* Switch to the regular task stack and use the noist entry point */
.Lfrom_usermode_switch_stack_\@:
@@ -520,7 +521,8 @@ SYM_CODE_START(\asmsym)
* identical to the stack in the IRET frame or the VC fall-back stack,
* so it is definitely mapped even with PTI enabled.
*/
- jmp paranoid_exit
+ call paranoid_exit
+ jmp restore_regs_and_return_to_kernel

/* Switch to the regular task stack */
.Lfrom_usermode_switch_stack_\@:
@@ -550,7 +552,8 @@ SYM_CODE_START(\asmsym)
movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */
call \cfunc

- jmp paranoid_exit
+ call paranoid_exit
+ jmp restore_regs_and_return_to_kernel

_ASM_NOKPROBE(\asmsym)
SYM_CODE_END(\asmsym)
@@ -937,7 +940,7 @@ SYM_CODE_END(paranoid_entry)
* Y User space GSBASE, must be restored unconditionally
*/
SYM_CODE_START_LOCAL(paranoid_exit)
- UNWIND_HINT_REGS
+ UNWIND_HINT_REGS offset=8
/*
* The order of operations is important. RESTORE_CR3 requires
* kernel GSBASE.
@@ -953,16 +956,17 @@ SYM_CODE_START_LOCAL(paranoid_exit)

/* With FSGSBASE enabled, unconditionally restore GSBASE */
wrgsbase %rbx
- jmp restore_regs_and_return_to_kernel
+ ret

.Lparanoid_exit_checkgs:
/* On non-FSGSBASE systems, conditionally do SWAPGS */
testl %ebx, %ebx
- jnz restore_regs_and_return_to_kernel
+ jnz .Lparanoid_exit_done

/* We are returning to a context with user GSBASE */
swapgs
- jmp restore_regs_and_return_to_kernel
+.Lparanoid_exit_done:
+ ret
SYM_CODE_END(paranoid_exit)

SYM_CODE_START_LOCAL(error_return)
--
2.19.1.6.gb485710b

2021-09-26 15:15:38

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 34/41] x86/doublefault: Use C entry code

From: Lai Jiangshan <[email protected]>

Use DEFINE_IDTENTRY_IST_ETNRY_ERRORCODE to emit the C entry function and
call that function directly from entry_64.S.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 12 ++----------
arch/x86/include/asm/idtentry.h | 1 +
2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 72a1610bb540..db108f8cd554 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -544,16 +544,8 @@ SYM_CODE_START(\asmsym)
PUSH_AND_CLEAR_REGS
ENCODE_FRAME_POINTER

- /* paranoid_entry returns GS information for paranoid_exit in EBX. */
- call paranoid_entry
- UNWIND_HINT_REGS
-
- movq %rsp, %rdi /* pt_regs pointer into first argument */
- movq ORIG_RAX(%rsp), %rsi /* get error code into 2nd argument*/
- movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */
- call \cfunc
-
- call paranoid_exit
+ movq %rsp, %rdi /* pt_regs pointer */
+ call ist_\cfunc
jmp restore_regs_and_return_to_kernel

_ASM_NOKPROBE(\asmsym)
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 7935b0abc65d..99e1ae3f5c7d 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -401,6 +401,7 @@ __visible __entry_text void ist_##func(struct pt_regs *regs) \
* Maps to DEFINE_IDTENTRY_RAW_ERRORCODE
*/
#define DEFINE_IDTENTRY_DF(func) \
+ DEFINE_IDTENTRY_IST_ETNRY_ERRORCODE(func) \
DEFINE_IDTENTRY_RAW_ERRORCODE(func)

/**
--
2.19.1.6.gb485710b

2021-09-26 15:15:46

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 20/41] x86/entry: Add the C version ist_switch_to_kernel_cr3()

From: Lai Jiangshan <[email protected]>

It switches CR3 to the kernel CR3 and returns the original CR3; the
caller should save the return value for use on exit.

It is the C version of SAVE_AND_SWITCH_TO_KERNEL_CR3.

No functional difference intended.
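
For illustration, a minimal sketch of how a paranoid-entry style caller
is expected to pair this helper with ist_restore_cr3() (the counterpart
added by a later patch in this series):

	unsigned long cr3;

	/* Save the original CR3 and switch to the kernel CR3. */
	cr3 = ist_switch_to_kernel_cr3();

	/* ... handle the exception on the kernel CR3 ... */

	/* Restore the saved CR3 value verbatim on exit. */
	ist_restore_cr3(cr3);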

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry64.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 5f2be4c3f333..faee44a3d1d8 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -32,8 +32,23 @@ static __always_inline void switch_to_kernel_cr3(void)
if (static_cpu_has(X86_FEATURE_PTI))
pti_switch_to_kernel_cr3(__native_read_cr3());
}
+
+static __always_inline unsigned long ist_switch_to_kernel_cr3(void)
+{
+ unsigned long cr3 = 0;
+
+ if (static_cpu_has(X86_FEATURE_PTI)) {
+ cr3 = __native_read_cr3();
+
+ if (cr3 & PTI_USER_PGTABLE_MASK)
+ pti_switch_to_kernel_cr3(cr3);
+ }
+
+ return cr3;
+}
#else
static __always_inline void switch_to_kernel_cr3(void) {}
+static __always_inline unsigned long ist_switch_to_kernel_cr3(void) { return 0; }
#endif

/*
--
2.19.1.6.gb485710b

2021-09-26 15:15:48

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 22/41] x86/entry: Add the C version get_percpu_base()

From: Lai Jiangshan <[email protected]>

It implements the C version of the ASM macro GET_PERCPU_BASE().

No functional difference intended.
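
A usage sketch, matching how the next patch in the series consumes it
(setting up the kernel GSBASE for the current CPU):

	wrgsbase(get_percpu_base());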

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry64.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 2db9ae3508f1..b939b56d985d 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -193,3 +193,39 @@ struct pt_regs *error_entry(struct pt_regs *eregs)
/* Enter from kernel, don't move pt_regs */
return eregs;
}
+
+#ifdef CONFIG_SMP
+/*
+ * CPU/node NR is loaded from the limit (size) field of a special segment
+ * descriptor entry in GDT.
+ *
+ * Do not use RDPID, because KVM loads guest's TSC_AUX on vm-entry and
+ * may not restore the host's value until the CPU returns to userspace.
+ * Thus the kernel would consume a guest's TSC_AUX if an NMI arrives
+ * while running KVM's run loop.
+ */
+static __always_inline unsigned int gdt_get_cpu(void)
+{
+ unsigned int p;
+
+ asm ("lsl %[seg],%[p]" : [p] "=a" (p) : [seg] "r" (__CPUNODE_SEG));
+
+ return p & VDSO_CPUNODE_MASK;
+}
+
+/*
+ * Fetch the per-CPU GSBASE value for this processor.
+ *
+ * We normally use %gs for accessing per-CPU data, but we are setting up
+ * %gs here and obviously can not use %gs itself to access per-CPU data.
+ */
+static __always_inline unsigned long get_percpu_base(void)
+{
+ return __per_cpu_offset[gdt_get_cpu()];
+}
+#else
+static __always_inline unsigned long get_percpu_base(void)
+{
+ return pcpu_unit_offsets;
+}
+#endif
--
2.19.1.6.gb485710b

2021-09-26 15:15:59

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 25/41] x86/entry: Implement the C version ist_paranoid_exit()

From: Lai Jiangshan <[email protected]>

It implements the whole of the ASM paranoid_exit() in C.

No functional difference intended.
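
A minimal usage sketch, mirroring how later patches in this series wrap
an IST handler between the entry and exit helpers:

	unsigned long cr3, gsbase;

	ist_paranoid_entry(&cr3, &gsbase);
	/* ... call the actual exception handler here ... */
	ist_paranoid_exit(cr3, gsbase);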

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry64.c | 40 +++++++++++++++++++++++++++++++++
arch/x86/include/asm/idtentry.h | 2 ++
2 files changed, 42 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index 67f13aebd948..017a7f94e3a4 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -312,3 +312,43 @@ void ist_paranoid_entry(unsigned long *cr3, unsigned long *gsbase)
/* Handle GSBASE, store the return value in *@gsbase for exit. */
*gsbase = ist_switch_to_kernel_gsbase();
}
+
+/*
+ * "Paranoid" exit path from exception stack. This is invoked
+ * only on return from IST interrupts that came from kernel space.
+ *
+ * We may be returning to very strange contexts (e.g. very early
+ * in syscall entry), so checking for preemption here would
+ * be complicated. Fortunately, there's no good reason to try
+ * to handle preemption here.
+ */
+__visible __entry_text
+void ist_paranoid_exit(unsigned long cr3, unsigned long gsbase)
+{
+ /*
+ * Restore CR3 at first, it can use kernel GSBASE.
+ */
+ ist_restore_cr3(cr3);
+
+ barrier();
+
+ /*
+ * Handle the three GSBASE cases.
+ *
+ * @gsbase contains the GSBASE related information depending
+ * on the availability of the FSGSBASE instructions:
+ *
+ * FSGSBASE @gsbase
+ * N 0 -> SWAPGS on exit
+ * 1 -> no SWAPGS on exit
+ *
+ * Y User space GSBASE, must be restored unconditionally
+ */
+ if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+ wrgsbase(gsbase);
+ return;
+ }
+
+ if (gsbase)
+ native_swapgs();
+}
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index fa8d73cfd8d6..b144ea05b859 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -295,6 +295,8 @@ static __always_inline void __##func(struct pt_regs *regs)
#ifdef CONFIG_X86_64
__visible __entry_text
void ist_paranoid_entry(unsigned long *cr3, unsigned long *gsbase);
+__visible __entry_text
+void ist_paranoid_exit(unsigned long cr3, unsigned long gsbase);

/**
* DECLARE_IDTENTRY_IST - Declare functions for IST handling IDT entry points
--
2.19.1.6.gb485710b

2021-09-26 15:15:59

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 35/41] x86/sev: Add and use ist_vc_switch_off_ist()

From: Lai Jiangshan <[email protected]>

ist_vc_switch_off_ist() is the same as vc_switch_off_ist(), but it is
called before CR3 and GSBASE have been switched to the kernel values,
so it has to call ist_paranoid_entry() on its own.

It prepares for converting the rest of idtentry_vc to C code and for
removing the ASM paranoid_entry() and paranoid_exit().

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 20 ++++++++++----------
arch/x86/include/asm/traps.h | 3 ++-
arch/x86/kernel/traps.c | 14 +++++++++++++-
3 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index db108f8cd554..8871f8ccf117 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -486,26 +486,26 @@ SYM_CODE_START(\asmsym)
testb $3, CS(%rsp)
jnz .Lfrom_usermode_switch_stack_\@

- /*
- * paranoid_entry returns SWAPGS flag for paranoid_exit in EBX.
- * EBX == 0 -> SWAPGS, EBX == 1 -> no SWAPGS
- */
- call paranoid_entry
-
- UNWIND_HINT_REGS
-
/*
* Switch off the IST stack to make it free for nested exceptions. The
- * vc_switch_off_ist() function will switch back to the interrupted
+ * ist_vc_switch_off_ist() function will switch back to the interrupted
* stack if it is safe to do so. If not it switches to the VC fall-back
* stack.
*/
movq %rsp, %rdi /* pt_regs pointer */
- call vc_switch_off_ist
+ call ist_vc_switch_off_ist
movq %rax, %rsp /* Switch to new stack */

UNWIND_HINT_REGS

+ /*
+ * paranoid_entry returns SWAPGS flag for paranoid_exit in EBX.
+ * EBX == 0 -> SWAPGS, EBX == 1 -> no SWAPGS
+ */
+ call paranoid_entry
+
+ UNWIND_HINT_REGS
+
/* Update pt_regs */
movq ORIG_RAX(%rsp), %rsi /* get error code into 2nd argument*/
movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 686461ac9803..1aefc081d763 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -16,7 +16,8 @@ asmlinkage __visible notrace
struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
asmlinkage __visible notrace struct pt_regs *error_entry(struct pt_regs *eregs);
void __init trap_init(void);
-asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
+asmlinkage __visible __entry_text
+struct pt_regs *ist_vc_switch_off_ist(struct pt_regs *eregs);
#endif

#ifdef CONFIG_X86_F00F_BUG
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 0afa16ea3702..03347db4c2c4 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -717,7 +717,7 @@ asmlinkage __visible noinstr struct pt_regs *sync_regs(struct pt_regs *eregs)
}

#ifdef CONFIG_AMD_MEM_ENCRYPT
-asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *regs)
+static noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *regs)
{
unsigned long sp, *stack;
struct stack_info info;
@@ -757,6 +757,18 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *r

return regs_ret;
}
+
+asmlinkage __visible __entry_text
+struct pt_regs *ist_vc_switch_off_ist(struct pt_regs *regs)
+{
+ unsigned long cr3, gsbase;
+
+ ist_paranoid_entry(&cr3, &gsbase);
+ regs = vc_switch_off_ist(regs);
+ ist_paranoid_exit(cr3, gsbase);
+
+ return regs;
+}
#endif

asmlinkage __visible noinstr
--
2.19.1.6.gb485710b

2021-09-26 15:16:09

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 36/41] x86/sev: Remove stack protector from sev.c

From: Lai Jiangshan <[email protected]>

sev.c is going to contain __entry_text code which cannot be instrumented
with the stack protector.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kernel/Makefile | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index f56e8088c85d..88bbfeeab929 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -52,6 +52,8 @@ CFLAGS_REMOVE_traps.o = -fstack-protector -fstack-protector-strong
CFLAGS_traps.o += -fno-stack-protector
CFLAGS_REMOVE_nmi.o = -fstack-protector -fstack-protector-strong
CFLAGS_nmi.o += -fno-stack-protector
+CFLAGS_REMOVE_sev.o = -fstack-protector -fstack-protector-strong
+CFLAGS_sev.o += -fno-stack-protector

CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace

--
2.19.1.6.gb485710b

2021-09-26 15:16:09

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 37/41] x86/sev: Use C entry code

From: Lai Jiangshan <[email protected]>

Use DEFINE_IDTENTRY_IST_ETNRY_ERRORCODE to emit the C entry function and
call that function directly from entry_64.S.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 22 +---------------------
arch/x86/include/asm/idtentry.h | 1 +
2 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 8871f8ccf117..63cafeeaf27d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -498,28 +498,8 @@ SYM_CODE_START(\asmsym)

UNWIND_HINT_REGS

- /*
- * paranoid_entry returns SWAPGS flag for paranoid_exit in EBX.
- * EBX == 0 -> SWAPGS, EBX == 1 -> no SWAPGS
- */
- call paranoid_entry
-
- UNWIND_HINT_REGS
-
- /* Update pt_regs */
- movq ORIG_RAX(%rsp), %rsi /* get error code into 2nd argument*/
- movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */
-
movq %rsp, %rdi /* pt_regs pointer */
-
- call kernel_\cfunc
-
- /*
- * No need to switch back to the IST stack. The current stack is either
- * identical to the stack in the IRET frame or the VC fall-back stack,
- * so it is definitely mapped even with PTI enabled.
- */
- call paranoid_exit
+ call ist_kernel_\cfunc
jmp restore_regs_and_return_to_kernel

/* Switch to the regular task stack */
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 99e1ae3f5c7d..c8837bb3991f 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -412,6 +412,7 @@ __visible __entry_text void ist_##func(struct pt_regs *regs) \
* Maps to DEFINE_IDTENTRY_RAW_ERRORCODE
*/
#define DEFINE_IDTENTRY_VC_KERNEL(func) \
+ DEFINE_IDTENTRY_IST_ETNRY_ERRORCODE(kernel_##func) \
DEFINE_IDTENTRY_RAW_ERRORCODE(kernel_##func)

/**
--
2.19.1.6.gb485710b

2021-09-26 15:16:10

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 38/41] x86/entry: Remove ASM function paranoid_entry() and paranoid_exit()

From: Lai Jiangshan <[email protected]>

IST exceptions have been converted to C entry code which uses the C
functions ist_paranoid_entry() and ist_paranoid_exit(). The ASM functions
paranoid_entry() and paranoid_exit() are no longer used.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 124 --------------------------------------
1 file changed, 124 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 63cafeeaf27d..260be3c9da7d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -815,130 +815,6 @@ SYM_CODE_START(xen_failsafe_callback)
SYM_CODE_END(xen_failsafe_callback)
#endif /* CONFIG_XEN_PV */

-/*
- * Save all registers in pt_regs. Return GSBASE related information
- * in EBX depending on the availability of the FSGSBASE instructions:
- *
- * FSGSBASE R/EBX
- * N 0 -> SWAPGS on exit
- * 1 -> no SWAPGS on exit
- *
- * Y GSBASE value at entry, must be restored in paranoid_exit
- */
-SYM_CODE_START_LOCAL(paranoid_entry)
- UNWIND_HINT_FUNC
- cld
-
- /*
- * Always stash CR3 in %r14. This value will be restored,
- * verbatim, at exit. Needed if paranoid_entry interrupted
- * another entry that already switched to the user CR3 value
- * but has not yet returned to userspace.
- *
- * This is also why CS (stashed in the "iret frame" by the
- * hardware at entry) can not be used: this may be a return
- * to kernel code, but with a user CR3 value.
- *
- * Switching CR3 does not depend on kernel GSBASE so it can
- * be done before switching to the kernel GSBASE. This is
- * required for FSGSBASE because the kernel GSBASE has to
- * be retrieved from a kernel internal table.
- */
- SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
-
- /*
- * Handling GSBASE depends on the availability of FSGSBASE.
- *
- * Without FSGSBASE the kernel enforces that negative GSBASE
- * values indicate kernel GSBASE. With FSGSBASE no assumptions
- * can be made about the GSBASE value when entering from user
- * space.
- */
- ALTERNATIVE "jmp .Lparanoid_entry_checkgs", "", X86_FEATURE_FSGSBASE
-
- /*
- * Read the current GSBASE and store it in %rbx unconditionally,
- * retrieve and set the current CPUs kernel GSBASE. The stored value
- * has to be restored in paranoid_exit unconditionally.
- *
- * The unconditional write to GS base below ensures that no subsequent
- * loads based on a mispredicted GS base can happen, therefore no LFENCE
- * is needed here.
- */
- SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
- ret
-
-.Lparanoid_entry_checkgs:
- /* EBX = 1 -> kernel GSBASE active, no restore required */
- movl $1, %ebx
- /*
- * The kernel-enforced convention is a negative GSBASE indicates
- * a kernel value. No SWAPGS needed on entry and exit.
- */
- movl $MSR_GS_BASE, %ecx
- rdmsr
- testl %edx, %edx
- jns .Lparanoid_entry_swapgs
- FENCE_SWAPGS_KERNEL_ENTRY
- ret
-
-.Lparanoid_entry_swapgs:
- swapgs
- FENCE_SWAPGS_USER_ENTRY
-
- /* EBX = 0 -> SWAPGS required on exit */
- xorl %ebx, %ebx
- ret
-SYM_CODE_END(paranoid_entry)
-
-/*
- * "Paranoid" exit path from exception stack. This is invoked
- * only on return from IST interrupts that came from kernel space.
- *
- * We may be returning to very strange contexts (e.g. very early
- * in syscall entry), so checking for preemption here would
- * be complicated. Fortunately, there's no good reason to try
- * to handle preemption here.
- *
- * R/EBX contains the GSBASE related information depending on the
- * availability of the FSGSBASE instructions:
- *
- * FSGSBASE R/EBX
- * N 0 -> SWAPGS on exit
- * 1 -> no SWAPGS on exit
- *
- * Y User space GSBASE, must be restored unconditionally
- */
-SYM_CODE_START_LOCAL(paranoid_exit)
- UNWIND_HINT_REGS offset=8
- /*
- * The order of operations is important. RESTORE_CR3 requires
- * kernel GSBASE.
- *
- * NB to anyone to try to optimize this code: this code does
- * not execute at all for exceptions from user mode. Those
- * exceptions go through error_exit instead.
- */
- RESTORE_CR3 scratch_reg=%rax save_reg=%r14
-
- /* Handle the three GSBASE cases */
- ALTERNATIVE "jmp .Lparanoid_exit_checkgs", "", X86_FEATURE_FSGSBASE
-
- /* With FSGSBASE enabled, unconditionally restore GSBASE */
- wrgsbase %rbx
- ret
-
-.Lparanoid_exit_checkgs:
- /* On non-FSGSBASE systems, conditionally do SWAPGS */
- testl %ebx, %ebx
- jnz .Lparanoid_exit_done
-
- /* We are returning to a context with user GSBASE */
- swapgs
-.Lparanoid_exit_done:
- ret
-SYM_CODE_END(paranoid_exit)
-
SYM_CODE_START_LOCAL(error_return)
UNWIND_HINT_REGS
DEBUG_ENTRY_ASSERT_IRQS_OFF
--
2.19.1.6.gb485710b

2021-09-26 15:16:21

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 23/41] x86/entry: Add the C version ist_switch_to_kernel_gsbase()

From: Lai Jiangshan <[email protected]>

It implements the second half of paranoid_entry(), whose functionality
is to switch to the kernel GSBASE.

No functional difference intended.
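
A rough sketch of where this helper fits, assuming the shape of
ist_paranoid_entry() as used later in the series:

	void ist_paranoid_entry(unsigned long *cr3, unsigned long *gsbase)
	{
		/* Save CR3 and switch to the kernel CR3. */
		*cr3 = ist_switch_to_kernel_cr3();

		/* Handle GSBASE, store the return value in *@gsbase for exit. */
		*gsbase = ist_switch_to_kernel_gsbase();
	}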

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry64.c | 44 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)

diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
index b939b56d985d..1a0d5d703ad6 100644
--- a/arch/x86/entry/entry64.c
+++ b/arch/x86/entry/entry64.c
@@ -229,3 +229,47 @@ static __always_inline unsigned long get_percpu_base(void)
return pcpu_unit_offsets;
}
#endif
+
+/*
+ * Handling GSBASE depends on the availability of FSGSBASE.
+ *
+ * Without FSGSBASE the kernel enforces that negative GSBASE
+ * values indicate kernel GSBASE. With FSGSBASE no assumptions
+ * can be made about the GSBASE value when entering from user
+ * space.
+ */
+static __always_inline unsigned long ist_switch_to_kernel_gsbase(void)
+{
+ unsigned long gsbase;
+
+ if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+ /*
+ * Read the current GSBASE for return.
+ * Retrieve and set the current CPUs kernel GSBASE.
+ *
+ * The unconditional write to GS base below ensures that
+ * no subsequent loads based on a mispredicted GS base can
+ * happen, therefore no LFENCE is needed here.
+ */
+ gsbase = rdgsbase();
+ wrgsbase(get_percpu_base());
+ return gsbase;
+ }
+
+ gsbase = __rdmsr(MSR_GS_BASE);
+
+ /*
+ * The kernel-enforced convention is a negative GSBASE indicates
+ * a kernel value. No SWAPGS needed on entry and exit.
+ */
+ if ((long)gsbase < 0) {
+ kernel_entry_fence_no_swapgs();
+ /* no SWAPGS required on exit */
+ return 1;
+ }
+
+ user_entry_swapgs_and_fence();
+
+ /* SWAPGS required on exit */
+ return 0;
+}
--
2.19.1.6.gb485710b

2021-09-26 15:16:32

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 28/41] x86/debug, mce: Use C entry code

From: Lai Jiangshan <[email protected]>

Use DEFINE_IDTENTRY_IST_ETNRY to emit the C entry function and call that
function directly from entry_64.S.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 10 +---------
arch/x86/include/asm/idtentry.h | 1 +
2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index bd6bce341360..0ba788bb9857 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -437,16 +437,8 @@ SYM_CODE_START(\asmsym)
testb $3, CS(%rsp)
jnz .Lfrom_usermode_switch_stack_\@

- /* paranoid_entry returns GS information for paranoid_exit in EBX. */
- call paranoid_entry
-
- UNWIND_HINT_REGS
-
movq %rsp, %rdi /* pt_regs pointer */
-
- call \cfunc
-
- call paranoid_exit
+ call ist_\cfunc
jmp restore_regs_and_return_to_kernel

/* Switch to the regular task stack and use the noist entry point */
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index b33e96e983c0..babe530cfa77 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -344,6 +344,7 @@ __visible __entry_text void ist_##func(struct pt_regs *regs) \
* Maps to DEFINE_IDTENTRY_RAW
*/
#define DEFINE_IDTENTRY_IST(func) \
+ DEFINE_IDTENTRY_IST_ETNRY(func) \
DEFINE_IDTENTRY_RAW(func)

/**
--
2.19.1.6.gb485710b

2021-09-26 15:17:35

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 40/41] x86/entry: Remove save_ret from PUSH_AND_CLEAR_REGS

From: Lai Jiangshan <[email protected]>

PUSH_AND_CLEAR_REGS is never used with save_ret anymore.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/calling.h | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index d42012fc694d..6f9de1c6da73 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -63,15 +63,9 @@ For 32-bit we have the following conventions - kernel is built with
* for assembly code:
*/

-.macro PUSH_REGS rdx=%rdx rax=%rax save_ret=0
- .if \save_ret
- pushq %rsi /* pt_regs->si */
- movq 8(%rsp), %rsi /* temporarily store the return address in %rsi */
- movq %rdi, 8(%rsp) /* pt_regs->di (overwriting original return address) */
- .else
+.macro PUSH_REGS rdx=%rdx rax=%rax
pushq %rdi /* pt_regs->di */
pushq %rsi /* pt_regs->si */
- .endif
pushq \rdx /* pt_regs->dx */
pushq %rcx /* pt_regs->cx */
pushq \rax /* pt_regs->ax */
@@ -86,10 +80,6 @@ For 32-bit we have the following conventions - kernel is built with
pushq %r14 /* pt_regs->r14 */
pushq %r15 /* pt_regs->r15 */
UNWIND_HINT_REGS
-
- .if \save_ret
- pushq %rsi /* return address on top of stack */
- .endif
.endm

.macro CLEAR_REGS
@@ -114,8 +104,8 @@ For 32-bit we have the following conventions - kernel is built with

.endm

-.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
- PUSH_REGS rdx=\rdx, rax=\rax, save_ret=\save_ret
+.macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax
+ PUSH_REGS rdx=\rdx, rax=\rax
CLEAR_REGS
.endm

--
2.19.1.6.gb485710b

2021-09-26 15:18:23

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 39/41] x86/entry: Remove the unused ASM macros

From: Lai Jiangshan <[email protected]>

They are now implemented and used in C code. The ASM versions are no
longer needed.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/calling.h | 106 ---------------------------------------
1 file changed, 106 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 996b041e92d2..d42012fc694d 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -210,60 +210,6 @@ For 32-bit we have the following conventions - kernel is built with
popq %rax
.endm

-.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
- ALTERNATIVE "jmp .Ldone_\@", "", X86_FEATURE_PTI
- movq %cr3, \scratch_reg
- movq \scratch_reg, \save_reg
- /*
- * Test the user pagetable bit. If set, then the user page tables
- * are active. If clear CR3 already has the kernel page table
- * active.
- */
- bt $PTI_USER_PGTABLE_BIT, \scratch_reg
- jnc .Ldone_\@
-
- ADJUST_KERNEL_CR3 \scratch_reg
- movq \scratch_reg, %cr3
-
-.Ldone_\@:
-.endm
-
-.macro RESTORE_CR3 scratch_reg:req save_reg:req
- ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
-
- ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
-
- /*
- * KERNEL pages can always resume with NOFLUSH as we do
- * explicit flushes.
- */
- bt $PTI_USER_PGTABLE_BIT, \save_reg
- jnc .Lnoflush_\@
-
- /*
- * Check if there's a pending flush for the user ASID we're
- * about to set.
- */
- movq \save_reg, \scratch_reg
- andq $(0x7FF), \scratch_reg
- bt \scratch_reg, THIS_CPU_user_pcid_flush_mask
- jnc .Lnoflush_\@
-
- btr \scratch_reg, THIS_CPU_user_pcid_flush_mask
- jmp .Lwrcr3_\@
-
-.Lnoflush_\@:
- SET_NOFLUSH_BIT \save_reg
-
-.Lwrcr3_\@:
- /*
- * The CR3 write could be avoided when not changing its value,
- * but would require a CR3 read *and* a scratch register.
- */
- movq \save_reg, %cr3
-.Lend_\@:
-.endm
-
#else /* CONFIG_PAGE_TABLE_ISOLATION=n: */

.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
@@ -272,10 +218,6 @@ For 32-bit we have the following conventions - kernel is built with
.endm
.macro SWITCH_TO_USER_CR3_STACK scratch_reg:req
.endm
-.macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
-.endm
-.macro RESTORE_CR3 scratch_reg:req save_reg:req
-.endm

#endif

@@ -284,17 +226,10 @@ For 32-bit we have the following conventions - kernel is built with
*
* FENCE_SWAPGS_USER_ENTRY is used in the user entry swapgs code path, to
* prevent a speculative swapgs when coming from kernel space.
- *
- * FENCE_SWAPGS_KERNEL_ENTRY is used in the kernel entry non-swapgs code path,
- * to prevent the swapgs from getting speculatively skipped when coming from
- * user space.
*/
.macro FENCE_SWAPGS_USER_ENTRY
ALTERNATIVE "", "lfence", X86_FEATURE_FENCE_SWAPGS_USER
.endm
-.macro FENCE_SWAPGS_KERNEL_ENTRY
- ALTERNATIVE "", "lfence", X86_FEATURE_FENCE_SWAPGS_KERNEL
-.endm

.macro STACKLEAK_ERASE_NOCLOBBER
#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
@@ -304,12 +239,6 @@ For 32-bit we have the following conventions - kernel is built with
#endif
.endm

-.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
- rdgsbase \save_reg
- GET_PERCPU_BASE \scratch_reg
- wrgsbase \scratch_reg
-.endm
-
#else /* CONFIG_X86_64 */
# undef UNWIND_HINT_IRET_REGS
# define UNWIND_HINT_IRET_REGS
@@ -320,38 +249,3 @@ For 32-bit we have the following conventions - kernel is built with
call stackleak_erase
#endif
.endm
-
-#ifdef CONFIG_SMP
-
-/*
- * CPU/node NR is loaded from the limit (size) field of a special segment
- * descriptor entry in GDT.
- */
-.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req
- movq $__CPUNODE_SEG, \reg
- lsl \reg, \reg
-.endm
-
-/*
- * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
- * We normally use %gs for accessing per-CPU data, but we are setting up
- * %gs here and obviously can not use %gs itself to access per-CPU data.
- *
- * Do not use RDPID, because KVM loads guest's TSC_AUX on vm-entry and
- * may not restore the host's value until the CPU returns to userspace.
- * Thus the kernel would consume a guest's TSC_AUX if an NMI arrives
- * while running KVM's run loop.
- */
-.macro GET_PERCPU_BASE reg:req
- LOAD_CPU_AND_NODE_SEG_LIMIT \reg
- andq $VDSO_CPUNODE_MASK, \reg
- movq __per_cpu_offset(, \reg, 8), \reg
-.endm
-
-#else
-
-.macro GET_PERCPU_BASE reg:req
- movq pcpu_unit_offsets(%rip), \reg
-.endm
-
-#endif /* CONFIG_SMP */
--
2.19.1.6.gb485710b

2021-09-26 15:18:25

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 41/41] x86/syscall/64: Move the checking for sysret to C code

From: Lai Jiangshan <[email protected]>

Like do_fast_syscall_32(), which checks whether it can return to userspace
via fast instructions before it returns, do_syscall_64() now also checks,
in C, whether it can use SYSRET to return to userspace before it returns.
This allows a bunch of ASM code to be removed.

No functional change intended.
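
As a worked example of the canonical-address check that moves into C:
with 4-level paging (48-bit virtual addresses), a non-canonical value
such as 0x0000800000000000 in RCX sign-extends to 0xffff800000000000,
so regs->cx != canonical_address(regs->cx) and the SYSRET fast path is
rejected in favour of IRET:

	/* Sketch of the 48-bit case from canonical_address(). */
	u64 vaddr = 0x0000800000000000ULL;
	u64 canon = ((s64)vaddr << 16) >> 16;	/* 0xffff800000000000 */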

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/calling.h | 10 +----
arch/x86/entry/common.c | 73 ++++++++++++++++++++++++++++++-
arch/x86/entry/entry_64.S | 78 ++--------------------------------
arch/x86/include/asm/syscall.h | 2 +-
4 files changed, 78 insertions(+), 85 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 6f9de1c6da73..05da3ef48ee4 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -109,27 +109,19 @@ For 32-bit we have the following conventions - kernel is built with
CLEAR_REGS
.endm

-.macro POP_REGS pop_rdi=1 skip_r11rcx=0
+.macro POP_REGS pop_rdi=1
popq %r15
popq %r14
popq %r13
popq %r12
popq %rbp
popq %rbx
- .if \skip_r11rcx
- popq %rsi
- .else
popq %r11
- .endif
popq %r10
popq %r9
popq %r8
popq %rax
- .if \skip_r11rcx
- popq %rsi
- .else
popq %rcx
- .endif
popq %rdx
popq %rsi
.if \pop_rdi
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6c2826417b33..718045b7a53c 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -70,7 +70,77 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
return false;
}

-__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
+/*
+ * Change top bits to match the most significant bit (47th or 56th bit
+ * depending on paging mode) in the address to get canonical address.
+ *
+ * If width of "canonical tail" ever becomes variable, this will need
+ * to be updated to remain correct on both old and new CPUs.
+ */
+static __always_inline u64 canonical_address(u64 vaddr)
+{
+ if (IS_ENABLED(CONFIG_X86_5LEVEL) && static_cpu_has(X86_FEATURE_LA57))
+ return ((s64)vaddr << (64 - 57)) >> (64 - 57);
+ else
+ return ((s64)vaddr << (64 - 48)) >> (64 - 48);
+}
+
+/*
+ * Check if it can use SYSRET.
+ *
+ * Try to use SYSRET instead of IRET if we're returning to
+ * a completely clean 64-bit userspace context.
+ *
+ * Returns 0 to return using IRET or 1 to return using SYSRET.
+ */
+static __always_inline int can_sysret(struct pt_regs *regs)
+{
+ /* In the Xen PV case we must use iret anyway. */
+ if (static_cpu_has(X86_FEATURE_XENPV))
+ return 0;
+
+ /* SYSRET requires RCX == RIP && R11 == RFLAGS */
+ if (regs->ip != regs->cx || regs->flags != regs->r11)
+ return 0;
+
+ /* CS and SS must match SYSRET */
+ if (regs->cs != __USER_CS || regs->ss != __USER_DS)
+ return 0;
+
+ /*
+ * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
+ * in kernel space. This essentially lets the user take over
+ * the kernel, since userspace controls RSP.
+ */
+ if (regs->cx != canonical_address(regs->cx))
+ return 0;
+
+ /*
+ * SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot
+ * restore RF properly. If the slowpath sets it for whatever reason, we
+ * need to restore it correctly.
+ *
+ * SYSRET can restore TF, but unlike IRET, restoring TF results in a
+ * trap from userspace immediately after SYSRET. This would cause an
+ * infinite loop whenever #DB happens with register state that satisfies
+ * the opportunistic SYSRET conditions. For example, single-stepping
+ * this user code:
+ *
+ * movq $stuck_here, %rcx
+ * pushfq
+ * popq %r11
+ * stuck_here:
+ *
+ * would never get past 'stuck_here'.
+ */
+ if (regs->r11 & (X86_EFLAGS_RF | X86_EFLAGS_TF))
+ return 0;
+
+ return 1;
+}
+
+/* Returns 0 to return using IRET or 1 to return using SYSRET. */
+__visible noinstr int do_syscall_64(struct pt_regs *regs, int nr)
{
add_random_kstack_offset();
nr = syscall_enter_from_user_mode(regs, nr);
@@ -84,6 +154,7 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)

instrumentation_end();
syscall_exit_to_user_mode(regs);
+ return can_sysret(regs);
}
#endif

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 260be3c9da7d..777fbf7c3939 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -112,85 +112,15 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
movslq %eax, %rsi
call do_syscall_64 /* returns with IRQs disabled */

- /*
- * Try to use SYSRET instead of IRET if we're returning to
- * a completely clean 64-bit userspace context. If we're not,
- * go to the slow exit path.
- * In the Xen PV case we must use iret anyway.
- */
-
- ALTERNATIVE "", "jmp swapgs_restore_regs_and_return_to_usermode", \
- X86_FEATURE_XENPV
-
- movq RCX(%rsp), %rcx
- movq RIP(%rsp), %r11
-
- cmpq %rcx, %r11 /* SYSRET requires RCX == RIP */
- jne swapgs_restore_regs_and_return_to_usermode
+ testl %eax, %eax
+ jz swapgs_restore_regs_and_return_to_usermode

/*
- * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
- * in kernel space. This essentially lets the user take over
- * the kernel, since userspace controls RSP.
- *
- * If width of "canonical tail" ever becomes variable, this will need
- * to be updated to remain correct on both old and new CPUs.
- *
- * Change top bits to match most significant bit (47th or 56th bit
- * depending on paging mode) in the address.
- */
-#ifdef CONFIG_X86_5LEVEL
- ALTERNATIVE "shl $(64 - 48), %rcx; sar $(64 - 48), %rcx", \
- "shl $(64 - 57), %rcx; sar $(64 - 57), %rcx", X86_FEATURE_LA57
-#else
- shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
- sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
-#endif
-
- /* If this changed %rcx, it was not canonical */
- cmpq %rcx, %r11
- jne swapgs_restore_regs_and_return_to_usermode
-
- cmpq $__USER_CS, CS(%rsp) /* CS must match SYSRET */
- jne swapgs_restore_regs_and_return_to_usermode
-
- movq R11(%rsp), %r11
- cmpq %r11, EFLAGS(%rsp) /* R11 == RFLAGS */
- jne swapgs_restore_regs_and_return_to_usermode
-
- /*
- * SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot
- * restore RF properly. If the slowpath sets it for whatever reason, we
- * need to restore it correctly.
- *
- * SYSRET can restore TF, but unlike IRET, restoring TF results in a
- * trap from userspace immediately after SYSRET. This would cause an
- * infinite loop whenever #DB happens with register state that satisfies
- * the opportunistic SYSRET conditions. For example, single-stepping
- * this user code:
- *
- * movq $stuck_here, %rcx
- * pushfq
- * popq %r11
- * stuck_here:
- *
- * would never get past 'stuck_here'.
- */
- testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
- jnz swapgs_restore_regs_and_return_to_usermode
-
- /* nothing to check for RSP */
-
- cmpq $__USER_DS, SS(%rsp) /* SS must match SYSRET */
- jne swapgs_restore_regs_and_return_to_usermode
-
- /*
- * We win! This label is here just for ease of understanding
+ * This label is here just for ease of understanding
* perf profiles. Nothing jumps here.
*/
syscall_return_via_sysret:
- /* rcx and r11 are already restored (see code above) */
- POP_REGS pop_rdi=0 skip_r11rcx=1
+ POP_REGS pop_rdi=0

/*
* Now all regs are restored except RSP and RDI.
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index f7e2d82d24fb..477adea7bac0 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -159,7 +159,7 @@ static inline int syscall_get_arch(struct task_struct *task)
? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
}

-void do_syscall_64(struct pt_regs *regs, int nr);
+int do_syscall_64(struct pt_regs *regs, int nr);
void do_int80_syscall_32(struct pt_regs *regs);
long do_fast_syscall_32(struct pt_regs *regs);

--
2.19.1.6.gb485710b

2021-09-27 10:20:37

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V2 02/41] x86/traps: Remove stack-protector from traps.c

On Sun, Sep 26, 2021 at 11:07:59PM +0800, Lai Jiangshan wrote:
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 8f4e8fa6ed75..0e054e2304c6 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -48,6 +48,9 @@ KCOV_INSTRUMENT := n
>
> CFLAGS_head$(BITS).o += -fno-stack-protector
>
> +CFLAGS_REMOVE_traps.o = -fstack-protector -fstack-protector-strong

Why this too?

> +CFLAGS_traps.o += -fno-stack-protector

Isn't this enough to disable stack protector for this file?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-09-27 10:51:17

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH V2 02/41] x86/traps: Remove stack-protector from traps.c



On 2021/9/27 18:19, Borislav Petkov wrote:
> On Sun, Sep 26, 2021 at 11:07:59PM +0800, Lai Jiangshan wrote:
>> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
>> index 8f4e8fa6ed75..0e054e2304c6 100644
>> --- a/arch/x86/kernel/Makefile
>> +++ b/arch/x86/kernel/Makefile
>> @@ -48,6 +48,9 @@ KCOV_INSTRUMENT := n
>>
>> CFLAGS_head$(BITS).o += -fno-stack-protector
>>
>> +CFLAGS_REMOVE_traps.o = -fstack-protector -fstack-protector-strong
>
> Why this too?
>
>> +CFLAGS_traps.o += -fno-stack-protector
>
> Isn't this enough to disable stack protector for this file?
>

I did not investigate deeply enough. I reviewed the generated code, found
that %gs is accessed early in the C entry function, searched for a
solution, and chose to copy the code that I thought was the most complete:
kernel/entry/Makefile

Using only "-fno-stack-protector" is enough to disable the stack protector
with my .config; I'm not so sure about other configurations.

Thanks
Lai

2021-09-27 11:06:37

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V2 02/41] x86/traps: Remove stack-protector from traps.c

On Mon, Sep 27, 2021 at 06:49:16PM +0800, Lai Jiangshan wrote:
> Using only "-fno-stack-protector" is enough to disable stack protector with
> my .config, I'm not so sure about other configuration.

What does the gcc manpage say about it?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-09-27 14:39:39

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH V2 02/41] x86/traps: Remove stack-protector from traps.c



On 2021/9/27 19:01, Borislav Petkov wrote:
> On Mon, Sep 27, 2021 at 06:49:16PM +0800, Lai Jiangshan wrote:
>> Using only "-fno-stack-protector" is enough to disable stack protector with
>> my .config, I'm not so sure about other configuration.
>
> What does the gcc manpage say about it?
>

In gcc's code, all of the -f[no-]stack-protector* arguments write to the
same flag_stack_protect variable, so the last one given takes effect.

> fstack-protector
> Common Var(flag_stack_protect, 1) Init(-1) Optimization
> Use propolice as a stack protection method.
>
> fstack-protector-all
> Common RejectNegative Var(flag_stack_protect, 2) Init(-1) Optimization
> Use a stack protection method for every function.
>
> fstack-protector-strong
> Common RejectNegative Var(flag_stack_protect, 3) Init(-1) Optimization
> Use a smart stack protection method for certain functions.
>
> fstack-protector-explicit
> Common RejectNegative Var(flag_stack_protect, 4) Optimization
> Use stack protection method only for functions with the stack_protect attribute.

In the Linux kernel's scripts/Makefile.lib, CFLAGS_traps.o supplies the last
flags on the gcc invocation, so "CFLAGS_traps.o += -fno-stack-protector"
alone should be enough.

> _c_flags = $(filter-out $(CFLAGS_REMOVE_$(target-stem).o), \
> $(filter-out $(ccflags-remove-y), \
> $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) $(ccflags-y)) \
> $(CFLAGS_$(target-stem).o))

2021-09-27 18:13:56

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH V2 03/41] compiler_types.h: Add __noinstr_section() for noinstr

On Sun, Sep 26, 2021 at 11:08:00PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> And it will be extended for C entry code.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> include/linux/compiler_types.h | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> index b6ff83a714ca..3c77631c68bd 100644
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -208,10 +208,12 @@ struct ftrace_likely_data {
> #endif
>
> /* Section for code which can't be instrumented at all */
> -#define noinstr \
> - noinline notrace __attribute((__section__(".noinstr.text"))) \
> +#define __noinstr_section(section) \

bikeshed: this could be just __noinstr(section) instead
of __noinstr_section(section) just to avoid semi-redundant
information. *shrug*

Reviewed-by: Kees Cook <[email protected]>

> + noinline notrace __attribute((__section__(section))) \
> __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
>
> +#define noinstr __noinstr_section(".noinstr.text")
> +
> #endif /* __KERNEL__ */
>
> #endif /* __ASSEMBLY__ */
> --
> 2.19.1.6.gb485710b
>

--
Kees Cook

2021-09-28 21:35:55

by Brian Gerst

[permalink] [raw]
Subject: Re: [PATCH V2 16/41] x86/entry: Implement the whole error_entry() as C code

On Sun, Sep 26, 2021 at 11:13 AM Lai Jiangshan <[email protected]> wrote:
>
> From: Lai Jiangshan <[email protected]>
>
> All the needed facilities are set in entry64.c, the whole error_entry()
> can be implemented in C in entry64.c. The C version generally has better
> readability and easier to be updated/improved.
>
> No function change intended. Only a check for X86_FEATURE_XENPV is added
> because the new error_entry() does not use the pv SWAPGS, rather it uses
> native_swapgs(). And for XENPV, error_entry() has nothing to do, so it
> can return directly.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> arch/x86/entry/entry64.c | 76 ++++++++++++++++++++++++++++++++++
> arch/x86/entry/entry_64.S | 80 +-----------------------------------
> arch/x86/include/asm/traps.h | 1 +
> 3 files changed, 78 insertions(+), 79 deletions(-)
>
> diff --git a/arch/x86/entry/entry64.c b/arch/x86/entry/entry64.c
> index dafae60d31f9..5f2be4c3f333 100644
> --- a/arch/x86/entry/entry64.c
> +++ b/arch/x86/entry/entry64.c
> @@ -56,3 +56,78 @@ static __always_inline void kernel_entry_fence_no_swapgs(void)
> {
> alternative("", "lfence", X86_FEATURE_FENCE_SWAPGS_KERNEL);
> }
> +
> +/*
> + * Put pt_regs onto the task stack and switch GS and CR3 if needed.
> + * The actual stack switch is done in entry_64.S.
> + *
> + * Becareful, it might be in the user CR3 and user GS base at the start
> + * of the function.
> + */
> +asmlinkage __visible __entry_text
> +struct pt_regs *error_entry(struct pt_regs *eregs)
> +{
> + unsigned long iret_ip = (unsigned long)native_irq_return_iret;
> +
> + asm volatile ("cld");

The C ABI states that the direction flag must be clear on function
entry and exit, so the CLD instruction needs to remain in the asm
code.

https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf#subsection.3.2.1
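
For example, a minimal sketch with CLD kept in the asm stub, following
the calling pattern used elsewhere in this series:

	cld
	movq	%rsp, %rdi		/* pt_regs pointer */
	call	error_entry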

--
Brian Gerst

2021-09-29 10:42:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 16/41] x86/entry: Implement the whole error_entry() as C code

On Tue, Sep 28, 2021 at 05:34:02PM -0400, Brian Gerst wrote:
> On Sun, Sep 26, 2021 at 11:13 AM Lai Jiangshan <[email protected]> wrote:
> > +asmlinkage __visible __entry_text
> > +struct pt_regs *error_entry(struct pt_regs *eregs)
> > +{
> > + unsigned long iret_ip = (unsigned long)native_irq_return_iret;
> > +
> > + asm volatile ("cld");
>
> The C ABI states that the direction flag must be clear on function
> entry and exit, so the CLD instruction needs to remain in the asm
> code.

Right, also, one of my pet peeves with our entry code is that CLD and
CLAC are not next to one another.

2021-09-30 11:52:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH V2 04/41] x86/entry: Introduce __entry_text for entry code written in C

On Sun, Sep 26, 2021 at 11:08:01PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> Some entry code will be implemented in C files. We need __entry_text

Who's "we"?

> to set them in .entry.text section. __entry_text disables instruments

s/instruments/instrumentation/

> like noinstr, but it doesn't disable stack protector since not all
> compiler supported by kernel supporting function level granular
> attribute to disable stack protector. It will be disabled by C file
> level.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> arch/x86/include/asm/idtentry.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index 1345088e9902..6779def97591 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -11,6 +11,9 @@
>
> #include <asm/irq_stack.h>
>
> +/* Entry code written in C. */
> +#define __entry_text __noinstr_section(".entry.text")

I'm assuming that __noinstr_section() is defined somewhere, maybe in
patch 3, which I don't have in my mbox.

Yah, the 0th message says:

" compiler_types.h: Add __noinstr_section() for noinstr"

Aha, I see why: you haven't CCed me on that one so I don't have it:

https://lkml.kernel.org/r/[email protected]

I have all the remaining 40 but not that one.

On your next submission, please make sure you CC [email protected] so that
all x86 people get the whole patchset.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette