2024-01-25 11:32:58

by Ard Biesheuvel

Subject: [PATCH v2 00/17] x86: Confine early 1:1 mapped startup code

From: Ard Biesheuvel <[email protected]>

This is a follow-up to my RFC [0] that proposed to build the entire core
kernel with -fPIC, to reduce the likelihood that code that runs
extremely early from the 1:1 mapping of memory will misbehave.

This is needed to address reports that SEV boot on Clang-built kernels
is broken, because this early code attempts to access virtual kernel
addresses that are not mapped yet. Kevin has suggested some workarounds
for this [1], but the problem really requires a more rigorous approach,
rather than papering over a couple of symptoms of the underlying
defect.

As it turns out, the use of -fPIE for the entire kernel is neither
necessary nor sufficient, and it has its own set of problems, including
the fact that the PIE small C code model uses FS rather than GS for the
per-CPU register, and that only recent GCC and Clang versions permit
this to be overridden on the command line.

But the real problem is that even position independent code is not
guaranteed to execute correctly at any offset unless all statically
initialized pointer variables use the same translation as the code.

So instead, this v2 proposes another solution, taking the following
approach:
- clean up and refactor the startup code so that the primary startup
code executes from the 1:1 mapping but nothing else;
- define a new text section type .pi.text and enforce that it can only
call into other .pi.text sections;
- (tbd) require that objects containing .pi.text sections are built with
-fPIC, and disallow any absolute references from such objects.

The latter point is not implemented yet in this v2, but it could be
done rather straightforwardly. (The EFI stub already does something
similar across all architectures.)

Patch #13 in particular gives an overview of all the code that gets
pulled into the early 1:1 mapped startup code path, because memory
encryption needs to be configured before we can even map the kernel.
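
As a rough illustration of the .pi.text plumbing (a sketch only; the
actual definitions are added elsewhere in the series, see
include/linux/init.h and include/asm-generic/vmlinux.lds.h in the
diffstat, and may differ in detail):

  /* C side: tag a function so it may be called via the 1:1 mapping */
  #define __pitext  __section(".pi.text")

  /* asm side: emit subsequent code into the .pi.text section */
  #define __PITEXT  .section ".pi.text", "ax"

  unsigned long __pitext __startup_64(struct boot_params *bp);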


[0] https://lkml.kernel.org/r/20240122090851.851120-7-ardb%2Bgit%40google.com
[1] https://lore.kernel.org/all/[email protected]/T/#u

Cc: Kevin Loughlin <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Dionna Glaze <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Justin Stitt <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Ard Biesheuvel (17):
x86/startup_64: Drop long return to initial_code pointer
x86/startup_64: Simplify calculation of initial page table address
x86/startup_64: Simplify CR4 handling in startup code
x86/startup_64: Drop global variables to keep track of LA57 state
x86/startup_64: Simplify virtual switch on primary boot
x86/head64: Replace pointer fixups with PIE codegen
x86/head64: Simplify GDT/IDT initialization code
asm-generic: Add special .pi.text section for position independent
code
x86: Move return_thunk to __pitext section
x86/head64: Move early startup code into __pitext
modpost: Warn about calls from __pitext into other text sections
x86/coco: Make cc_set_mask() static inline
x86/sev: Make all code reachable from 1:1 mapping __pitext
x86/sev: Avoid WARN() in early code
x86/sev: Use PIC codegen for early SEV startup code
x86/sev: Drop inline asm LEA instructions for RIP-relative references
x86/startup_64: Don't bother setting up GS before the kernel is mapped

arch/x86/Makefile | 5 +
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/pgtable_64.c | 2 -
arch/x86/boot/compressed/sev.c | 3 +
arch/x86/coco/core.c | 7 +-
arch/x86/include/asm/coco.h | 8 +-
arch/x86/include/asm/mem_encrypt.h | 8 +-
arch/x86/include/asm/pgtable.h | 6 +-
arch/x86/include/asm/pgtable_64_types.h | 15 +-
arch/x86/include/asm/setup.h | 4 +-
arch/x86/include/asm/sev.h | 6 +-
arch/x86/kernel/Makefile | 5 +
arch/x86/kernel/cpu/common.c | 2 -
arch/x86/kernel/head64.c | 188 ++++++--------------
arch/x86/kernel/head_64.S | 156 ++++++----------
arch/x86/kernel/sev-shared.c | 26 +--
arch/x86/kernel/sev.c | 27 ++-
arch/x86/kernel/vmlinux.lds.S | 3 +-
arch/x86/lib/Makefile | 2 +-
arch/x86/lib/cmdline.c | 6 +-
arch/x86/lib/memcpy_64.S | 3 +-
arch/x86/lib/memset_64.S | 3 +-
arch/x86/lib/retpoline.S | 2 +-
arch/x86/mm/Makefile | 3 +-
arch/x86/mm/kasan_init_64.c | 3 -
arch/x86/mm/mem_encrypt_boot.S | 3 +-
arch/x86/mm/mem_encrypt_identity.c | 94 +++++-----
arch/x86/mm/pti.c | 2 +-
include/asm-generic/vmlinux.lds.h | 3 +
include/linux/init.h | 12 ++
scripts/mod/modpost.c | 11 +-
tools/objtool/check.c | 26 ++-
32 files changed, 262 insertions(+), 384 deletions(-)

--
2.43.0.429.g432eaa2c6b-goog



2024-01-25 11:33:09

by Ard Biesheuvel

Subject: [PATCH v2 01/17] x86/startup_64: Drop long return to initial_code pointer

From: Ard Biesheuvel <[email protected]>

Since commit 866b556efa12 ("x86/head/64: Install startup GDT"), the
primary startup sequence sets the code segment register (CS) to __KERNEL_CS
before calling into the startup code shared between primary and
secondary boot.

This means that the far return used to reload CS and branch to the
initial_code pointer is no longer needed: a simple indirect call is
sufficient here.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/head_64.S | 35 ++------------------
1 file changed, 3 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d4918d03efb4..4017a49d7b76 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -428,39 +428,10 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
movq %r15, %rdi

.Ljump_to_C_code:
- /*
- * Jump to run C code and to be on a real kernel address.
- * Since we are running on identity-mapped space we have to jump
- * to the full 64bit address, this is only possible as indirect
- * jump. In addition we need to ensure %cs is set so we make this
- * a far return.
- *
- * Note: do not change to far jump indirect with 64bit offset.
- *
- * AMD does not support far jump indirect with 64bit offset.
- * AMD64 Architecture Programmer's Manual, Volume 3: states only
- * JMP FAR mem16:16 FF /5 Far jump indirect,
- * with the target specified by a far pointer in memory.
- * JMP FAR mem16:32 FF /5 Far jump indirect,
- * with the target specified by a far pointer in memory.
- *
- * Intel64 does support 64bit offset.
- * Software Developer Manual Vol 2: states:
- * FF /5 JMP m16:16 Jump far, absolute indirect,
- * address given in m16:16
- * FF /5 JMP m16:32 Jump far, absolute indirect,
- * address given in m16:32.
- * REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
- * address given in m16:64.
- */
- pushq $.Lafter_lret # put return address on stack for unwinder
xorl %ebp, %ebp # clear frame pointer
- movq initial_code(%rip), %rax
- pushq $__KERNEL_CS # set correct cs
- pushq %rax # target address in negative space
- lretq
-.Lafter_lret:
- ANNOTATE_NOENDBR
+ ANNOTATE_RETPOLINE_SAFE
+ callq *initial_code(%rip)
+ int3
SYM_CODE_END(secondary_startup_64)

#include "verify_cpu.S"
--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:33:38

by Ard Biesheuvel

Subject: [PATCH v2 03/17] x86/startup_64: Simplify CR4 handling in startup code

From: Ard Biesheuvel <[email protected]>

When executing in long mode, the CR4.PAE and CR4.LA57 control bits
cannot be updated, so they can simply be preserved rather than
reasoning about whether or not they need to be set. CR4.PSE has no
effect in long mode, so it can be omitted.

CR4.PGE is used to flush the TLB, by clearing it if it was set and
subsequently re-enabling it. So there is no need to set it first only
to disable and re-enable it again later.

CR4.MCE must be preserved unless the kernel was built without
CONFIG_X86_MCE, in which case it must be cleared.

Reimplement the above logic in a more straightforward way, by defining
a mask of CR4 bits to preserve and applying it to CR4 at the point
where CR4 needs to be updated anyway.
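
For reference, the mask construction amounts to the following C logic
(illustration only, not part of the patch; bit positions per
arch/x86/include/uapi/asm/processor-flags.h):

  /* Illustrative model of the CR4 preserve mask; not part of the patch. */
  #define X86_CR4_PAE   (1UL << 5)
  #define X86_CR4_MCE   (1UL << 6)
  #define X86_CR4_PGE   (1UL << 7)
  #define X86_CR4_LA57  (1UL << 12)

  static unsigned long cr4_preserve_mask(int mce_enabled)
  {
          /*
           * PAE and LA57 cannot change while paging is enabled; PGE is
           * kept in the mask so it can be toggled to flush the TLB.
           */
          unsigned long mask = X86_CR4_PAE | X86_CR4_PGE | X86_CR4_LA57;

          if (mce_enabled)        /* i.e. CONFIG_X86_MCE=y */
                  mask |= X86_CR4_MCE;
          return mask;
  }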

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/head_64.S | 27 ++++++++------------
1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 6d24c2014759..2d361e0ac74e 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -179,6 +179,12 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)

1:

+ /*
+ * Define a mask of CR4 bits to preserve. PAE and LA57 cannot be
+ * modified while paging remains enabled. PGE will be toggled below if
+ * it is already set.
+ */
+ orl $(X86_CR4_PAE | X86_CR4_PGE | X86_CR4_LA57), %edx
#ifdef CONFIG_X86_MCE
/*
* Preserve CR4.MCE if the kernel will enable #MC support.
@@ -187,22 +193,9 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
* configured will crash the system regardless of the CR4.MCE value set
* here.
*/
- movq %cr4, %rcx
- andl $X86_CR4_MCE, %ecx
-#else
- movl $0, %ecx
+ orl $X86_CR4_MCE, %edx
#endif

- /* Enable PAE mode, PSE, PGE and LA57 */
- orl $(X86_CR4_PAE | X86_CR4_PSE | X86_CR4_PGE), %ecx
-#ifdef CONFIG_X86_5LEVEL
- testb $1, __pgtable_l5_enabled(%rip)
- jz 1f
- orl $X86_CR4_LA57, %ecx
-1:
-#endif
- movq %rcx, %cr4
-
/*
* Switch to new page-table
*
@@ -218,10 +211,10 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
* entries from the identity mapping are flushed.
*/
movq %cr4, %rcx
- movq %rcx, %rax
- xorq $X86_CR4_PGE, %rcx
+ andl %edx, %ecx
+0: btcl $X86_CR4_PGE_BIT, %ecx
movq %rcx, %cr4
- movq %rax, %cr4
+ jc 0b

/* Ensure I am executing from virtual addresses */
movq $1f, %rax
--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:34:05

by Ard Biesheuvel

Subject: [PATCH v2 05/17] x86/startup_64: Simplify virtual switch on primary boot

From: Ard Biesheuvel <[email protected]>

The secondary startup code is used on the primary boot path as well, but
in this case, the initial part runs from a 1:1 mapping, until an
explicit cross-jump is made to the kernel virtual mapping of the same
code.

On the secondary boot path, this jump is pointless as the code already
executes from the mapping targeted by the jump. So combine this
cross-jump with the jump from startup_64() into the common boot path.
This simplifies the execution flow, and clearly separates code that runs
from a 1:1 mapping from code that runs from the kernel virtual mapping.

Note that the cross-jump requires the kernel virtual mapping to be
present in the active page tables, so hoist the CR3 assignment into
startup_64() as well.

Given that the secondary startup code does not require a special
placement inside the executable, move it to the .text section.
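
Schematically, the resulting flow is (sketch based on the diff below):

  primary boot:    startup_64 (1:1 mapping)
                     -> write CR3 (early_top_pgt)
                     -> jmp common_startup_64 (kernel virtual mapping)

  secondary boot:  secondary_startup_64 (kernel virtual mapping)
                     -> write CR3 (init_top_pgt)
                     -> fall through to common_startup_64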

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/head_64.S | 41 +++++++++-----------
1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 2d361e0ac74e..399241dcdbb5 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -39,7 +39,6 @@ L4_START_KERNEL = l4_index(__START_KERNEL_map)

L3_START_KERNEL = pud_index(__START_KERNEL_map)

- .text
__HEAD
.code64
SYM_CODE_START_NOALIGN(startup_64)
@@ -128,9 +127,19 @@ SYM_CODE_START_NOALIGN(startup_64)
call sev_verify_cbit
#endif

- jmp 1f
+ /*
+ * Switch to early_top_pgt which still has the identity mappings
+ * present.
+ */
+ movq %rax, %cr3
+
+ /* Branch to the common startup code at its kernel virtual address */
+ movq $common_startup_64, %rax
+ ANNOTATE_RETPOLINE_SAFE
+ jmp *%rax
SYM_CODE_END(startup_64)

+ .text
SYM_CODE_START(secondary_startup_64)
UNWIND_HINT_END_OF_STACK
ANNOTATE_NOENDBR
@@ -176,8 +185,15 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
#ifdef CONFIG_AMD_MEM_ENCRYPT
addq sme_me_mask(%rip), %rax
#endif
+ /*
+ * Switch to the init_top_pgt here, away from the trampoline_pgd and
+ * unmap the identity mapped ranges.
+ */
+ movq %rax, %cr3

-1:
+SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
+ UNWIND_HINT_END_OF_STACK
+ ANNOTATE_NOENDBR // above

/*
* Define a mask of CR4 bits to preserve. PAE and LA57 cannot be
@@ -195,17 +211,6 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
*/
orl $X86_CR4_MCE, %edx
#endif
-
- /*
- * Switch to new page-table
- *
- * For the boot CPU this switches to early_top_pgt which still has the
- * identity mappings present. The secondary CPUs will switch to the
- * init_top_pgt here, away from the trampoline_pgd and unmap the
- * identity mapped ranges.
- */
- movq %rax, %cr3
-
/*
* Do a global TLB flush after the CR3 switch to make sure the TLB
* entries from the identity mapping are flushed.
@@ -216,14 +221,6 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
movq %rcx, %cr4
jc 0b

- /* Ensure I am executing from virtual addresses */
- movq $1f, %rax
- ANNOTATE_RETPOLINE_SAFE
- jmp *%rax
-1:
- UNWIND_HINT_END_OF_STACK
- ANNOTATE_NOENDBR // above
-
#ifdef CONFIG_SMP
/*
* For parallel boot, the APIC ID is read from the APIC, and then
--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:34:36

by Ard Biesheuvel

Subject: [PATCH v2 02/17] x86/startup_64: Simplify calculation of initial page table address

From: Ard Biesheuvel <[email protected]>

Determining the address of the initial page table to program into CR3
involves:
- taking the physical address
- adding the SME encryption mask

On the primary entry path, the code is mapped using a 1:1 virtual to
physical translation, so the physical address can be taken directly
using a RIP-relative LEA instruction.

On the secondary entry path, the address can be obtained by taking the
offset from the virtual kernel base (__START_KERNEL_map) and adding the
physical kernel base.

This is all very straightforward, but the current code makes a mess of
it. Clean this up.
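
Expressed as plain arithmetic (illustration only; the patch implements
this directly in head_64.S, and rip_relative() is just notation for a
RIP-relative LEA):

  /* Primary boot, running from the 1:1 mapping: */
  cr3 = rip_relative(early_top_pgt) + sme_me_mask;

  /* Secondary boot, running from the kernel virtual mapping: */
  cr3 = phys_base + (init_top_pgt - __START_KERNEL_map) + sme_me_mask;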

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/head_64.S | 25 ++++++--------------
1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 4017a49d7b76..6d24c2014759 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -113,13 +113,11 @@ SYM_CODE_START_NOALIGN(startup_64)
call __startup_64

/* Form the CR3 value being sure to include the CR3 modifier */
- addq $(early_top_pgt - __START_KERNEL_map), %rax
+ leaq early_top_pgt(%rip), %rcx
+ addq %rcx, %rax

#ifdef CONFIG_AMD_MEM_ENCRYPT
mov %rax, %rdi
- mov %rax, %r14
-
- addq phys_base(%rip), %rdi

/*
* For SEV guests: Verify that the C-bit is correct. A malicious
@@ -128,12 +126,6 @@ SYM_CODE_START_NOALIGN(startup_64)
* the next RET instruction.
*/
call sev_verify_cbit
-
- /*
- * Restore CR3 value without the phys_base which will be added
- * below, before writing %cr3.
- */
- mov %r14, %rax
#endif

jmp 1f
@@ -173,18 +165,18 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
/* Clear %R15 which holds the boot_params pointer on the boot CPU */
xorq %r15, %r15

+ /* Derive the runtime physical address of init_top_pgt[] */
+ movq phys_base(%rip), %rax
+ addq $(init_top_pgt - __START_KERNEL_map), %rax
+
/*
* Retrieve the modifier (SME encryption mask if SME is active) to be
* added to the initial pgdir entry that will be programmed into CR3.
*/
#ifdef CONFIG_AMD_MEM_ENCRYPT
- movq sme_me_mask, %rax
-#else
- xorq %rax, %rax
+ addq sme_me_mask(%rip), %rax
#endif

- /* Form the CR3 value being sure to include the CR3 modifier */
- addq $(init_top_pgt - __START_KERNEL_map), %rax
1:

#ifdef CONFIG_X86_MCE
@@ -211,9 +203,6 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
#endif
movq %rcx, %cr4

- /* Setup early boot stage 4-/5-level pagetables. */
- addq phys_base(%rip), %rax
-
/*
* Switch to new page-table
*
--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:35:01

by Ard Biesheuvel

Subject: [PATCH v2 04/17] x86/startup_64: Drop global variables to keep track of LA57 state

From: Ard Biesheuvel <[email protected]>

On x86_64, the core kernel is entered in long mode, which implies that
paging is enabled. This means that the CR4.LA57 control bit is
guaranteed to be in sync with the number of paging levels used by the
kernel, and there is no need to store this in a variable.

There is also no need to use variables for storing the calculations of
pgdir_shift and ptrs_per_p4d, as they are easily determined on the fly.
Other assignments of global variables related to the number of paging
levels can be deferred to the primary C entrypoint that actually runs
from the kernel virtual mapping.

This removes the need to write to __ro_after_init variables from the
code that executes extremely early via the 1:1 mapping.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/pgtable_64.c | 2 -
arch/x86/include/asm/pgtable_64_types.h | 15 +++---
arch/x86/kernel/cpu/common.c | 2 -
arch/x86/kernel/head64.c | 52 ++++----------------
arch/x86/mm/kasan_init_64.c | 3 --
arch/x86/mm/mem_encrypt_identity.c | 9 ----
6 files changed, 15 insertions(+), 68 deletions(-)

diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 51f957b24ba7..0586cc216aa6 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -128,8 +128,6 @@ asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)

/* Initialize variables for 5-level paging */
__pgtable_l5_enabled = 1;
- pgdir_shift = 48;
- ptrs_per_p4d = 512;
}

/*
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 38b54b992f32..ecc010fbb377 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -22,28 +22,25 @@ typedef struct { pteval_t pte; } pte_t;
typedef struct { pmdval_t pmd; } pmd_t;

#ifdef CONFIG_X86_5LEVEL
+#ifdef USE_EARLY_PGTABLE_L5
extern unsigned int __pgtable_l5_enabled;

-#ifdef USE_EARLY_PGTABLE_L5
/*
- * cpu_feature_enabled() is not available in early boot code.
- * Use variable instead.
+ * CR4.LA57 may not be set to its final value yet in the early boot code.
+ * Use a variable instead.
*/
static inline bool pgtable_l5_enabled(void)
{
return __pgtable_l5_enabled;
}
#else
-#define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_LA57)
+#define pgtable_l5_enabled() !!(native_read_cr4() & X86_CR4_LA57)
#endif /* USE_EARLY_PGTABLE_L5 */

#else
#define pgtable_l5_enabled() 0
#endif /* CONFIG_X86_5LEVEL */

-extern unsigned int pgdir_shift;
-extern unsigned int ptrs_per_p4d;
-
#endif /* !__ASSEMBLY__ */

#define SHARED_KERNEL_PMD 0
@@ -53,7 +50,7 @@ extern unsigned int ptrs_per_p4d;
/*
* PGDIR_SHIFT determines what a top-level page table entry can map
*/
-#define PGDIR_SHIFT pgdir_shift
+#define PGDIR_SHIFT (pgtable_l5_enabled() ? 48 : 39)
#define PTRS_PER_PGD 512

/*
@@ -61,7 +58,7 @@ extern unsigned int ptrs_per_p4d;
*/
#define P4D_SHIFT 39
#define MAX_PTRS_PER_P4D 512
-#define PTRS_PER_P4D ptrs_per_p4d
+#define PTRS_PER_P4D (pgtable_l5_enabled() ? 512 : 1)
#define P4D_SIZE (_AC(1, UL) << P4D_SHIFT)
#define P4D_MASK (~(P4D_SIZE - 1))

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0b97bcde70c6..20ac11a2c06b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1,6 +1,4 @@
// SPDX-License-Identifier: GPL-2.0-only
-/* cpu_feature_enabled() cannot be used this early */
-#define USE_EARLY_PGTABLE_L5

#include <linux/memblock.h>
#include <linux/linkage.h>
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index dc0956067944..d636bb02213f 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -7,9 +7,6 @@

#define DISABLE_BRANCH_PROFILING

-/* cpu_feature_enabled() cannot be used this early */
-#define USE_EARLY_PGTABLE_L5
-
#include <linux/init.h>
#include <linux/linkage.h>
#include <linux/types.h>
@@ -50,14 +47,6 @@ extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
static unsigned int __initdata next_early_pgt;
pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);

-#ifdef CONFIG_X86_5LEVEL
-unsigned int __pgtable_l5_enabled __ro_after_init;
-unsigned int pgdir_shift __ro_after_init = 39;
-EXPORT_SYMBOL(pgdir_shift);
-unsigned int ptrs_per_p4d __ro_after_init = 1;
-EXPORT_SYMBOL(ptrs_per_p4d);
-#endif
-
#ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
unsigned long page_offset_base __ro_after_init = __PAGE_OFFSET_BASE_L4;
EXPORT_SYMBOL(page_offset_base);
@@ -95,37 +84,6 @@ static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr)
return fixup_pointer(ptr, physaddr);
}

-#ifdef CONFIG_X86_5LEVEL
-static unsigned int __head *fixup_int(void *ptr, unsigned long physaddr)
-{
- return fixup_pointer(ptr, physaddr);
-}
-
-static bool __head check_la57_support(unsigned long physaddr)
-{
- /*
- * 5-level paging is detected and enabled at kernel decompression
- * stage. Only check if it has been enabled there.
- */
- if (!(native_read_cr4() & X86_CR4_LA57))
- return false;
-
- *fixup_int(&__pgtable_l5_enabled, physaddr) = 1;
- *fixup_int(&pgdir_shift, physaddr) = 48;
- *fixup_int(&ptrs_per_p4d, physaddr) = 512;
- *fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE_L5;
- *fixup_long(&vmalloc_base, physaddr) = __VMALLOC_BASE_L5;
- *fixup_long(&vmemmap_base, physaddr) = __VMEMMAP_BASE_L5;
-
- return true;
-}
-#else
-static bool __head check_la57_support(unsigned long physaddr)
-{
- return false;
-}
-#endif
-
static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
{
unsigned long vaddr, vaddr_end;
@@ -189,7 +147,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
int i;
unsigned int *next_pgt_ptr;

- la57 = check_la57_support(physaddr);
+ la57 = pgtable_l5_enabled();

/* Is the address too large? */
if (physaddr >> MAX_PHYSMEM_BITS)
@@ -486,6 +444,14 @@ asmlinkage __visible void __init __noreturn x86_64_start_kernel(char * real_mode
(__START_KERNEL & PGDIR_MASK)));
BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);

+#ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
+ if (pgtable_l5_enabled()) {
+ page_offset_base = __PAGE_OFFSET_BASE_L5;
+ vmalloc_base = __VMALLOC_BASE_L5;
+ vmemmap_base = __VMEMMAP_BASE_L5;
+ }
+#endif
+
cr4_init_shadow();

/* Kill off the identity-map trampoline */
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 0302491d799d..85ae1ef840cc 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -2,9 +2,6 @@
#define DISABLE_BRANCH_PROFILING
#define pr_fmt(fmt) "kasan: " fmt

-/* cpu_feature_enabled() cannot be used this early */
-#define USE_EARLY_PGTABLE_L5
-
#include <linux/memblock.h>
#include <linux/kasan.h>
#include <linux/kdebug.h>
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index d73aeb16417f..67d4530548ce 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -27,15 +27,6 @@
#undef CONFIG_PARAVIRT_XXL
#undef CONFIG_PARAVIRT_SPINLOCKS

-/*
- * This code runs before CPU feature bits are set. By default, the
- * pgtable_l5_enabled() function uses bit X86_FEATURE_LA57 to determine if
- * 5-level paging is active, so that won't work here. USE_EARLY_PGTABLE_L5
- * is provided to handle this situation and, instead, use a variable that
- * has been set by the early boot code.
- */
-#define USE_EARLY_PGTABLE_L5
-
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/mem_encrypt.h>
--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:35:25

by Ard Biesheuvel

Subject: [PATCH v2 06/17] x86/head64: Replace pointer fixups with PIE codegen

From: Ard Biesheuvel <[email protected]>

Some of the C code in head64.c may be called from a different virtual
address than the one it was linked at. Currently, we deal with this by
using ordinary, position-dependent codegen and fixing up all symbol
references on the fly. This is fragile and tricky to maintain. It is
also unnecessary: we can use position-independent codegen (with hidden
visibility) to ensure that all compiler-generated symbol references are
RIP-relative, removing the need for fixups entirely.

It does mean that explicit references to kernel virtual addresses need
to be generated by hand, so generate those using a movabs instruction
in inline asm in the handful of places where this is actually needed.
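
To illustrate the codegen aspect, a standalone sketch (not taken from
the patch; the symbol name is made up):

  /*
   * Compiled with -fpie and hidden visibility (PIE_CFLAGS passes
   * -include linux/hidden.h, which applies
   * '#pragma GCC visibility push(hidden)' globally), the access below
   * becomes a RIP-relative load, e.g. mov pgt_count(%rip), %eax, with
   * no GOT indirection and no absolute relocation, so it works from
   * both the 1:1 mapping and the kernel virtual mapping.
   */
  #pragma GCC visibility push(hidden)
  extern unsigned int pgt_count;
  #pragma GCC visibility pop

  unsigned int read_pgt_count(void)
  {
          return pgt_count;
  }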

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/Makefile | 5 ++
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/include/asm/setup.h | 4 +-
arch/x86/kernel/Makefile | 4 +
arch/x86/kernel/head64.c | 88 +++++++-------------
arch/x86/kernel/head_64.S | 5 +-
6 files changed, 45 insertions(+), 63 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1a068de12a56..3c3c07cccd47 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -168,6 +168,11 @@ else
KBUILD_CFLAGS += -mcmodel=kernel
KBUILD_RUSTFLAGS += -Cno-redzone=y
KBUILD_RUSTFLAGS += -Ccode-model=kernel
+
+ PIE_CFLAGS := -fpie -mcmodel=small -fno-stack-protector \
+ -include $(srctree)/include/linux/hidden.h
+
+ export PIE_CFLAGS
endif

#
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index f19c038409aa..bccee07eae60 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -84,7 +84,7 @@ LDFLAGS_vmlinux += -T
hostprogs := mkpiggy
HOST_EXTRACFLAGS += -I$(srctree)/tools/include

-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABbCDGRSTtVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'

quiet_cmd_voffset = VOFFSET $@
cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 5c83729c8e71..b004f1b9a052 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -47,8 +47,8 @@ extern unsigned long saved_video_mode;

extern void reserve_standard_io_resources(void);
extern void i386_reserve_resources(void);
-extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
-extern void startup_64_setup_env(unsigned long physbase);
+extern unsigned long __startup_64(struct boot_params *bp);
+extern void startup_64_setup_env(void);
extern void early_setup_idt(void);
extern void __init do_early_exception(struct pt_regs *regs, int trapnr);

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0000325ab98f..65194ca79b5c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -21,6 +21,10 @@ CFLAGS_REMOVE_sev.o = -pg
CFLAGS_REMOVE_rethook.o = -pg
endif

+# head64.c contains C code that may execute from a different virtual address
+# than it was linked at, so we always build it using PIE codegen
+CFLAGS_head64.o += $(PIE_CFLAGS)
+
KASAN_SANITIZE_head$(BITS).o := n
KASAN_SANITIZE_dumpstack.o := n
KASAN_SANITIZE_dumpstack_$(BITS).o := n
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index d636bb02213f..a4a380494703 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -74,15 +74,10 @@ static struct desc_ptr startup_gdt_descr __initdata = {
.address = 0,
};

-static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
-{
- return ptr - (void *)_text + (void *)physaddr;
-}
-
-static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr)
-{
- return fixup_pointer(ptr, physaddr);
-}
+#define __va_symbol(sym) ({ \
+ unsigned long __v; \
+ asm("movq $" __stringify(sym) ", %0":"=r"(__v)); \
+ __v; })

static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
{
@@ -99,8 +94,8 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* attribute.
*/
if (sme_get_me_mask()) {
- vaddr = (unsigned long)__start_bss_decrypted;
- vaddr_end = (unsigned long)__end_bss_decrypted;
+ vaddr = __va_symbol(__start_bss_decrypted);
+ vaddr_end = __va_symbol(__end_bss_decrypted);

for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
/*
@@ -127,25 +122,17 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
return sme_get_me_mask();
}

-/* Code in __startup_64() can be relocated during execution, but the compiler
- * doesn't have to generate PC-relative relocations when accessing globals from
- * that function. Clang actually does not generate them, which leads to
- * boot-time crashes. To work around this problem, every global pointer must
- * be adjusted using fixup_pointer().
- */
-unsigned long __head __startup_64(unsigned long physaddr,
- struct boot_params *bp)
+unsigned long __head __startup_64(struct boot_params *bp)
{
+ unsigned long physaddr = (unsigned long)_text;
unsigned long load_delta, *p;
unsigned long pgtable_flags;
pgdval_t *pgd;
p4dval_t *p4d;
pudval_t *pud;
pmdval_t *pmd, pmd_entry;
- pteval_t *mask_ptr;
bool la57;
int i;
- unsigned int *next_pgt_ptr;

la57 = pgtable_l5_enabled();

@@ -157,7 +144,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
* Compute the delta between the address I am compiled to run at
* and the address I am actually running at.
*/
- load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
+ load_delta = physaddr - (__va_symbol(_text) - __START_KERNEL_map);

/* Is the address not 2M aligned? */
if (load_delta & ~PMD_MASK)
@@ -168,26 +155,24 @@ unsigned long __head __startup_64(unsigned long physaddr,

/* Fixup the physical addresses in the page table */

- pgd = fixup_pointer(early_top_pgt, physaddr);
+ pgd = (pgdval_t *)early_top_pgt;
p = pgd + pgd_index(__START_KERNEL_map);
if (la57)
*p = (unsigned long)level4_kernel_pgt;
else
*p = (unsigned long)level3_kernel_pgt;
- *p += _PAGE_TABLE_NOENC - __START_KERNEL_map + load_delta;
+ *p += _PAGE_TABLE_NOENC + sme_get_me_mask();

if (la57) {
- p4d = fixup_pointer(level4_kernel_pgt, physaddr);
+ p4d = (p4dval_t *)level4_kernel_pgt;
p4d[511] += load_delta;
}

- pud = fixup_pointer(level3_kernel_pgt, physaddr);
- pud[510] += load_delta;
- pud[511] += load_delta;
+ level3_kernel_pgt[510].pud += load_delta;
+ level3_kernel_pgt[511].pud += load_delta;

- pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
- pmd[i] += load_delta;
+ level2_fixmap_pgt[i].pmd += load_delta;

/*
* Set up the identity mapping for the switchover. These
@@ -196,15 +181,13 @@ unsigned long __head __startup_64(unsigned long physaddr,
* it avoids problems around wraparound.
*/

- next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
- pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
- pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
+ pud = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+ pmd = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];

pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();

if (la57) {
- p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
- physaddr);
+ p4d = (p4dval_t *)early_dynamic_pgts[next_early_pgt++];

i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
@@ -225,8 +208,7 @@ unsigned long __head __startup_64(unsigned long physaddr,

pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
/* Filter out unsupported __PAGE_KERNEL_* bits: */
- mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
- pmd_entry &= *mask_ptr;
+ pmd_entry &= __supported_pte_mask;
pmd_entry += sme_get_me_mask();
pmd_entry += physaddr;

@@ -252,14 +234,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
* error, causing the BIOS to halt the system.
*/

- pmd = fixup_pointer(level2_kernel_pgt, physaddr);
+ pmd = (pmdval_t *)level2_kernel_pgt;

/* invalidate pages before the kernel image */
- for (i = 0; i < pmd_index((unsigned long)_text); i++)
+ for (i = 0; i < pmd_index(__va_symbol(_text)); i++)
pmd[i] &= ~_PAGE_PRESENT;

/* fixup pages that are part of the kernel image */
- for (; i <= pmd_index((unsigned long)_end); i++)
+ for (; i <= pmd_index(__va_symbol(_end)); i++)
if (pmd[i] & _PAGE_PRESENT)
pmd[i] += load_delta;

@@ -271,7 +253,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
* Fixup phys_base - remove the memory encryption mask to obtain
* the true physical address.
*/
- *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
+ phys_base += load_delta - sme_get_me_mask();

return sme_postprocess_startup(bp, pmd);
}
@@ -553,22 +535,16 @@ static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
}

/* This runs while still in the direct mapping */
-static void __head startup_64_load_idt(unsigned long physbase)
+static void __head startup_64_load_idt(void)
{
- struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
- gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
-
-
- if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
- void *handler;
+ gate_desc *idt = bringup_idt_table;

+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
/* VMM Communication Exception */
- handler = fixup_pointer(vc_no_ghcb, physbase);
- set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
- }
+ set_bringup_idt_handler(idt, X86_TRAP_VC, vc_no_ghcb);

- desc->address = (unsigned long)idt;
- native_load_idt(desc);
+ bringup_idt_descr.address = (unsigned long)idt;
+ native_load_idt(&bringup_idt_descr);
}

/* This is used when running on kernel addresses */
@@ -587,10 +563,10 @@ void early_setup_idt(void)
/*
* Setup boot CPU state needed before kernel switches to virtual addresses.
*/
-void __head startup_64_setup_env(unsigned long physbase)
+void __head startup_64_setup_env(void)
{
/* Load GDT */
- startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
+ startup_gdt_descr.address = (unsigned long)startup_gdt;
native_load_gdt(&startup_gdt_descr);

/* New GDT is live - reload data segment registers */
@@ -598,5 +574,5 @@ void __head startup_64_setup_env(unsigned long physbase)
"movl %%eax, %%ss\n"
"movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");

- startup_64_load_idt(physbase);
+ startup_64_load_idt();
}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 399241dcdbb5..b8704ac1a4da 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -67,8 +67,6 @@ SYM_CODE_START_NOALIGN(startup_64)
/* Set up the stack for verify_cpu() */
leaq (__end_init_task - PTREGS_SIZE)(%rip), %rsp

- leaq _text(%rip), %rdi
-
/* Setup GSBASE to allow stack canary access for C code */
movl $MSR_GS_BASE, %ecx
leaq INIT_PER_CPU_VAR(fixed_percpu_data)(%rip), %rdx
@@ -107,8 +105,7 @@ SYM_CODE_START_NOALIGN(startup_64)
* is active) to be added to the initial pgdir entry that will be
* programmed into CR3.
*/
- leaq _text(%rip), %rdi
- movq %r15, %rsi
+ movq %r15, %rdi
call __startup_64

/* Form the CR3 value being sure to include the CR3 modifier */
--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:35:29

by Ard Biesheuvel

Subject: [PATCH v2 07/17] x86/head64: Simplify GDT/IDT initialization code

From: Ard Biesheuvel <[email protected]>

There used to be two separate code paths for programming the IDT early:
one that was called via the 1:1 mapping, and one via the kernel virtual
mapping, where the former used explicit pointer fixups to obtain 1:1
mapped addresses.

That distinction is now gone, so the GDT/IDT init code can be unified
and simplified accordingly.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/head64.c | 59 +++++++-------------
1 file changed, 19 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index a4a380494703..993d888a3172 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -59,21 +59,12 @@ EXPORT_SYMBOL(vmemmap_base);
/*
* GDT used on the boot CPU before switching to virtual addresses.
*/
-static struct desc_struct startup_gdt[GDT_ENTRIES] __initdata = {
+static struct desc_struct startup_gdt[GDT_ENTRIES] __initconst = {
[GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(DESC_CODE32, 0, 0xfffff),
[GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(DESC_CODE64, 0, 0xfffff),
[GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(DESC_DATA64, 0, 0xfffff),
};

-/*
- * Address needs to be set at runtime because it references the startup_gdt
- * while the kernel still uses a direct mapping.
- */
-static struct desc_ptr startup_gdt_descr __initdata = {
- .size = sizeof(startup_gdt)-1,
- .address = 0,
-};
-
#define __va_symbol(sym) ({ \
unsigned long __v; \
asm("movq $" __stringify(sym) ", %0":"=r"(__v)); \
@@ -363,7 +354,7 @@ void __init do_early_exception(struct pt_regs *regs, int trapnr)
early_fixup_exception(regs, trapnr);
}

-/* Don't add a printk in there. printk relies on the PDA which is not initialized
+/* Don't add a printk in there. printk relies on the PDA which is not initialized
yet. */
void __init clear_bss(void)
{
@@ -517,47 +508,32 @@ void __init __noreturn x86_64_start_reservations(char *real_mode_data)
*/
static gate_desc bringup_idt_table[NUM_EXCEPTION_VECTORS] __page_aligned_data;

-static struct desc_ptr bringup_idt_descr = {
- .size = (NUM_EXCEPTION_VECTORS * sizeof(gate_desc)) - 1,
- .address = 0, /* Set at runtime */
-};
-
-static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
-{
-#ifdef CONFIG_AMD_MEM_ENCRYPT
- struct idt_data data;
- gate_desc desc;
-
- init_idt_data(&data, n, handler);
- idt_init_desc(&desc, &data);
- native_write_idt_entry(idt, n, &desc);
-#endif
-}
-
-/* This runs while still in the direct mapping */
-static void __head startup_64_load_idt(void)
+static void early_load_idt(void (*handler)(void))
{
gate_desc *idt = bringup_idt_table;
+ struct desc_ptr bringup_idt_descr;
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+ struct idt_data data;
+ gate_desc desc;

- if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
/* VMM Communication Exception */
- set_bringup_idt_handler(idt, X86_TRAP_VC, vc_no_ghcb);
+ init_idt_data(&data, X86_TRAP_VC, handler);
+ idt_init_desc(&desc, &data);
+ native_write_idt_entry(idt, X86_TRAP_VC, &desc);
+ }

bringup_idt_descr.address = (unsigned long)idt;
+ bringup_idt_descr.size = sizeof(bringup_idt_table);
native_load_idt(&bringup_idt_descr);
}

-/* This is used when running on kernel addresses */
void early_setup_idt(void)
{
- /* VMM Communication Exception */
- if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
setup_ghcb();
- set_bringup_idt_handler(bringup_idt_table, X86_TRAP_VC, vc_boot_ghcb);
- }

- bringup_idt_descr.address = (unsigned long)bringup_idt_table;
- native_load_idt(&bringup_idt_descr);
+ early_load_idt(vc_boot_ghcb);
}

/*
@@ -565,8 +541,11 @@ void early_setup_idt(void)
*/
void __head startup_64_setup_env(void)
{
+ struct desc_ptr startup_gdt_descr;
+
/* Load GDT */
startup_gdt_descr.address = (unsigned long)startup_gdt;
+ startup_gdt_descr.size = sizeof(startup_gdt) - 1;
native_load_gdt(&startup_gdt_descr);

/* New GDT is live - reload data segment registers */
@@ -574,5 +553,5 @@ void __head startup_64_setup_env(void)
"movl %%eax, %%ss\n"
"movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");

- startup_64_load_idt();
+ early_load_idt(vc_no_ghcb);
}
--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:35:39

by Ard Biesheuvel

Subject: [PATCH v2 10/17] x86/head64: Move early startup code into __pitext

From: Ard Biesheuvel <[email protected]>

The boot CPU runs some early startup C code using a 1:1 mapping of
memory, which deviates from the normal kernel virtual mapping that the
linker uses when emitting statically initialized pointer variables.

This makes it necessary to strictly limit which C code will actually be
called from that early boot path. Implement this by moving the early
startup code into __pitext.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/head64.c | 9 ++++----
arch/x86/kernel/head_64.S | 24 ++++++++++++--------
2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 993d888a3172..079e1adc6121 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -70,7 +70,8 @@ static struct desc_struct startup_gdt[GDT_ENTRIES] __initconst = {
asm("movq $" __stringify(sym) ", %0":"=r"(__v)); \
__v; })

-static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
+static unsigned long __pitext sme_postprocess_startup(struct boot_params *bp,
+ pmdval_t *pmd)
{
unsigned long vaddr, vaddr_end;
int i;
@@ -113,7 +114,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
return sme_get_me_mask();
}

-unsigned long __head __startup_64(struct boot_params *bp)
+unsigned long __pitext __startup_64(struct boot_params *bp)
{
unsigned long physaddr = (unsigned long)_text;
unsigned long load_delta, *p;
@@ -508,7 +509,7 @@ void __init __noreturn x86_64_start_reservations(char *real_mode_data)
*/
static gate_desc bringup_idt_table[NUM_EXCEPTION_VECTORS] __page_aligned_data;

-static void early_load_idt(void (*handler)(void))
+static void __pitext early_load_idt(void (*handler)(void))
{
gate_desc *idt = bringup_idt_table;
struct desc_ptr bringup_idt_descr;
@@ -539,7 +540,7 @@ void early_setup_idt(void)
/*
* Setup boot CPU state needed before kernel switches to virtual addresses.
*/
-void __head startup_64_setup_env(void)
+void __pitext startup_64_setup_env(void)
{
struct desc_ptr startup_gdt_descr;

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index b8704ac1a4da..5defefcc7f50 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -42,6 +42,15 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map)
__HEAD
.code64
SYM_CODE_START_NOALIGN(startup_64)
+ UNWIND_HINT_END_OF_STACK
+ jmp primary_startup_64
+SYM_CODE_END(startup_64)
+
+ __PITEXT
+#include "verify_cpu.S"
+#include "sev_verify_cbit.S"
+
+SYM_CODE_START_LOCAL(primary_startup_64)
UNWIND_HINT_END_OF_STACK
/*
* At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
@@ -131,10 +140,12 @@ SYM_CODE_START_NOALIGN(startup_64)
movq %rax, %cr3

/* Branch to the common startup code at its kernel virtual address */
- movq $common_startup_64, %rax
ANNOTATE_RETPOLINE_SAFE
- jmp *%rax
-SYM_CODE_END(startup_64)
+ jmp *.Lcommon_startup_64(%rip)
+SYM_CODE_END(primary_startup_64)
+
+ __INITRODATA
+SYM_DATA_LOCAL(.Lcommon_startup_64, .quad common_startup_64)

.text
SYM_CODE_START(secondary_startup_64)
@@ -410,9 +421,6 @@ SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
int3
SYM_CODE_END(secondary_startup_64)

-#include "verify_cpu.S"
-#include "sev_verify_cbit.S"
-
#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_AMD_MEM_ENCRYPT)
/*
* Entry point for soft restart of a CPU. Invoked from xxx_play_dead() for
@@ -539,10 +547,8 @@ SYM_CODE_END(early_idt_handler_common)
* paravirtualized INTERRUPT_RETURN and pv-ops don't work that early.
*
* XXX it does, fix this.
- *
- * This handler will end up in the .init.text section and not be
- * available to boot secondary CPUs.
*/
+ __PITEXT
SYM_CODE_START_NOALIGN(vc_no_ghcb)
UNWIND_HINT_IRET_REGS offset=8
ENDBR
--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:35:58

by Ard Biesheuvel

Subject: [PATCH v2 09/17] x86: Move return_thunk to __pitext section

From: Ard Biesheuvel <[email protected]>

The x86 return thunk will function correctly even when it is called via
a different virtual mapping than the one it was linked at, so it can
safely be moved to .pi.text. This allows other code in that section to
call it.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/vmlinux.lds.S | 2 +-
arch/x86/lib/retpoline.S | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index a349dbfc6d5a..77262e804250 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -134,7 +134,7 @@ SECTIONS
SOFTIRQENTRY_TEXT
#ifdef CONFIG_RETPOLINE
*(.text..__x86.indirect_thunk)
- *(.text..__x86.return_thunk)
+ *(.pi.text..__x86.return_thunk)
#endif
STATIC_CALL_TEXT

diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 7b2589877d06..003b35445bbb 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -136,7 +136,7 @@ SYM_CODE_END(__x86_indirect_jump_thunk_array)
* relocations for same-section JMPs and that breaks the returns
* detection logic in apply_returns() and in objtool.
*/
- .section .text..__x86.return_thunk
+ .section .pi.text..__x86.return_thunk, "ax"

#ifdef CONFIG_CPU_SRSO

--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:36:15

by Ard Biesheuvel

Subject: [PATCH v2 13/17] x86/sev: Make all code reachable from 1:1 mapping __pitext

From: Ard Biesheuvel <[email protected]>

We cannot safely call arbitrary code while still executing from the 1:1
mapping at early boot. The SEV init code in particular does a fair
amount of work this early, and calls into ordinary APIs, which is not
safe.

So annotate all SEV code that is used this early as __pitext, and along
with it, some of the shared code that it relies on.
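
To illustrate the underlying hazard, a made-up example (not code from
the series):

  /*
   * The initializer below is resolved at link time to the kernel
   * virtual address of default_name. If this helper is reached via the
   * 1:1 mapping before the kernel virtual mapping exists, any
   * dereference of the returned pointer faults; this is why everything
   * reachable that early must be confined to __pitext and must avoid
   * constructs like this one.
   */
  static const char default_name[] = "sev";
  static const char *name = default_name;

  const char *some_helper(void)
  {
          return name;
  }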

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/sev.c | 3 ++
arch/x86/include/asm/mem_encrypt.h | 8 +--
arch/x86/include/asm/pgtable.h | 6 +--
arch/x86/include/asm/sev.h | 6 +--
arch/x86/kernel/sev-shared.c | 26 +++++-----
arch/x86/kernel/sev.c | 14 +++---
arch/x86/lib/cmdline.c | 6 +--
arch/x86/lib/memcpy_64.S | 3 +-
arch/x86/lib/memset_64.S | 3 +-
arch/x86/mm/mem_encrypt_boot.S | 3 +-
arch/x86/mm/mem_encrypt_identity.c | 52 ++++++++++++++------
arch/x86/mm/pti.c | 2 +-
12 files changed, 81 insertions(+), 51 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 454acd7a2daf..22b9de2724f7 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -116,6 +116,9 @@ static bool fault_in_kernel_space(unsigned long address)
#undef __init
#define __init

+#undef __pitext
+#define __pitext
+
#define __BOOT_COMPRESSED

/* Basic instruction decoding support needed */
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 359ada486fa9..48469e22a75e 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -46,8 +46,8 @@ void __init sme_unmap_bootdata(char *real_mode_data);

void __init sme_early_init(void);

-void __init sme_encrypt_kernel(struct boot_params *bp);
-void __init sme_enable(struct boot_params *bp);
+void sme_encrypt_kernel(struct boot_params *bp);
+void sme_enable(struct boot_params *bp);

int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
@@ -75,8 +75,8 @@ static inline void __init sme_unmap_bootdata(char *real_mode_data) { }

static inline void __init sme_early_init(void) { }

-static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
-static inline void __init sme_enable(struct boot_params *bp) { }
+static inline void sme_encrypt_kernel(struct boot_params *bp) { }
+static inline void sme_enable(struct boot_params *bp) { }

static inline void sev_es_init_vc_handling(void) { }

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9d077bca6a10..8f45255a8e32 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1412,7 +1412,7 @@ extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
* Returns true for parts of the PGD that map userspace and
* false for the parts that map the kernel.
*/
-static inline bool pgdp_maps_userspace(void *__ptr)
+static __always_inline bool pgdp_maps_userspace(void *__ptr)
{
unsigned long ptr = (unsigned long)__ptr;

@@ -1435,7 +1435,7 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
* This generates better code than the inline assembly in
* __set_bit().
*/
-static inline void *ptr_set_bit(void *ptr, int bit)
+static __always_inline void *ptr_set_bit(void *ptr, int bit)
{
unsigned long __ptr = (unsigned long)ptr;

@@ -1450,7 +1450,7 @@ static inline void *ptr_clear_bit(void *ptr, int bit)
return (void *)__ptr;
}

-static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
+static __always_inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
{
return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
}
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 5b4a1ce3d368..e3b55bd15ce1 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -201,14 +201,14 @@ struct snp_guest_request_ioctl;
void setup_ghcb(void);
void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
unsigned long npages);
-void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
- unsigned long npages);
+void early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+ unsigned long npages);
void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
void snp_set_memory_shared(unsigned long vaddr, unsigned long npages);
void snp_set_memory_private(unsigned long vaddr, unsigned long npages);
void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
-void __init __noreturn snp_abort(void);
+void __noreturn snp_abort(void);
int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio);
void snp_accept_memory(phys_addr_t start, phys_addr_t end);
u64 snp_get_unsupported_features(u64 status);
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 1d24ec679915..b432cac19d13 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -89,7 +89,8 @@ static bool __init sev_es_check_cpu_features(void)
return true;
}

-static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
+static void __always_inline __noreturn sev_es_terminate(unsigned int set,
+ unsigned int reason)
{
u64 val = GHCB_MSR_TERM_REQ;

@@ -222,10 +223,9 @@ static enum es_result verify_exception_info(struct ghcb *ghcb, struct es_em_ctxt
return ES_VMM_ERROR;
}

-static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
- struct es_em_ctxt *ctxt,
- u64 exit_code, u64 exit_info_1,
- u64 exit_info_2)
+static enum es_result __pitext
+sev_es_ghcb_hv_call(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
+ u64 exit_code, u64 exit_info_1, u64 exit_info_2)
{
/* Fill in protocol and format specifiers */
ghcb->protocol_version = ghcb_version;
@@ -241,7 +241,7 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
return verify_exception_info(ghcb, ctxt);
}

-static int __sev_cpuid_hv(u32 fn, int reg_idx, u32 *reg)
+static int __pitext __sev_cpuid_hv(u32 fn, int reg_idx, u32 *reg)
{
u64 val;

@@ -256,7 +256,7 @@ static int __sev_cpuid_hv(u32 fn, int reg_idx, u32 *reg)
return 0;
}

-static int __sev_cpuid_hv_msr(struct cpuid_leaf *leaf)
+static int __pitext __sev_cpuid_hv_msr(struct cpuid_leaf *leaf)
{
int ret;

@@ -391,7 +391,7 @@ static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
return xsave_size;
}

-static bool
+static bool __pitext
snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
{
const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
@@ -427,7 +427,8 @@ snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
return false;
}

-static void snp_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
+static void __pitext snp_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
+ struct cpuid_leaf *leaf)
{
if (sev_cpuid_hv(ghcb, ctxt, leaf))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
@@ -528,7 +529,8 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
* Returns -EOPNOTSUPP if feature not enabled. Any other non-zero return value
* should be treated as fatal by caller.
*/
-static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
+static int __pitext snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
+ struct cpuid_leaf *leaf)
{
const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();

@@ -570,7 +572,7 @@ static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_le
* page yet, so it only supports the MSR based communication with the
* hypervisor and only the CPUID exit-code.
*/
-void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
+void __pitext do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
{
unsigned int subfn = lower_bits(regs->cx, 32);
unsigned int fn = lower_bits(regs->ax, 32);
@@ -1043,7 +1045,7 @@ static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
* mapping needs to be updated in sync with all the changes to virtual memory
* layout and related mapping facilities throughout the boot process.
*/
-static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
+static void __pitext setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
{
const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table;
int i;
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index c67285824e82..e5793505b307 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -682,8 +682,8 @@ static u64 __init get_jump_table_addr(void)
return ret;
}

-static void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
- unsigned long npages, enum psc_op op)
+static void __pitext early_set_pages_state(unsigned long vaddr, unsigned long paddr,
+ unsigned long npages, enum psc_op op)
{
unsigned long paddr_end;
u64 val;
@@ -758,8 +758,8 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE);
}

-void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
- unsigned long npages)
+void __pitext early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+ unsigned long npages)
{
/*
* This can be invoked in early boot while running identity mapped, so
@@ -2059,7 +2059,7 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
*
* Scan for the blob in that order.
*/
-static __init struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
+static __pitext struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
{
struct cc_blob_sev_info *cc_info;

@@ -2085,7 +2085,7 @@ static __init struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
return cc_info;
}

-bool __init snp_init(struct boot_params *bp)
+bool __pitext snp_init(struct boot_params *bp)
{
struct cc_blob_sev_info *cc_info;

@@ -2107,7 +2107,7 @@ bool __init snp_init(struct boot_params *bp)
return true;
}

-void __init __noreturn snp_abort(void)
+void __pitext __noreturn snp_abort(void)
{
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
}
diff --git a/arch/x86/lib/cmdline.c b/arch/x86/lib/cmdline.c
index 80570eb3c89b..9f040b2882ae 100644
--- a/arch/x86/lib/cmdline.c
+++ b/arch/x86/lib/cmdline.c
@@ -119,7 +119,7 @@ __cmdline_find_option_bool(const char *cmdline, int max_cmdline_size,
* Returns the length of the argument (regardless of if it was
* truncated to fit in the buffer), or -1 on not found.
*/
-static int
+static int __pitext
__cmdline_find_option(const char *cmdline, int max_cmdline_size,
const char *option, char *buffer, int bufsize)
{
@@ -203,12 +203,12 @@ __cmdline_find_option(const char *cmdline, int max_cmdline_size,
return len;
}

-int cmdline_find_option_bool(const char *cmdline, const char *option)
+int __pitext cmdline_find_option_bool(const char *cmdline, const char *option)
{
return __cmdline_find_option_bool(cmdline, COMMAND_LINE_SIZE, option);
}

-int cmdline_find_option(const char *cmdline, const char *option, char *buffer,
+int __pitext cmdline_find_option(const char *cmdline, const char *option, char *buffer,
int bufsize)
{
return __cmdline_find_option(cmdline, COMMAND_LINE_SIZE, option,
diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 0ae2e1712e2e..48b0908d2c3e 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -4,11 +4,12 @@
#include <linux/export.h>
#include <linux/linkage.h>
#include <linux/cfi_types.h>
+#include <linux/init.h>
#include <asm/errno.h>
#include <asm/cpufeatures.h>
#include <asm/alternative.h>

-.section .noinstr.text, "ax"
+ __PITEXT

/*
* memcpy - Copy a memory block.
diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S
index 0199d56cb479..455424dcadc0 100644
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -2,11 +2,12 @@
/* Copyright 2002 Andi Kleen, SuSE Labs */

#include <linux/export.h>
+#include <linux/init.h>
#include <linux/linkage.h>
#include <asm/cpufeatures.h>
#include <asm/alternative.h>

-.section .noinstr.text, "ax"
+ __PITEXT

/*
* ISO C memset - set a memory block to a byte value. This function uses fast
diff --git a/arch/x86/mm/mem_encrypt_boot.S b/arch/x86/mm/mem_encrypt_boot.S
index e25288ee33c2..f951f4f86e5c 100644
--- a/arch/x86/mm/mem_encrypt_boot.S
+++ b/arch/x86/mm/mem_encrypt_boot.S
@@ -7,6 +7,7 @@
* Author: Tom Lendacky <[email protected]>
*/

+#include <linux/init.h>
#include <linux/linkage.h>
#include <linux/pgtable.h>
#include <asm/page.h>
@@ -14,7 +15,7 @@
#include <asm/msr-index.h>
#include <asm/nospec-branch.h>

- .text
+ __PITEXT
.code64
SYM_FUNC_START(sme_encrypt_execute)

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 67d4530548ce..20b23da4a26d 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -90,7 +90,7 @@ static char sme_cmdline_arg[] __initdata = "mem_encrypt";
static char sme_cmdline_on[] __initdata = "on";
static char sme_cmdline_off[] __initdata = "off";

-static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd)
+static void __pitext sme_clear_pgd(struct sme_populate_pgd_data *ppd)
{
unsigned long pgd_start, pgd_end, pgd_size;
pgd_t *pgd_p;
@@ -105,7 +105,7 @@ static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd)
memset(pgd_p, 0, pgd_size);
}

-static pud_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
+static pud_t __pitext *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
{
pgd_t *pgd;
p4d_t *p4d;
@@ -142,7 +142,7 @@ static pud_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
return pud;
}

-static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
+static void __pitext sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
{
pud_t *pud;
pmd_t *pmd;
@@ -158,7 +158,7 @@ static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
set_pmd(pmd, __pmd(ppd->paddr | ppd->pmd_flags));
}

-static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
+static void __pitext sme_populate_pgd(struct sme_populate_pgd_data *ppd)
{
pud_t *pud;
pmd_t *pmd;
@@ -184,7 +184,7 @@ static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
}

-static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
+static void __pitext __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
{
while (ppd->vaddr < ppd->vaddr_end) {
sme_populate_pgd_large(ppd);
@@ -194,7 +194,7 @@ static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
}
}

-static void __init __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
+static void __pitext __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
{
while (ppd->vaddr < ppd->vaddr_end) {
sme_populate_pgd(ppd);
@@ -204,7 +204,7 @@ static void __init __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
}
}

-static void __init __sme_map_range(struct sme_populate_pgd_data *ppd,
+static void __pitext __sme_map_range(struct sme_populate_pgd_data *ppd,
pmdval_t pmd_flags, pteval_t pte_flags)
{
unsigned long vaddr_end;
@@ -228,22 +228,22 @@ static void __init __sme_map_range(struct sme_populate_pgd_data *ppd,
__sme_map_range_pte(ppd);
}

-static void __init sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
+static void __pitext sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
{
__sme_map_range(ppd, PMD_FLAGS_ENC, PTE_FLAGS_ENC);
}

-static void __init sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
+static void __pitext sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
{
__sme_map_range(ppd, PMD_FLAGS_DEC, PTE_FLAGS_DEC);
}

-static void __init sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
+static void __pitext sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
{
__sme_map_range(ppd, PMD_FLAGS_DEC_WP, PTE_FLAGS_DEC_WP);
}

-static unsigned long __init sme_pgtable_calc(unsigned long len)
+static unsigned long __pitext sme_pgtable_calc(unsigned long len)
{
unsigned long entries = 0, tables = 0;

@@ -280,7 +280,7 @@ static unsigned long __init sme_pgtable_calc(unsigned long len)
return entries + tables;
}

-void __init sme_encrypt_kernel(struct boot_params *bp)
+void __pitext sme_encrypt_kernel(struct boot_params *bp)
{
unsigned long workarea_start, workarea_end, workarea_len;
unsigned long execute_start, execute_end, execute_len;
@@ -493,7 +493,29 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
native_write_cr3(__native_read_cr3());
}

-void __init sme_enable(struct boot_params *bp)
+/**
+ * strncmp - Compare two length-limited strings
+ * @cs: One string
+ * @ct: Another string
+ * @count: The maximum number of bytes to compare
+ */
+static int __pitext __strncmp(const char *cs, const char *ct, size_t count)
+{
+ unsigned char c1, c2;
+
+ while (count) {
+ c1 = *cs++;
+ c2 = *ct++;
+ if (c1 != c2)
+ return c1 < c2 ? -1 : 1;
+ if (!c1)
+ break;
+ count--;
+ }
+ return 0;
+}
+
+void __pitext sme_enable(struct boot_params *bp)
{
const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
unsigned int eax, ebx, ecx, edx;
@@ -594,9 +616,9 @@ void __init sme_enable(struct boot_params *bp)
if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0)
return;

- if (!strncmp(buffer, cmdline_on, sizeof(buffer)))
+ if (!__strncmp(buffer, cmdline_on, sizeof(buffer)))
sme_me_mask = me_mask;
- else if (!strncmp(buffer, cmdline_off, sizeof(buffer)))
+ else if (!__strncmp(buffer, cmdline_off, sizeof(buffer)))
sme_me_mask = 0;
else
sme_me_mask = active_by_default ? me_mask : 0;
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 669ba1c345b3..8fd1b84ab40c 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -121,7 +121,7 @@ static int __init pti_parse_cmdline_nopti(char *arg)
}
early_param("nopti", pti_parse_cmdline_nopti);

-pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
+pgd_t __pitext __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
{
/*
* Changes to the high (kernel) portion of the kernelmode page
--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:36:22

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 11/17] modpost: Warn about calls from __pitext into other text sections

From: Ard Biesheuvel <[email protected]>

Ensure that code that is marked as safe to execute from a 1:1 mapping
does not call into other code which might lack that property.
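
As an illustration, the new check flags patterns like the one below
(a minimal sketch; the function names are made up):

    static noinline void helper(void)           /* lands in .text */
    {
    }

    static void __pitext early_fn(void)         /* lands in .pi.text */
    {
            helper();       /* modpost now warns about this call */
    }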

Signed-off-by: Ard Biesheuvel <[email protected]>
---
scripts/mod/modpost.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 962d00df47ab..33b56d6b4e7b 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -825,6 +825,7 @@ enum mismatch {
ANY_INIT_TO_ANY_EXIT,
ANY_EXIT_TO_ANY_INIT,
EXTABLE_TO_NON_TEXT,
+ PI_TEXT_TO_NON_PI_TEXT,
};

/**
@@ -887,6 +888,11 @@ static const struct sectioncheck sectioncheck[] = {
.bad_tosec = { ".altinstr_replacement", NULL },
.good_tosec = {ALL_TEXT_SECTIONS , NULL},
.mismatch = EXTABLE_TO_NON_TEXT,
+},
+{
+ .fromsec = { ALL_PI_TEXT_SECTIONS, NULL },
+ .bad_tosec = { ALL_NON_PI_TEXT_SECTIONS, NULL },
+ .mismatch = PI_TEXT_TO_NON_PI_TEXT,
}
};

--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:36:27

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 15/17] x86/sev: Use PIC codegen for early SEV startup code

From: Ard Biesheuvel <[email protected]>

Use PIC codegen for the compilation units containing code that may be
called very early during boot, at which point the CPU still runs
from the 1:1 mapping of memory. This is necessary to prevent the
compiler from emitting absolute symbol references to addresses that are
not mapped yet.
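
For example, a plain reference to a statically allocated object only
differs in codegen (a sketch; the names are made up and the exact
instructions depend on the compiler and configuration):

    static struct snp_thing table[4];

    static struct snp_thing *get_entry(int i)
    {
            /*
             * Default kernel code model: the compiler may emit an
             * absolute reference such as
             *         movq    $table, %rax
             * i.e. a kernel virtual address that is not mapped yet
             * while running from the 1:1 mapping.
             *
             * With -fPIC: a RIP-relative reference such as
             *         leaq    table(%rip), %rax
             * which works wherever the code happens to execute.
             */
            return &table[i];
    }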

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/vmlinux.lds.S | 1 +
arch/x86/lib/Makefile | 2 +-
arch/x86/mm/Makefile | 3 ++-
4 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 65194ca79b5c..65677b25d803 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -24,6 +24,7 @@ endif
# head64.c contains C code that may execute from a different virtual address
# than it was linked at, so we always build it using PIE codegen
CFLAGS_head64.o += $(PIE_CFLAGS)
+CFLAGS_sev.o += $(PIE_CFLAGS)

KASAN_SANITIZE_head$(BITS).o := n
KASAN_SANITIZE_dumpstack.o := n
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 77262e804250..bbdccb6362a9 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -182,6 +182,7 @@ SECTIONS

DATA_DATA
CONSTRUCTORS
+ *(.data.rel .data.rel.*)

/* rarely changed data like cpu maps */
READ_MOSTLY_DATA(INTERNODE_CACHE_BYTES)
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index ea3a28e7b613..87c79bb8d386 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -24,7 +24,7 @@ ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_cmdline.o = -pg
endif

-CFLAGS_cmdline.o := -fno-stack-protector -fno-jump-tables
+CFLAGS_cmdline.o := $(PIE_CFLAGS)
endif

inat_tables_script = $(srctree)/arch/x86/tools/gen-insn-attr-x86.awk
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index c80febc44cd2..b412009ae588 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -31,7 +31,8 @@ obj-y += pat/

# Make sure __phys_addr has no stackprotector
CFLAGS_physaddr.o := -fno-stack-protector
-CFLAGS_mem_encrypt_identity.o := -fno-stack-protector
+CFLAGS_mem_encrypt_identity.o := $(PIE_CFLAGS)
+CFLAGS_pti.o := $(PIE_CFLAGS)

CFLAGS_fault.o := -I $(srctree)/$(src)/../include/asm/trace

--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:37:07

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 16/17] x86/sev: Drop inline asm LEA instructions for RIP-relative references

From: Ard Biesheuvel <[email protected]>

The SEV code that may run early is now built with -fPIC, so there is
no longer any need for explicit RIP-relative references in inline asm,
given that this is what the compiler will emit anyway.
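
In other words, the hand-rolled idiom

    asm ("lea sme_workarea(%%rip), %0"
         : "=r" (workarea_start)
         : "p" (sme_workarea));

reduces to a plain C reference, which -fPIC already turns into a
RIP-relative access:

    workarea_start = (unsigned long)sme_workarea;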

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/mm/mem_encrypt_identity.c | 37 +++-----------------
1 file changed, 5 insertions(+), 32 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 20b23da4a26d..2d857e3a560a 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -86,10 +86,6 @@ struct sme_populate_pgd_data {
*/
static char sme_workarea[2 * PMD_SIZE] __section(".init.scratch");

-static char sme_cmdline_arg[] __initdata = "mem_encrypt";
-static char sme_cmdline_on[] __initdata = "on";
-static char sme_cmdline_off[] __initdata = "off";
-
static void __pitext sme_clear_pgd(struct sme_populate_pgd_data *ppd)
{
unsigned long pgd_start, pgd_end, pgd_size;
@@ -333,14 +329,6 @@ void __pitext sme_encrypt_kernel(struct boot_params *bp)
}
#endif

- /*
- * We're running identity mapped, so we must obtain the address to the
- * SME encryption workarea using rip-relative addressing.
- */
- asm ("lea sme_workarea(%%rip), %0"
- : "=r" (workarea_start)
- : "p" (sme_workarea));
-
/*
* Calculate required number of workarea bytes needed:
* executable encryption area size:
@@ -350,7 +338,7 @@ void __pitext sme_encrypt_kernel(struct boot_params *bp)
* pagetable structures for the encryption of the kernel
* pagetable structures for workarea (in case not currently mapped)
*/
- execute_start = workarea_start;
+ execute_start = workarea_start = (unsigned long)sme_workarea;
execute_end = execute_start + (PAGE_SIZE * 2) + PMD_SIZE;
execute_len = execute_end - execute_start;

@@ -517,9 +505,9 @@ static int __pitext __strncmp(const char *cs, const char *ct, size_t count)

void __pitext sme_enable(struct boot_params *bp)
{
- const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
unsigned int eax, ebx, ecx, edx;
unsigned long feature_mask;
+ const char *cmdline_ptr;
bool active_by_default;
unsigned long me_mask;
char buffer[16];
@@ -590,21 +578,6 @@ void __pitext sme_enable(struct boot_params *bp)
goto out;
}

- /*
- * Fixups have not been applied to phys_base yet and we're running
- * identity mapped, so we must obtain the address to the SME command
- * line argument data using rip-relative addressing.
- */
- asm ("lea sme_cmdline_arg(%%rip), %0"
- : "=r" (cmdline_arg)
- : "p" (sme_cmdline_arg));
- asm ("lea sme_cmdline_on(%%rip), %0"
- : "=r" (cmdline_on)
- : "p" (sme_cmdline_on));
- asm ("lea sme_cmdline_off(%%rip), %0"
- : "=r" (cmdline_off)
- : "p" (sme_cmdline_off));
-
if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
active_by_default = true;
else
@@ -613,12 +586,12 @@ void __pitext sme_enable(struct boot_params *bp)
cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
((u64)bp->ext_cmd_line_ptr << 32));

- if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0)
+ if (cmdline_find_option(cmdline_ptr, "mem_encrypt", buffer, sizeof(buffer)) < 0)
return;

- if (!__strncmp(buffer, cmdline_on, sizeof(buffer)))
+ if (!__strncmp(buffer, "on", sizeof(buffer)))
sme_me_mask = me_mask;
- else if (!__strncmp(buffer, cmdline_off, sizeof(buffer)))
+ else if (!__strncmp(buffer, "off", sizeof(buffer)))
sme_me_mask = 0;
else
sme_me_mask = active_by_default ? me_mask : 0;
--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:37:12

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 17/17] x86/startup_64: Don't bother setting up GS before the kernel is mapped

From: Ard Biesheuvel <[email protected]>

The code that executes from the early 1:1 mapping of the kernel should
set up the kernel page tables and nothing else. C code that is linked
into this code path is severely restricted in what it can do, and is
therefore required to remain uninstrumented. It is also built with -fPIC
and without stack protector support.

This makes it unnecessary to enable per-CPU variable access this early,
and for the boot CPU, the initialization that occurs in the common CPU
startup path is sufficient.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/head_64.S | 7 -------
1 file changed, 7 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 5defefcc7f50..2cce53b2cd70 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -76,13 +76,6 @@ SYM_CODE_START_LOCAL(primary_startup_64)
/* Set up the stack for verify_cpu() */
leaq (__end_init_task - PTREGS_SIZE)(%rip), %rsp

- /* Setup GSBASE to allow stack canary access for C code */
- movl $MSR_GS_BASE, %ecx
- leaq INIT_PER_CPU_VAR(fixed_percpu_data)(%rip), %rdx
- movl %edx, %eax
- shrq $32, %rdx
- wrmsr
-
call startup_64_setup_env

/* Now switch to __KERNEL_CS so IRET works reliably */
--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:57:34

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 12/17] x86/coco: Make cc_set_mask() static inline

From: Ard Biesheuvel <[email protected]>

Setting the cc_mask global variable may be done early during boot while
running from a 1:1 translation. This code is built with -fPIC in order
to support this.

Make cc_set_mask() static inline so it can execute safely in this
context as well.
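
With the helper inlined into a caller that is itself built with -fPIC,
the assignment compiles to a RIP-relative store, roughly (illustrative
caller and codegen):

    cc_set_mask(mask);      /* becomes: movq %reg, cc_mask(%rip) */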

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/coco/core.c | 7 +------
arch/x86/include/asm/coco.h | 8 +++++++-
2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index eeec9986570e..d07be9d05cd0 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -14,7 +14,7 @@
#include <asm/processor.h>

enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
-static u64 cc_mask __ro_after_init;
+u64 cc_mask __ro_after_init;

static bool noinstr intel_cc_platform_has(enum cc_attr attr)
{
@@ -148,8 +148,3 @@ u64 cc_mkdec(u64 val)
}
}
EXPORT_SYMBOL_GPL(cc_mkdec);
-
-__init void cc_set_mask(u64 mask)
-{
- cc_mask = mask;
-}
diff --git a/arch/x86/include/asm/coco.h b/arch/x86/include/asm/coco.h
index 6ae2d16a7613..ecc29d6136ad 100644
--- a/arch/x86/include/asm/coco.h
+++ b/arch/x86/include/asm/coco.h
@@ -13,7 +13,13 @@ enum cc_vendor {
extern enum cc_vendor cc_vendor;

#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
-void cc_set_mask(u64 mask);
+static inline void cc_set_mask(u64 mask)
+{
+ extern u64 cc_mask;
+
+ cc_mask = mask;
+}
+
u64 cc_mkenc(u64 val);
u64 cc_mkdec(u64 val);
#else
--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 11:57:58

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 14/17] x86/sev: Avoid WARN() in early code

From: Ard Biesheuvel <[email protected]>

Drop uses of WARN() from code that is reachable from the early primary
boot path, which executes via the initial 1:1 mapping before the kernel
page tables are populated. Calling WARN() there is unsafe and mostly
pointless, given that printk() does not actually work yet at that point.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/sev.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index e5793505b307..8eb6454eadd6 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -698,7 +698,7 @@ static void __pitext early_set_pages_state(unsigned long vaddr, unsigned long pa
if (op == SNP_PAGE_STATE_SHARED) {
/* Page validation must be rescinded before changing to shared */
ret = pvalidate(vaddr, RMP_PG_SIZE_4K, false);
- if (WARN(ret, "Failed to validate address 0x%lx ret %d", paddr, ret))
+ if (ret)
goto e_term;
}

@@ -711,21 +711,16 @@ static void __pitext early_set_pages_state(unsigned long vaddr, unsigned long pa

val = sev_es_rd_ghcb_msr();

- if (WARN(GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP,
- "Wrong PSC response code: 0x%x\n",
- (unsigned int)GHCB_RESP_CODE(val)))
+ if (GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP)
goto e_term;

- if (WARN(GHCB_MSR_PSC_RESP_VAL(val),
- "Failed to change page state to '%s' paddr 0x%lx error 0x%llx\n",
- op == SNP_PAGE_STATE_PRIVATE ? "private" : "shared",
- paddr, GHCB_MSR_PSC_RESP_VAL(val)))
+ if (GHCB_MSR_PSC_RESP_VAL(val))
goto e_term;

if (op == SNP_PAGE_STATE_PRIVATE) {
/* Page validation must be performed after changing to private */
ret = pvalidate(vaddr, RMP_PG_SIZE_4K, true);
- if (WARN(ret, "Failed to validate address 0x%lx ret %d", paddr, ret))
+ if (ret)
goto e_term;
}

--
2.43.0.429.g432eaa2c6b-goog


2024-01-25 21:07:14

by Kevin Loughlin

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] x86/sev: Drop inline asm LEA instructions for RIP-relative references

On Thu, Jan 25, 2024 at 3:33 AM Ard Biesheuvel <[email protected]> wrote:
>
> The SEV code that may run early is now built with -fPIC and so there is
> no longer a need for explicit RIP-relative references in inline asm,
> given that is what the compiler will emit as well.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/mm/mem_encrypt_identity.c | 37 +++-----------------
> 1 file changed, 5 insertions(+), 32 deletions(-)

snp_cpuid_get_table() in arch/x86/kernel/sev-shared.c (a helper
function to provide the same inline assembly pattern for RIP-relative
references) would also no longer be needed, as all calls to it would
now be made in position-independent code. We can therefore eliminate
the function as part of this commit.
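
(For reference, that helper is just another instance of the same idiom,
roughly -- sketched here rather than copied verbatim:

    static const struct snp_cpuid_table *snp_cpuid_get_table(void)
    {
            void *ptr;

            asm ("lea cpuid_table_copy(%%rip), %0"
                 : "=r" (ptr)
                 : "p" (&cpuid_table_copy));

            return ptr;
    }

so with -fPIC its callers can simply take the address of the table
directly.)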

2024-01-25 22:23:53

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH v2 00/17] x86: Confine early 1:1 mapped startup code

Hi Ard,

On Thu, Jan 25, 2024 at 12:28:19PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <[email protected]>
>
> This is a follow-up to my RFC [0] that proposed to build the entire core
> kernel with -fPIC, to reduce the likelihood that code that runs
> extremely early from the 1:1 mapping of memory will misbehave.
>
> This is needed to address reports that SEV boot on Clang built kernels
> is broken, due to the fact that this early code attempts to access
> virtual kernel address that are not mapped yet. Kevin has suggested some
> workarounds to this [1] but this is really something that requires a
> more rigorous approach, rather than addressing a couple of symptoms of
> the underlying defect.
>
> As it turns out, the use of fPIE for the entire kernel is neither
> necessary nor sufficient, and has its own set of problems, including the
> fact that the PIE small C code model uses FS rather than GS for the
> per-CPU register, and only recent GCC and Clang versions permit this to
> be overridden on the command line.
>
> But the real problem is that even position independent code is not
> guaranteed to execute correctly at any offset unless all statically
> initialized pointer variables use the same translation as the code.
>
> So instead, this v2 proposes another solution, taking the following
> approach:
> - clean up and refactor the startup code so that the primary startup
> code executes from the 1:1 mapping but nothing else;
> - define a new text section type .pi.text and enforce that it can only
> call into other .pi.text sections;
> - (tbd) require that objects containing .pi.text sections are built with
> -fPIC, and disallow any absolute references from such objects.
>
> The latter point is not implemented yet in this v2, but this could be
> done rather straight-forwardly. (The EFI stub already does something
> similar across all architectures)
>
> Patch #13 in particular gives an overview of all the code that gets
> pulled into the early 1:1 startup code path due to the fact that memory
> encryption needs to be configured before we can even map the kernel.
>
>
> [0] https://lkml.kernel.org/r/20240122090851.851120-7-ardb%2Bgit%40google.com
> [1] https://lore.kernel.org/all/[email protected]/T/#u

I tested both this series and the pending updates on
x86-pie-for-sev-v3 at commit 0574677aacf7 ("x86/efi: Remap kernel code
read-only before dropping NX attribute") with my LLVM build matrix, and I
noticed two problems.

The first issue is a series of errors when building with LTO, caused by
mismatched code model attributes between functions. Unhelpfully, the
error message does not actually say which functions are conflicting...

ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(numa.o at 1191378)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(buffer.o at 1211538)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(nfs4xdr.o at 1222698)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(namei.o at 1209498)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(vmalloc.o at 1207458)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(iommu.o at 1267098)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(ring_buffer.o at 1202478)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(sky2.o at 1299798)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(page_alloc.o at 1207578)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(percpu.o at 1206018)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(slub.o at 1207758)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(xhci.o at 1302858)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(blk-mq.o at 1233858)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(nfs4proc.o at 1222638)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(inode.o at 1218078)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(journal.o at 1219638)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(memory.o at 1206738)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(mballoc.o at 1218198)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(iov_iter.o at 1238058)'
ld.lld: error: Function Import: link error: linking module flags 'Code Model': IDs have conflicting values in 'vmlinux.a(head64.o at 1181178)' and 'vmlinux.a(filemap.o at 1204938)'

Turning off LTO for the translation units that use PIE_CFLAGS avoids
that; I am not sure whether that is a reasonable approach.

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 65677b25d803..e4fa57ae3d09 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -24,7 +24,9 @@ endif
# head64.c contains C code that may execute from a different virtual address
# than it was linked at, so we always build it using PIE codegen
CFLAGS_head64.o += $(PIE_CFLAGS)
+CFLAGS_REMOVE_head64.o += $(CC_FLAGS_LTO)
CFLAGS_sev.o += $(PIE_CFLAGS)
+CFLAGS_REMOVE_sev.o += $(CC_FLAGS_LTO)

KASAN_SANITIZE_head$(BITS).o := n
KASAN_SANITIZE_dumpstack.o := n
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 87c79bb8d386..6bb4bc271441 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -25,6 +25,7 @@ CFLAGS_REMOVE_cmdline.o = -pg
endif

CFLAGS_cmdline.o := $(PIE_CFLAGS)
+CFLAGS_REMOVE_cmdline.o := $(CC_FLAGS_LTO)
endif

inat_tables_script = $(srctree)/arch/x86/tools/gen-insn-attr-x86.awk
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index b412009ae588..bf8d9d4bc97f 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -31,8 +31,10 @@ obj-y += pat/

# Make sure __phys_addr has no stackprotector
CFLAGS_physaddr.o := -fno-stack-protector
-CFLAGS_mem_encrypt_identity.o := $(PIE_CFLAGS)
+CFLAGS_mem_encrypt_identity.o := $(PIE_CFLAGS)
+CFLAGS_REMOVE_mem_encrypt_identity.o := $(CC_FLAGS_LTO)
CFLAGS_pti.o := $(PIE_CFLAGS)
+CFLAGS_REMOVE_pti.o := $(CC_FLAGS_LTO)

CFLAGS_fault.o := -I $(srctree)/$(src)/../include/asm/trace


The second issue is a bunch of modpost warnings that I see with various
configurations, involving some UBSAN and tracing functions.

Clang allmodconfig:

WARNING: modpost: vmlinux: section mismatch in reference: __startup_64+0x353 (section: .pi.text) -> __phys_addr (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __startup_64+0x35e (section: .pi.text) -> __phys_addr (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __startup_64+0x3f8 (section: .pi.text) -> __ubsan_handle_out_of_bounds (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __startup_64+0x41f (section: .pi.text) -> __ubsan_handle_out_of_bounds (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __startup_64+0x44a (section: .pi.text) -> __ubsan_handle_out_of_bounds (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: snp_cpuid+0xa6 (section: .pi.text) -> __ubsan_handle_out_of_bounds (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: snp_cpuid+0x316 (section: .pi.text) -> __ubsan_handle_out_of_bounds (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: snp_init+0x12d (section: .pi.text) -> __ubsan_handle_out_of_bounds (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: snp_cpuid_hv+0x10d (section: .pi.text) -> __phys_addr (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x5 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x15 (section: .pi.text) -> __sanitizer_cov_trace_pc (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x26 (section: .pi.text) -> __sanitizer_cov_trace_const_cmp8 (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x36 (section: .pi.text) -> __sanitizer_cov_trace_pc (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x54 (section: .pi.text) -> __sanitizer_cov_trace_const_cmp8 (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x6a (section: .pi.text) -> __sanitizer_cov_trace_const_cmp8 (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x8d (section: .pi.text) -> __sanitizer_cov_trace_pc (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_encrypt_kernel+0x5 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_encrypt_kernel+0x42 (section: .pi.text) -> __phys_addr_symbol (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_encrypt_kernel+0x51 (section: .pi.text) -> __phys_addr_symbol (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_pgtable_calc+0x1 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_enable+0x5 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __strncmp+0x1 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __sme_map_range+0x1 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __sme_map_range_pte+0x1 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_prepare_pgd+0x1 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: cmdline_find_option_bool+0x5 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: cmdline_find_option+0x5 (section: .pi.text) -> __fentry__ (section: .text)

GCC allmodconfig

WARNING: modpost: vmlinux: section mismatch in reference: early_load_idt+0x53 (section: .pi.text) -> native_write_idt_entry.constprop.0 (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __startup_64+0x348 (section: .pi.text) -> __phys_addr (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __startup_64+0x353 (section: .pi.text) -> __phys_addr (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __startup_64+0x436 (section: .pi.text) -> __ubsan_handle_out_of_bounds (section: .text.unlikely)
WARNING: modpost: vmlinux: section mismatch in reference: __startup_64+0x47b (section: .pi.text) -> __ubsan_handle_out_of_bounds (section: .text.unlikely)
WARNING: modpost: vmlinux: section mismatch in reference: __startup_64+0x4c0 (section: .pi.text) -> __ubsan_handle_out_of_bounds (section: .text.unlikely)
WARNING: modpost: vmlinux: section mismatch in reference: sev_es_ghcb_hv_call+0x58 (section: .pi.text) -> __phys_addr (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: setup_cpuid_table+0xf6 (section: .pi.text) -> __ubsan_handle_out_of_bounds (section: .text.unlikely)
WARNING: modpost: vmlinux: section mismatch in reference: snp_cpuid_hv+0x6 (section: .pi.text) -> __sev_cpuid_hv_ghcb (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: snp_cpuid+0xde (section: .pi.text) -> snp_cpuid_postprocess (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: snp_cpuid+0x16c (section: .pi.text) -> __ubsan_handle_out_of_bounds (section: .text.unlikely)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x6 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x15 (section: .pi.text) -> __sanitizer_cov_trace_pc (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x29 (section: .pi.text) -> __sanitizer_cov_trace_const_cmp8 (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x33 (section: .pi.text) -> __sanitizer_cov_trace_pc (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x51 (section: .pi.text) -> __sanitizer_cov_trace_const_cmp8 (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x5c (section: .pi.text) -> __sanitizer_cov_trace_pc (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x71 (section: .pi.text) -> __sanitizer_cov_trace_pc (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x82 (section: .pi.text) -> __sanitizer_cov_trace_const_cmp8 (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __pti_set_user_pgtbl+0x8c (section: .pi.text) -> __sanitizer_cov_trace_pc (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_pgtable_calc+0x2 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_clear_pgd+0x2 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_prepare_pgd+0x2 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_populate_pgd+0x2 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: __sme_map_range+0x2 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_encrypt_kernel+0x6 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_encrypt_kernel+0x19 (section: .pi.text) -> stackleak_track_stack (section: .noinstr.text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_encrypt_kernel+0x8f (section: .pi.text) -> __phys_addr_symbol (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_encrypt_kernel+0xa5 (section: .pi.text) -> __phys_addr_symbol (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: sme_enable+0x6 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: cmdline_find_option_bool+0x6 (section: .pi.text) -> __fentry__ (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference: cmdline_find_option+0x6 (section: .pi.text) -> __fentry__ (section: .text)

Should functions marked with __pitext have these sanitizers disabled?
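
One possible direction (just a sketch, mirroring how noinstr bundles its
opt-outs; the exact attribute set would need checking) would be to fold
the instrumentation opt-outs into the annotation itself:

    /* hypothetical definition, not part of this series */
    #define __pitext                                            \
            noinline notrace __section(".pi.text")              \
            __no_kcsan __no_sanitize_address                    \
            __no_sanitize_coverage __no_sanitize_memory

UBSAN might still have to be disabled per object
(UBSAN_SANITIZE_<file>.o := n) to get rid of the remaining
out-of-bounds references.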

Cheers,
Nathan

2024-01-25 23:24:58

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v2 16/17] x86/sev: Drop inline asm LEA instructions for RIP-relative references

On Thu, 25 Jan 2024 at 21:46, Kevin Loughlin <[email protected]> wrote:
>
> On Thu, Jan 25, 2024 at 3:33 AM Ard Biesheuvel <[email protected]> wrote:
> >
> > The SEV code that may run early is now built with -fPIC and so there is
> > no longer a need for explicit RIP-relative references in inline asm,
> > given that is what the compiler will emit as well.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > ---
> > arch/x86/mm/mem_encrypt_identity.c | 37 +++-----------------
> > 1 file changed, 5 insertions(+), 32 deletions(-)
>
> snp_cpuid_get_table() in arch/x86/kernel/sev-shared.c (a helper
> function to provide the same inline assembly pattern for RIP-relative
> references) would also no longer be needed, as all calls to it would
> now be made in position-independent code. We can therefore eliminate
> the function as part of this commit.

Yes that would be another nice cleanup.