2024-02-26 14:30:12

by Ard Biesheuvel

Subject: [PATCH v6 00/10] x86: Confine early 1:1 mapped startup code

From: Ard Biesheuvel <[email protected]>

This is a resend of v5 [0] with some touchups applied.

Changes since v5:
- drop patches that have been merged
- rebase onto latest tip/x86/boot
- fix comment regarding CR4.PGE wrt flushing of global TLB entries
- avoid adding startup code to .noinstr.text as it triggers objtool
warnings

[0] https://lore.kernel.org/all/[email protected]/

Cc: Kevin Loughlin <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Dionna Glaze <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Brian Gerst <[email protected]>

Ard Biesheuvel (10):
x86/startup_64: Simplify CR4 handling in startup code
x86/startup_64: Defer assignment of 5-level paging global variables
x86/startup_64: Simplify calculation of initial page table address
x86/startup_64: Simplify virtual switch on primary boot
x86/sme: Avoid SME/SVE related checks on non-SME/SVE platforms
efi/libstub: Add generic support for parsing mem_encrypt=
x86/boot: Move mem_encrypt= parsing to the decompressor
x86/sme: Move early SME kernel encryption handling into .head.text
x86/sev: Move early startup code into .head.text section
x86/startup_64: Drop global variables keeping track of LA57 state

arch/x86/boot/compressed/misc.c | 15 ++++
arch/x86/boot/compressed/misc.h | 4 -
arch/x86/boot/compressed/pgtable_64.c | 12 ---
arch/x86/boot/compressed/sev.c | 3 +
arch/x86/boot/compressed/vmlinux.lds.S | 1 +
arch/x86/include/asm/mem_encrypt.h | 8 +-
arch/x86/include/asm/pgtable_64_types.h | 43 ++++-----
arch/x86/include/asm/sev.h | 10 +--
arch/x86/include/uapi/asm/bootparam.h | 1 +
arch/x86/kernel/cpu/common.c | 2 -
arch/x86/kernel/head64.c | 67 +++-----------
arch/x86/kernel/head_64.S | 93 ++++++++------------
arch/x86/kernel/sev-shared.c | 23 +++--
arch/x86/kernel/sev.c | 14 +--
arch/x86/lib/Makefile | 13 ---
arch/x86/mm/kasan_init_64.c | 3 -
arch/x86/mm/mem_encrypt_identity.c | 89 +++++--------------
drivers/firmware/efi/libstub/efi-stub-helper.c | 8 ++
drivers/firmware/efi/libstub/efistub.h | 2 +-
drivers/firmware/efi/libstub/x86-stub.c | 3 +
20 files changed, 152 insertions(+), 262 deletions(-)

--
2.44.0.rc0.258.g7320e95886-goog



2024-02-26 14:30:51

by Ard Biesheuvel

Subject: [PATCH v6 01/10] x86/startup_64: Simplify CR4 handling in startup code

From: Ard Biesheuvel <[email protected]>

When paging is enabled, the CR4.PAE and CR4.LA57 control bits cannot be
changed, and so they can simply be preserved rather than reasoning about
whether or not they need to be set. CR4.MCE should be preserved unless
the kernel was built without CONFIG_X86_MCE, in which case it must be
cleared.

CR4.PSE should be set explicitly, regardless of whether or not it was
set before.

CR4.PGE is set explicitly, and then cleared and set again after
programming CR3 in order to flush TLB entries based on global
translations. This makes the first assignment redundant, so it can be
omitted: clear PGE by omitting it from the preserve mask, and set it
again explicitly after switching to the new page tables.
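
To illustrate, the resulting flow is roughly equivalent to the following
C sketch (illustrative only, not the actual assembly; read_cr4(),
write_cr4() and write_cr3() stand in for the raw register moves):

    unsigned long cr4 = read_cr4();

    /*
     * Keep only PAE, LA57 and (under CONFIG_X86_MCE) MCE. Omitting PGE
     * from the mask means the write below toggles CR4.PGE off, which
     * invalidates all TLB entries, global ones included.
     */
    cr4 &= X86_CR4_PAE | X86_CR4_LA57 | X86_CR4_MCE;
    cr4 |= X86_CR4_PSE;             /* set PSE uniformly on all CPUs */
    write_cr4(cr4);

    write_cr3(new_pgtable);         /* switch to the new page tables */

    write_cr4(cr4 | X86_CR4_PGE);   /* re-enable global translations */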

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/head_64.S | 30 ++++++++------------
1 file changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d295bf68bf94..1b054585bfd1 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -185,6 +185,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
addq $(init_top_pgt - __START_KERNEL_map), %rax
1:

+ /*
+ * Create a mask of CR4 bits to preserve. Omit PGE in order to clean
+ * global 1:1 translations from the TLBs.
+ */
+ movl $(X86_CR4_PAE | X86_CR4_LA57), %edx
#ifdef CONFIG_X86_MCE
/*
* Preserve CR4.MCE if the kernel will enable #MC support.
@@ -193,20 +198,13 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
* configured will crash the system regardless of the CR4.MCE value set
* here.
*/
- movq %cr4, %rcx
- andl $X86_CR4_MCE, %ecx
-#else
- movl $0, %ecx
+ orl $X86_CR4_MCE, %edx
#endif
+ movq %cr4, %rcx
+ andl %edx, %ecx

- /* Enable PAE mode, PSE, PGE and LA57 */
- orl $(X86_CR4_PAE | X86_CR4_PSE | X86_CR4_PGE), %ecx
-#ifdef CONFIG_X86_5LEVEL
- testb $1, __pgtable_l5_enabled(%rip)
- jz 1f
- orl $X86_CR4_LA57, %ecx
-1:
-#endif
+ /* Even if ignored in long mode, set PSE uniformly on all logical CPUs. */
+ btsl $X86_CR4_PSE_BIT, %ecx
movq %rcx, %cr4

/* Setup early boot stage 4-/5-level pagetables. */
@@ -223,14 +221,10 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
movq %rax, %cr3

/*
- * Do a global TLB flush after the CR3 switch to make sure the TLB
- * entries from the identity mapping are flushed.
+ * Set CR4.PGE to re-enable global translations.
*/
- movq %cr4, %rcx
- movq %rcx, %rax
- xorq $X86_CR4_PGE, %rcx
+ btsl $X86_CR4_PGE_BIT, %ecx
movq %rcx, %cr4
- movq %rax, %cr4

/* Ensure I am executing from virtual addresses */
movq $1f, %rax
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-26 14:31:00

by Ard Biesheuvel

Subject: [PATCH v6 02/10] x86/startup_64: Defer assignment of 5-level paging global variables

From: Ard Biesheuvel <[email protected]>

Assigning the 5-level paging related global variables from the earliest
C code, using explicit references based on the 1:1 translation of
memory, is unnecessary: the startup code itself does not rely on them to
create the initial page tables, which is all it should be doing. So
defer these assignments to the primary C entry code that executes via
the ordinary kernel virtual mapping.
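
For background on why the old code needed the fixup_*() helpers that are
removed below: at this point the CPU still executes from the 1:1
mapping, while the linker resolved the address of each global variable
to its kernel virtual address, so a plain store would fault. The helpers
rebased the pointer by hand, essentially:

    /* sketch of the rebasing performed by the removed helpers */
    static void *fixup_pointer(void *ptr, unsigned long physaddr)
    {
            /* kernel virtual address -> currently mapped 1:1 address */
            return ptr - (void *)_text + (void *)physaddr;
    }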

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/pgtable_64_types.h | 2 +-
arch/x86/kernel/head64.c | 44 +++++++-------------
2 files changed, 15 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 38b54b992f32..9053dfe9fa03 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -21,9 +21,9 @@ typedef unsigned long pgprotval_t;
typedef struct { pteval_t pte; } pte_t;
typedef struct { pmdval_t pmd; } pmd_t;

-#ifdef CONFIG_X86_5LEVEL
extern unsigned int __pgtable_l5_enabled;

+#ifdef CONFIG_X86_5LEVEL
#ifdef USE_EARLY_PGTABLE_L5
/*
* cpu_feature_enabled() is not available in early boot code.
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 72351c3121a6..deaaea3280d9 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -23,6 +23,7 @@
#include <linux/pgtable.h>

#include <asm/asm.h>
+#include <asm/page_64.h>
#include <asm/processor.h>
#include <asm/proto.h>
#include <asm/smp.h>
@@ -77,24 +78,11 @@ static struct desc_struct startup_gdt[GDT_ENTRIES] __initdata = {
[GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(DESC_DATA64, 0, 0xfffff),
};

-#ifdef CONFIG_X86_5LEVEL
-static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
-{
- return ptr - (void *)_text + (void *)physaddr;
-}
-
-static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr)
+static inline bool check_la57_support(void)
{
- return fixup_pointer(ptr, physaddr);
-}
-
-static unsigned int __head *fixup_int(void *ptr, unsigned long physaddr)
-{
- return fixup_pointer(ptr, physaddr);
-}
+ if (!IS_ENABLED(CONFIG_X86_5LEVEL))
+ return false;

-static bool __head check_la57_support(unsigned long physaddr)
-{
/*
* 5-level paging is detected and enabled at kernel decompression
* stage. Only check if it has been enabled there.
@@ -102,21 +90,8 @@ static bool __head check_la57_support(unsigned long physaddr)
if (!(native_read_cr4() & X86_CR4_LA57))
return false;

- *fixup_int(&__pgtable_l5_enabled, physaddr) = 1;
- *fixup_int(&pgdir_shift, physaddr) = 48;
- *fixup_int(&ptrs_per_p4d, physaddr) = 512;
- *fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE_L5;
- *fixup_long(&vmalloc_base, physaddr) = __VMALLOC_BASE_L5;
- *fixup_long(&vmemmap_base, physaddr) = __VMEMMAP_BASE_L5;
-
return true;
}
-#else
-static bool __head check_la57_support(unsigned long physaddr)
-{
- return false;
-}
-#endif

static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
{
@@ -180,7 +155,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
bool la57;
int i;

- la57 = check_la57_support(physaddr);
+ la57 = check_la57_support();

/* Is the address too large? */
if (physaddr >> MAX_PHYSMEM_BITS)
@@ -465,6 +440,15 @@ asmlinkage __visible void __init __noreturn x86_64_start_kernel(char * real_mode
(__START_KERNEL & PGDIR_MASK)));
BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);

+ if (check_la57_support()) {
+ __pgtable_l5_enabled = 1;
+ pgdir_shift = 48;
+ ptrs_per_p4d = 512;
+ page_offset_base = __PAGE_OFFSET_BASE_L5;
+ vmalloc_base = __VMALLOC_BASE_L5;
+ vmemmap_base = __VMEMMAP_BASE_L5;
+ }
+
cr4_init_shadow();

/* Kill off the identity-map trampoline */
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-26 14:31:12

by Ard Biesheuvel

Subject: [PATCH v6 03/10] x86/startup_64: Simplify calculation of initial page table address

From: Ard Biesheuvel <[email protected]>

Determining the address of the initial page table to program into CR3
involves:
- taking the physical address
- adding the SME encryption mask

On the primary entry path, the code is mapped using a 1:1 virtual to
physical translation, so the physical address can be taken directly
using a RIP-relative LEA instruction.

On the secondary entry path, the address can be obtained by taking the
offset from the virtual kernel base (__START_KERNEL_map) and adding the
physical kernel base.

This is implemented in a slightly confusing way, so clean this up.
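
In C-like pseudocode, the two computations amount to the following
(illustrative only; rip_relative() is a placeholder for a RIP-relative
LEA, and the SME mask on the primary path is the value returned by
__startup_64()):

    /* primary boot: running from the 1:1 mapping, so a RIP-relative
     * reference to early_top_pgt already yields its physical address */
    cr3 = rip_relative(early_top_pgt) + sme_mask;

    /* secondary boot: running from the kernel virtual mapping */
    cr3 = phys_base + (init_top_pgt - __START_KERNEL_map) + sme_mask;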

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/head_64.S | 25 ++++++--------------
1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 1b054585bfd1..c451a72bc92b 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -111,13 +111,11 @@ SYM_CODE_START_NOALIGN(startup_64)
call __startup_64

/* Form the CR3 value being sure to include the CR3 modifier */
- addq $(early_top_pgt - __START_KERNEL_map), %rax
+ leaq early_top_pgt(%rip), %rcx
+ addq %rcx, %rax

#ifdef CONFIG_AMD_MEM_ENCRYPT
mov %rax, %rdi
- mov %rax, %r14
-
- addq phys_base(%rip), %rdi

/*
* For SEV guests: Verify that the C-bit is correct. A malicious
@@ -126,12 +124,6 @@ SYM_CODE_START_NOALIGN(startup_64)
* the next RET instruction.
*/
call sev_verify_cbit
-
- /*
- * Restore CR3 value without the phys_base which will be added
- * below, before writing %cr3.
- */
- mov %r14, %rax
#endif

jmp 1f
@@ -171,18 +163,18 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
/* Clear %R15 which holds the boot_params pointer on the boot CPU */
xorq %r15, %r15

+ /* Derive the runtime physical address of init_top_pgt[] */
+ movq phys_base(%rip), %rax
+ addq $(init_top_pgt - __START_KERNEL_map), %rax
+
/*
* Retrieve the modifier (SME encryption mask if SME is active) to be
* added to the initial pgdir entry that will be programmed into CR3.
*/
#ifdef CONFIG_AMD_MEM_ENCRYPT
- movq sme_me_mask, %rax
-#else
- xorq %rax, %rax
+ addq sme_me_mask(%rip), %rax
#endif

- /* Form the CR3 value being sure to include the CR3 modifier */
- addq $(init_top_pgt - __START_KERNEL_map), %rax
1:

/*
@@ -207,9 +199,6 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
btsl $X86_CR4_PSE_BIT, %ecx
movq %rcx, %cr4

- /* Setup early boot stage 4-/5-level pagetables. */
- addq phys_base(%rip), %rax
-
/*
* Switch to new page-table
*
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-26 14:32:22

by Ard Biesheuvel

Subject: [PATCH v6 04/10] x86/startup_64: Simplify virtual switch on primary boot

From: Ard Biesheuvel <[email protected]>

The secondary startup code is used on the primary boot path as well, but
in this case, the initial part runs from a 1:1 mapping, until an
explicit cross-jump is made to the kernel virtual mapping of the same
code.

On the secondary boot path, this jump is pointless as the code already
executes from the mapping targeted by the jump. So combine this
cross-jump with the jump from startup_64() into the common boot path.
This simplifies the execution flow, and clearly separates code that runs
from a 1:1 mapping from code that runs from the kernel virtual mapping.

Note that this requires a page table switch, so hoist the CR3 assignment
into startup_64() as well. And since absolute symbol references will no
longer be permitted in .head.text once we enable the associated build
time checks, a RIP-relative memory operand is used in the JMP
instruction, referring to an absolute constant in the .init.rodata
section.

Given that the secondary startup code does not require a special
placement inside the executable, move it to the .text section.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/head_64.S | 42 ++++++++++----------
1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index c451a72bc92b..87929f615048 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -39,7 +39,6 @@ L4_START_KERNEL = l4_index(__START_KERNEL_map)

L3_START_KERNEL = pud_index(__START_KERNEL_map)

- .text
__HEAD
.code64
SYM_CODE_START_NOALIGN(startup_64)
@@ -126,9 +125,21 @@ SYM_CODE_START_NOALIGN(startup_64)
call sev_verify_cbit
#endif

- jmp 1f
+ /*
+ * Switch to early_top_pgt which still has the identity mappings
+ * present.
+ */
+ movq %rax, %cr3
+
+ /* Branch to the common startup code at its kernel virtual address */
+ ANNOTATE_RETPOLINE_SAFE
+ jmp *0f(%rip)
SYM_CODE_END(startup_64)

+ __INITRODATA
+0: .quad common_startup_64
+
+ .text
SYM_CODE_START(secondary_startup_64)
UNWIND_HINT_END_OF_STACK
ANNOTATE_NOENDBR
@@ -174,8 +185,15 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
#ifdef CONFIG_AMD_MEM_ENCRYPT
addq sme_me_mask(%rip), %rax
#endif
+ /*
+ * Switch to the init_top_pgt here, away from the trampoline_pgd and
+ * unmap the identity mapped ranges.
+ */
+ movq %rax, %cr3

-1:
+SYM_INNER_LABEL(common_startup_64, SYM_L_LOCAL)
+ UNWIND_HINT_END_OF_STACK
+ ANNOTATE_NOENDBR

/*
* Create a mask of CR4 bits to preserve. Omit PGE in order to clean
@@ -199,30 +217,12 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
btsl $X86_CR4_PSE_BIT, %ecx
movq %rcx, %cr4

- /*
- * Switch to new page-table
- *
- * For the boot CPU this switches to early_top_pgt which still has the
- * identity mappings present. The secondary CPUs will switch to the
- * init_top_pgt here, away from the trampoline_pgd and unmap the
- * identity mapped ranges.
- */
- movq %rax, %cr3
-
/*
* Set CR4.PGE to re-enable global translations.
*/
btsl $X86_CR4_PGE_BIT, %ecx
movq %rcx, %cr4

- /* Ensure I am executing from virtual addresses */
- movq $1f, %rax
- ANNOTATE_RETPOLINE_SAFE
- jmp *%rax
-1:
- UNWIND_HINT_END_OF_STACK
- ANNOTATE_NOENDBR // above
-
#ifdef CONFIG_SMP
/*
* For parallel boot, the APIC ID is read from the APIC, and then
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-26 14:32:27

by Ard Biesheuvel

Subject: [PATCH v6 06/10] efi/libstub: Add generic support for parsing mem_encrypt=

From: Ard Biesheuvel <[email protected]>

Parse the mem_encrypt= command line parameter from the EFI stub if
CONFIG_ARCH_HAS_MEM_ENCRYPT=y, so that it can be passed to the early
boot code by the arch code in the stub.

This avoids the need for the core kernel to do any string parsing very
early in the boot.
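
The resulting tri-state convention, as established by the hunk below:

    /*
     * efi_mem_encrypt ==  1   mem_encrypt=on  was passed
     * efi_mem_encrypt == -1   mem_encrypt=off was passed
     * efi_mem_encrypt ==  0   parameter absent: leave the default alone
     */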

Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/libstub/efi-stub-helper.c | 8 ++++++++
drivers/firmware/efi/libstub/efistub.h | 2 +-
2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index bfa30625f5d0..3dc2f9aaf08d 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -24,6 +24,8 @@ static bool efi_noinitrd;
static bool efi_nosoftreserve;
static bool efi_disable_pci_dma = IS_ENABLED(CONFIG_EFI_DISABLE_PCI_DMA);

+int efi_mem_encrypt;
+
bool __pure __efi_soft_reserve_enabled(void)
{
return !efi_nosoftreserve;
@@ -75,6 +77,12 @@ efi_status_t efi_parse_options(char const *cmdline)
efi_noinitrd = true;
} else if (IS_ENABLED(CONFIG_X86_64) && !strcmp(param, "no5lvl")) {
efi_no5lvl = true;
+ } else if (IS_ENABLED(CONFIG_ARCH_HAS_MEM_ENCRYPT) &&
+ !strcmp(param, "mem_encrypt") && val) {
+ if (parse_option_str(val, "on"))
+ efi_mem_encrypt = 1;
+ else if (parse_option_str(val, "off"))
+ efi_mem_encrypt = -1;
} else if (!strcmp(param, "efi") && val) {
efi_nochunk = parse_option_str(val, "nochunk");
efi_novamap |= parse_option_str(val, "novamap");
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index c04b82ea40f2..fc18fd649ed7 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -37,8 +37,8 @@ extern bool efi_no5lvl;
extern bool efi_nochunk;
extern bool efi_nokaslr;
extern int efi_loglevel;
+extern int efi_mem_encrypt;
extern bool efi_novamap;
-
extern const efi_system_table_t *efi_system_table;

typedef union efi_dxe_services_table efi_dxe_services_table_t;
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-26 14:32:30

by Ard Biesheuvel

Subject: [PATCH v6 05/10] x86/sme: Avoid SME/SVE related checks on non-SME/SVE platforms

From: Ard Biesheuvel <[email protected]>

Reorganize the early SME/SVE init code so that SME/SVE related calls are
deferred until it has been determined that the platform actually
supports this, and so those calls could actually make sense.

This removes logic from the early boot path that executes from the 1:1
mapping when booting a CONFIG_AMD_MEM_ENCRYPT=y kernel on a system that
does not implement that (i.e., 99% of distro kernels).

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 4 ++--
arch/x86/kernel/head64.c | 6 +++---
arch/x86/mm/mem_encrypt_identity.c | 8 +++-----
3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index b31eb9fd5954..b1437ba0b3b8 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -48,7 +48,7 @@ void __init sme_unmap_bootdata(char *real_mode_data);
void __init sme_early_init(void);

void __init sme_encrypt_kernel(struct boot_params *bp);
-void __init sme_enable(struct boot_params *bp);
+void sme_enable(struct boot_params *bp);

int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
@@ -82,7 +82,7 @@ static inline void __init sme_unmap_bootdata(char *real_mode_data) { }
static inline void __init sme_early_init(void) { }

static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
-static inline void __init sme_enable(struct boot_params *bp) { }
+static inline void sme_enable(struct boot_params *bp) { }

static inline void sev_es_init_vc_handling(void) { }

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index deaaea3280d9..f37278d1cf85 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -98,9 +98,6 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
unsigned long vaddr, vaddr_end;
int i;

- /* Encrypt the kernel and related (if SME is active) */
- sme_encrypt_kernel(bp);
-
/*
* Clear the memory encryption mask from the .bss..decrypted section.
* The bss section will be memset to zero later in the initialization so
@@ -108,6 +105,9 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* attribute.
*/
if (sme_get_me_mask()) {
+ /* Encrypt the kernel and related */
+ sme_encrypt_kernel(bp);
+
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 0166ab1780cc..7ddcf960e92a 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -45,6 +45,7 @@
#include <asm/sections.h>
#include <asm/cmdline.h>
#include <asm/coco.h>
+#include <asm/init.h>
#include <asm/sev.h>

#include "mm_internal.h"
@@ -502,18 +503,15 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
native_write_cr3(__native_read_cr3());
}

-void __init sme_enable(struct boot_params *bp)
+void __head sme_enable(struct boot_params *bp)
{
const char *cmdline_ptr, *cmdline_arg, *cmdline_on;
unsigned int eax, ebx, ecx, edx;
unsigned long feature_mask;
unsigned long me_mask;
char buffer[16];
- bool snp;
u64 msr;

- snp = snp_init(bp);
-
/* Check for the SME/SEV support leaf */
eax = 0x80000000;
ecx = 0;
@@ -546,7 +544,7 @@ void __init sme_enable(struct boot_params *bp)
feature_mask = (msr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;

/* The SEV-SNP CC blob should never be present unless SEV-SNP is enabled. */
- if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
+ if (snp_init(bp) && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
snp_abort();

/* Check if memory encryption is enabled */
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-26 14:32:37

by Ard Biesheuvel

Subject: [PATCH v6 07/10] x86/boot: Move mem_encrypt= parsing to the decompressor

From: Ard Biesheuvel <[email protected]>

The early SME/SEV code parses the command line very early, in order to
decide whether or not memory encryption should be enabled, which needs
to occur even before the initial page tables are created.

This is problematic for a number of reasons:
- this early code runs from the 1:1 mapping provided by the decompressor
or firmware, which uses a different translation than the one assumed by
the linker, and so the code needs to be built in a special way;
- parsing external input while the entire kernel image is still mapped
writable is a bad idea in general, and really does not belong in
security-minded code;
- the current code ignores the built-in command line entirely (although
this appears to be the case for the entire decompressor).

Given that the decompressor/EFI stub is an intrinsic part of the x86
bootable kernel image, move the command line parsing there and out of
the core kernel. This removes the need to build lib/cmdline.o in a
special way, or to use RIP-relative LEA instructions in inline asm
blocks.

This involves a new xloadflag in the setup header to indicate
that mem_encrypt=on appeared on the kernel command line.
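
Condensed, the end-to-end flow looks like this (fragments lifted from
the hunks below; note that the last mem_encrypt= on the command line
wins in the decompressor, as cmdline_find_option_bool() returns the
position of the match):

    /* decompressor (misc.c) */
    if (cmdline_find_option_bool("mem_encrypt=on") >
        cmdline_find_option_bool("mem_encrypt=off"))
            hdr->xloadflags |= XLF_MEM_ENCRYPTION;

    /* EFI stub (x86-stub.c), based on the efi_mem_encrypt tri-state */
    if (efi_mem_encrypt > 0)
            hdr->xloadflags |= XLF_MEM_ENCRYPTION;

    /* consumer: sme_enable(), in the SME-only case */
    if (!(bp->hdr.xloadflags & XLF_MEM_ENCRYPTION))
            return;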

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/misc.c | 15 +++++++++
arch/x86/include/uapi/asm/bootparam.h | 1 +
arch/x86/lib/Makefile | 13 --------
arch/x86/mm/mem_encrypt_identity.c | 32 ++------------------
drivers/firmware/efi/libstub/x86-stub.c | 3 ++
5 files changed, 22 insertions(+), 42 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index bd6857a9f15a..408507e305be 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -371,6 +371,19 @@ unsigned long decompress_kernel(unsigned char *outbuf, unsigned long virt_addr,
return entry;
}

+/*
+ * Set the memory encryption xloadflag based on the mem_encrypt= command line
+ * parameter, if provided.
+ */
+static void parse_mem_encrypt(struct setup_header *hdr)
+{
+ int on = cmdline_find_option_bool("mem_encrypt=on");
+ int off = cmdline_find_option_bool("mem_encrypt=off");
+
+ if (on > off)
+ hdr->xloadflags |= XLF_MEM_ENCRYPTION;
+}
+
/*
* The compressed kernel image (ZO), has been moved so that its position
* is against the end of the buffer used to hold the uncompressed kernel
@@ -401,6 +414,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
/* Clear flags intended for solely in-kernel use. */
boot_params_ptr->hdr.loadflags &= ~KASLR_FLAG;

+ parse_mem_encrypt(&boot_params_ptr->hdr);
+
sanitize_boot_params(boot_params_ptr);

if (boot_params_ptr->screen_info.orig_video_mode == 7) {
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 01d19fc22346..eeea058cf602 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -38,6 +38,7 @@
#define XLF_EFI_KEXEC (1<<4)
#define XLF_5LEVEL (1<<5)
#define XLF_5LEVEL_ENABLED (1<<6)
+#define XLF_MEM_ENCRYPTION (1<<7)

#ifndef __ASSEMBLY__

diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index ea3a28e7b613..f0dae4fb6d07 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -14,19 +14,6 @@ ifdef CONFIG_KCSAN
CFLAGS_REMOVE_delay.o = $(CC_FLAGS_FTRACE)
endif

-# Early boot use of cmdline; don't instrument it
-ifdef CONFIG_AMD_MEM_ENCRYPT
-KCOV_INSTRUMENT_cmdline.o := n
-KASAN_SANITIZE_cmdline.o := n
-KCSAN_SANITIZE_cmdline.o := n
-
-ifdef CONFIG_FUNCTION_TRACER
-CFLAGS_REMOVE_cmdline.o = -pg
-endif
-
-CFLAGS_cmdline.o := -fno-stack-protector -fno-jump-tables
-endif
-
inat_tables_script = $(srctree)/arch/x86/tools/gen-insn-attr-x86.awk
inat_tables_maps = $(srctree)/arch/x86/lib/x86-opcode-map.txt
quiet_cmd_inat_tables = GEN $@
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 7ddcf960e92a..0180fbbcc940 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -43,7 +43,6 @@

#include <asm/setup.h>
#include <asm/sections.h>
-#include <asm/cmdline.h>
#include <asm/coco.h>
#include <asm/init.h>
#include <asm/sev.h>
@@ -96,9 +95,6 @@ struct sme_populate_pgd_data {
*/
static char sme_workarea[2 * PMD_SIZE] __section(".init.scratch");

-static char sme_cmdline_arg[] __initdata = "mem_encrypt";
-static char sme_cmdline_on[] __initdata = "on";
-
static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd)
{
unsigned long pgd_start, pgd_end, pgd_size;
@@ -505,11 +501,9 @@ void __init sme_encrypt_kernel(struct boot_params *bp)

void __head sme_enable(struct boot_params *bp)
{
- const char *cmdline_ptr, *cmdline_arg, *cmdline_on;
unsigned int eax, ebx, ecx, edx;
unsigned long feature_mask;
unsigned long me_mask;
- char buffer[16];
u64 msr;

/* Check for the SME/SEV support leaf */
@@ -549,6 +543,9 @@ void __head sme_enable(struct boot_params *bp)

/* Check if memory encryption is enabled */
if (feature_mask == AMD_SME_BIT) {
+ if (!(bp->hdr.xloadflags & XLF_MEM_ENCRYPTION))
+ return;
+
/*
* No SME if Hypervisor bit is set. This check is here to
* prevent a guest from trying to enable SME. For running as a
@@ -568,31 +565,8 @@ void __head sme_enable(struct boot_params *bp)
msr = __rdmsr(MSR_AMD64_SYSCFG);
if (!(msr & MSR_AMD64_SYSCFG_MEM_ENCRYPT))
return;
- } else {
- /* SEV state cannot be controlled by a command line option */
- goto out;
}

- /*
- * Fixups have not been applied to phys_base yet and we're running
- * identity mapped, so we must obtain the address to the SME command
- * line argument data using rip-relative addressing.
- */
- asm ("lea sme_cmdline_arg(%%rip), %0"
- : "=r" (cmdline_arg)
- : "p" (sme_cmdline_arg));
- asm ("lea sme_cmdline_on(%%rip), %0"
- : "=r" (cmdline_on)
- : "p" (sme_cmdline_on));
-
- cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
- ((u64)bp->ext_cmd_line_ptr << 32));
-
- if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0 ||
- strncmp(buffer, cmdline_on, sizeof(buffer)))
- return;
-
-out:
RIP_REL_REF(sme_me_mask) = me_mask;
physical_mask &= ~me_mask;
cc_vendor = CC_VENDOR_AMD;
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 99429bc4b0c7..0336ed175e67 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -884,6 +884,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
}
}

+ if (efi_mem_encrypt > 0)
+ hdr->xloadflags |= XLF_MEM_ENCRYPTION;
+
status = efi_decompress_kernel(&kernel_entry);
if (status != EFI_SUCCESS) {
efi_err("Failed to decompress kernel\n");
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-26 14:32:48

by Ard Biesheuvel

Subject: [PATCH v6 09/10] x86/sev: Move early startup code into .head.text section

From: Ard Biesheuvel <[email protected]>

In preparation for implementing rigorous build time checks to enforce
that only code that can support it will be called from the early 1:1
mapping of memory, move SEV init code that is called in this manner to
the .head.text section.
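
For reference, the __head annotation used below is simply a section
attribute, defined in <asm/init.h> at this point in the series roughly
as:

    #define __head  __section(".head.text")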

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/sev.c | 3 +++
arch/x86/include/asm/sev.h | 10 ++++-----
arch/x86/kernel/sev-shared.c | 23 +++++++++-----------
arch/x86/kernel/sev.c | 14 +++++++-----
4 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 073291832f44..bea0719d70f2 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -116,6 +116,9 @@ static bool fault_in_kernel_space(unsigned long address)
#undef __init
#define __init

+#undef __head
+#define __head
+
#define __BOOT_COMPRESSED

/* Basic instruction decoding support needed */
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index bed95e1f4d52..cf671138feef 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -213,16 +213,16 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
struct snp_guest_request_ioctl;

void setup_ghcb(void);
-void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
- unsigned long npages);
-void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
- unsigned long npages);
+void early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
+ unsigned long npages);
+void early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+ unsigned long npages);
void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
void snp_set_memory_shared(unsigned long vaddr, unsigned long npages);
void snp_set_memory_private(unsigned long vaddr, unsigned long npages);
void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
-void __init __noreturn snp_abort(void);
+void __noreturn snp_abort(void);
int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio);
void snp_accept_memory(phys_addr_t start, phys_addr_t end);
u64 snp_get_unsupported_features(u64 status);
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index ae79f9505298..0bd7ccbe8732 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -93,7 +93,8 @@ static bool __init sev_es_check_cpu_features(void)
return true;
}

-static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
+static void __head __noreturn
+sev_es_terminate(unsigned int set, unsigned int reason)
{
u64 val = GHCB_MSR_TERM_REQ;

@@ -330,13 +331,7 @@ static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid
*/
static const struct snp_cpuid_table *snp_cpuid_get_table(void)
{
- void *ptr;
-
- asm ("lea cpuid_table_copy(%%rip), %0"
- : "=r" (ptr)
- : "p" (&cpuid_table_copy));
-
- return ptr;
+ return &RIP_REL_REF(cpuid_table_copy);
}

/*
@@ -395,7 +390,7 @@ static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
return xsave_size;
}

-static bool
+static bool __head
snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
{
const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
@@ -532,7 +527,8 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
* Returns -EOPNOTSUPP if feature not enabled. Any other non-zero return value
* should be treated as fatal by caller.
*/
-static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
+static int __head
+snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
{
const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();

@@ -574,7 +570,7 @@ static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_le
* page yet, so it only supports the MSR based communication with the
* hypervisor and only the CPUID exit-code.
*/
-void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
+void __head do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
{
unsigned int subfn = lower_bits(regs->cx, 32);
unsigned int fn = lower_bits(regs->ax, 32);
@@ -1025,7 +1021,8 @@ struct cc_setup_data {
* Search for a Confidential Computing blob passed in as a setup_data entry
* via the Linux Boot Protocol.
*/
-static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
+static __head
+struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
{
struct cc_setup_data *sd = NULL;
struct setup_data *hdr;
@@ -1052,7 +1049,7 @@ static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
* mapping needs to be updated in sync with all the changes to virtual memory
* layout and related mapping facilities throughout the boot process.
*/
-static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
+static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
{
const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table;
int i;
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 1ef7ae806a01..33c14aa1f06c 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -25,6 +25,7 @@
#include <linux/psp-sev.h>
#include <uapi/linux/sev-guest.h>

+#include <asm/init.h>
#include <asm/cpu_entry_area.h>
#include <asm/stacktrace.h>
#include <asm/sev.h>
@@ -682,8 +683,9 @@ static u64 __init get_jump_table_addr(void)
return ret;
}

-static void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
- unsigned long npages, enum psc_op op)
+static void __head
+early_set_pages_state(unsigned long vaddr, unsigned long paddr,
+ unsigned long npages, enum psc_op op)
{
unsigned long paddr_end;
u64 val;
@@ -739,7 +741,7 @@ static void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
}

-void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
+void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
unsigned long npages)
{
/*
@@ -2062,7 +2064,7 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
*
* Scan for the blob in that order.
*/
-static __init struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
+static __head struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
{
struct cc_blob_sev_info *cc_info;

@@ -2088,7 +2090,7 @@ static __init struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
return cc_info;
}

-bool __init snp_init(struct boot_params *bp)
+bool __head snp_init(struct boot_params *bp)
{
struct cc_blob_sev_info *cc_info;

@@ -2110,7 +2112,7 @@ bool __init snp_init(struct boot_params *bp)
return true;
}

-void __init __noreturn snp_abort(void)
+void __head __noreturn snp_abort(void)
{
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
}
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-26 14:32:53

by Ard Biesheuvel

Subject: [PATCH v6 10/10] x86/startup_64: Drop global variables keeping track of LA57 state

From: Ard Biesheuvel <[email protected]>

On x86_64, the core kernel is entered in long mode, which implies that
paging is enabled. This means that the CR4.LA57 control bit is
guaranteed to be in sync with the number of paging levels used by the
kernel, and there is no need to store this in a variable.

There is also no need to use variables for storing the calculations of
pgdir_shift and ptrs_per_p4d, as they are easily determined on the fly.

This removes the need for two different sources of truth for determining
whether 5-level paging is in use: CR4.LA57 always reflects the actual
state, and never changes from the point of view of the 64-bit core
kernel. The only potential concern is the cost of CR4 accesses, which
can be mitigated using alternatives patching based on feature detection.

Note that even the decompressor does not manipulate any page tables
before updating CR4.LA57, so it can also avoid the associated global
variables entirely. However, as it does not implement alternatives
patching, the associated ELF sections need to be discarded.
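
Ignoring the alternatives-based optimization, the new
pgtable_l5_enabled() boils down to the following (conceptual sketch of
what the asm version in the patch implements):

    static inline bool pgtable_l5_enabled(void)
    {
            if (!IS_ENABLED(CONFIG_X86_5LEVEL))
                    return false;
            return native_read_cr4() & X86_CR4_LA57;
    }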

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/misc.h | 4 --
arch/x86/boot/compressed/pgtable_64.c | 12 ------
arch/x86/boot/compressed/vmlinux.lds.S | 1 +
arch/x86/include/asm/pgtable_64_types.h | 43 ++++++++++----------
arch/x86/kernel/cpu/common.c | 2 -
arch/x86/kernel/head64.c | 33 +--------------
arch/x86/mm/kasan_init_64.c | 3 --
arch/x86/mm/mem_encrypt_identity.c | 9 ----
8 files changed, 25 insertions(+), 82 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index b353a7be380c..e4ab7b4d8698 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -16,9 +16,6 @@

#define __NO_FORTIFY

-/* cpu_feature_enabled() cannot be used this early */
-#define USE_EARLY_PGTABLE_L5
-
/*
* Boot stub deals with identity mappings, physical and virtual addresses are
* the same, so override these defines.
@@ -181,7 +178,6 @@ static inline int count_immovable_mem_regions(void) { return 0; }
#endif

/* ident_map_64.c */
-extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
extern void kernel_add_identity_map(unsigned long start, unsigned long end);

/* Used by PAGE_KERN* macros: */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 51f957b24ba7..ae72f53f5e77 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -9,13 +9,6 @@
#define BIOS_START_MIN 0x20000U /* 128K, less than this is insane */
#define BIOS_START_MAX 0x9f000U /* 640K, absolute maximum */

-#ifdef CONFIG_X86_5LEVEL
-/* __pgtable_l5_enabled needs to be in .data to avoid being cleared along with .bss */
-unsigned int __section(".data") __pgtable_l5_enabled;
-unsigned int __section(".data") pgdir_shift = 39;
-unsigned int __section(".data") ptrs_per_p4d = 1;
-#endif
-
/* Buffer to preserve trampoline memory */
static char trampoline_save[TRAMPOLINE_32BIT_SIZE];

@@ -125,11 +118,6 @@ asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
native_cpuid_eax(0) >= 7 &&
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)))) {
l5_required = true;
-
- /* Initialize variables for 5-level paging */
- __pgtable_l5_enabled = 1;
- pgdir_shift = 48;
- ptrs_per_p4d = 512;
}

/*
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 083ec6d7722a..06358bb067fe 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -81,6 +81,7 @@ SECTIONS
*(.dynamic) *(.dynsym) *(.dynstr) *(.dynbss)
*(.hash) *(.gnu.hash)
*(.note.*)
+ *(.altinstructions .altinstr_replacement)
}

.got.plt (INFO) : {
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 9053dfe9fa03..2fac8ba9564a 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -6,7 +6,10 @@

#ifndef __ASSEMBLY__
#include <linux/types.h>
+#include <asm/alternative.h>
+#include <asm/cpufeatures.h>
#include <asm/kaslr.h>
+#include <asm/processor-flags.h>

/*
* These are used to make use of C type-checking..
@@ -21,28 +24,24 @@ typedef unsigned long pgprotval_t;
typedef struct { pteval_t pte; } pte_t;
typedef struct { pmdval_t pmd; } pmd_t;

-extern unsigned int __pgtable_l5_enabled;
-
-#ifdef CONFIG_X86_5LEVEL
-#ifdef USE_EARLY_PGTABLE_L5
-/*
- * cpu_feature_enabled() is not available in early boot code.
- * Use variable instead.
- */
-static inline bool pgtable_l5_enabled(void)
+static __always_inline __pure bool pgtable_l5_enabled(void)
{
- return __pgtable_l5_enabled;
-}
-#else
-#define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_LA57)
-#endif /* USE_EARLY_PGTABLE_L5 */
+ unsigned long r;
+ bool ret;

-#else
-#define pgtable_l5_enabled() 0
-#endif /* CONFIG_X86_5LEVEL */
+ if (!IS_ENABLED(CONFIG_X86_5LEVEL))
+ return false;

-extern unsigned int pgdir_shift;
-extern unsigned int ptrs_per_p4d;
+ asm(ALTERNATIVE_TERNARY(
+ "movq %%cr4, %[reg] \n\t btl %[la57], %k[reg]" CC_SET(c),
+ %P[feat], "stc", "clc")
+ : [reg] "=&r" (r), CC_OUT(c) (ret)
+ : [feat] "i" (X86_FEATURE_LA57),
+ [la57] "i" (X86_CR4_LA57_BIT)
+ : "cc");
+
+ return ret;
+}

#endif /* !__ASSEMBLY__ */

@@ -53,7 +52,7 @@ extern unsigned int ptrs_per_p4d;
/*
* PGDIR_SHIFT determines what a top-level page table entry can map
*/
-#define PGDIR_SHIFT pgdir_shift
+#define PGDIR_SHIFT (pgtable_l5_enabled() ? 48 : 39)
#define PTRS_PER_PGD 512

/*
@@ -61,7 +60,7 @@ extern unsigned int ptrs_per_p4d;
*/
#define P4D_SHIFT 39
#define MAX_PTRS_PER_P4D 512
-#define PTRS_PER_P4D ptrs_per_p4d
+#define PTRS_PER_P4D (pgtable_l5_enabled() ? 512 : 1)
#define P4D_SIZE (_AC(1, UL) << P4D_SHIFT)
#define P4D_MASK (~(P4D_SIZE - 1))

@@ -76,6 +75,8 @@ extern unsigned int ptrs_per_p4d;
#define PTRS_PER_PGD 512
#define MAX_PTRS_PER_P4D 1

+#define MAX_POSSIBLE_PHYSMEM_BITS 46
+
#endif /* CONFIG_X86_5LEVEL */

/*
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9e35e276c55a..d88e4be88868 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1,6 +1,4 @@
// SPDX-License-Identifier: GPL-2.0-only
-/* cpu_feature_enabled() cannot be used this early */
-#define USE_EARLY_PGTABLE_L5

#include <linux/memblock.h>
#include <linux/linkage.h>
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index f37278d1cf85..a65babef6148 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -7,9 +7,6 @@

#define DISABLE_BRANCH_PROFILING

-/* cpu_feature_enabled() cannot be used this early */
-#define USE_EARLY_PGTABLE_L5
-
#include <linux/init.h>
#include <linux/linkage.h>
#include <linux/types.h>
@@ -52,14 +49,6 @@ extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
static unsigned int __initdata next_early_pgt;
pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);

-#ifdef CONFIG_X86_5LEVEL
-unsigned int __pgtable_l5_enabled __ro_after_init;
-unsigned int pgdir_shift __ro_after_init = 39;
-EXPORT_SYMBOL(pgdir_shift);
-unsigned int ptrs_per_p4d __ro_after_init = 1;
-EXPORT_SYMBOL(ptrs_per_p4d);
-#endif
-
#ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
unsigned long page_offset_base __ro_after_init = __PAGE_OFFSET_BASE_L4;
EXPORT_SYMBOL(page_offset_base);
@@ -78,21 +67,6 @@ static struct desc_struct startup_gdt[GDT_ENTRIES] __initdata = {
[GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(DESC_DATA64, 0, 0xfffff),
};

-static inline bool check_la57_support(void)
-{
- if (!IS_ENABLED(CONFIG_X86_5LEVEL))
- return false;
-
- /*
- * 5-level paging is detected and enabled at kernel decompression
- * stage. Only check if it has been enabled there.
- */
- if (!(native_read_cr4() & X86_CR4_LA57))
- return false;
-
- return true;
-}
-
static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
{
unsigned long vaddr, vaddr_end;
@@ -155,7 +129,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
bool la57;
int i;

- la57 = check_la57_support();
+ la57 = pgtable_l5_enabled();

/* Is the address too large? */
if (physaddr >> MAX_PHYSMEM_BITS)
@@ -440,10 +414,7 @@ asmlinkage __visible void __init __noreturn x86_64_start_kernel(char * real_mode
(__START_KERNEL & PGDIR_MASK)));
BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);

- if (check_la57_support()) {
- __pgtable_l5_enabled = 1;
- pgdir_shift = 48;
- ptrs_per_p4d = 512;
+ if (pgtable_l5_enabled()) {
page_offset_base = __PAGE_OFFSET_BASE_L5;
vmalloc_base = __VMALLOC_BASE_L5;
vmemmap_base = __VMEMMAP_BASE_L5;
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 0302491d799d..85ae1ef840cc 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -2,9 +2,6 @@
#define DISABLE_BRANCH_PROFILING
#define pr_fmt(fmt) "kasan: " fmt

-/* cpu_feature_enabled() cannot be used this early */
-#define USE_EARLY_PGTABLE_L5
-
#include <linux/memblock.h>
#include <linux/kasan.h>
#include <linux/kdebug.h>
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 174a7192c9cb..3da15e9a8c7d 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -27,15 +27,6 @@
#undef CONFIG_PARAVIRT_XXL
#undef CONFIG_PARAVIRT_SPINLOCKS

-/*
- * This code runs before CPU feature bits are set. By default, the
- * pgtable_l5_enabled() function uses bit X86_FEATURE_LA57 to determine if
- * 5-level paging is active, so that won't work here. USE_EARLY_PGTABLE_L5
- * is provided to handle this situation and, instead, use a variable that
- * has been set by the early boot code.
- */
-#define USE_EARLY_PGTABLE_L5
-
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/mem_encrypt.h>
--
2.44.0.rc0.258.g7320e95886-goog


2024-02-26 15:00:27

by Ard Biesheuvel

Subject: [PATCH v6 08/10] x86/sme: Move early SME kernel encryption handling into .head.text

From: Ard Biesheuvel <[email protected]>

The .head.text section is the initial primary entrypoint of the core
kernel, and is entered with the CPU executing from a 1:1 mapping of
memory. Such code must never access global variables using absolute
references, as these are based on the kernel virtual mapping which is
not active yet at this point.

Given that the SME startup code is also called from this early execution
context, move it into .head.text as well. This will allow more thorough
build time checks in the future to ensure that early startup code only
uses RIP-relative references to global variables.

Also replace some occurrences of __pa_symbol() [which relies on the
compiler generating an absolute reference, which is not guaranteed] and
an open coded RIP-relative access with RIP_REL_REF().
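
For context, RIP_REL_REF() forces a RIP-relative load or store of the
variable; its definition in <asm/asm.h> is roughly:

    static __always_inline __pure void *rip_rel_ptr(void *p)
    {
            asm("leaq %c1(%%rip), %0" : "=r" (p) : "i" (p));

            return p;
    }
    #define RIP_REL_REF(var)    (*(typeof(&(var)))rip_rel_ptr(&(var)))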

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/mem_encrypt.h | 4 +-
arch/x86/mm/mem_encrypt_identity.c | 40 ++++++++------------
2 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index b1437ba0b3b8..f922b682b9b4 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -47,7 +47,7 @@ void __init sme_unmap_bootdata(char *real_mode_data);

void __init sme_early_init(void);

-void __init sme_encrypt_kernel(struct boot_params *bp);
+void sme_encrypt_kernel(struct boot_params *bp);
void sme_enable(struct boot_params *bp);

int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
@@ -81,7 +81,7 @@ static inline void __init sme_unmap_bootdata(char *real_mode_data) { }

static inline void __init sme_early_init(void) { }

-static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
+static inline void sme_encrypt_kernel(struct boot_params *bp) { }
static inline void sme_enable(struct boot_params *bp) { }

static inline void sev_es_init_vc_handling(void) { }
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 0180fbbcc940..174a7192c9cb 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -41,6 +41,7 @@
#include <linux/mem_encrypt.h>
#include <linux/cc_platform.h>

+#include <asm/init.h>
#include <asm/setup.h>
#include <asm/sections.h>
#include <asm/coco.h>
@@ -95,7 +96,7 @@ struct sme_populate_pgd_data {
*/
static char sme_workarea[2 * PMD_SIZE] __section(".init.scratch");

-static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd)
+static void __head sme_clear_pgd(struct sme_populate_pgd_data *ppd)
{
unsigned long pgd_start, pgd_end, pgd_size;
pgd_t *pgd_p;
@@ -110,7 +111,7 @@ static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd)
memset(pgd_p, 0, pgd_size);
}

-static pud_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
+static pud_t __head *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
{
pgd_t *pgd;
p4d_t *p4d;
@@ -147,7 +148,7 @@ static pud_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
return pud;
}

-static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
+static void __head sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
{
pud_t *pud;
pmd_t *pmd;
@@ -163,7 +164,7 @@ static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
set_pmd(pmd, __pmd(ppd->paddr | ppd->pmd_flags));
}

-static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
+static void __head sme_populate_pgd(struct sme_populate_pgd_data *ppd)
{
pud_t *pud;
pmd_t *pmd;
@@ -189,7 +190,7 @@ static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
}

-static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
+static void __head __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
{
while (ppd->vaddr < ppd->vaddr_end) {
sme_populate_pgd_large(ppd);
@@ -199,7 +200,7 @@ static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
}
}

-static void __init __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
+static void __head __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
{
while (ppd->vaddr < ppd->vaddr_end) {
sme_populate_pgd(ppd);
@@ -209,7 +210,7 @@ static void __init __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
}
}

-static void __init __sme_map_range(struct sme_populate_pgd_data *ppd,
+static void __head __sme_map_range(struct sme_populate_pgd_data *ppd,
pmdval_t pmd_flags, pteval_t pte_flags)
{
unsigned long vaddr_end;
@@ -233,22 +234,22 @@ static void __init __sme_map_range(struct sme_populate_pgd_data *ppd,
__sme_map_range_pte(ppd);
}

-static void __init sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
+static void __head sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
{
__sme_map_range(ppd, PMD_FLAGS_ENC, PTE_FLAGS_ENC);
}

-static void __init sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
+static void __head sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
{
__sme_map_range(ppd, PMD_FLAGS_DEC, PTE_FLAGS_DEC);
}

-static void __init sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
+static void __head sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
{
__sme_map_range(ppd, PMD_FLAGS_DEC_WP, PTE_FLAGS_DEC_WP);
}

-static unsigned long __init sme_pgtable_calc(unsigned long len)
+static unsigned long __head sme_pgtable_calc(unsigned long len)
{
unsigned long entries = 0, tables = 0;

@@ -285,7 +286,7 @@ static unsigned long __init sme_pgtable_calc(unsigned long len)
return entries + tables;
}

-void __init sme_encrypt_kernel(struct boot_params *bp)
+void __head sme_encrypt_kernel(struct boot_params *bp)
{
unsigned long workarea_start, workarea_end, workarea_len;
unsigned long execute_start, execute_end, execute_len;
@@ -320,9 +321,8 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
* memory from being cached.
*/

- /* Physical addresses gives us the identity mapped virtual addresses */
- kernel_start = __pa_symbol(_text);
- kernel_end = ALIGN(__pa_symbol(_end), PMD_SIZE);
+ kernel_start = (unsigned long)RIP_REL_REF(_text);
+ kernel_end = ALIGN((unsigned long)RIP_REL_REF(_end), PMD_SIZE);
kernel_len = kernel_end - kernel_start;

initrd_start = 0;
@@ -339,14 +339,6 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
}
#endif

- /*
- * We're running identity mapped, so we must obtain the address to the
- * SME encryption workarea using rip-relative addressing.
- */
- asm ("lea sme_workarea(%%rip), %0"
- : "=r" (workarea_start)
- : "p" (sme_workarea));
-
/*
* Calculate required number of workarea bytes needed:
* executable encryption area size:
@@ -356,7 +348,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
* pagetable structures for the encryption of the kernel
* pagetable structures for workarea (in case not currently mapped)
*/
- execute_start = workarea_start;
+ execute_start = workarea_start = (unsigned long)RIP_REL_REF(sme_workarea);
execute_end = execute_start + (PAGE_SIZE * 2) + PMD_SIZE;
execute_len = execute_end - execute_start;

--
2.44.0.rc0.258.g7320e95886-goog


2024-02-26 21:38:19

by Tom Lendacky

Subject: Re: [PATCH v6 05/10] x86/sme: Avoid SME/SVE related checks on non-SME/SVE platforms

On 2/26/24 08:29, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <[email protected]>
>
> Reorganize the early SME/SVE init code so that SME/SVE related calls are

I'm assuming you mean SEV here and in the subject line.

> deferred until it has been determined that the platform actually
> supports this, and so those calls could actually make sense.
>
> This removes logic from the early boot path that executes from the 1:1
> mapping when booting a CONFIG_AMD_MEM_ENCRYPT=y kernel on a system that
> does not implement that (i.e., 99% of distro kernels).

Maybe I'm missing something, but I don't see how this has changed anything
other than moving the call to sme_encrypt_kernel() within the if statement
(which means the check at the beginning of sme_encrypt_kernel() could be
changed to just check for SEV now).

>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/include/asm/mem_encrypt.h | 4 ++--
> arch/x86/kernel/head64.c | 6 +++---
> arch/x86/mm/mem_encrypt_identity.c | 8 +++-----
> 3 files changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index b31eb9fd5954..b1437ba0b3b8 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -48,7 +48,7 @@ void __init sme_unmap_bootdata(char *real_mode_data);
> void __init sme_early_init(void);
>
> void __init sme_encrypt_kernel(struct boot_params *bp);
> -void __init sme_enable(struct boot_params *bp);
> +void sme_enable(struct boot_params *bp);
>
> int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
> int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
> @@ -82,7 +82,7 @@ static inline void __init sme_unmap_bootdata(char *real_mode_data) { }
> static inline void __init sme_early_init(void) { }
>
> static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
> -static inline void __init sme_enable(struct boot_params *bp) { }
> +static inline void sme_enable(struct boot_params *bp) { }
>
> static inline void sev_es_init_vc_handling(void) { }
>
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index deaaea3280d9..f37278d1cf85 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -98,9 +98,6 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> unsigned long vaddr, vaddr_end;
> int i;
>
> - /* Encrypt the kernel and related (if SME is active) */
> - sme_encrypt_kernel(bp);
> -
> /*
> * Clear the memory encryption mask from the .bss..decrypted section.
> * The bss section will be memset to zero later in the initialization so
> @@ -108,6 +105,9 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> * attribute.
> */

The comment above this if statement should now probably be moved into the
if portion of the statement after the sme_encrypt_kernel() call, since now
more than just bss_decrypted work is being done here.

> if (sme_get_me_mask()) {
> + /* Encrypt the kernel and related */
> + sme_encrypt_kernel(bp);
> +
> vaddr = (unsigned long)__start_bss_decrypted;
> vaddr_end = (unsigned long)__end_bss_decrypted;
>
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 0166ab1780cc..7ddcf960e92a 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -45,6 +45,7 @@
> #include <asm/sections.h>
> #include <asm/cmdline.h>
> #include <asm/coco.h>
> +#include <asm/init.h>
> #include <asm/sev.h>
>
> #include "mm_internal.h"
> @@ -502,18 +503,15 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
> native_write_cr3(__native_read_cr3());
> }
>
> -void __init sme_enable(struct boot_params *bp)
> +void __head sme_enable(struct boot_params *bp)
> {
> const char *cmdline_ptr, *cmdline_arg, *cmdline_on;
> unsigned int eax, ebx, ecx, edx;
> unsigned long feature_mask;
> unsigned long me_mask;
> char buffer[16];
> - bool snp;
> u64 msr;
>
> - snp = snp_init(bp);

The snp_init() call is here because the SNP CPUID table needs to be
established before the below CPUID instruction is executed. This can't be
moved.

Thanks,
Tom

> -
> /* Check for the SME/SEV support leaf */
> eax = 0x80000000;
> ecx = 0;
> @@ -546,7 +544,7 @@ void __init sme_enable(struct boot_params *bp)
> feature_mask = (msr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
>
> /* The SEV-SNP CC blob should never be present unless SEV-SNP is enabled. */
> - if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
> + if (snp_init(bp) && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
> snp_abort();
>
> /* Check if memory encryption is enabled */

2024-02-27 15:26:27

by Ard Biesheuvel

Subject: Re: [PATCH v6 05/10] x86/sme: Avoid SME/SVE related checks on non-SME/SVE platforms

On Mon, 26 Feb 2024 at 22:38, Tom Lendacky <[email protected]> wrote:
>
> On 2/26/24 08:29, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <[email protected]>
> >
> > Reorganize the early SME/SVE init code so that SME/SVE related calls are
>
> I'm assuming you mean SEV here and in the subject line.
>

Yes.

> > deferred until it has been determined that the platform actually
> > supports this, and so those calls could actually make sense.
> >
> > This removes logic from the early boot path that executes from the 1:1
> > mapping when booting a CONFIG_AMD_MEM_ENCRYPT=y kernel on a system that
> > does not implement that (i.e., 99% of distro kernels).
>
> Maybe I'm missing something, but I don't see how this has changed anything
> other than moving the call to sme_encrypt_kernel() within the if statement
> (which means the check at the beginning of sme_encrypt_kernel() could be
> changed to just check for SEV now).
>

The idea of this change is to avoid calls into SME/SEV related code,
in a way that can easily be backported to -stable kernels.

The reason is that the SME/SEV code has changed much more than the
rest of the code, and so backporting this entire series is going to be
messy, as well as unnecessary given that those kernels are not
expected to get all the latest SEV related changes anyway. We do tend
to try and keep those -stable kernels building with recent Clang
versions, and so keeping the SME/SEV code out of the boot path
entirely would help avoid running into codegen issues such as the ones this
series is working around in code that is substantially different from
the latest revision.

However, I decided to drop this patch anyway. The mainline code should
be obviously correct, and introducing potential problems this way does
not justify the goal of easier -stable backports.

..
> > diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> > index 0166ab1780cc..7ddcf960e92a 100644
> > --- a/arch/x86/mm/mem_encrypt_identity.c
> > +++ b/arch/x86/mm/mem_encrypt_identity.c
> > @@ -45,6 +45,7 @@
> > #include <asm/sections.h>
> > #include <asm/cmdline.h>
> > #include <asm/coco.h>
> > +#include <asm/init.h>
> > #include <asm/sev.h>
> >
> > #include "mm_internal.h"
> > @@ -502,18 +503,15 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
> > native_write_cr3(__native_read_cr3());
> > }
> >
> > -void __init sme_enable(struct boot_params *bp)
> > +void __head sme_enable(struct boot_params *bp)
> > {
> > const char *cmdline_ptr, *cmdline_arg, *cmdline_on;
> > unsigned int eax, ebx, ecx, edx;
> > unsigned long feature_mask;
> > unsigned long me_mask;
> > char buffer[16];
> > - bool snp;
> > u64 msr;
> >
> > - snp = snp_init(bp);
>
> The snp_init() call is here because the SNP CPUID table needs to be
> established before the below CPUID instruction is executed. This can't be
> moved.
>

Yeah, good point. I didn't spot this in my SEV-SNP boot testing,
presumably because the firmware's VC handler is still active in my
case, but this isn't guaranteed, e.g., when booting via the
decompressor, or when using 5-level paging.

Thanks for the review.

2024-02-27 15:45:36

by Tom Lendacky

Subject: Re: [PATCH v6 05/10] x86/sme: Avoid SME/SVE related checks on non-SME/SVE platforms

On 2/27/24 08:55, Ard Biesheuvel wrote:
> On Mon, 26 Feb 2024 at 22:38, Tom Lendacky <[email protected]> wrote:
>>
>> On 2/26/24 08:29, Ard Biesheuvel wrote:
>>> From: Ard Biesheuvel <[email protected]>
>>>

>>> -void __init sme_enable(struct boot_params *bp)
>>> +void __head sme_enable(struct boot_params *bp)
>>> {
>>> const char *cmdline_ptr, *cmdline_arg, *cmdline_on;
>>> unsigned int eax, ebx, ecx, edx;
>>> unsigned long feature_mask;
>>> unsigned long me_mask;
>>> char buffer[16];
>>> - bool snp;
>>> u64 msr;
>>>
>>> - snp = snp_init(bp);
>>
>> The snp_init() call is here because the SNP CPUID table needs to be
>> established before the below CPUID instruction is executed. This can't be
>> moved.
>>
>
> Yeah, good point. I didn't spot this in my SEV-SNP boot testing,
> presumably because the firmware's VC handler is still active in my
> case, but this isn't guaranteed, e.g., when booting via the
> decompressor, or when using 5-level paging.

Actually, the kernel's do_vc_no_ghcb() is the handler that gets invoked.
Since the CPUID table is not initialized, -EOPNOTSUPP is returned from
snp_cpuid(), which results in the handler using the MSR protocol to get
the CPUID information from the hypervisor, which we don't want for SNP in
this situation.
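
Schematically, the flow in question is (paraphrased, and the helper name
below is a placeholder for the MSR-protocol CPUID path):

    ret = snp_cpuid(NULL, NULL, &leaf);
    if (ret == -EOPNOTSUPP)
            /* no CPUID table yet: ask the hypervisor via MSR protocol */
            ret = cpuid_msr_protocol(&leaf);    /* placeholder name */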

Thanks,
Tom


>
> Thanks for the review.