2024-01-10 01:27:05

by Kevin Loughlin

Subject: [RFC PATCH] x86/sev: enforce PC-relative addressing in clang

SEV/SME code can execute prior to page table fixups for kernel
relocation. However, as with global variables accessed in
__startup_64(), clang does not currently generate PC-relative accesses
for SEV/SME global variables, causing certain flavors of SEV hosts and
guests to crash.

While an attempt was made to force PC-relative addressing for certain
global SEV/SME variables via inline assembly (see snp_cpuid_get_table()
for example), PC-relative addressing must be pervasively enforced for
SEV/SME global variables that can be accessed prior to page table
fixups.

To avoid the error-prone approach of manually referencing each SEV/SME
global variable via a general form of snp_cpuid_get_table(), it is
preferable to use compiler flags for position-independent code (ex:
`-fPIE`) that result in PC-relative accesses. While architecture-
specific code for Linux can be pervasively compiled as position-
independent on select architectures (ex: RISC-V), this is not currently
the case for x86-64 and would require extensive changes (see "[PATCH
RFC 00/43] x86/pie: Make kernel image's virtual address flexible" for
example).

Fortunately, the relevant files for SEV/SME code do indeed support
position-independent clang compilation, so we can use this technique to
ensure all global variables in these files are accessed via PC-relative
addressing.

Unlike clang, gcc does not currently allow `-fPIE` in conjunction with
`-mcmodel=kernel`. Thus, to preserve existing gcc behavior, this patch
does not remove the (otherwise unnecessary) inline assembly that
already enforces PC-relative addressing for select SEV/SME globals
(mentioned above). If gcc supports these joint options in the future,
we can remove such inline assembly and also apply this patch to gcc
builds.

Tested by successful boot of an SEV-SNP guest built with clang, alongside
Adam Dunlap's necessary "[PATCH v2] x86/asm: Force native_apic_mem_read
to use mov".

Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
Fixes: 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
Fixes: c9f09539e16e ("x86/head/64: Check SEV encryption before switching to kernel page-table")
Fixes: b577f542f93c ("x86/coco: Add API to handle encryption mask")
Tested-by: Kevin Loughlin <[email protected]>
Signed-off-by: Kevin Loughlin <[email protected]>
---
arch/x86/coco/Makefile | 10 ++++++++++
arch/x86/kernel/Makefile | 10 ++++++++++
arch/x86/mm/Makefile | 13 +++++++++++++
3 files changed, 33 insertions(+)

diff --git a/arch/x86/coco/Makefile b/arch/x86/coco/Makefile
index c816acf78b6a..286950596ee9 100644
--- a/arch/x86/coco/Makefile
+++ b/arch/x86/coco/Makefile
@@ -5,4 +5,14 @@ CFLAGS_core.o += -fno-stack-protector

obj-y += core.o

+# clang allows -fPIE with mcmodel=kernel; gcc currently does not.
+ifdef CONFIG_CC_IS_CLANG
+# Enforce PC-relative addressing for SEV/SME global variables.
+CFLAGS_core.o += -fPIE
+# Disable relocation relaxation in case the link is not PIE.
+CFLAGS_core.o += $(call cc-option,-Wa$(comma)-mrelax-relocations=no)
+# Avoid unnecessary GOT overhead in PC-relative addressing.
+CFLAGS_core.o += -include $(srctree)/include/linux/hidden.h
+endif
+
obj-$(CONFIG_INTEL_TDX_GUEST) += tdx/
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0000325ab98f..bf85f9de89e9 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -7,6 +7,16 @@ extra-y += vmlinux.lds

CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)

+# clang allows -fPIE with mcmodel=kernel; gcc currently does not.
+ifdef CONFIG_CC_IS_CLANG
+# Enforce PC-relative addressing for SEV/SME global variables.
+CFLAGS_sev.o += -fPIE
+# Disable relocation relaxation in case the link is not PIE.
+CFLAGS_sev.o += $(call cc-option,-Wa$(comma)-mrelax-relocations=no)
+# Avoid unnecessary GOT overhead in PC-relative addressing.
+CFLAGS_sev.o += -include $(srctree)/include/linux/hidden.h
+endif
+
ifdef CONFIG_FUNCTION_TRACER
# Do not profile debug and lowlevel utilities
CFLAGS_REMOVE_tsc.o = -pg
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index c80febc44cd2..7abf20a94451 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -17,6 +17,19 @@ KCSAN_SANITIZE := n
# Avoid recursion by not calling KMSAN hooks for CEA code.
KMSAN_SANITIZE_cpu_entry_area.o := n

+# clang allows -fPIE with mcmodel=kernel; gcc currently does not.
+ifdef CONFIG_CC_IS_CLANG
+# Enforce PC-relative addressing for SEV/SME global variables.
+CFLAGS_mem_encrypt_amd.o += -fPIE
+CFLAGS_mem_encrypt_identity.o += -fPIE
+# Disable relocation relaxation in case the link is not PIE.
+CFLAGS_mem_encrypt_amd.o += $(call cc-option,-Wa$(comma)-mrelax-relocations=no)
+CFLAGS_mem_encrypt_identity.o += $(call cc-option,-Wa$(comma)-mrelax-relocations=no)
+# Avoid unnecessary GOT overhead in PC-relative addressing.
+CFLAGS_mem_encrypt_amd.o += -include $(srctree)/include/linux/hidden.h
+CFLAGS_mem_encrypt_identity.o += -include $(srctree)/include/linux/hidden.h
+endif
+
ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_mem_encrypt.o = -pg
CFLAGS_REMOVE_mem_encrypt_amd.o = -pg
--
2.43.0.275.g3460e3d667-goog



2024-01-10 11:51:37

by Andi Kleen

Subject: Re: [RFC PATCH] x86/sev: enforce PC-relative addressing in clang

On Wed, Jan 10, 2024 at 01:26:39AM +0000, Kevin Loughlin wrote:
> SEV/SME code can execute prior to page table fixups for kernel

That seems to be a fundamental problem.

> relocation. However, as with global variables accessed in
> __startup_64(), clang does not currently generate PC-relative accesses
> for SEV/SME global variables, causing certain flavors of SEV hosts and
> guests to crash.

And that's a hack to work around it.

>
> While an attempt was made to force PC-relative addressing for certain
> global SEV/SME variables via inline assembly (see snp_cpuid_get_table()
> for example), PC-relative addressing must be pervasively enforced for
> SEV/SME global variables that can be accessed prior to page table
> fixups.
>
> To avoid the error-prone approach of manually referencing each SEV/SME
> global variable via a general form of snp_cpuid_get_table(), it is
> preferable to use compiler flags for position-independent code (ex:

But if gcc doesn't support it then it doesn't work.

It seems your approach with incompatible execution models between
the compilers is just a recipe for future patches only working on one
of the compilers, because most code submitters probably won't test
both.

It would be better to at least use a unified execution model, if you want
to extend the hack and not fix the underlying issue.

-Andi

2024-01-10 13:38:19

by Kirill A. Shutemov

Subject: Re: [RFC PATCH] x86/sev: enforce PC-relative addressing in clang

On Wed, Jan 10, 2024 at 01:26:39AM +0000, Kevin Loughlin wrote:
> SEV/SME code can execute prior to page table fixups for kernel
> relocation. However, as with global variables accessed in
> __startup_64(), clang does not currently generate PC-relative accesses
> for SEV/SME global variables, causing certain flavors of SEV hosts and
> guests to crash.
>
> While an attempt was made to force PC-relative addressing for certain
> global SEV/SME variables via inline assembly (see snp_cpuid_get_table()
> for example), PC-relative addressing must be pervasively enforced for
> SEV/SME global variables that can be accessed prior to page table
> fixups.
>
> To avoid the error-prone approach of manually referencing each SEV/SME
> global variable via a general form of snp_cpuid_get_table(), it is
> preferable to use compiler flags for position-independent code (ex:
> `-fPIE`) that result in PC-relative accesses. While architecture-
> specific code for Linux can be pervasively compiled as position-
> independent on select architectures (ex: RISC-V), this is not currently
> the case for x86-64 and would require extensive changes (see "[PATCH
> RFC 00/43] x86/pie: Make kernel image's virtual address flexible" for
> example).
>
> Fortunately, the relevant files for SEV/SME code do indeed support
> position-independent clang compilation, so we can use this technique to
> ensure all global variables in these files are accessed via PC-relative
> addressing.
>
> Unlike clang, gcc does not currently allow `-fPIE` in conjunction with
> `-mcmodel=kernel`. Thus, to preserve existing gcc behavior, this patch
> does not remove the (otherwise unnecessary) inline assembly that
> already enforces PC-relative addressing for select SEV/SME globals
> (mentioned above). If gcc supports these joint options in the future,
> we can remove such inline assembly and also apply this patch to gcc
> builds.
>
> Tested by successful boot of an SEV-SNP guest built with clang, alongside
> Adam Dunlap's necessary "[PATCH v2] x86/asm: Force native_apic_mem_read
> to use mov".
>

Similar issues were fixed before with fixup_pointer() tricks. Have you
tried looking in this direction?

Relevant thread starting with:

https://lore.kernel.org/all/[email protected]
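
For reference, fixup_pointer() in arch/x86/kernel/head64.c is
essentially:

	/* From arch/x86/kernel/head64.c (paraphrased). */
	static void *fixup_pointer(void *ptr, unsigned long physaddr)
	{
		return ptr - (void *)_text + (void *)physaddr;
	}

i.e. it rebases a pointer from the kernel's link-time virtual address
onto the identity-mapped physical address the code is currently
running at.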

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-01-10 17:36:07

by Kevin Loughlin

Subject: Re: [RFC PATCH] x86/sev: enforce PC-relative addressing in clang

On Wed, Jan 10, 2024 at 5:37 AM Kirill A. Shutemov
<[email protected]> wrote:
>
> Similar issues were fixed before with fixup_pointer() tricks. Have you
> tried looking in this direction?
>
> Relevant thread starting with:
>
> https://lore.kernel.org/all/[email protected]

Yes, I considered using `fixup_pointer()` as well, and I agree that it
remains a valid option to solve this problem. I just now mentioned the
primary downside to using `fixup_pointer()` in my response to Andi
(https://lore.kernel.org/lkml/CAGdbjmJcQeLWroBkbRC-cFQ6OuMHONsGLzpUrrA5k3uc8Rzkmw@mail.gmail.com/).
Namely, we would have to pass around `physaddr` from `__startup_64()`
to any functions wanting to use `fixup_pointer()`. However, we should
be able to get around needing `physaddr` by using something like the
`GET_RIP_RELATIVE_PTR()` macro that I proposed in the same reply
(forthcoming in v2 of this RFC PATCH).

2024-01-10 17:50:03

by Andi Kleen

Subject: Re: [RFC PATCH] x86/sev: enforce PC-relative addressing in clang

> On that note, I do have another version of this patch that abstracts
> snp_cpuid_get_table() into a macro along the lines of...
>
> #define GET_RIP_RELATIVE_PTR(var) \
> ({ \
> void *ptr; \
> asm ("lea "#var"(%%rip), %0" \
> : "=r" (ptr) \
> : "p" (&var)); \
> ptr; \
> })
>
> ...and uses this new macro to access all SEV/SME global variables (not
> just the cpuid_table). It's similar in nature to `fixup_pointer()`
> (currently defined in arch/x86/kernel/head64.c) but doesn't require us
> to pass around `physaddr` from `__startup_64()`. This wouldn't
> introduce any new execution model changes between clang vs gcc and
> would be consistent with the kernel's current approach of relying on
> developers to manually apply fixups for global variable accesses prior
> to kernel relocation. I can send an RFC v2 for the
> GET_RIP_RELATIVE_PTR() version of this patch.

That looks like a far better solution indeed.

Ideally objtool would check for this, perhaps with a new ELF
section. But actually doing that might be far more work, so perhaps not
worth it.

Thanks,

-Andi

2024-01-11 22:37:15

by Kevin Loughlin

Subject: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

SEV/SME code can execute prior to page table fixups for kernel
relocation. However, as with global variables accessed in
__startup_64(), the compiler is not required to generate RIP-relative
accesses for SEV/SME global variables, causing certain flavors of SEV
hosts and guests built with clang to crash during boot.

While an attempt was made to force RIP-relative addressing for certain
global SEV/SME variables via inline assembly (see snp_cpuid_get_table()
for example), RIP-relative addressing must be pervasively enforced for
SEV/SME global variables when accessed prior to page table fixups.

__startup_64() already handles this issue for select non-SEV/SME global
variables using fixup_pointer(), which adjusts the pointer relative to
a `physaddr` argument. To avoid having to pass around this `physaddr`
argument across all functions needing to apply pointer fixups, this
patch introduces the macro GET_RIP_RELATIVE_PTR() (an abstraction of
the existing snp_cpuid_get_table()), which generates an RIP-relative
pointer to a passed variable. Similarly, PTR_TO_RIP_RELATIVE_PTR() is
introduced to fixup an existing pointer value with RIP-relative logic.

Applying these macros to early SEV/SME code (alongside Adam Dunlap's
necessary "[PATCH v2] x86/asm: Force native_apic_mem_read to use mov")
enables previously-failing boots of clang builds to succeed, while
preserving successful boot of gcc builds. Tested with and without SEV,
SEV-ES, SEV-SNP enabled in guests built via both gcc and clang.

Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
Fixes: 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
Fixes: c9f09539e16e ("x86/head/64: Check SEV encryption before switching to kernel page-table")
Fixes: b577f542f93c ("x86/coco: Add API to handle encryption mask")
Tested-by: Kevin Loughlin <[email protected]>
Signed-off-by: Kevin Loughlin <[email protected]>
---
arch/x86/coco/core.c | 22 +++++++----
arch/x86/include/asm/mem_encrypt.h | 37 +++++++++++++++++-
arch/x86/kernel/head64.c | 22 ++++++-----
arch/x86/kernel/head_64.S | 4 +-
arch/x86/kernel/sev-shared.c | 63 ++++++++++++++++--------------
arch/x86/kernel/sev.c | 15 +++++--
arch/x86/mm/mem_encrypt_identity.c | 50 ++++++++++++------------
7 files changed, 136 insertions(+), 77 deletions(-)

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index eeec9986570e..8c45b5643f48 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -5,6 +5,11 @@
* Copyright (C) 2021 Advanced Micro Devices, Inc.
*
* Author: Tom Lendacky <[email protected]>
+ *
+ * WARNING!!
+ * Select functions in this file can execute prior to page table fixups and thus
+ * require pointer fixups for global variable accesses. See WARNING in
+ * arch/x86/kernel/head64.c.
*/

#include <linux/export.h>
@@ -61,33 +66,34 @@ static __maybe_unused __always_inline bool amd_cc_platform_vtom(enum cc_attr att
static bool noinstr amd_cc_platform_has(enum cc_attr attr)
{
#ifdef CONFIG_AMD_MEM_ENCRYPT
+ const u64 sev_status_fixed_up = sev_get_status_fixup();

- if (sev_status & MSR_AMD64_SNP_VTOM)
+ if (sev_status_fixed_up & MSR_AMD64_SNP_VTOM)
return amd_cc_platform_vtom(attr);

switch (attr) {
case CC_ATTR_MEM_ENCRYPT:
- return sme_me_mask;
+ return sme_get_me_mask_fixup();

case CC_ATTR_HOST_MEM_ENCRYPT:
- return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
+ return sme_get_me_mask_fixup() && !(sev_status_fixed_up & MSR_AMD64_SEV_ENABLED);

case CC_ATTR_GUEST_MEM_ENCRYPT:
- return sev_status & MSR_AMD64_SEV_ENABLED;
+ return sev_status_fixed_up & MSR_AMD64_SEV_ENABLED;

case CC_ATTR_GUEST_STATE_ENCRYPT:
- return sev_status & MSR_AMD64_SEV_ES_ENABLED;
+ return sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED;

/*
* With SEV, the rep string I/O instructions need to be unrolled
* but SEV-ES supports them through the #VC handler.
*/
case CC_ATTR_GUEST_UNROLL_STRING_IO:
- return (sev_status & MSR_AMD64_SEV_ENABLED) &&
- !(sev_status & MSR_AMD64_SEV_ES_ENABLED);
+ return (sev_status_fixed_up & MSR_AMD64_SEV_ENABLED) &&
+ !(sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED);

case CC_ATTR_GUEST_SEV_SNP:
- return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
+ return sev_status_fixed_up & MSR_AMD64_SEV_SNP_ENABLED;

default:
return false;
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 359ada486fa9..d007050a0edc 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -17,6 +17,34 @@

#include <asm/bootparam.h>

+/*
+ * Generates an RIP-relative pointer to a data variable "var".
+ * This macro can be used to safely access global data variables prior to kernel
+ * relocation, similar to fixup_pointer() in arch/x86/kernel/head64.c.
+ */
+#define GET_RIP_RELATIVE_PTR(var) \
+({ \
+ void *rip_rel_ptr; \
+ asm ("lea "#var"(%%rip), %0" \
+ : "=r" (rip_rel_ptr) \
+ : "p" (&var)); \
+ rip_rel_ptr; \
+})
+
+/*
+ * Converts an existing pointer "ptr" to an RIP-relative pointer.
+ * This macro can be used to safely access global pointers prior to kernel
+ * relocation, similar to fixup_pointer() in arch/x86/kernel/head64.c.
+ */
+#define PTR_TO_RIP_RELATIVE_PTR(ptr) \
+({ \
+ void *rip_rel_ptr; \
+ asm ("lea "#ptr"(%%rip), %0" \
+ : "=r" (rip_rel_ptr) \
+ : "p" (ptr)); \
+ rip_rel_ptr; \
+})
+
#ifdef CONFIG_X86_MEM_ENCRYPT
void __init mem_encrypt_init(void);
void __init mem_encrypt_setup_arch(void);
@@ -106,9 +134,14 @@ void add_encrypt_protection_map(void);

extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[];

-static inline u64 sme_get_me_mask(void)
+static inline u64 sme_get_me_mask_fixup(void)
+{
+ return *((u64 *) GET_RIP_RELATIVE_PTR(sme_me_mask));
+}
+
+static inline u64 sev_get_status_fixup(void)
{
- return sme_me_mask;
+ return *((u64 *) GET_RIP_RELATIVE_PTR(sev_status));
}

#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index dc0956067944..8df7a198094d 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -130,6 +130,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
{
unsigned long vaddr, vaddr_end;
int i;
+ const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();

/* Encrypt the kernel and related (if SME is active) */
sme_encrypt_kernel(bp);
@@ -140,7 +141,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* there is no need to zero it after changing the memory encryption
* attribute.
*/
- if (sme_get_me_mask()) {
+ if (sme_me_mask_fixed_up) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;

@@ -158,7 +159,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);

i = pmd_index(vaddr);
- pmd[i] -= sme_get_me_mask();
+ pmd[i] -= sme_me_mask_fixed_up;
}
}

@@ -166,14 +167,16 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* Return the SME encryption mask (if SME is active) to be used as a
* modifier for the initial pgdir entry programmed into CR3.
*/
- return sme_get_me_mask();
+ return sme_me_mask_fixed_up;
}

-/* Code in __startup_64() can be relocated during execution, but the compiler
+/*
+ * WARNING!!
+ * Code in __startup_64() can be relocated during execution, but the compiler
* doesn't have to generate PC-relative relocations when accessing globals from
* that function. Clang actually does not generate them, which leads to
* boot-time crashes. To work around this problem, every global pointer must
- * be adjusted using fixup_pointer().
+ * be adjusted using fixup_pointer() or GET_RIP_RELATIVE_PTR().
*/
unsigned long __head __startup_64(unsigned long physaddr,
struct boot_params *bp)
@@ -188,6 +191,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
bool la57;
int i;
unsigned int *next_pgt_ptr;
+ const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();

la57 = check_la57_support(physaddr);

@@ -206,7 +210,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
for (;;);

/* Include the SME encryption mask in the fixup value */
- load_delta += sme_get_me_mask();
+ load_delta += sme_me_mask_fixed_up;

/* Fixup the physical addresses in the page table */

@@ -242,7 +246,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);

- pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
+ pgtable_flags = _KERNPG_TABLE_NOENC + sme_me_mask_fixed_up;

if (la57) {
p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
@@ -269,7 +273,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
/* Filter out unsupported __PAGE_KERNEL_* bits: */
mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
pmd_entry &= *mask_ptr;
- pmd_entry += sme_get_me_mask();
+ pmd_entry += sme_me_mask_fixed_up;
pmd_entry += physaddr;

for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
@@ -313,7 +317,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
* Fixup phys_base - remove the memory encryption mask to obtain
* the true physical address.
*/
- *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
+ *fixup_long(&phys_base, physaddr) += load_delta - sme_me_mask_fixed_up;

return sme_postprocess_startup(bp, pmd);
}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d4918d03efb4..b9e52cee6e00 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -176,9 +176,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
/*
* Retrieve the modifier (SME encryption mask if SME is active) to be
* added to the initial pgdir entry that will be programmed into CR3.
+ * Since we may have not completed page table fixups, use RIP-relative
+ * addressing for sme_me_mask.
*/
#ifdef CONFIG_AMD_MEM_ENCRYPT
- movq sme_me_mask, %rax
+ movq sme_me_mask(%rip), %rax
#else
xorq %rax, %rax
#endif
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 1d24ec679915..e71752c990ef 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -7,6 +7,11 @@
* This file is not compiled stand-alone. It contains code shared
* between the pre-decompression boot code and the running Linux kernel
* and is included directly into both code-bases.
+ *
+ * WARNING!!
+ * Select functions in this file can execute prior to page table fixups and thus
+ * require pointer fixups for global variable accesses. See WARNING in
+ * arch/x86/kernel/head64.c.
*/

#ifndef __BOOT_COMPRESSED
@@ -110,8 +115,9 @@ static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
static u64 get_hv_features(void)
{
u64 val;
+ const u16 *ghcb_version_ptr = (const u16 *) GET_RIP_RELATIVE_PTR(ghcb_version);

- if (ghcb_version < 2)
+ if (*ghcb_version_ptr < 2)
return 0;

sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
@@ -143,6 +149,7 @@ static void snp_register_ghcb_early(unsigned long paddr)
static bool sev_es_negotiate_protocol(void)
{
u64 val;
+ u16 *ghcb_version_ptr;

/* Do the GHCB protocol version negotiation */
sev_es_wr_ghcb_msr(GHCB_MSR_SEV_INFO_REQ);
@@ -156,7 +163,8 @@ static bool sev_es_negotiate_protocol(void)
GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
return false;

- ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
+ ghcb_version_ptr = (u16 *) GET_RIP_RELATIVE_PTR(ghcb_version);
+ *ghcb_version_ptr = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);

return true;
}
@@ -318,23 +326,6 @@ static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid
: __sev_cpuid_hv_msr(leaf);
}

-/*
- * This may be called early while still running on the initial identity
- * mapping. Use RIP-relative addressing to obtain the correct address
- * while running with the initial identity mapping as well as the
- * switch-over to kernel virtual addresses later.
- */
-static const struct snp_cpuid_table *snp_cpuid_get_table(void)
-{
- void *ptr;
-
- asm ("lea cpuid_table_copy(%%rip), %0"
- : "=r" (ptr)
- : "p" (&cpuid_table_copy));
-
- return ptr;
-}
-
/*
* The SNP Firmware ABI, Revision 0.9, Section 7.1, details the use of
* XCR0_IN and XSS_IN to encode multiple versions of 0xD subfunctions 0
@@ -357,7 +348,8 @@ static const struct snp_cpuid_table *snp_cpuid_get_table(void)
*/
static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table = (const struct
+ snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
u64 xfeatures_found = 0;
u32 xsave_size = 0x240;
int i;
@@ -394,7 +386,8 @@ static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
static bool
snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table = (const struct
+ snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
int i;

for (i = 0; i < cpuid_table->count; i++) {
@@ -530,7 +523,9 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
*/
static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table = (const struct
+ snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
+ const u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;

if (!cpuid_table->count)
return -EOPNOTSUPP;
@@ -555,10 +550,14 @@ static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_le
*/
leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0;

+ cpuid_std_range_max_ptr = (const u32 *) GET_RIP_RELATIVE_PTR(cpuid_std_range_max);
+ cpuid_hyp_range_max_ptr = (const u32 *) GET_RIP_RELATIVE_PTR(cpuid_hyp_range_max);
+ cpuid_ext_range_max_ptr = (const u32 *) GET_RIP_RELATIVE_PTR(cpuid_ext_range_max);
+
/* Skip post-processing for out-of-range zero leafs. */
- if (!(leaf->fn <= cpuid_std_range_max ||
- (leaf->fn >= 0x40000000 && leaf->fn <= cpuid_hyp_range_max) ||
- (leaf->fn >= 0x80000000 && leaf->fn <= cpuid_ext_range_max)))
+ if (!(leaf->fn <= *cpuid_std_range_max_ptr ||
+ (leaf->fn >= 0x40000000 && leaf->fn <= *cpuid_hyp_range_max_ptr) ||
+ (leaf->fn >= 0x80000000 && leaf->fn <= *cpuid_ext_range_max_ptr)))
return 0;
}

@@ -1046,6 +1045,7 @@ static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
{
const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table;
+ u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
int i;

if (!cc_info || !cc_info->cpuid_phys || cc_info->cpuid_len < PAGE_SIZE)
@@ -1055,19 +1055,24 @@ static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
if (!cpuid_table_fw->count || cpuid_table_fw->count > SNP_CPUID_COUNT_MAX)
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID);

- cpuid_table = snp_cpuid_get_table();
+ cpuid_table = (const struct snp_cpuid_table *) GET_RIP_RELATIVE_PTR(
+ cpuid_table_copy);
memcpy((void *)cpuid_table, cpuid_table_fw, sizeof(*cpuid_table));

+ cpuid_std_range_max_ptr = (u32 *) GET_RIP_RELATIVE_PTR(cpuid_std_range_max);
+ cpuid_hyp_range_max_ptr = (u32 *) GET_RIP_RELATIVE_PTR(cpuid_hyp_range_max);
+ cpuid_ext_range_max_ptr = (u32 *) GET_RIP_RELATIVE_PTR(cpuid_ext_range_max);
+
/* Initialize CPUID ranges for range-checking. */
for (i = 0; i < cpuid_table->count; i++) {
const struct snp_cpuid_fn *fn = &cpuid_table->fn[i];

if (fn->eax_in == 0x0)
- cpuid_std_range_max = fn->eax;
+ *cpuid_std_range_max_ptr = fn->eax;
else if (fn->eax_in == 0x40000000)
- cpuid_hyp_range_max = fn->eax;
+ *cpuid_hyp_range_max_ptr = fn->eax;
else if (fn->eax_in == 0x80000000)
- cpuid_ext_range_max = fn->eax;
+ *cpuid_ext_range_max_ptr = fn->eax;
}
}

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index c67285824e82..c966bc511949 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -5,6 +5,11 @@
* Copyright (C) 2019 SUSE
*
* Author: Joerg Roedel <[email protected]>
+ *
+ * WARNING!!
+ * Select functions in this file can execute prior to page table fixups and thus
+ * require pointer fixups for global variable accesses. See WARNING in
+ * arch/x86/kernel/head64.c.
*/

#define pr_fmt(fmt) "SEV: " fmt
@@ -748,7 +753,7 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
* This eliminates worries about jump tables or checking boot_cpu_data
* in the cc_platform_has() function.
*/
- if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ if (!(sev_get_status_fixup() & MSR_AMD64_SEV_SNP_ENABLED))
return;

/*
@@ -767,7 +772,7 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
* This eliminates worries about jump tables or checking boot_cpu_data
* in the cc_platform_has() function.
*/
- if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ if (!(sev_get_status_fixup() & MSR_AMD64_SEV_SNP_ENABLED))
return;

/* Ask hypervisor to mark the memory pages shared in the RMP table. */
@@ -2114,7 +2119,8 @@ void __init __noreturn snp_abort(void)

static void dump_cpuid_table(void)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table = (const struct
+ snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
int i = 0;

pr_info("count=%d reserved=0x%x reserved2=0x%llx\n",
@@ -2138,7 +2144,8 @@ static void dump_cpuid_table(void)
*/
static int __init report_cpuid_table(void)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table = (const struct
+ snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);

if (!cpuid_table->count)
return 0;
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index d73aeb16417f..f4c864ea2468 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -5,6 +5,11 @@
* Copyright (C) 2016 Advanced Micro Devices, Inc.
*
* Author: Tom Lendacky <[email protected]>
+ *
+ * WARNING!!
+ * Select functions in this file can execute prior to page table fixups and thus
+ * require pointer fixups for global variable accesses. See WARNING in
+ * arch/x86/kernel/head64.c.
*/

#define DISABLE_BRANCH_PROFILING
@@ -305,7 +310,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
* instrumentation or checking boot_cpu_data in the cc_platform_has()
* function.
*/
- if (!sme_get_me_mask() || sev_status & MSR_AMD64_SEV_ENABLED)
+ if (!sme_get_me_mask_fixup() || sev_get_status_fixup() & MSR_AMD64_SEV_ENABLED)
return;

/*
@@ -346,9 +351,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
* We're running identity mapped, so we must obtain the address to the
* SME encryption workarea using rip-relative addressing.
*/
- asm ("lea sme_workarea(%%rip), %0"
- : "=r" (workarea_start)
- : "p" (sme_workarea));
+ workarea_start = (unsigned long) PTR_TO_RIP_RELATIVE_PTR(sme_workarea);

/*
* Calculate required number of workarea bytes needed:
@@ -511,7 +514,7 @@ void __init sme_enable(struct boot_params *bp)
unsigned long me_mask;
char buffer[16];
bool snp;
- u64 msr;
+ u64 msr, *sme_me_mask_ptr, *sev_status_ptr;

snp = snp_init(bp);

@@ -542,12 +545,14 @@ void __init sme_enable(struct boot_params *bp)

me_mask = 1UL << (ebx & 0x3f);

+ sev_status_ptr = (u64 *) GET_RIP_RELATIVE_PTR(sev_status);
+
/* Check the SEV MSR whether SEV or SME is enabled */
- sev_status = __rdmsr(MSR_AMD64_SEV);
- feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
+ *sev_status_ptr = __rdmsr(MSR_AMD64_SEV);
+ feature_mask = (*sev_status_ptr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;

/* The SEV-SNP CC blob should never be present unless SEV-SNP is enabled. */
- if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ if (snp && !(*sev_status_ptr & MSR_AMD64_SEV_SNP_ENABLED))
snp_abort();

/* Check if memory encryption is enabled */
@@ -573,7 +578,8 @@ void __init sme_enable(struct boot_params *bp)
return;
} else {
/* SEV state cannot be controlled by a command line option */
- sme_me_mask = me_mask;
+ sme_me_mask_ptr = (u64 *) GET_RIP_RELATIVE_PTR(sme_me_mask);
+ *sme_me_mask_ptr = me_mask;
goto out;
}

@@ -582,15 +588,9 @@ void __init sme_enable(struct boot_params *bp)
* identity mapped, so we must obtain the address to the SME command
* line argument data using rip-relative addressing.
*/
- asm ("lea sme_cmdline_arg(%%rip), %0"
- : "=r" (cmdline_arg)
- : "p" (sme_cmdline_arg));
- asm ("lea sme_cmdline_on(%%rip), %0"
- : "=r" (cmdline_on)
- : "p" (sme_cmdline_on));
- asm ("lea sme_cmdline_off(%%rip), %0"
- : "=r" (cmdline_off)
- : "p" (sme_cmdline_off));
+ cmdline_arg = (const char *) PTR_TO_RIP_RELATIVE_PTR(sme_cmdline_arg);
+ cmdline_on = (const char *) PTR_TO_RIP_RELATIVE_PTR(sme_cmdline_on);
+ cmdline_off = (const char *) PTR_TO_RIP_RELATIVE_PTR(sme_cmdline_off);

if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
active_by_default = true;
@@ -603,16 +603,18 @@ void __init sme_enable(struct boot_params *bp)
if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0)
return;

+ sme_me_mask_ptr = (u64 *) GET_RIP_RELATIVE_PTR(sme_me_mask);
+
if (!strncmp(buffer, cmdline_on, sizeof(buffer)))
- sme_me_mask = me_mask;
+ *sme_me_mask_ptr = me_mask;
else if (!strncmp(buffer, cmdline_off, sizeof(buffer)))
- sme_me_mask = 0;
+ *sme_me_mask_ptr = 0;
else
- sme_me_mask = active_by_default ? me_mask : 0;
+ *sme_me_mask_ptr = active_by_default ? me_mask : 0;
out:
- if (sme_me_mask) {
- physical_mask &= ~sme_me_mask;
+ if (*sme_me_mask_ptr) {
+ physical_mask &= ~(*sme_me_mask_ptr);
cc_vendor = CC_VENDOR_AMD;
- cc_set_mask(sme_me_mask);
+ cc_set_mask(*sme_me_mask_ptr);
}
}
--
2.43.0.275.g3460e3d667-goog


2024-01-12 12:17:47

by Kirill A. Shutemov

Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Thu, Jan 11, 2024 at 10:36:50PM +0000, Kevin Loughlin wrote:
> SEV/SME code can execute prior to page table fixups for kernel
> relocation. However, as with global variables accessed in
> __startup_64(), the compiler is not required to generate RIP-relative
> accesses for SEV/SME global variables, causing certain flavors of SEV
> hosts and guests built with clang to crash during boot.
>
> While an attempt was made to force RIP-relative addressing for certain
> global SEV/SME variables via inline assembly (see snp_cpuid_get_table()
> for example), RIP-relative addressing must be pervasively enforced for
> SEV/SME global variables when accessed prior to page table fixups.
>
> __startup_64() already handles this issue for select non-SEV/SME global
> variables using fixup_pointer(), which adjusts the pointer relative to
> a `physaddr` argument. To avoid having to pass around this `physaddr`
> argument across all functions needing to apply pointer fixups, this
> patch introduces the macro GET_RIP_RELATIVE_PTR() (an abstraction of
> the existing snp_cpuid_get_table()), which generates an RIP-relative
> pointer to a passed variable. Similarly, PTR_TO_RIP_RELATIVE_PTR() is
> introduced to fixup an existing pointer value with RIP-relative logic.

Can we replace existing fixup_pointer() (and other fixup_*()) with the new
thing? I don't think we need two confusing things for the same function.

Also, is there any reason why GET_RIP_RELATIVE_PTR() and
PTR_TO_RIP_RELATIVE_PTR() have to be macros? Inline functions would be
cleaner.

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-01-12 18:29:57

by Kevin Loughlin

Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Fri, Jan 12, 2024 at 4:17 AM Kirill A. Shutemov
<[email protected]> wrote:
>
> Can we replace existing fixup_pointer() (and other fixup_*()) with the new
> thing? I don't think we need two confusing things for the same function.

Per my tests, yes we can; I replaced the fixup_*() functions with
GET_RIP_RELATIVE_PTR()/PTR_TO_RIP_RELATIVE_PTR(), and guests with and
without SEV, SEV-ES, and SEV-SNP all successfully booted under both
clang and gcc builds. I have a slight preference for sending that as a
separate follow-up commit, but please let me know if you feel
differently. Thanks.
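
For example, a typical conversion would look like this (illustrative,
based on a line in the current __startup_64()):

	- mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
	+ mask_ptr = GET_RIP_RELATIVE_PTR(__supported_pte_mask);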

> Also, is there any reason why GET_RIP_RELATIVE_PTR() and
> PTR_TO_RIP_RELATIVE_PTR() have to be macros? Inline functions would be
> cleaner.

I used macros because we need to use both the global variable itself
and the global variable's string name (obtained via #var in the macro)
in the inline assembly. As a secondary reason, the macro also avoids
the need to provide separate functions for each type of variable for
which we'd like to get RIP-relative pointers (ex: u64, unsigned int,
unsigned long, etc.).
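
For illustration, GET_RIP_RELATIVE_PTR(sme_me_mask) expands to roughly:

	/* Rough expansion of GET_RIP_RELATIVE_PTR(sme_me_mask). */
	void *rip_rel_ptr;
	asm ("lea sme_me_mask(%%rip), %0"
	     : "=r" (rip_rel_ptr)
	     : "p" (&sme_me_mask));

The #var stringification pastes the symbol name directly into the asm
template, which an inline function taking a pointer argument cannot do.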

2024-01-15 10:13:03

by Kirill A. Shutemov

Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Fri, Jan 12, 2024 at 10:29:36AM -0800, Kevin Loughlin wrote:
> On Fri, Jan 12, 2024 at 4:17 AM Kirill A. Shutemov
> <[email protected]> wrote:
> >
> > Can we replace existing fixup_pointer() (and other fixup_*()) with the new
> > thing? I don't think we need two confusing things for the same function.
>
> Per my tests, yes we can; I replaced the fixup_*() functions with
> GET_RIP_RELATIVE_PTR()/PTR_TO_RIP_RELATIVE_PTR(), and guests with and
> without SEV, SEV-ES, and SEV-SNP all successfully booted under both
> clang and gcc builds.

Okay good.

BTW, do we need both macros? Caller can do &var, right?

If we are okay with a single macro, maybe rename it to RIP_RELATIVE_PTR().
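
Something like this untested sketch:

	/* Untested sketch: single macro replacing both variants. */
	#define RIP_RELATIVE_PTR(var)				\
	({							\
		void *rip_rel_ptr;				\
		asm ("lea "#var"(%%rip), %0"			\
		     : "=r" (rip_rel_ptr)			\
		     : "p" (&(var)));				\
		rip_rel_ptr;					\
	})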

One other thing: I see you sprinkle casts for every use of the macros.
But why? void* can cast to any other pointer without explicit casting.

> I have a slight preference for sending that as a
> separate follow-up commit, but please let me know if you feel
> differently. Thanks.

I'm okay with a separate patch in the same patchset.

>
> > Also, is there any reason why GET_RIP_RELATIVE_PTR() and
> > PTR_TO_RIP_RELATIVE_PTR() have to be macros? Inline functions would be
> > cleaner.
>
> I used macros because we need to use both the global variable itself
> and the global variable's string name (obtained via #var in the macro)
> in the inline assembly. As a secondary reason, the macro also avoids
> the need to provide separate functions for each type of variable for
> which we'd like to get RIP-relative pointers (ex: u64, unsigned int,
> unsigned long, etc.).

If we do it only on pointers, wouldn't void * -> void * be enough?

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-01-15 15:53:54

by Tom Lendacky

Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On 1/11/24 16:36, Kevin Loughlin wrote:
> SEV/SME code can execute prior to page table fixups for kernel
> relocation. However, as with global variables accessed in
> __startup_64(), the compiler is not required to generate RIP-relative
> accesses for SEV/SME global variables, causing certain flavors of SEV
> hosts and guests built with clang to crash during boot.
>
> While an attempt was made to force RIP-relative addressing for certain
> global SEV/SME variables via inline assembly (see snp_cpuid_get_table()
> for example), RIP-relative addressing must be pervasively enforced for
> SEV/SME global variables when accessed prior to page table fixups.
>
> __startup_64() already handles this issue for select non-SEV/SME global
> variables using fixup_pointer(), which adjusts the pointer relative to
> a `physaddr` argument. To avoid having to pass around this `physaddr`
> argument across all functions needing to apply pointer fixups, this
> patch introduces the macro GET_RIP_RELATIVE_PTR() (an abstraction of
> the existing snp_cpuid_get_table()), which generates an RIP-relative
> pointer to a passed variable. Similarly, PTR_TO_RIP_RELATIVE_PTR() is
> introduced to fixup an existing pointer value with RIP-relative logic.
>
> Applying these macros to early SEV/SME code (alongside Adam Dunlap's
> necessary "[PATCH v2] x86/asm: Force native_apic_mem_read to use mov")
> enables previously-failing boots of clang builds to succeed, while
> preserving successful boot of gcc builds. Tested with and without SEV,
> SEV-ES, SEV-SNP enabled in guests built via both gcc and clang.
>
> Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
> Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
> Fixes: 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
> Fixes: c9f09539e16e ("x86/head/64: Check SEV encryption before switching to kernel page-table")
> Fixes: b577f542f93c ("x86/coco: Add API to handle encryption mask")
> Tested-by: Kevin Loughlin <[email protected]>
> Signed-off-by: Kevin Loughlin <[email protected]>
> ---
> arch/x86/coco/core.c | 22 +++++++----
> arch/x86/include/asm/mem_encrypt.h | 37 +++++++++++++++++-
> arch/x86/kernel/head64.c | 22 ++++++-----
> arch/x86/kernel/head_64.S | 4 +-
> arch/x86/kernel/sev-shared.c | 63 ++++++++++++++++--------------
> arch/x86/kernel/sev.c | 15 +++++--
> arch/x86/mm/mem_encrypt_identity.c | 50 ++++++++++++------------
> 7 files changed, 136 insertions(+), 77 deletions(-)
>
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index eeec9986570e..8c45b5643f48 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -5,6 +5,11 @@
> * Copyright (C) 2021 Advanced Micro Devices, Inc.
> *
> * Author: Tom Lendacky <[email protected]>
> + *
> + * WARNING!!
> + * Select functions in this file can execute prior to page table fixups and thus
> + * require pointer fixups for global variable accesses. See WARNING in
> + * arch/x86/kernel/head64.c.
> */
>
> #include <linux/export.h>
> @@ -61,33 +66,34 @@ static __maybe_unused __always_inline bool amd_cc_platform_vtom(enum cc_attr att
> static bool noinstr amd_cc_platform_has(enum cc_attr attr)
> {
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> + const u64 sev_status_fixed_up = sev_get_status_fixup();

Why not also have a variable for sme_me_mask?
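
E.g., mirroring what the patch already does for sev_status:

	const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();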

>
> - if (sev_status & MSR_AMD64_SNP_VTOM)
> + if (sev_status_fixed_up & MSR_AMD64_SNP_VTOM)
> return amd_cc_platform_vtom(attr);
>
> switch (attr) {
> case CC_ATTR_MEM_ENCRYPT:
> - return sme_me_mask;
> + return sme_get_me_mask_fixup();
>
> case CC_ATTR_HOST_MEM_ENCRYPT:
> - return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
> + return sme_get_me_mask_fixup() && !(sev_status_fixed_up & MSR_AMD64_SEV_ENABLED);
>
> case CC_ATTR_GUEST_MEM_ENCRYPT:
> - return sev_status & MSR_AMD64_SEV_ENABLED;
> + return sev_status_fixed_up & MSR_AMD64_SEV_ENABLED;
>
> case CC_ATTR_GUEST_STATE_ENCRYPT:
> - return sev_status & MSR_AMD64_SEV_ES_ENABLED;
> + return sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED;
>
> /*
> * With SEV, the rep string I/O instructions need to be unrolled
> * but SEV-ES supports them through the #VC handler.
> */
> case CC_ATTR_GUEST_UNROLL_STRING_IO:
> - return (sev_status & MSR_AMD64_SEV_ENABLED) &&
> - !(sev_status & MSR_AMD64_SEV_ES_ENABLED);
> + return (sev_status_fixed_up & MSR_AMD64_SEV_ENABLED) &&
> + !(sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED);
>
> case CC_ATTR_GUEST_SEV_SNP:
> - return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
> + return sev_status_fixed_up & MSR_AMD64_SEV_SNP_ENABLED;
>
> default:
> return false;
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 359ada486fa9..d007050a0edc 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -17,6 +17,34 @@
>
> #include <asm/bootparam.h>
>
> +/*
> + * Generates an RIP-relative pointer to a data variable "var".
> + * This macro can be used to safely access global data variables prior to kernel
> + * relocation, similar to fixup_pointer() in arch/x86/kernel/head64.c.
> + */
> +#define GET_RIP_RELATIVE_PTR(var) \
> +({ \
> + void *rip_rel_ptr; \
> + asm ("lea "#var"(%%rip), %0" \
> + : "=r" (rip_rel_ptr) \
> + : "p" (&var)); \
> + rip_rel_ptr; \
> +})
> +
> +/*
> + * Converts an existing pointer "ptr" to an RIP-relative pointer.
> + * This macro can be used to safely access global pointers prior to kernel
> + * relocation, similar to fixup_pointer() in arch/x86/kernel/head64.c.
> + */
> +#define PTR_TO_RIP_RELATIVE_PTR(ptr) \
> +({ \
> + void *rip_rel_ptr; \
> + asm ("lea "#ptr"(%%rip), %0" \
> + : "=r" (rip_rel_ptr) \
> + : "p" (ptr)); \
> + rip_rel_ptr; \
> +})
> +
> #ifdef CONFIG_X86_MEM_ENCRYPT
> void __init mem_encrypt_init(void);
> void __init mem_encrypt_setup_arch(void);
> @@ -106,9 +134,14 @@ void add_encrypt_protection_map(void);
>
> extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[];
>
> -static inline u64 sme_get_me_mask(void)
> +static inline u64 sme_get_me_mask_fixup(void)
> +{
> + return *((u64 *) GET_RIP_RELATIVE_PTR(sme_me_mask));
> +}
> +
> +static inline u64 sev_get_status_fixup(void)
> {
> - return sme_me_mask;
> + return *((u64 *) GET_RIP_RELATIVE_PTR(sev_status));
> }
>
> #endif /* __ASSEMBLY__ */
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index dc0956067944..8df7a198094d 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -130,6 +130,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> {
> unsigned long vaddr, vaddr_end;
> int i;
> + const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();

Should be the first variable listed given the length of the line.

>
> /* Encrypt the kernel and related (if SME is active) */
> sme_encrypt_kernel(bp);
> @@ -140,7 +141,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> * there is no need to zero it after changing the memory encryption
> * attribute.
> */
> - if (sme_get_me_mask()) {
> + if (sme_me_mask_fixed_up) {
> vaddr = (unsigned long)__start_bss_decrypted;
> vaddr_end = (unsigned long)__end_bss_decrypted;
>
> @@ -158,7 +159,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
>
> i = pmd_index(vaddr);
> - pmd[i] -= sme_get_me_mask();
> + pmd[i] -= sme_me_mask_fixed_up;
> }
> }
>
> @@ -166,14 +167,16 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> * Return the SME encryption mask (if SME is active) to be used as a
> * modifier for the initial pgdir entry programmed into CR3.
> */
> - return sme_get_me_mask();
> + return sme_me_mask_fixed_up;
> }
>
> -/* Code in __startup_64() can be relocated during execution, but the compiler
> +/*
> + * WARNING!!
> + * Code in __startup_64() can be relocated during execution, but the compiler
> * doesn't have to generate PC-relative relocations when accessing globals from
> * that function. Clang actually does not generate them, which leads to
> * boot-time crashes. To work around this problem, every global pointer must
> - * be adjusted using fixup_pointer().
> + * be adjusted using fixup_pointer() or GET_RIP_RELATIVE_PTR().
> */
> unsigned long __head __startup_64(unsigned long physaddr,
> struct boot_params *bp)
> @@ -188,6 +191,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> bool la57;
> int i;
> unsigned int *next_pgt_ptr;
> + const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();

Ditto here, this should be higher in the variable list.

>
> la57 = check_la57_support(physaddr);
>
> @@ -206,7 +210,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> for (;;);
>
> /* Include the SME encryption mask in the fixup value */
> - load_delta += sme_get_me_mask();
> + load_delta += sme_me_mask_fixed_up;
>
> /* Fixup the physical addresses in the page table */
>
> @@ -242,7 +246,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
>
> - pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
> + pgtable_flags = _KERNPG_TABLE_NOENC + sme_me_mask_fixed_up;
>
> if (la57) {
> p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
> @@ -269,7 +273,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> /* Filter out unsupported __PAGE_KERNEL_* bits: */
> mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
> pmd_entry &= *mask_ptr;
> - pmd_entry += sme_get_me_mask();
> + pmd_entry += sme_me_mask_fixed_up;
> pmd_entry += physaddr;
>
> for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
> @@ -313,7 +317,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> * Fixup phys_base - remove the memory encryption mask to obtain
> * the true physical address.
> */
> - *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
> + *fixup_long(&phys_base, physaddr) += load_delta - sme_me_mask_fixed_up;
>
> return sme_postprocess_startup(bp, pmd);
> }
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index d4918d03efb4..b9e52cee6e00 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -176,9 +176,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
> /*
> * Retrieve the modifier (SME encryption mask if SME is active) to be
> * added to the initial pgdir entry that will be programmed into CR3.
> + * Since we may have not completed page table fixups, use RIP-relative
> + * addressing for sme_me_mask.
> */
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> - movq sme_me_mask, %rax
> + movq sme_me_mask(%rip), %rax
> #else
> xorq %rax, %rax
> #endif
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 1d24ec679915..e71752c990ef 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -7,6 +7,11 @@
> * This file is not compiled stand-alone. It contains code shared
> * between the pre-decompression boot code and the running Linux kernel
> * and is included directly into both code-bases.
> + *
> + * WARNING!!
> + * Select functions in this file can execute prior to page table fixups and thus
> + * require pointer fixups for global variable accesses. See WARNING in
> + * arch/x86/kernel/head64.c.
> */
>
> #ifndef __BOOT_COMPRESSED
> @@ -110,8 +115,9 @@ static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
> static u64 get_hv_features(void)
> {
> u64 val;
> + const u16 *ghcb_version_ptr = (const u16 *) GET_RIP_RELATIVE_PTR(ghcb_version);

Is this one really needed? The ghcb_version variable isn't referenced
before fixup, right? It's referenced in both decompression and early boot,
but I don't think a fixup is needed.

Can you elaborate on the call tree/call time when this is needed?

>
> - if (ghcb_version < 2)
> + if (*ghcb_version_ptr < 2)
> return 0;
>
> sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
> @@ -143,6 +149,7 @@ static void snp_register_ghcb_early(unsigned long paddr)
> static bool sev_es_negotiate_protocol(void)
> {
> u64 val;
> + u16 *ghcb_version_ptr;
>
> /* Do the GHCB protocol version negotiation */
> sev_es_wr_ghcb_msr(GHCB_MSR_SEV_INFO_REQ);
> @@ -156,7 +163,8 @@ static bool sev_es_negotiate_protocol(void)
> GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
> return false;
>
> - ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
> + ghcb_version_ptr = (u16 *) GET_RIP_RELATIVE_PTR(ghcb_version);
> + *ghcb_version_ptr = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);

Ditto here.

>
> return true;
> }
> @@ -318,23 +326,6 @@ static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid
> : __sev_cpuid_hv_msr(leaf);
> }
>
> -/*
> - * This may be called early while still running on the initial identity
> - * mapping. Use RIP-relative addressing to obtain the correct address
> - * while running with the initial identity mapping as well as the
> - * switch-over to kernel virtual addresses later.
> - */
> -static const struct snp_cpuid_table *snp_cpuid_get_table(void)
> -{
> - void *ptr;
> -
> - asm ("lea cpuid_table_copy(%%rip), %0"
> - : "=r" (ptr)
> - : "p" (&cpuid_table_copy));
> -
> - return ptr;
> -}
> -
> /*
> * The SNP Firmware ABI, Revision 0.9, Section 7.1, details the use of
> * XCR0_IN and XSS_IN to encode multiple versions of 0xD subfunctions 0
> @@ -357,7 +348,8 @@ static const struct snp_cpuid_table *snp_cpuid_get_table(void)
> */
> static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = (const struct
> + snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);

Can you use typeof in the macro to eliminate this casting? Right now the
macro returns void *, so the casting isn't really needed anyway, right?

Otherwise, I say move the assignment out of the variable section.
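
E.g., something like this (untested):

	/* Untested: typeof() keeps the pointer type, avoiding casts. */
	#define GET_RIP_RELATIVE_PTR(var)			\
	({							\
		typeof(&(var)) rip_rel_ptr;			\
		asm ("lea "#var"(%%rip), %0"			\
		     : "=r" (rip_rel_ptr)			\
		     : "p" (&(var)));				\
		rip_rel_ptr;					\
	})

so the macro returns a correctly-typed pointer and call sites can drop
the casts.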

> u64 xfeatures_found = 0;
> u32 xsave_size = 0x240;
> int i;
> @@ -394,7 +386,8 @@ static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
> static bool
> snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = (const struct
> + snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);

Ditto here.

> int i;
>
> for (i = 0; i < cpuid_table->count; i++) {
> @@ -530,7 +523,9 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> */
> static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = (const struct
> + snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
> + const u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;

Move this to the top of the variable definitions.

>
> if (!cpuid_table->count)
> return -EOPNOTSUPP;
> @@ -555,10 +550,14 @@ static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_le
> */
> leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0;
>
> + cpuid_std_range_max_ptr = (const u32 *) GET_RIP_RELATIVE_PTR(cpuid_std_range_max);
> + cpuid_hyp_range_max_ptr = (const u32 *) GET_RIP_RELATIVE_PTR(cpuid_hyp_range_max);
> + cpuid_ext_range_max_ptr = (const u32 *) GET_RIP_RELATIVE_PTR(cpuid_ext_range_max);
> +
> /* Skip post-processing for out-of-range zero leafs. */
> - if (!(leaf->fn <= cpuid_std_range_max ||
> - (leaf->fn >= 0x40000000 && leaf->fn <= cpuid_hyp_range_max) ||
> - (leaf->fn >= 0x80000000 && leaf->fn <= cpuid_ext_range_max)))
> + if (!(leaf->fn <= *cpuid_std_range_max_ptr ||
> + (leaf->fn >= 0x40000000 && leaf->fn <= *cpuid_hyp_range_max_ptr) ||
> + (leaf->fn >= 0x80000000 && leaf->fn <= *cpuid_ext_range_max_ptr)))
> return 0;
> }
>
> @@ -1046,6 +1045,7 @@ static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
> static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
> {
> const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table;
> + u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;

Move this to the top of the variable definitions.

> int i;
>
> if (!cc_info || !cc_info->cpuid_phys || cc_info->cpuid_len < PAGE_SIZE)
> @@ -1055,19 +1055,24 @@ static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
> if (!cpuid_table_fw->count || cpuid_table_fw->count > SNP_CPUID_COUNT_MAX)
> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID);
>
> - cpuid_table = snp_cpuid_get_table();
> + cpuid_table = (const struct snp_cpuid_table *) GET_RIP_RELATIVE_PTR(
> + cpuid_table_copy);

This can be a single line since the line length limit is now
100 characters.

> memcpy((void *)cpuid_table, cpuid_table_fw, sizeof(*cpuid_table));
>
> + cpuid_std_range_max_ptr = (u32 *) GET_RIP_RELATIVE_PTR(cpuid_std_range_max);
> + cpuid_hyp_range_max_ptr = (u32 *) GET_RIP_RELATIVE_PTR(cpuid_hyp_range_max);
> + cpuid_ext_range_max_ptr = (u32 *) GET_RIP_RELATIVE_PTR(cpuid_ext_range_max);
> +
> /* Initialize CPUID ranges for range-checking. */
> for (i = 0; i < cpuid_table->count; i++) {
> const struct snp_cpuid_fn *fn = &cpuid_table->fn[i];
>
> if (fn->eax_in == 0x0)
> - cpuid_std_range_max = fn->eax;
> + *cpuid_std_range_max_ptr = fn->eax;
> else if (fn->eax_in == 0x40000000)
> - cpuid_hyp_range_max = fn->eax;
> + *cpuid_hyp_range_max_ptr = fn->eax;
> else if (fn->eax_in == 0x80000000)
> - cpuid_ext_range_max = fn->eax;
> + *cpuid_ext_range_max_ptr = fn->eax;
> }
> }
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index c67285824e82..c966bc511949 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -5,6 +5,11 @@
> * Copyright (C) 2019 SUSE
> *
> * Author: Joerg Roedel <[email protected]>
> + *
> + * WARNING!!
> + * Select functions in this file can execute prior to page table fixups and thus
> + * require pointer fixups for global variable accesses. See WARNING in
> + * arch/x86/kernel/head64.c.
> */
>
> #define pr_fmt(fmt) "SEV: " fmt
> @@ -748,7 +753,7 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
> * This eliminates worries about jump tables or checking boot_cpu_data
> * in the cc_platform_has() function.
> */
> - if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
> + if (!(sev_get_status_fixup() & MSR_AMD64_SEV_SNP_ENABLED))
> return;
>
> /*
> @@ -767,7 +772,7 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
> * This eliminates worries about jump tables or checking boot_cpu_data
> * in the cc_platform_has() function.
> */
> - if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
> + if (!(sev_get_status_fixup() & MSR_AMD64_SEV_SNP_ENABLED))
> return;
>
> /* Ask hypervisor to mark the memory pages shared in the RMP table. */
> @@ -2114,7 +2119,8 @@ void __init __noreturn snp_abort(void)
>
> static void dump_cpuid_table(void)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = (const struct
> + snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);

Same comment on the casting and placement.

> int i = 0;
>
> pr_info("count=%d reserved=0x%x reserved2=0x%llx\n",
> @@ -2138,7 +2144,8 @@ static void dump_cpuid_table(void)
> */
> static int __init report_cpuid_table(void)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = (const struct
> + snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);

Ditto.

>
> if (!cpuid_table->count)
> return 0;
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index d73aeb16417f..f4c864ea2468 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -5,6 +5,11 @@
> * Copyright (C) 2016 Advanced Micro Devices, Inc.
> *
> * Author: Tom Lendacky <[email protected]>
> + *
> + * WARNING!!
> + * Select functions in this file can execute prior to page table fixups and thus
> + * require pointer fixups for global variable accesses. See WARNING in
> + * arch/x86/kernel/head64.c.
> */
>
> #define DISABLE_BRANCH_PROFILING
> @@ -305,7 +310,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
> * instrumentation or checking boot_cpu_data in the cc_platform_has()
> * function.
> */
> - if (!sme_get_me_mask() || sev_status & MSR_AMD64_SEV_ENABLED)
> + if (!sme_get_me_mask_fixup() || sev_get_status_fixup() & MSR_AMD64_SEV_ENABLED)
> return;
>
> /*
> @@ -346,9 +351,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
> * We're running identity mapped, so we must obtain the address to the
> * SME encryption workarea using rip-relative addressing.
> */
> - asm ("lea sme_workarea(%%rip), %0"
> - : "=r" (workarea_start)
> - : "p" (sme_workarea));
> + workarea_start = (unsigned long) PTR_TO_RIP_RELATIVE_PTR(sme_workarea);
>
> /*
> * Calculate required number of workarea bytes needed:
> @@ -511,7 +514,7 @@ void __init sme_enable(struct boot_params *bp)
> unsigned long me_mask;
> char buffer[16];
> bool snp;
> - u64 msr;
> + u64 msr, *sme_me_mask_ptr, *sev_status_ptr;

Move up in the variable definitions to maintain coding standards.

>
> snp = snp_init(bp);
>
> @@ -542,12 +545,14 @@ void __init sme_enable(struct boot_params *bp)
>
> me_mask = 1UL << (ebx & 0x3f);
>
> + sev_status_ptr = (u64 *) GET_RIP_RELATIVE_PTR(sev_status);
> +
> /* Check the SEV MSR whether SEV or SME is enabled */
> - sev_status = __rdmsr(MSR_AMD64_SEV);
> - feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
> + *sev_status_ptr = __rdmsr(MSR_AMD64_SEV);

You can remove the extra spaces before the "=" now since there isn't any
alignment to be done.

Thanks,
Tom

> + feature_mask = (*sev_status_ptr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
>
> /* The SEV-SNP CC blob should never be present unless SEV-SNP is enabled. */
> - if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
> + if (snp && !(*sev_status_ptr & MSR_AMD64_SEV_SNP_ENABLED))
> snp_abort();
>
> /* Check if memory encryption is enabled */
> @@ -573,7 +578,8 @@ void __init sme_enable(struct boot_params *bp)
> return;
> } else {
> /* SEV state cannot be controlled by a command line option */
> - sme_me_mask = me_mask;
> + sme_me_mask_ptr = (u64 *) GET_RIP_RELATIVE_PTR(sme_me_mask);
> + *sme_me_mask_ptr = me_mask;
> goto out;
> }
>
> @@ -582,15 +588,9 @@ void __init sme_enable(struct boot_params *bp)
> * identity mapped, so we must obtain the address to the SME command
> * line argument data using rip-relative addressing.
> */
> - asm ("lea sme_cmdline_arg(%%rip), %0"
> - : "=r" (cmdline_arg)
> - : "p" (sme_cmdline_arg));
> - asm ("lea sme_cmdline_on(%%rip), %0"
> - : "=r" (cmdline_on)
> - : "p" (sme_cmdline_on));
> - asm ("lea sme_cmdline_off(%%rip), %0"
> - : "=r" (cmdline_off)
> - : "p" (sme_cmdline_off));
> + cmdline_arg = (const char *) PTR_TO_RIP_RELATIVE_PTR(sme_cmdline_arg);
> + cmdline_on = (const char *) PTR_TO_RIP_RELATIVE_PTR(sme_cmdline_on);
> + cmdline_off = (const char *) PTR_TO_RIP_RELATIVE_PTR(sme_cmdline_off);
>
> if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
> active_by_default = true;
> @@ -603,16 +603,18 @@ void __init sme_enable(struct boot_params *bp)
> if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0)
> return;
>
> + sme_me_mask_ptr = (u64 *) GET_RIP_RELATIVE_PTR(sme_me_mask);
> +
> if (!strncmp(buffer, cmdline_on, sizeof(buffer)))
> - sme_me_mask = me_mask;
> + *sme_me_mask_ptr = me_mask;
> else if (!strncmp(buffer, cmdline_off, sizeof(buffer)))
> - sme_me_mask = 0;
> + *sme_me_mask_ptr = 0;
> else
> - sme_me_mask = active_by_default ? me_mask : 0;
> + *sme_me_mask_ptr = active_by_default ? me_mask : 0;
> out:
> - if (sme_me_mask) {
> - physical_mask &= ~sme_me_mask;
> + if (*sme_me_mask_ptr) {
> + physical_mask &= ~(*sme_me_mask_ptr);
> cc_vendor = CC_VENDOR_AMD;
> - cc_set_mask(sme_me_mask);
> + cc_set_mask(*sme_me_mask_ptr);
> }
> }

2024-01-15 20:47:43

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Thu, Jan 11, 2024 at 10:36:50PM +0000, Kevin Loughlin wrote:
> SEV/SME code can execute prior to page table fixups for kernel
> relocation. However, as with global variables accessed in
> __startup_64(), the compiler is not required to generate RIP-relative
> accesses for SEV/SME global variables, causing certain flavors of SEV
> hosts and guests built with clang to crash during boot.

So, about that. If I understand my gcc toolchain folks correctly:

mcmodel=kernel - everything fits into the high 31 bit of the address
space

-fPIE/PIC - position independent

And supplying both doesn't make a whole lot of sense: if you're building
position-independent, then mcmodel=kernel would be overridden by the
latter.

I have no clue why clang enabled it...

So, *actually* the proper fix here should be not to add this "fixed_up"
gunk everywhere but remove mcmodel=kernel from the build and simply do
-fPIE/PIC.

I'd say...

I could also be missing something obvious ofc.

> Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
> Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
> Fixes: 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
> Fixes: c9f09539e16e ("x86/head/64: Check SEV encryption before switching to kernel page-table")
> Fixes: b577f542f93c ("x86/coco: Add API to handle encryption mask")
> Tested-by: Kevin Loughlin <[email protected]>

You don't need to add your Tested-by tag - it is kinda assumed that
people submit patches *after* testing them. Although I have a gazillion
examples where that is not the case...

:-\

> Signed-off-by: Kevin Loughlin <[email protected]>


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-01-16 22:44:13

by Kevin Loughlin

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Mon, Jan 15, 2024 at 2:12 AM Kirill A. Shutemov
<[email protected]> wrote:
>
> On Fri, Jan 12, 2024 at 10:29:36AM -0800, Kevin Loughlin wrote:
> >
> > Per my tests, yes we can; I replaced the fixup_*() functions with
> > GET_RIP_RELATIVE_PTR()/PTR_TO_RIP_RELATIVE_PTR(), and guests with and
> > without SEV, SEV-ES, and SEV-SNP all successfully booted under both
> > clang and gcc builds.
>
> BTW, do we need both macros? Caller can do &var, right?

While I don't think the caller doing "&var" would work without passing
it as a separate argument like `GET_RIP_RELATIVE_PTR(var, &var)` (as
we would still need the original var's string name in the macro for
the inline assembly `#var(%%rip)`), we should nonetheless be able to
merge both into a single macro with one "var" argument. Specifically,
the only current difference between the macros is the input operand
constraint, and GET_RIP_RELATIVE_PTR()'s constraint will work for
both. I will make this change in v3.
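
Concretely, the merged macro would look something like this sketch
(final name still to be decided, see below); it is the v2 form of
GET_RIP_RELATIVE_PTR() with the "p" constraint retained for both use
cases:

    #define GET_RIP_RELATIVE_PTR(var)			\
    ({							\
    	void *rip_rel_ptr;				\
    	asm ("lea "#var"(%%rip), %0"			\
    	     : "=r" (rip_rel_ptr)			\
    	     : "p" (&var));				\
    	rip_rel_ptr;					\
    })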

> If we are okay with single macro, maybe rename it to RIP_RELATIVE_PTR().

With the merge into a single macro (and upon thinking about the
macro's behavior), I have a slight preference for
`RIP_RELATIVE_ADDR()` in v3 because it makes it clearer that the macro
behaves like the address-of operator "&" (just guaranteeing the use of
RIP-relative addressing to obtain the address). However, I'm happy to
go with RIP_RELATIVE_PTR() if you feel that's better.

> One other thing: I see you sprinkle casts to for every use of the macros.
> But why? void* can cast to any other pointer without explicit casting.

You're right; the casting is unnecessary. I'll eliminate it in v3. Thanks.
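
For example, call sites can then shrink to something like:

    const struct snp_cpuid_table *cpuid_table = GET_RIP_RELATIVE_PTR(cpuid_table_copy);

with the macro's void * result converting implicitly.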

> > On Fri, Jan 12, 2024 at 4:17 AM Kirill A. Shutemov
> > <[email protected]> wrote:
> > >
> > > Also, is there any reason why GET_RIP_RELATIVE_PTR() and
> > > PTR_TO_RIP_RELATIVE_PTR() have to be macros? Inline functions would be
> > > cleaner.
> >
> > I used macros because we need to use both the global variable itself
> > and the global variable's string name (obtained via #var in the macro)
> > in the inline assembly. As a secondary reason, the macro also avoids
> > the need to provide separate functions for each type of variable for
> > which we'd like to get RIP-relative pointers (ex: u64, unsigned int,
> > unsigned long, etc.).
>
> If we do it only on pointers, wouldn't void * -> void * be enough?

Only using pointers would indeed eliminate the secondary factor as a
reason to use macros. However, the primary motivation for a macro
would remain: we still need the string name of the variable for the
inline assembly.

2024-01-16 23:44:32

by Kevin Loughlin

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Mon, Jan 15, 2024 at 7:53 AM Tom Lendacky <[email protected]> wrote:
>
> On 1/11/24 16:36, Kevin Loughlin wrote:
> >
> > @@ -61,33 +66,34 @@ static __maybe_unused __always_inline bool amd_cc_platform_vtom(enum cc_attr att
> > static bool noinstr amd_cc_platform_has(enum cc_attr attr)
> > {
> > #ifdef CONFIG_AMD_MEM_ENCRYPT
> > + const u64 sev_status_fixed_up = sev_get_status_fixup();
>
> Why not also have a variable for sme_me_mask?

`sme_get_me_mask_fixup()` is only used on certain conditional paths in
the calculation of the return value (therefore, at most once per
invocation of `amd_cc_platform_has()`). As such, calling
`sme_get_me_mask_fixup()` unconditionally at the beginning of the function
would be unnecessary for some invocations of `amd_cc_platform_has()`.
In contrast, the sev_status is needed on every invocation of
`amd_cc_platform_has()`. Additionally, the `sev_get_status_fixup()`
result is potentially used multiple times in the same invocation of
`amd_cc_platform_has()`, motivating the use of a local variable.

> > @@ -130,6 +130,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> > {
> > unsigned long vaddr, vaddr_end;
> > int i;
> > + const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
>
> Should be the first variable listed given the length of the line.

I will incorporate this and all other stylistic changes that you
mentioned in v3.

> > @@ -110,8 +115,9 @@ static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
> > static u64 get_hv_features(void)
> > {
> > u64 val;
> > + const u16 *ghcb_version_ptr = (const u16 *) GET_RIP_RELATIVE_PTR(ghcb_version);
>
> Is this one really needed? The ghcb_version variable isn't referenced
> before fixup, right? It's referenced in both decompression and early boot,
> but I didn't think a fixup is needed.

You're right; it looks like we do *not* need the fixup for
ghcb_version in both instances that you identified. I will remove
these particular fixups in v3.

Thanks!

2024-01-17 00:08:16

by Kevin Loughlin

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Mon, Jan 15, 2024 at 12:47 PM Borislav Petkov <[email protected]> wrote:
>
> On Thu, Jan 11, 2024 at 10:36:50PM +0000, Kevin Loughlin wrote:
> > SEV/SME code can execute prior to page table fixups for kernel
> > relocation. However, as with global variables accessed in
> > __startup_64(), the compiler is not required to generate RIP-relative
> > accesses for SEV/SME global variables, causing certain flavors of SEV
> > hosts and guests built with clang to crash during boot.
>
> So, about that. If I understand my gcc toolchain folks correctly:
>
> mcmodel=kernel - everything fits into the high 31 bit of the address
> space
>
> -fPIE/PIC - position independent
>
> And supplying both doesn't make a whole lot of sense: if you're building
> position-independent, then mcmodel=kernel would be overridden by the
> latter.
>
> I have no clue why clang enabled it...
>
> So, *actually* the proper fix here should be not to add this "fixed_up"
> gunk everywhere but remove mcmodel=kernel from the build and simply do
> -fPIE/PIC.

I believe that the key distinction is that using mcmodel=kernel (upper
2 GB of address space) or the similar mcmodel=small (lower 2 GB) means
the compiler *can* use RIP-relative addressing for globals (because
everything is within +/- 2GB of RIP) but is not *required* to do so.
In contrast, fPIE/fPIC *requires* relative addressing but does not
necessarily require a specific 2 GB placement range. Altogether, I do
think there are use cases for both options individually. I can't think
of a reason why gcc wouldn't be able to support mcmodel=kernel in
conjunction with fPIE off the top of my head, but I admittedly haven't
looked into it; I simply observed that the combination is not
currently supported.
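
As an illustrative (not compiler-generated) example, mcmodel=kernel
alone permits either of the following encodings for a global load,
while fPIE/fPIC rules out the first:

    movq	sev_status, %rax	# absolute, assumes link-time address
    movq	sev_status(%rip), %rax	# RIP-relative, valid at any offset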

RE: compiling the whole x86-64 kernel with fPIE/fPIC, I believe the
required changes would be very extensive (see "[PATCH RFC 00/43]
x86/pie: Make kernel image's virtual address flexible" at
https://lore.kernel.org/lkml/[email protected]/).

2024-01-17 10:59:35

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Mon, 15 Jan 2024 at 21:47, Borislav Petkov <[email protected]> wrote:
>
> On Thu, Jan 11, 2024 at 10:36:50PM +0000, Kevin Loughlin wrote:
> > SEV/SME code can execute prior to page table fixups for kernel
> > relocation. However, as with global variables accessed in
> > __startup_64(), the compiler is not required to generate RIP-relative
> > accesses for SEV/SME global variables, causing certain flavors of SEV
> > hosts and guests built with clang to crash during boot.
>
> So, about that. If I understand my gcc toolchain folks correctly:
>
> mcmodel=kernel - everything fits into the high 31 bit of the address
> space
>
> -fPIE/PIC - position independent
>
> And supplying both doesn't make a whole lot of sense: if you're building
> position-independent, then mcmodel=kernel would be overridden by the
> latter.
>
> I have no clue why clang enabled it...
>
> So, *actually* the proper fix here should be not to add this "fixed_up"
> gunk everywhere but remove mcmodel=kernel from the build and simply do
> -fPIE/PIC.
>

Fully agree. All this fiddling with RIP relative references from C
code is going to be a maintenance burden going forward.

The proper way to do this is use PIC codegen for the objects that
matter. I had a stab [0] at this a while ago (for the purpose of
increasing the KASLR range, which requires PIE linking) but I didn't
pursue it in the end.

On arm64, we use a separate pseudo-namespace for code that can run
safely at any offset, using the __pi_ prefix (for Position
Independent). Using symbol prefixing at the linker level, we ensure
that __pi_ code can only call other __pi_ code, or code that has been
made available to it via an explicit __pi_ prefixed alias. (Happy to
elaborate more but we should find a smaller audience - your cc list is
a tad long). Perhaps this is something we should explore on x86 as
well (note that the EFI stub does something similar for architectures
that link the EFI stub into the core kernel rather than into the
decompressor)



[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=x86-pie&id=4ba81de75d92fafdab2a8a389d1b7791dddf74f3

2024-01-17 11:39:31

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Wed, Jan 17, 2024 at 11:59:14AM +0100, Ard Biesheuvel wrote:
> On Mon, 15 Jan 2024 at 21:47, Borislav Petkov <[email protected]> wrote:
> >
> > On Thu, Jan 11, 2024 at 10:36:50PM +0000, Kevin Loughlin wrote:
> > > SEV/SME code can execute prior to page table fixups for kernel
> > > relocation. However, as with global variables accessed in
> > > __startup_64(), the compiler is not required to generate RIP-relative
> > > accesses for SEV/SME global variables, causing certain flavors of SEV
> > > hosts and guests built with clang to crash during boot.
> >
> > So, about that. If I understand my gcc toolchain folks correctly:
> >
> > mcmodel=kernel - everything fits into the high 31 bit of the address
> > space
> >
> > -fPIE/PIC - position independent
> >
> > And supplying both doesn't make a whole lot of sense: if you're building
> > position-independent, then mcmodel=kernel would be overridden by the
> > latter.
> >
> > I have no clue why clang enabled it...
> >
> > So, *actually* the proper fix here should be not to add this "fixed_up"
> > gunk everywhere but remove mcmodel=kernel from the build and simply do
> > -fPIE/PIC.

For the SEV file this might not work because it also has functions
that get called later at runtime, and may need to reference real
globals. I doubt the linker could resolve that.

For linking the whole kernel, I haven't seen the latest numbers, but
traditionally -fPIE/PIC cost some performance because globals get loaded
through the GOT instead of directly as immediates. That's why the original
x86-64 port went with -mcmodel=kernel.

Of course for the startup code it doesn't matter, but it might make
a difference for hot path code.

> >
>
> Fully agree. All this fiddling with RIP relative references from C
> code is going to be a maintenance burden going forward.

IIUC it's only a few functions in this case, so it shouldn't be that bad.
The early x86 startup code has a few other areas with odd restrictions,
so it's not unprecedented.

-Andi

2024-01-17 11:55:44

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Wed, 17 Jan 2024 at 12:39, Andi Kleen <[email protected]> wrote:
>
> On Wed, Jan 17, 2024 at 11:59:14AM +0100, Ard Biesheuvel wrote:
> > On Mon, 15 Jan 2024 at 21:47, Borislav Petkov <[email protected]> wrote:
> > >
> > > On Thu, Jan 11, 2024 at 10:36:50PM +0000, Kevin Loughlin wrote:
> > > > SEV/SME code can execute prior to page table fixups for kernel
> > > > relocation. However, as with global variables accessed in
> > > > __startup_64(), the compiler is not required to generate RIP-relative
> > > > accesses for SEV/SME global variables, causing certain flavors of SEV
> > > > hosts and guests built with clang to crash during boot.
> > >
> > > So, about that. If I understand my gcc toolchain folks correctly:
> > >
> > > mcmodel=kernel - everything fits into the high 31 bit of the address
> > > space
> > >
> > > -fPIE/PIC - position independent
> > >
> > > And supplying both doesn't make a whole lot of sense: if you're building
> > > position-independent, then mcmodel=kernel would be overridden by the
> > > latter.
> > >
> > > I have no clue why clang enabled it...
> > >
> > > So, *actually* the proper fix here should be not to add this "fixed_up"
> > > gunk everywhere but remove mcmodel=kernel from the build and simply do
> > > -fPIE/PIC.
>
> For the SEV file this might not work because it also has functions
> that get called later at runtime, and may need to reference real
> globals. I doubt the linker could resolve that.
>

I don't think that should be a problem. If the code and data are
within -/+ 2G of each other, RIP-relative references should always be
in range.

> For linking the whole kernel, I haven't seen the latest numbers, but
> traditionally -fPIE/PIC cost some performance because globals get loaded
> through the GOT instead of directly as immediates. That's why the original
> x86-64 port went with -mcmodel=kernel.
>

We can tell the compiler to avoid the GOT (using 'hidden' visibility),
and even if we don't, the amd64 psABI now defines linker relaxations
that turn GOT loads into LEA instructions (which still bloat the code
a bit but eliminate the GOT accesses in most cases).
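
A minimal sketch of what that looks like in practice, assuming
something like the kernel's include/linux/hidden.h approach
(illustrative, not the exact header):

    #include <linux/types.h>

    /* Force hidden visibility for everything declared below. */
    #pragma GCC visibility push(hidden)

    extern u64 sev_status;

    u64 get_status(void)
    {
    	/*
    	 * With default visibility under -fPIE, the access may go
    	 * through the GOT:
    	 *	movq	sev_status@GOTPCREL(%rip), %rax
    	 *	movq	(%rax), %rax
    	 * With hidden visibility (or after GOTPCRELX relaxation),
    	 * it is a direct RIP-relative load:
    	 *	movq	sev_status(%rip), %rax
    	 */
    	return sev_status;
    }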

2024-01-17 13:07:10

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Wed, Jan 17, 2024 at 11:59:14AM +0100, Ard Biesheuvel wrote:
> Fully agree. All this fiddling with RIP relative references from C
> code is going to be a maintenance burden going forward.

Yah.

> The proper way to do this is use PIC codegen for the objects that
> matter.

And we have arch/x86/mm/mem_encrypt_identity.c which is supposed to deal
with stuff running from the ident mappings and PA == VA.

We could put the rest of those special SEV things there or do a separate
TU to be built using something like PIE_FLAGS, as in your patch.

> I had a stab [0] at this a while ago (for the purpose of increasing
> the KASLR range, which requires PIE linking) but I didn't pursue it in
> the end.

FWIW, that looks a lot more like natural kernel code with
__va_symbol() etc. Definitely better, and we talked about it at some
point already as it does ring a bell.

> On arm64, we use a separate pseudo-namespace for code that can run
> safely at any offset, using the __pi_ prefix (for Position
> Independent). Using symbol prefixing at the linker level, we ensure
> that __pi_ code can only call other __pi_ code, or code that has been
> made available to it via an explicit __pi_ prefixed alias. (Happy to
> elaborate more but we should find a smaller audience - your cc list is
> a tad long). Perhaps this is something we should explore on x86 as
> well (note that the EFI stub does something similar for architectures
> that link the EFI stub into the core kernel rather than into the
> decompressor)

Grepping through the tree, is __pi_memcpy one example for that?

It sure looks like it with the alias and all. From a quick scan, that is
not that bad either. It gives you a clear distinction of what that
symbol is and who can call it.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-01-17 13:39:44

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Wed, 17 Jan 2024 at 14:06, Borislav Petkov <[email protected]> wrote:
>
> On Wed, Jan 17, 2024 at 11:59:14AM +0100, Ard Biesheuvel wrote:
..
> > On arm64, we use a separate pseudo-namespace for code that can run
> > safely at any offset, using the __pi_ prefix (for Position
> > Independent). Using symbol prefixing at the linker level, we ensure
> > that __pi_ code can only call other __pi_ code, or code that has been
> > made available to it via an explicit __pi_ prefixed alias. (Happy to
> > elaborate more but we should find a smaller audience - your cc list is
> > a tad long). Perhaps this is something we should explore on x86 as
> > well (note that the EFI stub does something similar for architectures
> > that link the EFI stub into the core kernel rather than into the
> > decompressor)
>
> Grepping through the tree, is __pi_memcpy one example for that?
>

That is an example of a function that is known to be callable from any
context, and so it is emitted with an alias that makes it accessible
to other code that is position independent.

There is some linker foo in arch/arm64/kernel/pi/Makefile that builds
a couple of objects in PIC mode. The source symbols have ordinary
names (even the external imports), but will be renamed by the linker
to have a __pi_ prefix. The result is that those objects can only call
into each other, or out to ordinary code that has been marked as __pi_
as well.

The entry into this code is

arch/arm64/kernel/head.S:885: bl __pi_kaslr_early_init

which is called before relocations have been applied, as that requires
knowing how the kernel base address is randomized.

2024-01-21 14:13:23

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Wed, 17 Jan 2024 at 14:38, Ard Biesheuvel <[email protected]> wrote:
>
> On Wed, 17 Jan 2024 at 14:06, Borislav Petkov <[email protected]> wrote:
> >
> > On Wed, Jan 17, 2024 at 11:59:14AM +0100, Ard Biesheuvel wrote:
> ...
> > > On arm64, we use a separate pseudo-namespace for code that can run
> > > safely at any offset, using the __pi_ prefix (for Position
> > > Independent). Using symbol prefixing at the linker level, we ensure
> > > that __pi_ code can only call other __pi_ code, or code that has been
> > > made available to it via an explicit __pi_ prefixed alias. (Happy to
> > > elaborate more but we should find a smaller audience - your cc list is
> > > a tad long). Perhaps this is something we should explore on x86 as
> > > well (note that the EFI stub does something similar for architectures
> > > that link the EFI stub into the core kernel rather than into the
> > > decompressor)
> >
> > Grepping through the tree, is __pi_memcpy one example for that?
> >
>
> That is an example of a function that is known to be callable from any
> context, and so it is emitted with an alias that makes it accessible
> to other code that is position independent.
>
> There is some linker foo in arch/arm64/kernel/pi/Makefile that builds
> a couple of objects in PIC mode. The source symbols have ordinary
> names (even the external imports), but will be renamed by the linker
> to have a __pi_ prefix. The result is that those objects can only call
> into each other, or out to ordinary code that has been marked as __pi_
> as well.
>
> The entry into this code is
>
> arch/arm64/kernel/head.S:885: bl __pi_kaslr_early_init
>
> which is called before relocations have been applied, as that requires
> knowing how the kernel base address is randomized.

I've done a quick experiment with moving the early code that runs from
the 1:1 mapping into .pi.text, and getting modpost to warn about calls
from this section into other text sections.

My preliminary conclusion confirms that the SEV code is quite
problematic in this regard (which is the reason for this patch, so we
already knew that, of course).
Below are a couple of examples of how these calls fan out.

TL;DR I think we will need a way to build certain objects with -fPIC
(as we do in other places and on other architectures), but we should
add instrumentation to ensure that these issues can be detected at
build time. The SEV boot code is especially tricky here as very few
people can even test it, so applying this patch and hoping that the
compiler will never generate reachable code paths that only work
correctly when executed via the ordinary kernel virtual mapping is not
sufficient.
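
For reference, the experiment amounts to something like the following
(a hedged sketch, not the actual diff): tag code that must run from
the 1:1 mapping with a dedicated section, then teach modpost to
complain about calls that leave it.

    /* Illustrative only; names are not from a real patch. */
    #define __pitext __attribute__((__section__(".pi.text")))

    static void __pitext early_helper(void)
    {
    	/* May only call other .pi.text code; modpost flags the rest. */
    }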





1)
WARNING: modpost: vmlinux: section mismatch in reference:
startup_64_pi+0x33 (section: .pi.text) -> sme_enable (section:
init.text)

leads to

WARNING: modpost: vmlinux: section mismatch in reference:
sme_enable+0x21 (section: .pi.text) -> snp_init (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference:
sme_enable+0x99 (section: .pi.text) -> cc_set_mask (section:
init.text)
WARNING: modpost: vmlinux: section mismatch in reference:
sme_enable+0xf8 (section: .pi.text) -> cmdline_find_option (section:
text)
WARNING: modpost: vmlinux: section mismatch in reference:
sme_enable+0x10c (section: .pi.text) -> strncmp (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference:
sme_enable+0x129 (section: .pi.text) -> snp_abort (section:
init.text)

2)
WARNING: modpost: vmlinux: section mismatch in reference:
startup_64_setup_env+0x27 (section: .pi.text) -> vc_no_ghcb (section:
init.text)

leads to

WARNING: modpost: vmlinux: section mismatch in reference:
vc_no_ghcb+0x46 (section: .pi.text) -> do_vc_no_ghcb (section:
init.text)

leads to

WARNING: modpost: vmlinux: section mismatch in reference:
do_vc_no_ghcb+0x55 (section: .pi.text) -> snp_cpuid (section: .text)
WARNING: modpost: vmlinux: section mismatch in reference:
do_vc_no_ghcb+0x66 (section: .pi.text) -> __sev_cpuid_hv_msr (section:
text)
WARNING: modpost: vmlinux: section mismatch in reference:
do_vc_no_ghcb+0xc1 (section: .pi.text) -> __x86_return_thunk (section:
text..__x86.return_thunk)
WARNING: modpost: vmlinux: section mismatch in reference:
do_vc_no_ghcb+0xd4 (section: .pi.text) -> sev_es_terminate (section:
text)
WARNING: modpost: vmlinux: section mismatch in reference:
do_vc_no_ghcb+0xd9 (section: .pi.text) -> __stack_chk_fail (section:
noinstr.text)

3)
WARNING: modpost: vmlinux: section mismatch in reference:
__startup_64+0x39d (section: .pi.text) -> early_snp_set_memory_shared
(section: .init.text)

leads to

WARNING: modpost: vmlinux: section mismatch in reference:
early_snp_set_memory_shared+0x18 (section: .pi.text) ->
early_set_pages_state (section: .text)

leads to

WARNING: modpost: vmlinux: section mismatch in reference:
early_set_pages_state+0x151 (section: .pi.text) -> __warn_printk
(section: .text)

2024-01-21 15:39:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Sun, Jan 21, 2024 at 03:12:56PM +0100, Ard Biesheuvel wrote:
> My preliminary conclusion confirms that the SEV code is quite
> problematic in this regard (which is the reason for this patch, so we
> already knew that, of course).

So we can try to improve the situation gradually so that we don't
break current usages.

> TL;DR I think we will need a way to build certain objects with -fPIC
> (as we do in other places and on other architectures), but we should
> add instrumentation to ensure that these issues can be detected at
> build time. The SEV boot code is especially tricky here as very few
> people can even test it,

No worries about that - us, the Google cloud folks, AWS and a bunch of
others are people I could think of who could help out. :-)

> so applying this patch and hoping that the compiler will never
> generate reachable code paths that only work correctly when executed
> via the ordinary kernel virtual mapping is not sufficient.

..

> 1)
> WARNING: modpost: vmlinux: section mismatch in reference:
> startup_64_pi+0x33 (section: .pi.text) -> sme_enable (section:
> .init.text)

sme_enable() is in the 1:1 mapping TU
arch/x86/mm/mem_encrypt_identity.c, see

1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")

so might as well move it to .pi.text

The rest below look like they'd need more serious untangling.

Btw, I just had another idea: we could remove -mcmodel=kernel from the
build flags of the whole kernel once -fPIC is enabled so that gcc can be
forced to do rIP-relative addressing.

I'm being told the reason it doesn't allow mcmodel=kernel with -fPIC is
only a matter of removing that check and that it *should* otherwise work
but someone needs to try that. And then there are older gccs which we
cannot fix.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-01-21 16:50:06

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Sun, 21 Jan 2024 at 16:38, Borislav Petkov <[email protected]> wrote:
>
> On Sun, Jan 21, 2024 at 03:12:56PM +0100, Ard Biesheuvel wrote:
> > The SEV boot code is especially tricky here as very few
> > people can even test it,
>
> No worries about that - us, the Google cloud folks, AWS and a bunch of
> others are people I could think of who could help out. :-)
>

Yeah. I have been trying to find people internally at Google who can
help me set up some CI that I can throw kernel builds at to have them
test-booted in a SEV guest, but so far progress has been slow.

> > 1)
> > WARNING: modpost: vmlinux: section mismatch in reference:
> > startup_64_pi+0x33 (section: .pi.text) -> sme_enable (section:
> > .init.text)
>
> sme_enable() is in the 1:1 mapping TU
> arch/x86/mm/mem_encrypt_identity.c, see
>
> 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
>
> so might as well move it to .pi.text
>

Ack.

> The rest below look like they'd need more serious untangling.
>
> Btw, I just had another idea: we could remove -mcmodel=kernel from the
> build flags of the whole kernel once -fPIC is enabled so that gcc can be
> forced to do rIP-relative addressing.
>
> I'm being told the reason it doesn't allow mcmodel=kernel with -fPIC is
> only a matter of removing that check and that it *should* otherwise work
> but someone needs to try that. And then there are older gccs which we
> cannot fix.
>

-fPIE -mcmodel=small should work fine afaik. The only problem i
encountered is that it changes the default per-CPU base register to FS
but that can be overridden on the command line.

The problem with building the entire kernel with -fPIE is that it increases
code size: RIP-relative LEA instructions are 1 byte longer than
absolute 32-bit MOVs.

2024-01-21 18:21:55

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Sun, Jan 21, 2024 at 05:49:44PM +0100, Ard Biesheuvel wrote:
> Yeah. I have been trying to find people internally at Google that can
> help me set up some CI that I can throw kernel builds at and they will
> be test booted in a SEV guest, but so far progress has been slow.

Dunno, if you have some internal access to GCE, it does support SEV
guests so you could test that side at least. Peter Gonda is on Cc, he
should have an idea what to do, lemme move him to To.

> -fPIE -mcmodel=small should work fine afaik. The only problem i
> encountered is that it changes the default per-CPU base register to FS
> but that can be overridden on the command line.

Yeah, there's a gcc switch - I hope clang supports it too.

> The problem with building the entire kernel with -fPIE is that it increases
> code size: RIP-relative LEA instructions are 1 byte longer than
> absolute 32-bit MOVs.

Right, but the folks who started this thread are already doing that
anyway so...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-01-30 22:09:02

by Kevin Loughlin

[permalink] [raw]
Subject: [PATCH v3 0/2] x86: enforce and cleanup RIP-relative accesses in early boot code

SEV/SME code can execute prior to page table fixups for kernel
relocation. However, as with global variables accessed in
__startup_64(), the compiler is not required to generate RIP-relative
accesses for SEV/SME global variables, causing certain flavors of SEV
hosts and guests built with clang to crash during boot.

These crashes highlight a broader problem wherein the toolchain does
not guarantee that early x86-64 code executes correctly at any offset.
While Ard has been looking into overhauling the early x86-64 code
going forward [0], the significant proposed changes are unfortunately
not backport-friendly.

Instead, this patchset continues the approach of fixing the immediate
problem of SEV-SNP boots crashing when built by clang, providing a
backport-friendly set of changes needed to successfully boot SEV-SNP
hosts and guests. In particular, this patchset is a cleanup of V2 [1],
which introduces a macro to force RIP-relative addressing in early
SEV/SME global variable accesses and existing head64 global accesses.

V2 -> V3: Rename macro to RIP_RELATIVE_ADDR(), remove fixup_*(), clean up style
V1 -> V2: Use GET_RIP_RELATIVE_PTR() macro to avoid -fPIE compilation

[0] https://lore.kernel.org/lkml/[email protected]/T/
[1] https://lore.kernel.org/lkml/[email protected]/

Kevin Loughlin (2):
x86/sev: enforce RIP-relative accesses in early SEV/SME code
x86/head64: Replace pointer fixups with RIP_RELATIVE_ADDR()

arch/x86/coco/core.c | 22 +++++---
arch/x86/include/asm/mem_encrypt.h | 32 +++++++++--
arch/x86/kernel/head64.c | 88 +++++++++++++-----------------
arch/x86/kernel/head_64.S | 4 +-
arch/x86/kernel/sev-shared.c | 52 +++++++++---------
arch/x86/kernel/sev.c | 13 +++--
arch/x86/mm/mem_encrypt_identity.c | 50 +++++++++--------
7 files changed, 143 insertions(+), 118 deletions(-)

--
2.43.0.429.g432eaa2c6b-goog


2024-01-30 22:09:18

by Kevin Loughlin

[permalink] [raw]
Subject: [PATCH v3 1/2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

The compiler is not required to generate RIP-relative accesses for
SEV/SME global variables in early boot. While an attempt was made to
force RIP-relative addressing for certain global SEV/SME variables via
inline assembly (see snp_cpuid_get_table() for example), RIP-relative
addressing must be pervasively enforced for SEV/SME global variables
when accessed prior to page table fixups.

__startup_64() already handles this issue for select non-SEV/SME global
variables using fixup_pointer(), which adjusts the pointer relative to
a `physaddr` argument. To avoid having to pass around this `physaddr`
argument across all functions needing to apply pointer fixups, this
patch introduces the macro RIP_RELATIVE_ADDR() (a generalization of
the existing snp_cpuid_get_table()), which evaluates to the
RIP-relative address of a passed variable. It subsumes v2's
GET_RIP_RELATIVE_PTR() and PTR_TO_RIP_RELATIVE_PTR() macros.

Applying this macro to early SEV/SME code (alongside Adam Dunlap's
necessary "[PATCH v2] x86/asm: Force native_apic_mem_read to use mov")
enables previously-failing boots of clang builds to succeed, while
preserving successful boot of gcc builds. Tested with and without SEV,
SEV-ES, SEV-SNP enabled in guests built via both gcc and clang.

Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
Fixes: 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
Fixes: c9f09539e16e ("x86/head/64: Check SEV encryption before switching to kernel page-table")
Fixes: b577f542f93c ("x86/coco: Add API to handle encryption mask")
Signed-off-by: Kevin Loughlin <[email protected]>
---
arch/x86/coco/core.c | 22 ++++++++-----
arch/x86/include/asm/mem_encrypt.h | 32 +++++++++++++++---
arch/x86/kernel/head64.c | 31 ++++++++++--------
arch/x86/kernel/head_64.S | 4 ++-
arch/x86/kernel/sev-shared.c | 52 ++++++++++++++----------------
arch/x86/kernel/sev.c | 13 +++++---
arch/x86/mm/mem_encrypt_identity.c | 50 ++++++++++++++--------------
7 files changed, 122 insertions(+), 82 deletions(-)

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index eeec9986570e..8c45b5643f48 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -5,6 +5,11 @@
* Copyright (C) 2021 Advanced Micro Devices, Inc.
*
* Author: Tom Lendacky <[email protected]>
+ *
+ * WARNING!!
+ * Select functions in this file can execute prior to page table fixups and thus
+ * require pointer fixups for global variable accesses. See WARNING in
+ * arch/x86/kernel/head64.c.
*/

#include <linux/export.h>
@@ -61,33 +66,34 @@ static __maybe_unused __always_inline bool amd_cc_platform_vtom(enum cc_attr att
static bool noinstr amd_cc_platform_has(enum cc_attr attr)
{
#ifdef CONFIG_AMD_MEM_ENCRYPT
+ const u64 sev_status_fixed_up = sev_get_status_fixup();

- if (sev_status & MSR_AMD64_SNP_VTOM)
+ if (sev_status_fixed_up & MSR_AMD64_SNP_VTOM)
return amd_cc_platform_vtom(attr);

switch (attr) {
case CC_ATTR_MEM_ENCRYPT:
- return sme_me_mask;
+ return sme_get_me_mask_fixup();

case CC_ATTR_HOST_MEM_ENCRYPT:
- return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
+ return sme_get_me_mask_fixup() && !(sev_status_fixed_up & MSR_AMD64_SEV_ENABLED);

case CC_ATTR_GUEST_MEM_ENCRYPT:
- return sev_status & MSR_AMD64_SEV_ENABLED;
+ return sev_status_fixed_up & MSR_AMD64_SEV_ENABLED;

case CC_ATTR_GUEST_STATE_ENCRYPT:
- return sev_status & MSR_AMD64_SEV_ES_ENABLED;
+ return sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED;

/*
* With SEV, the rep string I/O instructions need to be unrolled
* but SEV-ES supports them through the #VC handler.
*/
case CC_ATTR_GUEST_UNROLL_STRING_IO:
- return (sev_status & MSR_AMD64_SEV_ENABLED) &&
- !(sev_status & MSR_AMD64_SEV_ES_ENABLED);
+ return (sev_status_fixed_up & MSR_AMD64_SEV_ENABLED) &&
+ !(sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED);

case CC_ATTR_GUEST_SEV_SNP:
- return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
+ return sev_status_fixed_up & MSR_AMD64_SEV_SNP_ENABLED;

default:
return false;
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 359ada486fa9..b65e66ee79c4 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -17,6 +17,20 @@

#include <asm/bootparam.h>

+/*
+ * Like the address operator "&", evaluates to the address of a LHS variable
+ * "var", but also enforces the use of RIP-relative logic. This macro can be
+ * used to safely access global data variables prior to kernel relocation.
+ */
+#define RIP_RELATIVE_ADDR(var) \
+({ \
+ void *rip_rel_ptr; \
+ asm ("lea "#var"(%%rip), %0" \
+ : "=r" (rip_rel_ptr) \
+ : "p" (&var)); \
+ rip_rel_ptr; \
+})
+
#ifdef CONFIG_X86_MEM_ENCRYPT
void __init mem_encrypt_init(void);
void __init mem_encrypt_setup_arch(void);
@@ -58,6 +72,16 @@ void __init mem_encrypt_free_decrypted_mem(void);

void __init sev_es_init_vc_handling(void);

+static __always_inline u64 sme_get_me_mask_fixup(void)
+{
+ return *((u64 *) RIP_RELATIVE_ADDR(sme_me_mask));
+}
+
+static __always_inline u64 sev_get_status_fixup(void)
+{
+ return *((u64 *) RIP_RELATIVE_ADDR(sev_status));
+}
+
#define __bss_decrypted __section(".bss..decrypted")

#else /* !CONFIG_AMD_MEM_ENCRYPT */
@@ -89,6 +113,9 @@ early_set_mem_enc_dec_hypercall(unsigned long vaddr, unsigned long size, bool en

static inline void mem_encrypt_free_decrypted_mem(void) { }

+static inline u64 sme_get_me_mask_fixup(void) { return 0; }
+static inline u64 sev_get_status_fixup(void) { return 0; }
+
#define __bss_decrypted

#endif /* CONFIG_AMD_MEM_ENCRYPT */
@@ -106,11 +133,6 @@ void add_encrypt_protection_map(void);

extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[];

-static inline u64 sme_get_me_mask(void)
-{
- return sme_me_mask;
-}
-
#endif /* __ASSEMBLY__ */

#endif /* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index dc0956067944..d159239997f4 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -128,6 +128,7 @@ static bool __head check_la57_support(unsigned long physaddr)

static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
{
+ const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
unsigned long vaddr, vaddr_end;
int i;

@@ -140,7 +141,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* there is no need to zero it after changing the memory encryption
* attribute.
*/
- if (sme_get_me_mask()) {
+ if (sme_me_mask_fixed_up) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;

@@ -158,7 +159,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);

i = pmd_index(vaddr);
- pmd[i] -= sme_get_me_mask();
+ pmd[i] -= sme_me_mask_fixed_up;
}
}

@@ -166,18 +167,22 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* Return the SME encryption mask (if SME is active) to be used as a
* modifier for the initial pgdir entry programmed into CR3.
*/
- return sme_get_me_mask();
+ return sme_me_mask_fixed_up;
}

-/* Code in __startup_64() can be relocated during execution, but the compiler
+/*
+ * WARNING!!
+ * Code in __startup_64() can be relocated during execution, but the compiler
* doesn't have to generate PC-relative relocations when accessing globals from
* that function. Clang actually does not generate them, which leads to
* boot-time crashes. To work around this problem, every global pointer must
- * be adjusted using fixup_pointer().
+ * be adjusted using fixup_pointer() or RIP_RELATIVE_ADDR().
*/
unsigned long __head __startup_64(unsigned long physaddr,
struct boot_params *bp)
{
+ const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
+ pmd_t **early_dynamic_pgts_ptr;
unsigned long load_delta, *p;
unsigned long pgtable_flags;
pgdval_t *pgd;
@@ -206,7 +211,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
for (;;);

/* Include the SME encryption mask in the fixup value */
- load_delta += sme_get_me_mask();
+ load_delta += sme_me_mask_fixed_up;

/* Fixup the physical addresses in the page table */

@@ -239,14 +244,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
*/

next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
- pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
- pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
+ early_dynamic_pgts_ptr = fixup_pointer(early_dynamic_pgts, physaddr);
+ pud = (pudval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
+ pmd = (pmdval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];

- pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
+ pgtable_flags = _KERNPG_TABLE_NOENC + sme_me_mask_fixed_up;

if (la57) {
- p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
- physaddr);
+ p4d = (p4dval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];

i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
@@ -269,7 +274,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
/* Filter out unsupported __PAGE_KERNEL_* bits: */
mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
pmd_entry &= *mask_ptr;
- pmd_entry += sme_get_me_mask();
+ pmd_entry += sme_me_mask_fixed_up;
pmd_entry += physaddr;

for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
@@ -313,7 +318,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
* Fixup phys_base - remove the memory encryption mask to obtain
* the true physical address.
*/
- *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
+ *fixup_long(&phys_base, physaddr) += load_delta - sme_me_mask_fixed_up;

return sme_postprocess_startup(bp, pmd);
}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d4918d03efb4..b9e52cee6e00 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -176,9 +176,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
/*
* Retrieve the modifier (SME encryption mask if SME is active) to be
* added to the initial pgdir entry that will be programmed into CR3.
+ * Since we may have not completed page table fixups, use RIP-relative
+ * addressing for sme_me_mask.
*/
#ifdef CONFIG_AMD_MEM_ENCRYPT
- movq sme_me_mask, %rax
+ movq sme_me_mask(%rip), %rax
#else
xorq %rax, %rax
#endif
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 1d24ec679915..9ea6bea37e1d 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -7,6 +7,11 @@
* This file is not compiled stand-alone. It contains code shared
* between the pre-decompression boot code and the running Linux kernel
* and is included directly into both code-bases.
+ *
+ * WARNING!!
+ * Select functions in this file can execute prior to page table fixups and thus
+ * require pointer fixups for global variable accesses. See WARNING in
+ * arch/x86/kernel/head64.c.
*/

#ifndef __BOOT_COMPRESSED
@@ -318,23 +323,6 @@ static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid
: __sev_cpuid_hv_msr(leaf);
}

-/*
- * This may be called early while still running on the initial identity
- * mapping. Use RIP-relative addressing to obtain the correct address
- * while running with the initial identity mapping as well as the
- * switch-over to kernel virtual addresses later.
- */
-static const struct snp_cpuid_table *snp_cpuid_get_table(void)
-{
- void *ptr;
-
- asm ("lea cpuid_table_copy(%%rip), %0"
- : "=r" (ptr)
- : "p" (&cpuid_table_copy));
-
- return ptr;
-}
-
/*
* The SNP Firmware ABI, Revision 0.9, Section 7.1, details the use of
* XCR0_IN and XSS_IN to encode multiple versions of 0xD subfunctions 0
@@ -357,7 +345,7 @@ static const struct snp_cpuid_table *snp_cpuid_get_table(void)
*/
static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
u64 xfeatures_found = 0;
u32 xsave_size = 0x240;
int i;
@@ -394,7 +382,7 @@ static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
static bool
snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
int i;

for (i = 0; i < cpuid_table->count; i++) {
@@ -530,7 +518,8 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
*/
static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
+ const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);

if (!cpuid_table->count)
return -EOPNOTSUPP;
@@ -555,10 +544,14 @@ static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_le
*/
leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0;

+ cpuid_std_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_std_range_max);
+ cpuid_hyp_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_hyp_range_max);
+ cpuid_ext_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_ext_range_max);
+
/* Skip post-processing for out-of-range zero leafs. */
- if (!(leaf->fn <= cpuid_std_range_max ||
- (leaf->fn >= 0x40000000 && leaf->fn <= cpuid_hyp_range_max) ||
- (leaf->fn >= 0x80000000 && leaf->fn <= cpuid_ext_range_max)))
+ if (!(leaf->fn <= *cpuid_std_range_max_ptr ||
+ (leaf->fn >= 0x40000000 && leaf->fn <= *cpuid_hyp_range_max_ptr) ||
+ (leaf->fn >= 0x80000000 && leaf->fn <= *cpuid_ext_range_max_ptr)))
return 0;
}

@@ -1045,6 +1038,7 @@ static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
*/
static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
{
+ u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table;
int i;

@@ -1055,19 +1049,23 @@ static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
if (!cpuid_table_fw->count || cpuid_table_fw->count > SNP_CPUID_COUNT_MAX)
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID);

- cpuid_table = snp_cpuid_get_table();
+ cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
memcpy((void *)cpuid_table, cpuid_table_fw, sizeof(*cpuid_table));

+ cpuid_std_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_std_range_max);
+ cpuid_hyp_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_hyp_range_max);
+ cpuid_ext_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_ext_range_max);
+
/* Initialize CPUID ranges for range-checking. */
for (i = 0; i < cpuid_table->count; i++) {
const struct snp_cpuid_fn *fn = &cpuid_table->fn[i];

if (fn->eax_in == 0x0)
- cpuid_std_range_max = fn->eax;
+ *cpuid_std_range_max_ptr = fn->eax;
else if (fn->eax_in == 0x40000000)
- cpuid_hyp_range_max = fn->eax;
+ *cpuid_hyp_range_max_ptr = fn->eax;
else if (fn->eax_in == 0x80000000)
- cpuid_ext_range_max = fn->eax;
+ *cpuid_ext_range_max_ptr = fn->eax;
}
}

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index c67285824e82..54dd58d13d66 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -5,6 +5,11 @@
* Copyright (C) 2019 SUSE
*
* Author: Joerg Roedel <[email protected]>
+ *
+ * WARNING!!
+ * Select functions in this file can execute prior to page table fixups and thus
+ * require pointer fixups for global variable accesses. See WARNING in
+ * arch/x86/kernel/head64.c.
*/

#define pr_fmt(fmt) "SEV: " fmt
@@ -748,7 +753,7 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
* This eliminates worries about jump tables or checking boot_cpu_data
* in the cc_platform_has() function.
*/
- if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ if (!(sev_get_status_fixup() & MSR_AMD64_SEV_SNP_ENABLED))
return;

/*
@@ -767,7 +772,7 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
* This eliminates worries about jump tables or checking boot_cpu_data
* in the cc_platform_has() function.
*/
- if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ if (!(sev_get_status_fixup() & MSR_AMD64_SEV_SNP_ENABLED))
return;

/* Ask hypervisor to mark the memory pages shared in the RMP table. */
@@ -2114,7 +2119,7 @@ void __init __noreturn snp_abort(void)

static void dump_cpuid_table(void)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
int i = 0;

pr_info("count=%d reserved=0x%x reserved2=0x%llx\n",
@@ -2138,7 +2143,7 @@ static void dump_cpuid_table(void)
*/
static int __init report_cpuid_table(void)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);

if (!cpuid_table->count)
return 0;
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index d73aeb16417f..533e59bc5757 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -5,6 +5,11 @@
* Copyright (C) 2016 Advanced Micro Devices, Inc.
*
* Author: Tom Lendacky <[email protected]>
+ *
+ * WARNING!!
+ * Select functions in this file can execute prior to page table fixups and thus
+ * require pointer fixups for global variable accesses. See WARNING in
+ * arch/x86/kernel/head64.c.
*/

#define DISABLE_BRANCH_PROFILING
@@ -305,7 +310,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
* instrumentation or checking boot_cpu_data in the cc_platform_has()
* function.
*/
- if (!sme_get_me_mask() || sev_status & MSR_AMD64_SEV_ENABLED)
+ if (!sme_get_me_mask_fixup() || sev_get_status_fixup() & MSR_AMD64_SEV_ENABLED)
return;

/*
@@ -346,9 +351,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
* We're running identity mapped, so we must obtain the address to the
* SME encryption workarea using rip-relative addressing.
*/
- asm ("lea sme_workarea(%%rip), %0"
- : "=r" (workarea_start)
- : "p" (sme_workarea));
+ workarea_start = (unsigned long) RIP_RELATIVE_ADDR(sme_workarea);

/*
* Calculate required number of workarea bytes needed:
@@ -505,13 +508,13 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
void __init sme_enable(struct boot_params *bp)
{
const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
+ u64 msr, *sme_me_mask_ptr, *sev_status_ptr;
unsigned int eax, ebx, ecx, edx;
unsigned long feature_mask;
bool active_by_default;
unsigned long me_mask;
char buffer[16];
bool snp;
- u64 msr;

snp = snp_init(bp);

@@ -542,12 +545,14 @@ void __init sme_enable(struct boot_params *bp)

me_mask = 1UL << (ebx & 0x3f);

+ sev_status_ptr = RIP_RELATIVE_ADDR(sev_status);
+
/* Check the SEV MSR whether SEV or SME is enabled */
- sev_status = __rdmsr(MSR_AMD64_SEV);
- feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
+ *sev_status_ptr = __rdmsr(MSR_AMD64_SEV);
+ feature_mask = (*sev_status_ptr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;

/* The SEV-SNP CC blob should never be present unless SEV-SNP is enabled. */
- if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ if (snp && !(*sev_status_ptr & MSR_AMD64_SEV_SNP_ENABLED))
snp_abort();

/* Check if memory encryption is enabled */
@@ -573,7 +578,8 @@ void __init sme_enable(struct boot_params *bp)
return;
} else {
/* SEV state cannot be controlled by a command line option */
- sme_me_mask = me_mask;
+ sme_me_mask_ptr = RIP_RELATIVE_ADDR(sme_me_mask);
+ *sme_me_mask_ptr = me_mask;
goto out;
}

@@ -582,15 +588,9 @@ void __init sme_enable(struct boot_params *bp)
* identity mapped, so we must obtain the address to the SME command
* line argument data using rip-relative addressing.
*/
- asm ("lea sme_cmdline_arg(%%rip), %0"
- : "=r" (cmdline_arg)
- : "p" (sme_cmdline_arg));
- asm ("lea sme_cmdline_on(%%rip), %0"
- : "=r" (cmdline_on)
- : "p" (sme_cmdline_on));
- asm ("lea sme_cmdline_off(%%rip), %0"
- : "=r" (cmdline_off)
- : "p" (sme_cmdline_off));
+ cmdline_arg = RIP_RELATIVE_ADDR(sme_cmdline_arg);
+ cmdline_on = RIP_RELATIVE_ADDR(sme_cmdline_on);
+ cmdline_off = RIP_RELATIVE_ADDR(sme_cmdline_off);

if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
active_by_default = true;
@@ -603,16 +603,18 @@ void __init sme_enable(struct boot_params *bp)
if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0)
return;

+ sme_me_mask_ptr = RIP_RELATIVE_ADDR(sme_me_mask);
+
if (!strncmp(buffer, cmdline_on, sizeof(buffer)))
- sme_me_mask = me_mask;
+ *sme_me_mask_ptr = me_mask;
else if (!strncmp(buffer, cmdline_off, sizeof(buffer)))
- sme_me_mask = 0;
+ *sme_me_mask_ptr = 0;
else
- sme_me_mask = active_by_default ? me_mask : 0;
+ *sme_me_mask_ptr = active_by_default ? me_mask : 0;
out:
- if (sme_me_mask) {
- physical_mask &= ~sme_me_mask;
+ if (*sme_me_mask_ptr) {
+ physical_mask &= ~(*sme_me_mask_ptr);
cc_vendor = CC_VENDOR_AMD;
- cc_set_mask(sme_me_mask);
+ cc_set_mask(*sme_me_mask_ptr);
}
}
--
2.43.0.429.g432eaa2c6b-goog


2024-01-30 22:09:24

by Kevin Loughlin

[permalink] [raw]
Subject: [PATCH v3 2/2] x86/head64: Replace pointer fixups with RIP_RELATIVE_ADDR()

Now that we have RIP_RELATIVE_ADDR(), we can replace the "fixup_*()"
family of functions in head64.c with the new macro for cleaner code.
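
For example, a fixup of the form

	*fixup_long(&phys_base, physaddr) += load_delta - sme_me_mask_fixed_up;

becomes

	*((unsigned long *)RIP_RELATIVE_ADDR(phys_base)) += load_delta - sme_me_mask_fixed_up;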

Signed-off-by: Kevin Loughlin <[email protected]>
---
arch/x86/kernel/head64.c | 63 +++++++++++++++-------------------------
1 file changed, 24 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index d159239997f4..e782149eefc4 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -85,23 +85,8 @@ static struct desc_ptr startup_gdt_descr __initdata = {
.address = 0,
};

-static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
-{
- return ptr - (void *)_text + (void *)physaddr;
-}
-
-static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr)
-{
- return fixup_pointer(ptr, physaddr);
-}
-
#ifdef CONFIG_X86_5LEVEL
-static unsigned int __head *fixup_int(void *ptr, unsigned long physaddr)
-{
- return fixup_pointer(ptr, physaddr);
-}
-
-static bool __head check_la57_support(unsigned long physaddr)
+static bool __head check_la57_support(void)
{
/*
* 5-level paging is detected and enabled at kernel decompression
@@ -110,17 +95,17 @@ static bool __head check_la57_support(unsigned long physaddr)
if (!(native_read_cr4() & X86_CR4_LA57))
return false;

- *fixup_int(&__pgtable_l5_enabled, physaddr) = 1;
- *fixup_int(&pgdir_shift, physaddr) = 48;
- *fixup_int(&ptrs_per_p4d, physaddr) = 512;
- *fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE_L5;
- *fixup_long(&vmalloc_base, physaddr) = __VMALLOC_BASE_L5;
- *fixup_long(&vmemmap_base, physaddr) = __VMEMMAP_BASE_L5;
+ *((unsigned int *)RIP_RELATIVE_ADDR(__pgtable_l5_enabled)) = 1;
+ *((unsigned int *)RIP_RELATIVE_ADDR(pgdir_shift)) = 48;
+ *((unsigned int *)RIP_RELATIVE_ADDR(ptrs_per_p4d)) = 512;
+ *((unsigned long *)RIP_RELATIVE_ADDR(page_offset_base)) = __PAGE_OFFSET_BASE_L5;
+ *((unsigned long *)RIP_RELATIVE_ADDR(vmalloc_base)) = __VMALLOC_BASE_L5;
+ *((unsigned long *)RIP_RELATIVE_ADDR(vmemmap_base)) = __VMEMMAP_BASE_L5;

return true;
}
#else
-static bool __head check_la57_support(unsigned long physaddr)
+static bool __head check_la57_support(void)
{
return false;
}
@@ -175,8 +160,8 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* Code in __startup_64() can be relocated during execution, but the compiler
* doesn't have to generate PC-relative relocations when accessing globals from
* that function. Clang actually does not generate them, which leads to
- * boot-time crashes. To work around this problem, every global pointer must
- * be adjusted using fixup_pointer() or RIP_RELATIVE_ADDR().
+ * boot-time crashes. To work around this problem, every global variable access
+ * must be adjusted using RIP_RELATIVE_ADDR().
*/
unsigned long __head __startup_64(unsigned long physaddr,
struct boot_params *bp)
@@ -194,7 +179,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
int i;
unsigned int *next_pgt_ptr;

- la57 = check_la57_support(physaddr);
+ la57 = check_la57_support();

/* Is the address too large? */
if (physaddr >> MAX_PHYSMEM_BITS)
@@ -215,7 +200,7 @@ unsigned long __head __startup_64(unsigned long physaddr,

/* Fixup the physical addresses in the page table */

- pgd = fixup_pointer(early_top_pgt, physaddr);
+ pgd = RIP_RELATIVE_ADDR(early_top_pgt);
p = pgd + pgd_index(__START_KERNEL_map);
if (la57)
*p = (unsigned long)level4_kernel_pgt;
@@ -224,15 +209,15 @@ unsigned long __head __startup_64(unsigned long physaddr,
*p += _PAGE_TABLE_NOENC - __START_KERNEL_map + load_delta;

if (la57) {
- p4d = fixup_pointer(level4_kernel_pgt, physaddr);
+ p4d = RIP_RELATIVE_ADDR(level4_kernel_pgt);
p4d[511] += load_delta;
}

- pud = fixup_pointer(level3_kernel_pgt, physaddr);
+ pud = RIP_RELATIVE_ADDR(level3_kernel_pgt);
pud[510] += load_delta;
pud[511] += load_delta;

- pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
+ pmd = RIP_RELATIVE_ADDR(level2_fixmap_pgt);
for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
pmd[i] += load_delta;

@@ -243,8 +228,8 @@ unsigned long __head __startup_64(unsigned long physaddr,
* it avoids problems around wraparound.
*/

- next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
- early_dynamic_pgts_ptr = fixup_pointer(early_dynamic_pgts, physaddr);
+ early_dynamic_pgts_ptr = RIP_RELATIVE_ADDR(early_dynamic_pgts);
+ next_pgt_ptr = RIP_RELATIVE_ADDR(next_early_pgt);
pud = (pudval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
pmd = (pmdval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];

@@ -272,7 +257,7 @@ unsigned long __head __startup_64(unsigned long physaddr,

pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
/* Filter out unsupported __PAGE_KERNEL_* bits: */
- mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
+ mask_ptr = RIP_RELATIVE_ADDR(__supported_pte_mask);
pmd_entry &= *mask_ptr;
pmd_entry += sme_me_mask_fixed_up;
pmd_entry += physaddr;
@@ -299,7 +284,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
* error, causing the BIOS to halt the system.
*/

- pmd = fixup_pointer(level2_kernel_pgt, physaddr);
+ pmd = RIP_RELATIVE_ADDR(level2_kernel_pgt);

/* invalidate pages before the kernel image */
for (i = 0; i < pmd_index((unsigned long)_text); i++)
@@ -318,7 +303,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
* Fixup phys_base - remove the memory encryption mask to obtain
* the true physical address.
*/
- *fixup_long(&phys_base, physaddr) += load_delta - sme_me_mask_fixed_up;
+ *((unsigned long *)RIP_RELATIVE_ADDR(phys_base)) += load_delta - sme_me_mask_fixed_up;

return sme_postprocess_startup(bp, pmd);
}
@@ -594,15 +579,15 @@ static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
/* This runs while still in the direct mapping */
static void __head startup_64_load_idt(unsigned long physbase)
{
- struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
- gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
+ struct desc_ptr *desc = RIP_RELATIVE_ADDR(bringup_idt_descr);
+ gate_desc *idt = RIP_RELATIVE_ADDR(bringup_idt_table);


if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
void *handler;

/* VMM Communication Exception */
- handler = fixup_pointer(vc_no_ghcb, physbase);
+ handler = RIP_RELATIVE_ADDR(vc_no_ghcb);
set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
}

@@ -629,7 +614,7 @@ void early_setup_idt(void)
void __head startup_64_setup_env(unsigned long physbase)
{
/* Load GDT */
- startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
+ startup_gdt_descr.address = (unsigned long)RIP_RELATIVE_ADDR(startup_gdt);
native_load_gdt(&startup_gdt_descr);

/* New GDT is live - reload data segment registers */
--
2.43.0.429.g432eaa2c6b-goog


2024-01-31 09:36:45

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Tue, Jan 30, 2024 at 10:08:44PM +0000, Kevin Loughlin wrote:
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 359ada486fa9..b65e66ee79c4 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -17,6 +17,20 @@
>
> #include <asm/bootparam.h>
>
> +/*
> + * Like the address operator "&", evaluates to the address of a LHS variable
> + * "var", but also enforces the use of RIP-relative logic. This macro can be
> + * used to safely access global data variables prior to kernel relocation.
> + */
> +#define RIP_RELATIVE_ADDR(var) \
> +({ \
> + void *rip_rel_ptr; \
> + asm ("lea "#var"(%%rip), %0" \
> + : "=r" (rip_rel_ptr) \
> + : "p" (&var)); \
> + rip_rel_ptr; \
> +})
> +

I don't think it is the right place for the macro. The next patch uses it
for things unrelated to memory encryption.

> @@ -239,14 +244,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
> */
>
> next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
> - pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> - pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> + early_dynamic_pgts_ptr = fixup_pointer(early_dynamic_pgts, physaddr);
> + pud = (pudval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> + pmd = (pmdval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
>

This change doesn't belong to this patch. Maybe move it into the next
patch and combine it with removing fixup_pointer().

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-01-31 09:40:30

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] x86/head64: Replace pointer fixups with RIP_RELATIVE_ADDR()

On Tue, Jan 30, 2024 at 10:08:45PM +0000, Kevin Loughlin wrote:
> @@ -594,15 +579,15 @@ static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
> /* This runs while still in the direct mapping */
> static void __head startup_64_load_idt(unsigned long physbase)
> {
> - struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
> - gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
> + struct desc_ptr *desc = RIP_RELATIVE_ADDR(bringup_idt_descr);
> + gate_desc *idt = RIP_RELATIVE_ADDR(bringup_idt_table);
>
>
> if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
> void *handler;
>
> /* VMM Communication Exception */
> - handler = fixup_pointer(vc_no_ghcb, physbase);
> + handler = RIP_RELATIVE_ADDR(vc_no_ghcb);
> set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
> }
>

Looks like you removed all physbase users in the function. No need to pass
it in anymore.

> @@ -629,7 +614,7 @@ void early_setup_idt(void)
> void __head startup_64_setup_env(unsigned long physbase)
> {
> /* Load GDT */
> - startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
> + startup_gdt_descr.address = (unsigned long)RIP_RELATIVE_ADDR(startup_gdt);
> native_load_gdt(&startup_gdt_descr);
>
> /* New GDT is live - reload data segment registers */

With startup_64_load_idt() fixed, no users of physbase in this function
either.

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-01-31 14:11:30

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] x86: enforce and cleanup RIP-relative accesses in early boot code

On Tue, Jan 30, 2024 at 10:08:43PM +0000, Kevin Loughlin wrote:
> Instead, this patchset continues the approach of fixing the immediate
> problem of SEV-SNP boots crashing when built by clang, providing a
> backport-friendly set of changes needed to successfully boot SEV-SNP
> hosts and guests.

What use cases are those exactly? How do I reproduce them here?

SNP is not upstream yet and the SEV* code has been out there for a while
now without a single such report so this must be something new happening
due to <raisins>...?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-01-31 14:13:12

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

Hi Kevin,

On Tue, 30 Jan 2024 at 23:09, Kevin Loughlin <[email protected]> wrote:
>
> The compiler is not required to generate RIP-relative accesses for
> SEV/SME global variables in early boot. While an attempt was made to
> force RIP-relative addressing for certain global SEV/SME variables via
> inline assembly (see snp_cpuid_get_table() for example), RIP-relative
> addressing must be pervasively enforced for SEV/SME global variables
> when accessed prior to page table fixups.
>
> __startup_64() already handles this issue for select non-SEV/SME global
> variables using fixup_pointer(), which adjusts the pointer relative to
> a `physaddr` argument. To avoid having to pass around this `physaddr`
> argument across all functions needing to apply pointer fixups, this
> patch introduces the macro RIP_RELATIVE_ADDR() (an abstraction of
> the existing snp_cpuid_get_table()), which generates an RIP-relative
> pointer to a passed variable and is used to fix up global variable
> accesses with RIP-relative logic.
>
> Applying these macros to early SEV/SME code (alongside Adam Dunlap's
> necessary "[PATCH v2] x86/asm: Force native_apic_mem_read to use mov")
> enables previously-failing boots of clang builds to succeed, while
> preserving successful boot of gcc builds. Tested with and without SEV,
> SEV-ES, SEV-SNP enabled in guests built via both gcc and clang.
>
> Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
> Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
> Fixes: 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
> Fixes: c9f09539e16e ("x86/head/64: Check SEV encryption before switching to kernel page-table")
> Fixes: b577f542f93c ("x86/coco: Add API to handle encryption mask")
> Signed-off-by: Kevin Loughlin <[email protected]>
> ---
> arch/x86/coco/core.c | 22 ++++++++-----
> arch/x86/include/asm/mem_encrypt.h | 32 +++++++++++++++---
> arch/x86/kernel/head64.c | 31 ++++++++++--------
> arch/x86/kernel/head_64.S | 4 ++-
> arch/x86/kernel/sev-shared.c | 52 ++++++++++++++----------------
> arch/x86/kernel/sev.c | 13 +++++---
> arch/x86/mm/mem_encrypt_identity.c | 50 ++++++++++++++--------------
> 7 files changed, 122 insertions(+), 82 deletions(-)
>

OK, so the purpose of this patch is to have something that can be
backported before applying the changes I proposed to fix this more
comprehensively, right?

I think that makes sense, although I'd like to understand how far this
would need to be backported, and for which purpose.


> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index eeec9986570e..8c45b5643f48 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -5,6 +5,11 @@
> * Copyright (C) 2021 Advanced Micro Devices, Inc.
> *
> * Author: Tom Lendacky <[email protected]>
> + *
> + * WARNING!!
> + * Select functions in this file can execute prior to page table fixups and thus
> + * require pointer fixups for global variable accesses. See WARNING in
> + * arch/x86/kernel/head64.c.
> */
>
> #include <linux/export.h>
> @@ -61,33 +66,34 @@ static __maybe_unused __always_inline bool amd_cc_platform_vtom(enum cc_attr att
> static bool noinstr amd_cc_platform_has(enum cc_attr attr)
> {
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> + const u64 sev_status_fixed_up = sev_get_status_fixup();
>
> - if (sev_status & MSR_AMD64_SNP_VTOM)
> + if (sev_status_fixed_up & MSR_AMD64_SNP_VTOM)
> return amd_cc_platform_vtom(attr);
>
> switch (attr) {
> case CC_ATTR_MEM_ENCRYPT:
> - return sme_me_mask;
> + return sme_get_me_mask_fixup();
>
> case CC_ATTR_HOST_MEM_ENCRYPT:
> - return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
> + return sme_get_me_mask_fixup() && !(sev_status_fixed_up & MSR_AMD64_SEV_ENABLED);
>
> case CC_ATTR_GUEST_MEM_ENCRYPT:
> - return sev_status & MSR_AMD64_SEV_ENABLED;
> + return sev_status_fixed_up & MSR_AMD64_SEV_ENABLED;
>
> case CC_ATTR_GUEST_STATE_ENCRYPT:
> - return sev_status & MSR_AMD64_SEV_ES_ENABLED;
> + return sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED;
>
> /*
> * With SEV, the rep string I/O instructions need to be unrolled
> * but SEV-ES supports them through the #VC handler.
> */
> case CC_ATTR_GUEST_UNROLL_STRING_IO:
> - return (sev_status & MSR_AMD64_SEV_ENABLED) &&
> - !(sev_status & MSR_AMD64_SEV_ES_ENABLED);
> + return (sev_status_fixed_up & MSR_AMD64_SEV_ENABLED) &&
> + !(sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED);
>
> case CC_ATTR_GUEST_SEV_SNP:
> - return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
> + return sev_status_fixed_up & MSR_AMD64_SEV_SNP_ENABLED;
>
> default:
> return false;

Is this code actually called early enough to matter here?

> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 359ada486fa9..b65e66ee79c4 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -17,6 +17,20 @@
>
> #include <asm/bootparam.h>
>
> +/*
> + * Like the address operator "&", evaluates to the address of a LHS variable
> + * "var", but also enforces the use of RIP-relative logic. This macro can be
> + * used to safely access global data variables prior to kernel relocation.
> + */
> +#define RIP_RELATIVE_ADDR(var) \
> +({ \
> + void *rip_rel_ptr; \
> + asm ("lea "#var"(%%rip), %0" \
> + : "=r" (rip_rel_ptr) \
> + : "p" (&var)); \

I'd prefer to make this

asm ("lea %c1(%%rip), %0" : "=r" (rip_rel_ptr) : "i" (&var));

the difference being that the compiler is forced to double check that
#var and &var actually refer to the same global variable.

That also means we can make it static inline.

static inline __attribute_const__ void *rip_relative_ptr(const void *var)
{
void *rip_rel_ptr;

asm ("lea %c1(%%rip), %0" : "=r" (rip_rel_ptr) : "i" (var));
return rip_rel_ptr;
}

#define RIP_RELATIVE_ADDR(var) rip_relative_ptr(&var)

> #ifdef CONFIG_X86_MEM_ENCRYPT
> void __init mem_encrypt_init(void);
> void __init mem_encrypt_setup_arch(void);
> @@ -58,6 +72,16 @@ void __init mem_encrypt_free_decrypted_mem(void);
>
> void __init sev_es_init_vc_handling(void);
>
> +static __always_inline u64 sme_get_me_mask_fixup(void)

Just call this sme_get_me_mask(void) as before, and keep the existing
users. The RIP-relative reference will always work correctly so no
need to avoid it later.

> +{
> + return *((u64 *) RIP_RELATIVE_ADDR(sme_me_mask));

Can we move the cast into the macro?

#define RIP_RELATIVE_REF(var) (*(typeof(&var))rip_relative_ptr(&var))

and make this

return RIP_RELATIVE_REF(sme_me_mask);
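
Since RIP_RELATIVE_REF() yields an lvalue, the same macro would also
cover the write paths elsewhere in the patch, e.g. (a sketch, assuming
the definition above):

	RIP_RELATIVE_REF(sme_me_mask) = me_mask;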


> +}
> +
> +static __always_inline u64 sev_get_status_fixup(void)

Can we drop the _fixup suffix here? Or if we need to convey the fact
that this is a special accessor that can be used early, use _early
instead.

> +{
> + return *((u64 *) RIP_RELATIVE_ADDR(sev_status));
> +}
> +
> #define __bss_decrypted __section(".bss..decrypted")
>
> #else /* !CONFIG_AMD_MEM_ENCRYPT */
> @@ -89,6 +113,9 @@ early_set_mem_enc_dec_hypercall(unsigned long vaddr, unsigned long size, bool en
>
> static inline void mem_encrypt_free_decrypted_mem(void) { }
>
> +static inline u64 sme_get_me_mask_fixup(void) { return 0; }
> +static inline u64 sev_get_status_fixup(void) { return 0; }
> +
> #define __bss_decrypted
>
> #endif /* CONFIG_AMD_MEM_ENCRYPT */
> @@ -106,11 +133,6 @@ void add_encrypt_protection_map(void);
>
> extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[];
>
> -static inline u64 sme_get_me_mask(void)
> -{
> - return sme_me_mask;
> -}
> -
> #endif /* __ASSEMBLY__ */
>
> #endif /* __X86_MEM_ENCRYPT_H__ */
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index dc0956067944..d159239997f4 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -128,6 +128,7 @@ static bool __head check_la57_support(unsigned long physaddr)
>
> static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
> {
> + const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
> unsigned long vaddr, vaddr_end;
> int i;
>
> @@ -140,7 +141,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> * there is no need to zero it after changing the memory encryption
> * attribute.
> */
> - if (sme_get_me_mask()) {
> + if (sme_me_mask_fixed_up) {
> vaddr = (unsigned long)__start_bss_decrypted;
> vaddr_end = (unsigned long)__end_bss_decrypted;
>
> @@ -158,7 +159,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
>
> i = pmd_index(vaddr);
> - pmd[i] -= sme_get_me_mask();
> + pmd[i] -= sme_me_mask_fixed_up;
> }
> }
>
> @@ -166,18 +167,22 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> * Return the SME encryption mask (if SME is active) to be used as a
> * modifier for the initial pgdir entry programmed into CR3.
> */
> - return sme_get_me_mask();
> + return sme_me_mask_fixed_up;

Just use sme_get_me_mask() as before in this file.

> }
>
> -/* Code in __startup_64() can be relocated during execution, but the compiler
> +/*
> + * WARNING!!
> + * Code in __startup_64() can be relocated during execution, but the compiler
> * doesn't have to generate PC-relative relocations when accessing globals from
> * that function. Clang actually does not generate them, which leads to
> * boot-time crashes. To work around this problem, every global pointer must
> - * be adjusted using fixup_pointer().
> + * be adjusted using fixup_pointer() or RIP_RELATIVE_ADDR().
> */
> unsigned long __head __startup_64(unsigned long physaddr,
> struct boot_params *bp)
> {
> + const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
> + pmd_t **early_dynamic_pgts_ptr;
> unsigned long load_delta, *p;
> unsigned long pgtable_flags;
> pgdval_t *pgd;
> @@ -206,7 +211,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> for (;;);
>
> /* Include the SME encryption mask in the fixup value */
> - load_delta += sme_get_me_mask();
> + load_delta += sme_me_mask_fixed_up;
>
> /* Fixup the physical addresses in the page table */
>
> @@ -239,14 +244,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
> */
>
> next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
> - pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> - pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> + early_dynamic_pgts_ptr = fixup_pointer(early_dynamic_pgts, physaddr);
> + pud = (pudval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> + pmd = (pmdval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
>

Better to introduce early_dynamic_pgts_ptr in a separate patch if it
is just an optimization but doesn't actually fix anything.

> - pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
> + pgtable_flags = _KERNPG_TABLE_NOENC + sme_me_mask_fixed_up;
>
> if (la57) {
> - p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
> - physaddr);
> + p4d = (p4dval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
>
> i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
> pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
> @@ -269,7 +274,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> /* Filter out unsupported __PAGE_KERNEL_* bits: */
> mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
> pmd_entry &= *mask_ptr;
> - pmd_entry += sme_get_me_mask();
> + pmd_entry += sme_me_mask_fixed_up;
> pmd_entry += physaddr;
>
> for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
> @@ -313,7 +318,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> * Fixup phys_base - remove the memory encryption mask to obtain
> * the true physical address.
> */
> - *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
> + *fixup_long(&phys_base, physaddr) += load_delta - sme_me_mask_fixed_up;
>
> return sme_postprocess_startup(bp, pmd);
> }
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index d4918d03efb4..b9e52cee6e00 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -176,9 +176,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
> /*
> * Retrieve the modifier (SME encryption mask if SME is active) to be
> * added to the initial pgdir entry that will be programmed into CR3.
> + * Since we may have not completed page table fixups, use RIP-relative
> + * addressing for sme_me_mask.

This runs on the secondary path only, so this comment is inaccurate.

> */
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> - movq sme_me_mask, %rax
> + movq sme_me_mask(%rip), %rax
> #else
> xorq %rax, %rax
> #endif
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 1d24ec679915..9ea6bea37e1d 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -7,6 +7,11 @@
> * This file is not compiled stand-alone. It contains code shared
> * between the pre-decompression boot code and the running Linux kernel
> * and is included directly into both code-bases.
> + *
> + * WARNING!!
> + * Select functions in this file can execute prior to page table fixups and thus
> + * require pointer fixups for global variable accesses. See WARNING in
> + * arch/x86/kernel/head64.c.
> */
>
> #ifndef __BOOT_COMPRESSED
> @@ -318,23 +323,6 @@ static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid
> : __sev_cpuid_hv_msr(leaf);
> }
>
> -/*
> - * This may be called early while still running on the initial identity
> - * mapping. Use RIP-relative addressing to obtain the correct address
> - * while running with the initial identity mapping as well as the
> - * switch-over to kernel virtual addresses later.
> - */
> -static const struct snp_cpuid_table *snp_cpuid_get_table(void)
> -{

You could just make this return the RIP_RELATIVE_ADDR() result, right?

> - void *ptr;
> -
> - asm ("lea cpuid_table_copy(%%rip), %0"
> - : "=r" (ptr)
> - : "p" (&cpuid_table_copy));
> -
> - return ptr;
> -}
> -
> /*
> * The SNP Firmware ABI, Revision 0.9, Section 7.1, details the use of
> * XCR0_IN and XSS_IN to encode multiple versions of 0xD subfunctions 0
> @@ -357,7 +345,7 @@ static const struct snp_cpuid_table *snp_cpuid_get_table(void)
> */
> static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
> u64 xfeatures_found = 0;
> u32 xsave_size = 0x240;
> int i;
> @@ -394,7 +382,7 @@ static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
> static bool
> snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
> int i;
>
> for (i = 0; i < cpuid_table->count; i++) {
> @@ -530,7 +518,8 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> */
> static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
> + const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
>
> if (!cpuid_table->count)
> return -EOPNOTSUPP;
> @@ -555,10 +544,14 @@ static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_le
> */
> leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0;
>
> + cpuid_std_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_std_range_max);
> + cpuid_hyp_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_hyp_range_max);
> + cpuid_ext_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_ext_range_max);
> +
> /* Skip post-processing for out-of-range zero leafs. */
> - if (!(leaf->fn <= cpuid_std_range_max ||
> - (leaf->fn >= 0x40000000 && leaf->fn <= cpuid_hyp_range_max) ||
> - (leaf->fn >= 0x80000000 && leaf->fn <= cpuid_ext_range_max)))
> + if (!(leaf->fn <= *cpuid_std_range_max_ptr ||
> + (leaf->fn >= 0x40000000 && leaf->fn <= *cpuid_hyp_range_max_ptr) ||
> + (leaf->fn >= 0x80000000 && leaf->fn <= *cpuid_ext_range_max_ptr)))
> return 0;
> }
>
> @@ -1045,6 +1038,7 @@ static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
> */
> static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
> {
> + u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
> const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table;
> int i;
>
> @@ -1055,19 +1049,23 @@ static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
> if (!cpuid_table_fw->count || cpuid_table_fw->count > SNP_CPUID_COUNT_MAX)
> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID);
>
> - cpuid_table = snp_cpuid_get_table();
> + cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
> memcpy((void *)cpuid_table, cpuid_table_fw, sizeof(*cpuid_table));
>
> + cpuid_std_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_std_range_max);
> + cpuid_hyp_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_hyp_range_max);
> + cpuid_ext_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_ext_range_max);
> +

Can we cache the values here rather than the pointers?

> /* Initialize CPUID ranges for range-checking. */
> for (i = 0; i < cpuid_table->count; i++) {
> const struct snp_cpuid_fn *fn = &cpuid_table->fn[i];
>
> if (fn->eax_in == 0x0)
> - cpuid_std_range_max = fn->eax;
> + *cpuid_std_range_max_ptr = fn->eax;
> else if (fn->eax_in == 0x40000000)
> - cpuid_hyp_range_max = fn->eax;
> + *cpuid_hyp_range_max_ptr = fn->eax;
> else if (fn->eax_in == 0x80000000)
> - cpuid_ext_range_max = fn->eax;
> + *cpuid_ext_range_max_ptr = fn->eax;
> }
> }
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index c67285824e82..54dd58d13d66 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -5,6 +5,11 @@
> * Copyright (C) 2019 SUSE
> *
> * Author: Joerg Roedel <[email protected]>
> + *
> + * WARNING!!
> + * Select functions in this file can execute prior to page table fixups and thus
> + * require pointer fixups for global variable accesses. See WARNING in
> + * arch/x86/kernel/head64.c.
> */
>
> #define pr_fmt(fmt) "SEV: " fmt
> @@ -748,7 +753,7 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
> * This eliminates worries about jump tables or checking boot_cpu_data
> * in the cc_platform_has() function.
> */
> - if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
> + if (!(sev_get_status_fixup() & MSR_AMD64_SEV_SNP_ENABLED))
> return;
>
> /*
> @@ -767,7 +772,7 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
> * This eliminates worries about jump tables or checking boot_cpu_data
> * in the cc_platform_has() function.
> */
> - if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
> + if (!(sev_get_status_fixup() & MSR_AMD64_SEV_SNP_ENABLED))
> return;
>
> /* Ask hypervisor to mark the memory pages shared in the RMP table. */
> @@ -2114,7 +2119,7 @@ void __init __noreturn snp_abort(void)
>
> static void dump_cpuid_table(void)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
> int i = 0;
>
> pr_info("count=%d reserved=0x%x reserved2=0x%llx\n",
> @@ -2138,7 +2143,7 @@ static void dump_cpuid_table(void)
> */
> static int __init report_cpuid_table(void)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
>
> if (!cpuid_table->count)
> return 0;
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index d73aeb16417f..533e59bc5757 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -5,6 +5,11 @@
> * Copyright (C) 2016 Advanced Micro Devices, Inc.
> *
> * Author: Tom Lendacky <[email protected]>
> + *
> + * WARNING!!
> + * Select functions in this file can execute prior to page table fixups and thus
> + * require pointer fixups for global variable accesses. See WARNING in
> + * arch/x86/kernel/head64.c.
> */
>
> #define DISABLE_BRANCH_PROFILING
> @@ -305,7 +310,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
> * instrumentation or checking boot_cpu_data in the cc_platform_has()
> * function.
> */
> - if (!sme_get_me_mask() || sev_status & MSR_AMD64_SEV_ENABLED)
> + if (!sme_get_me_mask_fixup() || sev_get_status_fixup() & MSR_AMD64_SEV_ENABLED)
> return;
>
> /*
> @@ -346,9 +351,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
> * We're running identity mapped, so we must obtain the address to the
> * SME encryption workarea using rip-relative addressing.
> */
> - asm ("lea sme_workarea(%%rip), %0"
> - : "=r" (workarea_start)
> - : "p" (sme_workarea));
> + workarea_start = (unsigned long) RIP_RELATIVE_ADDR(sme_workarea);
>
> /*
> * Calculate required number of workarea bytes needed:
> @@ -505,13 +508,13 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
> void __init sme_enable(struct boot_params *bp)
> {
> const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
> + u64 msr, *sme_me_mask_ptr, *sev_status_ptr;
> unsigned int eax, ebx, ecx, edx;
> unsigned long feature_mask;
> bool active_by_default;
> unsigned long me_mask;
> char buffer[16];
> bool snp;
> - u64 msr;
>
> snp = snp_init(bp);
>
> @@ -542,12 +545,14 @@ void __init sme_enable(struct boot_params *bp)
>
> me_mask = 1UL << (ebx & 0x3f);
>
> + sev_status_ptr = RIP_RELATIVE_ADDR(sev_status);
> +
> /* Check the SEV MSR whether SEV or SME is enabled */
> - sev_status = __rdmsr(MSR_AMD64_SEV);
> - feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
> + *sev_status_ptr = __rdmsr(MSR_AMD64_SEV);
> + feature_mask = (*sev_status_ptr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
>
> /* The SEV-SNP CC blob should never be present unless SEV-SNP is enabled. */
> - if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
> + if (snp && !(*sev_status_ptr & MSR_AMD64_SEV_SNP_ENABLED))
> snp_abort();
>
> /* Check if memory encryption is enabled */
> @@ -573,7 +578,8 @@ void __init sme_enable(struct boot_params *bp)
> return;
> } else {
> /* SEV state cannot be controlled by a command line option */
> - sme_me_mask = me_mask;
> + sme_me_mask_ptr = RIP_RELATIVE_ADDR(sme_me_mask);
> + *sme_me_mask_ptr = me_mask;
> goto out;
> }
>
> @@ -582,15 +588,9 @@ void __init sme_enable(struct boot_params *bp)
> * identity mapped, so we must obtain the address to the SME command
> * line argument data using rip-relative addressing.
> */
> - asm ("lea sme_cmdline_arg(%%rip), %0"
> - : "=r" (cmdline_arg)
> - : "p" (sme_cmdline_arg));
> - asm ("lea sme_cmdline_on(%%rip), %0"
> - : "=r" (cmdline_on)
> - : "p" (sme_cmdline_on));
> - asm ("lea sme_cmdline_off(%%rip), %0"
> - : "=r" (cmdline_off)
> - : "p" (sme_cmdline_off));
> + cmdline_arg = RIP_RELATIVE_ADDR(sme_cmdline_arg);
> + cmdline_on = RIP_RELATIVE_ADDR(sme_cmdline_on);
> + cmdline_off = RIP_RELATIVE_ADDR(sme_cmdline_off);
>
> if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
> active_by_default = true;
> @@ -603,16 +603,18 @@ void __init sme_enable(struct boot_params *bp)
> if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0)
> return;
>
> + sme_me_mask_ptr = RIP_RELATIVE_ADDR(sme_me_mask);
> +
> if (!strncmp(buffer, cmdline_on, sizeof(buffer)))
> - sme_me_mask = me_mask;
> + *sme_me_mask_ptr = me_mask;
> else if (!strncmp(buffer, cmdline_off, sizeof(buffer)))
> - sme_me_mask = 0;
> + *sme_me_mask_ptr = 0;
> else
> - sme_me_mask = active_by_default ? me_mask : 0;
> + *sme_me_mask_ptr = active_by_default ? me_mask : 0;
> out:
> - if (sme_me_mask) {
> - physical_mask &= ~sme_me_mask;
> + if (*sme_me_mask_ptr) {
> + physical_mask &= ~(*sme_me_mask_ptr);
> cc_vendor = CC_VENDOR_AMD;
> - cc_set_mask(sme_me_mask);
> + cc_set_mask(*sme_me_mask_ptr);
> }
> }
> --
> 2.43.0.429.g432eaa2c6b-goog
>

2024-01-31 15:36:13

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] x86/head64: Replace pointer fixups with RIP_RELATIVE_ADDR()

On 1/30/24 16:08, Kevin Loughlin wrote:
> Now that we have RIP_RELATIVE_ADDR(), we can replace the "fixup_*()"
> family of functions in head64.c with the new macro for cleaner code.

If this series is purely for backporting, this specific patch isn't
needed, right? Since this all works today with clang?

Thanks,
Tom

>
> Signed-off-by: Kevin Loughlin <[email protected]>
> ---
> arch/x86/kernel/head64.c | 63 +++++++++++++++-------------------------
> 1 file changed, 24 insertions(+), 39 deletions(-)
>
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index d159239997f4..e782149eefc4 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -85,23 +85,8 @@ static struct desc_ptr startup_gdt_descr __initdata = {
> .address = 0,
> };
>
> -static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
> -{
> - return ptr - (void *)_text + (void *)physaddr;
> -}
> -
> -static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr)
> -{
> - return fixup_pointer(ptr, physaddr);
> -}
> -
> #ifdef CONFIG_X86_5LEVEL
> -static unsigned int __head *fixup_int(void *ptr, unsigned long physaddr)
> -{
> - return fixup_pointer(ptr, physaddr);
> -}
> -
> -static bool __head check_la57_support(unsigned long physaddr)
> +static bool __head check_la57_support(void)
> {
> /*
> * 5-level paging is detected and enabled at kernel decompression
> @@ -110,17 +95,17 @@ static bool __head check_la57_support(unsigned long physaddr)
> if (!(native_read_cr4() & X86_CR4_LA57))
> return false;
>
> - *fixup_int(&__pgtable_l5_enabled, physaddr) = 1;
> - *fixup_int(&pgdir_shift, physaddr) = 48;
> - *fixup_int(&ptrs_per_p4d, physaddr) = 512;
> - *fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE_L5;
> - *fixup_long(&vmalloc_base, physaddr) = __VMALLOC_BASE_L5;
> - *fixup_long(&vmemmap_base, physaddr) = __VMEMMAP_BASE_L5;
> + *((unsigned int *)RIP_RELATIVE_ADDR(__pgtable_l5_enabled)) = 1;
> + *((unsigned int *)RIP_RELATIVE_ADDR(pgdir_shift)) = 48;
> + *((unsigned int *)RIP_RELATIVE_ADDR(ptrs_per_p4d)) = 512;
> + *((unsigned long *)RIP_RELATIVE_ADDR(page_offset_base)) = __PAGE_OFFSET_BASE_L5;
> + *((unsigned long *)RIP_RELATIVE_ADDR(vmalloc_base)) = __VMALLOC_BASE_L5;
> + *((unsigned long *)RIP_RELATIVE_ADDR(vmemmap_base)) = __VMEMMAP_BASE_L5;
>
> return true;
> }
> #else
> -static bool __head check_la57_support(unsigned long physaddr)
> +static bool __head check_la57_support(void)
> {
> return false;
> }
> @@ -175,8 +160,8 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> * Code in __startup_64() can be relocated during execution, but the compiler
> * doesn't have to generate PC-relative relocations when accessing globals from
> * that function. Clang actually does not generate them, which leads to
> - * boot-time crashes. To work around this problem, every global pointer must
> - * be adjusted using fixup_pointer() or RIP_RELATIVE_ADDR().
> + * boot-time crashes. To work around this problem, every global variable access
> + * must be adjusted using RIP_RELATIVE_ADDR().
> */
> unsigned long __head __startup_64(unsigned long physaddr,
> struct boot_params *bp)
> @@ -194,7 +179,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> int i;
> unsigned int *next_pgt_ptr;
>
> - la57 = check_la57_support(physaddr);
> + la57 = check_la57_support();
>
> /* Is the address too large? */
> if (physaddr >> MAX_PHYSMEM_BITS)
> @@ -215,7 +200,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>
> /* Fixup the physical addresses in the page table */
>
> - pgd = fixup_pointer(early_top_pgt, physaddr);
> + pgd = RIP_RELATIVE_ADDR(early_top_pgt);
> p = pgd + pgd_index(__START_KERNEL_map);
> if (la57)
> *p = (unsigned long)level4_kernel_pgt;
> @@ -224,15 +209,15 @@ unsigned long __head __startup_64(unsigned long physaddr,
> *p += _PAGE_TABLE_NOENC - __START_KERNEL_map + load_delta;
>
> if (la57) {
> - p4d = fixup_pointer(level4_kernel_pgt, physaddr);
> + p4d = RIP_RELATIVE_ADDR(level4_kernel_pgt);
> p4d[511] += load_delta;
> }
>
> - pud = fixup_pointer(level3_kernel_pgt, physaddr);
> + pud = RIP_RELATIVE_ADDR(level3_kernel_pgt);
> pud[510] += load_delta;
> pud[511] += load_delta;
>
> - pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
> + pmd = RIP_RELATIVE_ADDR(level2_fixmap_pgt);
> for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
> pmd[i] += load_delta;
>
> @@ -243,8 +228,8 @@ unsigned long __head __startup_64(unsigned long physaddr,
> * it avoids problems around wraparound.
> */
>
> - next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
> - early_dynamic_pgts_ptr = fixup_pointer(early_dynamic_pgts, physaddr);
> + early_dynamic_pgts_ptr = RIP_RELATIVE_ADDR(early_dynamic_pgts);
> + next_pgt_ptr = RIP_RELATIVE_ADDR(next_early_pgt);
> pud = (pudval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> pmd = (pmdval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
>
> @@ -272,7 +257,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>
> pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
> /* Filter out unsupported __PAGE_KERNEL_* bits: */
> - mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
> + mask_ptr = RIP_RELATIVE_ADDR(__supported_pte_mask);
> pmd_entry &= *mask_ptr;
> pmd_entry += sme_me_mask_fixed_up;
> pmd_entry += physaddr;
> @@ -299,7 +284,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> * error, causing the BIOS to halt the system.
> */
>
> - pmd = fixup_pointer(level2_kernel_pgt, physaddr);
> + pmd = RIP_RELATIVE_ADDR(level2_kernel_pgt);
>
> /* invalidate pages before the kernel image */
> for (i = 0; i < pmd_index((unsigned long)_text); i++)
> @@ -318,7 +303,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> * Fixup phys_base - remove the memory encryption mask to obtain
> * the true physical address.
> */
> - *fixup_long(&phys_base, physaddr) += load_delta - sme_me_mask_fixed_up;
> + *((unsigned long *)RIP_RELATIVE_ADDR(phys_base)) += load_delta - sme_me_mask_fixed_up;
>
> return sme_postprocess_startup(bp, pmd);
> }
> @@ -594,15 +579,15 @@ static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
> /* This runs while still in the direct mapping */
> static void __head startup_64_load_idt(unsigned long physbase)
> {
> - struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
> - gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
> + struct desc_ptr *desc = RIP_RELATIVE_ADDR(bringup_idt_descr);
> + gate_desc *idt = RIP_RELATIVE_ADDR(bringup_idt_table);
>
>
> if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
> void *handler;
>
> /* VMM Communication Exception */
> - handler = fixup_pointer(vc_no_ghcb, physbase);
> + handler = RIP_RELATIVE_ADDR(vc_no_ghcb);
> set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
> }
>
> @@ -629,7 +614,7 @@ void early_setup_idt(void)
> void __head startup_64_setup_env(unsigned long physbase)
> {
> /* Load GDT */
> - startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
> + startup_gdt_descr.address = (unsigned long)RIP_RELATIVE_ADDR(startup_gdt);
> native_load_gdt(&startup_gdt_descr);
>
> /* New GDT is live - reload data segment registers */

2024-01-31 15:46:52

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] x86/head64: Replace pointer fixups with RIP_RELATIVE_ADDR()

On Wed, Jan 31, 2024 at 09:30:10AM -0600, Tom Lendacky wrote:
> On 1/30/24 16:08, Kevin Loughlin wrote:
> > Now that we have RIP_RELATIVE_ADDR(), we can replace the "fixup_*()"
> > family of functions in head64.c with the new macro for cleaner code.
>
> If this series is purely for backporting, this specific patch isn't needed,
> right? Since this all works today with clang?

That's true. However, temporary things often end up becoming permanent :/

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-01-31 18:17:20

by Jacob Xu

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] x86: enforce and cleanup RIP-relative accesses in early boot code

On Wed, Jan 31, 2024 at 6:01 AM Borislav Petkov <[email protected]> wrote:
>
> On Tue, Jan 30, 2024 at 10:08:43PM +0000, Kevin Loughlin wrote:
> > Instead, this patchset continues the approach of fixing the immediate
> > problem of SEV-SNP boots crashing when built by clang, providing a
> > backport-friendly set of changes needed to successfully boot SEV-SNP
> > hosts and guests.
>
> What use cases are those exactly? How do I reproduce them here?
>

We're interested in fixing SEV-SNP guest boots, which are currently
broken when using a guest kernel compiled with clang. It seems like
every other user of SEV/SNP Linux kernel code uses GCC to compile the
kernel, so they've avoided this issue.

> SNP is not upstream yet and the SEV* code has been out there for a while
> now without a single such report so this must be something new happening
> due to <raisins>...?

E.g. Google COS uses clang to compile the kernel and we've made do
with an internal fix for a while. We've unfortunately just been rather
slow to report upstream.


>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2024-01-31 18:30:28

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] x86: enforce and cleanup RIP-relative accesses in early boot code

On Wed, Jan 31, 2024 at 10:16:55AM -0800, Jacob Xu wrote:
> We're interested in fixing SEV-SNP guest boots which are currently
> broken when using a guest kernel compiled with clang. It seems like
> every other user of SEV/SNP linux kernel code uses GCC to compile the
> kernel so they've avoided this issue.

Lemme give that a try here.

> E.g. Google COS uses clang to compile the kernel and we've made do
> with an internal fix for a while.

Which means that, theoretically, you could forward-port this internal
fix until the issue is fixed for real, I'd say.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-02-01 16:38:45

by Kevin Loughlin

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] x86/head64: Replace pointer fixups with RIP_RELATIVE_ADDR()

On Wed, Jan 31, 2024 at 12:24 AM Kirill A. Shutemov
<[email protected]> wrote:
>
> On Tue, Jan 30, 2024 at 10:08:45PM +0000, Kevin Loughlin wrote:
> > @@ -594,15 +579,15 @@ static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
> > /* This runs while still in the direct mapping */
> > static void __head startup_64_load_idt(unsigned long physbase)
> > {
> > - struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
> > - gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
> > + struct desc_ptr *desc = RIP_RELATIVE_ADDR(bringup_idt_descr);
> > + gate_desc *idt = RIP_RELATIVE_ADDR(bringup_idt_table);
> >
> >
> > if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
> > void *handler;
> >
> > /* VMM Communication Exception */
> > - handler = fixup_pointer(vc_no_ghcb, physbase);
> > + handler = RIP_RELATIVE_ADDR(vc_no_ghcb);
> > set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
> > }
> >
>
> Looks like you removed all physbase users in the function. No need to pass
> it in anymore.

Thanks, will fix.

>
> > @@ -629,7 +614,7 @@ void early_setup_idt(void)
> > void __head startup_64_setup_env(unsigned long physbase)
> > {
> > /* Load GDT */
> > - startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
> > + startup_gdt_descr.address = (unsigned long)RIP_RELATIVE_ADDR(startup_gdt);
> > native_load_gdt(&startup_gdt_descr);
> >
> > /* New GDT is live - reload data segment registers */
>
> With startup_64_load_idt() fixed, no users of physbase in this function
> either.

Ditto.

2024-02-02 22:00:52

by Kevin Loughlin

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Wed, Jan 31, 2024 at 12:20 AM Kirill A. Shutemov
<[email protected]> wrote:
>
> On Tue, Jan 30, 2024 at 10:08:44PM +0000, Kevin Loughlin wrote:
> > diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> > index 359ada486fa9..b65e66ee79c4 100644
> > --- a/arch/x86/include/asm/mem_encrypt.h
> > +++ b/arch/x86/include/asm/mem_encrypt.h
> > @@ -17,6 +17,20 @@
> >
> > #include <asm/bootparam.h>
> >
> > +/*
> > + * Like the address operator "&", evaluates to the address of a LHS variable
> > + * "var", but also enforces the use of RIP-relative logic. This macro can be
> > + * used to safely access global data variables prior to kernel relocation.
> > + */
> > +#define RIP_RELATIVE_ADDR(var) \
> > +({ \
> > + void *rip_rel_ptr; \
> > + asm ("lea "#var"(%%rip), %0" \
> > + : "=r" (rip_rel_ptr) \
> > + : "p" (&var)); \
> > + rip_rel_ptr; \
> > +})
> > +
>
> I don't think it is the right place for the macro. The next patch uses it
> for things unrelated to memory encryption.

You're right; with the cleanup, I agree it becomes more general. We
can move it to arch/x86/include/asm/asm.h.

>
> > @@ -239,14 +244,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
> > */
> >
> > next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
> > - pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> > - pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> > + early_dynamic_pgts_ptr = fixup_pointer(early_dynamic_pgts, physaddr);
> > + pud = (pudval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> > + pmd = (pmdval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> >
>
> This change doesn't belong to this patch. Maybe move it into the next
> > patch and combine it with removing fixup_pointer().

I'll put it in a separate commit even preceding this one, as it's
actually a bug in the existing fixup pointer logic that I noticed when
transitioning to the use of the RIP-relative macro. Specifically,
early_dynamic_pgts is a global variable just like next_early_pgt and
thus also needs to be fixed up to guarantee the correct address is
used across toolchains.

2024-02-02 22:47:48

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Fri, 2 Feb 2024 at 23:00, Kevin Loughlin <[email protected]> wrote:
>
> On Wed, Jan 31, 2024 at 12:20 AM Kirill A. Shutemov
> <[email protected]> wrote:
> >
> > On Tue, Jan 30, 2024 at 10:08:44PM +0000, Kevin Loughlin wrote:
> > > diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> > > index 359ada486fa9..b65e66ee79c4 100644
> > > --- a/arch/x86/include/asm/mem_encrypt.h
> > > +++ b/arch/x86/include/asm/mem_encrypt.h
> > > @@ -17,6 +17,20 @@
> > >
> > > #include <asm/bootparam.h>
> > >
> > > +/*
> > > + * Like the address operator "&", evaluates to the address of a LHS variable
> > > + * "var", but also enforces the use of RIP-relative logic. This macro can be
> > > + * used to safely access global data variables prior to kernel relocation.
> > > + */
> > > +#define RIP_RELATIVE_ADDR(var) \
> > > +({ \
> > > + void *rip_rel_ptr; \
> > > + asm ("lea "#var"(%%rip), %0" \
> > > + : "=r" (rip_rel_ptr) \
> > > + : "p" (&var)); \
> > > + rip_rel_ptr; \
> > > +})
> > > +
> >
> > I don't think it is the right place for the macro. The next patch uses it
> > for things unrelated to memory encryption.
>
> You're right; with the cleanup, I agree it becomes more general. We
> can move it to arch/x86/include/asm/asm.h.
>
> >
> > > @@ -239,14 +244,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
> > > */
> > >
> > > next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
> > > - pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> > > - pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> > > + early_dynamic_pgts_ptr = fixup_pointer(early_dynamic_pgts, physaddr);
> > > + pud = (pudval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> > > + pmd = (pmdval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> > >
> >
> > This change doesn't belong to this patch. Maybe move it into the next
> > patch and combine it with removing fixup_pointer().
>
> I'll put it in a separate commit even preceding this one, as it's
> actually a bug in the existing fixup pointer logic that I noticed when
> transitioning to the use of the RIP-relative macro. Specifically,
> early_dynamic_pgts is a global variable just like next_early_pgt and
> thus also needs to be fixed up to guarantee the correct address is
> used across toolchains.

No, this is not a bug.

early_dynamic_pgts is a two dimensional array

extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];

and so this expression

early_dynamic_pgts[(*next_pgt_ptr)++]

produces the kernel virtual address of one of the top level elements,
which itself is an array, and so this can be fixed up as usual.
IOW, the [] operation does not result in pointer dereference here,
only in pointer arithmetic.
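
To illustrate with a standalone sketch (the type and array bounds are
stand-ins, not necessarily the kernel's actual values):

	typedef unsigned long pmd_t;	/* stand-in for the real pmd_t */
	#define EARLY_DYNAMIC_PAGE_TABLES	64
	#define PTRS_PER_PMD			512

	pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];

	pmd_t *nth_pgt(unsigned int i)
	{
		/*
		 * Indexing the outer dimension is pure address arithmetic:
		 * base + i * sizeof(pmd_t[PTRS_PER_PMD]). Nothing is loaded
		 * from memory, so the result is still the link-time address
		 * of the array, which fixup_pointer() can adjust as usual.
		 */
		return early_dynamic_pgts[i];
	}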

2024-02-03 00:19:27

by Kevin Loughlin

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Fri, Feb 2, 2024 at 2:47 PM Ard Biesheuvel <[email protected]> wrote:
>
> On Fri, 2 Feb 2024 at 23:00, Kevin Loughlin <[email protected]> wrote:
> >
> > On Wed, Jan 31, 2024 at 12:20 AM Kirill A. Shutemov
> > <[email protected]> wrote:
> > >
> > > On Tue, Jan 30, 2024 at 10:08:44PM +0000, Kevin Loughlin wrote:
> > > > diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> > > > index 359ada486fa9..b65e66ee79c4 100644
> > > > --- a/arch/x86/include/asm/mem_encrypt.h
> > > > +++ b/arch/x86/include/asm/mem_encrypt.h
> > > > @@ -17,6 +17,20 @@
> > > >
> > > > #include <asm/bootparam.h>
> > > >
> > > > +/*
> > > > + * Like the address operator "&", evaluates to the address of a LHS variable
> > > > + * "var", but also enforces the use of RIP-relative logic. This macro can be
> > > > + * used to safely access global data variables prior to kernel relocation.
> > > > + */
> > > > +#define RIP_RELATIVE_ADDR(var) \
> > > > +({ \
> > > > + void *rip_rel_ptr; \
> > > > + asm ("lea "#var"(%%rip), %0" \
> > > > + : "=r" (rip_rel_ptr) \
> > > > + : "p" (&var)); \
> > > > + rip_rel_ptr; \
> > > > +})
> > > > +
> > >
> > > I don't think it is the right place for the macro. The next patch uses it
> > > for things unrelated to memory encryption.
> >
> > You're right; with the cleanup, I agree it becomes more general. We
> > can move it to arch/x86/include/asm/asm.h.
> >
> > >
> > > > @@ -239,14 +244,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
> > > > */
> > > >
> > > > next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
> > > > - pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> > > > - pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> > > > + early_dynamic_pgts_ptr = fixup_pointer(early_dynamic_pgts, physaddr);
> > > > + pud = (pudval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> > > > + pmd = (pmdval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> > > >
> > >
> > > This change doesn't belong to this patch. Maybe move it into the next
> > > patch and combine it with removing fixup_pointer().
> >
> > I'll put it in a separate commit even preceding this one, as it's
> > actually a bug in the existing fixup pointer logic that I noticed when
> > transitioning to the use of the RIP-relative macro. Specifically,
> > early_dynamic_pgts is a global variable just like next_early_pgt and
> > thus also needs to be fixed up to guarantee the correct address is
> > used across toolchains.
>
> No, this is not a bug.
>
> early_dynamic_pgts is a two-dimensional array
>
> extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
>
> and so this expression
>
> early_dynamic_pgts[(*next_pgt_ptr)++]
>
> produces the kernel virtual address of one of the top-level elements,
> which itself is an array, and so this can be fixed up as usual.
> IOW, the [] operation does not result in pointer dereference here,
> only in pointer arithmetic.

Ah, yes, you're right. Thanks for the correction! I will just drop it then.

2024-02-03 00:23:01

by Kevin Loughlin

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] x86/sev: enforce RIP-relative accesses in early SEV/SME code

On Wed, Jan 31, 2024 at 5:42 AM Ard Biesheuvel <[email protected]> wrote:
>
> Hi Kevin,
>
> On Tue, 30 Jan 2024 at 23:09, Kevin Loughlin <[email protected]> wrote:
> >
> > The compiler is not required to generate RIP-relative accesses for
> > SEV/SME global variables in early boot. While an attempt was made to
> > force RIP-relative addressing for certain global SEV/SME variables via
> > inline assembly (see snp_cpuid_get_table() for example), RIP-relative
> > addressing must be pervasively enforced for SEV/SME global variables
> > when accessed prior to page table fixups.
> >
> > __startup_64() already handles this issue for select non-SEV/SME global
> > variables using fixup_pointer(), which adjusts the pointer relative to
> > a `physaddr` argument. To avoid having to pass around this `physaddr`
> > argument across all functions needing to apply pointer fixups, this
> > patch introduces the macro RIP_RELATIVE_ADDR() (an abstraction of
> > the existing snp_cpuid_get_table()), which generates an RIP-relative
> > pointer to a passed variable. Similarly, PTR_TO_RIP_RELATIVE_PTR() is
> > introduced to fix up an existing pointer value with RIP-relative logic.
> >
> > Applying these macros to early SEV/SME code (alongside Adam Dunlap's
> > necessary "[PATCH v2] x86/asm: Force native_apic_mem_read to use mov")
> > enables previously-failing boots of clang builds to succeed, while
> > preserving successful boot of gcc builds. Tested with and without SEV,
> > SEV-ES, SEV-SNP enabled in guests built via both gcc and clang.
> >
> > Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
> > Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
> > Fixes: 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
> > Fixes: c9f09539e16e ("x86/head/64: Check SEV encryption before switching to kernel page-table")
> > Fixes: b577f542f93c ("x86/coco: Add API to handle encryption mask")
> > Signed-off-by: Kevin Loughlin <[email protected]>
> > ---
> > arch/x86/coco/core.c | 22 ++++++++-----
> > arch/x86/include/asm/mem_encrypt.h | 32 +++++++++++++++---
> > arch/x86/kernel/head64.c | 31 ++++++++++--------
> > arch/x86/kernel/head_64.S | 4 ++-
> > arch/x86/kernel/sev-shared.c | 52 ++++++++++++++----------------
> > arch/x86/kernel/sev.c | 13 +++++---
> > arch/x86/mm/mem_encrypt_identity.c | 50 ++++++++++++++--------------
> > 7 files changed, 122 insertions(+), 82 deletions(-)
> >
>
> OK, so the purpose of this patch is to have something that can be
> backported before applying the changes I proposed to fix this more
> comprehensively, right?

Correct.

> I think that makes sense, although I'd like to understand how far this
> would need to be backported, and for which purpose.

It would need to be backported to the first SEV-SNP support merged
upstream, which I believe was in 5.19. The rationale for the backport
is to provide an upstream fix for clang builds of SEV-SNP guests [0].

[0] https://lore.kernel.org/lkml/CAJ5mJ6j-Vw2P=QLK-J_J79S35UggvZPtm5sia74=enR1qq9X9A@mail.gmail.com/

> > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> > index eeec9986570e..8c45b5643f48 100644
> > --- a/arch/x86/coco/core.c
> > +++ b/arch/x86/coco/core.c
> > @@ -5,6 +5,11 @@
> > * Copyright (C) 2021 Advanced Micro Devices, Inc.
> > *
> > * Author: Tom Lendacky <[email protected]>
> > + *
> > + * WARNING!!
> > + * Select functions in this file can execute prior to page table fixups and thus
> > + * require pointer fixups for global variable accesses. See WARNING in
> > + * arch/x86/kernel/head64.c.
> > */
> >
> > #include <linux/export.h>
> > @@ -61,33 +66,34 @@ static __maybe_unused __always_inline bool amd_cc_platform_vtom(enum cc_attr att
> > static bool noinstr amd_cc_platform_has(enum cc_attr attr)
> > {
> > #ifdef CONFIG_AMD_MEM_ENCRYPT
> > + const u64 sev_status_fixed_up = sev_get_status_fixup();
> >
> > - if (sev_status & MSR_AMD64_SNP_VTOM)
> > + if (sev_status_fixed_up & MSR_AMD64_SNP_VTOM)
> > return amd_cc_platform_vtom(attr);
> >
> > switch (attr) {
> > case CC_ATTR_MEM_ENCRYPT:
> > - return sme_me_mask;
> > + return sme_get_me_mask_fixup();
> >
> > case CC_ATTR_HOST_MEM_ENCRYPT:
> > - return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
> > + return sme_get_me_mask_fixup() && !(sev_status_fixed_up & MSR_AMD64_SEV_ENABLED);
> >
> > case CC_ATTR_GUEST_MEM_ENCRYPT:
> > - return sev_status & MSR_AMD64_SEV_ENABLED;
> > + return sev_status_fixed_up & MSR_AMD64_SEV_ENABLED;
> >
> > case CC_ATTR_GUEST_STATE_ENCRYPT:
> > - return sev_status & MSR_AMD64_SEV_ES_ENABLED;
> > + return sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED;
> >
> > /*
> > * With SEV, the rep string I/O instructions need to be unrolled
> > * but SEV-ES supports them through the #VC handler.
> > */
> > case CC_ATTR_GUEST_UNROLL_STRING_IO:
> > - return (sev_status & MSR_AMD64_SEV_ENABLED) &&
> > - !(sev_status & MSR_AMD64_SEV_ES_ENABLED);
> > + return (sev_status_fixed_up & MSR_AMD64_SEV_ENABLED) &&
> > + !(sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED);
> >
> > case CC_ATTR_GUEST_SEV_SNP:
> > - return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
> > + return sev_status_fixed_up & MSR_AMD64_SEV_SNP_ENABLED;
> >
> > default:
> > return false;
>
> Is this code actually called early enough to matter here?

I think you're right that we don't need this; it looks like
cc_platform_has() is avoided early. An example of such avoidance is in
sme_encrypt_kernel() in arch/x86/mm/mem_encrypt_identity.c.

> > diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> > index 359ada486fa9..b65e66ee79c4 100644
> > --- a/arch/x86/include/asm/mem_encrypt.h
> > +++ b/arch/x86/include/asm/mem_encrypt.h
> > @@ -17,6 +17,20 @@
> >
> > #include <asm/bootparam.h>
> >
> > +/*
> > > + * Like the address operator "&", this evaluates to the address of an LHS
> > > + * variable "var", but also enforces the use of RIP-relative logic. This
> > > + * macro can be used to safely access global data variables prior to kernel
> > > + * relocation.
> > + */
> > +#define RIP_RELATIVE_ADDR(var) \
> > +({ \
> > + void *rip_rel_ptr; \
> > + asm ("lea "#var"(%%rip), %0" \
> > + : "=r" (rip_rel_ptr) \
> > + : "p" (&var)); \
>
> I'd prefer to make this
>
> asm ("lea %c1(%%rip), %0" : "=r" (rip_rel_ptr) : "i" (&var));
>
> the difference being that the compiler is forced to double check that
> #var and &var actually refer to the same global variable.
>
> That also means we can make it static inline.
>
> static inline __attribute_const__ void *rip_relative_ptr(const void *var)
> {
> void *rip_rel_ptr;
>
> asm ("lea %c1(%%rip), %0" : "=r" (rip_rel_ptr) : "i" (var));
> return rip_rel_ptr;
> }
>
> #define RIP_RELATIVE_ADDR(var) rip_relative_ptr(&var)

Good idea, works for me.
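
For illustration only, a call site would look the same as with the bare macro
version (sev_status being one of the globals the patch already touches):

	const u64 *status = RIP_RELATIVE_ADDR(sev_status);
	u64 enabled = *status & MSR_AMD64_SEV_ENABLED;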

> > #ifdef CONFIG_X86_MEM_ENCRYPT
> > void __init mem_encrypt_init(void);
> > void __init mem_encrypt_setup_arch(void);
> > @@ -58,6 +72,16 @@ void __init mem_encrypt_free_decrypted_mem(void);
> >
> > void __init sev_es_init_vc_handling(void);
> >
> > +static __always_inline u64 sme_get_me_mask_fixup(void)
>
> Just call this sme_get_me_mask(void) as before, and keep the existing
> users. The RIP-relative reference will always work correctly so no
> need to avoid it later.

Agreed. In general, I will rework to minimize the changes for
backport-friendliness.

> > +{
> > + return *((u64 *) RIP_RELATIVE_ADDR(sme_me_mask));
>
> Can we move the cast into the macro?
>
> #define RIP_RELATIVE_REF(var) (*(typeof(&var))rip_relative_ptr(&var))
>
> and make this
>
> return RIP_RELATIVE_REF(sme_me_mask);

Yes. I will double check that there are no instances where we actually
want the pointer instead of the dereferenced value, but I believe this
always works.
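
To illustrate why the reference form is sufficient: the macro expands to an
lvalue, so it covers writes as well as reads (sketch only; the write mirrors
the sev_status store in sme_enable()):

	/* Read a global through its RIP-relative address. */
	u64 mask = RIP_RELATIVE_REF(sme_me_mask);

	/* Write through a RIP-relative address as well. */
	RIP_RELATIVE_REF(sev_status) = __rdmsr(MSR_AMD64_SEV);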

> > +}
> > +
> > +static __always_inline u64 sev_get_status_fixup(void)
>
> Can we drop the _fixup suffix here? Or if we need to convey the fact
> that this is a special accessor that can be used early, use _early
> instead.

I'll just drop the suffix. I prefer not to use "early" in order to
avoid conflating the meaning with "runs before page table fixups",
which I don't think is true for all existing functions using the early
suffix.

> > +{
> > + return *((u64 *) RIP_RELATIVE_ADDR(sev_status));
> > +}
> > +
> > #define __bss_decrypted __section(".bss..decrypted")
> >
> > #else /* !CONFIG_AMD_MEM_ENCRYPT */
> > @@ -89,6 +113,9 @@ early_set_mem_enc_dec_hypercall(unsigned long vaddr, unsigned long size, bool en
> >
> > static inline void mem_encrypt_free_decrypted_mem(void) { }
> >
> > +static inline u64 sme_get_me_mask_fixup(void) { return 0; }
> > +static inline u64 sev_get_status_fixup(void) { return 0; }
> > +
> > #define __bss_decrypted
> >
> > #endif /* CONFIG_AMD_MEM_ENCRYPT */
> > @@ -106,11 +133,6 @@ void add_encrypt_protection_map(void);
> >
> > extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[];
> >
> > -static inline u64 sme_get_me_mask(void)
> > -{
> > - return sme_me_mask;
> > -}
> > -
> > #endif /* __ASSEMBLY__ */
> >
> > #endif /* __X86_MEM_ENCRYPT_H__ */
> > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> > index dc0956067944..d159239997f4 100644
> > --- a/arch/x86/kernel/head64.c
> > +++ b/arch/x86/kernel/head64.c
> > @@ -128,6 +128,7 @@ static bool __head check_la57_support(unsigned long physaddr)
> >
> > static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
> > {
> > + const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
> > unsigned long vaddr, vaddr_end;
> > int i;
> >
> > @@ -140,7 +141,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> > * there is no need to zero it after changing the memory encryption
> > * attribute.
> > */
> > - if (sme_get_me_mask()) {
> > + if (sme_me_mask_fixed_up) {
> > vaddr = (unsigned long)__start_bss_decrypted;
> > vaddr_end = (unsigned long)__end_bss_decrypted;
> >
> > @@ -158,7 +159,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> > early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
> >
> > i = pmd_index(vaddr);
> > - pmd[i] -= sme_get_me_mask();
> > + pmd[i] -= sme_me_mask_fixed_up;
> > }
> > }
> >
> > @@ -166,18 +167,22 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> > * Return the SME encryption mask (if SME is active) to be used as a
> > * modifier for the initial pgdir entry programmed into CR3.
> > */
> > - return sme_get_me_mask();
> > + return sme_me_mask_fixed_up;
>
> Just use sme_get_me_mask() as before in this file.

Will do.

> > }
> >
> > -/* Code in __startup_64() can be relocated during execution, but the compiler
> > +/*
> > + * WARNING!!
> > + * Code in __startup_64() can be relocated during execution, but the compiler
> > * doesn't have to generate PC-relative relocations when accessing globals from
> > * that function. Clang actually does not generate them, which leads to
> > * boot-time crashes. To work around this problem, every global pointer must
> > - * be adjusted using fixup_pointer().
> > + * be adjusted using fixup_pointer() or RIP_RELATIVE_ADDR().
> > */
> > unsigned long __head __startup_64(unsigned long physaddr,
> > struct boot_params *bp)
> > {
> > + const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
> > + pmd_t **early_dynamic_pgts_ptr;
> > unsigned long load_delta, *p;
> > unsigned long pgtable_flags;
> > pgdval_t *pgd;
> > @@ -206,7 +211,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> > for (;;);
> >
> > /* Include the SME encryption mask in the fixup value */
> > - load_delta += sme_get_me_mask();
> > + load_delta += sme_me_mask_fixed_up;
> >
> > /* Fixup the physical addresses in the page table */
> >
> > @@ -239,14 +244,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
> > */
> >
> > next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
> > - pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> > - pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> > + early_dynamic_pgts_ptr = fixup_pointer(early_dynamic_pgts, physaddr);
> > + pud = (pudval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> > + pmd = (pmdval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> >
>
> Better to introduce early_dynamic_pgts_ptr in a separate patch if it
> is just an optimization but doesn't actually fix anything.

Yeah, we can just drop it. I had mistakenly believed
early_dynamic_pgts also needed a fixup.

> > - pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
> > + pgtable_flags = _KERNPG_TABLE_NOENC + sme_me_mask_fixed_up;
> >
> > if (la57) {
> > - p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
> > - physaddr);
> > + p4d = (p4dval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> >
> > i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
> > pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
> > @@ -269,7 +274,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> > /* Filter out unsupported __PAGE_KERNEL_* bits: */
> > mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
> > pmd_entry &= *mask_ptr;
> > - pmd_entry += sme_get_me_mask();
> > + pmd_entry += sme_me_mask_fixed_up;
> > pmd_entry += physaddr;
> >
> > for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
> > @@ -313,7 +318,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> > * Fixup phys_base - remove the memory encryption mask to obtain
> > * the true physical address.
> > */
> > - *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
> > + *fixup_long(&phys_base, physaddr) += load_delta - sme_me_mask_fixed_up;
> >
> > return sme_postprocess_startup(bp, pmd);
> > }
> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > index d4918d03efb4..b9e52cee6e00 100644
> > --- a/arch/x86/kernel/head_64.S
> > +++ b/arch/x86/kernel/head_64.S
> > @@ -176,9 +176,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
> > /*
> > * Retrieve the modifier (SME encryption mask if SME is active) to be
> > * added to the initial pgdir entry that will be programmed into CR3.
> > + * Since we may not have completed page table fixups, use RIP-relative
> > + * addressing for sme_me_mask.
>
> This runs on the secondary path only, so this comment is inaccurate.

Good catch, thanks. I'll drop it.

> > */
> > #ifdef CONFIG_AMD_MEM_ENCRYPT
> > - movq sme_me_mask, %rax
> > + movq sme_me_mask(%rip), %rax
> > #else
> > xorq %rax, %rax
> > #endif
> > diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> > index 1d24ec679915..9ea6bea37e1d 100644
> > --- a/arch/x86/kernel/sev-shared.c
> > +++ b/arch/x86/kernel/sev-shared.c
> > @@ -7,6 +7,11 @@
> > * This file is not compiled stand-alone. It contains code shared
> > * between the pre-decompression boot code and the running Linux kernel
> > * and is included directly into both code-bases.
> > + *
> > + * WARNING!!
> > + * Select functions in this file can execute prior to page table fixups and thus
> > + * require pointer fixups for global variable accesses. See WARNING in
> > + * arch/x86/kernel/head64.c.
> > */
> >
> > #ifndef __BOOT_COMPRESSED
> > @@ -318,23 +323,6 @@ static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid
> > : __sev_cpuid_hv_msr(leaf);
> > }
> >
> > -/*
> > - * This may be called early while still running on the initial identity
> > - * mapping. Use RIP-relative addressing to obtain the correct address
> > - * while running with the initial identity mapping as well as the
> > - * switch-over to kernel virtual addresses later.
> > - */
> > -static const struct snp_cpuid_table *snp_cpuid_get_table(void)
> > -{
>
> You could just make this return the RIP_RELATIVE_ADDR() result, right?

Yes, I'll do that to minimize changes for backport-friendliness.
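
I.e., something like the following minimal form (sketch; keeping the helper
leaves existing call sites untouched):

static const struct snp_cpuid_table *snp_cpuid_get_table(void)
{
	return RIP_RELATIVE_ADDR(cpuid_table_copy);
}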

> > - void *ptr;
> > -
> > - asm ("lea cpuid_table_copy(%%rip), %0"
> > - : "=r" (ptr)
> > - : "p" (&cpuid_table_copy));
> > -
> > - return ptr;
> > -}
> > -
> > /*
> > * The SNP Firmware ABI, Revision 0.9, Section 7.1, details the use of
> > * XCR0_IN and XSS_IN to encode multiple versions of 0xD subfunctions 0
> > @@ -357,7 +345,7 @@ static const struct snp_cpuid_table *snp_cpuid_get_table(void)
> > */
> > static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
> > {
> > - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> > + const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
> > u64 xfeatures_found = 0;
> > u32 xsave_size = 0x240;
> > int i;
> > @@ -394,7 +382,7 @@ static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
> > static bool
> > snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
> > {
> > - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> > + const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
> > int i;
> >
> > for (i = 0; i < cpuid_table->count; i++) {
> > @@ -530,7 +518,8 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> > */
> > static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> > {
> > - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> > + const u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
> > + const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
> >
> > if (!cpuid_table->count)
> > return -EOPNOTSUPP;
> > @@ -555,10 +544,14 @@ static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_le
> > */
> > leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0;
> >
> > + cpuid_std_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_std_range_max);
> > + cpuid_hyp_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_hyp_range_max);
> > + cpuid_ext_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_ext_range_max);
> > +
> > /* Skip post-processing for out-of-range zero leafs. */
> > - if (!(leaf->fn <= cpuid_std_range_max ||
> > - (leaf->fn >= 0x40000000 && leaf->fn <= cpuid_hyp_range_max) ||
> > - (leaf->fn >= 0x80000000 && leaf->fn <= cpuid_ext_range_max)))
> > + if (!(leaf->fn <= *cpuid_std_range_max_ptr ||
> > + (leaf->fn >= 0x40000000 && leaf->fn <= *cpuid_hyp_range_max_ptr) ||
> > + (leaf->fn >= 0x80000000 && leaf->fn <= *cpuid_ext_range_max_ptr)))
> > return 0;
> > }
> >
> > @@ -1045,6 +1038,7 @@ static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
> > */
> > static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
> > {
> > + u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
> > const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table;
> > int i;
> >
> > @@ -1055,19 +1049,23 @@ static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
> > if (!cpuid_table_fw->count || cpuid_table_fw->count > SNP_CPUID_COUNT_MAX)
> > sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID);
> >
> > - cpuid_table = snp_cpuid_get_table();
> > + cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
> > memcpy((void *)cpuid_table, cpuid_table_fw, sizeof(*cpuid_table));
> >
> > + cpuid_std_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_std_range_max);
> > + cpuid_hyp_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_hyp_range_max);
> > + cpuid_ext_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_ext_range_max);
> > +
>
> Can we cache the values here rather than the pointers?

Yes, I'll do so.
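
For example, in the snp_cpuid() path the three limits could be loaded once up
front (sketch, assuming the RIP_RELATIVE_REF() form proposed above):

	const u32 std_range_max = RIP_RELATIVE_REF(cpuid_std_range_max);
	const u32 hyp_range_max = RIP_RELATIVE_REF(cpuid_hyp_range_max);
	const u32 ext_range_max = RIP_RELATIVE_REF(cpuid_ext_range_max);

	/* Skip post-processing for out-of-range zero leafs. */
	if (!(leaf->fn <= std_range_max ||
	      (leaf->fn >= 0x40000000 && leaf->fn <= hyp_range_max) ||
	      (leaf->fn >= 0x80000000 && leaf->fn <= ext_range_max)))
		return 0;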

2024-02-03 00:23:38

by Kevin Loughlin

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] x86: enforce and cleanup RIP-relative accesses in early boot code

On Wed, Jan 31, 2024 at 10:30 AM Borislav Petkov <[email protected]> wrote:
>
> On Wed, Jan 31, 2024 at 10:16:55AM -0800, Jacob Xu wrote:
> > We're interested in fixing SEV-SNP guest boots which are currently
> > broken when using a guest kernel compiled with clang. It seems like
> > every other user of SEV/SNP linux kernel code uses GCC to compile the
> > kernel so they've avoided this issue.
>
> Lemme give that a try here.
>
> > E.g. Google COS uses clang to compile the kernel and we've made do
> > with an internal fix for a while.
>
> Which means that, theoretically, you could forward-port this internal
> fix until the issue is fixed for real, I'd say.

True. I just think it would be better to have an upstream fix for
clang builds of SEV-SNP guests; I believe the first such SEV-SNP code
was merged in 5.19 if I'm not mistaken.

2024-02-03 10:16:17

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] x86: enforce and cleanup RIP-relative accesses in early boot code

On Sat, 3 Feb 2024 at 01:22, Kevin Loughlin <[email protected]> wrote:
>
> On Wed, Jan 31, 2024 at 10:30 AM Borislav Petkov <[email protected]> wrote:
> >
> > On Wed, Jan 31, 2024 at 10:16:55AM -0800, Jacob Xu wrote:
> > > We're interested in fixing SEV-SNP guest boots which are currently
> > > broken when using a guest kernel compiled with clang. It seems like
> > > every other user of SEV/SNP linux kernel code uses GCC to compile the
> > > kernel so they've avoided this issue.
> >
> > Lemme give that a try here.
> >
> > > E.g. Google COS uses clang to compile the kernel and we've made do
> > > with an internal fix for a while.
> >
> > Which means that, theoretically, you could forward-port this internal
> > fix until the issue is fixed for real, I'd say.
>
> True. I just think it would be better to have an upstream fix for
> clang builds of SEV-SNP guests; I believe the first such SEV-SNP code
> was merged in 5.19 if I'm not mistaken.

The problem is not only Clang. The problem is that we try to keep the
stable trees working with newer compilers in general, and we are
relying heavily on behavior on the part of the compiler that could
change in the future. Those references that GCC happens to emit as
RIP-relative today even without the workarounds could easily turn into
absolute references on tomorrow's version, given that both are
permitted by the code model under -fno-pic.
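
To make the two encodings concrete, compare the head_64.S change earlier in
this series: both forms below are valid under mcmodel=kernel without PIC, so
the compiler may emit either, but only the first resolves correctly while
executing from the initial 1:1 mapping:

	movq	sme_me_mask(%rip), %rax		/* RIP-relative, position-independent */
	movq	sme_me_mask, %rax		/* absolute; wrong before relocation */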

I've compared notes with Kevin internally, and we'll get a minimal,
simplified version of these changes into my v4 SEV PIC series so that
we can easily cherry-pick the fixes, either into linux-stable or into
our downstream fork.

2024-02-03 10:20:52

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] x86: enforce and cleanup RIP-relative accesses in early boot code

On Fri, Feb 02, 2024 at 04:22:02PM -0800, Kevin Loughlin wrote:
> True. I just think it would be better to have an upstream fix for
> clang builds of SEV-SNP guests; I believe the first such SEV-SNP code
> was merged in 5.19 if I'm not mistaken.

SNP host support is not upstream yet. So we'd be supporting something
which is out-of-tree. Lemme see how ugly it'll get...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-02-03 10:28:52

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] x86: enforce and cleanup RIP-relative accesses in early boot code

On Sat, 3 Feb 2024 at 11:20, Borislav Petkov <[email protected]> wrote:
>
> On Fri, Feb 02, 2024 at 04:22:02PM -0800, Kevin Loughlin wrote:
> > True. I just think it would be better to have an upstream fix for
> > clang builds of SEV-SNP guests; I believe the first such SEV-SNP code
> > was merged in 5.19 if I'm not mistaken.
>
> SNP host support is not upstream yet. So we'd be supporting something
> which is out-of-tree. Lemme see how ugly it'll get...
>

The minimal fix doesn't look that bad IMHO. Note that this version is
based on your patch that removes
CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT, and we'd need to see whether
or not to backport that as well.

arch/x86/include/asm/asm.h | 13 +++++++++++++
arch/x86/include/asm/mem_encrypt.h | 13 ++++++++-----
arch/x86/kernel/sev-shared.c | 12 ++++++------
arch/x86/kernel/sev.c | 9 +++++++--
arch/x86/mm/mem_encrypt_identity.c | 27 ++++++++++++---------------
5 files changed, 46 insertions(+), 28 deletions(-)

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=x86-pie-for-sev-v4a&id=65d0e5f4ed6ca807cdf28a1c5c0389af2c9f9bda

2024-02-03 11:26:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] x86: enforce and cleanup RIP-relative accesses in early boot code

On Sat, Feb 03, 2024 at 11:27:19AM +0100, Ard Biesheuvel wrote:
> The minimal fix doesn't look that bad IMHO. Note that this version is
> based on your patch that removes
> CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT, and we'd need to see whether
> or not to backport that as well.

Ok, please send it formally so that I can take a look.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-02-06 15:48:37

by tip-bot2 for Tony Luck

[permalink] [raw]
Subject: [tip: x86/sev] x86/sev: Fix position dependent variable references in startup code

The following commit has been merged into the x86/sev branch of tip:

Commit-ID: 1c811d403afd73f04bde82b83b24c754011bd0e8
Gitweb: https://git.kernel.org/tip/1c811d403afd73f04bde82b83b24c754011bd0e8
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Sat, 03 Feb 2024 13:53:06 +01:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Tue, 06 Feb 2024 16:38:42 +01:00

x86/sev: Fix position dependent variable references in startup code

The early startup code executes from a 1:1 mapping of memory, which
differs from the mapping that the code was linked and/or relocated to
run at. The latter mapping is not active yet at this point, and so
symbol references that rely on it will fault.

Given that the core kernel is built without -fPIC, symbol references are
typically emitted as absolute, and so any such references occurring in
the early startup code will therefore crash the kernel.

While an attempt was made to work around this for the early SEV/SME
startup code, by forcing RIP-relative addressing for certain global
SEV/SME variables via inline assembly (see snp_cpuid_get_table() for
example), RIP-relative addressing must be pervasively enforced for
SEV/SME global variables when accessed prior to page table fixups.

__startup_64() already handles this issue for select non-SEV/SME global
variables using fixup_pointer(), which adjusts the pointer relative to a
`physaddr` argument. To avoid having to pass around this `physaddr`
argument across all functions needing to apply pointer fixups, introduce
a macro RIP_RELATIVE_REF() which generates a RIP-relative reference to
a given global variable. It is used where necessary to force
RIP-relative accesses to global variables.

For backporting purposes, this patch makes no attempt at cleaning up
other occurrences of this pattern, involving either inline asm or
fixup_pointer(). Those will be addressed later.

[ bp: Call it "rip_rel_ref" everywhere like other code shortens
"rIP-relative reference" and make the asm wrapper __always_inline. ]

Co-developed-by: Kevin Loughlin <[email protected]>
Signed-off-by: Kevin Loughlin <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
---
arch/x86/coco/core.c | 7 +------
arch/x86/include/asm/asm.h | 14 ++++++++++++++
arch/x86/include/asm/coco.h | 8 +++++++-
arch/x86/include/asm/mem_encrypt.h | 15 +++++++++------
arch/x86/kernel/sev-shared.c | 12 ++++++------
arch/x86/kernel/sev.c | 4 ++--
arch/x86/mm/mem_encrypt_identity.c | 27 ++++++++++++---------------
7 files changed, 51 insertions(+), 36 deletions(-)

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index eeec998..d07be9d 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -14,7 +14,7 @@
#include <asm/processor.h>

enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
-static u64 cc_mask __ro_after_init;
+u64 cc_mask __ro_after_init;

static bool noinstr intel_cc_platform_has(enum cc_attr attr)
{
@@ -148,8 +148,3 @@ u64 cc_mkdec(u64 val)
}
}
EXPORT_SYMBOL_GPL(cc_mkdec);
-
-__init void cc_set_mask(u64 mask)
-{
- cc_mask = mask;
-}
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index fbcfec4..ca8eed1 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -113,6 +113,20 @@

#endif

+#ifndef __ASSEMBLY__
+#ifndef __pic__
+static __always_inline __pure void *rip_rel_ptr(void *p)
+{
+ asm("leaq %c1(%%rip), %0" : "=r"(p) : "i"(p));
+
+ return p;
+}
+#define RIP_REL_REF(var) (*(typeof(&(var)))rip_rel_ptr(&(var)))
+#else
+#define RIP_REL_REF(var) (var)
+#endif
+#endif
+
/*
* Macros to generate condition code outputs from inline assembly,
* The output operand must be type "bool".
diff --git a/arch/x86/include/asm/coco.h b/arch/x86/include/asm/coco.h
index 6ae2d16..21940ef 100644
--- a/arch/x86/include/asm/coco.h
+++ b/arch/x86/include/asm/coco.h
@@ -2,6 +2,7 @@
#ifndef _ASM_X86_COCO_H
#define _ASM_X86_COCO_H

+#include <asm/asm.h>
#include <asm/types.h>

enum cc_vendor {
@@ -11,9 +12,14 @@ enum cc_vendor {
};

extern enum cc_vendor cc_vendor;
+extern u64 cc_mask;

#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
-void cc_set_mask(u64 mask);
+static inline void cc_set_mask(u64 mask)
+{
+ RIP_REL_REF(cc_mask) = mask;
+}
+
u64 cc_mkenc(u64 val);
u64 cc_mkdec(u64 val);
#else
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 359ada4..b31eb9f 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -15,7 +15,8 @@
#include <linux/init.h>
#include <linux/cc_platform.h>

-#include <asm/bootparam.h>
+#include <asm/asm.h>
+struct boot_params;

#ifdef CONFIG_X86_MEM_ENCRYPT
void __init mem_encrypt_init(void);
@@ -58,6 +59,11 @@ void __init mem_encrypt_free_decrypted_mem(void);

void __init sev_es_init_vc_handling(void);

+static inline u64 sme_get_me_mask(void)
+{
+ return RIP_REL_REF(sme_me_mask);
+}
+
#define __bss_decrypted __section(".bss..decrypted")

#else /* !CONFIG_AMD_MEM_ENCRYPT */
@@ -89,6 +95,8 @@ early_set_mem_enc_dec_hypercall(unsigned long vaddr, unsigned long size, bool en

static inline void mem_encrypt_free_decrypted_mem(void) { }

+static inline u64 sme_get_me_mask(void) { return 0; }
+
#define __bss_decrypted

#endif /* CONFIG_AMD_MEM_ENCRYPT */
@@ -106,11 +114,6 @@ void add_encrypt_protection_map(void);

extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[];

-static inline u64 sme_get_me_mask(void)
-{
- return sme_me_mask;
-}
-
#endif /* __ASSEMBLY__ */

#endif /* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 5db24d0..ae79f95 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -560,9 +560,9 @@ static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_le
leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0;

/* Skip post-processing for out-of-range zero leafs. */
- if (!(leaf->fn <= cpuid_std_range_max ||
- (leaf->fn >= 0x40000000 && leaf->fn <= cpuid_hyp_range_max) ||
- (leaf->fn >= 0x80000000 && leaf->fn <= cpuid_ext_range_max)))
+ if (!(leaf->fn <= RIP_REL_REF(cpuid_std_range_max) ||
+ (leaf->fn >= 0x40000000 && leaf->fn <= RIP_REL_REF(cpuid_hyp_range_max)) ||
+ (leaf->fn >= 0x80000000 && leaf->fn <= RIP_REL_REF(cpuid_ext_range_max))))
return 0;
}

@@ -1072,11 +1072,11 @@ static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
const struct snp_cpuid_fn *fn = &cpuid_table->fn[i];

if (fn->eax_in == 0x0)
- cpuid_std_range_max = fn->eax;
+ RIP_REL_REF(cpuid_std_range_max) = fn->eax;
else if (fn->eax_in == 0x40000000)
- cpuid_hyp_range_max = fn->eax;
+ RIP_REL_REF(cpuid_hyp_range_max) = fn->eax;
else if (fn->eax_in == 0x80000000)
- cpuid_ext_range_max = fn->eax;
+ RIP_REL_REF(cpuid_ext_range_max) = fn->eax;
}
}

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 002af6c..1ef7ae8 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -748,7 +748,7 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
* This eliminates worries about jump tables or checking boot_cpu_data
* in the cc_platform_has() function.
*/
- if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ if (!(RIP_REL_REF(sev_status) & MSR_AMD64_SEV_SNP_ENABLED))
return;

/*
@@ -767,7 +767,7 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
* This eliminates worries about jump tables or checking boot_cpu_data
* in the cc_platform_has() function.
*/
- if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ if (!(RIP_REL_REF(sev_status) & MSR_AMD64_SEV_SNP_ENABLED))
return;

/* Ask hypervisor to mark the memory pages shared in the RMP table. */
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index efe9f21..0166ab1 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -304,7 +304,8 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
* instrumentation or checking boot_cpu_data in the cc_platform_has()
* function.
*/
- if (!sme_get_me_mask() || sev_status & MSR_AMD64_SEV_ENABLED)
+ if (!sme_get_me_mask() ||
+ RIP_REL_REF(sev_status) & MSR_AMD64_SEV_ENABLED)
return;

/*
@@ -541,11 +542,11 @@ void __init sme_enable(struct boot_params *bp)
me_mask = 1UL << (ebx & 0x3f);

/* Check the SEV MSR whether SEV or SME is enabled */
- sev_status = __rdmsr(MSR_AMD64_SEV);
- feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
+ RIP_REL_REF(sev_status) = msr = __rdmsr(MSR_AMD64_SEV);
+ feature_mask = (msr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;

/* The SEV-SNP CC blob should never be present unless SEV-SNP is enabled. */
- if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ if (snp && !(msr & MSR_AMD64_SEV_SNP_ENABLED))
snp_abort();

/* Check if memory encryption is enabled */
@@ -571,7 +572,6 @@ void __init sme_enable(struct boot_params *bp)
return;
} else {
/* SEV state cannot be controlled by a command line option */
- sme_me_mask = me_mask;
goto out;
}

@@ -590,16 +590,13 @@ void __init sme_enable(struct boot_params *bp)
cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
((u64)bp->ext_cmd_line_ptr << 32));

- if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0)
- goto out;
-
- if (!strncmp(buffer, cmdline_on, sizeof(buffer)))
- sme_me_mask = me_mask;
+ if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0 ||
+ strncmp(buffer, cmdline_on, sizeof(buffer)))
+ return;

out:
- if (sme_me_mask) {
- physical_mask &= ~sme_me_mask;
- cc_vendor = CC_VENDOR_AMD;
- cc_set_mask(sme_me_mask);
- }
+ RIP_REL_REF(sme_me_mask) = me_mask;
+ physical_mask &= ~me_mask;
+ cc_vendor = CC_VENDOR_AMD;
+ cc_set_mask(me_mask);
}