2023-05-18 23:34:38

by Kirill A. Shutemov

Subject: [PATCHv12 0/9] mm, x86/cc, efi: Implement support for unaccepted memory

UEFI Specification version 2.9 introduces the concept of memory
acceptance: some Virtual Machine platforms, such as Intel TDX or AMD
SEV-SNP, require memory to be accepted before it can be used by the
guest. Accepting happens via a protocol specific to the Virtual
Machine platform.

Accepting memory is costly and it makes the VMM allocate memory for the
accepted guest physical address range. It's better to postpone memory
acceptance until the memory is needed: it lowers boot time and reduces
memory overhead.

The kernel needs to know what memory has been accepted. Firmware
communicates this information via the memory map: a new memory type --
EFI_UNACCEPTED_MEMORY -- indicates such memory.
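
Conceptually, the kernel-side consumer looks like this (an illustrative
sketch; process_unaccepted_range() is a hypothetical helper, not a
function from the series):

        for_each_efi_memory_desc(md) {
                if (md->type == EFI_UNACCEPTED_MEMORY)
                        process_unaccepted_range(md);
        }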

Range-based tracking works fine for firmware, but it gets bulky for
the kernel: e820 would have to be modified on every page acceptance. That
leads to table fragmentation, and there's only a limited number of
entries in the e820 table.

Another option is to mark such memory as usable in e820 and track whether
the range has been accepted in a bitmap. One bit in the bitmap represents
2MiB in the address space: one 4k page is enough to track 64GiB of
physical address space.

In the worst-case scenario -- a huge hole in the middle of the
address space -- it takes 256MiB to handle 4PiB of the address
space.
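
For a quick sanity check of those numbers, a sketch (illustrative C, not
code from the series):

        /* 1 bit per 2MiB chunk, 8 bits per byte */
        static inline unsigned long long bitmap_bytes(unsigned long long size)
        {
                return size / ((2ULL << 20) * 8);
        }

        /* bitmap_bytes(64ULL << 30) == 4096       -- one 4k page */
        /* bitmap_bytes(4ULL  << 50) == 256 << 20  -- 256MiB      */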

Any unaccepted memory that is not aligned to 2M gets accepted upfront.

The approach lowers boot time substantially. Boot to shell is ~2.5x
faster for a 4G TDX VM and ~4x faster for a 64G one.

TDX-specific code is isolated from the core of the unaccepted memory
support. This should make it easier to plug in other implementations of
unaccepted memory, such as SEV-SNP.

-- Fragmentation study --

Vlastimil and Mel were concerned about the effect of unaccepted memory on
fragmentation prevention measures in the page allocator. I tried to
evaluate it, but it is tricky. As suggested, I ran multiple parallel
kernel builds and tracked how often kmem:mm_page_alloc_extfrag gets hit.

See the results in v9 of the patchset [1][2].

[1] https://lore.kernel.org/all/[email protected]
[2] https://lore.kernel.org/all/[email protected]

--

The tree can be found here:

https://github.com/intel/tdx.git guest-unaccepted-memory

The patchset depends on MAX_ORDER changes in MM tree.

v12:
- Re-initialize 'unaccepted_table' variable from decompressor to cover some
boot scenarios;
- Add missing memblock_reserve() for the unaccepted memory configuration
table (Mika);
- Add efi.unaccepted into efi_tables (Tom);
- Do not build tdx-shared.o for !TDX (Tom);
- Typo fix (Liam)
- Whitespace fix;
- Reviewed-bys from Liam, Tom and Ard;
v11:
- Restructure the code to make it less x86-specific (suggested by Ard):
+ use EFI configuration table instead of zero-page to pass down bitmap;
+ do not imply 1bit == 2M in bitmap;
+ move bulk of the code under driver/firmware/efi;
- The bitmap only covers unaccepted memory now. All memory that is not covered
by the bitmap is assumed to be accepted;
- Reviewed-by from Ard;
v10:
- Restructure code around the zones_with_unaccepted_pages static branch to avoid
unnecessary function calls (Suggested by Vlastimil);
- Drop mentions of PageUnaccepted();
- Drop patches that add fake unaccepted memory support and sysfs handle to
accept memory manually;
- Add Reviewed-by from Vlastimil;
v9:
- Accept memory up to high watermark when kernel runs out of free memory;
- Treat unaccepted memory as unusable in __zone_watermark_unusable_free();
- Per-zone unaccepted memory accounting;
- All pages on unaccepted list are MAX_ORDER now;
- accept_memory=eager in cmdline to pre-accept memory during the boot;
- Implement fake unaccepted memory;
- Sysfs handle to accept memory manually;
- Drop PageUnaccepted();
- Rename unaccepted_pages static key to zones_with_unaccepted_pages;
v8:
- Rewrite core-mm support for unaccepted memory (patch 02/14);
- s/UnacceptedPages/Unaccepted/ in meminfo;
- Drop arch/x86/boot/compressed/compiler.h;
- Fix build errors;
- Adjust commit messages and comments;
- Reviewed-bys from Dave and Borislav;
- Rebased to tip/master.
v7:
- Rework meminfo counter to use PageUnaccepted() and move to generic code;
- Fix range_contains_unaccepted_memory() on machines without unaccepted memory;
- Add Reviewed-by from David;
v6:
- Fix load_unaligned_zeropad() on machines with unaccepted memory;
- Clear PageUnaccepted() on merged pages, leaving it only on head;
- Clarify error handling in allocate_e820();
- Fix build with CONFIG_UNACCEPTED_MEMORY=y, but without TDX;
- Disable kexec at boottime instead of build conflict;
- Rebased to tip/master;
- Spelling fixes;
- Add Reviewed-by from Mike and David;
v5:
- Update comments and commit messages;
+ Explain options for unaccepted memory handling;
- Expose amount of unaccepted memory in /proc/meminfo
- Adjust check in page_expected_state();
- Fix error code handling in allocate_e820();
- Centralize __pa()/__va() definitions in the boot stub;
- Avoid includes from the main kernel in the boot stub;
- Use an existing hole in boot_param for unaccepted_memory, instead of adding
to the end of the structure;
- Extract allocate_unaccepted_memory() form allocate_e820();
- Complain if there's unaccepted memory, but the kernel does not support it;
- Fix vmstat counter;
- Split up few preparatory patches;
- Random readability adjustments;
v4:
- PageBuddyUnaccepted() -> PageUnaccepted;
- Use separate page_type, not shared with offline;
- Rework interface between core-mm and arch code;
- Adjust commit messages;
- Ack from Mike;

Kirill A. Shutemov (9):
mm: Add support for unaccepted memory
efi/x86: Get full memory map in allocate_e820()
efi/libstub: Implement support for unaccepted memory
x86/boot/compressed: Handle unaccepted memory
efi: Add unaccepted memory support
efi/unaccepted: Avoid load_unaligned_zeropad() stepping into
unaccepted memory
x86/tdx: Make _tdx_hypercall() and __tdx_module_call() available in
boot stub
x86/tdx: Refactor try_accept_one()
x86/tdx: Add unaccepted memory support

arch/x86/Kconfig | 2 +
arch/x86/boot/compressed/Makefile | 3 +-
arch/x86/boot/compressed/efi.h | 10 +
arch/x86/boot/compressed/error.c | 19 ++
arch/x86/boot/compressed/error.h | 1 +
arch/x86/boot/compressed/kaslr.c | 35 ++-
arch/x86/boot/compressed/mem.c | 63 +++++
arch/x86/boot/compressed/misc.c | 7 +
arch/x86/boot/compressed/misc.h | 6 +
arch/x86/boot/compressed/tdx-shared.c | 2 +
arch/x86/boot/compressed/tdx.c | 37 +++
arch/x86/coco/tdx/Makefile | 2 +-
arch/x86/coco/tdx/tdx-shared.c | 95 +++++++
arch/x86/coco/tdx/tdx.c | 118 +--------
arch/x86/include/asm/efi.h | 2 +
arch/x86/include/asm/shared/tdx.h | 53 ++++
arch/x86/include/asm/tdx.h | 21 +-
arch/x86/include/asm/unaccepted_memory.h | 23 ++
arch/x86/platform/efi/efi.c | 3 +
drivers/base/node.c | 7 +
drivers/firmware/efi/Kconfig | 14 ++
drivers/firmware/efi/Makefile | 1 +
drivers/firmware/efi/efi.c | 26 ++
drivers/firmware/efi/libstub/Makefile | 2 +
drivers/firmware/efi/libstub/bitmap.c | 41 +++
drivers/firmware/efi/libstub/efistub.h | 6 +
drivers/firmware/efi/libstub/find.c | 43 ++++
.../firmware/efi/libstub/unaccepted_memory.c | 234 ++++++++++++++++++
drivers/firmware/efi/libstub/x86-stub.c | 39 +--
drivers/firmware/efi/unaccepted_memory.c | 138 +++++++++++
fs/proc/meminfo.c | 5 +
include/linux/efi.h | 13 +-
include/linux/mm.h | 19 ++
include/linux/mmzone.h | 8 +
mm/memblock.c | 9 +
mm/mm_init.c | 7 +
mm/page_alloc.c | 173 +++++++++++++
mm/vmstat.c | 3 +
38 files changed, 1125 insertions(+), 165 deletions(-)
create mode 100644 arch/x86/boot/compressed/mem.c
create mode 100644 arch/x86/boot/compressed/tdx-shared.c
create mode 100644 arch/x86/coco/tdx/tdx-shared.c
create mode 100644 arch/x86/include/asm/unaccepted_memory.h
create mode 100644 drivers/firmware/efi/libstub/bitmap.c
create mode 100644 drivers/firmware/efi/libstub/find.c
create mode 100644 drivers/firmware/efi/libstub/unaccepted_memory.c
create mode 100644 drivers/firmware/efi/unaccepted_memory.c

--
2.39.3



2023-05-19 16:29:27

by Tom Lendacky

Subject: [PATCH 2/6] x86/sev: Put PSC struct on the stack in prep for unaccepted memory support

In advance of providing support for unaccepted memory, switch from using
kmalloc() for allocating the Page State Change (PSC) structure to using a
local variable that lives on the stack. This is needed to avoid a possible
recursive call into set_pages_state() if the kmalloc() call requires
(more) memory to be accepted, which would result in a hang.
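
The recursion, sketched as a call chain (illustrative, not a real
backtrace):

        set_pages_state()
            kmalloc()             /* allocator returns an unaccepted page */
              accept_memory()     /* the page must be accepted before use */
                set_pages_state() /* recursion -> hang                    */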

The current size of the PSC struct is 2,032 bytes. To make the struct more
stack friendly, reduce the number of PSC entries from 253 down to 64,
resulting in a size of 520 bytes. This is a nice compromise on struct size
and total PSC requests while still allowing parallel PSC operations across
vCPUs.
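
For reference, the arithmetic behind those sizes (per the psc_hdr and
psc_entry layouts in sev-common.h; shown here only as a size sketch):

        struct psc_hdr   { u16 cur_entry; u16 end_entry; u32 reserved; }; /* 8 bytes */
        struct psc_entry { u64 cur_page : 12, gfn : 40, operation : 4,
                               pagesize : 1, reserved : 7; };             /* 8 bytes */

        /* 8 + 253 * 8 = 2032 bytes;  8 + 64 * 8 = 520 bytes */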

If the reduction in PSC entries results in any kind of performance issue
(that is not seen at the moment), use of a larger static PSC struct, with
fallback to the smaller stack version, can be investigated.

For more background info on this decision, see the subthread in the Link:
tag below.

Signed-off-by: Tom Lendacky <[email protected]>
Link: https://lore.kernel.org/lkml/658c455c40e8950cb046dd885dd19dc1c52d060a.1659103274.git.thomas.lendacky@amd.com
---
arch/x86/include/asm/sev-common.h | 9 +++++++--
arch/x86/kernel/sev.c | 10 ++--------
2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 0759af9b1acf..b463fcbd4b90 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -106,8 +106,13 @@ enum psc_op {
#define GHCB_HV_FT_SNP BIT_ULL(0)
#define GHCB_HV_FT_SNP_AP_CREATION BIT_ULL(1)

-/* SNP Page State Change NAE event */
-#define VMGEXIT_PSC_MAX_ENTRY 253
+/*
+ * SNP Page State Change NAE event
+ * The VMGEXIT_PSC_MAX_ENTRY determines the size of the PSC structure, which
+ * is a local stack variable in set_pages_state(). Do not increase this value
+ * without evaluating the impact to stack usage.
+ */
+#define VMGEXIT_PSC_MAX_ENTRY 64

struct psc_hdr {
u16 cur_entry;
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 108bbae59c35..7b0144acd7bf 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -882,11 +882,7 @@ static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
{
unsigned long vaddr_end, next_vaddr;
- struct snp_psc_desc *desc;
-
- desc = kmalloc(sizeof(*desc), GFP_KERNEL_ACCOUNT);
- if (!desc)
- panic("SNP: failed to allocate memory for PSC descriptor\n");
+ struct snp_psc_desc desc;

vaddr = vaddr & PAGE_MASK;
vaddr_end = vaddr + (npages << PAGE_SHIFT);
@@ -896,12 +892,10 @@ static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
next_vaddr = min_t(unsigned long, vaddr_end,
(VMGEXIT_PSC_MAX_ENTRY * PAGE_SIZE) + vaddr);

- __set_pages_state(desc, vaddr, next_vaddr, op);
+ __set_pages_state(&desc, vaddr, next_vaddr, op);

vaddr = next_vaddr;
}
-
- kfree(desc);
}

void snp_set_memory_shared(unsigned long vaddr, unsigned long npages)
--
2.40.0


2023-05-19 16:29:51

by Tom Lendacky

Subject: [PATCH 3/6] x86/sev: Allow for use of the early boot GHCB for PSC requests

Using a GHCB for a page state change (as opposed to the MSR protocol)
allows for multiple pages to be processed in a single request. In prep
for early PSC requests in support of unaccepted memory, update the
invocation of vmgexit_psc() to be able to use the early boot GHCB and not
just the per-CPU GHCB structure.
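
The difference, condensed (the MSR side is what early_set_pages_state()
does; the GHCB side is the vmgexit_psc() call this patch reworks):

        /* MSR protocol: one 4K page per VMGEXIT */
        sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
        VMGEXIT();

        /* GHCB protocol: up to VMGEXIT_PSC_MAX_ENTRY entries per VMGEXIT */
        ret = vmgexit_psc(ghcb, desc);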

In order to use the proper GHCB (early boot vs per-CPU), set a flag that
indicates when the per-CPU GHCBs are available and registered. For APs,
the per-CPU GHCBs are created before they are started and registered upon
startup, so this flag can be used globally for the BSP and APs instead of
creating a per-CPU flag. This will allow for a significant reduction in
the number of MSR protocol page state change requests when accepting
memory.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kernel/sev.c | 61 +++++++++++++++++++++++++++----------------
1 file changed, 38 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 7b0144acd7bf..973756c89dac 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -119,7 +119,19 @@ static DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);

struct sev_config {
__u64 debug : 1,
- __reserved : 63;
+
+ /*
+ * A flag used by __set_pages_state() that indicates when the
+ * per-CPU GHCB has been created and registered and thus can be
+ * used by the BSP instead of the early boot GHCB.
+ *
+ * For APs, the per-CPU GHCB is created before they are started
+ * and registered upon startup, so this flag can be used globally
+ * for the BSP and APs.
+ */
+ ghcbs_initialized : 1,
+
+ __reserved : 62;
};

static struct sev_config sev_cfg __read_mostly;
@@ -662,7 +674,7 @@ static void pvalidate_pages(unsigned long vaddr, unsigned long npages, bool vali
}
}

-static void __init early_set_pages_state(unsigned long paddr, unsigned long npages, enum psc_op op)
+static void early_set_pages_state(unsigned long paddr, unsigned long npages, enum psc_op op)
{
unsigned long paddr_end;
u64 val;
@@ -756,26 +768,13 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
WARN(1, "invalid memory op %d\n", op);
}

-static int vmgexit_psc(struct snp_psc_desc *desc)
+static int vmgexit_psc(struct ghcb *ghcb, struct snp_psc_desc *desc)
{
int cur_entry, end_entry, ret = 0;
struct snp_psc_desc *data;
- struct ghcb_state state;
struct es_em_ctxt ctxt;
- unsigned long flags;
- struct ghcb *ghcb;

- /*
- * __sev_get_ghcb() needs to run with IRQs disabled because it is using
- * a per-CPU GHCB.
- */
- local_irq_save(flags);
-
- ghcb = __sev_get_ghcb(&state);
- if (!ghcb) {
- ret = 1;
- goto out_unlock;
- }
+ vc_ghcb_invalidate(ghcb);

/* Copy the input desc into GHCB shared buffer */
data = (struct snp_psc_desc *)ghcb->shared_buffer;
@@ -832,20 +831,18 @@ static int vmgexit_psc(struct snp_psc_desc *desc)
}

out:
- __sev_put_ghcb(&state);
-
-out_unlock:
- local_irq_restore(flags);
-
return ret;
}

static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
unsigned long vaddr_end, int op)
{
+ struct ghcb_state state;
struct psc_hdr *hdr;
struct psc_entry *e;
+ unsigned long flags;
unsigned long pfn;
+ struct ghcb *ghcb;
int i;

hdr = &data->hdr;
@@ -875,8 +872,20 @@ static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
i++;
}

- if (vmgexit_psc(data))
+ local_irq_save(flags);
+
+ if (sev_cfg.ghcbs_initialized)
+ ghcb = __sev_get_ghcb(&state);
+ else
+ ghcb = boot_ghcb;
+
+ if (!ghcb || vmgexit_psc(ghcb, data))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+
+ if (sev_cfg.ghcbs_initialized)
+ __sev_put_ghcb(&state);
+
+ local_irq_restore(flags);
}

static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
@@ -884,6 +893,10 @@ static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
unsigned long vaddr_end, next_vaddr;
struct snp_psc_desc desc;

+ /* Use the MSR protocol when a GHCB is not available. */
+ if (!boot_ghcb)
+ return early_set_pages_state(__pa(vaddr), npages, op);
+
vaddr = vaddr & PAGE_MASK;
vaddr_end = vaddr + (npages << PAGE_SHIFT);

@@ -1261,6 +1274,8 @@ void setup_ghcb(void)
if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
snp_register_per_cpu_ghcb();

+ sev_cfg.ghcbs_initialized = true;
+
return;
}

--
2.40.0


2023-05-19 16:33:27

by Tom Lendacky

Subject: [PATCH 5/6] x86/sev: Add SNP-specific unaccepted memory support

Add SNP-specific hooks to the unaccepted memory support in the boot
path (__accept_memory()) and the core kernel (accept_memory()) in order
to support booting SNP guests when unaccepted memory is present. Without
this support, SNP guests will fail to boot and/or panic() when unaccepted
memory is present in the EFI memory map.

The process of accepting memory under SNP involves invoking the hypervisor
to perform a page state change for the page to private memory and then
issuing a PVALIDATE instruction to accept the page.
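
In code terms, the sequence is roughly (condensed from the diff below;
error handling elided):

        /* 1. Hypervisor flips the pages to private in the RMP table */
        if (vmgexit_psc(boot_ghcb, desc))
                sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);

        /* 2. Guest issues PVALIDATE to accept (validate) the pages */
        pvalidate_pages(desc);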

Since the boot path and the core kernel paths perform similar operations,
move the pvalidate_pages() and vmgexit_psc() functions into sev-shared.c
to avoid code duplication.

Create the new header file arch/x86/boot/compressed/sev.h because adding
the function declaration to any of the existing SEV related header files
pulls in too many other header files, causing the build to fail.

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/Kconfig | 2 +
arch/x86/boot/compressed/mem.c | 3 +
arch/x86/boot/compressed/sev.c | 54 ++++++++++-
arch/x86/boot/compressed/sev.h | 23 +++++
arch/x86/include/asm/sev.h | 3 +
arch/x86/include/asm/unaccepted_memory.h | 3 +
arch/x86/kernel/sev-shared.c | 103 +++++++++++++++++++++
arch/x86/kernel/sev.c | 112 +++--------------------
8 files changed, 204 insertions(+), 99 deletions(-)
create mode 100644 arch/x86/boot/compressed/sev.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5c72067c06d4..b9c451f75d5e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1543,11 +1543,13 @@ config X86_MEM_ENCRYPT
config AMD_MEM_ENCRYPT
bool "AMD Secure Memory Encryption (SME) support"
depends on X86_64 && CPU_SUP_AMD
+ depends on EFI_STUB
select DMA_COHERENT_POOL
select ARCH_USE_MEMREMAP_PROT
select INSTRUCTION_DECODER
select ARCH_HAS_CC_PLATFORM
select X86_MEM_ENCRYPT
+ select UNACCEPTED_MEMORY
help
Say yes to enable support for the encryption of system memory.
This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
index 8df3d988ae69..c8f2353f6894 100644
--- a/arch/x86/boot/compressed/mem.c
+++ b/arch/x86/boot/compressed/mem.c
@@ -3,6 +3,7 @@
#include "error.h"
#include "misc.h"
#include "tdx.h"
+#include "sev.h"
#include <asm/shared/tdx.h>

/*
@@ -36,6 +37,8 @@ void arch_accept_memory(phys_addr_t start, phys_addr_t end)
/* Platform-specific memory-acceptance call goes here */
if (early_is_tdx_guest())
tdx_accept_memory(start, end);
+ else if (sev_snp_enabled())
+ snp_accept_memory(start, end);
else
error("Cannot accept memory: unknown platform\n");
}
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 014b89c89088..09dc8c187b3c 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -115,7 +115,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
/* Include code for early handlers */
#include "../../kernel/sev-shared.c"

-static inline bool sev_snp_enabled(void)
+bool sev_snp_enabled(void)
{
return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
}
@@ -181,6 +181,58 @@ static bool early_setup_ghcb(void)
return true;
}

+static phys_addr_t __snp_accept_memory(struct snp_psc_desc *desc,
+ phys_addr_t pa, phys_addr_t pa_end)
+{
+ struct psc_hdr *hdr;
+ struct psc_entry *e;
+ unsigned int i;
+
+ hdr = &desc->hdr;
+ memset(hdr, 0, sizeof(*hdr));
+
+ e = desc->entries;
+
+ i = 0;
+ while (pa < pa_end && i < VMGEXIT_PSC_MAX_ENTRY) {
+ hdr->end_entry = i;
+
+ e->gfn = pa >> PAGE_SHIFT;
+ e->operation = SNP_PAGE_STATE_PRIVATE;
+ if (IS_ALIGNED(pa, PMD_SIZE) && (pa_end - pa) >= PMD_SIZE) {
+ e->pagesize = RMP_PG_SIZE_2M;
+ pa += PMD_SIZE;
+ } else {
+ e->pagesize = RMP_PG_SIZE_4K;
+ pa += PAGE_SIZE;
+ }
+
+ e++;
+ i++;
+ }
+
+ if (vmgexit_psc(boot_ghcb, desc))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+
+ pvalidate_pages(desc);
+
+ return pa;
+}
+
+void snp_accept_memory(phys_addr_t start, phys_addr_t end)
+{
+ struct snp_psc_desc desc = {};
+ unsigned int i;
+ phys_addr_t pa;
+
+ if (!boot_ghcb && !early_setup_ghcb())
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+
+ pa = start;
+ while (pa < end)
+ pa = __snp_accept_memory(&desc, pa, end);
+}
+
void sev_es_shutdown_ghcb(void)
{
if (!boot_ghcb)
diff --git a/arch/x86/boot/compressed/sev.h b/arch/x86/boot/compressed/sev.h
new file mode 100644
index 000000000000..fc725a981b09
--- /dev/null
+++ b/arch/x86/boot/compressed/sev.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD SEV header for early boot related functions.
+ *
+ * Author: Tom Lendacky <[email protected]>
+ */
+
+#ifndef BOOT_COMPRESSED_SEV_H
+#define BOOT_COMPRESSED_SEV_H
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+bool sev_snp_enabled(void);
+void snp_accept_memory(phys_addr_t start, phys_addr_t end);
+
+#else
+
+static inline bool sev_snp_enabled(void) { return false; }
+static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
+
+#endif
+
+#endif
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index e21e1c5397c1..86e1296e87f5 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -206,6 +206,7 @@ void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
void __init __noreturn snp_abort(void);
int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio);
+void snp_accept_memory(phys_addr_t start, phys_addr_t end);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -229,6 +230,8 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
{
return -ENOTTY;
}
+
+static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
#endif

#endif
diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
index 72b354f992bb..ed3fcd3ac9dd 100644
--- a/arch/x86/include/asm/unaccepted_memory.h
+++ b/arch/x86/include/asm/unaccepted_memory.h
@@ -3,12 +3,15 @@

#include <linux/efi.h>
#include <asm/tdx.h>
+#include <asm/sev.h>

static inline void arch_accept_memory(phys_addr_t start, phys_addr_t end)
{
/* Platform-specific memory-acceptance call goes here */
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
tdx_accept_memory(start, end);
+ } else if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
+ snp_accept_memory(start, end);
} else {
panic("Cannot accept memory: unknown platform\n");
}
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 3a5b0c9c4fcc..be312db48a49 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -12,6 +12,9 @@
#ifndef __BOOT_COMPRESSED
#define error(v) pr_err(v)
#define has_cpuflag(f) boot_cpu_has(f)
+#else
+#undef WARN
+#define WARN(condition, format...) (!!(condition))
#endif

/* I/O parameters for CPUID-related helpers */
@@ -991,3 +994,103 @@ static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
cpuid_ext_range_max = fn->eax;
}
}
+
+static void pvalidate_pages(struct snp_psc_desc *desc)
+{
+ struct psc_entry *e;
+ unsigned long vaddr;
+ unsigned int size;
+ unsigned int i;
+ bool validate;
+ int rc;
+
+ for (i = 0; i <= desc->hdr.end_entry; i++) {
+ e = &desc->entries[i];
+
+ vaddr = (unsigned long)pfn_to_kaddr(e->gfn);
+ size = e->pagesize ? RMP_PG_SIZE_2M : RMP_PG_SIZE_4K;
+ validate = (e->operation == SNP_PAGE_STATE_PRIVATE) ? true : false;
+
+ rc = pvalidate(vaddr, size, validate);
+ if (rc == PVALIDATE_FAIL_SIZEMISMATCH && size == RMP_PG_SIZE_2M) {
+ unsigned long vaddr_end = vaddr + PMD_SIZE;
+
+ for (; vaddr < vaddr_end; vaddr += PAGE_SIZE) {
+ rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
+ if (rc)
+ break;
+ }
+ }
+
+ if (rc) {
+ WARN(1, "Failed to validate address 0x%lx ret %d", vaddr, rc);
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+ }
+ }
+}
+
+static int vmgexit_psc(struct ghcb *ghcb, struct snp_psc_desc *desc)
+{
+ int cur_entry, end_entry, ret = 0;
+ struct snp_psc_desc *data;
+ struct es_em_ctxt ctxt;
+
+ vc_ghcb_invalidate(ghcb);
+
+ /* Copy the input desc into GHCB shared buffer */
+ data = (struct snp_psc_desc *)ghcb->shared_buffer;
+ memcpy(ghcb->shared_buffer, desc, min_t(int, GHCB_SHARED_BUF_SIZE, sizeof(*desc)));
+
+ /*
+ * As per the GHCB specification, the hypervisor can resume the guest
+ * before processing all the entries. Check whether all the entries
+ * are processed. If not, then keep retrying. Note, the hypervisor
+ * will update the data memory directly to indicate the status, so
+ * reference the data->hdr everywhere.
+ *
+ * The strategy here is to wait for the hypervisor to change the page
+ * state in the RMP table before guest accesses the memory pages. If the
+ * page state change was not successful, then later memory access will
+ * result in a crash.
+ */
+ cur_entry = data->hdr.cur_entry;
+ end_entry = data->hdr.end_entry;
+
+ while (data->hdr.cur_entry <= data->hdr.end_entry) {
+ ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
+
+ /* This will advance the shared buffer data points to. */
+ ret = sev_es_ghcb_hv_call(ghcb, &ctxt, SVM_VMGEXIT_PSC, 0, 0);
+
+ /*
+ * Page State Change VMGEXIT can pass error code through
+ * exit_info_2.
+ */
+ if (WARN(ret || ghcb->save.sw_exit_info_2,
+ "SNP: PSC failed ret=%d exit_info_2=%llx\n",
+ ret, ghcb->save.sw_exit_info_2)) {
+ ret = 1;
+ goto out;
+ }
+
+ /* Verify that reserved bit is not set */
+ if (WARN(data->hdr.reserved, "Reserved bit is set in the PSC header\n")) {
+ ret = 1;
+ goto out;
+ }
+
+ /*
+ * Sanity check that entry processing is not going backwards.
+ * This will happen only if hypervisor is tricking us.
+ */
+ if (WARN(data->hdr.end_entry > end_entry || cur_entry > data->hdr.cur_entry,
+"SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (got %d)\n",
+ end_entry, data->hdr.end_entry, cur_entry, data->hdr.cur_entry)) {
+ ret = 1;
+ goto out;
+ }
+ }
+
+out:
+ return ret;
+}
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 8802a75e1c20..ea2546e5130f 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -657,38 +657,6 @@ static u64 __init get_jump_table_addr(void)
return ret;
}

-static void pvalidate_pages(struct snp_psc_desc *desc)
-{
- struct psc_entry *e;
- unsigned long vaddr;
- unsigned int size;
- unsigned int i;
- bool validate;
- int rc;
-
- for (i = 0; i <= desc->hdr.end_entry; i++) {
- e = &desc->entries[i];
-
- vaddr = (unsigned long)pfn_to_kaddr(e->gfn);
- size = e->pagesize ? RMP_PG_SIZE_2M : RMP_PG_SIZE_4K;
- validate = (e->operation == SNP_PAGE_STATE_PRIVATE) ? true : false;
-
- rc = pvalidate(vaddr, size, validate);
- if (rc == PVALIDATE_FAIL_SIZEMISMATCH && size == RMP_PG_SIZE_2M) {
- unsigned long vaddr_end = vaddr + PMD_SIZE;
-
- for (; vaddr < vaddr_end; vaddr += PAGE_SIZE) {
- rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
- if (rc)
- break;
- }
- }
-
- if (WARN(rc, "Failed to validate address 0x%lx ret %d", vaddr, rc))
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
- }
-}
-
static void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
unsigned long npages, enum psc_op op)
{
@@ -796,72 +764,6 @@ void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op
WARN(1, "invalid memory op %d\n", op);
}

-static int vmgexit_psc(struct ghcb *ghcb, struct snp_psc_desc *desc)
-{
- int cur_entry, end_entry, ret = 0;
- struct snp_psc_desc *data;
- struct es_em_ctxt ctxt;
-
- vc_ghcb_invalidate(ghcb);
-
- /* Copy the input desc into GHCB shared buffer */
- data = (struct snp_psc_desc *)ghcb->shared_buffer;
- memcpy(ghcb->shared_buffer, desc, min_t(int, GHCB_SHARED_BUF_SIZE, sizeof(*desc)));
-
- /*
- * As per the GHCB specification, the hypervisor can resume the guest
- * before processing all the entries. Check whether all the entries
- * are processed. If not, then keep retrying. Note, the hypervisor
- * will update the data memory directly to indicate the status, so
- * reference the data->hdr everywhere.
- *
- * The strategy here is to wait for the hypervisor to change the page
- * state in the RMP table before guest accesses the memory pages. If the
- * page state change was not successful, then later memory access will
- * result in a crash.
- */
- cur_entry = data->hdr.cur_entry;
- end_entry = data->hdr.end_entry;
-
- while (data->hdr.cur_entry <= data->hdr.end_entry) {
- ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
-
- /* This will advance the shared buffer data points to. */
- ret = sev_es_ghcb_hv_call(ghcb, &ctxt, SVM_VMGEXIT_PSC, 0, 0);
-
- /*
- * Page State Change VMGEXIT can pass error code through
- * exit_info_2.
- */
- if (WARN(ret || ghcb->save.sw_exit_info_2,
- "SNP: PSC failed ret=%d exit_info_2=%llx\n",
- ret, ghcb->save.sw_exit_info_2)) {
- ret = 1;
- goto out;
- }
-
- /* Verify that reserved bit is not set */
- if (WARN(data->hdr.reserved, "Reserved bit is set in the PSC header\n")) {
- ret = 1;
- goto out;
- }
-
- /*
- * Sanity check that entry processing is not going backwards.
- * This will happen only if hypervisor is tricking us.
- */
- if (WARN(data->hdr.end_entry > end_entry || cur_entry > data->hdr.cur_entry,
-"SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (got %d)\n",
- end_entry, data->hdr.end_entry, cur_entry, data->hdr.cur_entry)) {
- ret = 1;
- goto out;
- }
- }
-
-out:
- return ret;
-}
-
static unsigned long __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
unsigned long vaddr_end, int op)
{
@@ -966,6 +868,20 @@ void snp_set_memory_private(unsigned long vaddr, unsigned long npages)
set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
}

+void snp_accept_memory(phys_addr_t start, phys_addr_t end)
+{
+ unsigned long vaddr;
+ unsigned int npages;
+
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ return;
+
+ vaddr = (unsigned long)__va(start);
+ npages = (end - start) >> PAGE_SHIFT;
+
+ set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
+}
+
static int snp_set_vmsa(void *va, bool vmsa)
{
u64 attrs;
--
2.40.0


2023-05-19 16:40:02

by Tom Lendacky

Subject: [PATCH 6/6] x86/efi: Safely enable unaccepted memory in UEFI

From: Dionna Glaze <[email protected]>

The UEFI v2.9 specification includes a new memory type to be used in
environments where the OS must accept memory that is provided from its
host. Before the introduction of this memory type, all memory was
accepted eagerly in the firmware. In order for the firmware to safely
stop accepting memory on the OS's behalf, the OS must affirmatively
indicate support to the firmware. This is only a problem for AMD
SEV-SNP, since Linux has had SEV-SNP support since 5.19. The other
technology that can make use of unaccepted memory, Intel TDX, does not
yet have Linux support, so it can strictly require unaccepted memory
support as a dependency of CONFIG_TDX and not require communication with
the firmware.

Enabling unaccepted memory requires calling a 0-argument enablement
protocol before ExitBootServices. This call is only made if the kernel
is compiled with UNACCEPTED_MEMORY=y.

This protocol will be removed after the end of life of the first LTS
that includes it, in order to give firmware implementations an
expiration date for it. When the protocol is removed, firmware will
strictly infer that a SEV-SNP VM is running an OS that supports the
unaccepted memory type. At the earliest convenience, when unaccepted
memory support is added to Linux, SEV-SNP may take a strict dependency on
it. After the firmware removes support for the protocol, this patch
should be reverted.

[tl: address some checkscript warnings]

Cc: Ard Biesheuvel <[email protected]>
Cc: "Min M. Xu" <[email protected]>
Cc: Gerd Hoffmann <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Jiewen Yao <[email protected]>
Cc: Erdem Aktas <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Borislav Petkov <[email protected]>
Signed-off-by: Dionna Glaze <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
---
drivers/firmware/efi/libstub/x86-stub.c | 36 +++++++++++++++++++++++++
include/linux/efi.h | 3 +++
2 files changed, 39 insertions(+)

diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 8d17cee8b98e..e2193dbe1f66 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -26,6 +26,17 @@ const efi_dxe_services_table_t *efi_dxe_table;
u32 image_offset __section(".data");
static efi_loaded_image_t *image = NULL;

+typedef union sev_memory_acceptance_protocol sev_memory_acceptance_protocol_t;
+union sev_memory_acceptance_protocol {
+ struct {
+ efi_status_t (__efiapi * allow_unaccepted_memory)(
+ sev_memory_acceptance_protocol_t *);
+ };
+ struct {
+ u32 allow_unaccepted_memory;
+ } mixed_mode;
+};
+
static efi_status_t
preserve_pci_rom_image(efi_pci_io_protocol_t *pci, struct pci_setup_rom **__rom)
{
@@ -310,6 +321,29 @@ setup_memory_protection(unsigned long image_base, unsigned long image_size)
#endif
}

+static void setup_unaccepted_memory(void)
+{
+ efi_guid_t mem_acceptance_proto = OVMF_SEV_MEMORY_ACCEPTANCE_PROTOCOL_GUID;
+ sev_memory_acceptance_protocol_t *proto;
+ efi_status_t status;
+
+ if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
+ return;
+
+ /*
+ * Enable unaccepted memory before calling exit boot services in order
+ * for the UEFI to not accept all memory on EBS.
+ */
+ status = efi_bs_call(locate_protocol, &mem_acceptance_proto, NULL,
+ (void **)&proto);
+ if (status != EFI_SUCCESS)
+ return;
+
+ status = efi_call_proto(proto, allow_unaccepted_memory);
+ if (status != EFI_SUCCESS)
+ efi_err("Memory acceptance protocol failed\n");
+}
+
static const efi_char16_t apple[] = L"Apple";

static void setup_quirks(struct boot_params *boot_params,
@@ -908,6 +942,8 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,

setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);

+ setup_unaccepted_memory();
+
status = exit_boot(boot_params, handle);
if (status != EFI_SUCCESS) {
efi_err("exit_boot() failed!\n");
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 9864f9c00da2..8c5abcf70a05 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -437,6 +437,9 @@ void efi_native_runtime_setup(void);
#define DELLEMC_EFI_RCI2_TABLE_GUID EFI_GUID(0x2d9f28a2, 0xa886, 0x456a, 0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
#define AMD_SEV_MEM_ENCRYPT_GUID EFI_GUID(0x0cf29b71, 0x9e51, 0x433a, 0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75)

+/* OVMF protocol GUIDs */
+#define OVMF_SEV_MEMORY_ACCEPTANCE_PROTOCOL_GUID EFI_GUID(0xc5a010fe, 0x38a7, 0x4531, 0x8a, 0x4a, 0x05, 0x00, 0xd2, 0xfd, 0x16, 0x49)
+
typedef struct {
efi_guid_t guid;
u64 table;
--
2.40.0


2023-05-19 16:42:47

by Tom Lendacky

Subject: [RESEND PATCH v8 0/6] Provide SEV-SNP support for unaccepted memory

This series adds SEV-SNP support for unaccepted memory to the patch series
titled:

[PATCHv12 0/9] mm, x86/cc, efi: Implement support for unaccepted memory

Currently, when changing the state of a page under SNP, the page state
change structure is kmalloc()'d. This leads to hangs during boot when
accepting memory because the allocation can trigger the need to accept
more memory. Additionally, the page state change operations are not
optimized under Linux since it was expected that all memory has been
validated already, resulting in poor performance when adding basic
support for unaccepted memory.

This series consists of six patches:
- One pre-patch fix which can be taken regardless of this series.

- A pre-patch to switch from a kmalloc()'d page state change structure
to a (smaller) stack-based page state change structure.

- A pre-patch to allow the use of the early boot GHCB in the core kernel
path.

- A pre-patch to allow for use of 2M page state change requests and 2M
page validation.

- SNP support for unaccepted memory.

- EFI support to inform EFI/OVMF to not accept all memory on exit boot
services (from Dionna Glaze <[email protected]>)

The series is based on and tested against Kirill Shutemov's tree:
https://github.com/intel/tdx.git guest-unaccepted-memory

---

Changes since v8:
- Rebase on updated unaccepted memory series

Changes since v7:
- Rebase on updated unaccepted memory series

Changes since v6:
- Add missing Kconfig dependency on EFI_STUB.
- Include an EFI patch to tell OVMF to not accept all memory on the
exit boot services call (from Dionna Glaze <[email protected]>)

Changes since v5:
- Replace the two fix patches with a single fix patch that uses a new
function signature to address the problem instead of using casting.
- Adjust the #define WARN definition in arch/x86/kernel/sev-shared.c to
allow use from the arch/x86/boot/compressed path without moving the
WARN() calls outside of the if statements.

Changes since v4:
- Two fixes for when an unsigned int is used as the number of pages to
process: it needs to be converted to an unsigned long before being
used to calculate ending addresses, otherwise a value >= 0x100000
results in adding 0 in the calculation (see the sketch after this
changelog).
- Commit message and comment updates.

Changes since v3:
- Reworks the PSC process to greatly improve performance:
- Optimize the PSC process to use 2M pages when applicable.
- Optimize the page validation process to use 2M pages when applicable.
- Use the early GHCB in both the decompression phase and core kernel
boot phase in order to minimize the use of the MSR protocol. The MSR
protocol only allows for a single 4K page to be updated at a time.
- Move the ghcb_percpu_ready flag into the sev_config structure and
rename it to ghcbs_initialized.

Changes since v2:
- Improve code comments in regards to when to use the per-CPU GHCB vs
the MSR protocol and why a single global value is valid for both
the BSP and APs.
- Add a comment related to the number of PSC entries and how it can
impact the size of the struct and, therefore, stack usage.
- Add a WARN_ON_ONCE() for invoking vmgexit_psc() when per-CPU GHCBs
haven't been created or registered, yet.
- Use the compiler support for clearing the PSC struct instead of
issuing memset().

Changes since v1:
- Change from using a per-CPU PSC structure to a (smaller) stack PSC
structure.
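
The truncation bug from the v4 changelog, sketched (illustrative values;
PAGE_SHIFT is 12):

        unsigned int npages = 0x100000;    /* 4GiB worth of 4K pages */

        /* npages << PAGE_SHIFT is computed in 32 bits and wraps to 0: */
        vaddr_end = vaddr + (npages << PAGE_SHIFT);
        /* converting first gives the intended result: */
        vaddr_end = vaddr + ((unsigned long)npages << PAGE_SHIFT);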


Dionna Glaze (1):
x86/efi: Safely enable unaccepted memory in UEFI

Tom Lendacky (5):
x86/sev: Fix calculation of end address based on number of pages
x86/sev: Put PSC struct on the stack in prep for unaccepted memory
support
x86/sev: Allow for use of the early boot GHCB for PSC requests
x86/sev: Use large PSC requests if applicable
x86/sev: Add SNP-specific unaccepted memory support

arch/x86/Kconfig | 2 +
arch/x86/boot/compressed/mem.c | 3 +
arch/x86/boot/compressed/sev.c | 54 ++++-
arch/x86/boot/compressed/sev.h | 23 ++
arch/x86/include/asm/sev-common.h | 9 +-
arch/x86/include/asm/sev.h | 23 +-
arch/x86/include/asm/unaccepted_memory.h | 3 +
arch/x86/kernel/sev-shared.c | 103 +++++++++
arch/x86/kernel/sev.c | 256 ++++++++++-------------
drivers/firmware/efi/libstub/x86-stub.c | 36 ++++
include/linux/efi.h | 3 +
11 files changed, 356 insertions(+), 159 deletions(-)
create mode 100644 arch/x86/boot/compressed/sev.h

--
2.40.0


2023-05-19 16:46:18

by Tom Lendacky

Subject: [PATCH 4/6] x86/sev: Use large PSC requests if applicable

In advance of providing support for unaccepted memory, issue 2M Page
State Change (PSC) requests when the address range allows for it. With a
2M page size, more PSC operations can be handled in a single request to
the hypervisor. The hypervisor determines whether it can accommodate the
larger request by checking the mapping in the nested page table: if the
range is mapped as a large page, the 2M request can be performed directly;
otherwise, the hypervisor breaks it down into 512 4K page requests. This
is still more efficient than having the guest issue multiple PSC requests
to process the 512 4K pages.

In conjunction with the 2M PSC requests, attempt to perform the associated
PVALIDATE instruction on the page using the 2M page size. If PVALIDATE
fails with a size mismatch, fall back to validating 512 4K pages. To do
this, page validation is modified to work with the PSC structure rather
than just a virtual address range.
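
For reference, the size-mismatch fallback reduces to the following
pattern (a minimal sketch built on the pvalidate() helper and the
RMP_PG_SIZE_*, PVALIDATE_FAIL_SIZEMISMATCH and PMD_SIZE definitions
visible in this patch; the pvalidate_2m_or_split() wrapper name is
hypothetical, for illustration only):

static int pvalidate_2m_or_split(unsigned long vaddr, bool validate)
{
	unsigned long vaddr_end;
	int rc;

	/* Try the whole 2M region with a single PVALIDATE. */
	rc = pvalidate(vaddr, RMP_PG_SIZE_2M, validate);
	if (rc != PVALIDATE_FAIL_SIZEMISMATCH)
		return rc;

	/* The RMP holds 4K entries: validate the 512 constituent pages. */
	vaddr_end = vaddr + PMD_SIZE;
	for (; vaddr < vaddr_end; vaddr += PAGE_SIZE) {
		rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
		if (rc)
			break;
	}

	return rc;
}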

Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/sev.h | 4 ++
arch/x86/kernel/sev.c | 125 ++++++++++++++++++++++++-------------
2 files changed, 84 insertions(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 7ca5c9ec8b52..e21e1c5397c1 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -80,11 +80,15 @@ extern void vc_no_ghcb(void);
extern void vc_boot_ghcb(void);
extern bool handle_vc_boot_ghcb(struct pt_regs *regs);

+/* PVALIDATE return codes */
+#define PVALIDATE_FAIL_SIZEMISMATCH 6
+
/* Software defined (when rFlags.CF = 1) */
#define PVALIDATE_FAIL_NOUPDATE 255

/* RMP page size */
#define RMP_PG_SIZE_4K 0
+#define RMP_PG_SIZE_2M 1

#define RMPADJUST_VMSA_PAGE_BIT BIT(16)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 973756c89dac..8802a75e1c20 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -657,32 +657,58 @@ static u64 __init get_jump_table_addr(void)
return ret;
}

-static void pvalidate_pages(unsigned long vaddr, unsigned long npages, bool validate)
+static void pvalidate_pages(struct snp_psc_desc *desc)
{
- unsigned long vaddr_end;
+ struct psc_entry *e;
+ unsigned long vaddr;
+ unsigned int size;
+ unsigned int i;
+ bool validate;
int rc;

- vaddr = vaddr & PAGE_MASK;
- vaddr_end = vaddr + (npages << PAGE_SHIFT);
+ for (i = 0; i <= desc->hdr.end_entry; i++) {
+ e = &desc->entries[i];
+
+ vaddr = (unsigned long)pfn_to_kaddr(e->gfn);
+ size = e->pagesize ? RMP_PG_SIZE_2M : RMP_PG_SIZE_4K;
+ validate = (e->operation == SNP_PAGE_STATE_PRIVATE) ? true : false;
+
+ rc = pvalidate(vaddr, size, validate);
+ if (rc == PVALIDATE_FAIL_SIZEMISMATCH && size == RMP_PG_SIZE_2M) {
+ unsigned long vaddr_end = vaddr + PMD_SIZE;
+
+ for (; vaddr < vaddr_end; vaddr += PAGE_SIZE) {
+ rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
+ if (rc)
+ break;
+ }
+ }

- while (vaddr < vaddr_end) {
- rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
if (WARN(rc, "Failed to validate address 0x%lx ret %d", vaddr, rc))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
-
- vaddr = vaddr + PAGE_SIZE;
}
}

-static void early_set_pages_state(unsigned long paddr, unsigned long npages, enum psc_op op)
+static void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
+ unsigned long npages, enum psc_op op)
{
unsigned long paddr_end;
u64 val;
+ int ret;
+
+ vaddr = vaddr & PAGE_MASK;

paddr = paddr & PAGE_MASK;
paddr_end = paddr + (npages << PAGE_SHIFT);

while (paddr < paddr_end) {
+ if (op == SNP_PAGE_STATE_SHARED) {
+ /* Page validation must be rescinded before changing to shared */
+ ret = pvalidate(vaddr, RMP_PG_SIZE_4K, false);
+ if (WARN(ret, "Failed to validate address 0x%lx ret %d", paddr, ret))
+ goto e_term;
+ }
+
/*
* Use the MSR protocol because this function can be called before
* the GHCB is established.
@@ -703,7 +729,15 @@ static void early_set_pages_state(unsigned long paddr, unsigned long npages, enu
paddr, GHCB_MSR_PSC_RESP_VAL(val)))
goto e_term;

- paddr = paddr + PAGE_SIZE;
+ if (op == SNP_PAGE_STATE_PRIVATE) {
+ /* Page validation must be performed after changing to private */
+ ret = pvalidate(vaddr, RMP_PG_SIZE_4K, true);
+ if (WARN(ret, "Failed to validate address 0x%lx ret %d", paddr, ret))
+ goto e_term;
+ }
+
+ vaddr += PAGE_SIZE;
+ paddr += PAGE_SIZE;
}

return;
@@ -728,10 +762,7 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
* Ask the hypervisor to mark the memory pages as private in the RMP
* table.
*/
- early_set_pages_state(paddr, npages, SNP_PAGE_STATE_PRIVATE);
-
- /* Validate the memory pages after they've been added in the RMP table. */
- pvalidate_pages(vaddr, npages, true);
+ early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE);
}

void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
@@ -746,11 +777,8 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
return;

- /* Invalidate the memory pages before they are marked shared in the RMP table. */
- pvalidate_pages(vaddr, npages, false);
-
/* Ask hypervisor to mark the memory pages shared in the RMP table. */
- early_set_pages_state(paddr, npages, SNP_PAGE_STATE_SHARED);
+ early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED);
}

void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op)
@@ -834,10 +862,11 @@ static int vmgexit_psc(struct ghcb *ghcb, struct snp_psc_desc *desc)
return ret;
}

-static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
- unsigned long vaddr_end, int op)
+static unsigned long __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
+ unsigned long vaddr_end, int op)
{
struct ghcb_state state;
+ bool use_large_entry;
struct psc_hdr *hdr;
struct psc_entry *e;
unsigned long flags;
@@ -851,27 +880,37 @@ static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
memset(data, 0, sizeof(*data));
i = 0;

- while (vaddr < vaddr_end) {
- if (is_vmalloc_addr((void *)vaddr))
+ while (vaddr < vaddr_end && i < ARRAY_SIZE(data->entries)) {
+ hdr->end_entry = i;
+
+ if (is_vmalloc_addr((void *)vaddr)) {
pfn = vmalloc_to_pfn((void *)vaddr);
- else
+ use_large_entry = false;
+ } else {
pfn = __pa(vaddr) >> PAGE_SHIFT;
+ use_large_entry = true;
+ }

e->gfn = pfn;
e->operation = op;
- hdr->end_entry = i;

- /*
- * Current SNP implementation doesn't keep track of the RMP page
- * size so use 4K for simplicity.
- */
- e->pagesize = RMP_PG_SIZE_4K;
+ if (use_large_entry && IS_ALIGNED(vaddr, PMD_SIZE) &&
+ (vaddr_end - vaddr) >= PMD_SIZE) {
+ e->pagesize = RMP_PG_SIZE_2M;
+ vaddr += PMD_SIZE;
+ } else {
+ e->pagesize = RMP_PG_SIZE_4K;
+ vaddr += PAGE_SIZE;
+ }

- vaddr = vaddr + PAGE_SIZE;
e++;
i++;
}

+ /* Page validation must be rescinded before changing to shared */
+ if (op == SNP_PAGE_STATE_SHARED)
+ pvalidate_pages(data);
+
local_irq_save(flags);

if (sev_cfg.ghcbs_initialized)
@@ -879,6 +918,7 @@ static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
else
ghcb = boot_ghcb;

+ /* Invoke the hypervisor to perform the page state changes */
if (!ghcb || vmgexit_psc(ghcb, data))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);

@@ -886,29 +926,28 @@ static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
__sev_put_ghcb(&state);

local_irq_restore(flags);
+
+ /* Page validation must be performed after changing to private */
+ if (op == SNP_PAGE_STATE_PRIVATE)
+ pvalidate_pages(data);
+
+ return vaddr;
}

static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
{
- unsigned long vaddr_end, next_vaddr;
struct snp_psc_desc desc;
+ unsigned long vaddr_end;

/* Use the MSR protocol when a GHCB is not available. */
if (!boot_ghcb)
- return early_set_pages_state(__pa(vaddr), npages, op);
+ return early_set_pages_state(vaddr, __pa(vaddr), npages, op);

vaddr = vaddr & PAGE_MASK;
vaddr_end = vaddr + (npages << PAGE_SHIFT);

- while (vaddr < vaddr_end) {
- /* Calculate the last vaddr that fits in one struct snp_psc_desc. */
- next_vaddr = min_t(unsigned long, vaddr_end,
- (VMGEXIT_PSC_MAX_ENTRY * PAGE_SIZE) + vaddr);
-
- __set_pages_state(&desc, vaddr, next_vaddr, op);
-
- vaddr = next_vaddr;
- }
+ while (vaddr < vaddr_end)
+ vaddr = __set_pages_state(&desc, vaddr, vaddr_end, op);
}

void snp_set_memory_shared(unsigned long vaddr, unsigned long npages)
@@ -916,8 +955,6 @@ void snp_set_memory_shared(unsigned long vaddr, unsigned long npages)
if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
return;

- pvalidate_pages(vaddr, npages, false);
-
set_pages_state(vaddr, npages, SNP_PAGE_STATE_SHARED);
}

@@ -927,8 +964,6 @@ void snp_set_memory_private(unsigned long vaddr, unsigned long npages)
return;

set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
-
- pvalidate_pages(vaddr, npages, true);
}

static int snp_set_vmsa(void *va, bool vmsa)
--
2.40.0


2023-05-19 16:46:39

by Tom Lendacky

Subject: [PATCH 1/6] x86/sev: Fix calculation of end address based on number of pages

When calculating an end address based on an unsigned int number of pages,
any value greater than or equal to 0x100000 that is shifted left by
PAGE_SHIFT bits results in 0, producing an invalid end address. Change the
number of pages variable in the various routines from an unsigned int to an
unsigned long so that the end address is calculated correctly.
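
The failure mode is easy to reproduce in isolation (hypothetical values,
not taken from the patch; assumes PAGE_SHIFT == 12):

	unsigned long vaddr = 0xffff888000000000UL;	/* example start */
	unsigned int npages = 0x100000;			/* 4 GiB of 4K pages */
	unsigned long vaddr_end;

	/* Shift done in 32 bits: 0x100000 << 12 wraps to 0, end == start. */
	vaddr_end = vaddr + (npages << PAGE_SHIFT);

	/* Widening npages first performs the shift in 64-bit arithmetic. */
	vaddr_end = vaddr + ((unsigned long)npages << PAGE_SHIFT);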

Fixes: 5e5ccff60a29 ("x86/sev: Add helper for validating pages in early enc attribute changes")
Fixes: dc3f3d2474b8 ("x86/mm: Validate memory when changing the C-bit")
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/include/asm/sev.h | 16 ++++++++--------
arch/x86/kernel/sev.c | 14 +++++++-------
2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 13dc2a9d23c1..7ca5c9ec8b52 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -192,12 +192,12 @@ struct snp_guest_request_ioctl;

void setup_ghcb(void);
void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
- unsigned int npages);
+ unsigned long npages);
void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
- unsigned int npages);
+ unsigned long npages);
void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
-void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
-void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
+void snp_set_memory_shared(unsigned long vaddr, unsigned long npages);
+void snp_set_memory_private(unsigned long vaddr, unsigned long npages);
void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
void __init __noreturn snp_abort(void);
@@ -212,12 +212,12 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs) { return 0; }
static inline void setup_ghcb(void) { }
static inline void __init
-early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
+early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned long npages) { }
static inline void __init
-early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
+early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned long npages) { }
static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
-static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
-static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
+static inline void snp_set_memory_shared(unsigned long vaddr, unsigned long npages) { }
+static inline void snp_set_memory_private(unsigned long vaddr, unsigned long npages) { }
static inline void snp_set_wakeup_secondary_cpu(void) { }
static inline bool snp_init(struct boot_params *bp) { return false; }
static inline void snp_abort(void) { }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index b031244d6d2d..108bbae59c35 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -645,7 +645,7 @@ static u64 __init get_jump_table_addr(void)
return ret;
}

-static void pvalidate_pages(unsigned long vaddr, unsigned int npages, bool validate)
+static void pvalidate_pages(unsigned long vaddr, unsigned long npages, bool validate)
{
unsigned long vaddr_end;
int rc;
@@ -662,7 +662,7 @@ static void pvalidate_pages(unsigned long vaddr, unsigned int npages, bool valid
}
}

-static void __init early_set_pages_state(unsigned long paddr, unsigned int npages, enum psc_op op)
+static void __init early_set_pages_state(unsigned long paddr, unsigned long npages, enum psc_op op)
{
unsigned long paddr_end;
u64 val;
@@ -701,7 +701,7 @@ static void __init early_set_pages_state(unsigned long paddr, unsigned int npage
}

void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
- unsigned int npages)
+ unsigned long npages)
{
/*
* This can be invoked in early boot while running identity mapped, so
@@ -723,7 +723,7 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
}

void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
- unsigned int npages)
+ unsigned long npages)
{
/*
* This can be invoked in early boot while running identity mapped, so
@@ -879,7 +879,7 @@ static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
}

-static void set_pages_state(unsigned long vaddr, unsigned int npages, int op)
+static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
{
unsigned long vaddr_end, next_vaddr;
struct snp_psc_desc *desc;
@@ -904,7 +904,7 @@ static void set_pages_state(unsigned long vaddr, unsigned int npages, int op)
kfree(desc);
}

-void snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
+void snp_set_memory_shared(unsigned long vaddr, unsigned long npages)
{
if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
return;
@@ -914,7 +914,7 @@ void snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
set_pages_state(vaddr, npages, SNP_PAGE_STATE_SHARED);
}

-void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
+void snp_set_memory_private(unsigned long vaddr, unsigned long npages)
{
if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
return;
--
2.40.0


2023-05-19 17:06:56

by Tom Lendacky

Subject: Re: [PATCH 1/6] x86/sev: Fix calculation of end address based on number of pages

Hit send by mistake so I missed adding the cover letter and version
update. Resending...

Thanks,
Tom

On 5/19/23 11:24, Tom Lendacky wrote:
> When calculating an end address based on an unsigned int number of pages,
> any value greater than or equal to 0x100000 that is shifted left by
> PAGE_SHIFT bits results in 0, producing an invalid end address. Change the
> number of pages variable in the various routines from an unsigned int to an
> unsigned long so that the end address is calculated correctly.
>
> Fixes: 5e5ccff60a29 ("x86/sev: Add helper for validating pages in early enc attribute changes")
> Fixes: dc3f3d2474b8 ("x86/mm: Validate memory when changing the C-bit")
> Signed-off-by: Tom Lendacky <[email protected]>
> ---
> arch/x86/include/asm/sev.h | 16 ++++++++--------
> arch/x86/kernel/sev.c | 14 +++++++-------
> 2 files changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 13dc2a9d23c1..7ca5c9ec8b52 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -192,12 +192,12 @@ struct snp_guest_request_ioctl;
>
> void setup_ghcb(void);
> void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
> - unsigned int npages);
> + unsigned long npages);
> void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
> - unsigned int npages);
> + unsigned long npages);
> void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op);
> -void snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
> -void snp_set_memory_private(unsigned long vaddr, unsigned int npages);
> +void snp_set_memory_shared(unsigned long vaddr, unsigned long npages);
> +void snp_set_memory_private(unsigned long vaddr, unsigned long npages);
> void snp_set_wakeup_secondary_cpu(void);
> bool snp_init(struct boot_params *bp);
> void __init __noreturn snp_abort(void);
> @@ -212,12 +212,12 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
> static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs) { return 0; }
> static inline void setup_ghcb(void) { }
> static inline void __init
> -early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
> +early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, unsigned long npages) { }
> static inline void __init
> -early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned int npages) { }
> +early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned long npages) { }
> static inline void __init snp_prep_memory(unsigned long paddr, unsigned int sz, enum psc_op op) { }
> -static inline void snp_set_memory_shared(unsigned long vaddr, unsigned int npages) { }
> -static inline void snp_set_memory_private(unsigned long vaddr, unsigned int npages) { }
> +static inline void snp_set_memory_shared(unsigned long vaddr, unsigned long npages) { }
> +static inline void snp_set_memory_private(unsigned long vaddr, unsigned long npages) { }
> static inline void snp_set_wakeup_secondary_cpu(void) { }
> static inline bool snp_init(struct boot_params *bp) { return false; }
> static inline void snp_abort(void) { }
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index b031244d6d2d..108bbae59c35 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -645,7 +645,7 @@ static u64 __init get_jump_table_addr(void)
> return ret;
> }
>
> -static void pvalidate_pages(unsigned long vaddr, unsigned int npages, bool validate)
> +static void pvalidate_pages(unsigned long vaddr, unsigned long npages, bool validate)
> {
> unsigned long vaddr_end;
> int rc;
> @@ -662,7 +662,7 @@ static void pvalidate_pages(unsigned long vaddr, unsigned int npages, bool valid
> }
> }
>
> -static void __init early_set_pages_state(unsigned long paddr, unsigned int npages, enum psc_op op)
> +static void __init early_set_pages_state(unsigned long paddr, unsigned long npages, enum psc_op op)
> {
> unsigned long paddr_end;
> u64 val;
> @@ -701,7 +701,7 @@ static void __init early_set_pages_state(unsigned long paddr, unsigned int npage
> }
>
> void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
> - unsigned int npages)
> + unsigned long npages)
> {
> /*
> * This can be invoked in early boot while running identity mapped, so
> @@ -723,7 +723,7 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
> }
>
> void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
> - unsigned int npages)
> + unsigned long npages)
> {
> /*
> * This can be invoked in early boot while running identity mapped, so
> @@ -879,7 +879,7 @@ static void __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
> }
>
> -static void set_pages_state(unsigned long vaddr, unsigned int npages, int op)
> +static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
> {
> unsigned long vaddr_end, next_vaddr;
> struct snp_psc_desc *desc;
> @@ -904,7 +904,7 @@ static void set_pages_state(unsigned long vaddr, unsigned int npages, int op)
> kfree(desc);
> }
>
> -void snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
> +void snp_set_memory_shared(unsigned long vaddr, unsigned long npages)
> {
> if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
> return;
> @@ -914,7 +914,7 @@ void snp_set_memory_shared(unsigned long vaddr, unsigned int npages)
> set_pages_state(vaddr, npages, SNP_PAGE_STATE_SHARED);
> }
>
> -void snp_set_memory_private(unsigned long vaddr, unsigned int npages)
> +void snp_set_memory_private(unsigned long vaddr, unsigned long npages)
> {
> if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
> return;