2023-06-01 18:39:13

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv13 0/9] mm, x86/cc, efi: Implement support for unaccepted memory

UEFI Specification version 2.9 introduces the concept of memory
acceptance: some Virtual Machine platforms, such as Intel TDX or AMD
SEV-SNP, requiring memory to be accepted before it can be used by the
guest. Accepting happens via a protocol specific for the Virtual
Machine platform.

Accepting memory is costly and it makes VMM allocate memory for the
accepted guest physical address range. It's better to postpone memory
acceptance until memory is needed. It lowers boot time and reduces
memory overhead.

The kernel needs to know what memory has been accepted. Firmware
communicates this information via memory map: a new memory type --
EFI_UNACCEPTED_MEMORY -- indicates such memory.

Range-based tracking works fine for firmware, but it gets bulky for
the kernel: e820 has to be modified on every page acceptance. It leads
to table fragmentation, but there's a limited number of entries in the
e820 table

Another option is to mark such memory as usable in e820 and track if the
range has been accepted in a bitmap. One bit in the bitmap represents
2MiB in the address space: one 4k page is enough to track 64GiB or
physical address space.

In the worst-case scenario -- a huge hole in the middle of the
address space -- It needs 256MiB to handle 4PiB of the address
space.

Any unaccepted memory that is not aligned to 2M gets accepted upfront.

The approach lowers boot time substantially. Boot to shell is ~2.5x
faster for 4G TDX VM and ~4x faster for 64G.

TDX-specific code isolated from the core of unaccepted memory support. It
supposed to help to plug-in different implementation of unaccepted memory
such as SEV-SNP.

-- Fragmentation study --

Vlastimil and Mel were concern about effect of unaccepted memory on
fragmentation prevention measures in page allocator. I tried to evaluate
it, but it is tricky. As suggested I tried to run multiple parallel kernel
builds and follow how often kmem:mm_page_alloc_extfrag gets hit.

See results in the v9 of the patchset[1][2]

[1] https://lore.kernel.org/all/[email protected]
[2] https://lore.kernel.org/all/[email protected]

--

The tree can be found here:

https://github.com/intel/tdx.git guest-unaccepted-memory

v13:
- Fix few boot issues discovered by 0day;
- Simplify tdx_accept_memory(): no need in MAP_GPA hypercall;
- Update commit message for the first patch;
- Add Reviewed-bys from Tom and Ard;
v12:
- Re-initialize 'unaccepted_table' variable from decompressor to cover some
boot scenarios;
- Add missing memblock_reserve() for the unaccepted memory configuration
table (Mika);
- Add efi.unaccepted into efi_tables (Tom);
- Do not build tdx-shared.o for !TDX (Tom);
- Typo fix (Liam)
- Whitespace fix;
- Reviewed-bys from Liam, Tom and Ard;
v11:
- Restructure the code to make it less x86-specific (suggested by Ard):
+ use EFI configuration table instead of zero-page to pass down bitmap;
+ do not imply 1bit == 2M in bitmap;
+ move bulk of the code under driver/firmware/efi;
- The bitmap only covers unaccpeted memory now. All memory that is not covered
by the bitmap assumed accepted;
- Reviewed-by from Ard;
v10:
- Restructure code around zones_with_unaccepted_pages static brach to avoid
unnecessary function calls (Suggested by Vlastimil);
- Drop mentions of PageUnaccepted();
- Drop patches that add fake unaccepted memory support and sysfs handle to
accept memory manually;
- Add Reviewed-by from Vlastimil;
v9:
- Accept memory up to high watermark when kernel runs out of free memory;
- Treat unaccepted memory as unusable in __zone_watermark_unusable_free();
- Per-zone unaccepted memory accounting;
- All pages on unaccepted list are MAX_ORDER now;
- accept_memory=eager in cmdline to pre-accept memory during the boot;
- Implement fake unaccepted memory;
- Sysfs handle to accept memory manually;
- Drop PageUnaccepted();
- Rename unaccepted_pages static key to zones_with_unaccepted_pages;
v8:
- Rewrite core-mm support for unaccepted memory (patch 02/14);
- s/UnacceptedPages/Unaccepted/ in meminfo;
- Drop arch/x86/boot/compressed/compiler.h;
- Fix build errors;
- Adjust commit messages and comments;
- Reviewed-bys from Dave and Borislav;
- Rebased to tip/master.
v7:
- Rework meminfo counter to use PageUnaccepted() and move to generic code;
- Fix range_contains_unaccepted_memory() on machines without unaccepted memory;
- Add Reviewed-by from David;
v6:
- Fix load_unaligned_zeropad() on machine with unaccepted memory;
- Clear PageUnaccepted() on merged pages, leaving it only on head;
- Clarify error handling in allocate_e820();
- Fix build with CONFIG_UNACCEPTED_MEMORY=y, but without TDX;
- Disable kexec at boottime instead of build conflict;
- Rebased to tip/master;
- Spelling fixes;
- Add Reviewed-by from Mike and David;
v5:
- Updates comments and commit messages;
+ Explain options for unaccepted memory handling;
- Expose amount of unaccepted memory in /proc/meminfo
- Adjust check in page_expected_state();
- Fix error code handling in allocate_e820();
- Centralize __pa()/__va() definitions in the boot stub;
- Avoid includes from the main kernel in the boot stub;
- Use an existing hole in boot_param for unaccepted_memory, instead of adding
to the end of the structure;
- Extract allocate_unaccepted_memory() form allocate_e820();
- Complain if there's unaccepted memory, but kernel does not support it;
- Fix vmstat counter;
- Split up few preparatory patches;
- Random readability adjustments;
v4:
- PageBuddyUnaccepted() -> PageUnaccepted;
- Use separate page_type, not shared with offline;
- Rework interface between core-mm and arch code;
- Adjust commit messages;
- Ack from Mike;
Kirill A. Shutemov (9):
mm: Add support for unaccepted memory
efi/x86: Get full memory map in allocate_e820()
efi/libstub: Implement support for unaccepted memory
x86/boot/compressed: Handle unaccepted memory
efi: Add unaccepted memory support
efi/unaccepted: Avoid load_unaligned_zeropad() stepping into
unaccepted memory
x86/tdx: Make _tdx_hypercall() and __tdx_module_call() available in
boot stub
x86/tdx: Refactor try_accept_one()
x86/tdx: Add unaccepted memory support

arch/x86/Kconfig | 2 +
arch/x86/boot/compressed/Makefile | 3 +-
arch/x86/boot/compressed/efi.h | 10 +
arch/x86/boot/compressed/error.c | 19 ++
arch/x86/boot/compressed/error.h | 1 +
arch/x86/boot/compressed/kaslr.c | 35 ++-
arch/x86/boot/compressed/mem.c | 73 ++++++
arch/x86/boot/compressed/misc.c | 7 +
arch/x86/boot/compressed/misc.h | 6 +
arch/x86/boot/compressed/tdx-shared.c | 2 +
arch/x86/coco/tdx/Makefile | 2 +-
arch/x86/coco/tdx/tdx-shared.c | 71 ++++++
arch/x86/coco/tdx/tdx.c | 102 +-------
arch/x86/include/asm/efi.h | 2 +
arch/x86/include/asm/shared/tdx.h | 53 ++++
arch/x86/include/asm/tdx.h | 19 --
arch/x86/include/asm/unaccepted_memory.h | 24 ++
arch/x86/platform/efi/efi.c | 3 +
drivers/base/node.c | 7 +
drivers/firmware/efi/Kconfig | 14 ++
drivers/firmware/efi/Makefile | 1 +
drivers/firmware/efi/efi.c | 26 ++
drivers/firmware/efi/libstub/Makefile | 2 +
drivers/firmware/efi/libstub/bitmap.c | 41 +++
drivers/firmware/efi/libstub/efistub.h | 6 +
drivers/firmware/efi/libstub/find.c | 43 ++++
.../firmware/efi/libstub/unaccepted_memory.c | 234 ++++++++++++++++++
drivers/firmware/efi/libstub/x86-stub.c | 39 +--
drivers/firmware/efi/unaccepted_memory.c | 138 +++++++++++
fs/proc/meminfo.c | 5 +
include/linux/efi.h | 13 +-
include/linux/mm.h | 19 ++
include/linux/mmzone.h | 8 +
mm/memblock.c | 9 +
mm/mm_init.c | 7 +
mm/page_alloc.c | 173 +++++++++++++
mm/vmstat.c | 3 +
37 files changed, 1074 insertions(+), 148 deletions(-)
create mode 100644 arch/x86/boot/compressed/mem.c
create mode 100644 arch/x86/boot/compressed/tdx-shared.c
create mode 100644 arch/x86/coco/tdx/tdx-shared.c
create mode 100644 arch/x86/include/asm/unaccepted_memory.h
create mode 100644 drivers/firmware/efi/libstub/bitmap.c
create mode 100644 drivers/firmware/efi/libstub/find.c
create mode 100644 drivers/firmware/efi/libstub/unaccepted_memory.c
create mode 100644 drivers/firmware/efi/unaccepted_memory.c

--
2.39.3



2023-06-01 18:50:17

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv13 7/9] x86/tdx: Make _tdx_hypercall() and __tdx_module_call() available in boot stub

Memory acceptance requires a hypercall and one or multiple module calls.

Make helpers for the calls available in boot stub. It has to accept
memory where kernel image and initrd are placed.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
---
arch/x86/coco/tdx/tdx.c | 32 -------------------
arch/x86/include/asm/shared/tdx.h | 51 +++++++++++++++++++++++++++++++
arch/x86/include/asm/tdx.h | 19 ------------
3 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index e146b599260f..e6f4c2758a68 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -14,20 +14,6 @@
#include <asm/insn-eval.h>
#include <asm/pgtable.h>

-/* TDX module Call Leaf IDs */
-#define TDX_GET_INFO 1
-#define TDX_GET_VEINFO 3
-#define TDX_GET_REPORT 4
-#define TDX_ACCEPT_PAGE 6
-#define TDX_WR 8
-
-/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
-#define TDCS_NOTIFY_ENABLES 0x9100000000000010
-
-/* TDX hypercall Leaf IDs */
-#define TDVMCALL_MAP_GPA 0x10001
-#define TDVMCALL_REPORT_FATAL_ERROR 0x10003
-
/* MMIO direction */
#define EPT_READ 0
#define EPT_WRITE 1
@@ -51,24 +37,6 @@

#define TDREPORT_SUBTYPE_0 0

-/*
- * Wrapper for standard use of __tdx_hypercall with no output aside from
- * return code.
- */
-static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
-{
- struct tdx_hypercall_args args = {
- .r10 = TDX_HYPERCALL_STANDARD,
- .r11 = fn,
- .r12 = r12,
- .r13 = r13,
- .r14 = r14,
- .r15 = r15,
- };
-
- return __tdx_hypercall(&args);
-}
-
/* Called from __tdx_hypercall() for unrecoverable failure */
noinstr void __tdx_hypercall_failed(void)
{
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 2631e01f6e0f..1ff0ee822961 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -10,6 +10,20 @@
#define TDX_CPUID_LEAF_ID 0x21
#define TDX_IDENT "IntelTDX "

+/* TDX module Call Leaf IDs */
+#define TDX_GET_INFO 1
+#define TDX_GET_VEINFO 3
+#define TDX_GET_REPORT 4
+#define TDX_ACCEPT_PAGE 6
+#define TDX_WR 8
+
+/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
+#define TDCS_NOTIFY_ENABLES 0x9100000000000010
+
+/* TDX hypercall Leaf IDs */
+#define TDVMCALL_MAP_GPA 0x10001
+#define TDVMCALL_REPORT_FATAL_ERROR 0x10003
+
#ifndef __ASSEMBLY__

/*
@@ -37,8 +51,45 @@ struct tdx_hypercall_args {
u64 __tdx_hypercall(struct tdx_hypercall_args *args);
u64 __tdx_hypercall_ret(struct tdx_hypercall_args *args);

+/*
+ * Wrapper for standard use of __tdx_hypercall with no output aside from
+ * return code.
+ */
+static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
+{
+ struct tdx_hypercall_args args = {
+ .r10 = TDX_HYPERCALL_STANDARD,
+ .r11 = fn,
+ .r12 = r12,
+ .r13 = r13,
+ .r14 = r14,
+ .r15 = r15,
+ };
+
+ return __tdx_hypercall(&args);
+}
+
+
/* Called from __tdx_hypercall() for unrecoverable failure */
void __tdx_hypercall_failed(void);

+/*
+ * Used in __tdx_module_call() to gather the output registers' values of the
+ * TDCALL instruction when requesting services from the TDX module. This is a
+ * software only structure and not part of the TDX module/VMM ABI
+ */
+struct tdx_module_output {
+ u64 rcx;
+ u64 rdx;
+ u64 r8;
+ u64 r9;
+ u64 r10;
+ u64 r11;
+};
+
+/* Used to communicate with the TDX module */
+u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+ struct tdx_module_output *out);
+
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_X86_SHARED_TDX_H */
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 28d889c9aa16..234197ec17e4 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -20,21 +20,6 @@

#ifndef __ASSEMBLY__

-/*
- * Used to gather the output registers values of the TDCALL and SEAMCALL
- * instructions when requesting services from the TDX module.
- *
- * This is a software only structure and not part of the TDX module/VMM ABI.
- */
-struct tdx_module_output {
- u64 rcx;
- u64 rdx;
- u64 r8;
- u64 r9;
- u64 r10;
- u64 r11;
-};
-
/*
* Used by the #VE exception handler to gather the #VE exception
* info from the TDX module. This is a software only structure
@@ -55,10 +40,6 @@ struct ve_info {

void __init tdx_early_init(void);

-/* Used to communicate with the TDX module */
-u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
- struct tdx_module_output *out);
-
void tdx_get_ve_info(struct ve_info *ve);

bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);
--
2.39.3


2023-06-01 18:50:18

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv13 2/9] efi/x86: Get full memory map in allocate_e820()

Currently allocate_e820() is only interested in the size of map and size
of memory descriptor to determine how many e820 entries the kernel
needs.

UEFI Specification version 2.9 introduces a new memory type --
unaccepted memory. To track unaccepted memory kernel needs to allocate
a bitmap. The size of the bitmap is dependent on the maximum physical
address present in the system. A full memory map is required to find
the maximum address.

Modify allocate_e820() to get a full memory map.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
---
drivers/firmware/efi/libstub/x86-stub.c | 26 +++++++++++--------------
1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index a0bfd31358ba..fff81843169c 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -681,28 +681,24 @@ static efi_status_t allocate_e820(struct boot_params *params,
struct setup_data **e820ext,
u32 *e820ext_size)
{
- unsigned long map_size, desc_size, map_key;
+ struct efi_boot_memmap *map;
efi_status_t status;
- __u32 nr_desc, desc_version;
+ __u32 nr_desc;

- /* Only need the size of the mem map and size of each mem descriptor */
- map_size = 0;
- status = efi_bs_call(get_memory_map, &map_size, NULL, &map_key,
- &desc_size, &desc_version);
- if (status != EFI_BUFFER_TOO_SMALL)
- return (status != EFI_SUCCESS) ? status : EFI_UNSUPPORTED;
+ status = efi_get_memory_map(&map, false);
+ if (status != EFI_SUCCESS)
+ return status;

- nr_desc = map_size / desc_size + EFI_MMAP_NR_SLACK_SLOTS;
-
- if (nr_desc > ARRAY_SIZE(params->e820_table)) {
- u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table);
+ nr_desc = map->map_size / map->desc_size;
+ if (nr_desc > ARRAY_SIZE(params->e820_table) - EFI_MMAP_NR_SLACK_SLOTS) {
+ u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table) +
+ EFI_MMAP_NR_SLACK_SLOTS;

status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size);
- if (status != EFI_SUCCESS)
- return status;
}

- return EFI_SUCCESS;
+ efi_bs_call(free_pool, map);
+ return status;
}

struct exit_boot_struct {
--
2.39.3


2023-06-01 18:50:39

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv13 9/9] x86/tdx: Add unaccepted memory support

Hookup TDX-specific code to accept memory.

Accepting the memory is done with ACCEPT_PAGE module call on every page
in the range. MAP_GPA hypercall is not required as the unaccepted memory
is considered private already.

Extract the part of tdx_enc_status_changed() that does memory acceptance
in a new helper. Move the helper tdx-shared.c. It is going to be used by
both main kernel and decompressor.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/Kconfig | 2 +
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/error.c | 19 +++++++
arch/x86/boot/compressed/error.h | 1 +
arch/x86/boot/compressed/mem.c | 35 +++++++++++-
arch/x86/boot/compressed/tdx-shared.c | 2 +
arch/x86/coco/tdx/Makefile | 2 +-
arch/x86/coco/tdx/tdx-shared.c | 71 ++++++++++++++++++++++++
arch/x86/coco/tdx/tdx.c | 70 +----------------------
arch/x86/include/asm/shared/tdx.h | 2 +
arch/x86/include/asm/unaccepted_memory.h | 24 ++++++++
11 files changed, 160 insertions(+), 70 deletions(-)
create mode 100644 arch/x86/boot/compressed/tdx-shared.c
create mode 100644 arch/x86/coco/tdx/tdx-shared.c
create mode 100644 arch/x86/include/asm/unaccepted_memory.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 53bab123a8ee..5c72067c06d4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -884,9 +884,11 @@ config INTEL_TDX_GUEST
bool "Intel TDX (Trust Domain Extensions) - Guest Support"
depends on X86_64 && CPU_SUP_INTEL
depends on X86_X2APIC
+ depends on EFI_STUB
select ARCH_HAS_CC_PLATFORM
select X86_MEM_ENCRYPT
select X86_MCE
+ select UNACCEPTED_MEMORY
help
Support running as a guest under Intel TDX. Without this support,
the guest kernel can not boot or run under TDX.
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index cc4978123c30..b13a58021086 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -106,7 +106,7 @@ ifdef CONFIG_X86_64
endif

vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
-vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o
+vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o $(obj)/tdx-shared.o
vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/mem.o

vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
diff --git a/arch/x86/boot/compressed/error.c b/arch/x86/boot/compressed/error.c
index c881878e56d3..5313c5cb2b80 100644
--- a/arch/x86/boot/compressed/error.c
+++ b/arch/x86/boot/compressed/error.c
@@ -22,3 +22,22 @@ void error(char *m)
while (1)
asm("hlt");
}
+
+/* EFI libstub provides vsnprintf() */
+#ifdef CONFIG_EFI_STUB
+void panic(const char *fmt, ...)
+{
+ static char buf[1024];
+ va_list args;
+ int len;
+
+ va_start(args, fmt);
+ len = vsnprintf(buf, sizeof(buf), fmt, args);
+ va_end(args);
+
+ if (len && buf[len - 1] == '\n')
+ buf[len - 1] = '\0';
+
+ error(buf);
+}
+#endif
diff --git a/arch/x86/boot/compressed/error.h b/arch/x86/boot/compressed/error.h
index 1de5821184f1..86fe33b93715 100644
--- a/arch/x86/boot/compressed/error.h
+++ b/arch/x86/boot/compressed/error.h
@@ -6,5 +6,6 @@

void warn(char *m);
void error(char *m) __noreturn;
+void panic(const char *fmt, ...) __noreturn __cold;

#endif /* BOOT_COMPRESSED_ERROR_H */
diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
index 4ecf26576a77..d2b6948a7801 100644
--- a/arch/x86/boot/compressed/mem.c
+++ b/arch/x86/boot/compressed/mem.c
@@ -2,11 +2,44 @@

#include "error.h"
#include "misc.h"
+#include "tdx.h"
+#include <asm/shared/tdx.h>
+
+/*
+ * accept_memory() and process_unaccepted_memory() called from EFI stub which
+ * runs before decompresser and its early_tdx_detect().
+ *
+ * Enumerate TDX directly from the early users.
+ */
+static bool early_is_tdx_guest(void)
+{
+ static bool once;
+ static bool is_tdx;
+
+ if (!IS_ENABLED(CONFIG_INTEL_TDX_GUEST))
+ return false;
+
+ if (!once) {
+ u32 eax, sig[3];
+
+ cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax,
+ &sig[0], &sig[2], &sig[1]);
+ is_tdx = !memcmp(TDX_IDENT, sig, sizeof(sig));
+ once = true;
+ }
+
+ return is_tdx;
+}

void arch_accept_memory(phys_addr_t start, phys_addr_t end)
{
/* Platform-specific memory-acceptance call goes here */
- error("Cannot accept memory");
+ if (early_is_tdx_guest()) {
+ if (tdx_accept_memory(start, end))
+ return;
+ }
+
+ error("Cannot accept memory: unknown platform\n");
}

void init_unaccepted_memory(void)
diff --git a/arch/x86/boot/compressed/tdx-shared.c b/arch/x86/boot/compressed/tdx-shared.c
new file mode 100644
index 000000000000..5ac43762fe13
--- /dev/null
+++ b/arch/x86/boot/compressed/tdx-shared.c
@@ -0,0 +1,2 @@
+#include "error.h"
+#include "../../coco/tdx/tdx-shared.c"
diff --git a/arch/x86/coco/tdx/Makefile b/arch/x86/coco/tdx/Makefile
index 46c55998557d..2c7dcbf1458b 100644
--- a/arch/x86/coco/tdx/Makefile
+++ b/arch/x86/coco/tdx/Makefile
@@ -1,3 +1,3 @@
# SPDX-License-Identifier: GPL-2.0

-obj-y += tdx.o tdcall.o
+obj-y += tdx.o tdx-shared.o tdcall.o
diff --git a/arch/x86/coco/tdx/tdx-shared.c b/arch/x86/coco/tdx/tdx-shared.c
new file mode 100644
index 000000000000..ef20ddc37b58
--- /dev/null
+++ b/arch/x86/coco/tdx/tdx-shared.c
@@ -0,0 +1,71 @@
+#include <asm/tdx.h>
+#include <asm/pgtable.h>
+
+static unsigned long try_accept_one(phys_addr_t start, unsigned long len,
+ enum pg_level pg_level)
+{
+ unsigned long accept_size = page_level_size(pg_level);
+ u64 tdcall_rcx;
+ u8 page_size;
+
+ if (!IS_ALIGNED(start, accept_size))
+ return 0;
+
+ if (len < accept_size)
+ return 0;
+
+ /*
+ * Pass the page physical address to the TDX module to accept the
+ * pending, private page.
+ *
+ * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M, 2 - 1G.
+ */
+ switch (pg_level) {
+ case PG_LEVEL_4K:
+ page_size = 0;
+ break;
+ case PG_LEVEL_2M:
+ page_size = 1;
+ break;
+ case PG_LEVEL_1G:
+ page_size = 2;
+ break;
+ default:
+ return 0;
+ }
+
+ tdcall_rcx = start | page_size;
+ if (__tdx_module_call(TDX_ACCEPT_PAGE, tdcall_rcx, 0, 0, 0, NULL))
+ return 0;
+
+ return accept_size;
+}
+
+bool tdx_accept_memory(phys_addr_t start, phys_addr_t end)
+{
+ /*
+ * For shared->private conversion, accept the page using
+ * TDX_ACCEPT_PAGE TDX module call.
+ */
+ while (start < end) {
+ unsigned long len = end - start;
+ unsigned long accept_size;
+
+ /*
+ * Try larger accepts first. It gives chance to VMM to keep
+ * 1G/2M Secure EPT entries where possible and speeds up
+ * process by cutting number of hypercalls (if successful).
+ */
+
+ accept_size = try_accept_one(start, len, PG_LEVEL_1G);
+ if (!accept_size)
+ accept_size = try_accept_one(start, len, PG_LEVEL_2M);
+ if (!accept_size)
+ accept_size = try_accept_one(start, len, PG_LEVEL_4K);
+ if (!accept_size)
+ return false;
+ start += accept_size;
+ }
+
+ return true;
+}
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 0d5fe6e24e45..a9c4ba6c5c5d 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -713,46 +713,6 @@ static bool tdx_cache_flush_required(void)
return true;
}

-static unsigned long try_accept_one(phys_addr_t start, unsigned long len,
- enum pg_level pg_level)
-{
- unsigned long accept_size = page_level_size(pg_level);
- u64 tdcall_rcx;
- u8 page_size;
-
- if (!IS_ALIGNED(start, accept_size))
- return 0;
-
- if (len < accept_size)
- return 0;
-
- /*
- * Pass the page physical address to the TDX module to accept the
- * pending, private page.
- *
- * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M, 2 - 1G.
- */
- switch (pg_level) {
- case PG_LEVEL_4K:
- page_size = 0;
- break;
- case PG_LEVEL_2M:
- page_size = 1;
- break;
- case PG_LEVEL_1G:
- page_size = 2;
- break;
- default:
- return 0;
- }
-
- tdcall_rcx = start | page_size;
- if (__tdx_module_call(TDX_ACCEPT_PAGE, tdcall_rcx, 0, 0, 0, NULL))
- return 0;
-
- return accept_size;
-}
-
/*
* Inform the VMM of the guest's intent for this physical page: shared with
* the VMM or private to the guest. The VMM is expected to change its mapping
@@ -777,33 +737,9 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
return false;

- /* private->shared conversion requires only MapGPA call */
- if (!enc)
- return true;
-
- /*
- * For shared->private conversion, accept the page using
- * TDX_ACCEPT_PAGE TDX module call.
- */
- while (start < end) {
- unsigned long len = end - start;
- unsigned long accept_size;
-
- /*
- * Try larger accepts first. It gives chance to VMM to keep
- * 1G/2M Secure EPT entries where possible and speeds up
- * process by cutting number of hypercalls (if successful).
- */
-
- accept_size = try_accept_one(start, len, PG_LEVEL_1G);
- if (!accept_size)
- accept_size = try_accept_one(start, len, PG_LEVEL_2M);
- if (!accept_size)
- accept_size = try_accept_one(start, len, PG_LEVEL_4K);
- if (!accept_size)
- return false;
- start += accept_size;
- }
+ /* shared->private conversion requires memory to be accepted before use */
+ if (enc)
+ return tdx_accept_memory(start, end);

return true;
}
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 1ff0ee822961..19228beb4894 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -91,5 +91,7 @@ struct tdx_module_output {
u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
struct tdx_module_output *out);

+bool tdx_accept_memory(phys_addr_t start, phys_addr_t end);
+
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_X86_SHARED_TDX_H */
diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
new file mode 100644
index 000000000000..f0ab217b566f
--- /dev/null
+++ b/arch/x86/include/asm/unaccepted_memory.h
@@ -0,0 +1,24 @@
+#ifndef _ASM_X86_UNACCEPTED_MEMORY_H
+#define _ASM_X86_UNACCEPTED_MEMORY_H
+
+#include <linux/efi.h>
+#include <asm/tdx.h>
+
+static inline void arch_accept_memory(phys_addr_t start, phys_addr_t end)
+{
+ /* Platform-specific memory-acceptance call goes here */
+ if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
+ if (tdx_accept_memory(start, end))
+ return;
+ }
+
+ panic("Cannot accept memory: unknown platform\n");
+}
+
+static inline struct efi_unaccepted_memory *efi_get_unaccepted_table(void)
+{
+ if (efi.unaccepted == EFI_INVALID_TABLE_ADDR)
+ return NULL;
+ return __va(efi.unaccepted);
+}
+#endif
--
2.39.3


2023-06-01 18:52:01

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv13 5/9] efi: Add unaccepted memory support

efi_config_parse_tables() reserves memory that holds unaccepted memory
configuration table so it won't be reused by page allocator.

Core-mm requires few helpers to support unaccepted memory:

- accept_memory() checks the range of addresses against the bitmap and
accept memory if needed.

- range_contains_unaccepted_memory() checks if anything within the
range requires acceptance.

Architectural code has to provide efi_get_unaccepted_table() that
returns pointer to the unaccepted memory configuration table.

arch_accept_memory() handles arch-specific part of memory acceptance.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
---
arch/x86/platform/efi/efi.c | 3 +
drivers/firmware/efi/Makefile | 1 +
drivers/firmware/efi/efi.c | 25 ++++++
drivers/firmware/efi/unaccepted_memory.c | 103 +++++++++++++++++++++++
include/linux/efi.h | 1 +
5 files changed, 133 insertions(+)
create mode 100644 drivers/firmware/efi/unaccepted_memory.c

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index f3f2d87cce1b..e9f99c56f3ce 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -96,6 +96,9 @@ static const unsigned long * const efi_tables[] = {
#ifdef CONFIG_EFI_COCO_SECRET
&efi.coco_secret,
#endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+ &efi.unaccepted,
+#endif
};

u64 efi_setup; /* efi setup_data physical address */
diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
index b51f2a4c821e..e489fefd23da 100644
--- a/drivers/firmware/efi/Makefile
+++ b/drivers/firmware/efi/Makefile
@@ -41,3 +41,4 @@ obj-$(CONFIG_EFI_CAPSULE_LOADER) += capsule-loader.o
obj-$(CONFIG_EFI_EARLYCON) += earlycon.o
obj-$(CONFIG_UEFI_CPER_ARM) += cper-arm.o
obj-$(CONFIG_UEFI_CPER_X86) += cper-x86.o
+obj-$(CONFIG_UNACCEPTED_MEMORY) += unaccepted_memory.o
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 7dce06e419c5..d817e7afd266 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -50,6 +50,9 @@ struct efi __read_mostly efi = {
#ifdef CONFIG_EFI_COCO_SECRET
.coco_secret = EFI_INVALID_TABLE_ADDR,
#endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+ .unaccepted = EFI_INVALID_TABLE_ADDR,
+#endif
};
EXPORT_SYMBOL(efi);

@@ -605,6 +608,9 @@ static const efi_config_table_type_t common_tables[] __initconst = {
#ifdef CONFIG_EFI_COCO_SECRET
{LINUX_EFI_COCO_SECRET_AREA_GUID, &efi.coco_secret, "CocoSecret" },
#endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+ {LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID, &efi.unaccepted, "Unaccepted" },
+#endif
#ifdef CONFIG_EFI_GENERIC_STUB
{LINUX_EFI_SCREEN_INFO_TABLE_GUID, &screen_info_table },
#endif
@@ -759,6 +765,25 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
}
}

+ if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY) &&
+ efi.unaccepted != EFI_INVALID_TABLE_ADDR) {
+ struct efi_unaccepted_memory *unaccepted;
+
+ unaccepted = early_memremap(efi.unaccepted, sizeof(*unaccepted));
+ if (unaccepted) {
+ unsigned long size;
+
+ if (unaccepted->version == 1) {
+ size = sizeof(*unaccepted) + unaccepted->size;
+ memblock_reserve(efi.unaccepted, size);
+ } else {
+ efi.unaccepted = EFI_INVALID_TABLE_ADDR;
+ }
+
+ early_memunmap(unaccepted, sizeof(*unaccepted));
+ }
+ }
+
return 0;
}

diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c
new file mode 100644
index 000000000000..bb91c41f76fb
--- /dev/null
+++ b/drivers/firmware/efi/unaccepted_memory.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/efi.h>
+#include <linux/memblock.h>
+#include <linux/spinlock.h>
+#include <asm/unaccepted_memory.h>
+
+/* Protects unaccepted memory bitmap */
+static DEFINE_SPINLOCK(unaccepted_memory_lock);
+
+void accept_memory(phys_addr_t start, phys_addr_t end)
+{
+ struct efi_unaccepted_memory *unaccepted;
+ unsigned long range_start, range_end;
+ unsigned long flags;
+ u64 unit_size;
+
+ if (efi.unaccepted == EFI_INVALID_TABLE_ADDR)
+ return;
+
+ unaccepted = efi_get_unaccepted_table();
+ if (!unaccepted)
+ return;
+
+ unit_size = unaccepted->unit_size;
+
+ /*
+ * Only care for the part of the range that is represented
+ * in the bitmap.
+ */
+ if (start < unaccepted->phys_base)
+ start = unaccepted->phys_base;
+ if (end < unaccepted->phys_base)
+ return;
+
+ /* Translate to offsets from the beginning of the bitmap */
+ start -= unaccepted->phys_base;
+ end -= unaccepted->phys_base;
+
+ /* Make sure not to overrun the bitmap */
+ if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
+ end = unaccepted->size * unit_size * BITS_PER_BYTE;
+
+ range_start = start / unit_size;
+
+ spin_lock_irqsave(&unaccepted_memory_lock, flags);
+ for_each_set_bitrange_from(range_start, range_end, unaccepted->bitmap,
+ DIV_ROUND_UP(end, unit_size)) {
+ unsigned long phys_start, phys_end;
+ unsigned long len = range_end - range_start;
+
+ phys_start = range_start * unit_size + unaccepted->phys_base;
+ phys_end = range_end * unit_size + unaccepted->phys_base;
+
+ arch_accept_memory(phys_start, phys_end);
+ bitmap_clear(unaccepted->bitmap, range_start, len);
+ }
+ spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
+}
+
+bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end)
+{
+ struct efi_unaccepted_memory *unaccepted;
+ unsigned long flags;
+ bool ret = false;
+ u64 unit_size;
+
+ unaccepted = efi_get_unaccepted_table();
+ if (!unaccepted)
+ return false;
+
+ unit_size = unaccepted->unit_size;
+
+ /*
+ * Only care for the part of the range that is represented
+ * in the bitmap.
+ */
+ if (start < unaccepted->phys_base)
+ start = unaccepted->phys_base;
+ if (end < unaccepted->phys_base)
+ return false;
+
+ /* Translate to offsets from the beginning of the bitmap */
+ start -= unaccepted->phys_base;
+ end -= unaccepted->phys_base;
+
+ /* Make sure not to overrun the bitmap */
+ if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
+ end = unaccepted->size * unit_size * BITS_PER_BYTE;
+
+ spin_lock_irqsave(&unaccepted_memory_lock, flags);
+ while (start < end) {
+ if (test_bit(start / unit_size, unaccepted->bitmap)) {
+ ret = true;
+ break;
+ }
+
+ start += unit_size;
+ }
+ spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
+
+ return ret;
+}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 29cc622910da..9864f9c00da2 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -646,6 +646,7 @@ extern struct efi {
unsigned long tpm_final_log; /* TPM2 Final Events Log table */
unsigned long mokvar_table; /* MOK variable config table */
unsigned long coco_secret; /* Confidential computing secret table */
+ unsigned long unaccepted; /* Unaccepted memory table */

efi_get_time_t *get_time;
efi_set_time_t *set_time;
--
2.39.3


2023-06-02 13:30:50

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCHv13 9/9] x86/tdx: Add unaccepted memory support

On 6/1/23 13:25, Kirill A. Shutemov wrote:
> Hookup TDX-specific code to accept memory.
>
> Accepting the memory is done with ACCEPT_PAGE module call on every page
> in the range. MAP_GPA hypercall is not required as the unaccepted memory
> is considered private already.
>
> Extract the part of tdx_enc_status_changed() that does memory acceptance
> in a new helper. Move the helper tdx-shared.c. It is going to be used by
> both main kernel and decompressor.
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> ---
> arch/x86/Kconfig | 2 +
> arch/x86/boot/compressed/Makefile | 2 +-
> arch/x86/boot/compressed/error.c | 19 +++++++
> arch/x86/boot/compressed/error.h | 1 +
> arch/x86/boot/compressed/mem.c | 35 +++++++++++-
> arch/x86/boot/compressed/tdx-shared.c | 2 +
> arch/x86/coco/tdx/Makefile | 2 +-
> arch/x86/coco/tdx/tdx-shared.c | 71 ++++++++++++++++++++++++
> arch/x86/coco/tdx/tdx.c | 70 +----------------------
> arch/x86/include/asm/shared/tdx.h | 2 +
> arch/x86/include/asm/unaccepted_memory.h | 24 ++++++++
> 11 files changed, 160 insertions(+), 70 deletions(-)
> create mode 100644 arch/x86/boot/compressed/tdx-shared.c
> create mode 100644 arch/x86/coco/tdx/tdx-shared.c
> create mode 100644 arch/x86/include/asm/unaccepted_memory.h
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 53bab123a8ee..5c72067c06d4 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -884,9 +884,11 @@ config INTEL_TDX_GUEST
> bool "Intel TDX (Trust Domain Extensions) - Guest Support"
> depends on X86_64 && CPU_SUP_INTEL
> depends on X86_X2APIC
> + depends on EFI_STUB
> select ARCH_HAS_CC_PLATFORM
> select X86_MEM_ENCRYPT
> select X86_MCE
> + select UNACCEPTED_MEMORY
> help
> Support running as a guest under Intel TDX. Without this support,
> the guest kernel can not boot or run under TDX.
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index cc4978123c30..b13a58021086 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -106,7 +106,7 @@ ifdef CONFIG_X86_64
> endif
>
> vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
> -vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o
> +vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o $(obj)/tdx-shared.o
> vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/mem.o
>
> vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
> diff --git a/arch/x86/boot/compressed/error.c b/arch/x86/boot/compressed/error.c
> index c881878e56d3..5313c5cb2b80 100644
> --- a/arch/x86/boot/compressed/error.c
> +++ b/arch/x86/boot/compressed/error.c
> @@ -22,3 +22,22 @@ void error(char *m)
> while (1)
> asm("hlt");
> }
> +
> +/* EFI libstub provides vsnprintf() */
> +#ifdef CONFIG_EFI_STUB
> +void panic(const char *fmt, ...)
> +{
> + static char buf[1024];
> + va_list args;
> + int len;
> +
> + va_start(args, fmt);
> + len = vsnprintf(buf, sizeof(buf), fmt, args);
> + va_end(args);
> +
> + if (len && buf[len - 1] == '\n')
> + buf[len - 1] = '\0';
> +
> + error(buf);
> +}
> +#endif
> diff --git a/arch/x86/boot/compressed/error.h b/arch/x86/boot/compressed/error.h
> index 1de5821184f1..86fe33b93715 100644
> --- a/arch/x86/boot/compressed/error.h
> +++ b/arch/x86/boot/compressed/error.h
> @@ -6,5 +6,6 @@
>
> void warn(char *m);
> void error(char *m) __noreturn;
> +void panic(const char *fmt, ...) __noreturn __cold;
>
> #endif /* BOOT_COMPRESSED_ERROR_H */
> diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
> index 4ecf26576a77..d2b6948a7801 100644
> --- a/arch/x86/boot/compressed/mem.c
> +++ b/arch/x86/boot/compressed/mem.c
> @@ -2,11 +2,44 @@
>
> #include "error.h"
> #include "misc.h"
> +#include "tdx.h"
> +#include <asm/shared/tdx.h>
> +
> +/*
> + * accept_memory() and process_unaccepted_memory() called from EFI stub which
> + * runs before decompresser and its early_tdx_detect().
> + *
> + * Enumerate TDX directly from the early users.
> + */
> +static bool early_is_tdx_guest(void)
> +{
> + static bool once;
> + static bool is_tdx;
> +
> + if (!IS_ENABLED(CONFIG_INTEL_TDX_GUEST))
> + return false;
> +
> + if (!once) {
> + u32 eax, sig[3];
> +
> + cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax,
> + &sig[0], &sig[2], &sig[1]);
> + is_tdx = !memcmp(TDX_IDENT, sig, sizeof(sig));
> + once = true;
> + }
> +
> + return is_tdx;
> +}
>
> void arch_accept_memory(phys_addr_t start, phys_addr_t end)
> {
> /* Platform-specific memory-acceptance call goes here */
> - error("Cannot accept memory");
> + if (early_is_tdx_guest()) {
> + if (tdx_accept_memory(start, end))
> + return;
> + }
> +
> + error("Cannot accept memory: unknown platform\n");

So this is a change in this version. If tdx_accept_memory() fails, you'll
report unknown platform. Wouldn't it be better to have an error message
that indicates a failure in the accept path?

Thanks,
Tom

> }
>
> void init_unaccepted_memory(void)
> diff --git a/arch/x86/boot/compressed/tdx-shared.c b/arch/x86/boot/compressed/tdx-shared.c
> new file mode 100644
> index 000000000000..5ac43762fe13
> --- /dev/null
> +++ b/arch/x86/boot/compressed/tdx-shared.c
> @@ -0,0 +1,2 @@
> +#include "error.h"
> +#include "../../coco/tdx/tdx-shared.c"
> diff --git a/arch/x86/coco/tdx/Makefile b/arch/x86/coco/tdx/Makefile
> index 46c55998557d..2c7dcbf1458b 100644
> --- a/arch/x86/coco/tdx/Makefile
> +++ b/arch/x86/coco/tdx/Makefile
> @@ -1,3 +1,3 @@
> # SPDX-License-Identifier: GPL-2.0
>
> -obj-y += tdx.o tdcall.o
> +obj-y += tdx.o tdx-shared.o tdcall.o
> diff --git a/arch/x86/coco/tdx/tdx-shared.c b/arch/x86/coco/tdx/tdx-shared.c
> new file mode 100644
> index 000000000000..ef20ddc37b58
> --- /dev/null
> +++ b/arch/x86/coco/tdx/tdx-shared.c
> @@ -0,0 +1,71 @@
> +#include <asm/tdx.h>
> +#include <asm/pgtable.h>
> +
> +static unsigned long try_accept_one(phys_addr_t start, unsigned long len,
> + enum pg_level pg_level)
> +{
> + unsigned long accept_size = page_level_size(pg_level);
> + u64 tdcall_rcx;
> + u8 page_size;
> +
> + if (!IS_ALIGNED(start, accept_size))
> + return 0;
> +
> + if (len < accept_size)
> + return 0;
> +
> + /*
> + * Pass the page physical address to the TDX module to accept the
> + * pending, private page.
> + *
> + * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M, 2 - 1G.
> + */
> + switch (pg_level) {
> + case PG_LEVEL_4K:
> + page_size = 0;
> + break;
> + case PG_LEVEL_2M:
> + page_size = 1;
> + break;
> + case PG_LEVEL_1G:
> + page_size = 2;
> + break;
> + default:
> + return 0;
> + }
> +
> + tdcall_rcx = start | page_size;
> + if (__tdx_module_call(TDX_ACCEPT_PAGE, tdcall_rcx, 0, 0, 0, NULL))
> + return 0;
> +
> + return accept_size;
> +}
> +
> +bool tdx_accept_memory(phys_addr_t start, phys_addr_t end)
> +{
> + /*
> + * For shared->private conversion, accept the page using
> + * TDX_ACCEPT_PAGE TDX module call.
> + */
> + while (start < end) {
> + unsigned long len = end - start;
> + unsigned long accept_size;
> +
> + /*
> + * Try larger accepts first. It gives chance to VMM to keep
> + * 1G/2M Secure EPT entries where possible and speeds up
> + * process by cutting number of hypercalls (if successful).
> + */
> +
> + accept_size = try_accept_one(start, len, PG_LEVEL_1G);
> + if (!accept_size)
> + accept_size = try_accept_one(start, len, PG_LEVEL_2M);
> + if (!accept_size)
> + accept_size = try_accept_one(start, len, PG_LEVEL_4K);
> + if (!accept_size)
> + return false;
> + start += accept_size;
> + }
> +
> + return true;
> +}
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 0d5fe6e24e45..a9c4ba6c5c5d 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -713,46 +713,6 @@ static bool tdx_cache_flush_required(void)
> return true;
> }
>
> -static unsigned long try_accept_one(phys_addr_t start, unsigned long len,
> - enum pg_level pg_level)
> -{
> - unsigned long accept_size = page_level_size(pg_level);
> - u64 tdcall_rcx;
> - u8 page_size;
> -
> - if (!IS_ALIGNED(start, accept_size))
> - return 0;
> -
> - if (len < accept_size)
> - return 0;
> -
> - /*
> - * Pass the page physical address to the TDX module to accept the
> - * pending, private page.
> - *
> - * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M, 2 - 1G.
> - */
> - switch (pg_level) {
> - case PG_LEVEL_4K:
> - page_size = 0;
> - break;
> - case PG_LEVEL_2M:
> - page_size = 1;
> - break;
> - case PG_LEVEL_1G:
> - page_size = 2;
> - break;
> - default:
> - return 0;
> - }
> -
> - tdcall_rcx = start | page_size;
> - if (__tdx_module_call(TDX_ACCEPT_PAGE, tdcall_rcx, 0, 0, 0, NULL))
> - return 0;
> -
> - return accept_size;
> -}
> -
> /*
> * Inform the VMM of the guest's intent for this physical page: shared with
> * the VMM or private to the guest. The VMM is expected to change its mapping
> @@ -777,33 +737,9 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
> if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
> return false;
>
> - /* private->shared conversion requires only MapGPA call */
> - if (!enc)
> - return true;
> -
> - /*
> - * For shared->private conversion, accept the page using
> - * TDX_ACCEPT_PAGE TDX module call.
> - */
> - while (start < end) {
> - unsigned long len = end - start;
> - unsigned long accept_size;
> -
> - /*
> - * Try larger accepts first. It gives chance to VMM to keep
> - * 1G/2M Secure EPT entries where possible and speeds up
> - * process by cutting number of hypercalls (if successful).
> - */
> -
> - accept_size = try_accept_one(start, len, PG_LEVEL_1G);
> - if (!accept_size)
> - accept_size = try_accept_one(start, len, PG_LEVEL_2M);
> - if (!accept_size)
> - accept_size = try_accept_one(start, len, PG_LEVEL_4K);
> - if (!accept_size)
> - return false;
> - start += accept_size;
> - }
> + /* shared->private conversion requires memory to be accepted before use */
> + if (enc)
> + return tdx_accept_memory(start, end);
>
> return true;
> }
> diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
> index 1ff0ee822961..19228beb4894 100644
> --- a/arch/x86/include/asm/shared/tdx.h
> +++ b/arch/x86/include/asm/shared/tdx.h
> @@ -91,5 +91,7 @@ struct tdx_module_output {
> u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> struct tdx_module_output *out);
>
> +bool tdx_accept_memory(phys_addr_t start, phys_addr_t end);
> +
> #endif /* !__ASSEMBLY__ */
> #endif /* _ASM_X86_SHARED_TDX_H */
> diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
> new file mode 100644
> index 000000000000..f0ab217b566f
> --- /dev/null
> +++ b/arch/x86/include/asm/unaccepted_memory.h
> @@ -0,0 +1,24 @@
> +#ifndef _ASM_X86_UNACCEPTED_MEMORY_H
> +#define _ASM_X86_UNACCEPTED_MEMORY_H
> +
> +#include <linux/efi.h>
> +#include <asm/tdx.h>
> +
> +static inline void arch_accept_memory(phys_addr_t start, phys_addr_t end)
> +{
> + /* Platform-specific memory-acceptance call goes here */
> + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
> + if (tdx_accept_memory(start, end))
> + return;
> + }
> +
> + panic("Cannot accept memory: unknown platform\n");
> +}
> +
> +static inline struct efi_unaccepted_memory *efi_get_unaccepted_table(void)
> +{
> + if (efi.unaccepted == EFI_INVALID_TABLE_ADDR)
> + return NULL;
> + return __va(efi.unaccepted);
> +}
> +#endif

2023-06-02 14:30:56

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCHv13 9/9] x86/tdx: Add unaccepted memory support

On 6/2/23 08:22, Tom Lendacky wrote:
> On 6/1/23 13:25, Kirill A. Shutemov wrote:
>> Hookup TDX-specific code to accept memory.
>>
>> Accepting the memory is done with ACCEPT_PAGE module call on every page
>> in the range. MAP_GPA hypercall is not required as the unaccepted memory
>> is considered private already.
>>
>> Extract the part of tdx_enc_status_changed() that does memory acceptance
>> in a new helper. Move the helper tdx-shared.c. It is going to be used by
>> both main kernel and decompressor.
>>
>> Signed-off-by: Kirill A. Shutemov <[email protected]>
>> ---
>>   arch/x86/Kconfig                         |  2 +
>>   arch/x86/boot/compressed/Makefile        |  2 +-
>>   arch/x86/boot/compressed/error.c         | 19 +++++++
>>   arch/x86/boot/compressed/error.h         |  1 +
>>   arch/x86/boot/compressed/mem.c           | 35 +++++++++++-
>>   arch/x86/boot/compressed/tdx-shared.c    |  2 +
>>   arch/x86/coco/tdx/Makefile               |  2 +-
>>   arch/x86/coco/tdx/tdx-shared.c           | 71 ++++++++++++++++++++++++
>>   arch/x86/coco/tdx/tdx.c                  | 70 +----------------------
>>   arch/x86/include/asm/shared/tdx.h        |  2 +
>>   arch/x86/include/asm/unaccepted_memory.h | 24 ++++++++
>>   11 files changed, 160 insertions(+), 70 deletions(-)
>>   create mode 100644 arch/x86/boot/compressed/tdx-shared.c
>>   create mode 100644 arch/x86/coco/tdx/tdx-shared.c
>>   create mode 100644 arch/x86/include/asm/unaccepted_memory.h
>>

>> diff --git a/arch/x86/boot/compressed/mem.c
>> b/arch/x86/boot/compressed/mem.c
>> index 4ecf26576a77..d2b6948a7801 100644
>> --- a/arch/x86/boot/compressed/mem.c
>> +++ b/arch/x86/boot/compressed/mem.c
>> @@ -2,11 +2,44 @@
>>   #include "error.h"
>>   #include "misc.h"
>> +#include "tdx.h"
>> +#include <asm/shared/tdx.h>
>> +
>> +/*
>> + * accept_memory() and process_unaccepted_memory() called from EFI stub
>> which
>> + * runs before decompresser and its early_tdx_detect().
>> + *
>> + * Enumerate TDX directly from the early users.
>> + */
>> +static bool early_is_tdx_guest(void)
>> +{
>> +    static bool once;
>> +    static bool is_tdx;
>> +
>> +    if (!IS_ENABLED(CONFIG_INTEL_TDX_GUEST))
>> +        return false;
>> +
>> +    if (!once) {
>> +        u32 eax, sig[3];
>> +
>> +        cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax,
>> +                &sig[0], &sig[2],  &sig[1]);
>> +        is_tdx = !memcmp(TDX_IDENT, sig, sizeof(sig));
>> +        once = true;
>> +    }
>> +
>> +    return is_tdx;
>> +}
>>   void arch_accept_memory(phys_addr_t start, phys_addr_t end)
>>   {
>>       /* Platform-specific memory-acceptance call goes here */
>> -    error("Cannot accept memory");
>> +    if (early_is_tdx_guest()) {
>> +        if (tdx_accept_memory(start, end))
>> +            return;
>> +    }
>> +
>> +    error("Cannot accept memory: unknown platform\n");
>
> So this is a change in this version. If tdx_accept_memory() fails, you'll
> report unknown platform. Wouldn't it be better to have an error message
> that indicates a failure in the accept path?
>

Maybe you can keep it similar to the v12 version with just a new error
message, something like:

if (early_is_tdx_guest()) {
if (!tdx_accept_memory(start, end))
error("TDX error accepting memory\n");
} else {
error("Cannot accept memory: unknown platform\n");
}

And similar in arch/x86/include/asm/unaccepted_memory.h.

Thanks,
Tom

> Thanks,
> Tom
>
>>   }
>>   void init_unaccepted_memory(void)

2023-06-02 14:49:51

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv13 9/9] x86/tdx: Add unaccepted memory support

On Fri, Jun 02, 2023 at 08:22:35AM -0500, Tom Lendacky wrote:
> On 6/1/23 13:25, Kirill A. Shutemov wrote:
> > Hookup TDX-specific code to accept memory.
> >
> > Accepting the memory is done with ACCEPT_PAGE module call on every page
> > in the range. MAP_GPA hypercall is not required as the unaccepted memory
> > is considered private already.
> >
> > Extract the part of tdx_enc_status_changed() that does memory acceptance
> > in a new helper. Move the helper tdx-shared.c. It is going to be used by
> > both main kernel and decompressor.
> >
> > Signed-off-by: Kirill A. Shutemov <[email protected]>
> > ---
> > arch/x86/Kconfig | 2 +
> > arch/x86/boot/compressed/Makefile | 2 +-
> > arch/x86/boot/compressed/error.c | 19 +++++++
> > arch/x86/boot/compressed/error.h | 1 +
> > arch/x86/boot/compressed/mem.c | 35 +++++++++++-
> > arch/x86/boot/compressed/tdx-shared.c | 2 +
> > arch/x86/coco/tdx/Makefile | 2 +-
> > arch/x86/coco/tdx/tdx-shared.c | 71 ++++++++++++++++++++++++
> > arch/x86/coco/tdx/tdx.c | 70 +----------------------
> > arch/x86/include/asm/shared/tdx.h | 2 +
> > arch/x86/include/asm/unaccepted_memory.h | 24 ++++++++
> > 11 files changed, 160 insertions(+), 70 deletions(-)
> > create mode 100644 arch/x86/boot/compressed/tdx-shared.c
> > create mode 100644 arch/x86/coco/tdx/tdx-shared.c
> > create mode 100644 arch/x86/include/asm/unaccepted_memory.h
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 53bab123a8ee..5c72067c06d4 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -884,9 +884,11 @@ config INTEL_TDX_GUEST
> > bool "Intel TDX (Trust Domain Extensions) - Guest Support"
> > depends on X86_64 && CPU_SUP_INTEL
> > depends on X86_X2APIC
> > + depends on EFI_STUB
> > select ARCH_HAS_CC_PLATFORM
> > select X86_MEM_ENCRYPT
> > select X86_MCE
> > + select UNACCEPTED_MEMORY
> > help
> > Support running as a guest under Intel TDX. Without this support,
> > the guest kernel can not boot or run under TDX.
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index cc4978123c30..b13a58021086 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -106,7 +106,7 @@ ifdef CONFIG_X86_64
> > endif
> > vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
> > -vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o
> > +vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o $(obj)/tdx-shared.o
> > vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/mem.o
> > vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
> > diff --git a/arch/x86/boot/compressed/error.c b/arch/x86/boot/compressed/error.c
> > index c881878e56d3..5313c5cb2b80 100644
> > --- a/arch/x86/boot/compressed/error.c
> > +++ b/arch/x86/boot/compressed/error.c
> > @@ -22,3 +22,22 @@ void error(char *m)
> > while (1)
> > asm("hlt");
> > }
> > +
> > +/* EFI libstub provides vsnprintf() */
> > +#ifdef CONFIG_EFI_STUB
> > +void panic(const char *fmt, ...)
> > +{
> > + static char buf[1024];
> > + va_list args;
> > + int len;
> > +
> > + va_start(args, fmt);
> > + len = vsnprintf(buf, sizeof(buf), fmt, args);
> > + va_end(args);
> > +
> > + if (len && buf[len - 1] == '\n')
> > + buf[len - 1] = '\0';
> > +
> > + error(buf);
> > +}
> > +#endif
> > diff --git a/arch/x86/boot/compressed/error.h b/arch/x86/boot/compressed/error.h
> > index 1de5821184f1..86fe33b93715 100644
> > --- a/arch/x86/boot/compressed/error.h
> > +++ b/arch/x86/boot/compressed/error.h
> > @@ -6,5 +6,6 @@
> > void warn(char *m);
> > void error(char *m) __noreturn;
> > +void panic(const char *fmt, ...) __noreturn __cold;
> > #endif /* BOOT_COMPRESSED_ERROR_H */
> > diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
> > index 4ecf26576a77..d2b6948a7801 100644
> > --- a/arch/x86/boot/compressed/mem.c
> > +++ b/arch/x86/boot/compressed/mem.c
> > @@ -2,11 +2,44 @@
> > #include "error.h"
> > #include "misc.h"
> > +#include "tdx.h"
> > +#include <asm/shared/tdx.h>
> > +
> > +/*
> > + * accept_memory() and process_unaccepted_memory() called from EFI stub which
> > + * runs before decompresser and its early_tdx_detect().
> > + *
> > + * Enumerate TDX directly from the early users.
> > + */
> > +static bool early_is_tdx_guest(void)
> > +{
> > + static bool once;
> > + static bool is_tdx;
> > +
> > + if (!IS_ENABLED(CONFIG_INTEL_TDX_GUEST))
> > + return false;
> > +
> > + if (!once) {
> > + u32 eax, sig[3];
> > +
> > + cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax,
> > + &sig[0], &sig[2], &sig[1]);
> > + is_tdx = !memcmp(TDX_IDENT, sig, sizeof(sig));
> > + once = true;
> > + }
> > +
> > + return is_tdx;
> > +}
> > void arch_accept_memory(phys_addr_t start, phys_addr_t end)
> > {
> > /* Platform-specific memory-acceptance call goes here */
> > - error("Cannot accept memory");
> > + if (early_is_tdx_guest()) {
> > + if (tdx_accept_memory(start, end))
> > + return;
> > + }
> > +
> > + error("Cannot accept memory: unknown platform\n");
>
> So this is a change in this version. If tdx_accept_memory() fails, you'll
> report unknown platform. Wouldn't it be better to have an error message that
> indicates a failure in the accept path?

Urgh.. Didn't read the error message on the rework.

diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
index d2b6948a7801..a0d24df1004d 100644
--- a/arch/x86/boot/compressed/mem.c
+++ b/arch/x86/boot/compressed/mem.c
@@ -35,11 +35,11 @@ void arch_accept_memory(phys_addr_t start, phys_addr_t end)
{
/* Platform-specific memory-acceptance call goes here */
if (early_is_tdx_guest()) {
- if (tdx_accept_memory(start, end))
- return;
+ if (!tdx_accept_memory(start, end))
+ panic("TDX: Failed to accept memory\n");
+ } else {
+ error("Cannot accept memory: unknown platform\n");
}
-
- error("Cannot accept memory: unknown platform\n");
}

void init_unaccepted_memory(void)
diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
index f0ab217b566f..572514e36fde 100644
--- a/arch/x86/include/asm/unaccepted_memory.h
+++ b/arch/x86/include/asm/unaccepted_memory.h
@@ -8,11 +8,11 @@ static inline void arch_accept_memory(phys_addr_t start, phys_addr_t end)
{
/* Platform-specific memory-acceptance call goes here */
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
- if (tdx_accept_memory(start, end))
- return;
+ if (!tdx_accept_memory(start, end))
+ panic("TDX: Failed to accept memory\n");
+ } else {
+ panic("Cannot accept memory: unknown platform\n");
}
-
- panic("Cannot accept memory: unknown platform\n");
}

static inline struct efi_unaccepted_memory *efi_get_unaccepted_table(void)
--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-05 16:24:40

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv13 5/9] efi: Add unaccepted memory support

On Thu, Jun 01, 2023 at 09:25:39PM +0300, Kirill A. Shutemov wrote:
> +void accept_memory(phys_addr_t start, phys_addr_t end)
> +{
> + struct efi_unaccepted_memory *unaccepted;
> + unsigned long range_start, range_end;
> + unsigned long flags;
> + u64 unit_size;
> +
> + if (efi.unaccepted == EFI_INVALID_TABLE_ADDR)
> + return;

efi_get_unaccepted_table() already does this test.

> + unaccepted = efi_get_unaccepted_table();
> + if (!unaccepted)
> + return;

So this looks weird: callers can call accept_memory() and that function
can fail. But they can't know whether it failed or not because it
returns void.

> + unit_size = unaccepted->unit_size;
> +
> + /*
> + * Only care for the part of the range that is represented
> + * in the bitmap.
> + */
> + if (start < unaccepted->phys_base)
> + start = unaccepted->phys_base;

So this silently trims start...

> + if (end < unaccepted->phys_base)
> + return;

But fails only when end is outside of range.

I'd warn here at least. And return an error so that the callers know.

> + /* Translate to offsets from the beginning of the bitmap */
> + start -= unaccepted->phys_base;
> + end -= unaccepted->phys_base;
> +
> + /* Make sure not to overrun the bitmap */
> + if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
> + end = unaccepted->size * unit_size * BITS_PER_BYTE;

How is all that trimming not important to the caller?

It would assume that its memory got accepted but not really.

> + range_start = start / unit_size;
> +
> + spin_lock_irqsave(&unaccepted_memory_lock, flags);
> + for_each_set_bitrange_from(range_start, range_end, unaccepted->bitmap,
> + DIV_ROUND_UP(end, unit_size)) {
> + unsigned long phys_start, phys_end;
> + unsigned long len = range_end - range_start;
> +
> + phys_start = range_start * unit_size + unaccepted->phys_base;
> + phys_end = range_end * unit_size + unaccepted->phys_base;
> +
> + arch_accept_memory(phys_start, phys_end);
> + bitmap_clear(unaccepted->bitmap, range_start, len);
> + }
> + spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
> +}
> +
> +bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end)
> +{
> + struct efi_unaccepted_memory *unaccepted;
> + unsigned long flags;
> + bool ret = false;
> + u64 unit_size;
> +
> + unaccepted = efi_get_unaccepted_table();
> + if (!unaccepted)
> + return false;
> +
> + unit_size = unaccepted->unit_size;
> +
> + /*
> + * Only care for the part of the range that is represented
> + * in the bitmap.
> + */
> + if (start < unaccepted->phys_base)
> + start = unaccepted->phys_base;

Same comment as above. Trimming start is fine?

> + if (end < unaccepted->phys_base)
> + return false;
> +
> + /* Translate to offsets from the beginning of the bitmap */
> + start -= unaccepted->phys_base;
> + end -= unaccepted->phys_base;

Ditto as above.

> +
> + /* Make sure not to overrun the bitmap */
> + if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
> + end = unaccepted->size * unit_size * BITS_PER_BYTE;

Ditto.

> + spin_lock_irqsave(&unaccepted_memory_lock, flags);
> + while (start < end) {
> + if (test_bit(start / unit_size, unaccepted->bitmap)) {
> + ret = true;
> + break;

I have a faint memory we've had this before but you need to check
*every* bit in the unaccepted bitmap before returning true. Doh.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-06-05 17:56:37

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv13 5/9] efi: Add unaccepted memory support

On Mon, Jun 05, 2023 at 05:43:33PM +0200, Borislav Petkov wrote:
> On Thu, Jun 01, 2023 at 09:25:39PM +0300, Kirill A. Shutemov wrote:
> > +void accept_memory(phys_addr_t start, phys_addr_t end)
> > +{
> > + struct efi_unaccepted_memory *unaccepted;
> > + unsigned long range_start, range_end;
> > + unsigned long flags;
> > + u64 unit_size;
> > +
> > + if (efi.unaccepted == EFI_INVALID_TABLE_ADDR)
> > + return;
>
> efi_get_unaccepted_table() already does this test.

Okay.

> > + unaccepted = efi_get_unaccepted_table();
> > + if (!unaccepted)
> > + return;
>
> So this looks weird: callers can call accept_memory() and that function
> can fail. But they can't know whether it failed or not because it
> returns void.

It is not a failure here. If there's no unaccepted memory in the system
accept_memory() always succeeds.

> > + unit_size = unaccepted->unit_size;
> > +
> > + /*
> > + * Only care for the part of the range that is represented
> > + * in the bitmap.
> > + */
> > + if (start < unaccepted->phys_base)
> > + start = unaccepted->phys_base;
>
> So this silently trims start...
>
> > + if (end < unaccepted->phys_base)
> > + return;
>
> But fails only when end is outside of range.
>
> I'd warn here at least. And return an error so that the callers know.

There's nothing to warn about. The range (or part of it) is not
represented in the bitmap because it is not unaccepted. We only allocate
bitmap for the range that has unaccepted memory. It can reduce memory
overhead on the bitmap if the unaccepted memory starts very high or ends
early, but there's something else very high in physical addresss space.

> > + /* Translate to offsets from the beginning of the bitmap */
> > + start -= unaccepted->phys_base;
> > + end -= unaccepted->phys_base;
> > +
> > + /* Make sure not to overrun the bitmap */
> > + if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
> > + end = unaccepted->size * unit_size * BITS_PER_BYTE;
>
> How is all that trimming not important to the caller?
>
> It would assume that its memory got accepted but not really.

See above: not represented in the bitmap means pre-accepted.

...

> > + spin_lock_irqsave(&unaccepted_memory_lock, flags);
> > + while (start < end) {
> > + if (test_bit(start / unit_size, unaccepted->bitmap)) {
> > + ret = true;
> > + break;
>
> I have a faint memory we've had this before but you need to check
> *every* bit in the unaccepted bitmap before returning true. Doh.

Yes, it was discussed before. Here's context:

https://lore.kernel.org/all/Ynt8vDY78/[email protected]

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-05 19:17:13

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv13 5/9] efi: Add unaccepted memory support

On Mon, Jun 05, 2023 at 08:33:03PM +0300, Kirill A. Shutemov wrote:
> There's nothing to warn about. The range (or part of it) is not
> represented in the bitmap because it is not unaccepted.

Sorry but how am I supposed to know that?!

I've read the whole patchset up until now and all text talks like *all*
*memory* needs to be accepted and before that has happeend, it is
unaccepted.

So how about you explain that explicitly somewhere, perhaps in a comment
above accept_memory(), that the unaccepted range is not the whole memory
but only, well, what is unaccepted and the rest is implicitly accepted?

And I went and looked at the final result - we error() if we fail
accepting.

I guess that's the only action we can do anyway...

> Yes, it was discussed before. Here's context:
>
> https://lore.kernel.org/all/Ynt8vDY78/[email protected]

You should try those before you paste them - it says "Not found" because
of the '/' in the Message-ID and it needs to be escaped.

This works:

https://lore.kernel.org/all/Ynt8vDY78%[email protected]/

Now I remember.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-06-05 19:41:35

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCHv13 9/9] x86/tdx: Add unaccepted memory support

On 6/2/23 07:26, Tom Lendacky wrote:
>> So this is a change in this version. If tdx_accept_memory() fails,
>> you'll report unknown platform. Wouldn't it be better to have an error
>> message that indicates a failure in the accept path?
>>
>
> Maybe you can keep it similar to the v12 version with just a new error
> message, something like:
>
>     if (early_is_tdx_guest()) {
>         if (!tdx_accept_memory(start, end))
>             error("TDX error accepting memory\n");
>     } else {
>         error("Cannot accept memory: unknown platform\n");
>     }

In the end, these errors aren't plumbed out to the page allocator. They
*need* to succeed or we are dead anyway. Should we just send a fatal
error up to the TDX module when we fail to accept memory? It's
_slightly_ less opaque than plowing into an unaccepted page.

2023-06-05 22:08:33

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv13 9/9] x86/tdx: Add unaccepted memory support

On Mon, Jun 05, 2023 at 12:18:21PM -0700, Dave Hansen wrote:
> On 6/2/23 07:26, Tom Lendacky wrote:
> >> So this is a change in this version. If tdx_accept_memory() fails,
> >> you'll report unknown platform. Wouldn't it be better to have an error
> >> message that indicates a failure in the accept path?
> >>
> >
> > Maybe you can keep it similar to the v12 version with just a new error
> > message, something like:
> >
> > ????if (early_is_tdx_guest()) {
> > ??????? if (!tdx_accept_memory(start, end))
> > ??????????? error("TDX error accepting memory\n");
> > ????} else {
> > ??????? error("Cannot accept memory: unknown platform\n");
> > ????}
>
> In the end, these errors aren't plumbed out to the page allocator. They
> *need* to succeed or we are dead anyway. Should we just send a fatal
> error up to the TDX module when we fail to accept memory? It's
> _slightly_ less opaque than plowing into an unaccepted page.

This is decompressor's error()s which are fatal.

arch_accept_memory() in the main kernel uses panic() in the same spot.

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-05 22:19:45

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv13 5/9] efi: Add unaccepted memory support

On Mon, Jun 05, 2023 at 09:12:25PM +0200, Borislav Petkov wrote:
> On Mon, Jun 05, 2023 at 08:33:03PM +0300, Kirill A. Shutemov wrote:
> > There's nothing to warn about. The range (or part of it) is not
> > represented in the bitmap because it is not unaccepted.
>
> Sorry but how am I supposed to know that?!
>
> I've read the whole patchset up until now and all text talks like *all*
> *memory* needs to be accepted and before that has happeend, it is
> unaccepted.
>
> So how about you explain that explicitly somewhere, perhaps in a comment
> above accept_memory(), that the unaccepted range is not the whole memory
> but only, well, what is unaccepted and the rest is implicitly accepted?

Okay, will do.

> And I went and looked at the final result - we error() if we fail
> accepting.
>
> I guess that's the only action we can do anyway...

Right, there's no recovery from the error.

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-06 12:41:50

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv13 5/9] efi: Add unaccepted memory support

On Mon, Jun 05, 2023 at 09:12:25PM +0200, Borislav Petkov wrote:
> So how about you explain that explicitly somewhere, perhaps in a comment
> above accept_memory(), that the unaccepted range is not the whole memory
> but only, well, what is unaccepted and the rest is implicitly accepted?

Does it look okay to you?

/*
* accept_memory() -- Consult bitmap and accept the memory if needed.
*
* Only memory that explicitly marked as unaccepted in the bitmap requires
* an action.
*
* No need to accept:
* - anything if the system has no unaccepted table;
* - memory that is below phys_base;
* - memory that is above the memory that addressable by the bitmap;
*/

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-06 12:42:59

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv13 5/9] efi: Add unaccepted memory support

On Tue, Jun 06, 2023 at 03:19:24PM +0300, Kirill A. Shutemov wrote:
> Does it look okay to you?
>
> /*
> * accept_memory() -- Consult bitmap and accept the memory if needed.
> *
> * Only memory that explicitly marked as unaccepted in the bitmap requires

... that is ...

> * an action.

And let's add an additional sentence stating it all clearly:

"All the remaining memory is implicitly accepted and doesn't need acceptance."

> *
> * No need to accept:
> * - anything if the system has no unaccepted table;
> * - memory that is below phys_base;
> * - memory that is above the memory that addressable by the bitmap;

And this is an additional clarification.

Good, thanks.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette