2022-04-25 06:03:45

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv5 00/12] mm, x86/cc: Implement support for unaccepted memory

UEFI Specification version 2.9 introduces the concept of memory
acceptance: some Virtual Machine platforms, such as Intel TDX or AMD
SEV-SNP, requiring memory to be accepted before it can be used by the
guest. Accepting happens via a protocol specific for the Virtual
Machine platform.

Accepting memory is costly and it makes VMM allocate memory for the
accepted guest physical address range. It's better to postpone memory
acceptance until memory is needed. It lowers boot time and reduces
memory overhead.

The kernel needs to know what memory has been accepted. Firmware
communicates this information via memory map: a new memory type --
EFI_UNACCEPTED_MEMORY -- indicates such memory.

Range-based tracking works fine for firmware, but it gets bulky for
the kernel: e820 has to be modified on every page acceptance. It leads
to table fragmentation, but there's a limited number of entries in the
e820 table

Another option is to mark such memory as usable in e820 and track if the
range has been accepted in a bitmap. One bit in the bitmap represents
2MiB in the address space: one 4k page is enough to track 64GiB or
physical address space.

In the worst-case scenario -- a huge hole in the middle of the
address space -- It needs 256MiB to handle 4PiB of the address
space.

Any unaccepted memory that is not aligned to 2M gets accepted upfront.

The approach lowers boot time substantially. Boot to shell is ~2.5x
faster for 4G TDX VM and ~4x faster for 64G.

TDX-specific code isolated from the core of unaccepted memory support. It
supposed to help to plug-in different implementation of unaccepted memory
such as SEV-SNP.

The tree can be found here:

https://github.com/intel/tdx.git guest-unaccepted-memory

v5:
- Updates comments and commit messages;
+ Explain options for unaccepted memory handling;
- Expose amount of unaccepted memory in /proc/meminfo
- Adjust check in page_expected_state();
- Fix error code handling in allocate_e820();
- Centralize __pa()/__va() definitions in the boot stub;
- Avoid includes from the main kernel in the boot stub;
- Use an existing hole in boot_param for unaccepted_memory, instead of adding
to the end of the structure;
- Extract allocate_unaccepted_memory() form allocate_e820();
- Complain if there's unaccepted memory, but kernel does not support it;
- Fix vmstat counter;
- Split up few preparatory patches;
- Random readability adjustments;
v4:
- PageBuddyUnaccepted() -> PageUnaccepted;
- Use separate page_type, not shared with offline;
- Rework interface between core-mm and arch code;
- Adjust commit messages;
- Ack from Mike;

Kirill A. Shutemov (12):
x86/boot/: Centralize __pa()/__va() definitions
mm: Add support for unaccepted memory
efi/x86: Get full memory map in allocate_e820()
x86/boot: Add infrastructure required for unaccepted memory support
efi/x86: Implement support for unaccepted memory
x86/boot/compressed: Handle unaccepted memory
x86/mm: Reserve unaccepted memory bitmap
x86/mm: Provide helpers for unaccepted memory
x86/tdx: Make _tdx_hypercall() and __tdx_module_call() available in
boot stub
x86/tdx: Unaccepted memory support
mm/vmstat: Add counter for memory accepting
x86/mm: Report unaccepted memory in /proc/meminfo

Documentation/x86/zero-page.rst | 1 +
arch/x86/Kconfig | 1 +
arch/x86/boot/bitops.h | 40 ++++++++
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/align.h | 14 +++
arch/x86/boot/compressed/bitmap.c | 43 +++++++++
arch/x86/boot/compressed/bitmap.h | 49 ++++++++++
arch/x86/boot/compressed/bits.h | 36 ++++++++
arch/x86/boot/compressed/compiler.h | 9 ++
arch/x86/boot/compressed/find.c | 54 +++++++++++
arch/x86/boot/compressed/find.h | 80 ++++++++++++++++
arch/x86/boot/compressed/ident_map_64.c | 8 --
arch/x86/boot/compressed/kaslr.c | 14 ++-
arch/x86/boot/compressed/math.h | 37 ++++++++
arch/x86/boot/compressed/mem.c | 111 +++++++++++++++++++++++
arch/x86/boot/compressed/minmax.h | 61 +++++++++++++
arch/x86/boot/compressed/misc.c | 9 ++
arch/x86/boot/compressed/misc.h | 9 ++
arch/x86/boot/compressed/sev.c | 2 -
arch/x86/boot/compressed/tdx.c | 85 +++++++++++++++++
arch/x86/coco/tdx/tdx.c | 57 +++++-------
arch/x86/include/asm/page.h | 3 +
arch/x86/include/asm/set_memory.h | 2 +
arch/x86/include/asm/shared/tdx.h | 47 ++++++++++
arch/x86/include/asm/tdx.h | 19 ----
arch/x86/include/asm/unaccepted_memory.h | 25 +++++
arch/x86/include/uapi/asm/bootparam.h | 2 +-
arch/x86/kernel/e820.c | 10 ++
arch/x86/mm/Makefile | 2 +
arch/x86/mm/init.c | 8 ++
arch/x86/mm/pat/set_memory.c | 2 +-
arch/x86/mm/unaccepted_memory.c | 98 ++++++++++++++++++++
drivers/firmware/efi/Kconfig | 15 +++
drivers/firmware/efi/efi.c | 1 +
drivers/firmware/efi/libstub/x86-stub.c | 97 +++++++++++++++++---
include/linux/efi.h | 3 +-
include/linux/page-flags.h | 31 +++++++
include/linux/vm_event_item.h | 3 +
mm/internal.h | 11 +++
mm/memblock.c | 9 ++
mm/page_alloc.c | 77 +++++++++++++++-
mm/vmstat.c | 3 +
42 files changed, 1103 insertions(+), 86 deletions(-)
create mode 100644 arch/x86/boot/compressed/align.h
create mode 100644 arch/x86/boot/compressed/bitmap.c
create mode 100644 arch/x86/boot/compressed/bitmap.h
create mode 100644 arch/x86/boot/compressed/bits.h
create mode 100644 arch/x86/boot/compressed/compiler.h
create mode 100644 arch/x86/boot/compressed/find.c
create mode 100644 arch/x86/boot/compressed/find.h
create mode 100644 arch/x86/boot/compressed/math.h
create mode 100644 arch/x86/boot/compressed/mem.c
create mode 100644 arch/x86/boot/compressed/minmax.h
create mode 100644 arch/x86/include/asm/unaccepted_memory.h
create mode 100644 arch/x86/mm/unaccepted_memory.c

--
2.35.1


2022-04-25 06:11:53

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

The firmware will pre-accept the memory used to run the stub. But, the
stub is responsible for accepting the memory into which it decompresses
the main kernel. Accept memory just before decompression starts.

The stub is also responsible for choosing a physical address in which to
place the decompressed kernel image. The KASLR mechanism will randomize
this physical address. Since the unaccepted memory region is relatively
small, KASLR would be quite ineffective if it only used the pre-accepted
area (EFI_CONVENTIONAL_MEMORY). Ensure that KASLR randomizes among the
entire physical address space by also including EFI_UNACCEPTED_MEMOR

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/kaslr.c | 14 ++++++++++++--
arch/x86/boot/compressed/mem.c | 21 +++++++++++++++++++++
arch/x86/boot/compressed/misc.c | 9 +++++++++
arch/x86/include/asm/unaccepted_memory.h | 2 ++
5 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 7f672f7e2fea..b59007e57cbf 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -102,7 +102,7 @@ endif

vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o
-vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/bitmap.o $(obj)/mem.o
+vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/bitmap.o $(obj)/find.o $(obj)/mem.o

vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 411b268bc0a2..59db90626042 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -725,10 +725,20 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
* but in practice there's firmware where using that memory leads
* to crashes.
*
- * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free.
+ * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if
+ * supported) are guaranteed to be free.
*/
- if (md->type != EFI_CONVENTIONAL_MEMORY)
+
+ switch (md->type) {
+ case EFI_CONVENTIONAL_MEMORY:
+ break;
+ case EFI_UNACCEPTED_MEMORY:
+ if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
+ break;
continue;
+ default:
+ continue;
+ }

if (efi_soft_reserve_enabled() &&
(md->attribute & EFI_MEMORY_SP))
diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
index 415df0d3bc81..b5058c975d26 100644
--- a/arch/x86/boot/compressed/mem.c
+++ b/arch/x86/boot/compressed/mem.c
@@ -3,12 +3,15 @@
#include "../cpuflags.h"
#include "bitmap.h"
#include "error.h"
+#include "find.h"
#include "math.h"

#define PMD_SHIFT 21
#define PMD_SIZE (_AC(1, UL) << PMD_SHIFT)
#define PMD_MASK (~(PMD_SIZE - 1))

+extern struct boot_params *boot_params;
+
static inline void __accept_memory(phys_addr_t start, phys_addr_t end)
{
/* Platform-specific memory-acceptance call goes here */
@@ -66,3 +69,21 @@ void process_unaccepted_memory(struct boot_params *params, u64 start, u64 end)
bitmap_set((unsigned long *)params->unaccepted_memory,
start / PMD_SIZE, (end - start) / PMD_SIZE);
}
+
+void accept_memory(phys_addr_t start, phys_addr_t end)
+{
+ unsigned long range_start, range_end;
+ unsigned long *unaccepted_memory;
+ unsigned long bitmap_size;
+
+ unaccepted_memory = (unsigned long *)boot_params->unaccepted_memory;
+ range_start = start / PMD_SIZE;
+ bitmap_size = DIV_ROUND_UP(end, PMD_SIZE);
+
+ for_each_set_bitrange_from(range_start, range_end,
+ unaccepted_memory, bitmap_size) {
+ __accept_memory(range_start * PMD_SIZE, range_end * PMD_SIZE);
+ bitmap_clear(unaccepted_memory,
+ range_start, range_end - range_start);
+ }
+}
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index fa8969fad011..285b37e28074 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -18,6 +18,7 @@
#include "../string.h"
#include "../voffset.h"
#include <asm/bootparam_utils.h>
+#include <asm/unaccepted_memory.h>

/*
* WARNING!!
@@ -451,6 +452,14 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
#endif

debug_putstr("\nDecompressing Linux... ");
+
+#ifdef CONFIG_UNACCEPTED_MEMORY
+ if (boot_params->unaccepted_memory) {
+ debug_putstr("Accepting memory... ");
+ accept_memory(__pa(output), __pa(output) + needed_size);
+ }
+#endif
+
__decompress(input_data, input_len, NULL, NULL, output, output_len,
NULL, error);
parse_elf(output);
diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
index df0736d32858..41fbfc798100 100644
--- a/arch/x86/include/asm/unaccepted_memory.h
+++ b/arch/x86/include/asm/unaccepted_memory.h
@@ -7,4 +7,6 @@ struct boot_params;

void process_unaccepted_memory(struct boot_params *params, u64 start, u64 num);

+void accept_memory(phys_addr_t start, phys_addr_t end);
+
#endif
--
2.35.1

2022-04-25 06:21:41

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv5 07/12] x86/mm: Reserve unaccepted memory bitmap

A given page of memory can only be accepted once. The kernel has a need
to accept memory both in the early decompression stage and during normal
runtime.

A bitmap used to communicate the acceptance state of each page between the
decompression stage and normal runtime.

boot_params is used to communicate location of the bitmap through out
the boot. The bitmap is allocated and initially populated in EFI stub.
Decompression stage accepts pages required for kernel/initrd and mark
these pages accordingly in the bitmap. The main kernel picks up the
bitmap from the same boot_params and uses it to determinate what has to
be accepted on allocation.

In the runtime kernel, reserve the bitmap's memory to ensure nothing
overwrites it.

The size of bitmap is determinated with e820__end_of_ram_pfn() which
relies on setup_e820() marking unaccepted memory as E820_TYPE_RAM.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
---
arch/x86/kernel/e820.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index f267205f2d5a..22d1fe48dcba 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1316,6 +1316,16 @@ void __init e820__memblock_setup(void)
int i;
u64 end;

+ /* Mark unaccepted memory bitmap reserved */
+ if (boot_params.unaccepted_memory) {
+ unsigned long size;
+
+ /* One bit per 2MB */
+ size = DIV_ROUND_UP(e820__end_of_ram_pfn() * PAGE_SIZE,
+ PMD_SIZE * BITS_PER_BYTE);
+ memblock_reserve(boot_params.unaccepted_memory, size);
+ }
+
/*
* The bootstrap memblock region count maximum is 128 entries
* (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
--
2.35.1

2022-04-25 06:52:12

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv5 08/12] x86/mm: Provide helpers for unaccepted memory

Core-mm requires few helpers to support unaccepted memory:

- accept_memory() checks the range of addresses against the bitmap and
accept memory if needed.

- memory_is_unaccepted() check if anything within the range requires
acceptance.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/page.h | 3 ++
arch/x86/include/asm/unaccepted_memory.h | 4 ++
arch/x86/mm/Makefile | 2 +
arch/x86/mm/unaccepted_memory.c | 56 ++++++++++++++++++++++++
4 files changed, 65 insertions(+)
create mode 100644 arch/x86/mm/unaccepted_memory.c

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 9cc82f305f4b..df4ec3a988dc 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -19,6 +19,9 @@
struct page;

#include <linux/range.h>
+
+#include <asm/unaccepted_memory.h>
+
extern struct range pfn_mapped[];
extern int nr_pfn_mapped;

diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
index 41fbfc798100..a59264ee0ab3 100644
--- a/arch/x86/include/asm/unaccepted_memory.h
+++ b/arch/x86/include/asm/unaccepted_memory.h
@@ -7,6 +7,10 @@ struct boot_params;

void process_unaccepted_memory(struct boot_params *params, u64 start, u64 num);

+#ifdef CONFIG_UNACCEPTED_MEMORY
+
void accept_memory(phys_addr_t start, phys_addr_t end);
+bool memory_is_unaccepted(phys_addr_t start, phys_addr_t end);

#endif
+#endif
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index fe3d3061fc11..e327f83e6bbf 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -60,3 +60,5 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_amd.o

obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o
+
+obj-$(CONFIG_UNACCEPTED_MEMORY) += unaccepted_memory.o
diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c
new file mode 100644
index 000000000000..1327f64d5205
--- /dev/null
+++ b/arch/x86/mm/unaccepted_memory.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/memblock.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/spinlock.h>
+
+#include <asm/io.h>
+#include <asm/setup.h>
+#include <asm/unaccepted_memory.h>
+
+/* Protects unaccepted memory bitmap */
+static DEFINE_SPINLOCK(unaccepted_memory_lock);
+
+void accept_memory(phys_addr_t start, phys_addr_t end)
+{
+ unsigned long *unaccepted_memory;
+ unsigned long flags;
+ unsigned long range_start, range_end;
+
+ if (!boot_params.unaccepted_memory)
+ return;
+
+ unaccepted_memory = __va(boot_params.unaccepted_memory);
+ range_start = start / PMD_SIZE;
+
+ spin_lock_irqsave(&unaccepted_memory_lock, flags);
+ for_each_set_bitrange_from(range_start, range_end, unaccepted_memory,
+ DIV_ROUND_UP(end, PMD_SIZE)) {
+ unsigned long len = range_end - range_start;
+
+ /* Platform-specific memory-acceptance call goes here */
+ panic("Cannot accept memory");
+ bitmap_clear(unaccepted_memory, range_start, len);
+ }
+ spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
+}
+
+bool memory_is_unaccepted(phys_addr_t start, phys_addr_t end)
+{
+ unsigned long *unaccepted_memory = __va(boot_params.unaccepted_memory);
+ unsigned long flags;
+ bool ret = false;
+
+ spin_lock_irqsave(&unaccepted_memory_lock, flags);
+ while (start < end) {
+ if (test_bit(start / PMD_SIZE, unaccepted_memory)) {
+ ret = true;
+ break;
+ }
+
+ start += PMD_SIZE;
+ }
+ spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
+
+ return ret;
+}
--
2.35.1

2022-04-25 08:07:14

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv5 05/12] efi/x86: Implement support for unaccepted memory

UEFI Specification version 2.9 introduces the concept of memory
acceptance: Some Virtual Machine platforms, such as Intel TDX or AMD
SEV-SNP, requiring memory to be accepted before it can be used by the
guest. Accepting happens via a protocol specific for the Virtual
Machine platform.

Accepting memory is costly and it makes VMM allocate memory for the
accepted guest physical address range. It's better to postpone memory
acceptance until memory is needed. It lowers boot time and reduces
memory overhead.

The kernel needs to know what memory has been accepted. Firmware
communicates this information via memory map: a new memory type --
EFI_UNACCEPTED_MEMORY -- indicates such memory.

Range-based tracking works fine for firmware, but it gets bulky for
the kernel: e820 has to be modified on every page acceptance. It leads
to table fragmentation, but there's a limited number of entries in the
e820 table

Another option is to mark such memory as usable in e820 and track if the
range has been accepted in a bitmap. One bit in the bitmap represents
2MiB in the address space: one 4k page is enough to track 64GiB or
physical address space.

In the worst-case scenario -- a huge hole in the middle of the
address space -- It needs 256MiB to handle 4PiB of the address
space.

Any unaccepted memory that is not aligned to 2M gets accepted upfront.

The bitmap is allocated and constructed in the EFI stub and passed down
to the kernel via boot_params. allocate_e820() allocates the bitmap if
unaccepted memory is present, according to the maximum address in the
memory map.

The same boot_params.unaccepted_memory can be used to pass the bitmap
between two kernels on kexec, but the use-case is not yet implemented.
Make KEXEC and UNACCEPTED_MEMORY mutually exclusive for now.

The implementation requires some basic helpers in boot stub. They
provided by linux/ includes in the main kernel image, but is not present
in boot stub. Create copy of required functionality in the boot stub.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
Documentation/x86/zero-page.rst | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/mem.c | 68 +++++++++++++++++++++++
arch/x86/include/asm/unaccepted_memory.h | 10 ++++
arch/x86/include/uapi/asm/bootparam.h | 2 +-
drivers/firmware/efi/Kconfig | 15 ++++++
drivers/firmware/efi/efi.c | 1 +
drivers/firmware/efi/libstub/x86-stub.c | 69 ++++++++++++++++++++++++
include/linux/efi.h | 3 +-
9 files changed, 168 insertions(+), 2 deletions(-)
create mode 100644 arch/x86/boot/compressed/mem.c
create mode 100644 arch/x86/include/asm/unaccepted_memory.h

diff --git a/Documentation/x86/zero-page.rst b/Documentation/x86/zero-page.rst
index f088f5881666..bb8e9cb093cc 100644
--- a/Documentation/x86/zero-page.rst
+++ b/Documentation/x86/zero-page.rst
@@ -19,6 +19,7 @@ Offset/Size Proto Name Meaning
058/008 ALL tboot_addr Physical address of tboot shared page
060/010 ALL ist_info Intel SpeedStep (IST) BIOS support information
(struct ist_info)
+078/008 ALL unaccepted_memory Bitmap of unaccepted memory (1bit == 2M)
080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!!
090/010 ALL hd1_info hd1 disk parameter, OBSOLETE!!
0A0/010 ALL sys_desc_table System description table (struct sys_desc_table),
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 8fd0e6ae2e1f..7f672f7e2fea 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -102,6 +102,7 @@ endif

vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o
+vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/bitmap.o $(obj)/mem.o

vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
new file mode 100644
index 000000000000..415df0d3bc81
--- /dev/null
+++ b/arch/x86/boot/compressed/mem.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "../cpuflags.h"
+#include "bitmap.h"
+#include "error.h"
+#include "math.h"
+
+#define PMD_SHIFT 21
+#define PMD_SIZE (_AC(1, UL) << PMD_SHIFT)
+#define PMD_MASK (~(PMD_SIZE - 1))
+
+static inline void __accept_memory(phys_addr_t start, phys_addr_t end)
+{
+ /* Platform-specific memory-acceptance call goes here */
+ error("Cannot accept memory");
+}
+
+/*
+ * The accepted memory bitmap only works at PMD_SIZE granularity. If a request
+ * comes in to mark memory as unaccepted which is not PMD_SIZE-aligned, simply
+ * accept the memory now since it can not be *marked* as unaccepted.
+ */
+void process_unaccepted_memory(struct boot_params *params, u64 start, u64 end)
+{
+ /*
+ * Accept small regions that might not be able to be represented
+ * in the bitmap. This is a bit imprecise and may accept some
+ * areas that could have been represented in the bitmap instead.
+ *
+ * Consider case like this:
+ *
+ * | 4k | 2044k | 2048k |
+ * ^ 0x0 ^ 2MB ^ 4MB
+ *
+ * all memory in the range is unaccepted, except for the first 4k.
+ * The second 2M can be represented in the bitmap, but kernel accept it
+ * right away. The imprecision makes the code simpler by ensuring that
+ * at least one bit will be set int the bitmap below.
+ */
+ if (end - start < 2 * PMD_SIZE) {
+ __accept_memory(start, end);
+ return;
+ }
+
+ /*
+ * No matter how the start and end are aligned, at least one unaccepted
+ * PMD_SIZE area will remain.
+ */
+
+ /* Immediately accept a <PMD_SIZE piece at the start: */
+ if (start & ~PMD_MASK) {
+ __accept_memory(start, round_up(start, PMD_SIZE));
+ start = round_up(start, PMD_SIZE);
+ }
+
+ /* Immediately accept a <PMD_SIZE piece at the end: */
+ if (end & ~PMD_MASK) {
+ __accept_memory(round_down(end, PMD_SIZE), end);
+ end = round_down(end, PMD_SIZE);
+ }
+
+ /*
+ * 'start' and 'end' are now both PMD-aligned.
+ * Record the range as being unaccepted:
+ */
+ bitmap_set((unsigned long *)params->unaccepted_memory,
+ start / PMD_SIZE, (end - start) / PMD_SIZE);
+}
diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
new file mode 100644
index 000000000000..df0736d32858
--- /dev/null
+++ b/arch/x86/include/asm/unaccepted_memory.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2020 Intel Corporation */
+#ifndef _ASM_X86_UNACCEPTED_MEMORY_H
+#define _ASM_X86_UNACCEPTED_MEMORY_H
+
+struct boot_params;
+
+void process_unaccepted_memory(struct boot_params *params, u64 start, u64 num);
+
+#endif
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index b25d3f82c2f3..f7a32176f301 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -179,7 +179,7 @@ struct boot_params {
__u64 tboot_addr; /* 0x058 */
struct ist_info ist_info; /* 0x060 */
__u64 acpi_rsdp_addr; /* 0x070 */
- __u8 _pad3[8]; /* 0x078 */
+ __u64 unaccepted_memory; /* 0x078 */
__u8 hd0_info[16]; /* obsolete! */ /* 0x080 */
__u8 hd1_info[16]; /* obsolete! */ /* 0x090 */
struct sys_desc_table sys_desc_table; /* obsolete! */ /* 0x0a0 */
diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index 2c3dac5ecb36..e8048586aefa 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -243,6 +243,21 @@ config EFI_DISABLE_PCI_DMA
options "efi=disable_early_pci_dma" or "efi=no_disable_early_pci_dma"
may be used to override this option.

+config UNACCEPTED_MEMORY
+ bool
+ depends on EFI_STUB
+ depends on !KEXEC_CORE
+ help
+ Some Virtual Machine platforms, such as Intel TDX, require
+ some memory to be "accepted" by the guest before it can be used.
+ This mechanism helps prevent malicious hosts from making changes
+ to guest memory.
+
+ UEFI specification v2.9 introduced EFI_UNACCEPTED_MEMORY memory type.
+
+ This option adds support for unaccepted memory and makes such memory
+ usable by the kernel.
+
endmenu

config EFI_EMBEDDED_FIRMWARE
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 5502e176d51b..2c055afb1b11 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -747,6 +747,7 @@ static __initdata char memory_type_name[][13] = {
"MMIO Port",
"PAL Code",
"Persistent",
+ "Unaccepted",
};

char * __init efi_md_typeattr_format(char *buf, size_t size,
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 5401985901f5..f9b88174209e 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -15,6 +15,7 @@
#include <asm/setup.h>
#include <asm/desc.h>
#include <asm/boot.h>
+#include <asm/unaccepted_memory.h>

#include "efistub.h"

@@ -504,6 +505,17 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s
e820_type = E820_TYPE_PMEM;
break;

+ case EFI_UNACCEPTED_MEMORY:
+ if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) {
+ efi_warn_once("The system has unaccepted memory,"
+ " but kernel does not support it\n");
+ efi_warn_once("Consider enabling UNACCEPTED_MEMORY\n");
+ continue;
+ }
+ e820_type = E820_TYPE_RAM;
+ process_unaccepted_memory(params, d->phys_addr,
+ d->phys_addr + PAGE_SIZE * d->num_pages);
+ break;
default:
continue;
}
@@ -568,6 +580,59 @@ static efi_status_t alloc_e820ext(u32 nr_desc, struct setup_data **e820ext,
return status;
}

+static efi_status_t allocate_unaccepted_memory(struct boot_params *params,
+ __u32 nr_desc,
+ struct efi_boot_memmap *map)
+{
+ unsigned long *mem = NULL;
+ u64 size, max_addr = 0;
+ efi_status_t status;
+ bool found = false;
+ int i;
+
+ /* Check if there's any unaccepted memory and find the max address */
+ for (i = 0; i < nr_desc; i++) {
+ efi_memory_desc_t *d;
+
+ d = efi_early_memdesc_ptr(*map->map, *map->desc_size, i);
+ if (d->type == EFI_UNACCEPTED_MEMORY)
+ found = true;
+ if (d->phys_addr + d->num_pages * PAGE_SIZE > max_addr)
+ max_addr = d->phys_addr + d->num_pages * PAGE_SIZE;
+ }
+
+ if (!found) {
+ params->unaccepted_memory = 0;
+ return EFI_SUCCESS;
+ }
+
+ /*
+ * If unaccepted memory is present allocate a bitmap to track what
+ * memory has to be accepted before access.
+ *
+ * One bit in the bitmap represents 2MiB in the address space:
+ * A 4k bitmap can track 64GiB of physical address space.
+ *
+ * In the worst case scenario -- a huge hole in the middle of the
+ * address space -- It needs 256MiB to handle 4PiB of the address
+ * space.
+ *
+ * TODO: handle situation if params->unaccepted_memory has already set.
+ * It's required to deal with kexec.
+ *
+ * The bitmap will be populated in setup_e820() according to the memory
+ * map after efi_exit_boot_services().
+ */
+ size = DIV_ROUND_UP(max_addr, PMD_SIZE * BITS_PER_BYTE);
+ status = efi_allocate_pages(size, (unsigned long *)&mem, ULONG_MAX);
+ if (status == EFI_SUCCESS) {
+ memset(mem, 0, size);
+ params->unaccepted_memory = (unsigned long)mem;
+ }
+
+ return status;
+}
+
static efi_status_t allocate_e820(struct boot_params *params,
struct efi_boot_memmap *map,
struct setup_data **e820ext,
@@ -589,6 +654,10 @@ static efi_status_t allocate_e820(struct boot_params *params,
if (status != EFI_SUCCESS)
goto out;
}
+
+ if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
+ status = allocate_unaccepted_memory(params, nr_desc, map);
+
out:
efi_bs_call(free_pool, *map->map);
return status;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index ccd4d3f91c98..b0240fdcaf5b 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -108,7 +108,8 @@ typedef struct {
#define EFI_MEMORY_MAPPED_IO_PORT_SPACE 12
#define EFI_PAL_CODE 13
#define EFI_PERSISTENT_MEMORY 14
-#define EFI_MAX_MEMORY_TYPE 15
+#define EFI_UNACCEPTED_MEMORY 15
+#define EFI_MAX_MEMORY_TYPE 16

/* Attribute values: */
#define EFI_MEMORY_UC ((u64)0x0000000000000001ULL) /* uncached */
--
2.35.1

2022-04-25 08:51:21

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv5 11/12] mm/vmstat: Add counter for memory accepting

The counter increased every time kernel accepts a memory region.

The counter allows to see if memory acceptation is still ongoing and
contributes to memory allocation latency.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/mm/unaccepted_memory.c | 1 +
include/linux/vm_event_item.h | 3 +++
mm/vmstat.c | 3 +++
3 files changed, 7 insertions(+)

diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c
index de0790af1824..65cd49b93c50 100644
--- a/arch/x86/mm/unaccepted_memory.c
+++ b/arch/x86/mm/unaccepted_memory.c
@@ -38,6 +38,7 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
}

bitmap_clear(unaccepted_memory, range_start, len);
+ count_vm_events(ACCEPT_MEMORY, len * PMD_SIZE / PAGE_SIZE);
}
spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
}
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 16a0a4fd000b..6a468164a2f9 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -136,6 +136,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
#ifdef CONFIG_X86
DIRECT_MAP_LEVEL2_SPLIT,
DIRECT_MAP_LEVEL3_SPLIT,
+#endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+ ACCEPT_MEMORY,
#endif
NR_VM_EVENT_ITEMS
};
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b75b1a64b54c..4c9197f32406 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1397,6 +1397,9 @@ const char * const vmstat_text[] = {
"direct_map_level2_splits",
"direct_map_level3_splits",
#endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+ "accept_memory",
+#endif
#endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
};
#endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
--
2.35.1

2022-04-25 10:02:46

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv5 03/12] efi/x86: Get full memory map in allocate_e820()

Currently allocate_e820() only interested in the size of map and size of
memory descriptor to determine how many e820 entries the kernel needs.

UEFI Specification version 2.9 introduces a new memory type --
unaccepted memory. To track unaccepted memory kernel needs to allocate
a bitmap. The size of the bitmap is dependent on the maximum physical
address present in the system. A full memory map is required to find
the maximum address.

Modify allocate_e820() to get a full memory map.

This is preparation for the next patch that implements handling of
unaccepted memory in EFI stub.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
drivers/firmware/efi/libstub/x86-stub.c | 30 ++++++++++++-------------
1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 01ddd4502e28..5401985901f5 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -569,31 +569,29 @@ static efi_status_t alloc_e820ext(u32 nr_desc, struct setup_data **e820ext,
}

static efi_status_t allocate_e820(struct boot_params *params,
+ struct efi_boot_memmap *map,
struct setup_data **e820ext,
u32 *e820ext_size)
{
- unsigned long map_size, desc_size, map_key;
efi_status_t status;
- __u32 nr_desc, desc_version;
+ __u32 nr_desc;

- /* Only need the size of the mem map and size of each mem descriptor */
- map_size = 0;
- status = efi_bs_call(get_memory_map, &map_size, NULL, &map_key,
- &desc_size, &desc_version);
- if (status != EFI_BUFFER_TOO_SMALL)
- return (status != EFI_SUCCESS) ? status : EFI_UNSUPPORTED;
-
- nr_desc = map_size / desc_size + EFI_MMAP_NR_SLACK_SLOTS;
+ status = efi_get_memory_map(map);
+ if (status != EFI_SUCCESS)
+ return status;

- if (nr_desc > ARRAY_SIZE(params->e820_table)) {
- u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table);
+ nr_desc = *map->map_size / *map->desc_size;
+ if (nr_desc > ARRAY_SIZE(params->e820_table) - EFI_MMAP_NR_SLACK_SLOTS) {
+ u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table) +
+ EFI_MMAP_NR_SLACK_SLOTS;

status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size);
if (status != EFI_SUCCESS)
- return status;
+ goto out;
}
-
- return EFI_SUCCESS;
+out:
+ efi_bs_call(free_pool, *map->map);
+ return status;
}

struct exit_boot_struct {
@@ -642,7 +640,7 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
priv.boot_params = boot_params;
priv.efi = &boot_params->efi_info;

- status = allocate_e820(boot_params, &e820ext, &e820ext_size);
+ status = allocate_e820(boot_params, &map, &e820ext, &e820ext_size);
if (status != EFI_SUCCESS)
return status;

--
2.35.1

2022-04-25 10:50:57

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv5 09/12] x86/tdx: Make _tdx_hypercall() and __tdx_module_call() available in boot stub

Memory acceptance requires a hypercall and one or multiple module calls.

Make helpers for the calls available in boot stub. It has to accept
memory where kernel image and initrd are placed.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/coco/tdx/tdx.c | 26 ------------------
arch/x86/include/asm/shared/tdx.h | 45 +++++++++++++++++++++++++++++++
arch/x86/include/asm/tdx.h | 19 -------------
3 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 03deb4d6920d..ddb60a87b426 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -12,14 +12,6 @@
#include <asm/insn-eval.h>
#include <asm/pgtable.h>

-/* TDX module Call Leaf IDs */
-#define TDX_GET_INFO 1
-#define TDX_GET_VEINFO 3
-#define TDX_ACCEPT_PAGE 6
-
-/* TDX hypercall Leaf IDs */
-#define TDVMCALL_MAP_GPA 0x10001
-
/* MMIO direction */
#define EPT_READ 0
#define EPT_WRITE 1
@@ -34,24 +26,6 @@
#define VE_GET_PORT_NUM(e) ((e) >> 16)
#define VE_IS_IO_STRING(e) ((e) & BIT(4))

-/*
- * Wrapper for standard use of __tdx_hypercall with no output aside from
- * return code.
- */
-static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
-{
- struct tdx_hypercall_args args = {
- .r10 = TDX_HYPERCALL_STANDARD,
- .r11 = fn,
- .r12 = r12,
- .r13 = r13,
- .r14 = r14,
- .r15 = r15,
- };
-
- return __tdx_hypercall(&args, 0);
-}
-
/* Called from __tdx_hypercall() for unrecoverable failure */
void __tdx_hypercall_failed(void)
{
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index e53f26228fbb..956ced04c3be 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -13,6 +13,14 @@
#define TDX_CPUID_LEAF_ID 0x21
#define TDX_IDENT "IntelTDX "

+/* TDX module Call Leaf IDs */
+#define TDX_GET_INFO 1
+#define TDX_GET_VEINFO 3
+#define TDX_ACCEPT_PAGE 6
+
+/* TDX hypercall Leaf IDs */
+#define TDVMCALL_MAP_GPA 0x10001
+
#ifndef __ASSEMBLY__

/*
@@ -33,8 +41,45 @@ struct tdx_hypercall_args {
/* Used to request services from the VMM */
u64 __tdx_hypercall(struct tdx_hypercall_args *args, unsigned long flags);

+/*
+ * Wrapper for standard use of __tdx_hypercall with no output aside from
+ * return code.
+ */
+static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
+{
+ struct tdx_hypercall_args args = {
+ .r10 = TDX_HYPERCALL_STANDARD,
+ .r11 = fn,
+ .r12 = r12,
+ .r13 = r13,
+ .r14 = r14,
+ .r15 = r15,
+ };
+
+ return __tdx_hypercall(&args, 0);
+}
+
+
/* Called from __tdx_hypercall() for unrecoverable failure */
void __tdx_hypercall_failed(void);

+/*
+ * Used in __tdx_module_call() to gather the output registers' values of the
+ * TDCALL instruction when requesting services from the TDX module. This is a
+ * software only structure and not part of the TDX module/VMM ABI
+ */
+struct tdx_module_output {
+ u64 rcx;
+ u64 rdx;
+ u64 r8;
+ u64 r9;
+ u64 r10;
+ u64 r11;
+};
+
+/* Used to communicate with the TDX module */
+u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+ struct tdx_module_output *out);
+
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_X86_SHARED_TDX_H */
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 020c81a7c729..d9106d3e89f8 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -20,21 +20,6 @@

#ifndef __ASSEMBLY__

-/*
- * Used to gather the output registers values of the TDCALL and SEAMCALL
- * instructions when requesting services from the TDX module.
- *
- * This is a software only structure and not part of the TDX module/VMM ABI.
- */
-struct tdx_module_output {
- u64 rcx;
- u64 rdx;
- u64 r8;
- u64 r9;
- u64 r10;
- u64 r11;
-};
-
/*
* Used by the #VE exception handler to gather the #VE exception
* info from the TDX module. This is a software only structure
@@ -55,10 +40,6 @@ struct ve_info {

void __init tdx_early_init(void);

-/* Used to communicate with the TDX module */
-u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
- struct tdx_module_output *out);
-
void tdx_get_ve_info(struct ve_info *ve);

bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);
--
2.35.1

2022-04-25 19:38:14

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv5 12/12] x86/mm: Report unaccepted memory in /proc/meminfo

Track amount of unaccepted memory and report it in /proc/meminfo.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/set_memory.h | 2 ++
arch/x86/include/asm/unaccepted_memory.h | 9 ++++++
arch/x86/mm/init.c | 8 ++++++
arch/x86/mm/pat/set_memory.c | 2 +-
arch/x86/mm/unaccepted_memory.c | 36 +++++++++++++++++++++++-
5 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 78ca53512486..e467f3941d22 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -86,6 +86,8 @@ bool kernel_page_present(struct page *page);

extern int kernel_set_to_readonly;

+void direct_map_meminfo(struct seq_file *m);
+
#ifdef CONFIG_X86_64
/*
* Prevent speculative access to the page by either unmapping
diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
index a59264ee0ab3..7c93661152a9 100644
--- a/arch/x86/include/asm/unaccepted_memory.h
+++ b/arch/x86/include/asm/unaccepted_memory.h
@@ -3,7 +3,10 @@
#ifndef _ASM_X86_UNACCEPTED_MEMORY_H
#define _ASM_X86_UNACCEPTED_MEMORY_H

+#include <linux/types.h>
+
struct boot_params;
+struct seq_file;

void process_unaccepted_memory(struct boot_params *params, u64 start, u64 num);

@@ -12,5 +15,11 @@ void process_unaccepted_memory(struct boot_params *params, u64 start, u64 num);
void accept_memory(phys_addr_t start, phys_addr_t end);
bool memory_is_unaccepted(phys_addr_t start, phys_addr_t end);

+void unaccepted_meminfo(struct seq_file *m);
+
+#else
+
+static inline void unaccepted_meminfo(struct seq_file *m) {}
+
#endif
#endif
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index d8cfce221275..7e92a9d93994 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -1065,3 +1065,11 @@ unsigned long max_swapfile_size(void)
return pages;
}
#endif
+
+#ifdef CONFIG_PROC_FS
+void arch_report_meminfo(struct seq_file *m)
+{
+ direct_map_meminfo(m);
+ unaccepted_meminfo(m);
+}
+#endif
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index abf5ed76e4b7..2880ba01451c 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -105,7 +105,7 @@ static void split_page_count(int level)
direct_pages_count[level - 1] += PTRS_PER_PTE;
}

-void arch_report_meminfo(struct seq_file *m)
+void direct_map_meminfo(struct seq_file *m)
{
seq_printf(m, "DirectMap4k: %8lu kB\n",
direct_pages_count[PG_LEVEL_4K] << 2);
diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c
index 65cd49b93c50..66a6c529bf31 100644
--- a/arch/x86/mm/unaccepted_memory.c
+++ b/arch/x86/mm/unaccepted_memory.c
@@ -3,14 +3,17 @@
#include <linux/mm.h>
#include <linux/pfn.h>
#include <linux/spinlock.h>
+#include <linux/seq_file.h>

+#include <asm/e820/api.h>
#include <asm/io.h>
#include <asm/setup.h>
#include <asm/shared/tdx.h>
#include <asm/unaccepted_memory.h>

-/* Protects unaccepted memory bitmap */
+/* Protects unaccepted memory bitmap and nr_unaccepted */
static DEFINE_SPINLOCK(unaccepted_memory_lock);
+static unsigned long nr_unaccepted;

void accept_memory(phys_addr_t start, phys_addr_t end)
{
@@ -39,6 +42,12 @@ void accept_memory(phys_addr_t start, phys_addr_t end)

bitmap_clear(unaccepted_memory, range_start, len);
count_vm_events(ACCEPT_MEMORY, len * PMD_SIZE / PAGE_SIZE);
+
+ /* In early boot nr_unaccepted is not yet initialized */
+ if (nr_unaccepted) {
+ WARN_ON(nr_unaccepted < len);
+ nr_unaccepted -= len;
+ }
}
spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
}
@@ -62,3 +71,28 @@ bool memory_is_unaccepted(phys_addr_t start, phys_addr_t end)

return ret;
}
+
+void unaccepted_meminfo(struct seq_file *m)
+{
+ seq_printf(m, "UnacceptedMem: %8lu kB\n",
+ (READ_ONCE(nr_unaccepted) * PMD_SIZE) >> 10);
+}
+
+static int __init unaccepted_meminfo_init(void)
+{
+ unsigned long *unaccepted_memory;
+ unsigned long flags, bitmap_size;
+
+ if (!boot_params.unaccepted_memory)
+ return 0;
+
+ bitmap_size = e820__end_of_ram_pfn() * PAGE_SIZE / PMD_SIZE;
+ unaccepted_memory = __va(boot_params.unaccepted_memory);
+
+ spin_lock_irqsave(&unaccepted_memory_lock, flags);
+ nr_unaccepted = bitmap_weight(unaccepted_memory, bitmap_size);
+ spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
+
+ return 0;
+}
+fs_initcall(unaccepted_meminfo_init);
--
2.35.1

2022-04-25 22:22:25

by Kirill A. Shutemov

[permalink] [raw]
Subject: [PATCHv5 01/12] x86/boot/: Centralize __pa()/__va() definitions

Replace multiple __pa()/__va() definitions with a single one in misc.h.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/boot/compressed/ident_map_64.c | 8 --------
arch/x86/boot/compressed/misc.h | 9 +++++++++
arch/x86/boot/compressed/sev.c | 2 --
3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index f7213d0943b8..fe523ee1a19f 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -8,14 +8,6 @@
* Copyright (C) 2016 Kees Cook
*/

-/*
- * Since we're dealing with identity mappings, physical and virtual
- * addresses are the same, so override these defines which are ultimately
- * used by the headers in misc.h.
- */
-#define __pa(x) ((unsigned long)(x))
-#define __va(x) ((void *)((unsigned long)(x)))
-
/* No PAGE_TABLE_ISOLATION support needed either: */
#undef CONFIG_PAGE_TABLE_ISOLATION

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index ea71cf3d64e1..9f7154a30d37 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -19,6 +19,15 @@
/* cpu_feature_enabled() cannot be used this early */
#define USE_EARLY_PGTABLE_L5

+/*
+ * Boot stub deals with identity mappings, physical and virtual addresses are
+ * the same, so override these defines.
+ *
+ * <asm/page.h> will not define them if they are already defined.
+ */
+#define __pa(x) ((unsigned long)(x))
+#define __va(x) ((void *)((unsigned long)(x)))
+
#include <linux/linkage.h>
#include <linux/screen_info.h>
#include <linux/elf.h>
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 28bcf04c022e..4dcea0bc4fe4 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -106,9 +106,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
}

#undef __init
-#undef __pa
#define __init
-#define __pa(x) ((unsigned long)(x))

#define __BOOT_COMPRESSED

--
2.35.1

2022-04-27 10:50:06

by Michael Roth

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

On Mon, Apr 25, 2022 at 06:39:28AM +0300, Kirill A. Shutemov wrote:
> The firmware will pre-accept the memory used to run the stub. But, the
> stub is responsible for accepting the memory into which it decompresses
> the main kernel. Accept memory just before decompression starts.
>
> The stub is also responsible for choosing a physical address in which to
> place the decompressed kernel image. The KASLR mechanism will randomize
> this physical address. Since the unaccepted memory region is relatively
> small, KASLR would be quite ineffective if it only used the pre-accepted
> area (EFI_CONVENTIONAL_MEMORY). Ensure that KASLR randomizes among the
> entire physical address space by also including EFI_UNACCEPTED_MEMOR
>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> ---
> arch/x86/boot/compressed/Makefile | 2 +-
> arch/x86/boot/compressed/kaslr.c | 14 ++++++++++++--
> arch/x86/boot/compressed/mem.c | 21 +++++++++++++++++++++
> arch/x86/boot/compressed/misc.c | 9 +++++++++
> arch/x86/include/asm/unaccepted_memory.h | 2 ++
> 5 files changed, 45 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 7f672f7e2fea..b59007e57cbf 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -102,7 +102,7 @@ endif
>
> vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
> vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o
> -vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/bitmap.o $(obj)/mem.o
> +vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/bitmap.o $(obj)/find.o $(obj)/mem.o

Since it's possible to have CONFIG_UNACCEPTED_MEMORY=y while
CONFIG_INTEL_TDX_GUEST=n (e.g. for SNP-only guest kernels), this can
result in mem.o reporting linker errors due to tdx_accept_memory() not
being defined. I think it needs a stub for !CONFIG_INTEL_TDX_GUEST, or
something along that line.

>
> vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
> efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index 411b268bc0a2..59db90626042 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -725,10 +725,20 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
> * but in practice there's firmware where using that memory leads
> * to crashes.
> *
> - * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free.
> + * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if
> + * supported) are guaranteed to be free.
> */
> - if (md->type != EFI_CONVENTIONAL_MEMORY)
> +
> + switch (md->type) {
> + case EFI_CONVENTIONAL_MEMORY:
> + break;
> + case EFI_UNACCEPTED_MEMORY:

Just FYI, but with latest tip boot/compressed now relies on a separate header
in arch/x86/boot/compressed/efi.h where this need to be defined again.

2022-04-27 14:46:29

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

On Tue, Apr 26, 2022 at 07:17:56PM -0500, Michael Roth wrote:
> On Mon, Apr 25, 2022 at 06:39:28AM +0300, Kirill A. Shutemov wrote:
> > The firmware will pre-accept the memory used to run the stub. But, the
> > stub is responsible for accepting the memory into which it decompresses
> > the main kernel. Accept memory just before decompression starts.
> >
> > The stub is also responsible for choosing a physical address in which to
> > place the decompressed kernel image. The KASLR mechanism will randomize
> > this physical address. Since the unaccepted memory region is relatively
> > small, KASLR would be quite ineffective if it only used the pre-accepted
> > area (EFI_CONVENTIONAL_MEMORY). Ensure that KASLR randomizes among the
> > entire physical address space by also including EFI_UNACCEPTED_MEMOR
> >
> > Signed-off-by: Kirill A. Shutemov <[email protected]>
> > ---
> > arch/x86/boot/compressed/Makefile | 2 +-
> > arch/x86/boot/compressed/kaslr.c | 14 ++++++++++++--
> > arch/x86/boot/compressed/mem.c | 21 +++++++++++++++++++++
> > arch/x86/boot/compressed/misc.c | 9 +++++++++
> > arch/x86/include/asm/unaccepted_memory.h | 2 ++
> > 5 files changed, 45 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index 7f672f7e2fea..b59007e57cbf 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -102,7 +102,7 @@ endif
> >
> > vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
> > vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o
> > -vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/bitmap.o $(obj)/mem.o
> > +vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/bitmap.o $(obj)/find.o $(obj)/mem.o
>
> Since it's possible to have CONFIG_UNACCEPTED_MEMORY=y while
> CONFIG_INTEL_TDX_GUEST=n (e.g. for SNP-only guest kernels), this can
> result in mem.o reporting linker errors due to tdx_accept_memory() not
> being defined. I think it needs a stub for !CONFIG_INTEL_TDX_GUEST, or
> something along that line.

Fair enough. This would do the trick:

diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
index 539fff27de49..4a49a2438180 100644
--- a/arch/x86/boot/compressed/mem.c
+++ b/arch/x86/boot/compressed/mem.c
@@ -19,6 +19,9 @@ static bool is_tdx_guest(void)
static bool once;
static bool is_tdx;

+ if (!IS_ENABLED(CONFIG_INTEL_TDX_GUEST))
+ return false;
+
if (!once) {
u32 eax, sig[3];

> > vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
> > efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
> > diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> > index 411b268bc0a2..59db90626042 100644
> > --- a/arch/x86/boot/compressed/kaslr.c
> > +++ b/arch/x86/boot/compressed/kaslr.c
> > @@ -725,10 +725,20 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
> > * but in practice there's firmware where using that memory leads
> > * to crashes.
> > *
> > - * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free.
> > + * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if
> > + * supported) are guaranteed to be free.
> > */
> > - if (md->type != EFI_CONVENTIONAL_MEMORY)
> > +
> > + switch (md->type) {
> > + case EFI_CONVENTIONAL_MEMORY:
> > + break;
> > + case EFI_UNACCEPTED_MEMORY:
>
> Just FYI, but with latest tip boot/compressed now relies on a separate header
> in arch/x86/boot/compressed/efi.h where this need to be defined again.

Right.

Borislav, how do you want to handle this? Do you want me to rebase the
tree to a specific branch?

--
Kirill A. Shutemov

2022-04-27 23:37:03

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv5 03/12] efi/x86: Get full memory map in allocate_e820()

On Mon, Apr 25, 2022 at 06:39:25AM +0300, Kirill A. Shutemov wrote:
> static efi_status_t allocate_e820(struct boot_params *params,
> + struct efi_boot_memmap *map,
> struct setup_data **e820ext,
> u32 *e820ext_size)
> {
> - unsigned long map_size, desc_size, map_key;
> efi_status_t status;
> - __u32 nr_desc, desc_version;
> + __u32 nr_desc;
>
> - /* Only need the size of the mem map and size of each mem descriptor */
> - map_size = 0;
> - status = efi_bs_call(get_memory_map, &map_size, NULL, &map_key,
> - &desc_size, &desc_version);
> - if (status != EFI_BUFFER_TOO_SMALL)
> - return (status != EFI_SUCCESS) ? status : EFI_UNSUPPORTED;
> -
> - nr_desc = map_size / desc_size + EFI_MMAP_NR_SLACK_SLOTS;
> + status = efi_get_memory_map(map);
> + if (status != EFI_SUCCESS)
> + return status;
>
> - if (nr_desc > ARRAY_SIZE(params->e820_table)) {
> - u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table);
> + nr_desc = *map->map_size / *map->desc_size;
> + if (nr_desc > ARRAY_SIZE(params->e820_table) - EFI_MMAP_NR_SLACK_SLOTS) {
> + u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table) +
> + EFI_MMAP_NR_SLACK_SLOTS;
>
> status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size);
> if (status != EFI_SUCCESS)
> - return status;
> + goto out;

Still silly, that label is useless then. Pasting the whole, simplified
function below.

It looks like it all boils down to propagating the retval up the chain...

static efi_status_t allocate_e820(struct boot_params *params,
struct efi_boot_memmap *map,
struct setup_data **e820ext,
u32 *e820ext_size)
{
efi_status_t status;
__u32 nr_desc;

status = efi_get_memory_map(map);
if (status != EFI_SUCCESS)
return status;

nr_desc = *map->map_size / *map->desc_size;
if (nr_desc > ARRAY_SIZE(params->e820_table) - EFI_MMAP_NR_SLACK_SLOTS) {
u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table) + EFI_MMAP_NR_SLACK_SLOTS;

status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size);
}

efi_bs_call(free_pool, *map->map);
return status;
}

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-04-28 06:43:22

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv5 03/12] efi/x86: Get full memory map in allocate_e820()

On Wed, Apr 27, 2022 at 10:25:11PM +0200, Borislav Petkov wrote:
> On Mon, Apr 25, 2022 at 06:39:25AM +0300, Kirill A. Shutemov wrote:
> > static efi_status_t allocate_e820(struct boot_params *params,
> > + struct efi_boot_memmap *map,
> > struct setup_data **e820ext,
> > u32 *e820ext_size)
> > {
> > - unsigned long map_size, desc_size, map_key;
> > efi_status_t status;
> > - __u32 nr_desc, desc_version;
> > + __u32 nr_desc;
> >
> > - /* Only need the size of the mem map and size of each mem descriptor */
> > - map_size = 0;
> > - status = efi_bs_call(get_memory_map, &map_size, NULL, &map_key,
> > - &desc_size, &desc_version);
> > - if (status != EFI_BUFFER_TOO_SMALL)
> > - return (status != EFI_SUCCESS) ? status : EFI_UNSUPPORTED;
> > -
> > - nr_desc = map_size / desc_size + EFI_MMAP_NR_SLACK_SLOTS;
> > + status = efi_get_memory_map(map);
> > + if (status != EFI_SUCCESS)
> > + return status;
> >
> > - if (nr_desc > ARRAY_SIZE(params->e820_table)) {
> > - u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table);
> > + nr_desc = *map->map_size / *map->desc_size;
> > + if (nr_desc > ARRAY_SIZE(params->e820_table) - EFI_MMAP_NR_SLACK_SLOTS) {
> > + u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table) +
> > + EFI_MMAP_NR_SLACK_SLOTS;
> >
> > status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size);
> > if (status != EFI_SUCCESS)
> > - return status;
> > + goto out;
>
> Still silly, that label is useless then. Pasting the whole, simplified
> function below.
>
> It looks like it all boils down to propagating the retval up the chain...

Right. That's true. But having goto here makes patch 5/12 a bit cleaner.
I will move the goto there. Is that what you want to see?

--
Kirill A. Shutemov

2022-04-28 14:57:06

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv5 03/12] efi/x86: Get full memory map in allocate_e820()

On Thu, Apr 28, 2022 at 02:48:53AM +0300, Kirill A. Shutemov wrote:
> Right. That's true. But having goto here makes patch 5/12 a bit cleaner.

Ok, let's take our time machine and go into the future:

This patch is in git, there's no concept of "next patch" anymore - and
someone is staring at it for whatever reason.

Someone is wondering: why the hell was this done this way? And which
is that "next patch"? Someone probably needs to sort them in the
application order to figure out which next patch the author is talking
about...

See what I mean?

Also, if this hunk

+
+ if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
+ status = allocate_unaccepted_memory(params, nr_desc, map);
+

is what this is all about, then no, this confusion is not even worth it
- please make sure your patches make sense on their own.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-04-29 13:24:25

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv5 05/12] efi/x86: Implement support for unaccepted memory

On Mon, Apr 25, 2022 at 06:39:27AM +0300, Kirill A. Shutemov wrote:
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 5401985901f5..f9b88174209e 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -15,6 +15,7 @@
> #include <asm/setup.h>
> #include <asm/desc.h>
> #include <asm/boot.h>
> +#include <asm/unaccepted_memory.h>
>
> #include "efistub.h"
>
> @@ -504,6 +505,17 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s
> e820_type = E820_TYPE_PMEM;
> break;
>
> + case EFI_UNACCEPTED_MEMORY:
> + if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) {
> + efi_warn_once("The system has unaccepted memory,"
> + " but kernel does not support it\n");
> + efi_warn_once("Consider enabling UNACCEPTED_MEMORY\n");

CONFIG_UNACCEPTED_MEMORY

> + continue;
> + }
> + e820_type = E820_TYPE_RAM;
> + process_unaccepted_memory(params, d->phys_addr,
> + d->phys_addr + PAGE_SIZE * d->num_pages);

Align arguments on the opening brace.

> + break;
> default:
> continue;
> }
> @@ -568,6 +580,59 @@ static efi_status_t alloc_e820ext(u32 nr_desc, struct setup_data **e820ext,
> return status;
> }
>
> +static efi_status_t allocate_unaccepted_memory(struct boot_params *params,
> + __u32 nr_desc,
> + struct efi_boot_memmap *map)
> +{
> + unsigned long *mem = NULL;
> + u64 size, max_addr = 0;
> + efi_status_t status;
> + bool found = false;
> + int i;
> +
> + /* Check if there's any unaccepted memory and find the max address */
> + for (i = 0; i < nr_desc; i++) {
> + efi_memory_desc_t *d;
> +
> + d = efi_early_memdesc_ptr(*map->map, *map->desc_size, i);
> + if (d->type == EFI_UNACCEPTED_MEMORY)
> + found = true;
> + if (d->phys_addr + d->num_pages * PAGE_SIZE > max_addr)
> + max_addr = d->phys_addr + d->num_pages * PAGE_SIZE;
> + }
> +
> + if (!found) {
> + params->unaccepted_memory = 0;
> + return EFI_SUCCESS;
> + }
> +
> + /*
> + * If unaccepted memory is present allocate a bitmap to track what
> + * memory has to be accepted before access.
> + *
> + * One bit in the bitmap represents 2MiB in the address space:
> + * A 4k bitmap can track 64GiB of physical address space.
> + *
> + * In the worst case scenario -- a huge hole in the middle of the
> + * address space -- It needs 256MiB to handle 4PiB of the address
> + * space.
> + *
> + * TODO: handle situation if params->unaccepted_memory has already set.

"... is already set."

And when is that TODO taken care of? Later patch or patchset?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-05-01 21:09:54

by Wander Lairson Costa

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

On Mon, Apr 25, 2022 at 06:39:28AM +0300, Kirill A. Shutemov wrote:
> The firmware will pre-accept the memory used to run the stub. But, the
> stub is responsible for accepting the memory into which it decompresses
> the main kernel. Accept memory just before decompression starts.
>
> The stub is also responsible for choosing a physical address in which to
> place the decompressed kernel image. The KASLR mechanism will randomize
> this physical address. Since the unaccepted memory region is relatively
> small, KASLR would be quite ineffective if it only used the pre-accepted
> area (EFI_CONVENTIONAL_MEMORY). Ensure that KASLR randomizes among the
> entire physical address space by also including EFI_UNACCEPTED_MEMOR

nit: s/EFI_UNACCEPTED_MEMOR/EFI_UNACCEPTED_MEMORY./

[snip]

2022-05-02 23:52:51

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv5 05/12] efi/x86: Implement support for unaccepted memory

On Fri, Apr 29, 2022 at 12:53:30PM +0200, Borislav Petkov wrote:
> > + /*
> > + * If unaccepted memory is present allocate a bitmap to track what
> > + * memory has to be accepted before access.
> > + *
> > + * One bit in the bitmap represents 2MiB in the address space:
> > + * A 4k bitmap can track 64GiB of physical address space.
> > + *
> > + * In the worst case scenario -- a huge hole in the middle of the
> > + * address space -- It needs 256MiB to handle 4PiB of the address
> > + * space.
> > + *
> > + * TODO: handle situation if params->unaccepted_memory has already set.
>
> "... is already set."
>
> And when is that TODO taken care of? Later patch or patchset?

No, not yet. kexec is not a priority at the moment. It will come later.

--
Kirill A. Shutemov

2022-05-03 00:41:46

by Wander Lairson Costa

[permalink] [raw]
Subject: Re: [PATCHv5 07/12] x86/mm: Reserve unaccepted memory bitmap

On Mon, Apr 25, 2022 at 06:39:29AM +0300, Kirill A. Shutemov wrote:
> A given page of memory can only be accepted once. The kernel has a need
> to accept memory both in the early decompression stage and during normal
> runtime.
>
> A bitmap used to communicate the acceptance state of each page between the

nit: s/bitmap used/bitmap is used/

[snip]

2022-05-03 21:38:22

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

On Wed, Apr 27, 2022 at 05:19:14PM +0300, Kirill A. Shutemov wrote:
> Borislav, how do you want to handle this? Do you want me to rebase the
> tree to a specific branch?

All patchsets for tip should usually be based ontop of current
tip/master.

The compressed/efi.h change is ontop of tip:x86/sev so we might have to
do some patch tetris...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-05-04 00:50:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

On Mon, Apr 25, 2022 at 06:39:28AM +0300, Kirill A. Shutemov wrote:
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index 411b268bc0a2..59db90626042 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -725,10 +725,20 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
> * but in practice there's firmware where using that memory leads
> * to crashes.
> *
> - * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free.
> + * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if
> + * supported) are guaranteed to be free.
> */
> - if (md->type != EFI_CONVENTIONAL_MEMORY)
> +
> + switch (md->type) {
> + case EFI_CONVENTIONAL_MEMORY:
> + break;
> + case EFI_UNACCEPTED_MEMORY:
> + if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
> + break;
> continue;
> + default:
> + continue;
> + }

Is there any special reason for this to be a switch-case or can it
simply be a compound conditional if (bla...) ?

> @@ -66,3 +69,21 @@ void process_unaccepted_memory(struct boot_params *params, u64 start, u64 end)
> bitmap_set((unsigned long *)params->unaccepted_memory,
> start / PMD_SIZE, (end - start) / PMD_SIZE);
> }
> +
> +void accept_memory(phys_addr_t start, phys_addr_t end)
> +{
> + unsigned long range_start, range_end;
> + unsigned long *unaccepted_memory;

Please shorten that name so that you don't have to break the lines below.

> + unsigned long bitmap_size;
> +
> + unaccepted_memory = (unsigned long *)boot_params->unaccepted_memory;
> + range_start = start / PMD_SIZE;
> + bitmap_size = DIV_ROUND_UP(end, PMD_SIZE);
> +
> + for_each_set_bitrange_from(range_start, range_end,
> + unaccepted_memory, bitmap_size) {
> + __accept_memory(range_start * PMD_SIZE, range_end * PMD_SIZE);
> + bitmap_clear(unaccepted_memory,
> + range_start, range_end - range_start);
> + }
> +}
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index fa8969fad011..285b37e28074 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -18,6 +18,7 @@
> #include "../string.h"
> #include "../voffset.h"
> #include <asm/bootparam_utils.h>
> +#include <asm/unaccepted_memory.h>
>
> /*
> * WARNING!!
> @@ -451,6 +452,14 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> #endif
>
> debug_putstr("\nDecompressing Linux... ");
> +
> +#ifdef CONFIG_UNACCEPTED_MEMORY

IS_ENABLED() perhaps?

> + if (boot_params->unaccepted_memory) {

Also, that ->unaccepted_memory will be set only when
ACONFIG_UNACCEPTED_MEMORY is set, FAICT.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-05-04 16:55:21

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv5 07/12] x86/mm: Reserve unaccepted memory bitmap

On Mon, Apr 25, 2022 at 06:39:29AM +0300, Kirill A. Shutemov wrote:
> A given page of memory can only be accepted once. The kernel has a need

s/has a need to/has to/

> to accept memory both in the early decompression stage and during normal
> runtime.
>
> A bitmap used to communicate the acceptance state of each page between the
^
is

> decompression stage and normal runtime.
>
> boot_params is used to communicate location of the bitmap through out

throughout

> the boot. The bitmap is allocated and initially populated in EFI stub.
> Decompression stage accepts pages required for kernel/initrd and mark

marks

> these pages accordingly in the bitmap. The main kernel picks up the
> bitmap from the same boot_params and uses it to determinate what has to

determine

> be accepted on allocation.
>
> In the runtime kernel, reserve the bitmap's memory to ensure nothing
> overwrites it.
>
> The size of bitmap is determinated with e820__end_of_ram_pfn() which

Unknown word [determinated] in commit message.
Suggestions: ['determinate', 'determined', 'terminated', 'determinant']

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-05-05 13:25:22

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv5 08/12] x86/mm: Provide helpers for unaccepted memory

On Mon, Apr 25, 2022 at 06:39:30AM +0300, Kirill A. Shutemov wrote:
> +/* Protects unaccepted memory bitmap */
> +static DEFINE_SPINLOCK(unaccepted_memory_lock);
> +
> +void accept_memory(phys_addr_t start, phys_addr_t end)
> +{
> + unsigned long *unaccepted_memory;

shorten that name.

> + unsigned long flags;
> + unsigned long range_start, range_end;

The tip-tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order::

struct long_struct_name *descriptive_name;
unsigned long foo, bar;
unsigned int tmp;
int ret;

The above is faster to parse than the reverse ordering::

int ret;
unsigned int tmp;
unsigned long foo, bar;
struct long_struct_name *descriptive_name;

And even more so than random ordering::

unsigned long foo, bar;
int ret;
struct long_struct_name *descriptive_name;
unsigned int tmp;

> +
> + if (!boot_params.unaccepted_memory)
> + return;
> +
> + unaccepted_memory = __va(boot_params.unaccepted_memory);
> + range_start = start / PMD_SIZE;
> +
> + spin_lock_irqsave(&unaccepted_memory_lock, flags);
> + for_each_set_bitrange_from(range_start, range_end, unaccepted_memory,
> + DIV_ROUND_UP(end, PMD_SIZE)) {
> + unsigned long len = range_end - range_start;
> +
> + /* Platform-specific memory-acceptance call goes here */
> + panic("Cannot accept memory");

Yeah, no, WARN_ON_ONCE() pls.

> + bitmap_clear(unaccepted_memory, range_start, len);
> + }
> + spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
> +}
> +
> +bool memory_is_unaccepted(phys_addr_t start, phys_addr_t end)
> +{
> + unsigned long *unaccepted_memory = __va(boot_params.unaccepted_memory);

As above, shorten that one.

> + unsigned long flags;
> + bool ret = false;
> +
> + spin_lock_irqsave(&unaccepted_memory_lock, flags);
> + while (start < end) {
> + if (test_bit(start / PMD_SIZE, unaccepted_memory)) {
> + ret = true;

Wait, what?

That thing is lying: it'll return true for *some* PMD which is accepted
but not the whole range of [start, end].

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-05-08 14:01:13

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

On Tue, May 03, 2022 at 04:12:55PM +0200, Borislav Petkov wrote:
> On Mon, Apr 25, 2022 at 06:39:28AM +0300, Kirill A. Shutemov wrote:
> > diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> > index 411b268bc0a2..59db90626042 100644
> > --- a/arch/x86/boot/compressed/kaslr.c
> > +++ b/arch/x86/boot/compressed/kaslr.c
> > @@ -725,10 +725,20 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
> > * but in practice there's firmware where using that memory leads
> > * to crashes.
> > *
> > - * Only EFI_CONVENTIONAL_MEMORY is guaranteed to be free.
> > + * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if
> > + * supported) are guaranteed to be free.
> > */
> > - if (md->type != EFI_CONVENTIONAL_MEMORY)
> > +
> > + switch (md->type) {
> > + case EFI_CONVENTIONAL_MEMORY:
> > + break;
> > + case EFI_UNACCEPTED_MEMORY:
> > + if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
> > + break;
> > continue;
> > + default:
> > + continue;
> > + }
>
> Is there any special reason for this to be a switch-case or can it
> simply be a compound conditional if (bla...) ?

The equivalent 'if' statement is something like:

if (md->type != EFI_CONVENTIONAL_MEMORY &&
!(md->type == EFI_UNACCEPTED_MEMORY && IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)))
continue;

I find it harder to follow.

Do you want me to change to the 'if' anyway?

--
Kirill A. Shutemov

2022-05-09 03:59:07

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv5 08/12] x86/mm: Provide helpers for unaccepted memory

On Wed, May 04, 2022 at 01:12:06PM +0200, Borislav Petkov wrote:
> On Mon, Apr 25, 2022 at 06:39:30AM +0300, Kirill A. Shutemov wrote:
> > + unaccepted_memory = __va(boot_params.unaccepted_memory);
> > + range_start = start / PMD_SIZE;
> > +
> > + spin_lock_irqsave(&unaccepted_memory_lock, flags);
> > + for_each_set_bitrange_from(range_start, range_end, unaccepted_memory,
> > + DIV_ROUND_UP(end, PMD_SIZE)) {
> > + unsigned long len = range_end - range_start;
> > +
> > + /* Platform-specific memory-acceptance call goes here */
> > + panic("Cannot accept memory");
>
> Yeah, no, WARN_ON_ONCE() pls.

Failure to accept the memory is fatal. Why pretend it is not?

For TDX it will result in a crash on the first access. Prolonging the
suffering just make it harder to understand what happened.

> > + unsigned long flags;
> > + bool ret = false;
> > +
> > + spin_lock_irqsave(&unaccepted_memory_lock, flags);
> > + while (start < end) {
> > + if (test_bit(start / PMD_SIZE, unaccepted_memory)) {
> > + ret = true;
>
> Wait, what?
>
> That thing is lying: it'll return true for *some* PMD which is accepted
> but not the whole range of [start, end].

That's true. Note also that the check is inherently racy. Other CPU can
get the range or subrange accepted just after spin_unlock().

The check indicates that accept_memory() has to be called on the range
before first access.

Do you have problem with a name? Maybe has_unaccepted_memory()?

--
Kirill A. Shutemov

2022-05-10 20:18:34

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

On Fri, May 06, 2022 at 06:30:13PM +0300, Kirill A. Shutemov wrote:
> I find it harder to follow.

If in doubt, always consider using a helper function:

---

diff --git a/arch/x86/boot/compressed/efi.h b/arch/x86/boot/compressed/efi.h
index 7db2f41b54cd..cf475243b6d5 100644
--- a/arch/x86/boot/compressed/efi.h
+++ b/arch/x86/boot/compressed/efi.h
@@ -32,6 +32,7 @@ typedef struct {
} efi_table_hdr_t;

#define EFI_CONVENTIONAL_MEMORY 7
+#define EFI_UNACCEPTED_MEMORY 15

#define EFI_MEMORY_MORE_RELIABLE \
((u64)0x0000000000010000ULL) /* higher reliability */
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 28b91df9d31e..39bb4c319dfc 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -671,6 +671,23 @@ static bool process_mem_region(struct mem_vector *region,
}

#ifdef CONFIG_EFI
+
+/*
+ * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if supported) are guaranteed
+ * to be free.
+ */
+static inline bool memory_type_is_free(efi_memory_desc_t *md)
+{
+ if (md->type == EFI_CONVENTIONAL_MEMORY)
+ return true;
+
+ if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
+ if (md->type == EFI_UNACCEPTED_MEMORY)
+ return true;
+
+ return false;
+}
+
/*
* Returns true if we processed the EFI memmap, which we prefer over the E820
* table if it is available.
@@ -723,21 +740,9 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
* free memory and thus available to place the kernel image into,
* but in practice there's firmware where using that memory leads
* to crashes.
- *
- * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if
- * supported) are guaranteed to be free.
*/
-
- switch (md->type) {
- case EFI_CONVENTIONAL_MEMORY:
- break;
- case EFI_UNACCEPTED_MEMORY:
- if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
- break;
+ if (!memory_type_is_free(md))
continue;
- default:
- continue;
- }

if (efi_soft_reserve_enabled() &&
(md->attribute & EFI_MEMORY_SP))
--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-05-10 21:47:56

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv5 08/12] x86/mm: Provide helpers for unaccepted memory

On Fri, May 06, 2022 at 07:13:59PM +0300, Kirill A. Shutemov wrote:
> Failure to accept the memory is fatal. Why pretend it is not?
>
> For TDX it will result in a crash on the first access. Prolonging the
> suffering just make it harder to understand what happened.

Ok then. Does that panic message contain enough info so that the
acceptance failure can be debugged?

Just "Cannot accept memory" doesn't seem very helpful to me...

> That's true. Note also that the check is inherently racy. Other CPU can
> get the range or subrange accepted just after spin_unlock().
>
> The check indicates that accept_memory() has to be called on the range
> before first access.
>
> Do you have problem with a name? Maybe has_unaccepted_memory()?

I have a problem with the definition of this function, what it is
supposed to do and how it is supposed to be used.

Right now, it looks weird and strange: is it supposed to check for *all*
in-between (start, end)? It doesn't, atm, so what's the meaning of
@start and @end then at all?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-05-11 11:56:15

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv5 08/12] x86/mm: Provide helpers for unaccepted memory

On Wed, May 11, 2022 at 04:15:35AM +0300, Kirill A. Shutemov wrote:
> Okay. Fair enough. I will change it to
>
> panic("Cannot accept memory: unknown platform.");

So I haven't went all the way in the patchset but what I see is:

/* Platform-specific memory-acceptance call goes here */
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
tdx_accept_memory(range_start * PMD_SIZE,
range_end * PMD_SIZE);
} else {
panic("Cannot accept memory");
}

so how would you decide for some other platform that it should panic?

TDX should panic, that I get. But you can just as well WARN_ONCE() here
so that it gets fixed. Panicking is counterproductive.

> It checks if the range of memory requires accept_memory() call before it
> can be accessed.
>
> If any part of the range is not accepted, the call is required.
> accept_memory() knows what exactly has to be done. Note that
> accept_memory() call is harmless for any valid memory range.
> It can be called on already accepted memory.

Aaah, so that's what I was missing. So this function definitely needs a
comment ontop of it. And a name change to something like

range_contains_unaccepted_memory()

or so to actually state what it does.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-05-11 13:50:43

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv5 08/12] x86/mm: Provide helpers for unaccepted memory

On Tue, May 10, 2022 at 08:32:23PM +0200, Borislav Petkov wrote:
> On Fri, May 06, 2022 at 07:13:59PM +0300, Kirill A. Shutemov wrote:
> > Failure to accept the memory is fatal. Why pretend it is not?
> >
> > For TDX it will result in a crash on the first access. Prolonging the
> > suffering just make it harder to understand what happened.
>
> Ok then. Does that panic message contain enough info so that the
> acceptance failure can be debugged?
>
> Just "Cannot accept memory" doesn't seem very helpful to me...

Okay. Fair enough. I will change it to

panic("Cannot accept memory: unknown platform.");

>
> > That's true. Note also that the check is inherently racy. Other CPU can
> > get the range or subrange accepted just after spin_unlock().
> >
> > The check indicates that accept_memory() has to be called on the range
> > before first access.
> >
> > Do you have problem with a name? Maybe has_unaccepted_memory()?
>
> I have a problem with the definition of this function, what it is
> supposed to do and how it is supposed to be used.
>
> Right now, it looks weird and strange: is it supposed to check for *all*
> in-between (start, end)? It doesn't, atm, so what's the meaning of
> @start and @end then at all?

It checks if the range of memory requires accept_memory() call before it
can be accessed.

If any part of the range is not accepted, the call is required.
accept_memory() knows what exactly has to be done. Note that
accept_memory() call is harmless for any valid memory range.
It can be called on already accepted memory.

--
Kirill A. Shutemov

2022-05-13 06:18:37

by Dionna Amalie Glaze

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

Kirill, I've been tracking these changes to see if we can handle the
unaccepted memory type for SEV-SNP, but testing has been an issue. The
proposed patch in Ovmf to introduce unaccepted memory seems to have
stalled out last September
(https://www.mail-archive.com/[email protected]/msg35842.html) and
is particularly difficult to adapt to SEV-SNP since it doesn't follow
the TDVF way of initializing all memory. Is there a different
development I might have missed so that we might test these cases?
Without the UEFI introducing EFI_UNACCEPTED_MEMORY type, any kernel
uses are essentially dead code.

Thanks,
-Dionna

(apologies for repost in text mode)

On Tue, May 10, 2022 at 4:04 AM Borislav Petkov <[email protected]> wrote:
>
> On Fri, May 06, 2022 at 06:30:13PM +0300, Kirill A. Shutemov wrote:
> > I find it harder to follow.
>
> If in doubt, always consider using a helper function:
>
> ---
>
> diff --git a/arch/x86/boot/compressed/efi.h b/arch/x86/boot/compressed/efi.h
> index 7db2f41b54cd..cf475243b6d5 100644
> --- a/arch/x86/boot/compressed/efi.h
> +++ b/arch/x86/boot/compressed/efi.h
> @@ -32,6 +32,7 @@ typedef struct {
> } efi_table_hdr_t;
>
> #define EFI_CONVENTIONAL_MEMORY 7
> +#define EFI_UNACCEPTED_MEMORY 15
>
> #define EFI_MEMORY_MORE_RELIABLE \
> ((u64)0x0000000000010000ULL) /* higher reliability */
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index 28b91df9d31e..39bb4c319dfc 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -671,6 +671,23 @@ static bool process_mem_region(struct mem_vector *region,
> }
>
> #ifdef CONFIG_EFI
> +
> +/*
> + * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if supported) are guaranteed
> + * to be free.
> + */
> +static inline bool memory_type_is_free(efi_memory_desc_t *md)
> +{
> + if (md->type == EFI_CONVENTIONAL_MEMORY)
> + return true;
> +
> + if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
> + if (md->type == EFI_UNACCEPTED_MEMORY)
> + return true;
> +
> + return false;
> +}
> +
> /*
> * Returns true if we processed the EFI memmap, which we prefer over the E820
> * table if it is available.
> @@ -723,21 +740,9 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
> * free memory and thus available to place the kernel image into,
> * but in practice there's firmware where using that memory leads
> * to crashes.
> - *
> - * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if
> - * supported) are guaranteed to be free.
> */
> -
> - switch (md->type) {
> - case EFI_CONVENTIONAL_MEMORY:
> - break;
> - case EFI_UNACCEPTED_MEMORY:
> - if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
> - break;
> + if (!memory_type_is_free(md))
> continue;
> - default:
> - continue;
> - }
>
> if (efi_soft_reserve_enabled() &&
> (md->attribute & EFI_MEMORY_SP))
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette



--
-Dionna Glaze, PhD (she/her)

2022-05-14 01:25:00

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

On Fri, May 13, 2022 at 11:01:43AM +0200, Borislav Petkov wrote:
> + mroth
> - brijesh
>
> On Thu, May 12, 2022 at 10:34:02PM -0700, Dionna Amalie Glaze wrote:
> > Kirill, I've been tracking these changes to see if we can handle the
> > unaccepted memory type for SEV-SNP, but testing has been an issue. The
> > proposed patch in Ovmf to introduce unaccepted memory seems to have
> > stalled out last September
> > (https://www.mail-archive.com/[email protected]/msg35842.html) and
> > is particularly difficult to adapt to SEV-SNP since it doesn't follow
> > the TDVF way of initializing all memory. Is there a different
> > development I might have missed so that we might test these cases?
> > Without the UEFI introducing EFI_UNACCEPTED_MEMORY type, any kernel
> > uses are essentially dead code.

+ Min, Jiaqi.

I don't follow firmware development. Min, Jiaqi, could you comment?

--
Kirill A. Shutemov

2022-05-14 04:03:32

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

+ mroth
- brijesh

On Thu, May 12, 2022 at 10:34:02PM -0700, Dionna Amalie Glaze wrote:
> Kirill, I've been tracking these changes to see if we can handle the
> unaccepted memory type for SEV-SNP, but testing has been an issue. The
> proposed patch in Ovmf to introduce unaccepted memory seems to have
> stalled out last September
> (https://www.mail-archive.com/[email protected]/msg35842.html) and
> is particularly difficult to adapt to SEV-SNP since it doesn't follow
> the TDVF way of initializing all memory. Is there a different
> development I might have missed so that we might test these cases?
> Without the UEFI introducing EFI_UNACCEPTED_MEMORY type, any kernel
> uses are essentially dead code.
>
> Thanks,
> -Dionna
>
> (apologies for repost in text mode)
>
> On Tue, May 10, 2022 at 4:04 AM Borislav Petkov <[email protected]> wrote:
> >
> > On Fri, May 06, 2022 at 06:30:13PM +0300, Kirill A. Shutemov wrote:
> > > I find it harder to follow.
> >
> > If in doubt, always consider using a helper function:
> >
> > ---
> >
> > diff --git a/arch/x86/boot/compressed/efi.h b/arch/x86/boot/compressed/efi.h
> > index 7db2f41b54cd..cf475243b6d5 100644
> > --- a/arch/x86/boot/compressed/efi.h
> > +++ b/arch/x86/boot/compressed/efi.h
> > @@ -32,6 +32,7 @@ typedef struct {
> > } efi_table_hdr_t;
> >
> > #define EFI_CONVENTIONAL_MEMORY 7
> > +#define EFI_UNACCEPTED_MEMORY 15
> >
> > #define EFI_MEMORY_MORE_RELIABLE \
> > ((u64)0x0000000000010000ULL) /* higher reliability */
> > diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> > index 28b91df9d31e..39bb4c319dfc 100644
> > --- a/arch/x86/boot/compressed/kaslr.c
> > +++ b/arch/x86/boot/compressed/kaslr.c
> > @@ -671,6 +671,23 @@ static bool process_mem_region(struct mem_vector *region,
> > }
> >
> > #ifdef CONFIG_EFI
> > +
> > +/*
> > + * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if supported) are guaranteed
> > + * to be free.
> > + */
> > +static inline bool memory_type_is_free(efi_memory_desc_t *md)
> > +{
> > + if (md->type == EFI_CONVENTIONAL_MEMORY)
> > + return true;
> > +
> > + if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
> > + if (md->type == EFI_UNACCEPTED_MEMORY)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > /*
> > * Returns true if we processed the EFI memmap, which we prefer over the E820
> > * table if it is available.
> > @@ -723,21 +740,9 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
> > * free memory and thus available to place the kernel image into,
> > * but in practice there's firmware where using that memory leads
> > * to crashes.
> > - *
> > - * Only EFI_CONVENTIONAL_MEMORY and EFI_UNACCEPTED_MEMORY (if
> > - * supported) are guaranteed to be free.
> > */
> > -
> > - switch (md->type) {
> > - case EFI_CONVENTIONAL_MEMORY:
> > - break;
> > - case EFI_UNACCEPTED_MEMORY:
> > - if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY))
> > - break;
> > + if (!memory_type_is_free(md))
> > continue;
> > - default:
> > - continue;
> > - }
> >
> > if (efi_soft_reserve_enabled() &&
> > (md->attribute & EFI_MEMORY_SP))
> > --
> > Regards/Gruss,
> > Boris.
> >
> > https://people.kernel.org/tglx/notes-about-netiquette
>
>
>
> --
> -Dionna Glaze, PhD (she/her)
>

--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Ivo Totev, HRB 36809, AG Nürnberg

2022-05-16 23:22:19

by Xu, Min M

[permalink] [raw]
Subject: RE: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

On May 13, 2022 10:45 PM, Kirill A. Shutemov wrote:
> On Fri, May 13, 2022 at 11:01:43AM +0200, Borislav Petkov wrote:
> > + mroth
> > - brijesh
> >
> > On Thu, May 12, 2022 at 10:34:02PM -0700, Dionna Amalie Glaze wrote:
> > > Kirill, I've been tracking these changes to see if we can handle the
> > > unaccepted memory type for SEV-SNP, but testing has been an issue.
> > > The proposed patch in Ovmf to introduce unaccepted memory seems to
> > > have stalled out last September
> > > (https://www.mail-archive.com/[email protected]/msg35842.html)
> > > and is particularly difficult to adapt to SEV-SNP since it doesn't
> > > follow the TDVF way of initializing all memory. Is there a different
> > > development I might have missed so that we might test these cases?
> > > Without the UEFI introducing EFI_UNACCEPTED_MEMORY type, any
> kernel
> > > uses are essentially dead code.
>
> + Min, Jiaqi.
>
> I don't follow firmware development. Min, Jiaqi, could you comment?
>
We have prepared the patch for unaccepted memory and it is now working in our internal release.
But there is an obstacle to upstream it to edk2 master branch.
The patch-set depends on the definition of UEFI_RESOURCE_MEMORY_UNACCEPTED in PI spec. This is proposed in https://github.com/microsoft/mu_basecore/pull/66/files#diff-b20a11152d1ce9249c691be5690b4baf52069efadf2e2546cdd2eb663d80c9e4R237, according to UEFI-Code-First. The proposal was approved in 2021 in UEFI Mantis, and will be added to the new PI.next specification. (Till now it has not been added in the latest PI spec.)
So UEFI_RESOURCE_MEMORY_UNACCEPTED cannot be added in MdePkg which make it difficult to submit the patch to edk2 community for review. See this link: https://edk2.groups.io/g/devel/message/87558

Please be noted: UEFI_RESOURCE_MEMORY_UNACCEPTED (defined in PI spec) is different from EFI_UNACCEPTED_MEMORY (defined in UEFI spec)

I will submit the patch-set once the new definition is added in the new PI.next spec.

Thanks
Min

2022-06-01 17:57:01

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory


> Hi y'all, I've made minimal changes to OVMF to prevalidate only up to
> 4GB and leave the rest unaccepted, as Thomas Lendacky recommended
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAMDESE%2Fovmf%2Fpull%2F4%23issuecomment-1138606275&amp;data=05%7C01%7Cpankaj.gupta%40amd.com%7Cde8fd09ad93f4420bd7408da43568f68%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637896336342540814%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=K93%2F1FrPOo4bIWcssHoisM8vDkOBjWh69bUWosT%2Bt0E%3D&amp;reserved=0 and ran
> a memtouch test to see if this change behaves as expected. One thing
> that struck me is that an 8GB machine reports 2044MB free with this
> change (free -k) whereas without it, I see 7089MB free. I think that
> unaccepted memory should be classified as free in meminfo, no? I'm not
> familiar enough with that code to say what specific change needs to be
> made.
>

Is it memory accounting issue when accepting all the memory at boot time
compared to 4GB:4GB preboot_acceptance:use_time_acceptance split?

You said you ran memtouch (don't know how it works, assuming it uses
memory)? Doesn't that mean most of the memory used and hence accepted?
So, free memory reduced?

Just trying to understand the issue.

Thanks,
Pankaj
>
>
> On Sun, May 15, 2022 at 11:47 PM Xu, Min M <[email protected]> wrote:
>>
>> On May 13, 2022 10:45 PM, Kirill A. Shutemov wrote:
>>> On Fri, May 13, 2022 at 11:01:43AM +0200, Borislav Petkov wrote:
>>>> + mroth
>>>> - brijesh
>>>>
>>>> On Thu, May 12, 2022 at 10:34:02PM -0700, Dionna Amalie Glaze wrote:
>>>>> Kirill, I've been tracking these changes to see if we can handle the
>>>>> unaccepted memory type for SEV-SNP, but testing has been an issue.
>>>>> The proposed patch in Ovmf to introduce unaccepted memory seems to
>>>>> have stalled out last September
>>>>> (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mail-archive.com%2Fdevel%40edk2.groups.io%2Fmsg35842.html&amp;data=05%7C01%7Cpankaj.gupta%40amd.com%7Cde8fd09ad93f4420bd7408da43568f68%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637896336342540814%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=Hku8nQJGOg%2FdQqypHxw2eLFG0e%2FE6HoF5VXSIhMpmx0%3D&amp;reserved=0)
>>>>> and is particularly difficult to adapt to SEV-SNP since it doesn't
>>>>> follow the TDVF way of initializing all memory. Is there a different
>>>>> development I might have missed so that we might test these cases?
>>>>> Without the UEFI introducing EFI_UNACCEPTED_MEMORY type, any
>>> kernel
>>>>> uses are essentially dead code.
>>>
>>> + Min, Jiaqi.
>>>
>>> I don't follow firmware development. Min, Jiaqi, could you comment?
>>>
>> We have prepared the patch for unaccepted memory and it is now working in our internal release.
>> But there is an obstacle to upstream it to edk2 master branch.
>> The patch-set depends on the definition of UEFI_RESOURCE_MEMORY_UNACCEPTED in PI spec. This is proposed in https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fmicrosoft%2Fmu_basecore%2Fpull%2F66%2Ffiles%23diff-b20a11152d1ce9249c691be5690b4baf52069efadf2e2546cdd2eb663d80c9e4R237&amp;data=05%7C01%7Cpankaj.gupta%40amd.com%7Cde8fd09ad93f4420bd7408da43568f68%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637896336342540814%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=v7s68GZWXJfaXB7vfvXjAlTD2KLOSghk%2Bj3GXF3FTVg%3D&amp;reserved=0, according to UEFI-Code-First. The proposal was approved in 2021 in UEFI Mantis, and will be added to the new PI.next specification. (Till now it has not been added in the latest PI spec.)
>> So UEFI_RESOURCE_MEMORY_UNACCEPTED cannot be added in MdePkg which make it difficult to submit the patch to edk2 community for review. See this link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fedk2.groups.io%2Fg%2Fdevel%2Fmessage%2F87558&amp;data=05%7C01%7Cpankaj.gupta%40amd.com%7Cde8fd09ad93f4420bd7408da43568f68%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637896336342540814%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=WVIJ2yRRd2URwIF85Dp0WD4ovibZlsobijIGbN6MWZQ%3D&amp;reserved=0
>>
>> Please be noted: UEFI_RESOURCE_MEMORY_UNACCEPTED (defined in PI spec) is different from EFI_UNACCEPTED_MEMORY (defined in UEFI spec)
>>
>> I will submit the patch-set once the new definition is added in the new PI.next spec.
>>
>> Thanks
>> Min
>
>
>


2022-06-01 19:42:38

by Dionna Amalie Glaze

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

Hi y'all, I've made minimal changes to OVMF to prevalidate only up to
4GB and leave the rest unaccepted, as Thomas Lendacky recommended
https://github.com/AMDESE/ovmf/pull/4#issuecomment-1138606275 and ran
a memtouch test to see if this change behaves as expected. One thing
that struck me is that an 8GB machine reports 2044MB free with this
change (free -k) whereas without it, I see 7089MB free. I think that
unaccepted memory should be classified as free in meminfo, no? I'm not
familiar enough with that code to say what specific change needs to be
made.

(resent in text mode)


On Sun, May 15, 2022 at 11:47 PM Xu, Min M <[email protected]> wrote:
>
> On May 13, 2022 10:45 PM, Kirill A. Shutemov wrote:
> > On Fri, May 13, 2022 at 11:01:43AM +0200, Borislav Petkov wrote:
> > > + mroth
> > > - brijesh
> > >
> > > On Thu, May 12, 2022 at 10:34:02PM -0700, Dionna Amalie Glaze wrote:
> > > > Kirill, I've been tracking these changes to see if we can handle the
> > > > unaccepted memory type for SEV-SNP, but testing has been an issue.
> > > > The proposed patch in Ovmf to introduce unaccepted memory seems to
> > > > have stalled out last September
> > > > (https://www.mail-archive.com/[email protected]/msg35842.html)
> > > > and is particularly difficult to adapt to SEV-SNP since it doesn't
> > > > follow the TDVF way of initializing all memory. Is there a different
> > > > development I might have missed so that we might test these cases?
> > > > Without the UEFI introducing EFI_UNACCEPTED_MEMORY type, any
> > kernel
> > > > uses are essentially dead code.
> >
> > + Min, Jiaqi.
> >
> > I don't follow firmware development. Min, Jiaqi, could you comment?
> >
> We have prepared the patch for unaccepted memory and it is now working in our internal release.
> But there is an obstacle to upstream it to edk2 master branch.
> The patch-set depends on the definition of UEFI_RESOURCE_MEMORY_UNACCEPTED in PI spec. This is proposed in https://github.com/microsoft/mu_basecore/pull/66/files#diff-b20a11152d1ce9249c691be5690b4baf52069efadf2e2546cdd2eb663d80c9e4R237, according to UEFI-Code-First. The proposal was approved in 2021 in UEFI Mantis, and will be added to the new PI.next specification. (Till now it has not been added in the latest PI spec.)
> So UEFI_RESOURCE_MEMORY_UNACCEPTED cannot be added in MdePkg which make it difficult to submit the patch to edk2 community for review. See this link: https://edk2.groups.io/g/devel/message/87558
>
> Please be noted: UEFI_RESOURCE_MEMORY_UNACCEPTED (defined in PI spec) is different from EFI_UNACCEPTED_MEMORY (defined in UEFI spec)
>
> I will submit the patch-set once the new definition is added in the new PI.next spec.
>
> Thanks
> Min



--
-Dionna Glaze, PhD (she/her)

2022-06-01 20:58:12

by Dionna Amalie Glaze

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

The memory accounting in Linux is probably the issue. Both times I ran
the test were from a freshly booted VM. The test parses the output of
$(free -k) to determine the amount of free memory it should allocate
and write/read from, with a given stride of pages to skip before
touching the next page.

We grab the third column of numbers from the Mem output that looks like this

total used free shared buff/cache available
Mem: 65856604 4128688 48558952 11208 13168964 60942928
Swap: 1953788 118124 1835664

So my workstation has 48558952 free bytes. We take that, give it to
memtouch to allocate that much anonymous memory rounded down to the
nearest MB with mmap and randomly read/write the buffer.

For an 8GB machine, the UEFI will have the initial 0-0xA000 memory and
0x10_0000 to 0xC00_0000 (beginning of mmio hole) prevalidated. The
next 5GB is classified as the UEFI v2.9 memory type
EFI_RESOURCE_MEMORY_UNACCEPTED, 0x1_4000_000 to 0x2_0000_0000.
The Linux e820 map should see that range as unaccepted rather than
EFI_CONVENTIONAL_MEMORY (i.e., EDK2's EFI_RESOURCE_SYSTEM_MEMORY), but
I think it needs to be accounted as free conventional memory.

So when I see 2044MB free vs 7089MB free in my VMs, the two are
roughly 5GB different.

On Wed, Jun 1, 2022 at 8:49 AM Gupta, Pankaj <[email protected]> wrote:
>
>
> > Hi y'all, I've made minimal changes to OVMF to prevalidate only up to
> > 4GB and leave the rest unaccepted, as Thomas Lendacky recommended
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAMDESE%2Fovmf%2Fpull%2F4%23issuecomment-1138606275&amp;data=05%7C01%7Cpankaj.gupta%40amd.com%7Cde8fd09ad93f4420bd7408da43568f68%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637896336342540814%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=K93%2F1FrPOo4bIWcssHoisM8vDkOBjWh69bUWosT%2Bt0E%3D&amp;reserved=0 and ran
> > a memtouch test to see if this change behaves as expected. One thing
> > that struck me is that an 8GB machine reports 2044MB free with this
> > change (free -k) whereas without it, I see 7089MB free. I think that
> > unaccepted memory should be classified as free in meminfo, no? I'm not
> > familiar enough with that code to say what specific change needs to be
> > made.
> >
>
> Is it memory accounting issue when accepting all the memory at boot time
> compared to 4GB:4GB preboot_acceptance:use_time_acceptance split?
>
> You said you ran memtouch (don't know how it works, assuming it uses
> memory)? Doesn't that mean most of the memory used and hence accepted?
> So, free memory reduced?
>
> Just trying to understand the issue.
>
> Thanks,
> Pankaj
> >
> >
> > On Sun, May 15, 2022 at 11:47 PM Xu, Min M <[email protected]> wrote:
> >>
> >> On May 13, 2022 10:45 PM, Kirill A. Shutemov wrote:
> >>> On Fri, May 13, 2022 at 11:01:43AM +0200, Borislav Petkov wrote:
> >>>> + mroth
> >>>> - brijesh
> >>>>
> >>>> On Thu, May 12, 2022 at 10:34:02PM -0700, Dionna Amalie Glaze wrote:
> >>>>> Kirill, I've been tracking these changes to see if we can handle the
> >>>>> unaccepted memory type for SEV-SNP, but testing has been an issue.
> >>>>> The proposed patch in Ovmf to introduce unaccepted memory seems to
> >>>>> have stalled out last September
> >>>>> (https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mail-archive.com%2Fdevel%40edk2.groups.io%2Fmsg35842.html&amp;data=05%7C01%7Cpankaj.gupta%40amd.com%7Cde8fd09ad93f4420bd7408da43568f68%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637896336342540814%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=Hku8nQJGOg%2FdQqypHxw2eLFG0e%2FE6HoF5VXSIhMpmx0%3D&amp;reserved=0)
> >>>>> and is particularly difficult to adapt to SEV-SNP since it doesn't
> >>>>> follow the TDVF way of initializing all memory. Is there a different
> >>>>> development I might have missed so that we might test these cases?
> >>>>> Without the UEFI introducing EFI_UNACCEPTED_MEMORY type, any
> >>> kernel
> >>>>> uses are essentially dead code.
> >>>
> >>> + Min, Jiaqi.
> >>>
> >>> I don't follow firmware development. Min, Jiaqi, could you comment?
> >>>
> >> We have prepared the patch for unaccepted memory and it is now working in our internal release.
> >> But there is an obstacle to upstream it to edk2 master branch.
> >> The patch-set depends on the definition of UEFI_RESOURCE_MEMORY_UNACCEPTED in PI spec. This is proposed in https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fmicrosoft%2Fmu_basecore%2Fpull%2F66%2Ffiles%23diff-b20a11152d1ce9249c691be5690b4baf52069efadf2e2546cdd2eb663d80c9e4R237&amp;data=05%7C01%7Cpankaj.gupta%40amd.com%7Cde8fd09ad93f4420bd7408da43568f68%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637896336342540814%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=v7s68GZWXJfaXB7vfvXjAlTD2KLOSghk%2Bj3GXF3FTVg%3D&amp;reserved=0, according to UEFI-Code-First. The proposal was approved in 2021 in UEFI Mantis, and will be added to the new PI.next specification. (Till now it has not been added in the latest PI spec.)
> >> So UEFI_RESOURCE_MEMORY_UNACCEPTED cannot be added in MdePkg which make it difficult to submit the patch to edk2 community for review. See this link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fedk2.groups.io%2Fg%2Fdevel%2Fmessage%2F87558&amp;data=05%7C01%7Cpankaj.gupta%40amd.com%7Cde8fd09ad93f4420bd7408da43568f68%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637896336342540814%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=WVIJ2yRRd2URwIF85Dp0WD4ovibZlsobijIGbN6MWZQ%3D&amp;reserved=0
> >>
> >> Please be noted: UEFI_RESOURCE_MEMORY_UNACCEPTED (defined in PI spec) is different from EFI_UNACCEPTED_MEMORY (defined in UEFI spec)
> >>
> >> I will submit the patch-set once the new definition is added in the new PI.next spec.
> >>
> >> Thanks
> >> Min
> >
> >
> >
>


--
-Dionna Glaze, PhD (she/her)

2022-06-01 21:34:28

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

Hi--

On 6/1/22 09:20, Dionna Amalie Glaze wrote:
> The memory accounting in Linux is probably the issue. Both times I ran
> the test were from a freshly booted VM. The test parses the output of
> $(free -k) to determine the amount of free memory it should allocate
> and write/read from, with a given stride of pages to skip before
> touching the next page.
>
> We grab the third column of numbers from the Mem output that looks like this
>
> total used free shared buff/cache available
> Mem: 65856604 4128688 48558952 11208 13168964 60942928
> Swap: 1953788 118124 1835664
>
> So my workstation has 48558952 free bytes. We take that, give it to
> memtouch to allocate that much anonymous memory rounded down to the
> nearest MB with mmap and randomly read/write the buffer.
>
> For an 8GB machine, the UEFI will have the initial 0-0xA000 memory and
> 0x10_0000 to 0xC00_0000 (beginning of mmio hole) prevalidated. The
> next 5GB is classified as the UEFI v2.9 memory type
> EFI_RESOURCE_MEMORY_UNACCEPTED, 0x1_4000_000 to 0x2_0000_0000.
> The Linux e820 map should see that range as unaccepted rather than
> EFI_CONVENTIONAL_MEMORY (i.e., EDK2's EFI_RESOURCE_SYSTEM_MEMORY), but
> I think it needs to be accounted as free conventional memory.
>
> So when I see 2044MB free vs 7089MB free in my VMs, the two are
> roughly 5GB different.

Please see/read/use
https://people.kernel.org/tglx/notes-about-netiquette

--
~Randy

2022-06-01 21:54:38

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory


>> The memory accounting in Linux is probably the issue. Both times I ran
>> the test were from a freshly booted VM. The test parses the output of
>> $(free -k) to determine the amount of free memory it should allocate
>> and write/read from, with a given stride of pages to skip before
>> touching the next page.
>>
>> We grab the third column of numbers from the Mem output that looks like this
>>
>> total used free shared buff/cache available
>> Mem: 65856604 4128688 48558952 11208 13168964 60942928
>> Swap: 1953788 118124 1835664
>>
>> So my workstation has 48558952 free bytes. We take that, give it to
>> memtouch to allocate that much anonymous memory rounded down to the
>> nearest MB with mmap and randomly read/write the buffer.
>>
>> For an 8GB machine, the UEFI will have the initial 0-0xA000 memory and
>> 0x10_0000 to 0xC00_0000 (beginning of mmio hole) prevalidated. The
>> next 5GB is classified as the UEFI v2.9 memory type
>> EFI_RESOURCE_MEMORY_UNACCEPTED, 0x1_4000_000 to 0x2_0000_0000.
>> The Linux e820 map should see that range as unaccepted rather than
>> EFI_CONVENTIONAL_MEMORY (i.e., EDK2's EFI_RESOURCE_SYSTEM_MEMORY), but
>> I think it needs to be accounted as free conventional memory.
>>
>> So when I see 2044MB free vs 7089MB free in my VMs, the two are
>> roughly 5GB different.
>
> Please see/read/use
...

Apologies, some problem in my email client (appending long characters to
links), will find and correct.

Thank you for pointing.

Best regards,
Pankaj



2022-06-06 03:44:16

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory


> The memory accounting in Linux is probably the issue. Both times I ran
> the test were from a freshly booted VM. The test parses the output of
> $(free -k) to determine the amount of free memory it should allocate
> and write/read from, with a given stride of pages to skip before
> touching the next page.
>
> We grab the third column of numbers from the Mem output that looks like this
>
> total used free shared buff/cache available
> Mem: 65856604 4128688 48558952 11208 13168964 60942928
> Swap: 1953788 118124 1835664
>
> So my workstation has 48558952 free bytes. We take that, give it to
> memtouch to allocate that much anonymous memory rounded down to the
> nearest MB with mmap and randomly read/write the buffer.
>
> For an 8GB machine, the UEFI will have the initial 0-0xA000 memory and
> 0x10_0000 to 0xC00_0000 (beginning of mmio hole) prevalidated. The
> next 5GB is classified as the UEFI v2.9 memory type
> EFI_RESOURCE_MEMORY_UNACCEPTED, 0x1_4000_000 to 0x2_0000_0000.
> The Linux e820 map should see that range as unaccepted rather than
> EFI_CONVENTIONAL_MEMORY (i.e., EDK2's EFI_RESOURCE_SYSTEM_MEMORY), but
> I think it needs to be accounted as free conventional memory.

AFAIU the unaccepted memory also stays in buddy (first via slow path)
and should be accounted automatically in free?

>
> So when I see 2044MB free vs 7089MB free in my VMs, the two are
> roughly 5GB different.

Is it possible all memory got allocated with memblock? Maybe some
variable tests to validate with '/proc/meminfo | grep UnacceptedMem'
would give you more clue.

Thanks,
Pankaj

2022-06-06 03:59:28

by Dionna Amalie Glaze

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

On Thu, Jun 2, 2022 at 5:51 AM Gupta, Pankaj <[email protected]> wrote:
> AFAIU the unaccepted memory also stays in buddy (first via slow path)
> and should be accounted automatically in free?
>

No, the last patch adds unaccepted mem as a differently accounted memory type.

> >
> > So when I see 2044MB free vs 7089MB free in my VMs, the two are
> > roughly 5GB different.
>
> Is it possible all memory got allocated with memblock? Maybe some
> variable tests to validate with '/proc/meminfo | grep UnacceptedMem'
> would give you more clue.
>

free -k parses /proc/meminfo for MemFree and SwapFree in
/proc/meminfo, so it sounds like it should also add in UnacceptedMem.
We'll try that. Thanks.




--
-Dionna Glaze, PhD (she/her)

2022-06-07 20:44:31

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory


> Testing error on my part. Was using an adaptation of an old patch set.
> With Brijesh's SEV-SNP support adapted to this version on top of
> SEV-SNP guest patch series v12, I have no trouble with passing our
> memtouch test.

Ah Guest unaccepted memory validate patch was missing. Good to know it
worked for you. Thank you for sharing the result.

Best regards,
Pankaj


2022-06-08 05:20:13

by Dionna Amalie Glaze

[permalink] [raw]
Subject: Re: [PATCHv5 06/12] x86/boot/compressed: Handle unaccepted memory

> free -k parses /proc/meminfo for MemFree and SwapFree in
> /proc/meminfo, so it sounds like it should also add in UnacceptedMem.
> We'll try that. Thanks.
>

Testing error on my part. Was using an adaptation of an old patch set.
With Brijesh's SEV-SNP support adapted to this version on top of
SEV-SNP guest patch series v12, I have no trouble with passing our
memtouch test.

Reference: https://github.com/deeglaze/amdese-linux/tree/v12unaccepted-v5

--
-Dionna Glaze, PhD (she/her)