2023-06-04 14:59:35

by Huang, Kai

[permalink] [raw]
Subject: [PATCH v11 00/20] TDX host kernel support

Intel Trusted Domain Extensions (TDX) protects guest VMs from malicious
host and certain physical attacks. TDX specs are available in [1].

This series is the initial support to enable TDX with minimal code to
allow KVM to create and run TDX guests. KVM support for TDX is being
developed separately[2]. A new "userspace inaccessible memfd" approach
to support TDX private memory is also being developed[3]. The KVM will
only support the new "userspace inaccessible memfd" as TDX guest memory.

This series doesn't aim to support all functionalities, and doesn't aim
to resolve all things perfectly. All other optimizations will be posted
as follow-up once this initial TDX support is upstreamed.

Also, the patch to add the new kernel command line option tdx="force" isn't
included in this initial version, as Dave suggested it isn't mandatory. But
I will add one once this initial version gets merged.

(For memory hotplug, sorry for broadcasting widely but I cc'ed the
[email protected] following Kirill's suggestion so MM experts can also
help to provide comments.)

Hi Dave, Kirill, Tony, Peter, Thomas, Dan (and Intel reviewers),

The new relaxed TDX per-cpu initialization flow has been verified. The
TDX module can be initialized when there are offline cpus, and the
TDH.SYS.LP.INIT SEAMCALL can be made successfully later after module
initialization when the offline cpu is up.

This series mainly added code to handle the new TDX "partial write
machine check" erratum (SPR113) in [4].

And I would appreciate reviewed-by or acked-by tags if the patches look
good to you. Thanks in advance!

----- Changelog history: ------

- v10 -> v11:

- Addressed comments in v10
- Added patches to handle TDX "partial write machine check" erratum.
- Added a new patch to handle running out of entropy in common code.
- Fixed a bug in kexec() support.

v10: https://lore.kernel.org/kvm/[email protected]/

- v9 -> v10:

- Changed the per-cpu initialization handling
- Gave up "ensuring all online cpus are TDX-runnable when TDX module
is initialized", but just provide two basic functions, tdx_enable()
and tdx_cpu_enable(), to let the user of TDX to make sure the
tdx_cpu_enable() has been done successfully when the user wants to
use particular cpu for TDX.
- Thus, moved per-cpu initialization out of tdx_enable(). Now
tdx_enable() just assumes VMXON and tdx_cpu_enable() has been done
on all online cpus before calling it.
- Merged the tdx_enable() skeleton patch and per-cpu initialization
patch together to tell better story.
- Moved "SEAMCALL infrastructure" patch before the tdx_enable() patch.

v9: https://lore.kernel.org/lkml/[email protected]/

- v8 -> v9:

- Added patches to handle TDH.SYS.INIT and TDH.SYS.LP.INIT back.
- For other changes please refer to the changelog history in individual patches.

v8: https://lore.kernel.org/lkml/[email protected]/

- v7 -> v8:

- 200+ LOC removed (from 1800+ -> 1600+).
- Removed patches to do TDH.SYS.INIT and TDH.SYS.LP.INIT
(Dave/Peter/Thomas).
- Removed patch to shut down TDX module (Sean).
- For memory hotplug, changed to reject non-TDX memory from
arch_add_memory() to memory_notifier (Dan/David).
- Simplified the "skeletion patch" as a result of removing
TDH.SYS.LP.INIT patch.
- Refined changelog/comments for most of the patches (to tell better
story, remove silly comments, etc) (Dave).
- Added new 'struct tdmr_info_list' struct, and changed all TDMR related
patches to use it (Dave).
- Effectively merged patch "Reserve TDX module global KeyID" and
"Configure TDX module with TDMRs and global KeyID", and removed the
static variable 'tdx_global_keyid', following Dave's suggestion on
making tdx_sysinfo local variable.
- For detailed changes please see individual patch changelog history.

v7: https://lore.kernel.org/lkml/[email protected]/T/

- v6 -> v7:
- Added memory hotplug support.
- Changed when the list of "TDX-usable" memory regions is chosen, from
kernel boot time to TDX module initialization time.
- Addressed comments received in previous versions. (Andi/Dave).
- Improved the commit message and the comments of the kexec() support patch,
and the patch that handles returning PAMTs back to the kernel when TDX
module initialization fails. Please also see the "kexec()" section below.
- Changed the documentation patch accordingly.
- For all others please see individual patch changelog history.

v6: https://lore.kernel.org/lkml/[email protected]/T/

- v5 -> v6:

- Removed ACPI CPU/memory hotplug patches. (Intel internal discussion)
- Removed patch to disable driver-managed memory hotplug (Intel
internal discussion).
- Added one patch to introduce enum type for TDX supported page size
level to replace the hard-coded values in TDX guest code (Dave).
- Added one patch to make TDX depends on X2APIC being enabled (Dave).
- Added one patch to build all boot-time present memory regions as TDX
memory during kernel boot.
- Added Reviewed-by from others to some patches.
- For all others please see individual patch changelog history.

v5: https://lore.kernel.org/lkml/[email protected]/T/

- v4 -> v5:

This is essentially a resend of v4. Sorry I forgot to consult
get_maintainer.pl when sending out v4, so I forgot to add the linux-acpi
and linux-mm mailing lists and the relevant people for the 4 new patches.

There are also very minor code and commit message updates from v4:

- Rebased to latest tip/x86/tdx.
- Fixed a checkpatch issue that I missed in v4.
- Removed an obsoleted comment that I missed in patch 6.
- Very minor update to the commit message of patch 12.

For other changes to individual patches since v3, please refer to the
changelog history of individual patches (I just used v3 -> v5 since
there's basically no code change in v4).

v4: https://lore.kernel.org/lkml/98c84c31d8f062a0b50a69ef4d3188bc259f2af2.1654025431.git.kai.huang@intel.com/T/

- v3 -> v4 (addressed Dave's comments, and other comments from others):

- Simplified SEAMRR and TDX keyID detection.
- Added patches to handle ACPI CPU hotplug.
- Added patches to handle ACPI memory hotplug and driver managed memory
hotplug.
- Removed tdx_detect() but only use single tdx_init().
- Removed detecting TDX module via P-SEAMLDR.
- Changed from using e820 to using memblock to convert system RAM to TDX
memory.
- Excluded legacy PMEM from TDX memory.
- Removed the boot-time command line to disable TDX patch.
- Addressed comments for other individual patches (please see individual
patches).
- Improved the documentation patch based on the new implementation.

v3: https://lore.kernel.org/lkml/[email protected]/T/

- V2 -> v3:

- Addressed comments from Isaku.
- Fixed memory leak and unnecessary function argument in the patch to
configure the key for the global keyid (patch 17).
- Enhanced a little bit to the patch to get TDX module and CMR
information (patch 09).
- Fixed an unintended change in the patch to allocate PAMT (patch 13).
- Addressed comments from Kevin:
- Slight improvement to the commit message of patch 03.
- Removed WARN_ON_ONCE() in the check of cpus_booted_once_mask in
seamrr_enabled() (patch 04).
- Changed documentation patch to add TDX host kernel support materials
to Documentation/x86/tdx.rst together with the TDX guest stuff, instead
of a standalone file (patch 21)
- Very minor improvement in commit messages.

v2: https://lore.kernel.org/lkml/[email protected]/T/

- RFC (v1) -> v2:
- Rebased to Kirill's latest TDX guest code.
- Fixed two issues that are related to finding all RAM memory regions
based on e820.
- Minor improvement on comments and commit messages.

v1: https://lore.kernel.org/lkml/[email protected]/T/

== Background ==

TDX introduces a new CPU mode called Secure Arbitration Mode (SEAM)
and a new isolated range pointed to by the SEAM Range Register (SEAMRR).
A CPU-attested software module called 'the TDX module' runs in the new
isolated region as a trusted hypervisor to create/run protected VMs.

TDX also leverages Intel Multi-Key Total Memory Encryption (MKTME) to
provide crypto-protection to the VMs. TDX reserves part of MKTME KeyIDs
as TDX private KeyIDs, which are only accessible within the SEAM mode.

TDX is different from AMD SEV/SEV-ES/SEV-SNP, which uses a dedicated
secure processor to provide crypto-protection. The firmware running on
that secure processor plays a role similar to the TDX module.

The host kernel communicates with SEAM software via a new SEAMCALL
instruction. This is conceptually similar to a guest->host hypercall,
except it is made from the host to SEAM software instead.
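
Roughly, all SEAMCALLs in this series go through a single C wrapper along
these lines (a simplified sketch only; the argument names are illustrative
and match how seamcall() is used in patch 08 below, see the "SEAMCALL
infrastructure" patch for the real code):

/*
 * Sketch of the SEAMCALL wrapper, not the actual implementation:
 *
 *   @fn:            SEAMCALL leaf number, e.g. TDH_SYS_INFO
 *   @rcx/@rdx/@r8/@r9: leaf-specific input registers
 *   @seamcall_ret:  optional raw completion status from the TDX module
 *   @out:           optional leaf-specific output registers
 *
 * Returns 0 on success or a negative error code on failure.
 */
static int seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
                    u64 *seamcall_ret, struct tdx_module_output *out);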

Before being able to manage TD guests, the TDX module must be loaded
and properly initialized. This series assumes the TDX module is loaded
by BIOS before the kernel boots.

How to initialize the TDX module is described in the TDX module 1.0
specification, chapter "13. Intel TDX Module Lifecycle: Enumeration,
Initialization and Shutdown".

== Design Considerations ==

1. Initialize the TDX module at runtime

There are basically two ways the TDX module could be initialized: either
in early boot, or at runtime before the first TDX guest is run. This
series implements the runtime initialization.

Also, TDX requires a per-cpu initialization SEAMCALL to be done before
making any other SEAMCALL on that cpu.

This series adds two functions: tdx_cpu_enable() and tdx_enable() to do
per-cpu initialization and module initialization respectively.
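
Roughly, a user of TDX (e.g. KVM) is expected to use them along these lines
(a simplified sketch only; locking, VMXON and detailed error handling are the
caller's responsibility and are mostly omitted here, and the wrapper names
are illustrative):

/* Runs on each online cpu; tdx_cpu_enable() must run on the cpu itself. */
static void do_tdx_cpu_enable(void *failed)
{
        if (tdx_cpu_enable())
                *(int *)failed = -EINVAL;
}

static int example_enable_tdx(void)
{
        int failed = 0;

        /* The caller is assumed to have done VMXON on all online cpus. */
        on_each_cpu(do_tdx_cpu_enable, &failed, 1);
        if (failed)
                return failed;

        /* One-time, system-wide TDX module initialization. */
        return tdx_enable();
}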

2. CPU hotplug

TDX doesn't support physical (ACPI) CPU hotplug. A non-buggy BIOS should
never support hotpluggable CPU devices and/or deliver ACPI CPU hotplug
events to the kernel. This series doesn't handle physical (ACPI) CPU
hotplug at all but depends on the BIOS to behave correctly.

Also, tdx_cpu_enable() will simply return an error for any hot-added cpu if
something insane happens.

Note TDX works with logical CPU online/offline, thus this series still
allows logical CPU online/offline.

3. Kernel policy on TDX memory

The TDX module reports a list of "Convertible Memory Regions" (CMRs) to
indicate which memory regions are TDX-capable. The TDX architecture
allows the VMM to designate specific convertible memory regions as usable
for TDX private memory.

The initial support of TDX guests will only allocate TDX private memory
from the global page allocator. This series chooses to designate _all_
system RAM in the core-mm at the time of initializing the TDX module as
TDX memory, to guarantee all pages in the page allocator are TDX pages.
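
In code, that policy boils down to roughly the following (sketch only; the
helper and list names are illustrative, and the real code added later in
this series handles more corner cases):

/*
 * Treat every memory region present in memblock at TDX module
 * initialization time as TDX memory.  add_tdx_memblock() stands in for
 * the helper that records a region in the TDX memory list.
 */
static int build_tdx_memlist(struct list_head *tmb_list)
{
        unsigned long start_pfn, end_pfn;
        int i, ret;

        for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
                ret = add_tdx_memblock(tmb_list, start_pfn, end_pfn);
                if (ret)
                        return ret;
        }

        return 0;
}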

4. Memory Hotplug

After the kernel passes all "TDX-usable" memory regions to the TDX
module, the set of "TDX-usable" memory regions is fixed for the module's
runtime. No more "TDX-usable" memory can be added to the TDX module
after that.

To achieve above "to guarantee all pages in the page allocator are TDX
pages", this series simply choose to reject any non-TDX-usable memory in
memory hotplug.
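
A sketch of what that rejection looks like via a memory notifier (names such
as is_tdx_memory() are assumptions used for illustration; the actual code in
this series may differ in details):

static int tdx_memory_notifier(struct notifier_block *nb,
                               unsigned long action, void *v)
{
        struct memory_notify *mn = v;

        if (action != MEM_GOING_ONLINE)
                return NOTIFY_OK;

        /*
         * Only allow onlining memory that is fully covered by the
         * "TDX-usable" memory regions passed to the TDX module.
         */
        return is_tdx_memory(mn->start_pfn, mn->start_pfn + mn->nr_pages) ?
                NOTIFY_OK : NOTIFY_BAD;
}

static struct notifier_block tdx_memory_nb = {
        .notifier_call = tdx_memory_notifier,
};

The notifier would be registered once during boot with
register_memory_notifier(&tdx_memory_nb).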

5. Physical Memory Hotplug

Note TDX assumes convertible memory is always physically present during
the machine's runtime. A non-buggy BIOS should never support hot-removal
of any convertible memory. This implementation doesn't handle ACPI memory
removal but depends on the BIOS to behave correctly.

Also, if something insane really happens, 4) makes sure that either TDX
cannot be enabled or hot-added memory will be rejected after TDX gets
enabled.

6. Kexec()

Similar to AMD's SME, in kexec() the kernel needs to flush dirty cachelines
of TDX private memory, otherwise they may silently corrupt the new kernel.
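
Conceptually this is just a WBINVD on every cpu before jumping to the new
kernel. A rough sketch of the idea (illustrative only; the series hooks the
flush into the existing kexec/reboot cpu-stopping paths rather than adding a
separate pass, and platform_tdx_enabled() stands in for "TDX is usable on
this platform"):

static void tdx_flush_cpu_cache(void *unused)
{
        /* Write back and invalidate all cachelines on this cpu. */
        native_wbinvd();
}

static void tdx_kexec_flush_caches(void)
{
        if (platform_tdx_enabled())
                on_each_cpu(tdx_flush_cpu_cache, NULL, 1);
}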

7. TDX erratum

The first few generations of TDX hardware have an erratum. A partial
write to a TDX private memory cacheline will silently "poison" the
line. Subsequent reads will consume the poison and generate a machine
check.

The fast warm reset reboot doesn't reset TDX private memory. With this
erratum, all TDX private pages need to be converted back to normal
before a fast warm reset reboot or booting into the new kernel in kexec().
Otherwise, the new kernel may get unexpected machine checks.

Under normal conditions, triggering the erratum in Linux requires some kind
of kernel bug involving relatively exotic memory writes to TDX private
memory, and it will manifest via spurious-looking machine checks when
reading the affected memory. The machine check handler is improved to deal
with such machine checks.


[1]: TDX specs
https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html

[2]: KVM TDX basic feature support
https://lore.kernel.org/kvm/[email protected]/T/#t

[3]: KVM: mm: fd-based approach for supporting KVM
https://lore.kernel.org/kvm/[email protected]/

[4]: TDX erratum
https://cdrdv2.intel.com/v1/dl/getContent/772415?explicitVersion=true



Kai Huang (20):
x86/tdx: Define TDX supported page sizes as macros
x86/virt/tdx: Detect TDX during kernel boot
x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC
x86/cpu: Detect TDX partial write machine check erratum
x86/virt/tdx: Add SEAMCALL infrastructure
x86/virt/tdx: Handle SEAMCALL running out of entropy error
x86/virt/tdx: Add skeleton to enable TDX on demand
x86/virt/tdx: Get information about TDX module and TDX-capable memory
x86/virt/tdx: Use all system memory when initializing TDX module as
TDX memory
x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX
memory regions
x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions
x86/virt/tdx: Allocate and set up PAMTs for TDMRs
x86/virt/tdx: Designate reserved areas for all TDMRs
x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID
x86/virt/tdx: Configure global KeyID on all packages
x86/virt/tdx: Initialize all TDMRs
x86/kexec: Flush cache of TDX private memory
x86: Handle TDX erratum to reset TDX private memory during kexec() and
reboot
x86/mce: Improve error log of kernel space TDX #MC due to erratum
Documentation/x86: Add documentation for TDX host support

Documentation/arch/x86/tdx.rst | 186 +++-
arch/x86/Kconfig | 15 +
arch/x86/Makefile | 2 +
arch/x86/coco/tdx/tdx.c | 6 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 3 +
arch/x86/include/asm/tdx.h | 23 +
arch/x86/include/asm/x86_init.h | 1 +
arch/x86/kernel/cpu/intel.c | 21 +
arch/x86/kernel/cpu/mce/core.c | 33 +
arch/x86/kernel/process.c | 7 +-
arch/x86/kernel/reboot.c | 16 +
arch/x86/kernel/setup.c | 2 +
arch/x86/kernel/x86_init.c | 2 +
arch/x86/virt/Makefile | 2 +
arch/x86/virt/vmx/Makefile | 2 +
arch/x86/virt/vmx/tdx/Makefile | 2 +
arch/x86/virt/vmx/tdx/seamcall.S | 52 +
arch/x86/virt/vmx/tdx/tdx.c | 1526 ++++++++++++++++++++++++++++
arch/x86/virt/vmx/tdx/tdx.h | 161 +++
arch/x86/virt/vmx/tdx/tdxcall.S | 19 +-
21 files changed, 2065 insertions(+), 17 deletions(-)
create mode 100644 arch/x86/virt/Makefile
create mode 100644 arch/x86/virt/vmx/Makefile
create mode 100644 arch/x86/virt/vmx/tdx/Makefile
create mode 100644 arch/x86/virt/vmx/tdx/seamcall.S
create mode 100644 arch/x86/virt/vmx/tdx/tdx.c
create mode 100644 arch/x86/virt/vmx/tdx/tdx.h


base-commit: 122333d6bd229af279cdb35d1b874b71b3b9ccfb
--
2.40.1



2023-06-04 14:59:36

by Huang, Kai

[permalink] [raw]
Subject: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

Start to transit out the "multi-steps" to initialize the TDX module.

TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums. Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR). During boot, the firmware builds a list of all of the memory
ranges which can provide the TDX security guarantees.

CMRs tell the kernel which memory is TDX compatible. The kernel takes
CMRs (plus a little more metadata) and constructs "TD Memory Regions"
(TDMRs). TDMRs let the kernel grant TDX protections to some or all of
the CMR areas.

The TDX module also reports necessary information to let the kernel
build TDMRs and run TDX guests in structure 'tdsysinfo_struct'. The
list of CMRs, along with the TDX module information, is available to
the kernel by querying the TDX module.

As a preparation to construct TDMRs, get the TDX module information and
the list of CMRs. Print out CMRs to help user to decode which memory
regions are TDX convertible.

The 'tdsysinfo_struct' is fairly large (1024 bytes) and contains a lot
of info about the TDX module. Fully define the entire structure, but
only use the fields necessary to build the TDMRs and pr_info() some
basics about the module. The rest of the fields will get used by KVM.

For now both 'tdsysinfo_struct' and CMRs are only used during the module
initialization. But because they are both relatively big, declare them
inside the module initialization function but as static variables.

Signed-off-by: Kai Huang <[email protected]>
Reviewed-by: Isaku Yamahata <[email protected]>
---

v10 -> v11:
- No change.

v9 -> v10:
- Added back "start to transit out..." as now per-cpu init has been
moved out from tdx_enable().

v8 -> v9:
- Removed "start to trransit out ..." part in changelog since this patch
is no longer the first step anymore.
- Changed to declare 'tdsysinfo' and 'cmr_array' as local static, and
changed changelog accordingly (Dave).
- Improved changelog to explain why to declare 'tdsysinfo_struct' in
full but only use a few members of them (Dave).

v7 -> v8: (Dave)
- Improved changelog to tell this is the first patch to transit out the
"multi-steps" init_tdx_module().
- Removed all CMR check/trim code but to depend on later SEAMCALL.
- Vertically aligned variables when printing TDX module information.
- Added DECLARE_PADDED_STRUCT() for padded structure.
- Made tdx_sysinfo and tdx_cmr_array[] to be function local variable
(and rename them accordingly), and added -Wframe-larger-than=4096 flag
to silence the build warning.

v6 -> v7:
- Simplified the check of CMRs due to the fact that TDX actually
verifies CMRs (that are passed by the BIOS) before enabling TDX.
- Changed the function name from check_cmrs() -> trim_empty_cmrs().
- Added CMR page aligned check so that later patch can just get the PFN
using ">> PAGE_SHIFT".

v5 -> v6:
- Added to also print TDX module's attribute (Isaku).
- Removed all arguments in tdx_get_sysinfo() to use static variables
of 'tdx_sysinfo' and 'tdx_cmr_array' directly, as they are all used
directly in other functions in later patches.
- Added Isaku's Reviewed-by.

- v3 -> v5 (no feedback on v4):
- Renamed sanitize_cmrs() to check_cmrs().
- Removed unnecessary sanity check against tdx_sysinfo and tdx_cmr_array
actual size returned by TDH.SYS.INFO.
- Changed -EFAULT to -EINVAL in a couple of places.
- Added comments around tdx_sysinfo and tdx_cmr_array saying they are
used by TDH.SYS.INFO ABI.
- Changed to pass 'tdx_sysinfo' and 'tdx_cmr_array' as function
arguments in tdx_get_sysinfo().
- Changed to only print BIOS-CMR when check_cmrs() fails.


---
arch/x86/virt/vmx/tdx/tdx.c | 67 +++++++++++++++++++++++++++++++++-
arch/x86/virt/vmx/tdx/tdx.h | 72 +++++++++++++++++++++++++++++++++++++
2 files changed, 138 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index bcf2b2d15a2e..9fde0f71dd8b 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -20,6 +20,7 @@
#include <asm/msr-index.h>
#include <asm/msr.h>
#include <asm/archrandom.h>
+#include <asm/page.h>
#include <asm/tdx.h>
#include "tdx.h"

@@ -191,12 +192,76 @@ int tdx_cpu_enable(void)
}
EXPORT_SYMBOL_GPL(tdx_cpu_enable);

+static inline bool is_cmr_empty(struct cmr_info *cmr)
+{
+ return !cmr->size;
+}
+
+static void print_cmrs(struct cmr_info *cmr_array, int nr_cmrs)
+{
+ int i;
+
+ for (i = 0; i < nr_cmrs; i++) {
+ struct cmr_info *cmr = &cmr_array[i];
+
+ /*
+ * The array of CMRs reported via TDH.SYS.INFO can
+ * contain tail empty CMRs. Don't print them.
+ */
+ if (is_cmr_empty(cmr))
+ break;
+
+ pr_info("CMR: [0x%llx, 0x%llx)\n", cmr->base,
+ cmr->base + cmr->size);
+ }
+}
+
+/*
+ * Get the TDX module information (TDSYSINFO_STRUCT) and the array of
+ * CMRs, and save them to @sysinfo and @cmr_array. @sysinfo must have
+ * been padded to have enough room to save the TDSYSINFO_STRUCT.
+ */
+static int tdx_get_sysinfo(struct tdsysinfo_struct *sysinfo,
+ struct cmr_info *cmr_array)
+{
+ struct tdx_module_output out;
+ u64 sysinfo_pa, cmr_array_pa;
+ int ret;
+
+ sysinfo_pa = __pa(sysinfo);
+ cmr_array_pa = __pa(cmr_array);
+ ret = seamcall(TDH_SYS_INFO, sysinfo_pa, TDSYSINFO_STRUCT_SIZE,
+ cmr_array_pa, MAX_CMRS, NULL, &out);
+ if (ret)
+ return ret;
+
+ pr_info("TDX module: atributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
+ sysinfo->attributes, sysinfo->vendor_id,
+ sysinfo->major_version, sysinfo->minor_version,
+ sysinfo->build_date, sysinfo->build_num);
+
+ /* R9 contains the actual entries written to the CMR array. */
+ print_cmrs(cmr_array, out.r9);
+
+ return 0;
+}
+
static int init_tdx_module(void)
{
+ static DECLARE_PADDED_STRUCT(tdsysinfo_struct, tdsysinfo,
+ TDSYSINFO_STRUCT_SIZE, TDSYSINFO_STRUCT_ALIGNMENT);
+ static struct cmr_info cmr_array[MAX_CMRS]
+ __aligned(CMR_INFO_ARRAY_ALIGNMENT);
+ struct tdsysinfo_struct *sysinfo = &PADDED_STRUCT(tdsysinfo);
+ int ret;
+
+ ret = tdx_get_sysinfo(sysinfo, cmr_array);
+ if (ret)
+ return ret;
+
/*
* TODO:
*
- * - Get TDX module information and TDX-capable memory regions.
* - Build the list of TDX-usable memory regions.
* - Construct a list of "TD Memory Regions" (TDMRs) to cover
* all TDX-usable memory regions.
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 9fb46033c852..97f4d7e7f1a4 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -3,6 +3,8 @@
#define _X86_VIRT_TDX_H

#include <linux/types.h>
+#include <linux/stddef.h>
+#include <linux/compiler_attributes.h>

/*
* This file contains both macros and data structures defined by the TDX
@@ -21,6 +23,76 @@
*/
#define TDH_SYS_INIT 33
#define TDH_SYS_LP_INIT 35
+#define TDH_SYS_INFO 32
+
+struct cmr_info {
+ u64 base;
+ u64 size;
+} __packed;
+
+#define MAX_CMRS 32
+#define CMR_INFO_ARRAY_ALIGNMENT 512
+
+struct cpuid_config {
+ u32 leaf;
+ u32 sub_leaf;
+ u32 eax;
+ u32 ebx;
+ u32 ecx;
+ u32 edx;
+} __packed;
+
+#define DECLARE_PADDED_STRUCT(type, name, size, alignment) \
+ struct type##_padded { \
+ union { \
+ struct type name; \
+ u8 padding[size]; \
+ }; \
+ } name##_padded __aligned(alignment)
+
+#define PADDED_STRUCT(name) (name##_padded.name)
+
+#define TDSYSINFO_STRUCT_SIZE 1024
+#define TDSYSINFO_STRUCT_ALIGNMENT 1024
+
+/*
+ * The size of this structure itself is flexible. The actual structure
+ * passed to TDH.SYS.INFO must be padded to TDSYSINFO_STRUCT_SIZE and be
+ * aligned to TDSYSINFO_STRUCT_ALIGNMENT using DECLARE_PADDED_STRUCT().
+ */
+struct tdsysinfo_struct {
+ /* TDX-SEAM Module Info */
+ u32 attributes;
+ u32 vendor_id;
+ u32 build_date;
+ u16 build_num;
+ u16 minor_version;
+ u16 major_version;
+ u8 reserved0[14];
+ /* Memory Info */
+ u16 max_tdmrs;
+ u16 max_reserved_per_tdmr;
+ u16 pamt_entry_size;
+ u8 reserved1[10];
+ /* Control Struct Info */
+ u16 tdcs_base_size;
+ u8 reserved2[2];
+ u16 tdvps_base_size;
+ u8 tdvps_xfam_dependent_size;
+ u8 reserved3[9];
+ /* TD Capabilities */
+ u64 attributes_fixed0;
+ u64 attributes_fixed1;
+ u64 xfam_fixed0;
+ u64 xfam_fixed1;
+ u8 reserved4[32];
+ u32 num_cpuid_config;
+ /*
+ * The actual number of CPUID_CONFIG depends on above
+ * 'num_cpuid_config'.
+ */
+ DECLARE_FLEX_ARRAY(struct cpuid_config, cpuid_configs);
+} __packed;

/*
* Do not put any hardware-defined TDX structure representations below
--
2.40.1


2023-06-04 14:59:43

by Huang, Kai

[permalink] [raw]
Subject: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

Start to transit out the "multi-steps" to construct a list of "TD Memory
Regions" (TDMRs) to cover all TDX-usable memory regions.

The kernel configures TDX-usable memory regions by passing a list of
"TD Memory Regions" (TDMRs) to the TDX module. Each TDMR contains
the information of the base/size of a memory region, the base/size of the
associated Physical Address Metadata Table (PAMT) and a list of reserved
areas in the region.

Do the first step to fill out a number of TDMRs to cover all TDX memory
regions. To keep it simple, always try to use one TDMR for each memory
region. As the first step only set up the base/size for each TDMR.

Each TDMR must be 1G aligned and the size must be in 1G granularity.
This implies that one TDMR could cover multiple memory regions. If a
memory region spans the 1GB boundary and the former part is already
covered by the previous TDMR, just use a new TDMR for the remaining
part.

TDX only supports a limited number of TDMRs. Disable TDX if all TDMRs
are consumed but there are more memory regions to cover.

There are fancier things that could be done like trying to merge
adjacent TDMRs. This would allow more pathological memory layouts to be
supported. But, current systems are not even close to exhausting the
existing TDMR resources in practice. For now, keep it simple.

Signed-off-by: Kai Huang <[email protected]>
---

v10 -> v11:
- No update

v9 -> v10:
- No change.

v8 -> v9:

- Added the last paragraph in the changelog (Dave).
- Removed unnecessary type cast in tdmr_entry() (Dave).


---
arch/x86/virt/vmx/tdx/tdx.c | 94 ++++++++++++++++++++++++++++++++++++-
1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 7a20c72361e7..fa9fa8bc581a 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -385,6 +385,93 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
}

+/* Get the TDMR from the list at the given index. */
+static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
+ int idx)
+{
+ int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
+
+ return (void *)tdmr_list->tdmrs + tdmr_info_offset;
+}
+
+#define TDMR_ALIGNMENT BIT_ULL(30)
+#define TDMR_PFN_ALIGNMENT (TDMR_ALIGNMENT >> PAGE_SHIFT)
+#define TDMR_ALIGN_DOWN(_addr) ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
+#define TDMR_ALIGN_UP(_addr) ALIGN((_addr), TDMR_ALIGNMENT)
+
+static inline u64 tdmr_end(struct tdmr_info *tdmr)
+{
+ return tdmr->base + tdmr->size;
+}
+
+/*
+ * Take the memory referenced in @tmb_list and populate the
+ * preallocated @tdmr_list, following all the special alignment
+ * and size rules for TDMR.
+ */
+static int fill_out_tdmrs(struct list_head *tmb_list,
+ struct tdmr_info_list *tdmr_list)
+{
+ struct tdx_memblock *tmb;
+ int tdmr_idx = 0;
+
+ /*
+ * Loop over TDX memory regions and fill out TDMRs to cover them.
+ * To keep it simple, always try to use one TDMR to cover one
+ * memory region.
+ *
+ * In practice TDX1.0 supports 64 TDMRs, which is big enough to
+ * cover all memory regions in reality if the admin doesn't use
+ * 'memmap' to create a bunch of discrete memory regions. When
+ * there's a real problem, enhancement can be done to merge TDMRs
+ * to reduce the final number of TDMRs.
+ */
+ list_for_each_entry(tmb, tmb_list, list) {
+ struct tdmr_info *tdmr = tdmr_entry(tdmr_list, tdmr_idx);
+ u64 start, end;
+
+ start = TDMR_ALIGN_DOWN(PFN_PHYS(tmb->start_pfn));
+ end = TDMR_ALIGN_UP(PFN_PHYS(tmb->end_pfn));
+
+ /*
+ * A valid size indicates the current TDMR has already
+ * been filled out to cover the previous memory region(s).
+ */
+ if (tdmr->size) {
+ /*
+ * Loop to the next if the current memory region
+ * has already been fully covered.
+ */
+ if (end <= tdmr_end(tdmr))
+ continue;
+
+ /* Otherwise, skip the already covered part. */
+ if (start < tdmr_end(tdmr))
+ start = tdmr_end(tdmr);
+
+ /*
+ * Create a new TDMR to cover the current memory
+ * region, or the remaining part of it.
+ */
+ tdmr_idx++;
+ if (tdmr_idx >= tdmr_list->max_tdmrs) {
+ pr_warn("initialization failed: TDMRs exhausted.\n");
+ return -ENOSPC;
+ }
+
+ tdmr = tdmr_entry(tdmr_list, tdmr_idx);
+ }
+
+ tdmr->base = start;
+ tdmr->size = end - start;
+ }
+
+ /* @tdmr_idx is always the index of last valid TDMR. */
+ tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;
+
+ return 0;
+}
+
/*
* Construct a list of TDMRs on the preallocated space in @tdmr_list
* to cover all TDX memory regions in @tmb_list based on the TDX module
@@ -394,10 +481,15 @@ static int construct_tdmrs(struct list_head *tmb_list,
struct tdmr_info_list *tdmr_list,
struct tdsysinfo_struct *sysinfo)
{
+ int ret;
+
+ ret = fill_out_tdmrs(tmb_list, tdmr_list);
+ if (ret)
+ return ret;
+
/*
* TODO:
*
- * - Fill out TDMRs to cover all TDX memory regions.
* - Allocate and set up PAMTs for each TDMR.
* - Designate reserved areas for each TDMR.
*
--
2.40.1


2023-06-06 01:02:17

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH v11 00/20] TDX host kernel support

On Mon, Jun 05, 2023 at 02:27:13AM +1200,
Kai Huang <[email protected]> wrote:

> Intel Trusted Domain Extensions (TDX) protects guest VMs from malicious
> host and certain physical attacks. TDX specs are available in [1].
>
> This series is the initial support to enable TDX with minimal code to
> allow KVM to create and run TDX guests. KVM support for TDX is being
> developed separately[2]. A new "userspace inaccessible memfd" approach
> to support TDX private memory is also being developed[3]. The KVM will
> only support the new "userspace inaccessible memfd" as TDX guest memory.
>
> This series doesn't aim to support all functionalities, and doesn't aim
> to resolve all things perfectly. All other optimizations will be posted
> as follow-up once this initial TDX support is upstreamed.
>
> Also, the patch to add the new kernel comline tdx="force" isn't included
> in this initial version, as Dave suggested it isn't mandatory. But I
> will add one once this initial version gets merged.
>
> (For memory hotplug, sorry for broadcasting widely but I cc'ed the
> [email protected] following Kirill's suggestion so MM experts can also
> help to provide comments.)
>
> Hi Dave, Kirill, Tony, Peter, Thomas, Dan (and Intel reviewers),
>
> The new relaxed TDX per-cpu initialization flow has been verified. The
> TDX module can be initialized when there are offline cpus, and the
> TDH.SYS.LP.INIT SEAMCALL can be made successfully later after module
> initialization when the offline cpu is up.
>
> This series mainly added code to handle the new TDX "partial write
> machine check" erratum (SPR113) in [4].
>
> And I would appreciate reviewed-by or acked-by tags if the patches look
> good to you. Thanks in advance!

I've rebased the TDX KVM patch series v14 [1] on top of this patch series and
uploaded it at [2]. As the rebased TDX KVM patches don't have any changes
except trivial rebase fixes, I won't post something like v14.1.

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://github.com/intel/tdx/tree/kvm-upstream-workaround
--
Isaku Yamahata <[email protected]>

2023-06-07 16:01:04

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On 6/4/23 07:27, Kai Huang wrote:
> Start to transit out the "multi-steps" to initialize the TDX module.
>
> TDX provides increased levels of memory confidentiality and integrity.
> This requires special hardware support for features like memory
> encryption and storage of memory integrity checksums. Not all memory
> satisfies these requirements.
>
> As a result, TDX introduced the concept of a "Convertible Memory Region"
> (CMR). During boot, the firmware builds a list of all of the memory
> ranges which can provide the TDX security guarantees.
>
> CMRs tell the kernel which memory is TDX compatible. The kernel takes
> CMRs (plus a little more metadata) and constructs "TD Memory Regions"
> (TDMRs). TDMRs let the kernel grant TDX protections to some or all of
> the CMR areas.
>
> The TDX module also reports necessary information to let the kernel
> build TDMRs and run TDX guests in structure 'tdsysinfo_struct'. The
> list of CMRs, along with the TDX module information, is available to
> the kernel by querying the TDX module.
>
> As a preparation to construct TDMRs, get the TDX module information and
> the list of CMRs. Print out CMRs to help user to decode which memory
> regions are TDX convertible.
>
> The 'tdsysinfo_struct' is fairly large (1024 bytes) and contains a lot
> of info about the TDX module. Fully define the entire structure, but
> only use the fields necessary to build the TDMRs and pr_info() some
> basics about the module. The rest of the fields will get used by KVM.
>
> For now both 'tdsysinfo_struct' and CMRs are only used during the module
> initialization. But because they are both relatively big, declare them
> inside the module initialization function but as static variables.
>
> Signed-off-by: Kai Huang <[email protected]>
> Reviewed-by: Isaku Yamahata <[email protected]>

Reviewed-by: Dave Hansen <[email protected]>

2023-06-07 16:28:10

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On 6/4/23 07:27, Kai Huang wrote:
> + /*
> + * Loop over TDX memory regions and fill out TDMRs to cover them.
> + * To keep it simple, always try to use one TDMR to cover one
> + * memory region.
> + *
> + * In practice TDX1.0 supports 64 TDMRs, which is big enough to
> + * cover all memory regions in reality if the admin doesn't use
> + * 'memmap' to create a bunch of discrete memory regions. When
> + * there's a real problem, enhancement can be done to merge TDMRs
> + * to reduce the final number of TDMRs.
> + */

Rather than focus in on one specific command-line parameter, let's just say:

In practice TDX supports at least 64 TDMRs. A 2-socket system
typically only consumes <NUMBER> of those. This code is dumb
and simple and may use more TDMRs than is strictly required.

Let's also put a pr_warn() in here if we exceed, say 1/2 or maybe 3/4 of
the 64. We'll hopefully start to get reports somewhat in advance if
systems get close to the limit.

2023-06-08 00:36:16

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Mon, Jun 05, 2023 at 02:27:21AM +1200, Kai Huang wrote:
> For now both 'tdsysinfo_struct' and CMRs are only used during the module
> initialization. But because they are both relatively big, declare them
> inside the module initialization function but as static variables.

This justification does not make sense to me. static variables will not be
freed after the function returns. They will still consume memory.

I think you need to allocate/free memory dynamically, if they are too big
for the stack.

...

> static int init_tdx_module(void)
> {
> + static DECLARE_PADDED_STRUCT(tdsysinfo_struct, tdsysinfo,
> + TDSYSINFO_STRUCT_SIZE, TDSYSINFO_STRUCT_ALIGNMENT);
> + static struct cmr_info cmr_array[MAX_CMRS]
> + __aligned(CMR_INFO_ARRAY_ALIGNMENT);

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-08 03:05:27

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Thu, 2023-06-08 at 03:27 +0300, [email protected] wrote:
> On Mon, Jun 05, 2023 at 02:27:21AM +1200, Kai Huang wrote:
> > For now both 'tdsysinfo_struct' and CMRs are only used during the module
> > initialization. But because they are both relatively big, declare them
> > inside the module initialization function but as static variables.
>
> This justification does not make sense to me. static variables will not be
> freed after function returned. They will still consume memory.
>
> I think you need to allocate/free memory dynamically, if they are too big
> for stack.


I do need to keep tdsysinfo_struct as it will be used by KVM too. CMRs are not
used by KVM now but they might get used later, e.g., we may want to expose
them to /sys in the future.

Also it takes more lines of code to do dynamic allocation. I'd prefer the code
simplicity. Dave is fine with static too, but prefers putting them inside
the function:

https://lore.kernel.org/lkml/[email protected]/T/#mbfdaa353278588da09e43f3ce37b7bf8ddedc1b2

I can update the changelog to reflect above:

For now both 'tdsysinfo_struct' and CMRs are only used during the
module initialization. KVM will need to at least use 'tdsysinfo_struct'
when supporting TDX guests. For now just declare them inside the module
initialization function but as static variables.

?

2023-06-08 11:04:00

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Wed, 2023-06-07 at 09:05 -0700, Dave Hansen wrote:
> On 6/4/23 07:27, Kai Huang wrote:
> > + /*
> > + * Loop over TDX memory regions and fill out TDMRs to cover them.
> > + * To keep it simple, always try to use one TDMR to cover one
> > + * memory region.
> > + *
> > + * In practice TDX1.0 supports 64 TDMRs, which is big enough to
> > + * cover all memory regions in reality if the admin doesn't use
> > + * 'memmap' to create a bunch of discrete memory regions. When
> > + * there's a real problem, enhancement can be done to merge TDMRs
> > + * to reduce the final number of TDMRs.
> > + */
>
> Rather than focus in on one specific command-line parameter, let's just say:
>
> In practice TDX supports at least 64 TDMRs. A 2-socket system
> typically only consumes <NUMBER> of those. This code is dumb
> and simple and may use more TMDRs than is strictly required.

Thanks, will do. Will take a look at a machine to get the <NUMBER>.

>
> Let's also put a pr_warn() in here if we exceed, say 1/2 or maybe 3/4 of
> the 64. We'll hopefully start to get reports somewhat in advance if
> systems get close to the limit.

May I ask why this is useful? The TDX module can only be initialized once, so
if not considering the module runtime update case, the kernel can only get one
of two results:

1) Succeed to initialize: consumed TDMRs don't exceed maximum TDMRs
2) Fail to initialize: consumed TDMRs exceed maximum TDMRs

What's the value of pr_warn() to the user when the number of consumed TDMRs
exceeds some threshold?

Anyway, if you want it, how does below code look?

static int fill_out_tdmrs(struct list_head *tmb_list,
struct tdmr_info_list *tdmr_list)
{
+ int consumed_tdmrs_threshold, tdmr_idx = 0;
struct tdx_memblock *tmb;
- int tdmr_idx = 0;

/*
* Loop over TDX memory regions and fill out TDMRs to cover them.
* To keep it simple, always try to use one TDMR to cover one
* memory region.
*
- * In practice TDX1.0 supports 64 TDMRs, which is big enough to
- * cover all memory regions in reality if the admin doesn't use
- * 'memmap' to create a bunch of discrete memory regions. When
- * there's a real problem, enhancement can be done to merge TDMRs
- * to reduce the final number of TDMRs.
+ * In practice TDX supports at least 64 TDMRs. A 2-socket system
+ * typically only consumes <NUMBER> of those. This code is dumb
+ * and simple and may use more TDMRs than is strictly required.
+ *
+ * Also set a threshold of consumed TDMRs, and pr_warn() to warn
+ * the user the system is getting close to the limit of supported
+ * number of TDMRs if the number of consumed TDMRs exceeds the
+ * threshold.
*/
+ consumed_tdmrs_threshold = tdmr_list->max_tdmrs * 3 / 4;
list_for_each_entry(tmb, tmb_list, list) {
struct tdmr_info *tdmr = tdmr_entry(tdmr_list, tdmr_idx);
u64 start, end;
@@ -463,6 +467,10 @@ static int fill_out_tdmrs(struct list_head *tmb_list,
return -ENOSPC;
}

+ if (tdmr_idx == consumed_tdmrs_threshold)
+ pr_warn("consumed TDMRs reaching limit: %d used
(out of %d)\n",
+ tdmr_idx, tdmr_list->max_tdmrs);
+
tdmr = tdmr_entry(tdmr_list, tdmr_idx);
}


2023-06-08 12:23:21

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Thu, Jun 08, 2023 at 02:40:27AM +0000, Huang, Kai wrote:
> On Thu, 2023-06-08 at 03:27 +0300, [email protected] wrote:
> > On Mon, Jun 05, 2023 at 02:27:21AM +1200, Kai Huang wrote:
> > > For now both 'tdsysinfo_struct' and CMRs are only used during the module
> > > initialization. But because they are both relatively big, declare them
> > > inside the module initialization function but as static variables.
> >
> > This justification does not make sense to me. static variables will not be
> > freed after function returned. They will still consume memory.
> >
> > I think you need to allocate/free memory dynamically, if they are too big
> > for stack.
>
>
> I do need to keep tdsysinfo_struct as it will be used by KVM too.

Will you pass it down to KVM from this function? Will KVM use the struct
after the function returns?

> CMRs are not
> used by KVM now but they might get used in the future, e.g., we may want to
> expose them to /sys in the future.
>
> Also it takes more lines of code to do dynamic allocation. I'd prefer the code
> simplicity.

These structures take 1.5K of memory and the memory will be allocated for
all machines that boot the kernel with TDX enabled, regardless of whether
the machine has TDX or not. It seems very wasteful to me.


--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-08 13:41:00

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On 6/8/23 03:48, Huang, Kai wrote:
>> Let's also put a pr_warn() in here if we exceed, say 1/2 or maybe 3/4 of
>> the 64. We'll hopefully start to get reports somewhat in advance if
>> systems get close to the limit.
> May I ask why this is useful? TDX module can only be initialized once, so if
> not considering module runtime update case, the kernel can only get two results
> for once:
>
> 1) Succeed to initialize: consumed TDMRs doesn't exceed maximum TDMRs
> 2) Fail to initialize: consumed TDMRs exceeds maximum TDMRs
>
> What's the value of pr_warn() user when consumed TDMRs exceeds some threshold?

Today, we're saying, "64 TDMRs ought to be enough for anybody!"

I'd actually kinda like to know if anybody starts building platforms
that get anywhere near using 64. That way, we won't get a bug report
that TDX is broken and we'll have a fire drill. We'll get a bug report
that TDX is complaining and we'll have some time to go fix it without
anyone actually being broken.

Maybe not even a pr_warn(), but something that's a bit ominous and has a
chance of getting users to act.

2023-06-08 13:49:54

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On 6/8/23 04:41, [email protected] wrote:
> These structures take 1.5K of memory and the memory will be allocated for
> all machines that boots the kernel with TDX enabled, regardless if the
> machine has TDX or not. It seems very wasteful to me.

Actually, those variables are in .bss. They're allocated forever for
anyone that runs a kernel that has TDX support.



2023-06-08 21:53:14

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH v11 00/20] TDX host kernel support

Kai Huang wrote:
> Intel Trusted Domain Extensions (TDX) protects guest VMs from malicious
> host and certain physical attacks. TDX specs are available in [1].
>
> This series is the initial support to enable TDX with minimal code to
> allow KVM to create and run TDX guests. KVM support for TDX is being
> developed separately[2]. A new "userspace inaccessible memfd" approach
> to support TDX private memory is also being developed[3]. The KVM will
> only support the new "userspace inaccessible memfd" as TDX guest memory.

This memfd approach is incompatible with one of the primary ways that
new memory topologies like high-bandwidth-memory and CXL are accessed,
via a device-special-file mapping. There is already precedent for mmap()
to only be used for communicating address value and not CPU accessible
memory. See "Userspace P2PDMA with O_DIRECT NVMe devices" [1].

So before this memfd requirement becomes too baked in to the design I
want to understand if "userspace inaccessible" is the only requirement
so I can look to add that to the device-special-file interface for
"device" / "Soft Reserved" memory like HBM and CXL.

[1]: https://lore.kernel.org/all/[email protected]/

2023-06-08 23:17:57

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Mon, Jun 05, 2023 at 02:27:24AM +1200, Kai Huang wrote:
> +#define TDMR_ALIGNMENT BIT_ULL(30)

Nit: SZ_1G can be a little bit more readable here.

Anyway:

Reviewed-by: Kirill A. Shutemov <[email protected]>


--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-09 00:15:57

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Thu, Jun 08, 2023 at 02:41:28PM +0300,
"[email protected]" <[email protected]> wrote:

> On Thu, Jun 08, 2023 at 02:40:27AM +0000, Huang, Kai wrote:
> > On Thu, 2023-06-08 at 03:27 +0300, [email protected] wrote:
> > > On Mon, Jun 05, 2023 at 02:27:21AM +1200, Kai Huang wrote:
> > > > For now both 'tdsysinfo_struct' and CMRs are only used during the module
> > > > initialization. But because they are both relatively big, declare them
> > > > inside the module initialization function but as static variables.
> > >
> > > This justification does not make sense to me. static variables will not be
> > > freed after function returned. They will still consume memory.
> > >
> > > I think you need to allocate/free memory dynamically, if they are too big
> > > for stack.
> >
> >
> > I do need to keep tdsysinfo_struct as it will be used by KVM too.
>
> Will you pass it down to KVM from this function? Will KVM use the struct
> after the function returns?

KVM needs tdsysinfo_struct to create guest TD. It doesn't require
1024-alignment.
--
Isaku Yamahata <[email protected]>

2023-06-09 00:24:32

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Thu, Jun 08, 2023 at 04:29:19PM -0700, Isaku Yamahata wrote:
> On Thu, Jun 08, 2023 at 02:41:28PM +0300,
> "[email protected]" <[email protected]> wrote:
>
> > On Thu, Jun 08, 2023 at 02:40:27AM +0000, Huang, Kai wrote:
> > > On Thu, 2023-06-08 at 03:27 +0300, [email protected] wrote:
> > > > On Mon, Jun 05, 2023 at 02:27:21AM +1200, Kai Huang wrote:
> > > > > For now both 'tdsysinfo_struct' and CMRs are only used during the module
> > > > > initialization. But because they are both relatively big, declare them
> > > > > inside the module initialization function but as static variables.
> > > >
> > > > This justification does not make sense to me. static variables will not be
> > > > freed after function returned. They will still consume memory.
> > > >
> > > > I think you need to allocate/free memory dynamically, if they are too big
> > > > for stack.
> > >
> > >
> > > I do need to keep tdsysinfo_struct as it will be used by KVM too.
> >
> > Will you pass it down to KVM from this function? Will KVM use the struct
> > after the function returns?
>
> KVM needs tdsysinfo_struct to create guest TD. It doesn't require
> 1024-alignment.

How does KVM get it from here?

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-09 02:02:01

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Fri, Jun 09, 2023 at 02:54:41AM +0300,
"[email protected]" <[email protected]> wrote:

> On Thu, Jun 08, 2023 at 04:29:19PM -0700, Isaku Yamahata wrote:
> > On Thu, Jun 08, 2023 at 02:41:28PM +0300,
> > "[email protected]" <[email protected]> wrote:
> >
> > > On Thu, Jun 08, 2023 at 02:40:27AM +0000, Huang, Kai wrote:
> > > > On Thu, 2023-06-08 at 03:27 +0300, [email protected] wrote:
> > > > > On Mon, Jun 05, 2023 at 02:27:21AM +1200, Kai Huang wrote:
> > > > > > For now both 'tdsysinfo_struct' and CMRs are only used during the module
> > > > > > initialization. But because they are both relatively big, declare them
> > > > > > inside the module initialization function but as static variables.
> > > > >
> > > > > This justification does not make sense to me. static variables will not be
> > > > > freed after function returned. They will still consume memory.
> > > > >
> > > > > I think you need to allocate/free memory dynamically, if they are too big
> > > > > for stack.
> > > >
> > > >
> > > > I do need to keep tdsysinfo_struct as it will be used by KVM too.
> > >
> > > Will you pass it down to KVM from this function? Will KVM use the struct
> > > after the function returns?
> >
> > KVM needs tdsysinfo_struct to create guest TD. It doesn't require
> > 1024-alignment.
>
> How KVM gets it from here?

For now, the TDX KVM patch series moves the tdsysinfo out of the function and
adds a getter function for it. As long as KVM can access the info, it doesn't
care how its memory is allocated, static or dynamic.
--
Isaku Yamahata <[email protected]>

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions



On 6/4/23 7:27 AM, Kai Huang wrote:
> Start to transit out the "multi-steps" to construct a list of "TD Memory
> Regions" (TDMRs) to cover all TDX-usable memory regions.
>
> The kernel configures TDX-usable memory regions by passing a list of
> TDMRs "TD Memory Regions" (TDMRs) to the TDX module. Each TDMR contains
> the information of the base/size of a memory region, the base/size of the
> associated Physical Address Metadata Table (PAMT) and a list of reserved
> areas in the region.
>
> Do the first step to fill out a number of TDMRs to cover all TDX memory
> regions. To keep it simple, always try to use one TDMR for each memory
> region. As the first step only set up the base/size for each TDMR.

As a first step?

>
> Each TDMR must be 1G aligned and the size must be in 1G granularity.
> This implies that one TDMR could cover multiple memory regions. If a
> memory region spans the 1GB boundary and the former part is already
> covered by the previous TDMR, just use a new TDMR for the remaining
> part.
>
> TDX only supports a limited number of TDMRs. Disable TDX if all TDMRs
> are consumed but there is more memory region to cover.
>
> There are fancier things that could be done like trying to merge
> adjacent TDMRs. This would allow more pathological memory layouts to be
> supported. But, current systems are not even close to exhausting the
> existing TDMR resources in practice. For now, keep it simple.
>
> Signed-off-by: Kai Huang <[email protected]>
> ---
>
> v10 -> v11:
> - No update
>
> v9 -> v10:
> - No change.
>
> v8 -> v9:
>
> - Added the last paragraph in the changelog (Dave).
> - Removed unnecessary type cast in tdmr_entry() (Dave).
>
>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 94 ++++++++++++++++++++++++++++++++++++-
> 1 file changed, 93 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 7a20c72361e7..fa9fa8bc581a 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -385,6 +385,93 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
> tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
> }
>
> +/* Get the TDMR from the list at the given index. */
> +static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
> + int idx)
> +{
> + int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
> +
> + return (void *)tdmr_list->tdmrs + tdmr_info_offset;
> +}
> +
> +#define TDMR_ALIGNMENT BIT_ULL(30)
> +#define TDMR_PFN_ALIGNMENT (TDMR_ALIGNMENT >> PAGE_SHIFT)

This macro is never used. Maybe you can drop it from this patch.

> +#define TDMR_ALIGN_DOWN(_addr) ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
> +#define TDMR_ALIGN_UP(_addr) ALIGN((_addr), TDMR_ALIGNMENT)
> +
> +static inline u64 tdmr_end(struct tdmr_info *tdmr)
> +{
> + return tdmr->base + tdmr->size;
> +}
> +
> +/*
> + * Take the memory referenced in @tmb_list and populate the
> + * preallocated @tdmr_list, following all the special alignment
> + * and size rules for TDMR.
> + */
> +static int fill_out_tdmrs(struct list_head *tmb_list,
> + struct tdmr_info_list *tdmr_list)
> +{
> + struct tdx_memblock *tmb;
> + int tdmr_idx = 0;
> +
> + /*
> + * Loop over TDX memory regions and fill out TDMRs to cover them.
> + * To keep it simple, always try to use one TDMR to cover one
> + * memory region.
> + *
> + * In practice TDX1.0 supports 64 TDMRs, which is big enough to
> + * cover all memory regions in reality if the admin doesn't use
> + * 'memmap' to create a bunch of discrete memory regions. When
> + * there's a real problem, enhancement can be done to merge TDMRs
> + * to reduce the final number of TDMRs.
> + */
> + list_for_each_entry(tmb, tmb_list, list) {
> + struct tdmr_info *tdmr = tdmr_entry(tdmr_list, tdmr_idx);
> + u64 start, end;
> +
> + start = TDMR_ALIGN_DOWN(PFN_PHYS(tmb->start_pfn));
> + end = TDMR_ALIGN_UP(PFN_PHYS(tmb->end_pfn));
> +
> + /*
> + * A valid size indicates the current TDMR has already
> + * been filled out to cover the previous memory region(s).
> + */
> + if (tdmr->size) {
> + /*
> + * Loop to the next if the current memory region
> + * has already been fully covered.
> + */
> + if (end <= tdmr_end(tdmr))
> + continue;
> +
> + /* Otherwise, skip the already covered part. */
> + if (start < tdmr_end(tdmr))
> + start = tdmr_end(tdmr);
> +
> + /*
> + * Create a new TDMR to cover the current memory
> + * region, or the remaining part of it.
> + */
> + tdmr_idx++;
> + if (tdmr_idx >= tdmr_list->max_tdmrs) {
> + pr_warn("initialization failed: TDMRs exhausted.\n");
> + return -ENOSPC;
> + }
> +
> + tdmr = tdmr_entry(tdmr_list, tdmr_idx);
> + }
> +
> + tdmr->base = start;
> + tdmr->size = end - start;
> + }
> +
> + /* @tdmr_idx is always the index of last valid TDMR. */
> + tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;
> +
> + return 0;
> +}
> +
> /*
> * Construct a list of TDMRs on the preallocated space in @tdmr_list
> * to cover all TDX memory regions in @tmb_list based on the TDX module
> @@ -394,10 +481,15 @@ static int construct_tdmrs(struct list_head *tmb_list,
> struct tdmr_info_list *tdmr_list,
> struct tdsysinfo_struct *sysinfo)
> {
> + int ret;
> +
> + ret = fill_out_tdmrs(tmb_list, tdmr_list);
> + if (ret)
> + return ret;
> +
> /*
> * TODO:
> *
> - * - Fill out TDMRs to cover all TDX memory regions.
> * - Allocate and set up PAMTs for each TDMR.
> * - Designate reserved areas for each TDMR.
> *

Rest looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2023-06-09 10:39:06

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

> @@ -21,6 +23,76 @@
> */
> #define TDH_SYS_INIT 33
> #define TDH_SYS_LP_INIT 35
> +#define TDH_SYS_INFO 32

Could you keep these defines ordered? Here and all following patches.

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-12 02:22:28

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Thu, 2023-06-08 at 06:13 -0700, Dave Hansen wrote:
> On 6/8/23 04:41, [email protected] wrote:
> > These structures take 1.5K of memory and the memory will be allocated for
> > all machines that boots the kernel with TDX enabled, regardless if the
> > machine has TDX or not. It seems very wasteful to me.
>
> Actually, those variables are in .bss. They're allocated forever for
> anyone that runs a kernel that has TDX support.
>

Hi Dave/Kirill,

Thanks for feedback.

My understanding is you both prefer dynamic allocation. I'll change to use
that. Also I will free them after module initialization, as for now they are
only used during module initialization.
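
Something like below (just a sketch to show the direction, not the final
code):

static int init_tdx_module(void)
{
        struct tdsysinfo_struct *sysinfo;
        struct cmr_info *cmr_array;
        int ret;

        /*
         * Power-of-two kmalloc() sizes are naturally aligned, so the
         * 1024-byte TDSYSINFO_STRUCT allocation and the 512-byte CMR
         * array allocation also satisfy their alignment requirements.
         */
        sysinfo = kzalloc(TDSYSINFO_STRUCT_SIZE, GFP_KERNEL);
        cmr_array = kcalloc(MAX_CMRS, sizeof(*cmr_array), GFP_KERNEL);
        if (!sysinfo || !cmr_array) {
                ret = -ENOMEM;
                goto out;
        }

        ret = tdx_get_sysinfo(sysinfo, cmr_array);
        if (ret)
                goto out;

        /* ... the rest of the module initialization steps ... */

out:
        /* Only needed during initialization for now, so free them here. */
        kfree(cmr_array);
        kfree(sysinfo);
        return ret;
}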

Please let me know if you have any comments.

2023-06-12 02:26:04

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Fri, 2023-06-09 at 13:02 +0300, [email protected] wrote:
> > @@ -21,6 +23,76 @@
> >   */
> >  #define TDH_SYS_INIT 33
> >  #define TDH_SYS_LP_INIT 35
> > +#define TDH_SYS_INFO 32
>
> Could you keep these defines ordered? Here and all following patches.
>

Sure will do. Thanks.

2023-06-12 02:28:02

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Fri, 2023-06-09 at 02:02 +0300, [email protected] wrote:
> On Mon, Jun 05, 2023 at 02:27:24AM +1200, Kai Huang wrote:
> > +#define TDMR_ALIGNMENT BIT_ULL(30)
>
> Nit: SZ_1G can be a little bit more readable here.

Will do.

>
> Anyway:
>
> Reviewed-by: Kirill A. Shutemov <[email protected]>
>
>

Thanks.

2023-06-12 02:55:16

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Thu, 2023-06-08 at 21:01 -0700, Sathyanarayanan Kuppuswamy wrote:
>
> On 6/4/23 7:27 AM, Kai Huang wrote:
> > Start to transit out the "multi-steps" to construct a list of "TD Memory
> > Regions" (TDMRs) to cover all TDX-usable memory regions.
> >
> > The kernel configures TDX-usable memory regions by passing a list of
> > TDMRs "TD Memory Regions" (TDMRs) to the TDX module. Each TDMR contains
> > the information of the base/size of a memory region, the base/size of the
> > associated Physical Address Metadata Table (PAMT) and a list of reserved
> > areas in the region.
> >
> > Do the first step to fill out a number of TDMRs to cover all TDX memory
> > regions. To keep it simple, always try to use one TDMR for each memory
> > region. As the first step only set up the base/size for each TDMR.
>
> As a first step?

I'm not sure there can be more than one first step? I think I'll keep it as is.

[...]

> > +#define TDMR_ALIGNMENT BIT_ULL(30)
> > +#define TDMR_PFN_ALIGNMENT (TDMR_ALIGNMENT >> PAGE_SHIFT)
>
> This macro is never used. Maybe you can drop it from this patch.

OK will do.

[...]

>
> Rest looks good to me.
>
> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
>

Thanks.

2023-06-12 03:02:55

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions


>
> Maybe not even a pr_warn(), but something that's a bit ominous and has a
> chance of getting users to act.

Sorry, I am not sure how to do that. Could you give some suggestions?

2023-06-12 11:39:32

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 00/20] TDX host kernel support

On Thu, 2023-06-08 at 14:03 -0700, Dan Williams wrote:
> Kai Huang wrote:
> > Intel Trusted Domain Extensions (TDX) protects guest VMs from malicious
> > host and certain physical attacks. TDX specs are available in [1].
> >
> > This series is the initial support to enable TDX with minimal code to
> > allow KVM to create and run TDX guests. KVM support for TDX is being
> > developed separately[2]. A new "userspace inaccessible memfd" approach
> > to support TDX private memory is also being developed[3]. The KVM will
> > only support the new "userspace inaccessible memfd" as TDX guest memory.
>
> This memfd approach is incompatible with one of the primary ways that
> new memory topologies like high-bandwidth-memory and CXL are accessed,
> via a device-special-file mapping. There is already precedent for mmap()
> to only be used for communicating address value and not CPU accessible
> memory. See "Userspace P2PDMA with O_DIRECT NVMe devices" [1].
>
> So before this memfd requirement becomes too baked in to the design I
> want to understand if "userspace inaccessible" is the only requirement
> so I can look to add that to the device-special-file interface for
> "device" / "Soft Reserved" memory like HBM and CXL.
>
> [1]: https://lore.kernel.org/all/[email protected]/

+ Peng, Chao who is working on this with Sean.

There are some recent developments around the design of the "userspace
inaccessible memfd", e.g., IIUC Sean is proposing to replace the new syscall
with a new KVM ioctl().

Hi Sean, Chao,

Could you give comments to Dan's concern?

2023-06-12 15:21:36

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Mon, Jun 12, 2023 at 02:33:58AM +0000, Huang, Kai wrote:
>
> >
> > Maybe not even a pr_warn(), but something that's a bit ominous and has a
> > chance of getting users to act.
>
> Sorry I am not sure how to do. Could you give some suggestion?

Maybe something like this would do?

I'm struggling with the warning message. Any suggestion is welcome.

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 9cd4f6b58d4a..cc141025b249 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -627,6 +627,15 @@ static int fill_out_tdmrs(struct list_head *tmb_list,
/* @tdmr_idx is always the index of last valid TDMR. */
tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;

+ /*
+ * Warn early that kernel is about to run out of TDMRs.
+ *
+ * This is indication that TDMR allocation has to be reworked to be
+ * smarter to not run into an issue.
+ */
+ if (tdmr_list->max_tdmrs - tdmr_list->nr_consumed_tdmrs < TDMR_NR_WARN)
+ pr_warn("Low number of spare TDMRs\n");
+
return 0;
}

diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 323ce744b853..17efe33847ae 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -98,6 +98,9 @@ struct tdx_memblock {
int nid;
};

+/* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
+#define TDMR_NR_WARN 4
+
struct tdmr_info_list {
void *tdmrs; /* Flexible array to hold 'tdmr_info's */
int nr_consumed_tdmrs; /* How many 'tdmr_info's are in use */
--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-12 22:29:50

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Mon, 2023-06-12 at 17:33 +0300, [email protected] wrote:
> On Mon, Jun 12, 2023 at 02:33:58AM +0000, Huang, Kai wrote:
> >
> > >
> > > Maybe not even a pr_warn(), but something that's a bit ominous and has a
> > > chance of getting users to act.
> >
> > Sorry I am not sure how to do. Could you give some suggestion?
>
> Maybe something like this would do?
>
> I'm struggle with the warning message. Any suggestion is welcome.

I guess it would be helpful to print out the actual consumed TDMRs?

pr_warn("consumed TDMRs reaching limit: %d used (out of %d)\n",
tdmr_idx, tdmr_list->max_tdmrs);

>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 9cd4f6b58d4a..cc141025b249 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -627,6 +627,15 @@ static int fill_out_tdmrs(struct list_head *tmb_list,
> /* @tdmr_idx is always the index of last valid TDMR. */
> tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;
>
> + /*
> + * Warn early that kernel is about to run out of TDMRs.
> + *
> + * This is indication that TDMR allocation has to be reworked to be
> + * smarter to not run into an issue.
> + */
> + if (tdmr_list->max_tdmrs - tdmr_list->nr_consumed_tdmrs < TDMR_NR_WARN)
> + pr_warn("Low number of spare TDMRs\n");
> +
> return 0;
> }
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 323ce744b853..17efe33847ae 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -98,6 +98,9 @@ struct tdx_memblock {
> int nid;
> };
>
> +/* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
> +#define TDMR_NR_WARN 4
> +
> struct tdmr_info_list {
> void *tdmrs; /* Flexible array to hold 'tdmr_info's */
> int nr_consumed_tdmrs; /* How many 'tdmr_info's are in use */

2023-06-13 10:32:32

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Mon, Jun 12, 2023 at 10:10:39PM +0000, Huang, Kai wrote:
> On Mon, 2023-06-12 at 17:33 +0300, [email protected] wrote:
> > On Mon, Jun 12, 2023 at 02:33:58AM +0000, Huang, Kai wrote:
> > >
> > > >
> > > > Maybe not even a pr_warn(), but something that's a bit ominous and has a
> > > > chance of getting users to act.
> > >
> > > Sorry I am not sure how to do. Could you give some suggestion?
> >
> > Maybe something like this would do?
> >
> > I'm struggle with the warning message. Any suggestion is welcome.
>
> I guess it would be helpful to print out the actual consumed TDMRs?
>
> pr_warn("consumed TDMRs reaching limit: %d used (out of %d)\n",
> tdmr_idx, tdmr_list->max_tdmrs);

It is off-by-one. It is supposed to be tdmr_idx + 1.

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-13 23:24:24

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Tue, 2023-06-13 at 13:18 +0300, [email protected] wrote:
> On Mon, Jun 12, 2023 at 10:10:39PM +0000, Huang, Kai wrote:
> > On Mon, 2023-06-12 at 17:33 +0300, [email protected] wrote:
> > > On Mon, Jun 12, 2023 at 02:33:58AM +0000, Huang, Kai wrote:
> > > >
> > > > >
> > > > > Maybe not even a pr_warn(), but something that's a bit ominous and has a
> > > > > chance of getting users to act.
> > > >
> > > > Sorry I am not sure how to do. Could you give some suggestion?
> > >
> > > Maybe something like this would do?
> > >
> > > I'm struggle with the warning message. Any suggestion is welcome.
> >
> > I guess it would be helpful to print out the actual consumed TDMRs?
> >
> > pr_warn("consumed TDMRs reaching limit: %d used (out of %d)\n",
> > tdmr_idx, tdmr_list->max_tdmrs);
>
> It is off-by-one. It supposed to be tdmr_idx + 1.
>

In your code, yes. Thanks for pointing it out. I copied it from my own code.
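
Putting the two together, I plan to fold it in roughly like this (a sketch on
top of Kirill's diff, using nr_consumed_tdmrs so the printed count cannot be
off-by-one):

	/* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
	#define TDMR_NR_WARN 4

	...
	/* @tdmr_idx is always the index of the last valid TDMR. */
	tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;

	/*
	 * Warn early that the kernel is about to run out of TDMRs.  It is
	 * an indication that TDMR allocation has to be reworked to be
	 * smarter before that actually happens.
	 */
	if (tdmr_list->max_tdmrs - tdmr_list->nr_consumed_tdmrs < TDMR_NR_WARN)
		pr_warn("consumed TDMRs reaching limit: %d used out of %d\n",
			tdmr_list->nr_consumed_tdmrs,
			tdmr_list->max_tdmrs);

	return 0;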

2023-06-14 12:53:48

by Nikolay Borisov

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions



On 4.06.23 г. 17:27 ч., Kai Huang wrote:
<snip>

> ---
> arch/x86/virt/vmx/tdx/tdx.c | 94 ++++++++++++++++++++++++++++++++++++-
> 1 file changed, 93 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 7a20c72361e7..fa9fa8bc581a 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -385,6 +385,93 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
> tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
> }
>
> +/* Get the TDMR from the list at the given index. */
> +static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
> + int idx)
> +{
> + int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
> +
> + return (void *)tdmr_list->tdmrs + tdmr_info_offset;

nit: I would just like to point out that treating sizeof(void) as 1 (and
hence doing arithmetic on 'void *') is a gcc-specific compiler extension:
https://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Pointer-Arith.html#Pointer-Arith

I don't know if clang treats it the same way; just for the sake of
simplicity you might want to change this (void *) to (char *).

<snip>

2023-06-14 23:11:01

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Wed, 2023-06-14 at 15:31 +0300, Nikolay Borisov wrote:
>
> On 4.06.23 г. 17:27 ч., Kai Huang wrote:
> <snip>
>
> > ---
> > arch/x86/virt/vmx/tdx/tdx.c | 94 ++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 93 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index 7a20c72361e7..fa9fa8bc581a 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -385,6 +385,93 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
> > tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
> > }
> >
> > +/* Get the TDMR from the list at the given index. */
> > +static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
> > + int idx)
> > +{
> > + int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
> > +
> > + return (void *)tdmr_list->tdmrs + tdmr_info_offset;
>
> nit: I would just like to point that sizeof(void *) being treated as 1
> is a gcc-specific compiler extension:
> https://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Pointer-Arith.html#Pointer-Arith
>
> I don't know if clang treats it the same way, just for the sake of
> simplicity you might wanna change this (void *) to (char *).

Then we would need an additional cast from 'char *' to 'struct tdmr_info *',
I suppose? I'm not sure it is worth the extra cast.

And such 'void *' arithmetic is already used in other kernel code too, e.g.,
this networking code:

./net/rds/tcp_send.c:105: (void *)&rm->m_inc.i_hdr + hdr_off,

and I believe there are other examples too (I didn't spend a lot of time
grepping).

And it seems Linus also thinks "using arithmetic on 'void *' is generally
superior":

https://lore.kernel.org/lkml/CAHk-=whFKYMrF6euVvziW+drw7-yi1pYdf=uccnzJ8k09DoTXA@mail.gmail.com/t/#m983827708903c8c5bddf193343d392c9ed5af1a0

So I wouldn't worry about the Clang thing.
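
For completeness, the 'char *' variant you suggest would only differ by the
extra cast, e.g. (sketch):

	/* Get the TDMR from the list at the given index. */
	static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
					    int idx)
	{
		int tdmr_info_offset = tdmr_list->tdmr_sz * idx;

		/* 'char *' arithmetic, then cast back to the entry type. */
		return (struct tdmr_info *)((char *)tdmr_list->tdmrs +
					    tdmr_info_offset);
	}

As far as I can tell the kernel already builds fine with Clang using the
'void *' form, so this is purely a style question.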

2023-06-19 13:43:52

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On 04.06.23 16:27, Kai Huang wrote:
> Start to transit out the "multi-steps" to initialize the TDX module.
>
> TDX provides increased levels of memory confidentiality and integrity.
> This requires special hardware support for features like memory
> encryption and storage of memory integrity checksums. Not all memory
> satisfies these requirements.
>
> As a result, TDX introduced the concept of a "Convertible Memory Region"
> (CMR). During boot, the firmware builds a list of all of the memory
> ranges which can provide the TDX security guarantees.
>
> CMRs tell the kernel which memory is TDX compatible. The kernel takes
> CMRs (plus a little more metadata) and constructs "TD Memory Regions"
> (TDMRs). TDMRs let the kernel grant TDX protections to some or all of
> the CMR areas.
>
> The TDX module also reports necessary information to let the kernel
> build TDMRs and run TDX guests in structure 'tdsysinfo_struct'. The
> list of CMRs, along with the TDX module information, is available to
> the kernel by querying the TDX module.
>
> As a preparation to construct TDMRs, get the TDX module information and
> the list of CMRs. Print out CMRs to help user to decode which memory
> regions are TDX convertible.
>
> The 'tdsysinfo_struct' is fairly large (1024 bytes) and contains a lot
> of info about the TDX module. Fully define the entire structure, but
> only use the fields necessary to build the TDMRs and pr_info() some
> basics about the module. The rest of the fields will get used by KVM.
>
> For now both 'tdsysinfo_struct' and CMRs are only used during the module
> initialization. But because they are both relatively big, declare them
> inside the module initialization function but as static variables.
>
> Signed-off-by: Kai Huang <[email protected]>
> Reviewed-by: Isaku Yamahata <[email protected]>
> ---


[...]


> ---
> arch/x86/virt/vmx/tdx/tdx.c | 67 +++++++++++++++++++++++++++++++++-
> arch/x86/virt/vmx/tdx/tdx.h | 72 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 138 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index bcf2b2d15a2e..9fde0f71dd8b 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -20,6 +20,7 @@
> #include <asm/msr-index.h>
> #include <asm/msr.h>
> #include <asm/archrandom.h>
> +#include <asm/page.h>
> #include <asm/tdx.h>
> #include "tdx.h"
>
> @@ -191,12 +192,76 @@ int tdx_cpu_enable(void)
> }
> EXPORT_SYMBOL_GPL(tdx_cpu_enable);
>
> +static inline bool is_cmr_empty(struct cmr_info *cmr)
> +{
> + return !cmr->size;
> +}
> +

Nit: maybe it's just me, but this function seems unnecessary.

If "!cmr->size" is not expressive, then I don't know why "is_cmr_empty"
should be. Just inline that into the single user.

.. after all the single caller also uses/prints cmr->size ...

> +static void print_cmrs(struct cmr_info *cmr_array, int nr_cmrs)
> +{
> + int i;
> +
> + for (i = 0; i < nr_cmrs; i++) {
> + struct cmr_info *cmr = &cmr_array[i];
> +
> + /*
> + * The array of CMRs reported via TDH.SYS.INFO can
> + * contain tail empty CMRs. Don't print them.
> + */
> + if (is_cmr_empty(cmr))
> + break;
> +
> + pr_info("CMR: [0x%llx, 0x%llx)\n", cmr->base,
> + cmr->base + cmr->size);
> + }
> +}
> +
> +/*
> + * Get the TDX module information (TDSYSINFO_STRUCT) and the array of
> + * CMRs, and save them to @sysinfo and @cmr_array. @sysinfo must have
> + * been padded to have enough room to save the TDSYSINFO_STRUCT.
> + */
> +static int tdx_get_sysinfo(struct tdsysinfo_struct *sysinfo,
> + struct cmr_info *cmr_array)
> +{
> + struct tdx_module_output out;
> + u64 sysinfo_pa, cmr_array_pa;
> + int ret;
> +
> + sysinfo_pa = __pa(sysinfo);
> + cmr_array_pa = __pa(cmr_array);
> + ret = seamcall(TDH_SYS_INFO, sysinfo_pa, TDSYSINFO_STRUCT_SIZE,
> + cmr_array_pa, MAX_CMRS, NULL, &out);
> + if (ret)
> + return ret;
> +
> + pr_info("TDX module: atributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",


"attributes" ?

> + sysinfo->attributes, sysinfo->vendor_id,
> + sysinfo->major_version, sysinfo->minor_version,
> + sysinfo->build_date, sysinfo->build_num);
> +
> + /* R9 contains the actual entries written to the CMR array. */
> + print_cmrs(cmr_array, out.r9);
> +
> + return 0;
> +}
> +
> static int init_tdx_module(void)
> {
> + static DECLARE_PADDED_STRUCT(tdsysinfo_struct, tdsysinfo,
> + TDSYSINFO_STRUCT_SIZE, TDSYSINFO_STRUCT_ALIGNMENT);
> + static struct cmr_info cmr_array[MAX_CMRS]
> + __aligned(CMR_INFO_ARRAY_ALIGNMENT);
> + struct tdsysinfo_struct *sysinfo = &PADDED_STRUCT(tdsysinfo);
> + int ret;
> +
> + ret = tdx_get_sysinfo(sysinfo, cmr_array);
> + if (ret)
> + return ret;
> +
> /*
> * TODO:
> *
> - * - Get TDX module information and TDX-capable memory regions.
> * - Build the list of TDX-usable memory regions.
> * - Construct a list of "TD Memory Regions" (TDMRs) to cover
> * all TDX-usable memory regions.
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 9fb46033c852..97f4d7e7f1a4 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -3,6 +3,8 @@
> #define _X86_VIRT_TDX_H
>
> #include <linux/types.h>
> +#include <linux/stddef.h>
> +#include <linux/compiler_attributes.h>
>
> /*
> * This file contains both macros and data structures defined by the TDX
> @@ -21,6 +23,76 @@
> */
> #define TDH_SYS_INIT 33
> #define TDH_SYS_LP_INIT 35
> +#define TDH_SYS_INFO 32
> +
> +struct cmr_info {
> + u64 base;
> + u64 size;
> +} __packed;
> +
> +#define MAX_CMRS 32
> +#define CMR_INFO_ARRAY_ALIGNMENT 512
> +
> +struct cpuid_config {
> + u32 leaf;
> + u32 sub_leaf;
> + u32 eax;
> + u32 ebx;
> + u32 ecx;
> + u32 edx;
> +} __packed;
> +
> +#define DECLARE_PADDED_STRUCT(type, name, size, alignment) \
> + struct type##_padded { \
> + union { \
> + struct type name; \
> + u8 padding[size]; \
> + }; \
> + } name##_padded __aligned(alignment)
> +
> +#define PADDED_STRUCT(name) (name##_padded.name)
> +
> +#define TDSYSINFO_STRUCT_SIZE 1024

So, it can never be larger than 1024 bytes? Not even with many cpuid
configs?

> +#define TDSYSINFO_STRUCT_ALIGNMENT 1024
> +

--
Cheers,

David / dhildenb


2023-06-20 00:04:34

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory


> > +static inline bool is_cmr_empty(struct cmr_info *cmr)
> > +{
> > + return !cmr->size;
> > +}
> > +
>
> Nit: maybe it's just me, but this function seems unnecessary.
>
> If "!cmr->size" is not expressive, then I don't know why "is_cmr_empty"
> should be. Just inline that into the single user.
>
> .. after all the single caller also uses/prints cmr->size ...

Agreed. I'll remove this function. Thanks!
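
I.e. the loop below would just open-code the size check, roughly (sketch):

	for (i = 0; i < nr_cmrs; i++) {
		struct cmr_info *cmr = &cmr_array[i];

		/*
		 * The CMR array reported via TDH.SYS.INFO can contain
		 * trailing empty CMRs.  Stop at the first empty one.
		 */
		if (!cmr->size)
			break;

		pr_info("CMR: [0x%llx, 0x%llx)\n", cmr->base,
			cmr->base + cmr->size);
	}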

>
> > +static void print_cmrs(struct cmr_info *cmr_array, int nr_cmrs)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < nr_cmrs; i++) {
> > + struct cmr_info *cmr = &cmr_array[i];
> > +
> > + /*
> > + * The array of CMRs reported via TDH.SYS.INFO can
> > + * contain tail empty CMRs. Don't print them.
> > + */
> > + if (is_cmr_empty(cmr))
> > + break;
> > +
> > + pr_info("CMR: [0x%llx, 0x%llx)\n", cmr->base,
> > + cmr->base + cmr->size);
> > + }
> > +}
> > +
> > +/*
> > + * Get the TDX module information (TDSYSINFO_STRUCT) and the array of
> > + * CMRs, and save them to @sysinfo and @cmr_array. @sysinfo must have
> > + * been padded to have enough room to save the TDSYSINFO_STRUCT.
> > + */
> > +static int tdx_get_sysinfo(struct tdsysinfo_struct *sysinfo,
> > + struct cmr_info *cmr_array)
> > +{
> > + struct tdx_module_output out;
> > + u64 sysinfo_pa, cmr_array_pa;
> > + int ret;
> > +
> > + sysinfo_pa = __pa(sysinfo);
> > + cmr_array_pa = __pa(cmr_array);
> > + ret = seamcall(TDH_SYS_INFO, sysinfo_pa, TDSYSINFO_STRUCT_SIZE,
> > + cmr_array_pa, MAX_CMRS, NULL, &out);
> > + if (ret)
> > + return ret;
> > +
> > + pr_info("TDX module: atributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
>
>
> "attributes" ?

Appreciate! :)

[...]


> > +#define TDSYSINFO_STRUCT_SIZE 1024
>
> So, it can never be larger than 1024 bytes? Not even with many cpuid
> configs?

Correct. The TDX module spec(s) say:

	TDSYSINFO_STRUCT’s size is 1024B.

which I read as an architectural statement.

We (Intel) already published TDX IO, and TDSYSINFO_STRUCT is 1024B for all TDX
module versions.
>
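
If it helps to make that explicit in the code, a compile-time check next to
the define could document the assumption (sketch only):

	/*
	 * The TDX module spec fixes TDSYSINFO_STRUCT at 1024B.  Catch it at
	 * build time if the C definition ever outgrows the padded buffer.
	 * static_assert() comes from <linux/build_bug.h>.
	 */
	static_assert(sizeof(struct tdsysinfo_struct) <= TDSYSINFO_STRUCT_SIZE);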