2022-06-22 11:40:39

by Kai Huang

Subject: [PATCH v5 00/22] TDX host kernel support

Intel Trust Domain Extensions (TDX) protects guest VMs from a malicious
host and certain physical attacks. This series provides support for
initializing TDX in the host kernel.

KVM support for TDX is being developed separately[1]. A new fd-based
approach to supporting TDX private memory is also being developed[2].
KVM will only support the new fd-based approach as the TD guest memory backend.

You can find TDX related specs here:
https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html

This series is rebased to the latest tip/x86/tdx. You can also find this
series in the repo below on GitHub:
https://github.com/intel/tdx/tree/host-upstream

I would highly appreciate it if anyone could help review this series.

Hi Dave (and Intel reviewers),

Please kindly help to review, and I would appreciate reviewed-by or
acked-by tags if the patches look good to you.

Changelog history:

- v4 -> v5:

This is essentially a resend of v4. Sorry, I forgot to consult
get_maintainer.pl when sending out v4, so the linux-acpi and linux-mm
mailing lists and the relevant people for the 4 new patches were missed.

Since there was no feedback on v4, please skip reviewing v4 and
compare v5 to v3 directly. For the changes relative to v3, please see
the v3 -> v4 changelog below. Also, in the changelog history of
individual patches, I just used v3 -> v5.

- v3 -> v4 (addressed Dave's comments, and other comments from others):

- Simplified SEAMRR and TDX keyID detection.
- Added patches to handle ACPI CPU hotplug.
- Added patches to handle ACPI memory hotplug and driver managed memory
hotplug.
- Removed tdx_detect(); use a single tdx_init() instead.
- Removed detecting TDX module via P-SEAMLDR.
- Changed from using e820 to using memblock to convert system RAM to TDX
memory.
- Excluded legacy PMEM from TDX memory.
- Removed the patch adding a boot-time command line option to disable TDX.
- Addressed comments for other individual patches (please see individual
patches).
- Improved the documentation patch based on the new implementation.

- V2 -> v3:

- Addressed comments from Isaku.
- Fixed memory leak and unnecessary function argument in the patch to
configure the key for the global keyid (patch 17).
- Enhanced a little bit to the patch to get TDX module and CMR
information (patch 09).
- Fixed an unintended change in the patch to allocate PAMT (patch 13).
- Addressed comments from Kevin:
- Slight improvement to the commit message of patch 03.
- Removed WARN_ON_ONCE() in the check of cpus_booted_once_mask in
seamrr_enabled() (patch 04).
- Changed the documentation patch to add TDX host kernel support materials
to Documentation/x86/tdx.rst together with the TDX guest stuff, instead
of a standalone file (patch 21).
- Very minor improvement in commit messages.

- RFC (v1) -> v2:
- Rebased to Kirill's latest TDX guest code.
- Fixed two issues that are related to finding all RAM memory regions
based on e820.
- Minor improvement on comments and commit messages.

v3:
https://lore.kernel.org/lkml/[email protected]/T/

V2:
https://lore.kernel.org/lkml/[email protected]/T/

RFC (v1):
https://lore.kernel.org/all/e0ff030a49b252d91c789a89c303bb4206f85e3d.1646007267.git.kai.huang@intel.com/T/

== Background ==

Intel Trust Domain Extensions (TDX) protects guest VMs from a malicious
host and certain physical attacks. TDX introduces a new CPU mode called
Secure Arbitration Mode (SEAM) and a new isolated range pointed to by the
SEAM Range Register (SEAMRR). A CPU-attested software module called
'the TDX module' implements the functionality to manage and run
protected VMs. The TDX module (and its loader, the 'P-SEAMLDR')
runs inside the new isolated range and is protected from the untrusted
host VMM.

TDX also leverages Intel Multi-Key Total Memory Encryption (MKTME) to
provide crypto-protection to the VMs. TDX reserves part of MKTME KeyIDs
as TDX private KeyIDs, which are only accessible within the SEAM mode.
BIOS is responsible for partitioning legacy MKTME KeyIDs and TDX KeyIDs.

TDX is different from AMD SEV/SEV-ES/SEV-SNP, which uses a dedicated
secure processor to provide crypto-protection. The firmware running on
that secure processor plays a role similar to that of the TDX module.

The host kernel communicates with SEAM software via a new SEAMCALL
instruction. This is conceptually similar to a guest->host hypercall,
except it is made from the host to SEAM software instead.

Before being able to manage TD guests, the TDX module must be loaded
and properly initialized using SEAMCALLs defined by the TDX architecture.
This series assumes the TDX module is loaded by BIOS before the kernel
boots.

There's no CPUID or MSR to detect whether the TDX module has been loaded.
The SEAMCALL instruction fails with VMfailInvalid if the target SEAM
software (either the P-SEAMLDR or the TDX module) is not loaded. This can
therefore be used to detect the TDX module directly.
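
In practice, the detection is simply folded into the first step of the
initialization flow described below; a trimmed sketch of what patch 09's
tdx_module_init_global() does:

	static int tdx_module_init_global(void)
	{
		u64 ret;

		/* TDH.SYS.INIT is the first SEAMCALL of the init flow */
		ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL);
		if (ret == TDX_SEAMCALL_VMFAILINVALID)
			return -ENODEV;	/* no SEAM software is loaded */

		return ret ? -EFAULT : 0;
	}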

The TDX module is initialized in multiple steps:

1) Global initialization;
2) Logical-CPU scope initialization;
3) Enumerate information about the TDX module and TDX-capable memory;
4) Configure the TDX module with the TDX-usable memory ranges and a global
TDX KeyID which protects the TDX module metadata;
5) Package-scope configuration for the global TDX KeyID;
6) Initialize TDX metadata for usable memory ranges based on 4).

Step 2) requires calling a SEAMCALL on all "BIOS-enabled" (i.e. listed in
the MADT) logical cpus, otherwise step 4) will fail. Step 5) requires
calling a SEAMCALL on at least one cpu of each package.

Also, the TDX module could already have been initialized or be in shutdown
mode for a kexec()-ed kernel (see the kexec() section below). In this case,
the first step of the above process will fail immediately.

The TDX module can also be shut down at any time during its lifetime, by
calling a SEAMCALL on all "BIOS-enabled" logical cpus.
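
For orientation, a rough sketch of how the series sequences these steps
(the helper names below loosely follow the patch titles and are
illustrative only; the real functions in arch/x86/virt/vmx/tdx/tdx.c
differ in signatures and details):

	static int init_tdx_module_sketch(void)
	{
		int ret;

		ret = tdx_module_init_global();	/* 1) TDH.SYS.INIT, also detects the module */
		if (ret)
			return ret;
		ret = tdx_module_init_cpus();	/* 2) TDH.SYS.LP.INIT on all BIOS-enabled cpus */
		if (ret)
			return ret;
		ret = tdx_get_sysinfo();	/* 3) TDH.SYS.INFO: module info and CMRs */
		if (ret)
			return ret;
		ret = config_tdx_module();	/* 4) TDH.SYS.CONFIG: TDMRs and global KeyID */
		if (ret)
			return ret;
		ret = config_global_keyid();	/* 5) per-package configuration of the global KeyID */
		if (ret)
			return ret;
		return init_tdmrs();		/* 6) TDH.SYS.TDMR.INIT for every TDMR */
	}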

== Design Considerations ==

1. Initialize the TDX module at runtime

There are basically two ways the TDX module could be initialized: either
in early boot, or at runtime before the first TDX guest is run. This
series implements the runtime initialization.

This series adds a function tdx_init() to allow the caller to initialize
TDX at runtime:

	if (tdx_init())
		goto no_tdx;
	// TDX is ready to create TD guests.

This approach has the following pros:

1) Initializing the TDX module requires reserving ~1/256th of system RAM as
metadata. Enabling TDX on demand means this memory is only consumed when
TDX is truly needed (i.e. when KVM wants to create TD guests).

2) SEAMCALL requires the CPU to already be in VMX operation (i.e. VMXON has
been done), otherwise it causes a #UD. So far, KVM is the only user of TDX,
and it guarantees all online CPUs are in VMX operation when there's any VM.
Letting KVM initialize TDX at runtime avoids handling VMXON/VMXOFF in the
core kernel. Also, in the long term, more kernel components will need to
use TDX, so a reference-based approach to doing VMXON/VMXOFF will likely be
needed in the core kernel (see the sketch after this list).

3) It is more flexible for supporting "TDX module runtime update" (not in
this series). After updating to a new module at runtime, the kernel needs
to go through the initialization process again. For the new module, it's
possible the metadata allocated for the old module cannot be reused and
needs to be re-allocated.
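
Purely as an illustration of what such a reference-based approach could
look like (nothing like this exists in this series; the names and the
low-level enable/disable helpers are hypothetical):

	static DEFINE_MUTEX(vmx_ref_lock);
	static int vmx_ref_count;

	/* Hypothetical: VMXON on all online cpus for the first user,
	 * VMXOFF when the last user goes away. */
	static int cpu_vmx_get(void)
	{
		int ret = 0;

		mutex_lock(&vmx_ref_lock);
		if (!vmx_ref_count)
			ret = enable_vmx_all_cpus();	/* hypothetical helper */
		if (!ret)
			vmx_ref_count++;
		mutex_unlock(&vmx_ref_lock);

		return ret;
	}

	static void cpu_vmx_put(void)
	{
		mutex_lock(&vmx_ref_lock);
		if (!--vmx_ref_count)
			disable_vmx_all_cpus();		/* hypothetical helper */
		mutex_unlock(&vmx_ref_lock);
	}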

2. Kernel policy on TDX memory

The TDX architecture allows the VMM to designate specific memory as
usable for TDX private memory. This series chooses to designate _all_
system RAM as TDX memory to avoid having to modify the page allocator to
distinguish between TDX-capable and non-TDX-capable memory.

3. CPU hotplug

TDX doesn't work with ACPI CPU hotplug. To guarantee security, MCHECK
verifies all logical CPUs for all packages during platform boot. Any
hot-added CPU is not verified and thus cannot support TDX. A non-buggy BIOS
should never deliver an ACPI CPU hot-add event to the kernel. Such an event
is reported as a BIOS bug and the hot-added CPU is rejected.

TDX requires all boot-time verified logical CPUs to remain present until
machine reset. If the kernel receives an ACPI CPU hot-removal event, assume
the kernel cannot continue to work normally and just BUG().

Note TDX works with logical CPU online/offline, thus the kernel still
allows offlining a logical CPU and onlining it again.
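
To illustrate, the rejection in the ACPI CPU hot-add path could look
roughly like the check below. The cc_platform attribute name is a
placeholder for the one added by patch 02; the real call site and
attribute differ in detail:

	#include <linux/cc_platform.h>
	#include <linux/printk.h>
	#include <linux/errno.h>

	/* Sketch only: reject ACPI CPU hot-add when the platform forbids it. */
	static int reject_acpi_cpu_hotadd(void)
	{
		if (!cc_platform_has(CC_ATTR_ACPI_CPU_HOTPLUG_DISABLED))	/* placeholder name */
			return 0;

		/* A non-buggy BIOS should never deliver this event. */
		pr_err(FW_BUG "ACPI CPU hot-add not supported when TDX is enabled by BIOS\n");
		return -EINVAL;
	}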

4. Memory Hotplug

The TDX module reports a list of "Convertible Memory Regions" (CMRs) to
indicate which memory regions are TDX-capable. Those regions are
generated by BIOS and verified by MCHECK so that they are truly present
during platform boot and can meet the security guarantees.

This means TDX doesn't work with ACPI memory hot-add. A non-buggy BIOS
should never deliver an ACPI memory hot-add event to the kernel. Such an
event is reported as a BIOS bug and the hot-added memory is rejected.

TDX also doesn't work with ACPI memory hot-removal. If the kernel receives
an ACPI memory hot-removal event, assume the kernel cannot continue to work
normally, so just BUG().

Also, the kernel needs to choose which TDX-capable regions to use as TDX
memory and pass those regions to the TDX module when it gets initialized.
Once they are passed to the TDX module, the TDX-usable memory regions are
fixed for the module's lifetime.

This series guarantees all pages managed by the page allocator are TDX
memory. This means any hot-added memory to the page allocator will break
such guarantee thus should be prevented.

There are basically two memory hot-add cases that need to be prevented:
ACPI memory hot-add and driver-managed memory hot-add. This series
rejects driver-managed memory hot-add too when TDX is enabled by
BIOS.

However, adding new memory to ZONE_DEVICE should not be prevented as
those pages are not managed by the page allocator. Therefore,
memremap_pages() variants are still allowed although they internally
also use memory hotplug functions.
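
To illustrate the driver-managed case, the memory hot-add path could make
a check roughly like the one below before giving pages to the page
allocator (the function name and placement are illustrative, not the
series' exact code; ZONE_DEVICE/memremap_pages() does not go through this
path). The error message follows the dmesg example in the documentation
patch:

	/* Sketch only: reject hot-adding memory to the page allocator
	 * when TDX is enabled by BIOS. */
	static bool memory_hotadd_allowed(u64 start, u64 size)
	{
		if (!platform_tdx_enabled())
			return true;

		pr_err("tdx: Unable to add memory [0x%llx, 0x%llx) on TDX enabled platform.\n",
		       start, start + size);
		return false;
	}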

5. Kexec()

TDX (and MKTME) doesn't guarantee cache coherency among different KeyIDs.
If the TDX module is ever initialized, the kernel needs to flush dirty
cachelines associated with any TDX private KeyID, otherwise they may
silently corrupt the new kernel.

Similar to SME support, the kernel uses wbinvd() to flush the cache in
stop_this_cpu().
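
Roughly, the relevant hunk in stop_this_cpu() becomes something like the
sketch below (the exact condition used by the kexec() patch may differ):

	/* Flush caches before halting: dirty cachelines may belong to a
	 * TDX private KeyID (or, for SME, to a different encryption key). */
	if (cpu_feature_enabled(X86_FEATURE_SME) || platform_tdx_enabled())
		native_wbinvd();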

The current TDX module architecture doesn't play nicely with kexec().
The TDX module can only be initialized once during its lifetime, and
there is no SEAMCALL to reset the module to give a new clean slate to
the new kernel. Therefore, ideally, if the module is ever initialized,
it's better to shut down the module. The new kernel won't be able to
use TDX anyway (as it needs to go through the TDX module initialization
process which will fail immediately at the first step).

However, there's no guarantee the CPU is in VMX operation during kexec(),
so it's impractical to shut down the module. This series just leaves the
module in an open state.

Reference:
[1]: https://lore.kernel.org/lkml/[email protected]/T/
[2]: https://lore.kernel.org/linux-mm/[email protected]/t/


Kai Huang (22):
x86/virt/tdx: Detect TDX during kernel boot
cc_platform: Add new attribute to prevent ACPI CPU hotplug
cc_platform: Add new attribute to prevent ACPI memory hotplug
x86/virt/tdx: Prevent ACPI CPU hotplug and ACPI memory hotplug
x86/virt/tdx: Prevent hot-add driver managed memory
x86/virt/tdx: Add skeleton to initialize TDX on demand
x86/virt/tdx: Implement SEAMCALL function
x86/virt/tdx: Shut down TDX module in case of error
x86/virt/tdx: Detect TDX module by doing module global initialization
x86/virt/tdx: Do logical-cpu scope TDX module initialization
x86/virt/tdx: Get information about TDX module and TDX-capable memory
x86/virt/tdx: Convert all memory regions in memblock to TDX memory
x86/virt/tdx: Add placeholder to construct TDMRs based on memblock
x86/virt/tdx: Create TDMRs to cover all memblock memory regions
x86/virt/tdx: Allocate and set up PAMTs for TDMRs
x86/virt/tdx: Set up reserved areas for all TDMRs
x86/virt/tdx: Reserve TDX module global KeyID
x86/virt/tdx: Configure TDX module with TDMRs and global KeyID
x86/virt/tdx: Configure global KeyID on all packages
x86/virt/tdx: Initialize all TDMRs
x86/virt/tdx: Support kexec()
Documentation/x86: Add documentation for TDX host support

Documentation/x86/tdx.rst | 190 ++++-
arch/x86/Kconfig | 16 +
arch/x86/Makefile | 2 +
arch/x86/coco/core.c | 34 +-
arch/x86/include/asm/tdx.h | 9 +
arch/x86/kernel/process.c | 9 +-
arch/x86/mm/init_64.c | 21 +
arch/x86/virt/Makefile | 2 +
arch/x86/virt/vmx/Makefile | 2 +
arch/x86/virt/vmx/tdx/Makefile | 2 +
arch/x86/virt/vmx/tdx/seamcall.S | 52 ++
arch/x86/virt/vmx/tdx/tdx.c | 1333 ++++++++++++++++++++++++++++++
arch/x86/virt/vmx/tdx/tdx.h | 153 ++++
drivers/acpi/acpi_memhotplug.c | 23 +
drivers/acpi/acpi_processor.c | 23 +
include/linux/cc_platform.h | 25 +-
include/linux/memory_hotplug.h | 2 +
kernel/cpu.c | 2 +-
mm/memory_hotplug.c | 15 +
19 files changed, 1898 insertions(+), 17 deletions(-)
create mode 100644 arch/x86/virt/Makefile
create mode 100644 arch/x86/virt/vmx/Makefile
create mode 100644 arch/x86/virt/vmx/tdx/Makefile
create mode 100644 arch/x86/virt/vmx/tdx/seamcall.S
create mode 100644 arch/x86/virt/vmx/tdx/tdx.c
create mode 100644 arch/x86/virt/vmx/tdx/tdx.h

--
2.36.1


2022-06-22 11:40:41

by Kai Huang

Subject: [PATCH v5 09/22] x86/virt/tdx: Detect TDX module by doing module global initialization

So far the TDX module hasn't been detected yet. __seamcall() returns
TDX_SEAMCALL_VMFAILINVALID when the target SEAM software module is not
loaded, so a SEAMCALL to the TDX module can be used to detect whether the
module is loaded.

The first step of initializing the module is to call TDH.SYS.INIT once
on any logical cpu to do module global initialization. Just use it to
detect the module since it needs to be done anyway.

Signed-off-by: Kai Huang <[email protected]>
---

- v3 -> v5 (no feedback on v4):
- Add detecting TDX module.

---
arch/x86/virt/vmx/tdx/tdx.c | 39 +++++++++++++++++++++++++++++++++++--
arch/x86/virt/vmx/tdx/tdx.h | 1 +
2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 31ce4522100a..de4efc16ed45 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -180,6 +180,21 @@ static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
on_each_cpu(seamcall_smp_call_function, sc, true);
}

+/*
+ * Do TDX module global initialization. It also detects whether the
+ * module has been loaded or not.
+ */
+static int tdx_module_init_global(void)
+{
+ u64 ret;
+
+ ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL);
+ if (ret == TDX_SEAMCALL_VMFAILINVALID)
+ return -ENODEV;
+
+ return ret ? -EFAULT : 0;
+}
+
/*
* Detect and initialize the TDX module.
*
@@ -189,8 +204,28 @@ static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
*/
static int init_tdx_module(void)
{
- /* The TDX module hasn't been detected */
- return -ENODEV;
+ int ret;
+
+ /*
+ * Whether the TDX module is loaded is still unknown. SEAMCALL
+ * instruction fails with VMfailInvalid if the target SEAM
+ * software module is not loaded, so it can be used to detect the
+ * module.
+ *
+ * The first step of initializing the TDX module is module global
+ * initialization. Just use it to detect the module.
+ */
+ ret = tdx_module_init_global();
+ if (ret)
+ goto out;
+
+ /*
+ * Return -EINVAL until all steps of TDX module initialization
+ * process are done.
+ */
+ ret = -EINVAL;
+out:
+ return ret;
}

static void shutdown_tdx_module(void)
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 95d4eb884134..9e694789eb91 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -49,6 +49,7 @@
/*
* TDX module SEAMCALL leaf functions
*/
+#define TDH_SYS_INIT 33
#define TDH_SYS_LP_SHUTDOWN 44

/*
--
2.36.1

2022-06-22 11:41:45

by Kai Huang

Subject: [PATCH v5 06/22] x86/virt/tdx: Add skeleton to initialize TDX on demand

Before the TDX module can be used to create and run TD guests, it must
be loaded into the isolated region pointed by the SEAMRR and properly
initialized. The TDX module is expected to be loaded by BIOS before
booting to the kernel, and the kernel is expected to detect and
initialize it.

The TDX module can be initialized only once in its lifetime. Instead
of always initializing it at boot time, this implementation chooses an
on-demand approach: don't initialize TDX until there is a real need (e.g.
when requested by KVM). This avoids consuming the memory that must be
allocated by the kernel and given to the TDX module as metadata (~1/256th of
the TDX-usable memory), and also saves the time of initializing the TDX
module (and the metadata) when TDX is not used at all. Initializing the
TDX module at runtime on demand is also more flexible for supporting TDX
module runtime updates in the future (after updating the TDX module, it
needs to be initialized again).

Add a placeholder tdx_init() to detect and initialize the TDX module on
demand, with a state machine protected by mutex to support concurrent
calls from multiple callers.

The TDX module will be initialized in multi-steps defined by the TDX
architecture:

1) Global initialization;
2) Logical-CPU scope initialization;
3) Enumerate the TDX module capabilities and platform configuration;
4) Configure the TDX module about usable memory ranges and global
KeyID information;
5) Package-scope configuration for the global KeyID;
6) Initialize usable memory ranges based on 4).

The TDX module can also be shut down at any time during its lifetime.
In case of any error during the initialization process, shut down the
module. It's pointless to leave the module in any intermediate state
during the initialization.

Signed-off-by: Kai Huang <[email protected]>
---

- v3->v5 (no feedback on v4):

- Removed the check that SEAMRR and TDX KeyID have been detected on
all present cpus.
- Removed tdx_detect().
- Added a check (within the CPU hotplug lock) that num_online_cpus()
matches the number of MADT-enabled CPUs, and return early with an error
message if not.
- Improved dmesg printing for TDX module detection and initialization.

---
arch/x86/include/asm/tdx.h | 2 +
arch/x86/virt/vmx/tdx/tdx.c | 153 ++++++++++++++++++++++++++++++++++++
2 files changed, 155 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 97511b76c1ac..801f6e10b2db 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -90,8 +90,10 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,

#ifdef CONFIG_INTEL_TDX_HOST
bool platform_tdx_enabled(void);
+int tdx_init(void);
#else /* !CONFIG_INTEL_TDX_HOST */
static inline bool platform_tdx_enabled(void) { return false; }
+static inline int tdx_init(void) { return -ENODEV; }
#endif /* CONFIG_INTEL_TDX_HOST */

#endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index eb3294bf1b0a..1f9d8108eeea 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -10,17 +10,39 @@
#include <linux/types.h>
#include <linux/init.h>
#include <linux/printk.h>
+#include <linux/mutex.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
#include <asm/cpufeatures.h>
#include <asm/cpufeature.h>
#include <asm/msr-index.h>
#include <asm/msr.h>
+#include <asm/smp.h>
#include <asm/tdx.h>
#include <asm/coco.h>
#include "tdx.h"

+/*
+ * TDX module status during initialization
+ */
+enum tdx_module_status_t {
+ /* TDX module hasn't been detected and initialized */
+ TDX_MODULE_UNKNOWN,
+ /* TDX module is not loaded */
+ TDX_MODULE_NONE,
+ /* TDX module is initialized */
+ TDX_MODULE_INITIALIZED,
+ /* TDX module is shut down due to initialization error */
+ TDX_MODULE_SHUTDOWN,
+};
+
static u32 tdx_keyid_start __ro_after_init;
static u32 tdx_keyid_num __ro_after_init;

+static enum tdx_module_status_t tdx_module_status;
+/* Prevent concurrent attempts on TDX detection and initialization */
+static DEFINE_MUTEX(tdx_module_lock);
+
/* Detect whether CPU supports SEAM */
static int detect_seam(void)
{
@@ -101,6 +123,84 @@ static int __init tdx_early_detect(void)
}
early_initcall(tdx_early_detect);

+/*
+ * Detect and initialize the TDX module.
+ *
+ * Return -ENODEV when the TDX module is not loaded, 0 when it
+ * is successfully initialized, or other error when it fails to
+ * initialize.
+ */
+static int init_tdx_module(void)
+{
+ /* The TDX module hasn't been detected */
+ return -ENODEV;
+}
+
+static void shutdown_tdx_module(void)
+{
+ /* TODO: Shut down the TDX module */
+ tdx_module_status = TDX_MODULE_SHUTDOWN;
+}
+
+static int __tdx_init(void)
+{
+ int ret;
+
+ /*
+ * Initializing the TDX module requires running some code on
+ * all MADT-enabled CPUs. If not all MADT-enabled CPUs are
+ * online, it's not possible to initialize the TDX module.
+ *
+ * For simplicity temporarily disable CPU hotplug to prevent
+ * any CPU from going offline during the initialization.
+ */
+ cpus_read_lock();
+
+ /*
+ * Check whether all MADT-enabled CPUs are online and return
+ * early with an explicit message so the user can be aware.
+ *
+ * Note ACPI CPU hotplug is prevented when TDX is enabled, so
+ * num_processors always reflects all present MADT-enabled
+ * CPUs during boot when disabled_cpus is 0.
+ */
+ if (disabled_cpus || num_online_cpus() != num_processors) {
+ pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = init_tdx_module();
+ if (ret == -ENODEV) {
+ pr_info("TDX module is not loaded.\n");
+ goto out;
+ }
+
+ /*
+ * Shut down the TDX module in case of any error during the
+ * initialization process. It's meaningless to leave the TDX
+ * module in any middle state of the initialization process.
+ *
+ * Shutting down the module also requires running some code on
+ * all MADT-enabled CPUs. Do it while CPU hotplug is disabled.
+ *
+ * Return all errors during initialization as -EFAULT as
+ * the TDX module is always shut down in such cases.
+ */
+ if (ret) {
+ pr_info("Failed to initialize TDX module. Shut it down.\n");
+ shutdown_tdx_module();
+ ret = -EFAULT;
+ goto out;
+ }
+
+ pr_info("TDX module initialized.\n");
+out:
+ cpus_read_unlock();
+
+ return ret;
+}
+
/**
* platform_tdx_enabled() - Return whether BIOS has enabled TDX
*
@@ -111,3 +211,56 @@ bool platform_tdx_enabled(void)
{
return tdx_keyid_num >= 2;
}
+
+/**
+ * tdx_init - Initialize the TDX module
+ *
+ * Initialize the TDX module to make it ready to run TD guests.
+ *
+ * Caller to make sure all CPUs are online before calling this function.
+ * CPU hotplug is temporarily disabled internally to prevent any cpu
+ * from going offline.
+ *
+ * This function can be called in parallel by multiple callers.
+ *
+ * Return:
+ *
+ * * 0: The TDX module has been successfully initialized.
+ * * -ENODEV: The TDX module is not loaded, or TDX is not supported.
+ * * -EINVAL: The TDX module cannot be initialized because certain
+ * conditions are not met (i.e. when not all MADT-enabled
+ * CPUs are online).
+ * * -EFAULT: Other internal fatal errors, or the TDX module is in
+ * shutdown mode because it failed to initialize in previous
+ * attempts.
+ */
+int tdx_init(void)
+{
+ int ret;
+
+ if (!platform_tdx_enabled())
+ return -ENODEV;
+
+ mutex_lock(&tdx_module_lock);
+
+ switch (tdx_module_status) {
+ case TDX_MODULE_UNKNOWN:
+ ret = __tdx_init();
+ break;
+ case TDX_MODULE_NONE:
+ ret = -ENODEV;
+ break;
+ case TDX_MODULE_INITIALIZED:
+ ret = 0;
+ break;
+ default:
+ WARN_ON_ONCE(tdx_module_status != TDX_MODULE_SHUTDOWN);
+ ret = -EFAULT;
+ break;
+ }
+
+ mutex_unlock(&tdx_module_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_init);
--
2.36.1

2022-06-22 11:53:34

by Kai Huang

Subject: [PATCH v5 14/22] x86/virt/tdx: Create TDMRs to cover all memblock memory regions

The kernel configures TDX-usable memory regions by passing an array of
"TD Memory Regions" (TDMRs) to the TDX module. Each TDMR contains the
information of the base/size of a memory region, the base/size of the
associated Physical Address Metadata Table (PAMT) and a list of reserved
areas in the region.

Create a number of TDMRs according to the memblock memory regions. To
keep it simple, always try to create one TDMR for each memory region.
As the first step only set up the base/size for each TDMR.

Each TDMR must be 1G aligned and its size must be in 1G granularity.
This implies that one TDMR could cover multiple memory regions. If a
memory region spans a 1GB boundary and the former part is already
covered by the previous TDMR, just create a new TDMR for the remaining
part.

TDX only supports a limited number of TDMRs. Disable TDX if all TDMRs
are consumed but there are more memory regions to cover.

Signed-off-by: Kai Huang <[email protected]>
---

- v3 -> v5 (no feedback on v4):
- Removed allocating TDMR individually.
- Improved changelog by using Dave's words.
- Made TDMR_START() and TDMR_END() as static inline function.

---
arch/x86/virt/vmx/tdx/tdx.c | 104 +++++++++++++++++++++++++++++++++++-
1 file changed, 103 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 645addb1bea2..fd9f449b5395 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -427,6 +427,24 @@ static int check_memblock_tdx_convertible(void)
return 0;
}

+/* TDMR must be 1GB aligned */
+#define TDMR_ALIGNMENT BIT_ULL(30)
+#define TDMR_PFN_ALIGNMENT (TDMR_ALIGNMENT >> PAGE_SHIFT)
+
+/* Align up and down the address to TDMR boundary */
+#define TDMR_ALIGN_DOWN(_addr) ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
+#define TDMR_ALIGN_UP(_addr) ALIGN((_addr), TDMR_ALIGNMENT)
+
+static inline u64 tdmr_start(struct tdmr_info *tdmr)
+{
+ return tdmr->base;
+}
+
+static inline u64 tdmr_end(struct tdmr_info *tdmr)
+{
+ return tdmr->base + tdmr->size;
+}
+
/* Calculate the actual TDMR_INFO size */
static inline int cal_tdmr_size(void)
{
@@ -464,6 +482,82 @@ static struct tdmr_info *alloc_tdmr_array(int *array_sz)
return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO);
}

+static struct tdmr_info *tdmr_array_entry(struct tdmr_info *tdmr_array,
+ int idx)
+{
+ return (struct tdmr_info *)((unsigned long)tdmr_array +
+ cal_tdmr_size() * idx);
+}
+
+/*
+ * Create TDMRs to cover all memory regions in memblock. The actual
+ * number of TDMRs is set to @tdmr_num.
+ */
+static int create_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
+{
+ unsigned long start_pfn, end_pfn;
+ int i, nid, tdmr_idx = 0;
+
+ /*
+ * Loop over all memory regions in memblock and create TDMRs to
+ * cover them. To keep it simple, always try to use one TDMR to
+ * cover one memory region.
+ */
+ memblock_for_each_tdx_mem_pfn_range(i, &start_pfn, &end_pfn, &nid) {
+ struct tdmr_info *tdmr;
+ u64 start, end;
+
+ tdmr = tdmr_array_entry(tdmr_array, tdmr_idx);
+ start = TDMR_ALIGN_DOWN(start_pfn << PAGE_SHIFT);
+ end = TDMR_ALIGN_UP(end_pfn << PAGE_SHIFT);
+
+ /*
+ * If the current TDMR's size hasn't been initialized,
+ * it is a new TDMR to cover the new memory region.
+ * Otherwise, the current TDMR has already covered the
+ * previous memory region. In the latter case, check
+ * whether the current memory region has been fully or
+ * partially covered by the current TDMR, since TDMR is
+ * 1G aligned.
+ */
+ if (tdmr->size) {
+ /*
+ * Loop to the next memory region if the current
+ * region has already been fully covered by the
+ * current TDMR.
+ */
+ if (end <= tdmr_end(tdmr))
+ continue;
+
+ /*
+ * If part of the current memory region has
+ * already been covered by the current TDMR,
+ * skip the already covered part.
+ */
+ if (start < tdmr_end(tdmr))
+ start = tdmr_end(tdmr);
+
+ /*
+ * Create a new TDMR to cover the current memory
+ * region, or the remaining part of it.
+ */
+ tdmr_idx++;
+ if (tdmr_idx >= tdx_sysinfo.max_tdmrs)
+ return -E2BIG;
+
+ tdmr = tdmr_array_entry(tdmr_array, tdmr_idx);
+ }
+
+ tdmr->base = start;
+ tdmr->size = end - start;
+ }
+
+ /* @tdmr_idx is always the index of last valid TDMR. */
+ *tdmr_num = tdmr_idx + 1;
+
+ return 0;
+}
+
/*
* Construct an array of TDMRs to cover all memory regions in memblock.
* This makes sure all pages managed by the page allocator are TDX
@@ -472,8 +566,16 @@ static struct tdmr_info *alloc_tdmr_array(int *array_sz)
static int construct_tdmrs_memeblock(struct tdmr_info *tdmr_array,
int *tdmr_num)
{
+ int ret;
+
+ ret = create_tdmrs(tdmr_array, tdmr_num);
+ if (ret)
+ goto err;
+
/* Return -EINVAL until constructing TDMRs is done */
- return -EINVAL;
+ ret = -EINVAL;
+err:
+ return ret;
}

/*
--
2.36.1

2022-06-22 11:54:10

by Kai Huang

Subject: [PATCH v5 22/22] Documentation/x86: Add documentation for TDX host support

Add documentation for TDX host kernel support. There is already one
file, Documentation/x86/tdx.rst, containing documentation for TDX guest
internals. Reuse it for TDX host kernel support as well.

Introduce a new top-level menu "TDX Guest Support", move the existing
materials under it, and add a new menu for TDX host kernel support.

Signed-off-by: Kai Huang <[email protected]>
---
Documentation/x86/tdx.rst | 190 +++++++++++++++++++++++++++++++++++---
1 file changed, 179 insertions(+), 11 deletions(-)

diff --git a/Documentation/x86/tdx.rst b/Documentation/x86/tdx.rst
index b8fa4329e1a5..6c6b09ca6ba4 100644
--- a/Documentation/x86/tdx.rst
+++ b/Documentation/x86/tdx.rst
@@ -10,6 +10,174 @@ encrypting the guest memory. In TDX, a special module running in a special
mode sits between the host and the guest and manages the guest/host
separation.

+TDX Host Kernel Support
+=======================
+
+TDX introduces a new CPU mode called Secure Arbitration Mode (SEAM) and
+a new isolated range pointed to by the SEAM Range Register (SEAMRR). A
+CPU-attested software module called 'the TDX module' runs inside the new
+isolated range to provide the functionalities to manage and run protected
+VMs.
+
+TDX also leverages Intel Multi-Key Total Memory Encryption (MKTME) to
+provide crypto-protection to the VMs. TDX reserves part of MKTME KeyIDs
+as TDX private KeyIDs, which are only accessible within the SEAM mode.
+BIOS is responsible for partitioning legacy MKTME KeyIDs and TDX KeyIDs.
+
+To enable TDX, BIOS configures SEAMRR and TDX private KeyIDs consistently
+across all CPU packages. TDX doesn't trust BIOS. MCHECK verifies that
+all configurations from BIOS are correct and then enables SEAMRR.
+
+After TDX is enabled in BIOS, the TDX module needs to be loaded into the
+SEAMRR range and properly initialized, before it can be used to create
+and run protected VMs.
+
+The TDX architecture doesn't require BIOS to load the TDX module, but
+the current kernel assumes it is loaded by BIOS (either directly or by
+some UEFI shell tool) before booting to the kernel. The kernel then
+detects TDX and initializes the TDX module.
+
+TDX boot-time detection
+-----------------------
+
+The kernel detects TDX and the TDX private KeyIDs during kernel boot. The
+user can see the following dmesg output if TDX is enabled by BIOS:
+
+| [..] tdx: SEAMRR enabled.
+| [..] tdx: TDX private KeyID range: [16, 64).
+| [..] tdx: TDX enabled by BIOS.
+
+TDX module detection and initialization
+---------------------------------------
+
+There is no CPUID or MSR to detect whether the TDX module has been
+loaded. The kernel detects the TDX module by initializing it.
+
+The kernel talks to the TDX module via the new SEAMCALL instruction. The
+TDX module implements SEAMCALL leaf functions to allow the kernel to
+initialize it.
+
+Initializing the TDX module consumes roughly ~1/256th of system RAM to
+use as 'metadata' for the TDX memory. It also takes additional CPU
+time to initialize that metadata along with the TDX module itself. Neither
+is trivial. The current kernel doesn't always initialize the TDX module
+during kernel boot, but instead provides a function tdx_init() to allow
+the caller to initialize TDX when it truly wants to use TDX:
+
+ ret = tdx_init();
+ if (ret)
+ goto no_tdx;
+ // TDX is ready to use
+
+Initializing the TDX module requires all logical CPUs to be online and
+in VMX operation (a requirement for making SEAMCALL) during tdx_init().
+Currently, KVM is the only user of TDX. KVM always guarantees all online
+CPUs are in VMX operation when there's any VM. The current kernel doesn't
+handle entering VMX operation in tdx_init() but leaves this to the
+caller.
+
+The user can consult dmesg to see the presence of the TDX module, and
+whether it has been initialized.
+
+If the TDX module is not loaded, dmesg shows the following:
+
+| [..] tdx: TDX module is not loaded.
+
+If the TDX module is initialized successfully, dmesg shows something
+like the following:
+
+| [..] tdx: TDX module: vendor_id 0x8086, major_version 1, minor_version 0, build_date 20211209, build_num 160
+| [..] tdx: 65667 pages allocated for PAMT.
+| [..] tdx: TDX module initialized.
+
+If the TDX module failed to initialize, dmesg shows the following:
+
+| [..] tdx: Failed to initialize TDX module. Shut it down.
+
+TDX Interaction to Other Kernel Components
+------------------------------------------
+
+CPU Hotplug
+~~~~~~~~~~~
+
+TDX doesn't work with ACPI CPU hotplug. To guarantee security, MCHECK
+verifies all logical CPUs for all packages during platform boot. Any
+hot-added CPU is not verified and thus cannot support TDX. A non-buggy BIOS
+should never deliver an ACPI CPU hot-add event to the kernel. Such an event
+is reported as a BIOS bug and the hot-added CPU is rejected.
+
+TDX requires all boot-time verified logical CPUs to remain present until
+machine reset. If the kernel receives an ACPI CPU hot-removal event, assume
+the kernel cannot continue to work normally, so just BUG().
+
+Note TDX works with logical CPU online/offline, thus the kernel still
+allows offlining a logical CPU and onlining it again.
+
+Memory Hotplug
+~~~~~~~~~~~~~~
+
+The TDX module reports a list of "Convertible Memory Regions" (CMRs) to
+indicate which memory regions are TDX-capable. Those regions are
+generated by BIOS and verified by MCHECK so that they are truly
+present during platform boot and can meet the security guarantees.
+
+This means TDX doesn't work with ACPI memory hot-add. A non-buggy BIOS
+should never deliver an ACPI memory hot-add event to the kernel. Such an
+event is reported as a BIOS bug and the hot-added memory is rejected.
+
+TDX also doesn't work with ACPI memory hot-removal. If the kernel receives
+an ACPI memory hot-removal event, assume the kernel cannot continue to work
+normally, so just BUG().
+
+Also, the kernel needs to choose which TDX-capable regions to use as TDX
+memory and pass those regions to the TDX module when it gets initialized.
+Once they are passed to the TDX module, the TDX-usable memory regions are
+fixed for the module's lifetime.
+
+To avoid having to modify the page allocator to distinguish TDX and
+non-TDX memory allocation, current kernel guarantees all pages managed by
+the page allocator are TDX memory. This means any hot-added memory to
+the page allocator will break such guarantee thus should be prevented.
+
+There are basically two memory hot-add cases that need to be prevented:
+ACPI memory hot-add and driver-managed memory hot-add. The kernel
+rejects driver-managed memory hot-add too when TDX is enabled by
+BIOS. For instance, dmesg shows the following error when using the kmem
+driver to add legacy PMEM as system RAM:
+
+| [..] tdx: Unable to add memory [0x580000000, 0x600000000) on TDX enabled platform.
+| [..] kmem dax0.0: mapping0: 0x580000000-0x5ffffffff memory add failed
+
+However, adding new memory to ZONE_DEVICE should not be prevented as
+those pages are not managed by the page allocator. Therefore,
+memremap_pages() variants are still allowed although they internally
+also use memory hotplug functions.
+
+Kexec()
+~~~~~~~
+
+TDX (and MKTME) doesn't guarantee cache coherency among different KeyIDs.
+If the TDX module is ever initialized, the kernel needs to flush dirty
+cachelines associated with any TDX private KeyID, otherwise they may
+silently corrupt the new kernel.
+
+Similar to SME support, the kernel uses wbinvd() to flush the cache in
+stop_this_cpu().
+
+The current TDX module architecture doesn't play nicely with kexec().
+The TDX module can only be initialized once during its lifetime, and
+there is no SEAMCALL to reset the module to give a new clean slate to
+the new kernel. Therefore, ideally, if the module is ever initialized,
+it's better to shut down the module. The new kernel won't be able to
+use TDX anyway (as it needs to go through the TDX module initialization
+process which will fail immediately at the first step).
+
+However, there's no guarantee the CPU is in VMX operation during kexec(),
+so it's impractical to shut down the module. The current kernel just leaves
+the module in an open state.
+
+TDX Guest Support
+=================
Since the host cannot directly access guest registers or memory, much
normal functionality of a hypervisor must be moved into the guest. This is
implemented using a Virtualization Exception (#VE) that is handled by the
@@ -20,7 +188,7 @@ TDX includes new hypercall-like mechanisms for communicating from the
guest to the hypervisor or the TDX module.

New TDX Exceptions
-==================
+------------------

TDX guests behave differently from bare-metal and traditional VMX guests.
In TDX guests, otherwise normal instructions or memory accesses can cause
@@ -30,7 +198,7 @@ Instructions marked with an '*' conditionally cause exceptions. The
details for these instructions are discussed below.

Instruction-based #VE
----------------------
+~~~~~~~~~~~~~~~~~~~~~

- Port I/O (INS, OUTS, IN, OUT)
- HLT
@@ -41,7 +209,7 @@ Instruction-based #VE
- CPUID*

Instruction-based #GP
----------------------
+~~~~~~~~~~~~~~~~~~~~~

- All VMX instructions: INVEPT, INVVPID, VMCLEAR, VMFUNC, VMLAUNCH,
VMPTRLD, VMPTRST, VMREAD, VMRESUME, VMWRITE, VMXOFF, VMXON
@@ -52,7 +220,7 @@ Instruction-based #GP
- RDMSR*,WRMSR*

RDMSR/WRMSR Behavior
---------------------
+~~~~~~~~~~~~~~~~~~~~

MSR access behavior falls into three categories:

@@ -73,7 +241,7 @@ trapping and handling in the TDX module. Other than possibly being slow,
these MSRs appear to function just as they would on bare metal.

CPUID Behavior
---------------
+~~~~~~~~~~~~~~

For some CPUID leaves and sub-leaves, the virtualized bit fields of CPUID
return values (in guest EAX/EBX/ECX/EDX) are configurable by the
@@ -93,7 +261,7 @@ not know how to handle. The guest kernel may ask the hypervisor for the
value with a hypercall.

#VE on Memory Accesses
-======================
+----------------------

There are essentially two classes of TDX memory: private and shared.
Private memory receives full TDX protections. Its content is protected
@@ -107,7 +275,7 @@ entries. This helps ensure that a guest does not place sensitive
information in shared memory, exposing it to the untrusted hypervisor.

#VE on Shared Memory
---------------------
+~~~~~~~~~~~~~~~~~~~~

Access to shared mappings can cause a #VE. The hypervisor ultimately
controls whether a shared memory access causes a #VE, so the guest must be
@@ -127,7 +295,7 @@ be careful not to access device MMIO regions unless it is also prepared to
handle a #VE.

#VE on Private Pages
---------------------
+~~~~~~~~~~~~~~~~~~~~

An access to private mappings can also cause a #VE. Since all kernel
memory is also private memory, the kernel might theoretically need to
@@ -145,7 +313,7 @@ The hypervisor is permitted to unilaterally move accepted pages to a
to handle the exception.

Linux #VE handler
-=================
+-----------------

Just like page faults or #GP's, #VE exceptions can be either handled or be
fatal. Typically, an unhandled userspace #VE results in a SIGSEGV.
@@ -167,7 +335,7 @@ While the block is in place, any #VE is elevated to a double fault (#DF)
which is not recoverable.

MMIO handling
-=============
+-------------

In non-TDX VMs, MMIO is usually implemented by giving a guest access to a
mapping which will cause a VMEXIT on access, and then the hypervisor
@@ -189,7 +357,7 @@ MMIO access via other means (like structure overlays) may result in an
oops.

Shared Memory Conversions
-=========================
+-------------------------

All TDX guest memory starts out as private at boot. This memory can not
be accessed by the hypervisor. However, some kernel users like device
--
2.36.1

2022-06-22 11:56:22

by Kai Huang

Subject: [PATCH v5 08/22] x86/virt/tdx: Shut down TDX module in case of error

TDX supports shutting down the TDX module at any time during its
lifetime. After the module is shut down, no further TDX module SEAMCALL
leaf functions can be called on any logical cpu.

Shut down the TDX module in case of any error during the initialization
process. It's pointless to leave the TDX module in some intermediate state.

Shutting down the TDX module requires calling TDH.SYS.LP.SHUTDOWN on all
BIOS-enabled CPUs, and this SEAMCALL can run concurrently on different
CPUs. Implement a mechanism to run SEAMCALL concurrently on all online
CPUs and use it to shut down the module. Later logical-cpu scope module
initialization will use it too.

Also add a wrapper of __seamcall() which additionally prints out the
error information if SEAMCALL fails. It will be useful during the TDX
module initialization as it provides more error information to the user.

The SEAMCALL instruction causes a #UD if the CPU is not in VMX operation
(i.e. VMXON has not been done). So far only KVM handles VMXON. It
guarantees all online CPUs are in VMX operation as long as any VM still
exists. As so far KVM is also the only user of TDX, choose to just let the
caller guarantee all CPUs are in VMX operation during tdx_init().

Adding support for VMXON/VMXOFF to the core kernel isn't trivial.
In the long term, more kernel components will likely need to use TDX so
a reference-based approach to do VMXON/VMXOFF will likely be needed.

Signed-off-by: Kai Huang <[email protected]>
---

- v3 -> v5 (no feedback on v4):

- Added a wrapper of __seamcall() to print error code if SEAMCALL fails.
- Made the seamcall_on_each_cpu() void.
- Removed 'seamcall_ret' and 'tdx_module_out' from
'struct seamcall_ctx', as they must be local variable.
- Added the comments to tdx_init() and one paragraph to changelog to
explain the caller should handle VMXON.
- Called out that after shutdown, no "TDX module" SEAMCALL can be made.

---
arch/x86/virt/vmx/tdx/tdx.c | 65 ++++++++++++++++++++++++++++++++++++-
arch/x86/virt/vmx/tdx/tdx.h | 5 +++
2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 1f9d8108eeea..31ce4522100a 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -13,6 +13,8 @@
#include <linux/mutex.h>
#include <linux/cpu.h>
#include <linux/cpumask.h>
+#include <linux/smp.h>
+#include <linux/atomic.h>
#include <asm/cpufeatures.h>
#include <asm/cpufeature.h>
#include <asm/msr-index.h>
@@ -123,6 +125,61 @@ static int __init tdx_early_detect(void)
}
early_initcall(tdx_early_detect);

+/*
+ * Data structure to make SEAMCALL on multiple CPUs concurrently.
+ * @err is set to -EFAULT when SEAMCALL fails on any cpu.
+ */
+struct seamcall_ctx {
+ u64 fn;
+ u64 rcx;
+ u64 rdx;
+ u64 r8;
+ u64 r9;
+ atomic_t err;
+};
+
+/*
+ * Wrapper of __seamcall(). It additionally prints out the error
+ * information if __seamcall() fails normally. It is useful during
+ * the module initialization by providing more information to the user.
+ */
+static u64 seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+ struct tdx_module_output *out)
+{
+ u64 ret;
+
+ ret = __seamcall(fn, rcx, rdx, r8, r9, out);
+ if (ret == TDX_SEAMCALL_VMFAILINVALID || !ret)
+ return ret;
+
+ pr_err("SEAMCALL failed: leaf: 0x%llx, error: 0x%llx\n", fn, ret);
+ if (out)
+ pr_err("SEAMCALL additional output: rcx 0x%llx, rdx 0x%llx, r8 0x%llx, r9 0x%llx, r10 0x%llx, r11 0x%llx.\n",
+ out->rcx, out->rdx, out->r8, out->r9, out->r10, out->r11);
+
+ return ret;
+}
+
+static void seamcall_smp_call_function(void *data)
+{
+ struct seamcall_ctx *sc = data;
+ struct tdx_module_output out;
+ u64 ret;
+
+ ret = seamcall(sc->fn, sc->rcx, sc->rdx, sc->r8, sc->r9, &out);
+ if (ret)
+ atomic_set(&sc->err, -EFAULT);
+}
+
+/*
+ * Call the SEAMCALL on all online CPUs concurrently. Caller to check
+ * @sc->err to determine whether any SEAMCALL failed on any cpu.
+ */
+static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
+{
+ on_each_cpu(seamcall_smp_call_function, sc, true);
+}
+
/*
* Detect and initialize the TDX module.
*
@@ -138,7 +195,10 @@ static int init_tdx_module(void)

static void shutdown_tdx_module(void)
{
- /* TODO: Shut down the TDX module */
+ struct seamcall_ctx sc = { .fn = TDH_SYS_LP_SHUTDOWN };
+
+ seamcall_on_each_cpu(&sc);
+
tdx_module_status = TDX_MODULE_SHUTDOWN;
}

@@ -221,6 +281,9 @@ bool platform_tdx_enabled(void)
* CPU hotplug is temporarily disabled internally to prevent any cpu
* from going offline.
*
+ * Caller also needs to guarantee all CPUs are in VMX operation during
+ * this function, otherwise Oops may be triggered.
+ *
* This function can be called in parallel by multiple callers.
*
* Return:
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index f1a2dfb978b1..95d4eb884134 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -46,6 +46,11 @@
#define TDX_KEYID_NUM(_keyid_part) ((u32)((_keyid_part) >> 32))


+/*
+ * TDX module SEAMCALL leaf functions
+ */
+#define TDH_SYS_LP_SHUTDOWN 44
+
/*
* Do not put any hardware-defined TDX structure representations below this
* comment!
--
2.36.1

2022-06-22 11:59:39

by Kai Huang

Subject: [PATCH v5 20/22] x86/virt/tdx: Initialize all TDMRs

Initialize TDMRs via TDH.SYS.TDMR.INIT as the last step to complete the
TDX initialization.

All TDMRs need to be initialized using TDH.SYS.TDMR.INIT SEAMCALL before
the memory pages can be used by the TDX module. The time to initialize
TDMR is proportional to the size of the TDMR because TDH.SYS.TDMR.INIT
internally initializes the PAMT entries using the global KeyID.

To avoid long latency caused in one SEAMCALL, TDH.SYS.TDMR.INIT only
initializes an (implementation-specific) subset of PAMT entries of one
TDMR in one invocation. The caller needs to call TDH.SYS.TDMR.INIT
iteratively until all PAMT entries of the given TDMR are initialized.

TDH.SYS.TDMR.INITs can run concurrently on multiple CPUs as long as they
are initializing different TDMRs. To keep it simple, just initialize
all TDMRs one by one. On a 2-socket machine with 2.2G CPUs and 64GB
memory, each TDH.SYS.TDMR.INIT roughly takes ~7us on average, and it
takes roughly ~100ms to complete initializing all TDMRs while system is
idle.

Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/virt/vmx/tdx/tdx.c | 70 ++++++++++++++++++++++++++++++++++---
arch/x86/virt/vmx/tdx/tdx.h | 1 +
2 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index b9777a353835..da1af1b60c35 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1019,6 +1019,65 @@ static int config_global_keyid(void)
return seamcall_on_each_package_serialized(&sc);
}

+/* Initialize one TDMR */
+static int init_tdmr(struct tdmr_info *tdmr)
+{
+ u64 next;
+
+ /*
+ * Initializing PAMT entries might be time-consuming (in
+ * proportion to the size of the requested TDMR). To avoid long
+ * latency in one SEAMCALL, TDH.SYS.TDMR.INIT only initializes
+ * an (implementation-defined) subset of PAMT entries in one
+ * invocation.
+ *
+ * Call TDH.SYS.TDMR.INIT iteratively until all PAMT entries
+ * of the requested TDMR are initialized (if next-to-initialize
+ * address matches the end address of the TDMR).
+ */
+ do {
+ struct tdx_module_output out;
+ u64 ret;
+
+ ret = seamcall(TDH_SYS_TDMR_INIT, tdmr->base, 0, 0, 0, &out);
+ if (ret)
+ return -EFAULT;
+ /*
+ * RDX contains 'next-to-initialize' address if
+ * TDH.SYS.TDMR.INIT succeeded.
+ */
+ next = out.rdx;
+ /* Allow scheduling when needed */
+ if (need_resched())
+ cond_resched();
+ } while (next < tdmr->base + tdmr->size);
+
+ return 0;
+}
+
+/* Initialize all TDMRs */
+static int init_tdmrs(struct tdmr_info *tdmr_array, int tdmr_num)
+{
+ int i;
+
+ /*
+ * Initialize TDMRs one-by-one for simplicity, though the TDX
+ * architecture does allow different TDMRs to be initialized in
+ * parallel on multiple CPUs. Parallel initialization could
+ * be added later when the time spent in the serialized scheme
+ * becomes a real concern.
+ */
+ for (i = 0; i < tdmr_num; i++) {
+ int ret;
+
+ ret = init_tdmr(tdmr_array_entry(tdmr_array, i));
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
/*
* Detect and initialize the TDX module.
*
@@ -1109,11 +1168,12 @@ static int init_tdx_module(void)
if (ret)
goto out_free_pamts;

- /*
- * Return -EINVAL until all steps of TDX module initialization
- * process are done.
- */
- ret = -EINVAL;
+ /* Initialize TDMRs to complete the TDX module initialization */
+ ret = init_tdmrs(tdmr_array, tdmr_num);
+ if (ret)
+ goto out_free_pamts;
+
+ tdx_module_status = TDX_MODULE_INITIALIZED;
out_free_pamts:
if (ret) {
/*
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 2d25a93b89ef..e0309558be13 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -53,6 +53,7 @@
#define TDH_SYS_INFO 32
#define TDH_SYS_INIT 33
#define TDH_SYS_LP_INIT 35
+#define TDH_SYS_TDMR_INIT 36
#define TDH_SYS_LP_SHUTDOWN 44
#define TDH_SYS_CONFIG 45

--
2.36.1

2022-06-24 02:54:00

by Chao Gao

Subject: Re: [PATCH v5 06/22] x86/virt/tdx: Add skeleton to initialize TDX on demand

On Wed, Jun 22, 2022 at 11:16:29PM +1200, Kai Huang wrote:
>Before the TDX module can be used to create and run TD guests, it must
>be loaded into the isolated region pointed by the SEAMRR and properly
>initialized. The TDX module is expected to be loaded by BIOS before
>booting to the kernel, and the kernel is expected to detect and
>initialize it.
>
>The TDX module can be initialized only once in its lifetime. Instead
>of always initializing it at boot time, this implementation chooses an
>on-demand approach to initialize TDX until there is a real need (e.g
>when requested by KVM). This avoids consuming the memory that must be
>allocated by kernel and given to the TDX module as metadata (~1/256th of
>the TDX-usable memory), and also saves the time of initializing the TDX
>module (and the metadata) when TDX is not used at all. Initializing the
>TDX module at runtime on-demand also is more flexible to support TDX
>module runtime updating in the future (after updating the TDX module, it
>needs to be initialized again).
>
>Add a placeholder tdx_init() to detect and initialize the TDX module on
>demand, with a state machine protected by mutex to support concurrent
>calls from multiple callers.
>
>The TDX module will be initialized in multi-steps defined by the TDX
>architecture:
>
> 1) Global initialization;
> 2) Logical-CPU scope initialization;
> 3) Enumerate the TDX module capabilities and platform configuration;
> 4) Configure the TDX module about usable memory ranges and global
> KeyID information;
> 5) Package-scope configuration for the global KeyID;
> 6) Initialize usable memory ranges based on 4).
>
>The TDX module can also be shut down at any time during its lifetime.
>In case of any error during the initialization process, shut down the
>module. It's pointless to leave the module in any intermediate state
>during the initialization.
>
>Signed-off-by: Kai Huang <[email protected]>

Reviewed-by: Chao Gao <[email protected]>

One nit below:

>+static int __tdx_init(void)
>+{
>+ int ret;
>+
>+ /*
>+ * Initializing the TDX module requires running some code on
>+ * all MADT-enabled CPUs. If not all MADT-enabled CPUs are
>+ * online, it's not possible to initialize the TDX module.
>+ *
>+ * For simplicity temporarily disable CPU hotplug to prevent
>+ * any CPU from going offline during the initialization.
>+ */
>+ cpus_read_lock();
>+
>+ /*
>+ * Check whether all MADT-enabled CPUs are online and return
>+ * early with an explicit message so the user can be aware.
>+ *
>+ * Note ACPI CPU hotplug is prevented when TDX is enabled, so
>+ * num_processors always reflects all present MADT-enabled
>+ * CPUs during boot when disabled_cpus is 0.
>+ */
>+ if (disabled_cpus || num_online_cpus() != num_processors) {
>+ pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n");
>+ ret = -EINVAL;
>+ goto out;
>+ }
>+
>+ ret = init_tdx_module();
>+ if (ret == -ENODEV) {
>+ pr_info("TDX module is not loaded.\n");

tdx_module_status should be set to TDX_MODULE_NONE here.

>+ goto out;
>+ }

2022-06-24 11:43:36

by Kai Huang

Subject: Re: [PATCH v5 06/22] x86/virt/tdx: Add skeleton to initialize TDX on demand


> > + ret = init_tdx_module();
> > + if (ret == -ENODEV) {
> > + pr_info("TDX module is not loaded.\n");
>
> tdx_module_status should be set to TDX_MODULE_NONE here.

Thanks. Will fix.

>
> > + goto out;
> > + }

--
Thanks,
-Kai


2022-06-24 18:59:32

by Dave Hansen

Subject: Re: [PATCH v5 08/22] x86/virt/tdx: Shut down TDX module in case of error

So, the last patch was called:

Implement SEAMCALL function

and yet, in this patch, we have a "seamcall()" function. That's a bit
confusing and not covered at *all* in this subject.

Further, seamcall() is the *ONLY* caller of __seamcall() that I see in
this series. That makes its presence here even more odd.

The seamcall() bits should either be in their own patch, or mashed in
with __seamcall().

> +/*
> + * Wrapper of __seamcall(). It additionally prints out the error
> + * information if __seamcall() fails normally. It is useful during
> + * the module initialization by providing more information to the user.
> + */
> +static u64 seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> + struct tdx_module_output *out)
> +{
> + u64 ret;
> +
> + ret = __seamcall(fn, rcx, rdx, r8, r9, out);
> + if (ret == TDX_SEAMCALL_VMFAILINVALID || !ret)
> + return ret;
> +
> + pr_err("SEAMCALL failed: leaf: 0x%llx, error: 0x%llx\n", fn, ret);
> + if (out)
> + pr_err("SEAMCALL additional output: rcx 0x%llx, rdx 0x%llx, r8 0x%llx, r9 0x%llx, r10 0x%llx, r11 0x%llx.\n",
> + out->rcx, out->rdx, out->r8, out->r9, out->r10, out->r11);
> +
> + return ret;
> +}
> +
> +static void seamcall_smp_call_function(void *data)
> +{
> + struct seamcall_ctx *sc = data;
> + struct tdx_module_output out;
> + u64 ret;
> +
> + ret = seamcall(sc->fn, sc->rcx, sc->rdx, sc->r8, sc->r9, &out);
> + if (ret)
> + atomic_set(&sc->err, -EFAULT);
> +}
> +
> +/*
> + * Call the SEAMCALL on all online CPUs concurrently. Caller to check
> + * @sc->err to determine whether any SEAMCALL failed on any cpu.
> + */
> +static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
> +{
> + on_each_cpu(seamcall_smp_call_function, sc, true);
> +}

You can get away with this three-liner seamcall_on_each_cpu() being in
this patch, but seamcall() itself doesn't belong here.

> /*
> * Detect and initialize the TDX module.
> *
> @@ -138,7 +195,10 @@ static int init_tdx_module(void)
>
> static void shutdown_tdx_module(void)
> {
> - /* TODO: Shut down the TDX module */
> + struct seamcall_ctx sc = { .fn = TDH_SYS_LP_SHUTDOWN };
> +
> + seamcall_on_each_cpu(&sc);
> +
> tdx_module_status = TDX_MODULE_SHUTDOWN;
> }
>
> @@ -221,6 +281,9 @@ bool platform_tdx_enabled(void)
> * CPU hotplug is temporarily disabled internally to prevent any cpu
> * from going offline.
> *
> + * Caller also needs to guarantee all CPUs are in VMX operation during
> + * this function, otherwise Oops may be triggered.

I would *MUCH* rather have this be a:

if (!cpu_feature_enabled(X86_FEATURE_VMX))
WARN_ONCE("VMX should be on blah blah\n");

than just plain oops. Even a pr_err() that preceded the oops would be
nicer than an oops that someone has to go decode and then grumble when
their binutils is so old that it can't disassemble the TDCALL.



> * This function can be called in parallel by multiple callers.
> *
> * Return:
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index f1a2dfb978b1..95d4eb884134 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -46,6 +46,11 @@
> #define TDX_KEYID_NUM(_keyid_part) ((u32)((_keyid_part) >> 32))
>
>
> +/*
> + * TDX module SEAMCALL leaf functions
> + */
> +#define TDH_SYS_LP_SHUTDOWN 44
> +
> /*
> * Do not put any hardware-defined TDX structure representations below this
> * comment!

2022-06-24 19:52:18

by Dave Hansen

Subject: Re: [PATCH v5 00/22] TDX host kernel support

On 6/22/22 04:15, Kai Huang wrote:
> Please kindly help to review, and I would appreciate reviewed-by or
> acked-by tags if the patches look good to you.

Serious question: Is *ANYONE* looking at these patches other than you
and the maintainers? I first saw this code (inside Intel) in early
2020. In that time, not a single review tag has been acquired?

$ egrep -ic 'acked-by:|reviewed-by:' kais-patches.mbox
0

2022-06-27 04:45:13

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH v5 00/22] TDX host kernel support

On Fri, 2022-06-24 at 12:47 -0700, Dave Hansen wrote:
> On 6/22/22 04:15, Kai Huang wrote:
> > Please kindly help to review, and I would appreciate reviewed-by or
> > acked-by tags if the patches look good to you.
>
> Serious question: Is *ANYONE* looking at these patches other than you
> and the maintainers? I first saw this code (inside Intel) in early
> 2020. In that time, not a single review tag has been acquired?
>
> $ egrep -ic 'acked-by:|reviewed-by:' kais-patches.mbox
> 0

Hi Dave,

There were big design changes in the history of this series (i.e. we originally
supported loading both the NP-SEAMLDR ACM and the TDX module during boot, and we
later changed from initializing the module during kernel boot to doing it at
runtime), but yes, some other Linux/KVM TDX developers in our team have been
reviewing this series the whole time, at least to some extent. They just didn't
give Reviewed-by or Acked-by.

Especially, after we had agreed that this series should in general enable TDX
with minimal code change, Kevin reviewed this series intensively and helped to
simplify the code to its current shape (i.e. the TDMR part). He didn't give
either of those tags (he only said the series is ready for you to review),
perhaps because he was _helping_ to get this series into a shape that is ready
for you and the other Intel reviewers to review.

--
Thanks,
-Kai


2022-06-27 05:27:30

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH v5 08/22] x86/virt/tdx: Shut down TDX module in case of error

On Fri, 2022-06-24 at 11:50 -0700, Dave Hansen wrote:
> So, the last patch was called:
>
> Implement SEAMCALL function
>
> and yet, in this patch, we have a "seamcall()" function. That's a bit
> confusing and not covered at *all* in this subject.
>
> Further, seamcall() is the *ONLY* caller of __seamcall() that I see in
> this series. That makes its presence here even more odd.
>
> The seamcall() bits should either be in their own patch, or mashed in
> with __seamcall().

Right. The reason I didn't put seamcall() into the previous patch was that it is
only used in this tdx.c, so it should be static. But adding a static function
w/o using it in the previous patch will trigger a compile warning. So I
introduced it here, where it is first used.

One option is that I can introduce seamcall() as a static inline function in
tdx.h in the previous patch so there won't be a warning. I'll change to that
approach. Please let me know if you have any comments.
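
For illustration, a minimal sketch of that option (the placement in tdx.h is only
an assumption, mirroring the wrapper quoted above from this patch):

/*
 * Hypothetical sketch: seamcall() as a static inline in tdx.h so the patch
 * that adds __seamcall() does not trigger a "defined but not used" warning
 * (unused static inline functions in headers are not warned about).
 * Assumes __seamcall() and struct tdx_module_output from this series.
 */
static inline u64 seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
			   struct tdx_module_output *out)
{
	u64 ret;

	ret = __seamcall(fn, rcx, rdx, r8, r9, out);
	if (ret == TDX_SEAMCALL_VMFAILINVALID || !ret)
		return ret;

	pr_err("SEAMCALL failed: leaf: 0x%llx, error: 0x%llx\n", fn, ret);
	return ret;
}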

>
> > +/*
> > + * Wrapper of __seamcall(). It additionally prints out the error
> > + * information if __seamcall() fails normally. It is useful during
> > + * the module initialization by providing more information to the user.
> > + */
> > +static u64 seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> > + struct tdx_module_output *out)
> > +{
> > + u64 ret;
> > +
> > + ret = __seamcall(fn, rcx, rdx, r8, r9, out);
> > + if (ret == TDX_SEAMCALL_VMFAILINVALID || !ret)
> > + return ret;
> > +
> > + pr_err("SEAMCALL failed: leaf: 0x%llx, error: 0x%llx\n", fn, ret);
> > + if (out)
> > + pr_err("SEAMCALL additional output: rcx 0x%llx, rdx 0x%llx, r8 0x%llx, r9 0x%llx, r10 0x%llx, r11 0x%llx.\n",
> > + out->rcx, out->rdx, out->r8, out->r9, out->r10, out->r11);
> > +
> > + return ret;
> > +}
> > +
> > +static void seamcall_smp_call_function(void *data)
> > +{
> > + struct seamcall_ctx *sc = data;
> > + struct tdx_module_output out;
> > + u64 ret;
> > +
> > + ret = seamcall(sc->fn, sc->rcx, sc->rdx, sc->r8, sc->r9, &out);
> > + if (ret)
> > + atomic_set(&sc->err, -EFAULT);
> > +}
> > +
> > +/*
> > + * Call the SEAMCALL on all online CPUs concurrently. Caller to check
> > + * @sc->err to determine whether any SEAMCALL failed on any cpu.
> > + */
> > +static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
> > +{
> > + on_each_cpu(seamcall_smp_call_function, sc, true);
> > +}
>
> You can get away with this three-liner seamcall_on_each_cpu() being in
> this patch, but seamcall() itself doesn't belong here.

Right. Please see above reply.

>
> > /*
> > * Detect and initialize the TDX module.
> > *
> > @@ -138,7 +195,10 @@ static int init_tdx_module(void)
> >
> > static void shutdown_tdx_module(void)
> > {
> > - /* TODO: Shut down the TDX module */
> > + struct seamcall_ctx sc = { .fn = TDH_SYS_LP_SHUTDOWN };
> > +
> > + seamcall_on_each_cpu(&sc);
> > +
> > tdx_module_status = TDX_MODULE_SHUTDOWN;
> > }
> >
> > @@ -221,6 +281,9 @@ bool platform_tdx_enabled(void)
> > * CPU hotplug is temporarily disabled internally to prevent any cpu
> > * from going offline.
> > *
> > + * Caller also needs to guarantee all CPUs are in VMX operation during
> > + * this function, otherwise Oops may be triggered.
>
> I would *MUCH* rather have this be a:
>
> if (!cpu_feature_enabled(X86_FEATURE_VMX))
> WARN_ONCE("VMX should be on blah blah\n");
>
> than just plain oops. Even a pr_err() that preceded the oops would be
> nicer than an oops that someone has to go decode and then grumble when
> their binutils is too old that it can't disassemble the TDCALL.

I can add this to seamcall():

/*
 * SEAMCALL requires CPU being in VMX operation otherwise it causes #UD.
 * Sanity check and return early to avoid Oops. Note cpu_vmx_enabled()
 * actually only checks whether VMX is enabled but doesn't check whether
 * CPU is in VMX operation (VMXON is done). There's no way to check
 * whether VMXON has been done, but currently enabling VMX and doing
 * VMXON are always done together.
 */
if (!cpu_vmx_enabled()) {
	WARN_ONCE("CPU is not in VMX operation before making SEAMCALL");
	return -EINVAL;
}

The reason I didn't do this is that I'd like to keep seamcall() simple: it only
returns TDX_SEAMCALL_VMFAILINVALID or the actual SEAMCALL leaf error. With the
above, this function would also return a kernel error code, which isn't good.

Alternatively, we can always add an EXTABLE entry to the TDX_MODULE_CALL macro
to handle #UD and #GP by returning dedicated error codes (please also see my
reply to the previous patch for the code needed to handle them), in which case
we don't need such a check here.

Always handling #UD in the TDX_MODULE_CALL macro also has another advantage:
there will be no Oops for #UD regardless of the issue that "there's no way to
check whether VMXON has been done" mentioned in the above comment.

What's your opinion?


--
Thanks,
-Kai


2022-06-27 21:11:51

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 08/22] x86/virt/tdx: Shut down TDX module in case of error

On 6/26/22 22:26, Kai Huang wrote:
> On Fri, 2022-06-24 at 11:50 -0700, Dave Hansen wrote:
>> So, the last patch was called:
>>
>> Implement SEAMCALL function
>>
>> and yet, in this patch, we have a "seamcall()" function. That's a bit
>> confusing and not covered at *all* in this subject.
>>
>> Further, seamcall() is the *ONLY* caller of __seamcall() that I see in
>> this series. That makes its presence here even more odd.
>>
>> The seamcall() bits should either be in their own patch, or mashed in
>> with __seamcall().
>
> Right. The reason I didn't put the seamcall() into previous patch was it is
> only used in this tdx.c, so it should be static. But adding a static function
> w/o using it in previous patch will trigger a compile warning. So I introduced
> here where it is first used.
>
> One option is I can introduce seamcall() as a static inline function in tdx.h in
> previous patch so there won't be a warning. I'll change to use this way.
> Please let me know if you have any comments.

Does a temporary __unused get rid of the warning?

>>> /*
>>> * Detect and initialize the TDX module.
>>> *
>>> @@ -138,7 +195,10 @@ static int init_tdx_module(void)
>>>
>>> static void shutdown_tdx_module(void)
>>> {
>>> - /* TODO: Shut down the TDX module */
>>> + struct seamcall_ctx sc = { .fn = TDH_SYS_LP_SHUTDOWN };
>>> +
>>> + seamcall_on_each_cpu(&sc);
>>> +
>>> tdx_module_status = TDX_MODULE_SHUTDOWN;
>>> }
>>>
>>> @@ -221,6 +281,9 @@ bool platform_tdx_enabled(void)
>>> * CPU hotplug is temporarily disabled internally to prevent any cpu
>>> * from going offline.
>>> *
>>> + * Caller also needs to guarantee all CPUs are in VMX operation during
>>> + * this function, otherwise Oops may be triggered.
>>
>> I would *MUCH* rather have this be a:
>>
>> if (!cpu_feature_enabled(X86_FEATURE_VMX))
>> WARN_ONCE("VMX should be on blah blah\n");
>>
>> than just plain oops. Even a pr_err() that preceded the oops would be
>> nicer than an oops that someone has to go decode and then grumble when
>> their binutils is too old that it can't disassemble the TDCALL.
>
> I can add this to seamcall():
>
> /*
> * SEAMCALL requires CPU being in VMX operation otherwise it causes #UD.
> * Sanity check and return early to avoid Oops. Note cpu_vmx_enabled()
> * actually only checks whether VMX is enabled but doesn't check whether
> * CPU is in VMX operation (VMXON is done). There's no way to check
> * whether VMXON has been done, but currently enabling VMX and doing
> * VMXON are always done together.
> */
> if (!cpu_vmx_enabled()) {
> 	WARN_ONCE("CPU is not in VMX operation before making SEAMCALL");
> 	return -EINVAL;
> }
>
> The reason I didn't do is I'd like to make seamcall() simple, that it only
> returns TDX_SEAMCALL_VMFAILINVALID or the actual SEAMCALL leaf error. With
> above, this function also returns kernel error code, which isn't good.

I think you're missing the point. You wasted two lines of code on a
*COMMENT* that doesn't actually help anyone decode an oops. You could
have, instead, spent two lines on actual code that would have been just
as good or better than a comment *AND* help folks looking at an oops.

It's almost always better to do something actionable in code than to
comment it, unless it's in some crazy fast path.

> Alternatively, we can always add EXTABLE to TDX_MODULE_CALL macro to handle #UD
> and #GP by returning dedicated error codes (please also see my reply to previous
> patch for the code needed to handle), in which case we don't need such check
> here.
>
> Always handling #UD in TDX_MODULE_CALL macro also has another advantage: there
> will be no Oops for #UD regardless the issue that "there's no way to check
> whether VMXON has been done" in the above comment.
>
> What's your opinion?

I think you should explore using the EXTABLE. Let's see how it looks.

2022-06-27 22:46:15

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH v5 08/22] x86/virt/tdx: Shut down TDX module in case of error

On Mon, 2022-06-27 at 13:46 -0700, Dave Hansen wrote:
> On 6/26/22 22:26, Kai Huang wrote:
> > On Fri, 2022-06-24 at 11:50 -0700, Dave Hansen wrote:
> > > So, the last patch was called:
> > >
> > > Implement SEAMCALL function
> > >
> > > and yet, in this patch, we have a "seamcall()" function. That's a bit
> > > confusing and not covered at *all* in this subject.
> > >
> > > Further, seamcall() is the *ONLY* caller of __seamcall() that I see in
> > > this series. That makes its presence here even more odd.
> > >
> > > The seamcall() bits should either be in their own patch, or mashed in
> > > with __seamcall().
> >
> > Right. The reason I didn't put the seamcall() into previous patch was it is
> > only used in this tdx.c, so it should be static. But adding a static function
> > w/o using it in previous patch will trigger a compile warning. So I introduced
> > here where it is first used.
> >
> > One option is I can introduce seamcall() as a static inline function in tdx.h in
> > previous patch so there won't be a warning. I'll change to use this way.
> > Please let me know if you have any comments.
>
> Does a temporary __unused get rid of the warning?

Yes, and both __maybe_unused and __always_unused get rid of the warning too.

__unused is not defined in compiler_attributes.h, so we would need to use
__attribute__((__unused__)) explicitly, or define __unused as a macro for it.

I think I can just use __always_unused for this purpose?

So I think we can put the seamcall() implementation into the patch which
implements __seamcall(). We can either make seamcall() inline and put it in
tdx.h or tdx.c, or use __always_unused (or the one you prefer) to get rid of
the warning.

What's your opinion?
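
For illustration, a minimal sketch of the __always_unused option, assuming the
annotation would be dropped again in the later patch that adds the first caller:

/*
 * Hypothetical sketch: keep seamcall() static in tdx.c; the temporary
 * __always_unused suppresses the "defined but not used" warning until the
 * patch that adds the first caller removes the annotation again. Assumes
 * __seamcall() and struct tdx_module_output from this series.
 */
static u64 __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8,
				    u64 r9, struct tdx_module_output *out)
{
	return __seamcall(fn, rcx, rdx, r8, r9, out);
}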
>
> > > > /*
> > > > * Detect and initialize the TDX module.
> > > > *
> > > > @@ -138,7 +195,10 @@ static int init_tdx_module(void)
> > > >
> > > > static void shutdown_tdx_module(void)
> > > > {
> > > > - /* TODO: Shut down the TDX module */
> > > > + struct seamcall_ctx sc = { .fn = TDH_SYS_LP_SHUTDOWN };
> > > > +
> > > > + seamcall_on_each_cpu(&sc);
> > > > +
> > > > tdx_module_status = TDX_MODULE_SHUTDOWN;
> > > > }
> > > >
> > > > @@ -221,6 +281,9 @@ bool platform_tdx_enabled(void)
> > > > * CPU hotplug is temporarily disabled internally to prevent any cpu
> > > > * from going offline.
> > > > *
> > > > + * Caller also needs to guarantee all CPUs are in VMX operation during
> > > > + * this function, otherwise Oops may be triggered.
> > >
> > > I would *MUCH* rather have this be a:
> > >
> > > if (!cpu_feature_enabled(X86_FEATURE_VMX))
> > > WARN_ONCE("VMX should be on blah blah\n");
> > >
> > > than just plain oops. Even a pr_err() that preceded the oops would be
> > > nicer than an oops that someone has to go decode and then grumble when
> > > their binutils is too old that it can't disassemble the TDCALL.
> >
> > I can add this to seamcall():
> >
> > /*
> > * SEAMCALL requires CPU being in VMX operation otherwise it causes #UD.
> > * Sanity check and return early to avoid Oops. Note cpu_vmx_enabled()
> > * actually only checks whether VMX is enabled but doesn't check whether
> > * CPU is in VMX operation (VMXON is done). There's no way to check
> > * whether VMXON has been done, but currently enabling VMX and doing
> > * VMXON are always done together.
> > */
> > if (!cpu_vmx_enabled()) {
> > 	WARN_ONCE("CPU is not in VMX operation before making SEAMCALL");
> > 	return -EINVAL;
> > }
> >
> > The reason I didn't do is I'd like to make seamcall() simple, that it only
> > returns TDX_SEAMCALL_VMFAILINVALID or the actual SEAMCALL leaf error. With
> > above, this function also returns kernel error code, which isn't good.
>
> I think you're missing the point. You wasted two lines of code on a
> *COMMENT* that doesn't actually help anyone decode an oops. You could
> have, instead, spent two lines on actual code that would have been just
> as good or better than a comment *AND* help folks looking at an oops.
>
> It's almost always better to do something actionable in code than to
> comment it, unless it's in some crazy fast path.

Agreed. Thanks.

>
> > Alternatively, we can always add EXTABLE to TDX_MODULE_CALL macro to handle #UD
> > and #GP by returning dedicated error codes (please also see my reply to previous
> > patch for the code needed to handle), in which case we don't need such check
> > here.
> >
> > Always handling #UD in TDX_MODULE_CALL macro also has another advantage: there
> > will be no Oops for #UD regardless the issue that "there's no way to check
> > whether VMXON has been done" in the above comment.
> >
> > What's your opinion?
>
> I think you should explore using the EXTABLE. Let's see how it looks.

I tried to write the code before. I didn't test it, but it should look something
like below. Any comments?

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 4b75c930fa1b..4a97ca8eb14c 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -8,6 +8,7 @@
#include <asm/ptrace.h>
#include <asm/shared/tdx.h>

+#ifdef CONFIG_INTEL_TDX_HOST
/*
* SW-defined error codes.
*
@@ -18,6 +19,21 @@
#define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
#define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000))

+/*
+ * Special error codes to indicate SEAMCALL #GP and #UD.
+ *
+ * SEAMCALL causes #GP when SEAMRR is not properly enabled by BIOS, and
+ * causes #UD when CPU is not in VMX operation. Define two separate
+ * error codes to distinguish the two cases so caller can be aware of
+ * what caused the SEAMCALL to fail.
+ *
+ * Bits 61:48 are reserved bits which will never be set by the TDX
+ * module. Borrow 2 reserved bits to represent #GP and #UD.
+ */
+#define TDX_SEAMCALL_GP (TDX_ERROR | GENMASK_ULL(48, 48))
+#define TDX_SEAMCALL_UD (TDX_ERROR | GENMASK_ULL(49, 49))
+#endif
+
#ifndef __ASSEMBLY__

/*
diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
index 49a54356ae99..7431c47258d9 100644
--- a/arch/x86/virt/vmx/tdx/tdxcall.S
+++ b/arch/x86/virt/vmx/tdx/tdxcall.S
@@ -1,6 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <asm/asm-offsets.h>
#include <asm/tdx.h>
+#include <asm/asm.h>

/*
* TDCALL and SEAMCALL are supported in Binutils >= 2.36.
@@ -45,6 +46,7 @@
/* Leave input param 2 in RDX */

.if \host
+1:
seamcall
/*
* SEAMCALL instruction is essentially a VMExit from VMX root
@@ -57,9 +59,25 @@
* This value will never be used as actual SEAMCALL error code as
* it is from the Reserved status code class.
*/
- jnc .Lno_vmfailinvalid
+ jnc .Lseamcall_out
mov $TDX_SEAMCALL_VMFAILINVALID, %rax
-.Lno_vmfailinvalid:
+ jmp .Lseamcall_out
+2:
+ /*
+ * SEAMCALL caused #GP or #UD. By reaching here %eax contains
+ * the trap number. Check the trap number and set up the return
+ * value to %rax.
+ */
+ cmp $X86_TRAP_GP, %eax
+ je .Lseamcall_gp
+ mov $TDX_SEAMCALL_UD, %rax
+ jmp .Lseamcall_out
+.Lseamcall_gp:
+ mov $TDX_SEAMCALL_GP, %rax
+ jmp .Lseamcall_out
+
+ _ASM_EXTABLE_FAULT(1b, 2b)
+.Lseamcall_out:





2022-06-27 23:21:11

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 08/22] x86/virt/tdx: Shut down TDX module in case of error

On 6/27/22 15:34, Kai Huang wrote:
> On Mon, 2022-06-27 at 13:46 -0700, Dave Hansen wrote:
> I think I can just use __always_unused for this purpose?
>
> So I think we put seamcall() implementation to the patch which implements
> __seamcall(). And we can inline for seamcall() and put it in either tdx.h or
> tdx.c, or we can use __always_unused (or the one you prefer) to get rid of the
> warning.
>
> What's your opinion?

A temporary __always_unused seems fine to me.

>>> Alternatively, we can always add EXTABLE to TDX_MODULE_CALL macro to handle #UD
>>> and #GP by returning dedicated error codes (please also see my reply to previous
>>> patch for the code needed to handle), in which case we don't need such check
>>> here.
>>>
>>> Always handling #UD in TDX_MODULE_CALL macro also has another advantage: there
>>> will be no Oops for #UD regardless the issue that "there's no way to check
>>> whether VMXON has been done" in the above comment.
>>>
>>> What's your opinion?
>>
>> I think you should explore using the EXTABLE. Let's see how it looks.
>
> I tried to wrote the code before. I didn't test but it should look like to
> something below. Any comments?
>
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 4b75c930fa1b..4a97ca8eb14c 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -8,6 +8,7 @@
> #include <asm/ptrace.h>
> #include <asm/shared/tdx.h>
>
> +#ifdef CONFIG_INTEL_TDX_HOST
> /*
> * SW-defined error codes.
> *
> @@ -18,6 +19,21 @@
> #define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
> #define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000))
>
> +/*
> + * Special error codes to indicate SEAMCALL #GP and #UD.
> + *
> + * SEAMCALL causes #GP when SEAMRR is not properly enabled by BIOS, and
> + * causes #UD when CPU is not in VMX operation. Define two separate
> + * error codes to distinguish the two cases so caller can be aware of
> + * what caused the SEAMCALL to fail.
> + *
> + * Bits 61:48 are reserved bits which will never be set by the TDX
> + * module. Borrow 2 reserved bits to represent #GP and #UD.
> + */
> +#define TDX_SEAMCALL_GP (TDX_ERROR | GENMASK_ULL(48, 48))
> +#define TDX_SEAMCALL_UD (TDX_ERROR | GENMASK_ULL(49, 49))
> +#endif
> +
> #ifndef __ASSEMBLY__
>
> /*
> diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
> index 49a54356ae99..7431c47258d9 100644
> --- a/arch/x86/virt/vmx/tdx/tdxcall.S
> +++ b/arch/x86/virt/vmx/tdx/tdxcall.S
> @@ -1,6 +1,7 @@
> /* SPDX-License-Identifier: GPL-2.0 */
> #include <asm/asm-offsets.h>
> #include <asm/tdx.h>
> +#include <asm/asm.h>
>
> /*
> * TDCALL and SEAMCALL are supported in Binutils >= 2.36.
> @@ -45,6 +46,7 @@
> /* Leave input param 2 in RDX */
>
> .if \host
> +1:
> seamcall
> /*
> * SEAMCALL instruction is essentially a VMExit from VMX root
> @@ -57,9 +59,25 @@
> * This value will never be used as actual SEAMCALL error code as
> * it is from the Reserved status code class.
> */
> - jnc .Lno_vmfailinvalid
> + jnc .Lseamcall_out
> mov $TDX_SEAMCALL_VMFAILINVALID, %rax
> -.Lno_vmfailinvalid:
> + jmp .Lseamcall_out
> +2:
> + /*
> + * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> + * the trap number. Check the trap number and set up the return
> + * value to %rax.
> + */
> + cmp $X86_TRAP_GP, %eax
> + je .Lseamcall_gp
> + mov $TDX_SEAMCALL_UD, %rax
> + jmp .Lseamcall_out
> +.Lseamcall_gp:
> + mov $TDX_SEAMCALL_GP, %rax
> + jmp .Lseamcall_out
> +
> + _ASM_EXTABLE_FAULT(1b, 2b)
> +.Lseamcall_out

Not too bad, although the end of that is a bit ugly. It would be nicer
if you could just return the %rax value in the exception section instead
of having to do the transform there. Maybe have a TDX_ERROR code with
enough bits to hold any X86_TRAP_FOO.

It'd be nice if Peter Z or Andy L has a sec to look at this. Seems like
the kind of thing they'd have good ideas about.

2022-06-28 00:09:21

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH v5 08/22] x86/virt/tdx: Shut down TDX module in case of error

On Mon, 2022-06-27 at 15:56 -0700, Dave Hansen wrote:
> On 6/27/22 15:34, Kai Huang wrote:
> > On Mon, 2022-06-27 at 13:46 -0700, Dave Hansen wrote:
> > I think I can just use __always_unused for this purpose?
> >
> > So I think we put seamcall() implementation to the patch which implements
> > __seamcall(). And we can inline for seamcall() and put it in either tdx.h or
> > tdx.c, or we can use __always_unused (or the one you prefer) to get rid of the
> > warning.
> >
> > What's your opinion?
>
> A temporary __always_unused seems fine to me.

Thanks will do.

>
> > > > Alternatively, we can always add EXTABLE to TDX_MODULE_CALL macro to handle #UD
> > > > and #GP by returning dedicated error codes (please also see my reply to previous
> > > > patch for the code needed to handle), in which case we don't need such check
> > > > here.
> > > >
> > > > Always handling #UD in TDX_MODULE_CALL macro also has another advantage: there
> > > > will be no Oops for #UD regardless the issue that "there's no way to check
> > > > whether VMXON has been done" in the above comment.
> > > >
> > > > What's your opinion?
> > >
> > > I think you should explore using the EXTABLE. Let's see how it looks.
> >
> > I tried to wrote the code before. I didn't test but it should look like to
> > something below. Any comments?
> >
> > diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> > index 4b75c930fa1b..4a97ca8eb14c 100644
> > --- a/arch/x86/include/asm/tdx.h
> > +++ b/arch/x86/include/asm/tdx.h
> > @@ -8,6 +8,7 @@
> > #include <asm/ptrace.h>
> > #include <asm/shared/tdx.h>
> >
> > +#ifdef CONFIG_INTEL_TDX_HOST
> > /*
> > * SW-defined error codes.
> > *
> > @@ -18,6 +19,21 @@
> > #define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
> > #define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000))
> >
> > +/*
> > + * Special error codes to indicate SEAMCALL #GP and #UD.
> > + *
> > + * SEAMCALL causes #GP when SEAMRR is not properly enabled by BIOS, and
> > + * causes #UD when CPU is not in VMX operation. Define two separate
> > + * error codes to distinguish the two cases so caller can be aware of
> > + * what caused the SEAMCALL to fail.
> > + *
> > + * Bits 61:48 are reserved bits which will never be set by the TDX
> > + * module. Borrow 2 reserved bits to represent #GP and #UD.
> > + */
> > +#define TDX_SEAMCALL_GP (TDX_ERROR | GENMASK_ULL(48, 48))
> > +#define TDX_SEAMCALL_UD (TDX_ERROR | GENMASK_ULL(49, 49))
> > +#endif
> > +
> > #ifndef __ASSEMBLY__
> >
> > /*
> > diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
> > index 49a54356ae99..7431c47258d9 100644
> > --- a/arch/x86/virt/vmx/tdx/tdxcall.S
> > +++ b/arch/x86/virt/vmx/tdx/tdxcall.S
> > @@ -1,6 +1,7 @@
> > /* SPDX-License-Identifier: GPL-2.0 */
> > #include <asm/asm-offsets.h>
> > #include <asm/tdx.h>
> > +#include <asm/asm.h>
> >
> > /*
> > * TDCALL and SEAMCALL are supported in Binutils >= 2.36.
> > @@ -45,6 +46,7 @@
> > /* Leave input param 2 in RDX */
> >
> > .if \host
> > +1:
> > seamcall
> > /*
> > * SEAMCALL instruction is essentially a VMExit from VMX root
> > @@ -57,9 +59,25 @@
> > * This value will never be used as actual SEAMCALL error code as
> > * it is from the Reserved status code class.
> > */
> > - jnc .Lno_vmfailinvalid
> > + jnc .Lseamcall_out
> > mov $TDX_SEAMCALL_VMFAILINVALID, %rax
> > -.Lno_vmfailinvalid:
> > + jmp .Lseamcall_out
> > +2:
> > + /*
> > + * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> > + * the trap number. Check the trap number and set up the return
> > + * value to %rax.
> > + */
> > + cmp $X86_TRAP_GP, %eax
> > + je .Lseamcall_gp
> > + mov $TDX_SEAMCALL_UD, %rax
> > + jmp .Lseamcall_out
> > +.Lseamcall_gp:
> > + mov $TDX_SEAMCALL_GP, %rax
> > + jmp .Lseamcall_out
> > +
> > + _ASM_EXTABLE_FAULT(1b, 2b)
> > +.Lseamcall_out
>
> Not too bad, although the end of that is a bit ugly. It would be nicer
> if you could just return the %rax value in the exception section instead
> of having to do the transform there. Maybe have a TDX_ERROR code with
> enough bits to hold any X86_TRAP_FOO.

We have already declared that bits 47:40 == 0xFF are never used by the TDX module:

/*
* SW-defined error codes.
*
* Bits 47:40 == 0xFF indicate Reserved status code class that never used by
* TDX module.
*/
#define TDX_ERROR _BITUL(63)
#define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
#define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000))

So how about just putting the X86_TRAP_FOO into the low 32 bits? We only have 32
traps, so 32 bits are more than enough.

#define TDX_SEAMCALL_GP (TDX_SW_ERROR | X86_TRAP_GP)
#define TDX_SEAMCALL_UD (TDX_SW_ERROR | X86_TRAP_UD)

If so, in the assembly, I think we can just XOR TDX_SW_ERROR into %rax and
return %rax:

2:
/*
* SEAMCALL caused #GP or #UD. By reaching here %eax contains
* the trap number. Convert trap number to TDX error code by setting
* TDX_SW_ERROR to the high 32-bits of %rax.
*/
xorq $TDX_SW_ERROR, %rax

How does this look?



--
Thanks,
-Kai


2022-06-28 00:17:01

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 08/22] x86/virt/tdx: Shut down TDX module in case of error

On 6/27/22 16:59, Kai Huang wrote:
> If so, in the assembly, I think we can just XOR TDX_SW_ERROR to the %rax and
> return %rax:
>
> 2:
> /*
> * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> * the trap number. Convert trap number to TDX error code by setting
> * TDX_SW_ERROR to the high 32-bits of %rax.
> */
> xorq $TDX_SW_ERROR, %rax
>
> How does this look?

I guess it doesn't matter if you know the things being masked together
are padded correctly, but I probably would have done a straight OR, not XOR.

Otherwise, I think that looks OK. Simplifies the assembly for sure.

2022-06-28 00:17:11

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH v5 08/22] x86/virt/tdx: Shut down TDX module in case of error

On Mon, 2022-06-27 at 17:03 -0700, Dave Hansen wrote:
> On 6/27/22 16:59, Kai Huang wrote:
> > If so, in the assembly, I think we can just XOR TDX_SW_ERROR to the %rax and
> > return %rax:
> >
> > 2:
> > /*
> > * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> > * the trap number. Convert trap number to TDX error code by setting
> > * TDX_SW_ERROR to the high 32-bits of %rax.
> > */
> > xorq $TDX_SW_ERROR, %rax
> >
> > How does this look?
>
> I guess it doesn't matter if you know the things being masked together
> are padded correctly, but I probably would have done a straight OR, not XOR.
>
> Otherwise, I think that looks OK. Simplifies the assembly for sure.

Right, straight OR is better. Thanks.
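
Putting this sub-thread together, a hedged sketch of how the resulting error
codes might be consumed by a caller (the check_seamcall_err() helper and its
placement are illustrative only, not part of the posted series):

#include <asm/trapnr.h>		/* X86_TRAP_UD, X86_TRAP_GP */

/*
 * Illustrative only: the trap number lands in the low bits, ORed with
 * TDX_SW_ERROR as discussed above, so a caller can tell #UD from #GP
 * without an Oops.
 */
#define TDX_SEAMCALL_UD		(TDX_SW_ERROR | X86_TRAP_UD)
#define TDX_SEAMCALL_GP		(TDX_SW_ERROR | X86_TRAP_GP)

/* Hypothetical helper a caller such as init_tdx_module() could use */
static int check_seamcall_err(u64 ret)
{
	switch (ret) {
	case 0:
		return 0;
	case TDX_SEAMCALL_UD:
		pr_err("SEAMCALL #UD: CPU not in VMX operation\n");
		return -EINVAL;
	case TDX_SEAMCALL_GP:
		pr_err("SEAMCALL #GP: SEAMRR not enabled by BIOS\n");
		return -ENODEV;
	default:
		return -EIO;
	}
}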

--
Thanks,
-Kai


2022-08-18 04:29:17

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: [PATCH v5 22/22] Documentation/x86: Add documentation for TDX host support

On Wed, Jun 22, 2022 at 11:17:50PM +1200, Kai Huang wrote:
> +Kernel detects TDX and the TDX private KeyIDs during kernel boot. User
> +can see below dmesg if TDX is enabled by BIOS:
> +
> +| [..] tdx: SEAMRR enabled.
> +| [..] tdx: TDX private KeyID range: [16, 64).
> +| [..] tdx: TDX enabled by BIOS.
> +
<snipped>
> +Initializing the TDX module consumes roughly ~1/256th system RAM size to
> +use it as 'metadata' for the TDX memory. It also takes additional CPU
> +time to initialize those metadata along with the TDX module itself. Both
> +are not trivial. Current kernel doesn't choose to always initialize the
> +TDX module during kernel boot, but provides a function tdx_init() to
> +allow the caller to initialize TDX when it truly wants to use TDX:
> +
> + ret = tdx_init();
> + if (ret)
> + goto no_tdx;
> + // TDX is ready to use
> +

Hi,

The code block above produces Sphinx warnings:

Documentation/x86/tdx.rst:69: WARNING: Unexpected indentation.
Documentation/x86/tdx.rst:70: WARNING: Block quote ends without a blank line; unexpected unindent.

I have applied the fixup:

---- >8 ----

diff --git a/Documentation/x86/tdx.rst b/Documentation/x86/tdx.rst
index 6c6b09ca6ba407..4430912a2e4f05 100644
--- a/Documentation/x86/tdx.rst
+++ b/Documentation/x86/tdx.rst
@@ -62,7 +62,7 @@ use it as 'metadata' for the TDX memory. It also takes additional CPU
time to initialize those metadata along with the TDX module itself. Both
are not trivial. Current kernel doesn't choose to always initialize the
TDX module during kernel boot, but provides a function tdx_init() to
-allow the caller to initialize TDX when it truly wants to use TDX:
+allow the caller to initialize TDX when it truly wants to use TDX::

ret = tdx_init();
if (ret)

> +If the TDX module is not loaded, dmesg shows below:
> +
> +| [..] tdx: TDX module is not loaded.
> +
> +If the TDX module is initialized successfully, dmesg shows something
> +like below:
> +
> +| [..] tdx: TDX module: vendor_id 0x8086, major_version 1, minor_version 0, build_date 20211209, build_num 160
> +| [..] tdx: 65667 pages allocated for PAMT.
> +| [..] tdx: TDX module initialized.
> +
> +If the TDX module failed to initialize, dmesg shows below:
> +
> +| [..] tdx: Failed to initialize TDX module. Shut it down.
<snipped>
> +There are basically two memory hot-add cases that need to be prevented:
> +ACPI memory hot-add and driver managed memory hot-add. The kernel
> +rejectes the driver managed memory hot-add too when TDX is enabled by
> +BIOS. For instance, dmesg shows below error when using kmem driver to
> +add a legacy PMEM as system RAM:
> +
> +| [..] tdx: Unable to add memory [0x580000000, 0x600000000) on TDX enabled platform.
> +| [..] kmem dax0.0: mapping0: 0x580000000-0x5ffffffff memory add failed
> +

For dmesg output, use a literal code block instead of line blocks, like:

---- >8 ----

diff --git a/Documentation/x86/tdx.rst b/Documentation/x86/tdx.rst
index 4430912a2e4f05..1eaeb7cd14d76f 100644
--- a/Documentation/x86/tdx.rst
+++ b/Documentation/x86/tdx.rst
@@ -41,11 +41,11 @@ TDX boot-time detection
-----------------------

Kernel detects TDX and the TDX private KeyIDs during kernel boot. User
-can see below dmesg if TDX is enabled by BIOS:
+can see below dmesg if TDX is enabled by BIOS::

-| [..] tdx: SEAMRR enabled.
-| [..] tdx: TDX private KeyID range: [16, 64).
-| [..] tdx: TDX enabled by BIOS.
+ [..] tdx: SEAMRR enabled.
+ [..] tdx: TDX private KeyID range: [16, 64).
+ [..] tdx: TDX enabled by BIOS.

TDX module detection and initialization
---------------------------------------
@@ -79,20 +79,20 @@ caller.
User can consult dmesg to see the presence of the TDX module, and whether
it has been initialized.

-If the TDX module is not loaded, dmesg shows below:
+If the TDX module is not loaded, dmesg shows below::

-| [..] tdx: TDX module is not loaded.
+ [..] tdx: TDX module is not loaded.

If the TDX module is initialized successfully, dmesg shows something
-like below:
+like below::

-| [..] tdx: TDX module: vendor_id 0x8086, major_version 1, minor_version 0, build_date 20211209, build_num 160
-| [..] tdx: 65667 pages allocated for PAMT.
-| [..] tdx: TDX module initialized.
+ [..] tdx: TDX module: vendor_id 0x8086, major_version 1, minor_version 0, build_date 20211209, build_num 160
+ [..] tdx: 65667 pages allocated for PAMT.
+ [..] tdx: TDX module initialized.

-If the TDX module failed to initialize, dmesg shows below:
+If the TDX module failed to initialize, dmesg shows below::

-| [..] tdx: Failed to initialize TDX module. Shut it down.
+ [..] tdx: Failed to initialize TDX module. Shut it down.

TDX Interaction to Other Kernel Components
------------------------------------------
@@ -143,10 +143,10 @@ There are basically two memory hot-add cases that need to be prevented:
ACPI memory hot-add and driver managed memory hot-add. The kernel
rejectes the driver managed memory hot-add too when TDX is enabled by
BIOS. For instance, dmesg shows below error when using kmem driver to
-add a legacy PMEM as system RAM:
+add a legacy PMEM as system RAM::

-| [..] tdx: Unable to add memory [0x580000000, 0x600000000) on TDX enabled platform.
-| [..] kmem dax0.0: mapping0: 0x580000000-0x5ffffffff memory add failed
+ [..] tdx: Unable to add memory [0x580000000, 0x600000000) on TDX enabled platform.
+ [..] kmem dax0.0: mapping0: 0x580000000-0x5ffffffff memory add failed

However, adding new memory to ZONE_DEVICE should not be prevented as
those pages are not managed by the page allocator. Therefore,

Thanks.

--
An old man doll... just what I always wanted! - Clara



2022-08-18 10:12:04

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH v5 22/22] Documentation/x86: Add documentation for TDX host support

On Thu, 2022-08-18 at 11:07 +0700, Bagas Sanjaya wrote:
> On Wed, Jun 22, 2022 at 11:17:50PM +1200, Kai Huang wrote:
> > +Kernel detects TDX and the TDX private KeyIDs during kernel boot. User
> > +can see below dmesg if TDX is enabled by BIOS:
> > +
> > +| [..] tdx: SEAMRR enabled.
> > +| [..] tdx: TDX private KeyID range: [16, 64).
> > +| [..] tdx: TDX enabled by BIOS.
> > +
> <snipped>
> > +Initializing the TDX module consumes roughly ~1/256th system RAM size to
> > +use it as 'metadata' for the TDX memory. It also takes additional CPU
> > +time to initialize those metadata along with the TDX module itself. Both
> > +are not trivial. Current kernel doesn't choose to always initialize the
> > +TDX module during kernel boot, but provides a function tdx_init() to
> > +allow the caller to initialize TDX when it truly wants to use TDX:
> > +
> > + ret = tdx_init();
> > + if (ret)
> > + goto no_tdx;
> > + // TDX is ready to use
> > +
>
> Hi,
>
> The code block above produces Sphinx warnings:
>
> Documentation/x86/tdx.rst:69: WARNING: Unexpected indentation.
> Documentation/x86/tdx.rst:70: WARNING: Block quote ends without a blank line; unexpected unindent.
>
> I have applied the fixup:
>

Thank you! Will fix in the next version.


--
Thanks,
-Kai