LinuxLists.cc - [PATCH v11 00/20] TDX host kernel support

[permalink] [raw]

Subject: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

Start to transit out the "multi-steps" to initialize the TDX module.

TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums. Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR). During boot, the firmware builds a list of all of the memory
ranges which can provide the TDX security guarantees.

CMRs tell the kernel which memory is TDX compatible. The kernel takes
CMRs (plus a little more metadata) and constructs "TD Memory Regions"
(TDMRs). TDMRs let the kernel grant TDX protections to some or all of
the CMR areas.

The TDX module also reports necessary information to let the kernel
build TDMRs and run TDX guests in structure 'tdsysinfo_struct'. The
list of CMRs, along with the TDX module information, is available to
the kernel by querying the TDX module.

As a preparation to construct TDMRs, get the TDX module information and
the list of CMRs. Print out CMRs to help user to decode which memory
regions are TDX convertible.

The 'tdsysinfo_struct' is fairly large (1024 bytes) and contains a lot
of info about the TDX module. Fully define the entire structure, but
only use the fields necessary to build the TDMRs and pr_info() some
basics about the module. The rest of the fields will get used by KVM.

For now both 'tdsysinfo_struct' and CMRs are only used during the module
initialization. But because they are both relatively big, declare them
inside the module initialization function but as static variables.

Signed-off-by: Kai Huang <[email protected]>
Reviewed-by: Isaku Yamahata <[email protected]>
---

v10 -> v11:
- No change.

v9 -> v10:
- Added back "start to transit out..." as now per-cpu init has been
moved out from tdx_enable().

v8 -> v9:
- Removed "start to trransit out ..." part in changelog since this patch
is no longer the first step anymore.
- Changed to declare 'tdsysinfo' and 'cmr_array' as local static, and
changed changelog accordingly (Dave).
- Improved changelog to explain why to declare 'tdsysinfo_struct' in
full but only use a few members of them (Dave).

v7 -> v8: (Dave)
- Improved changelog to tell this is the first patch to transit out the
"multi-steps" init_tdx_module().
- Removed all CMR check/trim code but to depend on later SEAMCALL.
- Variable 'vertical alignment' in print TDX module information.
- Added DECLARE_PADDED_STRUCT() for padded structure.
- Made tdx_sysinfo and tdx_cmr_array[] to be function local variable
(and rename them accordingly), and added -Wframe-larger-than=4096 flag
to silence the build warning.

v6 -> v7:
- Simplified the check of CMRs due to the fact that TDX actually
verifies CMRs (that are passed by the BIOS) before enabling TDX.
- Changed the function name from check_cmrs() -> trim_empty_cmrs().
- Added CMR page aligned check so that later patch can just get the PFN
using ">> PAGE_SHIFT".

v5 -> v6:
- Added to also print TDX module's attribute (Isaku).
- Removed all arguments in tdx_gete_sysinfo() to use static variables
of 'tdx_sysinfo' and 'tdx_cmr_array' directly as they are all used
directly in other functions in later patches.
- Added Isaku's Reviewed-by.

- v3 -> v5 (no feedback on v4):
- Renamed sanitize_cmrs() to check_cmrs().
- Removed unnecessary sanity check against tdx_sysinfo and tdx_cmr_array
actual size returned by TDH.SYS.INFO.
- Changed -EFAULT to -EINVAL in couple places.
- Added comments around tdx_sysinfo and tdx_cmr_array saying they are
used by TDH.SYS.INFO ABI.
- Changed to pass 'tdx_sysinfo' and 'tdx_cmr_array' as function
arguments in tdx_get_sysinfo().
- Changed to only print BIOS-CMR when check_cmrs() fails.

---
arch/x86/virt/vmx/tdx/tdx.c | 67 +++++++++++++++++++++++++++++++++-
arch/x86/virt/vmx/tdx/tdx.h | 72 +++++++++++++++++++++++++++++++++++++
2 files changed, 138 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index bcf2b2d15a2e..9fde0f71dd8b 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -20,6 +20,7 @@
#include <asm/msr-index.h>
#include <asm/msr.h>
#include <asm/archrandom.h>
+#include <asm/page.h>
#include <asm/tdx.h>
#include "tdx.h"

@@ -191,12 +192,76 @@ int tdx_cpu_enable(void)
}
EXPORT_SYMBOL_GPL(tdx_cpu_enable);

+static inline bool is_cmr_empty(struct cmr_info *cmr)
+{
+ return !cmr->size;
+}
+
+static void print_cmrs(struct cmr_info *cmr_array, int nr_cmrs)
+{
+ int i;
+
+ for (i = 0; i < nr_cmrs; i++) {
+ struct cmr_info *cmr = &cmr_array[i];
+
+ /*
+ * The array of CMRs reported via TDH.SYS.INFO can
+ * contain tail empty CMRs. Don't print them.
+ */
+ if (is_cmr_empty(cmr))
+ break;
+
+ pr_info("CMR: [0x%llx, 0x%llx)\n", cmr->base,
+ cmr->base + cmr->size);
+ }
+}
+
+/*
+ * Get the TDX module information (TDSYSINFO_STRUCT) and the array of
+ * CMRs, and save them to @sysinfo and @cmr_array. @sysinfo must have
+ * been padded to have enough room to save the TDSYSINFO_STRUCT.
+ */
+static int tdx_get_sysinfo(struct tdsysinfo_struct *sysinfo,
+ struct cmr_info *cmr_array)
+{
+ struct tdx_module_output out;
+ u64 sysinfo_pa, cmr_array_pa;
+ int ret;
+
+ sysinfo_pa = __pa(sysinfo);
+ cmr_array_pa = __pa(cmr_array);
+ ret = seamcall(TDH_SYS_INFO, sysinfo_pa, TDSYSINFO_STRUCT_SIZE,
+ cmr_array_pa, MAX_CMRS, NULL, &out);
+ if (ret)
+ return ret;
+
+ pr_info("TDX module: atributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",
+ sysinfo->attributes, sysinfo->vendor_id,
+ sysinfo->major_version, sysinfo->minor_version,
+ sysinfo->build_date, sysinfo->build_num);
+
+ /* R9 contains the actual entries written to the CMR array. */
+ print_cmrs(cmr_array, out.r9);
+
+ return 0;
+}
+
static int init_tdx_module(void)
{
+ static DECLARE_PADDED_STRUCT(tdsysinfo_struct, tdsysinfo,
+ TDSYSINFO_STRUCT_SIZE, TDSYSINFO_STRUCT_ALIGNMENT);
+ static struct cmr_info cmr_array[MAX_CMRS]
+ __aligned(CMR_INFO_ARRAY_ALIGNMENT);
+ struct tdsysinfo_struct *sysinfo = &PADDED_STRUCT(tdsysinfo);
+ int ret;
+
+ ret = tdx_get_sysinfo(sysinfo, cmr_array);
+ if (ret)
+ return ret;
+
/*
* TODO:
*
- * - Get TDX module information and TDX-capable memory regions.
* - Build the list of TDX-usable memory regions.
* - Construct a list of "TD Memory Regions" (TDMRs) to cover
* all TDX-usable memory regions.
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 9fb46033c852..97f4d7e7f1a4 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -3,6 +3,8 @@
#define _X86_VIRT_TDX_H

#include <linux/types.h>
+#include <linux/stddef.h>
+#include <linux/compiler_attributes.h>

/*
* This file contains both macros and data structures defined by the TDX
@@ -21,6 +23,76 @@
*/
#define TDH_SYS_INIT 33
#define TDH_SYS_LP_INIT 35
+#define TDH_SYS_INFO 32
+
+struct cmr_info {
+ u64 base;
+ u64 size;
+} __packed;
+
+#define MAX_CMRS 32
+#define CMR_INFO_ARRAY_ALIGNMENT 512
+
+struct cpuid_config {
+ u32 leaf;
+ u32 sub_leaf;
+ u32 eax;
+ u32 ebx;
+ u32 ecx;
+ u32 edx;
+} __packed;
+
+#define DECLARE_PADDED_STRUCT(type, name, size, alignment) \
+ struct type##_padded { \
+ union { \
+ struct type name; \
+ u8 padding[size]; \
+ }; \
+ } name##_padded __aligned(alignment)
+
+#define PADDED_STRUCT(name) (name##_padded.name)
+
+#define TDSYSINFO_STRUCT_SIZE 1024
+#define TDSYSINFO_STRUCT_ALIGNMENT 1024
+
+/*
+ * The size of this structure itself is flexible. The actual structure
+ * passed to TDH.SYS.INFO must be padded to TDSYSINFO_STRUCT_SIZE and be
+ * aligned to TDSYSINFO_STRUCT_ALIGNMENT using DECLARE_PADDED_STRUCT().
+ */
+struct tdsysinfo_struct {
+ /* TDX-SEAM Module Info */
+ u32 attributes;
+ u32 vendor_id;
+ u32 build_date;
+ u16 build_num;
+ u16 minor_version;
+ u16 major_version;
+ u8 reserved0[14];
+ /* Memory Info */
+ u16 max_tdmrs;
+ u16 max_reserved_per_tdmr;
+ u16 pamt_entry_size;
+ u8 reserved1[10];
+ /* Control Struct Info */
+ u16 tdcs_base_size;
+ u8 reserved2[2];
+ u16 tdvps_base_size;
+ u8 tdvps_xfam_dependent_size;
+ u8 reserved3[9];
+ /* TD Capabilities */
+ u64 attributes_fixed0;
+ u64 attributes_fixed1;
+ u64 xfam_fixed0;
+ u64 xfam_fixed1;
+ u8 reserved4[32];
+ u32 num_cpuid_config;
+ /*
+ * The actual number of CPUID_CONFIG depends on above
+ * 'num_cpuid_config'.
+ */
+ DECLARE_FLEX_ARRAY(struct cpuid_config, cpuid_configs);
+} __packed;

/*
* Do not put any hardware-defined TDX structure representations below
--
2.40.1

2023-06-04 14:59:43

[permalink] [raw]

Subject: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

Start to transit out the "multi-steps" to construct a list of "TD Memory
Regions" (TDMRs) to cover all TDX-usable memory regions.

The kernel configures TDX-usable memory regions by passing a list of
TDMRs "TD Memory Regions" (TDMRs) to the TDX module. Each TDMR contains
the information of the base/size of a memory region, the base/size of the
associated Physical Address Metadata Table (PAMT) and a list of reserved
areas in the region.

Do the first step to fill out a number of TDMRs to cover all TDX memory
regions. To keep it simple, always try to use one TDMR for each memory
region. As the first step only set up the base/size for each TDMR.

Each TDMR must be 1G aligned and the size must be in 1G granularity.
This implies that one TDMR could cover multiple memory regions. If a
memory region spans the 1GB boundary and the former part is already
covered by the previous TDMR, just use a new TDMR for the remaining
part.

TDX only supports a limited number of TDMRs. Disable TDX if all TDMRs
are consumed but there is more memory region to cover.

There are fancier things that could be done like trying to merge
adjacent TDMRs. This would allow more pathological memory layouts to be
supported. But, current systems are not even close to exhausting the
existing TDMR resources in practice. For now, keep it simple.

Signed-off-by: Kai Huang <[email protected]>
---

v10 -> v11:
- No update

v9 -> v10:
- No change.

v8 -> v9:

- Added the last paragraph in the changelog (Dave).
- Removed unnecessary type cast in tdmr_entry() (Dave).

---
arch/x86/virt/vmx/tdx/tdx.c | 94 ++++++++++++++++++++++++++++++++++++-
1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 7a20c72361e7..fa9fa8bc581a 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -385,6 +385,93 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
}

+/* Get the TDMR from the list at the given index. */
+static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
+ int idx)
+{
+ int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
+
+ return (void *)tdmr_list->tdmrs + tdmr_info_offset;
+}
+
+#define TDMR_ALIGNMENT BIT_ULL(30)
+#define TDMR_PFN_ALIGNMENT (TDMR_ALIGNMENT >> PAGE_SHIFT)
+#define TDMR_ALIGN_DOWN(_addr) ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
+#define TDMR_ALIGN_UP(_addr) ALIGN((_addr), TDMR_ALIGNMENT)
+
+static inline u64 tdmr_end(struct tdmr_info *tdmr)
+{
+ return tdmr->base + tdmr->size;
+}
+
+/*
+ * Take the memory referenced in @tmb_list and populate the
+ * preallocated @tdmr_list, following all the special alignment
+ * and size rules for TDMR.
+ */
+static int fill_out_tdmrs(struct list_head *tmb_list,
+ struct tdmr_info_list *tdmr_list)
+{
+ struct tdx_memblock *tmb;
+ int tdmr_idx = 0;
+
+ /*
+ * Loop over TDX memory regions and fill out TDMRs to cover them.
+ * To keep it simple, always try to use one TDMR to cover one
+ * memory region.
+ *
+ * In practice TDX1.0 supports 64 TDMRs, which is big enough to
+ * cover all memory regions in reality if the admin doesn't use
+ * 'memmap' to create a bunch of discrete memory regions. When
+ * there's a real problem, enhancement can be done to merge TDMRs
+ * to reduce the final number of TDMRs.
+ */
+ list_for_each_entry(tmb, tmb_list, list) {
+ struct tdmr_info *tdmr = tdmr_entry(tdmr_list, tdmr_idx);
+ u64 start, end;
+
+ start = TDMR_ALIGN_DOWN(PFN_PHYS(tmb->start_pfn));
+ end = TDMR_ALIGN_UP(PFN_PHYS(tmb->end_pfn));
+
+ /*
+ * A valid size indicates the current TDMR has already
+ * been filled out to cover the previous memory region(s).
+ */
+ if (tdmr->size) {
+ /*
+ * Loop to the next if the current memory region
+ * has already been fully covered.
+ */
+ if (end <= tdmr_end(tdmr))
+ continue;
+
+ /* Otherwise, skip the already covered part. */
+ if (start < tdmr_end(tdmr))
+ start = tdmr_end(tdmr);
+
+ /*
+ * Create a new TDMR to cover the current memory
+ * region, or the remaining part of it.
+ */
+ tdmr_idx++;
+ if (tdmr_idx >= tdmr_list->max_tdmrs) {
+ pr_warn("initialization failed: TDMRs exhausted.\n");
+ return -ENOSPC;
+ }
+
+ tdmr = tdmr_entry(tdmr_list, tdmr_idx);
+ }
+
+ tdmr->base = start;
+ tdmr->size = end - start;
+ }
+
+ /* @tdmr_idx is always the index of last valid TDMR. */
+ tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;
+
+ return 0;
+}
+
/*
* Construct a list of TDMRs on the preallocated space in @tdmr_list
* to cover all TDX memory regions in @tmb_list based on the TDX module
@@ -394,10 +481,15 @@ static int construct_tdmrs(struct list_head *tmb_list,
struct tdmr_info_list *tdmr_list,
struct tdsysinfo_struct *sysinfo)
{
+ int ret;
+
+ ret = fill_out_tdmrs(tmb_list, tdmr_list);
+ if (ret)
+ return ret;
+
/*
* TODO:
*
- * - Fill out TDMRs to cover all TDX memory regions.
* - Allocate and set up PAMTs for each TDMR.
* - Designate reserved areas for each TDMR.
*
--
2.40.1

2023-06-06 01:02:17

by Isaku Yamahata

[permalink] [raw]

Subject: Re: [PATCH v11 00/20] TDX host kernel support

On Mon, Jun 05, 2023 at 02:27:13AM +1200,
Kai Huang <[email protected]> wrote:

> Intel Trusted Domain Extensions (TDX) protects guest VMs from malicious
> host and certain physical attacks. TDX specs are available in [1].
>
> This series is the initial support to enable TDX with minimal code to
> allow KVM to create and run TDX guests. KVM support for TDX is being
> developed separately[2]. A new "userspace inaccessible memfd" approach
> to support TDX private memory is also being developed[3]. The KVM will
> only support the new "userspace inaccessible memfd" as TDX guest memory.
>
> This series doesn't aim to support all functionalities, and doesn't aim
> to resolve all things perfectly. All other optimizations will be posted
> as follow-up once this initial TDX support is upstreamed.
>
> Also, the patch to add the new kernel comline tdx="force" isn't included
> in this initial version, as Dave suggested it isn't mandatory. But I
> will add one once this initial version gets merged.
>
> (For memory hotplug, sorry for broadcasting widely but I cc'ed the
> [email protected] following Kirill's suggestion so MM experts can also
> help to provide comments.)
>
> Hi Dave, Kirill, Tony, Peter, Thomas, Dan (and Intel reviewers),
>
> The new relaxed TDX per-cpu initialization flow has been verified. The
> TDX module can be initialized when there are offline cpus, and the
> TDH.SYS.LP.INIT SEAMCALL can be made successfully later after module
> initialization when the offline cpu is up.
>
> This series mainly added code to handle the new TDX "partial write
> machine check" erratum (SPR113) in [4].
>
> And I would appreciate reviewed-by or acked-by tags if the patches look
> good to you. Thanks in advance!

I've rebased the TDX KVM patch series v14 [1] with this patch series and
uploaded it at [2]. As the rebased TDX KVM patches doesn't have any changes
except trivial rebase fixes, I don't post something like v14.1.

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://github.com/intel/tdx/tree/kvm-upstream-workaround
--
Isaku Yamahata <[email protected]>

2023-06-07 16:01:04

[permalink] [raw]

Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On 6/4/23 07:27, Kai Huang wrote:
> Start to transit out the "multi-steps" to initialize the TDX module.
>
> TDX provides increased levels of memory confidentiality and integrity.
> This requires special hardware support for features like memory
> encryption and storage of memory integrity checksums. Not all memory
> satisfies these requirements.
>
> As a result, TDX introduced the concept of a "Convertible Memory Region"
> (CMR). During boot, the firmware builds a list of all of the memory
> ranges which can provide the TDX security guarantees.
>
> CMRs tell the kernel which memory is TDX compatible. The kernel takes
> CMRs (plus a little more metadata) and constructs "TD Memory Regions"
> (TDMRs). TDMRs let the kernel grant TDX protections to some or all of
> the CMR areas.
>
> The TDX module also reports necessary information to let the kernel
> build TDMRs and run TDX guests in structure 'tdsysinfo_struct'. The
> list of CMRs, along with the TDX module information, is available to
> the kernel by querying the TDX module.
>
> As a preparation to construct TDMRs, get the TDX module information and
> the list of CMRs. Print out CMRs to help user to decode which memory
> regions are TDX convertible.
>
> The 'tdsysinfo_struct' is fairly large (1024 bytes) and contains a lot
> of info about the TDX module. Fully define the entire structure, but
> only use the fields necessary to build the TDMRs and pr_info() some
> basics about the module. The rest of the fields will get used by KVM.
>
> For now both 'tdsysinfo_struct' and CMRs are only used during the module
> initialization. But because they are both relatively big, declare them
> inside the module initialization function but as static variables.
>
> Signed-off-by: Kai Huang <[email protected]>
> Reviewed-by: Isaku Yamahata <[email protected]>

Reviewed-by: Dave Hansen <[email protected]>

2023-06-07 16:28:10

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On 6/4/23 07:27, Kai Huang wrote:
> + /*
> + * Loop over TDX memory regions and fill out TDMRs to cover them.
> + * To keep it simple, always try to use one TDMR to cover one
> + * memory region.
> + *
> + * In practice TDX1.0 supports 64 TDMRs, which is big enough to
> + * cover all memory regions in reality if the admin doesn't use
> + * 'memmap' to create a bunch of discrete memory regions. When
> + * there's a real problem, enhancement can be done to merge TDMRs
> + * to reduce the final number of TDMRs.
> + */

Rather than focus in on one specific command-line parameter, let's just say:

In practice TDX supports at least 64 TDMRs. A 2-socket system
typically only consumes <NUMBER> of those. This code is dumb
and simple and may use more TMDRs than is strictly required.

Let's also put a pr_warn() in here if we exceed, say 1/2 or maybe 3/4 of
the 64. We'll hopefully start to get reports somewhat in advance if
systems get close to the limit.

2023-06-08 00:36:16

[permalink] [raw]

Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Mon, Jun 05, 2023 at 02:27:21AM +1200, Kai Huang wrote:
> For now both 'tdsysinfo_struct' and CMRs are only used during the module
> initialization. But because they are both relatively big, declare them
> inside the module initialization function but as static variables.

This justification does not make sense to me. static variables will not be
freed after function returned. They will still consume memory.

I think you need to allocate/free memory dynamically, if they are too big
for stack.

...

> static int init_tdx_module(void)
> {
> + static DECLARE_PADDED_STRUCT(tdsysinfo_struct, tdsysinfo,
> + TDSYSINFO_STRUCT_SIZE, TDSYSINFO_STRUCT_ALIGNMENT);
> + static struct cmr_info cmr_array[MAX_CMRS]
> + __aligned(CMR_INFO_ARRAY_ALIGNMENT);

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-08 03:05:27

[permalink] [raw]

Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Thu, 2023-06-08 at 03:27 +0300, [email protected] wrote:
> On Mon, Jun 05, 2023 at 02:27:21AM +1200, Kai Huang wrote:
> > For now both 'tdsysinfo_struct' and CMRs are only used during the module
> > initialization. But because they are both relatively big, declare them
> > inside the module initialization function but as static variables.
>
> This justification does not make sense to me. static variables will not be
> freed after function returned. They will still consume memory.
>
> I think you need to allocate/free memory dynamically, if they are too big
> for stack.

I do need to keep tdsysinfo_struct as it will be used by KVM too. CMRs are not
used by KVM now but they might get used in the future, e.g., we may want to
expose them to /sys in the future.

Also it takes more lines of code to do dynamic allocation. I'd prefer the code
simplicity. Dave is fine with static too, but prefers to putting them inside
the function:

https://lore.kernel.org/lkml/[email protected]/T/#mbfdaa353278588da09e43f3ce37b7bf8ddedc1b2

I can update the changelog to reflect above:

For now both 'tdsysinfo_struct' and CMRs are only used during the
module initialization. KVM will need to at least use
'tdsysinfo_struct'
when supporting TDX guests. For now just declare them inside the
module
initialization function but as static variables.

?

2023-06-08 11:04:00

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Wed, 2023-06-07 at 09:05 -0700, Dave Hansen wrote:
> On 6/4/23 07:27, Kai Huang wrote:
> > + /*
> > + * Loop over TDX memory regions and fill out TDMRs to cover them.
> > + * To keep it simple, always try to use one TDMR to cover one
> > + * memory region.
> > + *
> > + * In practice TDX1.0 supports 64 TDMRs, which is big enough to
> > + * cover all memory regions in reality if the admin doesn't use
> > + * 'memmap' to create a bunch of discrete memory regions. When
> > + * there's a real problem, enhancement can be done to merge TDMRs
> > + * to reduce the final number of TDMRs.
> > + */
>
> Rather than focus in on one specific command-line parameter, let's just say:
>
> In practice TDX supports at least 64 TDMRs. A 2-socket system
> typically only consumes <NUMBER> of those. This code is dumb
> and simple and may use more TMDRs than is strictly required.

Thanks will do. Will take a look at machine to get the <NUMBER>.

>
> Let's also put a pr_warn() in here if we exceed, say 1/2 or maybe 3/4 of
> the 64. We'll hopefully start to get reports somewhat in advance if
> systems get close to the limit.

May I ask why this is useful? TDX module can only be initialized once, so if
not considering module runtime update case, the kernel can only get two results
for once:

1) Succeed to initialize: consumed TDMRs doesn't exceed maximum TDMRs
2) Fail to initialize: consumed TDMRs exceeds maximum TDMRs

What's the value of pr_warn() user when consumed TDMRs exceeds some threshold?

Anyway, if you want it, how does below code look?

static int fill_out_tdmrs(struct list_head *tmb_list,
struct tdmr_info_list *tdmr_list)
{
+ int consumed_tdmrs_threshold, tdmr_idx = 0;
struct tdx_memblock *tmb;
- int tdmr_idx = 0;

/*
* Loop over TDX memory regions and fill out TDMRs to cover them.
* To keep it simple, always try to use one TDMR to cover one
* memory region.
*
- * In practice TDX1.0 supports 64 TDMRs, which is big enough to
- * cover all memory regions in reality if the admin doesn't use
- * 'memmap' to create a bunch of discrete memory regions. When
- * there's a real problem, enhancement can be done to merge TDMRs
- * to reduce the final number of TDMRs.
+ * In practice TDX supports at least 64 TDMRs. A 2-socket system
+ * typically only consumes <NUMBER> of those. This code is dumb
+ * and simple and may use more TMDRs than is strictly required.
+ *
+ * Also set a threshold of consumed TDMRs, and pr_warn() to warn
+ * the user the system is getting close to the limit of supported
+ * number of TDMRs if the number of consumed TDMRs exceeds the
+ * threshold.
*/
+ consumed_tdmrs_threshold = tdmr_list->max_tdmrs * 3 / 4;
list_for_each_entry(tmb, tmb_list, list) {
struct tdmr_info *tdmr = tdmr_entry(tdmr_list, tdmr_idx);
u64 start, end;
@@ -463,6 +467,10 @@ static int fill_out_tdmrs(struct list_head *tmb_list,
return -ENOSPC;
}

+ if (tdmr_idx == consumed_tdmrs_threshold)
+ pr_warn("consumed TDMRs reaching limit: %d used
(out of %d)\n",
+ tdmr_idx, tdmr_list->max_tdmrs);
+
tdmr = tdmr_entry(tdmr_list, tdmr_idx);
}

2023-06-08 12:23:21

[permalink] [raw]

Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Thu, Jun 08, 2023 at 02:40:27AM +0000, Huang, Kai wrote:
> On Thu, 2023-06-08 at 03:27 +0300, [email protected] wrote:
> > On Mon, Jun 05, 2023 at 02:27:21AM +1200, Kai Huang wrote:
> > > For now both 'tdsysinfo_struct' and CMRs are only used during the module
> > > initialization. But because they are both relatively big, declare them
> > > inside the module initialization function but as static variables.
> >
> > This justification does not make sense to me. static variables will not be
> > freed after function returned. They will still consume memory.
> >
> > I think you need to allocate/free memory dynamically, if they are too big
> > for stack.
>
>
> I do need to keep tdsysinfo_struct as it will be used by KVM too.

Will you pass it down to KVM from this function? Will KVM use the struct
after the function returns?

> CMRs are not
> used by KVM now but they might get used in the future, e.g., we may want to
> expose them to /sys in the future.
>
> Also it takes more lines of code to do dynamic allocation. I'd prefer the code
> simplicity.

These structures take 1.5K of memory and the memory will be allocated for
all machines that boots the kernel with TDX enabled, regardless if the
machine has TDX or not. It seems very wasteful to me.

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-08 13:41:00

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On 6/8/23 03:48, Huang, Kai wrote:
>> Let's also put a pr_warn() in here if we exceed, say 1/2 or maybe 3/4 of
>> the 64. We'll hopefully start to get reports somewhat in advance if
>> systems get close to the limit.
> May I ask why this is useful? TDX module can only be initialized once, so if
> not considering module runtime update case, the kernel can only get two results
> for once:
>
> 1) Succeed to initialize: consumed TDMRs doesn't exceed maximum TDMRs
> 2) Fail to initialize: consumed TDMRs exceeds maximum TDMRs
>
> What's the value of pr_warn() user when consumed TDMRs exceeds some threshold?

Today, we're saying, "64 TMDRs out to be enough for anybody!"

I'd actually kinda like to know if anybody starts building platforms
that get anywhere near using 64. That way, we won't get a bug report
that TDX is broken and we'll have a fire drill. We'll get a bug report
that TDX is complaining and we'll have some time to go fix it without
anyone actually being broken.

Maybe not even a pr_warn(), but something that's a bit ominous and has a
chance of getting users to act.

2023-06-08 13:49:54

[permalink] [raw]

Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On 6/8/23 04:41, [email protected] wrote:
> These structures take 1.5K of memory and the memory will be allocated for
> all machines that boots the kernel with TDX enabled, regardless if the
> machine has TDX or not. It seems very wasteful to me.

Actually, those variables are in .bss. They're allocated forever for
anyone that runs a kernel that has TDX support.

2023-06-08 21:53:14

by Dan Williams

[permalink] [raw]

Subject: RE: [PATCH v11 00/20] TDX host kernel support

Kai Huang wrote:
> Intel Trusted Domain Extensions (TDX) protects guest VMs from malicious
> host and certain physical attacks. TDX specs are available in [1].
>
> This series is the initial support to enable TDX with minimal code to
> allow KVM to create and run TDX guests. KVM support for TDX is being
> developed separately[2]. A new "userspace inaccessible memfd" approach
> to support TDX private memory is also being developed[3]. The KVM will
> only support the new "userspace inaccessible memfd" as TDX guest memory.

This memfd approach is incompatible with one of the primary ways that
new memory topologies like high-bandwidth-memory and CXL are accessed,
via a device-special-file mapping. There is already precedent for mmap()
to only be used for communicating address value and not CPU accessible
memory. See "Userspace P2PDMA with O_DIRECT NVMe devices" [1].

So before this memfd requirement becomes too baked in to the design I
want to understand if "userspace inaccessible" is the only requirement
so I can look to add that to the device-special-file interface for
"device" / "Soft Reserved" memory like HBM and CXL.

[1]: https://lore.kernel.org/all/[email protected]/

2023-06-08 23:17:57

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Mon, Jun 05, 2023 at 02:27:24AM +1200, Kai Huang wrote:
> +#define TDMR_ALIGNMENT BIT_ULL(30)

Nit: SZ_1G can be a little bit more readable here.

Anyway:

Reviewed-by: Kirill A. Shutemov <[email protected]>

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-09 00:15:57

by Isaku Yamahata

[permalink] [raw]

Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Thu, Jun 08, 2023 at 02:41:28PM +0300,
"[email protected]" <[email protected]> wrote:

> On Thu, Jun 08, 2023 at 02:40:27AM +0000, Huang, Kai wrote:
> > On Thu, 2023-06-08 at 03:27 +0300, [email protected] wrote:
> > > On Mon, Jun 05, 2023 at 02:27:21AM +1200, Kai Huang wrote:
> > > > For now both 'tdsysinfo_struct' and CMRs are only used during the module
> > > > initialization. But because they are both relatively big, declare them
> > > > inside the module initialization function but as static variables.
> > >
> > > This justification does not make sense to me. static variables will not be
> > > freed after function returned. They will still consume memory.
> > >
> > > I think you need to allocate/free memory dynamically, if they are too big
> > > for stack.
> >
> >
> > I do need to keep tdsysinfo_struct as it will be used by KVM too.
>
> Will you pass it down to KVM from this function? Will KVM use the struct
> after the function returns?

KVM needs tdsysinfo_struct to create guest TD. It doesn't require
1024-alignment.
--
Isaku Yamahata <[email protected]>

2023-06-09 00:24:32

[permalink] [raw]

Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Thu, Jun 08, 2023 at 04:29:19PM -0700, Isaku Yamahata wrote:
> On Thu, Jun 08, 2023 at 02:41:28PM +0300,
> "[email protected]" <[email protected]> wrote:
>
> > On Thu, Jun 08, 2023 at 02:40:27AM +0000, Huang, Kai wrote:
> > > On Thu, 2023-06-08 at 03:27 +0300, [email protected] wrote:
> > > > On Mon, Jun 05, 2023 at 02:27:21AM +1200, Kai Huang wrote:
> > > > > For now both 'tdsysinfo_struct' and CMRs are only used during the module
> > > > > initialization. But because they are both relatively big, declare them
> > > > > inside the module initialization function but as static variables.
> > > >
> > > > This justification does not make sense to me. static variables will not be
> > > > freed after function returned. They will still consume memory.
> > > >
> > > > I think you need to allocate/free memory dynamically, if they are too big
> > > > for stack.
> > >
> > >
> > > I do need to keep tdsysinfo_struct as it will be used by KVM too.
> >
> > Will you pass it down to KVM from this function? Will KVM use the struct
> > after the function returns?
>
> KVM needs tdsysinfo_struct to create guest TD. It doesn't require
> 1024-alignment.

How KVM gets it from here?

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-09 02:02:01

by Isaku Yamahata

[permalink] [raw]

Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Fri, Jun 09, 2023 at 02:54:41AM +0300,
"[email protected]" <[email protected]> wrote:

> On Thu, Jun 08, 2023 at 04:29:19PM -0700, Isaku Yamahata wrote:
> > On Thu, Jun 08, 2023 at 02:41:28PM +0300,
> > "[email protected]" <[email protected]> wrote:
> >
> > > On Thu, Jun 08, 2023 at 02:40:27AM +0000, Huang, Kai wrote:
> > > > On Thu, 2023-06-08 at 03:27 +0300, [email protected] wrote:
> > > > > On Mon, Jun 05, 2023 at 02:27:21AM +1200, Kai Huang wrote:
> > > > > > For now both 'tdsysinfo_struct' and CMRs are only used during the module
> > > > > > initialization. But because they are both relatively big, declare them
> > > > > > inside the module initialization function but as static variables.
> > > > >
> > > > > This justification does not make sense to me. static variables will not be
> > > > > freed after function returned. They will still consume memory.
> > > > >
> > > > > I think you need to allocate/free memory dynamically, if they are too big
> > > > > for stack.
> > > >
> > > >
> > > > I do need to keep tdsysinfo_struct as it will be used by KVM too.
> > >
> > > Will you pass it down to KVM from this function? Will KVM use the struct
> > > after the function returns?
> >
> > KVM needs tdsysinfo_struct to create guest TD. It doesn't require
> > 1024-alignment.
>
> How KVM gets it from here?

For now, TDX KVM patch series moves the tdsysinfo out of the function, and add
a getter function of it.
As long as KVM can access the info, it doesn't care how its memory is allocated.
static or dynamic.
--
Isaku Yamahata <[email protected]>

2023-06-09 04:17:23

by Kuppuswamy Sathyanarayanan

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On 6/4/23 7:27 AM, Kai Huang wrote:
> Start to transit out the "multi-steps" to construct a list of "TD Memory
> Regions" (TDMRs) to cover all TDX-usable memory regions.
>
> The kernel configures TDX-usable memory regions by passing a list of
> TDMRs "TD Memory Regions" (TDMRs) to the TDX module. Each TDMR contains
> the information of the base/size of a memory region, the base/size of the
> associated Physical Address Metadata Table (PAMT) and a list of reserved
> areas in the region.
>
> Do the first step to fill out a number of TDMRs to cover all TDX memory
> regions. To keep it simple, always try to use one TDMR for each memory
> region. As the first step only set up the base/size for each TDMR.

As a first step?

>
> Each TDMR must be 1G aligned and the size must be in 1G granularity.
> This implies that one TDMR could cover multiple memory regions. If a
> memory region spans the 1GB boundary and the former part is already
> covered by the previous TDMR, just use a new TDMR for the remaining
> part.
>
> TDX only supports a limited number of TDMRs. Disable TDX if all TDMRs
> are consumed but there is more memory region to cover.
>
> There are fancier things that could be done like trying to merge
> adjacent TDMRs. This would allow more pathological memory layouts to be
> supported. But, current systems are not even close to exhausting the
> existing TDMR resources in practice. For now, keep it simple.
>
> Signed-off-by: Kai Huang <[email protected]>
> ---
>
> v10 -> v11:
> - No update
>
> v9 -> v10:
> - No change.
>
> v8 -> v9:
>
> - Added the last paragraph in the changelog (Dave).
> - Removed unnecessary type cast in tdmr_entry() (Dave).
>
>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 94 ++++++++++++++++++++++++++++++++++++-
> 1 file changed, 93 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 7a20c72361e7..fa9fa8bc581a 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -385,6 +385,93 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
> tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
> }
>
> +/* Get the TDMR from the list at the given index. */
> +static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
> + int idx)
> +{
> + int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
> +
> + return (void *)tdmr_list->tdmrs + tdmr_info_offset;
> +}
> +
> +#define TDMR_ALIGNMENT BIT_ULL(30)
> +#define TDMR_PFN_ALIGNMENT (TDMR_ALIGNMENT >> PAGE_SHIFT)

This macro is never used. Maybe you can drop it from this patch.

> +#define TDMR_ALIGN_DOWN(_addr) ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
> +#define TDMR_ALIGN_UP(_addr) ALIGN((_addr), TDMR_ALIGNMENT)
> +
> +static inline u64 tdmr_end(struct tdmr_info *tdmr)
> +{
> + return tdmr->base + tdmr->size;
> +}
> +
> +/*
> + * Take the memory referenced in @tmb_list and populate the
> + * preallocated @tdmr_list, following all the special alignment
> + * and size rules for TDMR.
> + */
> +static int fill_out_tdmrs(struct list_head *tmb_list,
> + struct tdmr_info_list *tdmr_list)
> +{
> + struct tdx_memblock *tmb;
> + int tdmr_idx = 0;
> +
> + /*
> + * Loop over TDX memory regions and fill out TDMRs to cover them.
> + * To keep it simple, always try to use one TDMR to cover one
> + * memory region.
> + *
> + * In practice TDX1.0 supports 64 TDMRs, which is big enough to
> + * cover all memory regions in reality if the admin doesn't use
> + * 'memmap' to create a bunch of discrete memory regions. When
> + * there's a real problem, enhancement can be done to merge TDMRs
> + * to reduce the final number of TDMRs.
> + */
> + list_for_each_entry(tmb, tmb_list, list) {
> + struct tdmr_info *tdmr = tdmr_entry(tdmr_list, tdmr_idx);
> + u64 start, end;
> +
> + start = TDMR_ALIGN_DOWN(PFN_PHYS(tmb->start_pfn));
> + end = TDMR_ALIGN_UP(PFN_PHYS(tmb->end_pfn));
> +
> + /*
> + * A valid size indicates the current TDMR has already
> + * been filled out to cover the previous memory region(s).
> + */
> + if (tdmr->size) {
> + /*
> + * Loop to the next if the current memory region
> + * has already been fully covered.
> + */
> + if (end <= tdmr_end(tdmr))
> + continue;
> +
> + /* Otherwise, skip the already covered part. */
> + if (start < tdmr_end(tdmr))
> + start = tdmr_end(tdmr);
> +
> + /*
> + * Create a new TDMR to cover the current memory
> + * region, or the remaining part of it.
> + */
> + tdmr_idx++;
> + if (tdmr_idx >= tdmr_list->max_tdmrs) {
> + pr_warn("initialization failed: TDMRs exhausted.\n");
> + return -ENOSPC;
> + }
> +
> + tdmr = tdmr_entry(tdmr_list, tdmr_idx);
> + }
> +
> + tdmr->base = start;
> + tdmr->size = end - start;
> + }
> +
> + /* @tdmr_idx is always the index of last valid TDMR. */
> + tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;
> +
> + return 0;
> +}
> +
> /*
> * Construct a list of TDMRs on the preallocated space in @tdmr_list
> * to cover all TDX memory regions in @tmb_list based on the TDX module
> @@ -394,10 +481,15 @@ static int construct_tdmrs(struct list_head *tmb_list,
> struct tdmr_info_list *tdmr_list,
> struct tdsysinfo_struct *sysinfo)
> {
> + int ret;
> +
> + ret = fill_out_tdmrs(tmb_list, tdmr_list);
> + if (ret)
> + return ret;
> +
> /*
> * TODO:
> *
> - * - Fill out TDMRs to cover all TDX memory regions.
> * - Allocate and set up PAMTs for each TDMR.
> * - Designate reserved areas for each TDMR.
> *

Rest looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2023-06-09 10:39:06

[permalink] [raw]

Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

> @@ -21,6 +23,76 @@
> */
> #define TDH_SYS_INIT 33
> #define TDH_SYS_LP_INIT 35
> +#define TDH_SYS_INFO 32

Could you keep these defines ordered? Here and all following patches.

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-12 02:22:28

[permalink] [raw]

Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Thu, 2023-06-08 at 06:13 -0700, Dave Hansen wrote:
> On 6/8/23 04:41, [email protected] wrote:
> > These structures take 1.5K of memory and the memory will be allocated for
> > all machines that boots the kernel with TDX enabled, regardless if the
> > machine has TDX or not. It seems very wasteful to me.
>
> Actually, those variables are in .bss. They're allocated forever for
> anyone that runs a kernel that has TDX support.
>

Hi Dave/Kirill,

Thanks for feedback.

My understanding is you both prefer dynamic allocation. I'll change to use
that. Also I will free them after module initialization as for now they are
only used by module initialization.

Please let me know if you have any comments.

2023-06-12 02:26:04

[permalink] [raw]

Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On Fri, 2023-06-09 at 13:02 +0300, [email protected] wrote:
> > @@ -21,6 +23,76 @@
> > */
> > #define TDH_SYS_INIT 33
> > #define TDH_SYS_LP_INIT 35
> > +#define TDH_SYS_INFO 32
>
> Could you keep these defines ordered? Here and all following patches.
>

Sure will do. Thanks.

2023-06-12 02:28:02

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Fri, 2023-06-09 at 02:02 +0300, [email protected] wrote:
> On Mon, Jun 05, 2023 at 02:27:24AM +1200, Kai Huang wrote:
> > +#define TDMR_ALIGNMENT BIT_ULL(30)
>
> Nit: SZ_1G can be a little bit more readable here.

Will do.

>
> Anyway:
>
> Reviewed-by: Kirill A. Shutemov <[email protected]>
>
>

Thanks.

2023-06-12 02:55:16

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Thu, 2023-06-08 at 21:01 -0700, Sathyanarayanan Kuppuswamy wrote:
>
> On 6/4/23 7:27 AM, Kai Huang wrote:
> > Start to transit out the "multi-steps" to construct a list of "TD Memory
> > Regions" (TDMRs) to cover all TDX-usable memory regions.
> >
> > The kernel configures TDX-usable memory regions by passing a list of
> > TDMRs "TD Memory Regions" (TDMRs) to the TDX module. Each TDMR contains
> > the information of the base/size of a memory region, the base/size of the
> > associated Physical Address Metadata Table (PAMT) and a list of reserved
> > areas in the region.
> >
> > Do the first step to fill out a number of TDMRs to cover all TDX memory
> > regions. To keep it simple, always try to use one TDMR for each memory
> > region. As the first step only set up the base/size for each TDMR.
>
> As a first step?

Not sure there are two or more first steps? I think I'll keep it as is.

[...]

> > +#define TDMR_ALIGNMENT BIT_ULL(30)
> > +#define TDMR_PFN_ALIGNMENT (TDMR_ALIGNMENT >> PAGE_SHIFT)
>
> This macro is never used. Maybe you can drop it from this patch.

OK will do.

[...]

>
> Rest looks good to me.
>
> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
>

Thanks.

2023-06-12 03:02:55

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

>
> Maybe not even a pr_warn(), but something that's a bit ominous and has a
> chance of getting users to act.

Sorry I am not sure how to do. Could you give some suggestion?

2023-06-12 11:39:32

[permalink] [raw]

Subject: Re: [PATCH v11 00/20] TDX host kernel support

On Thu, 2023-06-08 at 14:03 -0700, Dan Williams wrote:
> Kai Huang wrote:
> > Intel Trusted Domain Extensions (TDX) protects guest VMs from malicious
> > host and certain physical attacks. TDX specs are available in [1].
> >
> > This series is the initial support to enable TDX with minimal code to
> > allow KVM to create and run TDX guests. KVM support for TDX is being
> > developed separately[2]. A new "userspace inaccessible memfd" approach
> > to support TDX private memory is also being developed[3]. The KVM will
> > only support the new "userspace inaccessible memfd" as TDX guest memory.
>
> This memfd approach is incompatible with one of the primary ways that
> new memory topologies like high-bandwidth-memory and CXL are accessed,
> via a device-special-file mapping. There is already precedent for mmap()
> to only be used for communicating address value and not CPU accessible
> memory. See "Userspace P2PDMA with O_DIRECT NVMe devices" [1].
>
> So before this memfd requirement becomes too baked in to the design I
> want to understand if "userspace inaccessible" is the only requirement
> so I can look to add that to the device-special-file interface for
> "device" / "Soft Reserved" memory like HBM and CXL.
>
> [1]: https://lore.kernel.org/all/[email protected]/

+ Peng, Chao who is working on this with Sean.

There are some recent developments around the design of the "userspace
inaccessible memfd", e.g., IIUC Sean is proposing to replace the new syscall
with a new KVM ioctl().

Hi Sean, Chao,

Could you give comments to Dan's concern?

2023-06-12 15:21:36

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Mon, Jun 12, 2023 at 02:33:58AM +0000, Huang, Kai wrote:
>
> >
> > Maybe not even a pr_warn(), but something that's a bit ominous and has a
> > chance of getting users to act.
>
> Sorry I am not sure how to do. Could you give some suggestion?

Maybe something like this would do?

I'm struggle with the warning message. Any suggestion is welcome.

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 9cd4f6b58d4a..cc141025b249 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -627,6 +627,15 @@ static int fill_out_tdmrs(struct list_head *tmb_list,
/* @tdmr_idx is always the index of last valid TDMR. */
tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;

+ /*
+ * Warn early that kernel is about to run out of TDMRs.
+ *
+ * This is indication that TDMR allocation has to be reworked to be
+ * smarter to not run into an issue.
+ */
+ if (tdmr_list->max_tdmrs - tdmr_list->nr_consumed_tdmrs < TDMR_NR_WARN)
+ pr_warn("Low number of spare TDMRs\n");
+
return 0;
}

diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 323ce744b853..17efe33847ae 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -98,6 +98,9 @@ struct tdx_memblock {
int nid;
};

+/* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
+#define TDMR_NR_WARN 4
+
struct tdmr_info_list {
void *tdmrs; /* Flexible array to hold 'tdmr_info's */
int nr_consumed_tdmrs; /* How many 'tdmr_info's are in use */
--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-12 22:29:50

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Mon, 2023-06-12 at 17:33 +0300, [email protected] wrote:
> On Mon, Jun 12, 2023 at 02:33:58AM +0000, Huang, Kai wrote:
> >
> > >
> > > Maybe not even a pr_warn(), but something that's a bit ominous and has a
> > > chance of getting users to act.
> >
> > Sorry I am not sure how to do. Could you give some suggestion?
>
> Maybe something like this would do?
>
> I'm struggle with the warning message. Any suggestion is welcome.

I guess it would be helpful to print out the actual consumed TDMRs?

pr_warn("consumed TDMRs reaching limit: %d used (out of %d)\n",
tdmr_idx, tdmr_list->max_tdmrs);

>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 9cd4f6b58d4a..cc141025b249 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -627,6 +627,15 @@ static int fill_out_tdmrs(struct list_head *tmb_list,
> /* @tdmr_idx is always the index of last valid TDMR. */
> tdmr_list->nr_consumed_tdmrs = tdmr_idx + 1;
>
> + /*
> + * Warn early that kernel is about to run out of TDMRs.
> + *
> + * This is indication that TDMR allocation has to be reworked to be
> + * smarter to not run into an issue.
> + */
> + if (tdmr_list->max_tdmrs - tdmr_list->nr_consumed_tdmrs < TDMR_NR_WARN)
> + pr_warn("Low number of spare TDMRs\n");
> +
> return 0;
> }
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 323ce744b853..17efe33847ae 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -98,6 +98,9 @@ struct tdx_memblock {
> int nid;
> };
>
> +/* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
> +#define TDMR_NR_WARN 4
> +
> struct tdmr_info_list {
> void *tdmrs; /* Flexible array to hold 'tdmr_info's */
> int nr_consumed_tdmrs; /* How many 'tdmr_info's are in use */

2023-06-13 10:32:32

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Mon, Jun 12, 2023 at 10:10:39PM +0000, Huang, Kai wrote:
> On Mon, 2023-06-12 at 17:33 +0300, [email protected] wrote:
> > On Mon, Jun 12, 2023 at 02:33:58AM +0000, Huang, Kai wrote:
> > >
> > > >
> > > > Maybe not even a pr_warn(), but something that's a bit ominous and has a
> > > > chance of getting users to act.
> > >
> > > Sorry I am not sure how to do. Could you give some suggestion?
> >
> > Maybe something like this would do?
> >
> > I'm struggle with the warning message. Any suggestion is welcome.
>
> I guess it would be helpful to print out the actual consumed TDMRs?
>
> pr_warn("consumed TDMRs reaching limit: %d used (out of %d)\n",
> tdmr_idx, tdmr_list->max_tdmrs);

It is off-by-one. It supposed to be tdmr_idx + 1.

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-13 23:24:24

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Tue, 2023-06-13 at 13:18 +0300, [email protected] wrote:
> On Mon, Jun 12, 2023 at 10:10:39PM +0000, Huang, Kai wrote:
> > On Mon, 2023-06-12 at 17:33 +0300, [email protected] wrote:
> > > On Mon, Jun 12, 2023 at 02:33:58AM +0000, Huang, Kai wrote:
> > > >
> > > > >
> > > > > Maybe not even a pr_warn(), but something that's a bit ominous and has a
> > > > > chance of getting users to act.
> > > >
> > > > Sorry I am not sure how to do. Could you give some suggestion?
> > >
> > > Maybe something like this would do?
> > >
> > > I'm struggle with the warning message. Any suggestion is welcome.
> >
> > I guess it would be helpful to print out the actual consumed TDMRs?
> >
> > pr_warn("consumed TDMRs reaching limit: %d used (out of %d)\n",
> > tdmr_idx, tdmr_list->max_tdmrs);
>
> It is off-by-one. It supposed to be tdmr_idx + 1.
>

In your code, yes. Thanks for pointing out. I copied it from my code.

2023-06-14 12:53:48

by Nikolay Borisov

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On 4.06.23 г. 17:27 ч., Kai Huang wrote:
<snip>

> ---
> arch/x86/virt/vmx/tdx/tdx.c | 94 ++++++++++++++++++++++++++++++++++++-
> 1 file changed, 93 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 7a20c72361e7..fa9fa8bc581a 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -385,6 +385,93 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
> tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
> }
>
> +/* Get the TDMR from the list at the given index. */
> +static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
> + int idx)
> +{
> + int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
> +
> + return (void *)tdmr_list->tdmrs + tdmr_info_offset;

nit: I would just like to point that sizeof(void *) being treated as 1
is a gcc-specific compiler extension:
https://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Pointer-Arith.html#Pointer-Arith

I don't know if clang treats it the same way, just for the sake of
simplicity you might wanna change this (void *) to (char *).

<snip>

2023-06-14 23:11:01

[permalink] [raw]

Subject: Re: [PATCH v11 11/20] x86/virt/tdx: Fill out TDMRs to cover all TDX memory regions

On Wed, 2023-06-14 at 15:31 +0300, Nikolay Borisov wrote:
>
> On 4.06.23 г. 17:27 ч., Kai Huang wrote:
> <snip>
>
> > ---
> > arch/x86/virt/vmx/tdx/tdx.c | 94 ++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 93 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index 7a20c72361e7..fa9fa8bc581a 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -385,6 +385,93 @@ static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
> > tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
> > }
> >
> > +/* Get the TDMR from the list at the given index. */
> > +static struct tdmr_info *tdmr_entry(struct tdmr_info_list *tdmr_list,
> > + int idx)
> > +{
> > + int tdmr_info_offset = tdmr_list->tdmr_sz * idx;
> > +
> > + return (void *)tdmr_list->tdmrs + tdmr_info_offset;
>
> nit: I would just like to point that sizeof(void *) being treated as 1
> is a gcc-specific compiler extension:
> https://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Pointer-Arith.html#Pointer-Arith
>
> I don't know if clang treats it the same way, just for the sake of
> simplicity you might wanna change this (void *) to (char *).

Then we will need additional cast from 'char *' to 'struct tdmr_info *' I
suppose? Not sure whether it is worth the additional cast.

And I found such 'void *' arithmetic operation is already used in other kernel
code too, e.g., below code in networking code:

./net/rds/tcp_send.c:105: (void *)&rm->m_inc.i_hdr + hdr_off,

and I believe there are other examples too (that I didn't spend a lot of time to
grep).

And it seems Linus also thinks "using arithmetic on 'void *' is generally
superior":

https://lore.kernel.org/lkml/CAHk-=whFKYMrF6euVvziW+drw7-yi1pYdf=uccnzJ8k09DoTXA@mail.gmail.com/t/#m983827708903c8c5bddf193343d392c9ed5af1a0

So I wouldn't worry about the Clang thing.

2023-06-19 13:43:52

by David Hildenbrand

[permalink] [raw]

Subject: Re: [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory

On 04.06.23 16:27, Kai Huang wrote:
> Start to transit out the "multi-steps" to initialize the TDX module.
>
> TDX provides increased levels of memory confidentiality and integrity.
> This requires special hardware support for features like memory
> encryption and storage of memory integrity checksums. Not all memory
> satisfies these requirements.
>
> As a result, TDX introduced the concept of a "Convertible Memory Region"
> (CMR). During boot, the firmware builds a list of all of the memory
> ranges which can provide the TDX security guarantees.
>
> CMRs tell the kernel which memory is TDX compatible. The kernel takes
> CMRs (plus a little more metadata) and constructs "TD Memory Regions"
> (TDMRs). TDMRs let the kernel grant TDX protections to some or all of
> the CMR areas.
>
> The TDX module also reports necessary information to let the kernel
> build TDMRs and run TDX guests in structure 'tdsysinfo_struct'. The
> list of CMRs, along with the TDX module information, is available to
> the kernel by querying the TDX module.
>
> As a preparation to construct TDMRs, get the TDX module information and
> the list of CMRs. Print out CMRs to help user to decode which memory
> regions are TDX convertible.
>
> The 'tdsysinfo_struct' is fairly large (1024 bytes) and contains a lot
> of info about the TDX module. Fully define the entire structure, but
> only use the fields necessary to build the TDMRs and pr_info() some
> basics about the module. The rest of the fields will get used by KVM.
>
> For now both 'tdsysinfo_struct' and CMRs are only used during the module
> initialization. But because they are both relatively big, declare them
> inside the module initialization function but as static variables.
>
> Signed-off-by: Kai Huang <[email protected]>
> Reviewed-by: Isaku Yamahata <[email protected]>
> ---

[...]

> ---
> arch/x86/virt/vmx/tdx/tdx.c | 67 +++++++++++++++++++++++++++++++++-
> arch/x86/virt/vmx/tdx/tdx.h | 72 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 138 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index bcf2b2d15a2e..9fde0f71dd8b 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -20,6 +20,7 @@
> #include <asm/msr-index.h>
> #include <asm/msr.h>
> #include <asm/archrandom.h>
> +#include <asm/page.h>
> #include <asm/tdx.h>
> #include "tdx.h"
>
> @@ -191,12 +192,76 @@ int tdx_cpu_enable(void)
> }
> EXPORT_SYMBOL_GPL(tdx_cpu_enable);
>
> +static inline bool is_cmr_empty(struct cmr_info *cmr)
> +{
> + return !cmr->size;
> +}
> +

Nit: maybe it's just me, but this function seems unnecessary.

If "!cmr->size" is not expressive, then I don't know why "is_cmr_empty"
should be. Just inline that into the single user.

.. after all the single caller also uses/prints cmr->size ...

> +static void print_cmrs(struct cmr_info *cmr_array, int nr_cmrs)
> +{
> + int i;
> +
> + for (i = 0; i < nr_cmrs; i++) {
> + struct cmr_info *cmr = &cmr_array[i];
> +
> + /*
> + * The array of CMRs reported via TDH.SYS.INFO can
> + * contain tail empty CMRs. Don't print them.
> + */
> + if (is_cmr_empty(cmr))
> + break;
> +
> + pr_info("CMR: [0x%llx, 0x%llx)\n", cmr->base,
> + cmr->base + cmr->size);
> + }
> +}
> +
> +/*
> + * Get the TDX module information (TDSYSINFO_STRUCT) and the array of
> + * CMRs, and save them to @sysinfo and @cmr_array. @sysinfo must have
> + * been padded to have enough room to save the TDSYSINFO_STRUCT.
> + */
> +static int tdx_get_sysinfo(struct tdsysinfo_struct *sysinfo,
> + struct cmr_info *cmr_array)
> +{
> + struct tdx_module_output out;
> + u64 sysinfo_pa, cmr_array_pa;
> + int ret;
> +
> + sysinfo_pa = __pa(sysinfo);
> + cmr_array_pa = __pa(cmr_array);
> + ret = seamcall(TDH_SYS_INFO, sysinfo_pa, TDSYSINFO_STRUCT_SIZE,
> + cmr_array_pa, MAX_CMRS, NULL, &out);
> + if (ret)
> + return ret;
> +
> + pr_info("TDX module: atributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u",

"attributes" ?

> + sysinfo->attributes, sysinfo->vendor_id,
> + sysinfo->major_version, sysinfo->minor_version,
> + sysinfo->build_date, sysinfo->build_num);
> +
> + /* R9 contains the actual entries written to the CMR array. */
> + print_cmrs(cmr_array, out.r9);
> +
> + return 0;
> +}
> +
> static int init_tdx_module(void)
> {
> + static DECLARE_PADDED_STRUCT(tdsysinfo_struct, tdsysinfo,
> + TDSYSINFO_STRUCT_SIZE, TDSYSINFO_STRUCT_ALIGNMENT);
> + static struct cmr_info cmr_array[MAX_CMRS]
> + __aligned(CMR_INFO_ARRAY_ALIGNMENT);
> + struct tdsysinfo_struct *sysinfo = &PADDED_STRUCT(tdsysinfo);
> + int ret;
> +
> + ret = tdx_get_sysinfo(sysinfo, cmr_array);
> + if (ret)
> + return ret;
> +
> /*
> * TODO:
> *
> - * - Get TDX module information and TDX-capable memory regions.
> * - Build the list of TDX-usable memory regions.
> * - Construct a list of "TD Memory Regions" (TDMRs) to cover
> * all TDX-usable memory regions.
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 9fb46033c852..97f4d7e7f1a4 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -3,6 +3,8 @@
> #define _X86_VIRT_TDX_H
>
> #include <linux/types.h>
> +#include <linux/stddef.h>
> +#include <linux/compiler_attributes.h>
>
> /*
> * This file contains both macros and data structures defined by the TDX
> @@ -21,6 +23,76 @@
> */
> #define TDH_SYS_INIT 33
> #define TDH_SYS_LP_INIT 35
> +#define TDH_SYS_INFO 32
> +
> +struct cmr_info {
> + u64 base;
> + u64 size;
> +} __packed;
> +
> +#define MAX_CMRS 32
> +#define CMR_INFO_ARRAY_ALIGNMENT 512
> +
> +struct cpuid_config {
> + u32 leaf;
> + u32 sub_leaf;
> + u32 eax;
> + u32 ebx;
> + u32 ecx;
> + u32 edx;
> +} __packed;
> +
> +#define DECLARE_PADDED_STRUCT(type, name, size, alignment) \
> + struct type##_padded { \
> + union { \
> + struct type name; \
> + u8 padding[size]; \
> + }; \
> + } name##_padded __aligned(alignment)
> +
> +#define PADDED_STRUCT(name) (name##_padded.name)
> +
> +#define TDSYSINFO_STRUCT_SIZE 1024

So, it can never be larger than 1024 bytes? Not even with many cpuid
configs?

> +#define TDSYSINFO_STRUCT_ALIGNMENT 1024
> +

--
Cheers,

David / dhildenb

2023-06-20 00:04:34