2024-06-16 12:01:59

by Kai Huang

[permalink] [raw]
Subject: [PATCH 0/9] TDX host: metadata reading tweaks, bug fix and info dump

TL;DR:

This series does necessary tweaks to TDX host "global metadata" reading
code to fix some immediate issues in the TDX module initialization code,
with intention to also provide a flexible code base to support sharing
global metadata to KVM (and other kernel components) for future needs.

Hi Dave, Dan, Rick,

The "reading CMRs" part is a new (related) issue that I got received
last week so I also included in this patchset.

=== More info ===

TDX module provides a set of "global metadata fields" for software to
query. They report things like TDX module version, supported features
fields required for creating TDX guests and so on.

Today the TDX host code already reads "TD Memory Region" (TDMR) related
metadata fields for module initialization. There are immediate needs
that require TDX host code to read more metadata fields:

- Dump basic TDX module info [1];
- Reject module with no NO_RBP_MOD feature support [2];
- Read CMR info to fix a module initialization failure bug [3].

Also, the upstreaming-on-going KVM TDX support [4] requires to read more
global metadata fields. In the longer term, the TDX Connect [5] (which
supports assigning trusted IO devices to TDX guest) may also require
other kernel components (e.g., pci/vt-d) to access more metadata.

To meet all of those, the idea is the TDX host core-kernel to provide a
centralized, canonical, and read-only structure to contain all global
metadata that comes out of TDX module for all kernel components to use.

There is an "alternative option to manage global metadata" (see below)
but it is not as straightforward as this.

This series starts to track all global metadata fields into a single
'struct tdx_sysinfo', and reads more metadata fields to that structure
to address the immediate needs as mentioned above.

More fields will be added in the near future to support KVM TDX, and the
actual sharing/export the "read-only" global metadata for KVM will also
be sent out in the near future when that becomes immediate (also see
"Share global metadata to KVM" below).

Note, the first couple of patches in this series were from the old
patchset "TDX host: Provide TDX module metadata reading APIs" [6].

=== Further read ===

1) Altertive option to manage global metadata

The TDX host core-kernel could also expose/export APIs for reading
metadata out of TDX module directly, and all in-kernel TDX users use
these APIs and manage their own metadata fields.

However this isn't as straightforward as exposing/exporting structure,
because the API to read multi fields to a structure requires the caller
to build a "mapping table" between field ID to structure member:

struct kvm_used_metadata {
u64 member1;
...
};

#define TD_SYSINFO_KVM_MAP(_field_id, _member) \
TD_SYSINFO_MAP(_field_id, struct kvm_used_metadata, \
_member)

struct tdx_metadata_field_mapping fields[] = {
TD_SYSINFO_KVM_MAP(FIELD_ID1, member1),
...
};

ret = tdx_sysmd_read_multi(fields, ARRAY_SIZE(fields), buf);

Another problem is some metadata field may be accessed by multiple
kernel components, e.g., the one reports TDX module features, in which
case there will be duplicated code comparing to exposing structure
directly.

2) Share global metadata to KVM

To achieve "read-only" centralized global metadata structure, the idea
way is to use __ro_after_init. However currently all global metadata
are read by tdx_enable(), which is supposed to be called at any time at
runtime thus isn't annotated with __init.

The __ro_after_init can be done eventually, but it can only be done
after moving VMXON out of KVM to the core-kernel: after that we can
read all metadata during kernel boot (thus __ro_after_init), but
doesn't necessarily have to do it in tdx_enable().

However moving VMXON out of KVM is NOT considered as dependency for the
initial KVM TDX support [7]. Thus for the initial support, the idea is
TDX host to export a function which returns a "const struct pointer" so
KVM won't be able to modify any global metadata.

Whether to export the 'const struct tdx_sysinfo *' (which points to all
metadata), or the pointer to a smaller set of metadata can be discussed
in the future.

Adding new fields to 'struct tdx_sysinfo' for KVM is similiar to the
patch 7 ("x86/virt/tdx: Print TDX module basic information").

3) TDH.SYS.RD vs TDH.SYS.RDALL

The kernel can use two SEAMCALLs to read global metadata: TDH.SYS.RD and
TDH.SYS.RDALL. The former simply reads one metadata field to a 'u64'.
The latter tries to read all fields to a 4KB buffer.

Currently the kernel only uses the former to read metadata, and this
series doesn't choose to use TDH.SYS.RDALL.

The main reason is the "layout of all fields in the 4KB buffer" that
returned by TDH.SYS.RDALL isn't architectural consistent among different
TDX module versions.

E.g., some metadata fields may not be supported by the old module, thus
they may or may not be in the 4KB buffer depending on module version.
And it is impractical to know whether those fields are in the buffer or
not.

TDH.SYS.RDALL may be useful to read one small set of metadata fields,
e.g., fields in one "Class" (TDX categories all global metadata fields
in different "Class"es). But this is only an optimization even if
TDH.SYS.RDALL can be used, so leave this to future consideration.


[1] https://lore.kernel.org/lkml/[email protected]/T/#m352829aedf6680d4628c7e40dc40b332eda93355
[2] https://lore.kernel.org/lkml/[email protected]/T/#mef98469c51e2382ead2c537ea189752360bd2bef
[3] https://github.com/canonical/tdx/issues/135#issuecomment-2151570238
[4] https://lore.kernel.org/kvm/[email protected]/T/
[5] https://cdrdv2.intel.com/v1/dl/getContent/773614
[6] https://lore.kernel.org/lkml/[email protected]/T/
[7] https://lore.kernel.org/kvm/[email protected]/T/#me0c081438074341ad70d3525f6cf3472d7d81b0e




Kai Huang (9):
x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro
x86/virt/tdx: Unbind global metadata read with 'struct
tdx_tdmr_sysinfo'
x86/virt/tdx: Support global metadata read for all element sizes
x86/virt/tdx: Abstract reading multiple global metadata fields as a
helper
x86/virt/tdx: Move field mapping table of getting TDMR info to
function local
x86/virt/tdx: Start to track all global metadata in one structure
x86/virt/tdx: Print TDX module basic information
x86/virt/tdx: Exclude memory region hole within CMR as TDMR's reserved
area
x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD
feature

arch/x86/virt/vmx/tdx/tdx.c | 315 ++++++++++++++++++++++++++++++------
arch/x86/virt/vmx/tdx/tdx.h | 80 ++++++++-
2 files changed, 338 insertions(+), 57 deletions(-)


base-commit: 281f8331020e339b607154455faec6ebf5861a2c
--
2.43.2



2024-06-16 12:02:16

by Kai Huang

[permalink] [raw]
Subject: [PATCH 1/9] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro

TD_SYSINFO_MAP() macro actually takes the member of the 'struct
tdx_tdmr_sysinfo' as the second argument and uses the offsetof() to
calculate the offset for that member.

Rename the macro argument _offset to _member to reflect this.

Signed-off-by: Kai Huang <[email protected]>
Reviewed-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Binbin Wu <[email protected]>
---
arch/x86/virt/vmx/tdx/tdx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 4e2b2e2ac9f9..fbde24ea3b3e 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -296,9 +296,9 @@ struct field_mapping {
int offset;
};

-#define TD_SYSINFO_MAP(_field_id, _offset) \
+#define TD_SYSINFO_MAP(_field_id, _member) \
{ .field_id = MD_FIELD_ID_##_field_id, \
- .offset = offsetof(struct tdx_tdmr_sysinfo, _offset) }
+ .offset = offsetof(struct tdx_tdmr_sysinfo, _member) }

/* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
static const struct field_mapping fields[] = {
--
2.43.2


2024-06-16 12:02:38

by Kai Huang

[permalink] [raw]
Subject: [PATCH 2/9] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo'

The TDX module provides a set of "global metadata fields". They report
things like TDX module version, supported features, and fields related
to create/run TDX guests and so on.

For now the kernel only reads "TD Memory Region" (TDMR) related global
metadata fields to a 'struct tdx_tdmr_sysinfo' for initializing the TDX
module, and the metadata reading code can only work with that structure.

Future changes will need to read other metadata fields that don't make
sense to populate to the "struct tdx_tdmr_sysinfo". It's essential to
provide a generic metadata read infrastructure which is not bound to any
specific structure.

To start providing such infrastructure, unbind the metadata reading code
with the 'struct tdx_tdmr_sysinfo'.

Note the kernel has a helper macro, TD_SYSINFO_MAP(), for marshaling the
metadata into the 'struct tdx_tdmr_sysinfo', and currently the macro
hardcodes the structure name. As part of unbinding the metadata reading
code with 'struct tdx_tdmr_sysinfo', it is extended to accept different
structures.

Unfortunately, this will result in the usage of TD_SYSINFO_MAP() for
populating 'struct tdx_tdmr_sysinfo' to be changed to use the structure
name explicitly for each structure member and make the code longer. Add
a wrapper macro which hides the 'struct tdx_tdmr_sysinfo' internally to
make the code shorter thus better readability.

Signed-off-by: Kai Huang <[email protected]>
Reviewed-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Binbin Wu <[email protected]>
---
arch/x86/virt/vmx/tdx/tdx.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index fbde24ea3b3e..854312e97eff 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -272,9 +272,9 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)

static int read_sys_metadata_field16(u64 field_id,
int offset,
- struct tdx_tdmr_sysinfo *ts)
+ void *stbuf)
{
- u16 *ts_member = ((void *)ts) + offset;
+ u16 *st_member = stbuf + offset;
u64 tmp;
int ret;

@@ -286,7 +286,7 @@ static int read_sys_metadata_field16(u64 field_id,
if (ret)
return ret;

- *ts_member = tmp;
+ *st_member = tmp;

return 0;
}
@@ -296,17 +296,20 @@ struct field_mapping {
int offset;
};

-#define TD_SYSINFO_MAP(_field_id, _member) \
- { .field_id = MD_FIELD_ID_##_field_id, \
- .offset = offsetof(struct tdx_tdmr_sysinfo, _member) }
+#define TD_SYSINFO_MAP(_field_id, _struct, _member) \
+ { .field_id = MD_FIELD_ID_##_field_id, \
+ .offset = offsetof(_struct, _member) }
+
+#define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member) \
+ TD_SYSINFO_MAP(_field_id, struct tdx_tdmr_sysinfo, _member)

/* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
static const struct field_mapping fields[] = {
- TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs),
- TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
- TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]),
- TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]),
- TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]),
+ TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS, max_tdmrs),
+ TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
+ TD_SYSINFO_MAP_TDMR_INFO(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]),
+ TD_SYSINFO_MAP_TDMR_INFO(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]),
+ TD_SYSINFO_MAP_TDMR_INFO(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]),
};

static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
--
2.43.2


2024-06-16 12:02:58

by Kai Huang

[permalink] [raw]
Subject: [PATCH 3/9] x86/virt/tdx: Support global metadata read for all element sizes

The TDX module provides "global metadata fields" for software to query.
Each metadata field is accessible by a unique "metadata field ID". TDX
supports 8/16/32/64 bits metadata element sizes. The size of each
metadata field is encoded in its associated metadata field ID.

For now the kernel only reads "TD Memory Region" (TDMR) related global
metadata fields for module initialization. All these metadata fields
are 16-bit, and the kernel only supports reading 16-bit fields.

Future changes will need to read more metadata fields with other element
sizes. To resolve this once for all, extend the existing metadata
reading code to support reading all element sizes.

Signed-off-by: Kai Huang <[email protected]>
Reviewed-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/virt/vmx/tdx/tdx.c | 29 ++++++++++++++++-------------
arch/x86/virt/vmx/tdx/tdx.h | 3 ++-
2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 854312e97eff..4392e82a9bcb 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -270,23 +270,25 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
return 0;
}

-static int read_sys_metadata_field16(u64 field_id,
- int offset,
- void *stbuf)
+/*
+ * Read one global metadata field and store the data to a location of a
+ * given buffer specified by the offset and size (in bytes).
+ */
+static int stbuf_read_sysmd_field(u64 field_id, void *stbuf, int offset,
+ int bytes)
{
- u16 *st_member = stbuf + offset;
+ void *st_member = stbuf + offset;
u64 tmp;
int ret;

- if (WARN_ON_ONCE(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
- MD_FIELD_ID_ELE_SIZE_16BIT))
+ if (WARN_ON_ONCE(MD_FIELD_BYTES(field_id) != bytes))
return -EINVAL;

ret = read_sys_metadata_field(field_id, &tmp);
if (ret)
return ret;

- *st_member = tmp;
+ memcpy(st_member, &tmp, bytes);

return 0;
}
@@ -294,11 +296,13 @@ static int read_sys_metadata_field16(u64 field_id,
struct field_mapping {
u64 field_id;
int offset;
+ int size;
};

-#define TD_SYSINFO_MAP(_field_id, _struct, _member) \
- { .field_id = MD_FIELD_ID_##_field_id, \
- .offset = offsetof(_struct, _member) }
+#define TD_SYSINFO_MAP(_field_id, _struct, _member) \
+ { .field_id = MD_FIELD_ID_##_field_id, \
+ .offset = offsetof(_struct, _member), \
+ .size = sizeof(typeof(((_struct *)0)->_member)) }

#define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member) \
TD_SYSINFO_MAP(_field_id, struct tdx_tdmr_sysinfo, _member)
@@ -319,9 +323,8 @@ static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)

/* Populate 'tdmr_sysinfo' fields using the mapping structure above: */
for (i = 0; i < ARRAY_SIZE(fields); i++) {
- ret = read_sys_metadata_field16(fields[i].field_id,
- fields[i].offset,
- tdmr_sysinfo);
+ ret = stbuf_read_sysmd_field(fields[i].field_id, tdmr_sysinfo,
+ fields[i].offset, fields[i].size);
if (ret)
return ret;
}
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index b701f69485d3..812943516946 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -53,7 +53,8 @@
#define MD_FIELD_ID_ELE_SIZE_CODE(_field_id) \
(((_field_id) & GENMASK_ULL(33, 32)) >> 32)

-#define MD_FIELD_ID_ELE_SIZE_16BIT 1
+#define MD_FIELD_BYTES(_field_id) \
+ (1 << MD_FIELD_ID_ELE_SIZE_CODE(_field_id))

struct tdmr_reserved_area {
u64 offset;
--
2.43.2


2024-06-16 12:03:38

by Kai Huang

[permalink] [raw]
Subject: [PATCH 5/9] x86/virt/tdx: Move field mapping table of getting TDMR info to function local

For now the kernel only reads "TD Memory Region" (TDMR) related global
metadata fields to a 'struct tdx_tdmr_sysinfo' for initializing the TDX
module. The kernel populates the relevant metadata fields into the
structure using a "field mapping table" of metadata field IDs and the
structure members.

Currently the scope of this "field mapping table" is the entire C file.
Future changes will need to read more global metadata fields that will
be organized in other structures and use this kind of field mapping
tables for other structures too.

Move the field mapping table to the function local to limit its scope so
that the same name can also be used by other functions.

Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/virt/vmx/tdx/tdx.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index c68fbaf4aa15..fad42014ca37 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -322,17 +322,17 @@ static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
#define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member) \
TD_SYSINFO_MAP(_field_id, struct tdx_tdmr_sysinfo, _member)

-/* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
-static const struct field_mapping fields[] = {
- TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS, max_tdmrs),
- TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
- TD_SYSINFO_MAP_TDMR_INFO(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]),
- TD_SYSINFO_MAP_TDMR_INFO(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]),
- TD_SYSINFO_MAP_TDMR_INFO(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]),
-};
-
static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
{
+ /* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
+ static const struct field_mapping fields[] = {
+ TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS, max_tdmrs),
+ TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
+ TD_SYSINFO_MAP_TDMR_INFO(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]),
+ TD_SYSINFO_MAP_TDMR_INFO(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]),
+ TD_SYSINFO_MAP_TDMR_INFO(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]),
+ };
+
/* Populate 'tdmr_sysinfo' fields using the mapping structure above: */
return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), tdmr_sysinfo);
}
--
2.43.2


2024-06-16 12:03:59

by Kai Huang

[permalink] [raw]
Subject: [PATCH 6/9] x86/virt/tdx: Start to track all global metadata in one structure

The TDX module provides a set of "global metadata fields". They report
things like TDX module version, supported features, and fields related
to create/run TDX guests and so on.

Currently the kernel only reads "TD Memory Region" (TDMR) related fields
for module initialization. There are immediate needs which require the
TDX module initialization to read more global metadata including module
version, supported features and "Convertible Memory Regions" (CMRs).

Also, KVM will need to read more metadata fields to support baseline TDX
guests. In the longer term, other TDX features like TDX Connect (which
supports assigning trusted IO devices to TDX guest) may also require
other kernel components such as pci/vt-d to access global metadata.

To meet all those requirements, the idea is the TDX host core-kernel to
to provide a centralized, canonical, and read-only structure for the
global metadata that comes out from the TDX module for all kernel
components to use.

As the first step, introduce a new 'struct tdx_sysinfo' to track all
global metadata fields.

TDX categories global metadata fields into different "Class"es. E.g.,
the current TDMR related fields are under class "TDMR Info". Instead of
making 'struct tdx_sysinfo' a plain structure to contain all metadata
fields, organize them in smaller structures based on the "Class".

This allows those metadata fields to be used in finer granularity thus
makes the code more clear. E.g., the current construct_tdmr() can just
take the structure which contains "TDMR Info" metadata fields.

Start with moving 'struct tdx_tdmr_sysinfo' to 'struct tdx_sysinfo', and
rename 'struct tdx_tdmr_sysinfo' to 'struct tdx_sysinfo_tdmr_info' to
make it consistent with the "class name".

Add a new function get_tdx_sysinfo() as the place to read all metadata
fields, and call it at the beginning of init_tdx_module(). Move the
existing get_tdx_tdmr_sysinfo() to get_tdx_sysinfo().

Note there is a functional change: get_tdx_tdmr_sysinfo() is moved from
after build_tdx_memlist() to before it, but it is fine to do so.

Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/virt/vmx/tdx/tdx.c | 29 +++++++++++++++++------------
arch/x86/virt/vmx/tdx/tdx.h | 32 +++++++++++++++++++++++++-------
2 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index fad42014ca37..4683884efcc6 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -320,11 +320,11 @@ static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
}

#define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member) \
- TD_SYSINFO_MAP(_field_id, struct tdx_tdmr_sysinfo, _member)
+ TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_tdmr_info, _member)

-static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
+static int get_tdx_tdmr_sysinfo(struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
{
- /* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
+ /* Map TD_SYSINFO fields into 'struct tdx_sysinfo_tdmr_info': */
static const struct field_mapping fields[] = {
TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS, max_tdmrs),
TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
@@ -337,6 +337,11 @@ static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), tdmr_sysinfo);
}

+static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
+{
+ return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
+}
+
/* Calculate the actual TDMR size */
static int tdmr_size_single(u16 max_reserved_per_tdmr)
{
@@ -353,7 +358,7 @@ static int tdmr_size_single(u16 max_reserved_per_tdmr)
}

static int alloc_tdmr_list(struct tdmr_info_list *tdmr_list,
- struct tdx_tdmr_sysinfo *tdmr_sysinfo)
+ struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
{
size_t tdmr_sz, tdmr_array_sz;
void *tdmr_array;
@@ -936,7 +941,7 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
*/
static int construct_tdmrs(struct list_head *tmb_list,
struct tdmr_info_list *tdmr_list,
- struct tdx_tdmr_sysinfo *tdmr_sysinfo)
+ struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
{
int ret;

@@ -1109,9 +1114,13 @@ static int init_tdmrs(struct tdmr_info_list *tdmr_list)

static int init_tdx_module(void)
{
- struct tdx_tdmr_sysinfo tdmr_sysinfo;
+ struct tdx_sysinfo sysinfo;
int ret;

+ ret = get_tdx_sysinfo(&sysinfo);
+ if (ret)
+ return ret;
+
/*
* To keep things simple, assume that all TDX-protected memory
* will come from the page allocator. Make sure all pages in the
@@ -1128,17 +1137,13 @@ static int init_tdx_module(void)
if (ret)
goto out_put_tdxmem;

- ret = get_tdx_tdmr_sysinfo(&tdmr_sysinfo);
- if (ret)
- goto err_free_tdxmem;
-
/* Allocate enough space for constructing TDMRs */
- ret = alloc_tdmr_list(&tdx_tdmr_list, &tdmr_sysinfo);
+ ret = alloc_tdmr_list(&tdx_tdmr_list, &sysinfo.tdmr_info);
if (ret)
goto err_free_tdxmem;

/* Cover all TDX-usable memory regions in TDMRs */
- ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &tdmr_sysinfo);
+ ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr_info);
if (ret)
goto err_free_tdmrs;

diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 812943516946..6b61dc67b0af 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -100,13 +100,6 @@ struct tdx_memblock {
int nid;
};

-/* "TDMR info" part of "Global Scope Metadata" for constructing TDMRs */
-struct tdx_tdmr_sysinfo {
- u16 max_tdmrs;
- u16 max_reserved_per_tdmr;
- u16 pamt_entry_size[TDX_PS_NR];
-};
-
/* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
#define TDMR_NR_WARN 4

@@ -119,4 +112,29 @@ struct tdmr_info_list {
int max_tdmrs; /* How many 'tdmr_info's are allocated */
};

+/*
+ * Kernel-defined structures to contain "Global Scope Metadata".
+ *
+ * TDX global metadata fields are categorized by "Class". See the
+ * 'global_metadata.json' in the "TDX 1.5 ABI Definitions".
+ *
+ * 'struct tdx_sysinfo' is the main structure to contain all metadata
+ * used by the kernel. It contains sub-structures with each reflecting
+ * the "Class" in the 'global_metadata.json'.
+ *
+ * Note not all metadata fields in each class are defined, only those
+ * used by the kernel are.
+ */
+
+/* Class "TDMR Info" */
+struct tdx_sysinfo_tdmr_info {
+ u16 max_tdmrs;
+ u16 max_reserved_per_tdmr;
+ u16 pamt_entry_size[TDX_PS_NR];
+};
+
+struct tdx_sysinfo {
+ struct tdx_sysinfo_tdmr_info tdmr_info;
+};
+
#endif
--
2.43.2


2024-06-16 12:04:20

by Kai Huang

[permalink] [raw]
Subject: [PATCH 7/9] x86/virt/tdx: Print TDX module basic information

Currently the kernel doesn't print any information regarding the TDX
module itself, e.g. module version. Printing such information is not
mandatory for initializing the TDX module, but in practice such
information is useful, especially to the developers.

For instance, there are couple of use cases for dumping module basic
information:

1) When something goes wrong around using TDX, the information like TDX
module version, supported features etc could be helpful [1][2].

2) For Linux, when the user wants to update the TDX module, one needs to
replace the old module in a specific location in the EFI partition
with the new one so that after reboot the BIOS can load it. However,
after kernel boots, currently the user has no way to verify it is
indeed the new module that gets loaded and initialized (e.g., error
could happen when replacing the old module). With the module version
dumped the user can verify this easily.

So dump the basic TDX module information:

- TDX module type: Debug or Production.
- TDX_FEATURES0: Supported TDX features.
- TDX module version, and the build date.

And dump the information right after reading global metadata, so that
this information is printed no matter whether module initialization
fails or not.

The actual dmesg will look like:

virt/tdx: Production module.
virt/tdx: TDX_FEATURES0: 0xfbf
virt/tdx: Module version: 1.5.00.00.0481, build_date: 20230323

Link: https://lore.kernel.org/lkml/[email protected]/T/#m352829aedf6680d4628c7e40dc40b332eda93355 [1]
Link: https://lore.kernel.org/lkml/[email protected]/T/#m351ebcbc006d2e5bc3e7650206a087cb2708d451 [2]
Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/virt/vmx/tdx/tdx.c | 67 +++++++++++++++++++++++++++++++++++++
arch/x86/virt/vmx/tdx/tdx.h | 33 +++++++++++++++++-
2 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 4683884efcc6..ced40e3b516e 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -319,6 +319,61 @@ static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
return 0;
}

+#define TD_SYSINFO_MAP_MOD_INFO(_field_id, _member) \
+ TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_module_info, _member)
+
+static int get_tdx_module_info(struct tdx_sysinfo_module_info *modinfo)
+{
+ static const struct field_mapping fields[] = {
+ TD_SYSINFO_MAP_MOD_INFO(SYS_ATTRIBUTES, sys_attributes),
+ TD_SYSINFO_MAP_MOD_INFO(TDX_FEATURES0, tdx_features0),
+ };
+
+ return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modinfo);
+}
+
+#define TD_SYSINFO_MAP_MOD_VERSION(_field_id, _member) \
+ TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_module_version, _member)
+
+static int get_tdx_module_version(struct tdx_sysinfo_module_version *modver)
+{
+ static const struct field_mapping fields[] = {
+ TD_SYSINFO_MAP_MOD_VERSION(MAJOR_VERSION, major),
+ TD_SYSINFO_MAP_MOD_VERSION(MINOR_VERSION, minor),
+ TD_SYSINFO_MAP_MOD_VERSION(UPDATE_VERSION, update),
+ TD_SYSINFO_MAP_MOD_VERSION(INTERNAL_VERSION, internal),
+ TD_SYSINFO_MAP_MOD_VERSION(BUILD_NUM, build_num),
+ TD_SYSINFO_MAP_MOD_VERSION(BUILD_DATE, build_date),
+ };
+
+ return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modver);
+}
+
+static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
+{
+ struct tdx_sysinfo_module_version *modver = &sysinfo->module_version;
+ struct tdx_sysinfo_module_info *modinfo = &sysinfo->module_info;
+ bool debug = modinfo->sys_attributes & TDX_SYS_ATTR_DEBUG_MODULE;
+
+ pr_info("%s module.\n", debug ? "Debug" : "Production");
+
+ pr_info("TDX_FEATURES0: 0x%llx\n", modinfo->tdx_features0);
+
+ /*
+ * TDX module version encoding:
+ *
+ * <major>.<minor>.<update>.<internal>.<build_num>
+ *
+ * When printed as text, <major> and <minor> are 1-digit,
+ * <update> and <internal> are 2-digits and <build_num>
+ * is 4-digits.
+ */
+ pr_info("Module version: %u.%u.%02u.%02u.%04u, build_date: %u\n",
+ modver->major, modver->minor,
+ modver->update, modver->internal,
+ modver->build_num, modver->build_date);
+}
+
#define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member) \
TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_tdmr_info, _member)

@@ -339,6 +394,16 @@ static int get_tdx_tdmr_sysinfo(struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)

static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
{
+ int ret;
+
+ ret = get_tdx_module_info(&sysinfo->module_info);
+ if (ret)
+ return ret;
+
+ ret = get_tdx_module_version(&sysinfo->module_version);
+ if (ret)
+ return ret;
+
return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
}

@@ -1121,6 +1186,8 @@ static int init_tdx_module(void)
if (ret)
return ret;

+ print_basic_sysinfo(&sysinfo);
+
/*
* To keep things simple, assume that all TDX-protected memory
* will come from the page allocator. Make sure all pages in the
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 6b61dc67b0af..d80ec797fbf1 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -31,6 +31,15 @@
*
* See Table "Global Scope Metadata", TDX module 1.5 ABI spec.
*/
+#define MD_FIELD_ID_SYS_ATTRIBUTES 0x0A00000200000000ULL
+#define MD_FIELD_ID_TDX_FEATURES0 0x0A00000300000008ULL
+#define MD_FIELD_ID_BUILD_DATE 0x8800000200000001ULL
+#define MD_FIELD_ID_BUILD_NUM 0x8800000100000002ULL
+#define MD_FIELD_ID_MINOR_VERSION 0x0800000100000003ULL
+#define MD_FIELD_ID_MAJOR_VERSION 0x0800000100000004ULL
+#define MD_FIELD_ID_UPDATE_VERSION 0x0800000100000005ULL
+#define MD_FIELD_ID_INTERNAL_VERSION 0x0800000100000006ULL
+
#define MD_FIELD_ID_MAX_TDMRS 0x9100000100000008ULL
#define MD_FIELD_ID_MAX_RESERVED_PER_TDMR 0x9100000100000009ULL
#define MD_FIELD_ID_PAMT_4K_ENTRY_SIZE 0x9100000100000010ULL
@@ -124,8 +133,28 @@ struct tdmr_info_list {
*
* Note not all metadata fields in each class are defined, only those
* used by the kernel are.
+ *
+ * Also note the "bit definitions" are architectural.
*/

+/* Class "TDX Module Info" */
+struct tdx_sysinfo_module_info {
+ u32 sys_attributes;
+ u64 tdx_features0;
+};
+
+#define TDX_SYS_ATTR_DEBUG_MODULE 0x1
+
+/* Class "TDX Module Version" */
+struct tdx_sysinfo_module_version {
+ u16 major;
+ u16 minor;
+ u16 update;
+ u16 internal;
+ u16 build_num;
+ u32 build_date;
+};
+
/* Class "TDMR Info" */
struct tdx_sysinfo_tdmr_info {
u16 max_tdmrs;
@@ -134,7 +163,9 @@ struct tdx_sysinfo_tdmr_info {
};

struct tdx_sysinfo {
- struct tdx_sysinfo_tdmr_info tdmr_info;
+ struct tdx_sysinfo_module_info module_info;
+ struct tdx_sysinfo_module_version module_version;
+ struct tdx_sysinfo_tdmr_info tdmr_info;
};

#endif
--
2.43.2


2024-06-16 12:04:21

by Kai Huang

[permalink] [raw]
Subject: [PATCH 4/9] x86/virt/tdx: Abstract reading multiple global metadata fields as a helper

For now the kernel only reads "TD Memory Region" (TDMR) related global
metadata fields to a 'struct tdx_tdmr_sysinfo' for initializing the TDX
module. Future changes will need to read other metadata fields that
don't make sense to populate to the "struct tdx_tdmr_sysinfo".

Now the code in get_tdx_tdmr_sysinfo() to read multiple global metadata
fields is not bound to the 'struct tdx_tdmr_sysinfo', and can support
reading all metadata element sizes. Abstract this code as a helper for
future use.

Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/virt/vmx/tdx/tdx.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 4392e82a9bcb..c68fbaf4aa15 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -304,6 +304,21 @@ struct field_mapping {
.offset = offsetof(_struct, _member), \
.size = sizeof(typeof(((_struct *)0)->_member)) }

+static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
+ int nr_fields, void *stbuf)
+{
+ int i, ret;
+
+ for (i = 0; i < nr_fields; i++) {
+ ret = stbuf_read_sysmd_field(fields[i].field_id, stbuf,
+ fields[i].offset, fields[i].size);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
#define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member) \
TD_SYSINFO_MAP(_field_id, struct tdx_tdmr_sysinfo, _member)

@@ -318,18 +333,8 @@ static const struct field_mapping fields[] = {

static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
{
- int ret;
- int i;
-
/* Populate 'tdmr_sysinfo' fields using the mapping structure above: */
- for (i = 0; i < ARRAY_SIZE(fields); i++) {
- ret = stbuf_read_sysmd_field(fields[i].field_id, tdmr_sysinfo,
- fields[i].offset, fields[i].size);
- if (ret)
- return ret;
- }
-
- return 0;
+ return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), tdmr_sysinfo);
}

/* Calculate the actual TDMR size */
--
2.43.2


2024-06-16 12:04:43

by Kai Huang

[permalink] [raw]
Subject: [PATCH 8/9] x86/virt/tdx: Exclude memory region hole within CMR as TDMR's reserved area

A TDX module initialization failure was reported on a Emerald Rapids
platform:

virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
virt/tdx: module initialization failed (-28)

As a step of initializing the TDX module, the kernel tells the TDX
module all the "TDX-usable memory regions" via a set of TDX architecture
defined structure "TD Memory Region" (TDMR). Each TDMR must be in 1GB
aligned and in 1GB granularity, and all "non-TDX-usable memory holes" in
a given TDMR must be marked as a "reserved area". Each TDMR only
supports a maximum number of reserved areas reported by the TDX module.

As shown above, the root cause of this failure is when the kernel tries
to construct a TDMR to cover address range [0x0, 0x80000000), there
are too many memory holes within that range and the number of memory
holes exceeds the maximum number of reserved areas.

The E820 table of that platform (see [1] below) reflects this: the
number of memory holes among e820 "usable" entries exceeds 16, which is
the maximum number of reserved areas TDX module supports in practice.

=== Fix ===

There are two options to fix this: 1) put less memory holes as "reserved
area" when constructing a TDMR; 2) reduce the TDMR's size to cover less
memory regions, thus less memory holes.

Option 1) is possible, and in fact is easier and preferable:

TDX actually has a concept of "Convertible Memory Regions" (CMRs). TDX
reports a list of CMRs that meet TDX's security requirements on memory.
TDX requires all the "TDX-usable memory regions" that the kernel passes
to the module via TDMRs, a.k.a, all the "non-reserved regions in TDMRs",
must be convertible memory.

In other words, if a memory hole is indeed CMR, then it's not mandatory
for the kernel to add it to the reserved areas. The number of consumed
reserved areas can be reduced if the kernel doesn't add those memory
holes as reserved area. Note this doesn't have security impact because
the kernel is out of TDX's TCB anyway.

This is feasible because in practice the CMRs just reflect the nature of
whether the RAM can indeed be used by TDX, thus each CMR tends to be a
large range w/o being split into small areas, e.g., in the way the e820
table does to contain a lot "ACPI *" entries. [2] below shows the CMRs
reported on the problematic platform (using the off-tree TDX code).

So for this particular module initialization failure, the memory holes
that are within [0x0, 0x80000000) are mostly indeed CMR. By not adding
them to the reserved areas, the number of consumed reserved areas for
the TDMR [0x0, 0x80000000) can be dramatically reduced.

On the other hand, although option 2) is also theoretically feasible, it
requires more complicated logic to handle around splitting TDMR into
smaller ones. E.g., today one memory region must be fully in one TDMR,
while splitting TDMR will result in each TDMR only covering part of some
memory region. And this also increases the total number of TDMRs, which
also cannot exceed a maximum value that TDX module supports.

So, fix this issue by:

1) reading out the CMRs from the TDX module global metadata, and
2) passing the CMRs to the function construct_tdmrs(), and changing to
not add a memory hole to the reserved areas when it is indeed CMR.

Also dump the CMRs in dmesg. They are helpful when something goes wrong
around "constructing and passing the TDMRs to the TDX module to
configure it".

[1] BIOS-E820 table of the problematic platform:

BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000005d168fff] usable
BIOS-e820: [mem 0x000000005d169000-0x000000005d22afff] ACPI data
BIOS-e820: [mem 0x000000005d22b000-0x000000005d3cefff] usable
BIOS-e820: [mem 0x000000005d3cf000-0x000000005d469fff] reserved
BIOS-e820: [mem 0x000000005d46a000-0x000000005e5b2fff] usable
BIOS-e820: [mem 0x000000005e5b3000-0x000000005e5c2fff] reserved
BIOS-e820: [mem 0x000000005e5c3000-0x000000005e5d2fff] usable
BIOS-e820: [mem 0x000000005e5d3000-0x000000005e5e4fff] reserved
BIOS-e820: [mem 0x000000005e5e5000-0x000000005eb57fff] usable
BIOS-e820: [mem 0x000000005eb58000-0x0000000061357fff] ACPI NVS
BIOS-e820: [mem 0x0000000061358000-0x000000006172afff] usable
BIOS-e820: [mem 0x000000006172b000-0x0000000061794fff] ACPI data
BIOS-e820: [mem 0x0000000061795000-0x00000000617fefff] usable
BIOS-e820: [mem 0x00000000617ff000-0x0000000061912fff] ACPI data
BIOS-e820: [mem 0x0000000061913000-0x0000000061998fff] usable
BIOS-e820: [mem 0x0000000061999000-0x00000000619dffff] ACPI data
BIOS-e820: [mem 0x00000000619e0000-0x00000000619e1fff] usable
BIOS-e820: [mem 0x00000000619e2000-0x00000000619e9fff] reserved
BIOS-e820: [mem 0x00000000619ea000-0x0000000061a26fff] usable
BIOS-e820: [mem 0x0000000061a27000-0x0000000061baefff] ACPI data
BIOS-e820: [mem 0x0000000061baf000-0x00000000623c2fff] usable
BIOS-e820: [mem 0x00000000623c3000-0x0000000062471fff] reserved
BIOS-e820: [mem 0x0000000062472000-0x0000000062823fff] usable
BIOS-e820: [mem 0x0000000062824000-0x0000000063a24fff] reserved
BIOS-e820: [mem 0x0000000063a25000-0x0000000063d57fff] usable
BIOS-e820: [mem 0x0000000063d58000-0x0000000064157fff] reserved
BIOS-e820: [mem 0x0000000064158000-0x0000000064158fff] usable
BIOS-e820: [mem 0x0000000064159000-0x0000000064194fff] reserved
BIOS-e820: [mem 0x0000000064195000-0x000000006e9cefff] usable
BIOS-e820: [mem 0x000000006e9cf000-0x000000006eccefff] reserved
BIOS-e820: [mem 0x000000006eccf000-0x000000006f6fefff] ACPI NVS
BIOS-e820: [mem 0x000000006f6ff000-0x000000006f7fefff] ACPI data
BIOS-e820: [mem 0x000000006f7ff000-0x000000006f7fffff] usable
BIOS-e820: [mem 0x000000006f800000-0x000000008fffffff] reserved
......

[2] Convertible Memory Regions of the problematic platform:

virt/tdx: CMR: [0x100000, 0x6f800000)
virt/tdx: CMR: [0x100000000, 0x107a000000)
virt/tdx: CMR: [0x1080000000, 0x207c000000)
virt/tdx: CMR: [0x2080000000, 0x307c000000)
virt/tdx: CMR: [0x3080000000, 0x407c000000)

Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/virt/vmx/tdx/tdx.c | 149 ++++++++++++++++++++++++++++++++----
arch/x86/virt/vmx/tdx/tdx.h | 13 ++++
2 files changed, 146 insertions(+), 16 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index ced40e3b516e..88a0c8b788b7 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -293,6 +293,10 @@ static int stbuf_read_sysmd_field(u64 field_id, void *stbuf, int offset,
return 0;
}

+/* Wrapper to read one metadata field to u8/u16/u32/u64 */
+#define stbuf_read_sysmd_single(_field_id, _pdata) \
+ stbuf_read_sysmd_field(_field_id, _pdata, 0, sizeof(typeof(*(_pdata))))
+
struct field_mapping {
u64 field_id;
int offset;
@@ -349,6 +353,76 @@ static int get_tdx_module_version(struct tdx_sysinfo_module_version *modver)
return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modver);
}

+/* Update the @cmr_info->num_cmrs to trim tail empty CMRs */
+static void trim_empty_tail_cmrs(struct tdx_sysinfo_cmr_info *cmr_info)
+{
+ int i;
+
+ for (i = 0; i < cmr_info->num_cmrs; i++) {
+ u64 cmr_base = cmr_info->cmr_base[i];
+ u64 cmr_size = cmr_info->cmr_size[i];
+
+ if (!cmr_size) {
+ WARN_ON_ONCE(cmr_base);
+ break;
+ }
+
+ /* TDX architecture: CMR must be 4KB aligned */
+ WARN_ON_ONCE(!PAGE_ALIGNED(cmr_base) ||
+ !PAGE_ALIGNED(cmr_size));
+ }
+
+ cmr_info->num_cmrs = i;
+}
+
+#define TD_SYSINFO_MAP_CMR_INFO(_field_id, _member) \
+ TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_cmr_info, _member)
+
+static int get_tdx_cmr_info(struct tdx_sysinfo_cmr_info *cmr_info)
+{
+ int i, ret;
+
+ ret = stbuf_read_sysmd_single(MD_FIELD_ID_NUM_CMRS,
+ &cmr_info->num_cmrs);
+ if (ret)
+ return ret;
+
+ for (i = 0; i < cmr_info->num_cmrs; i++) {
+ const struct field_mapping fields[] = {
+ TD_SYSINFO_MAP_CMR_INFO(CMR_BASE0 + i, cmr_base[i]),
+ TD_SYSINFO_MAP_CMR_INFO(CMR_SIZE0 + i, cmr_size[i]),
+ };
+
+ ret = stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields),
+ cmr_info);
+ if (ret)
+ return ret;
+ }
+
+ /*
+ * The TDX module may just report the maximum number of CMRs that
+ * TDX architecturally supports as the actual number of CMRs,
+ * despite the latter is smaller. In this case all the tail
+ * CMRs will be empty. Trim them away.
+ */
+ trim_empty_tail_cmrs(cmr_info);
+
+ return 0;
+}
+
+static void print_cmr_info(struct tdx_sysinfo_cmr_info *cmr_info)
+{
+ int i;
+
+ for (i = 0; i < cmr_info->num_cmrs; i++) {
+ u64 cmr_base = cmr_info->cmr_base[i];
+ u64 cmr_size = cmr_info->cmr_size[i];
+
+ pr_info("CMR[%d]: [0x%llx, 0x%llx)\n", i, cmr_base,
+ cmr_base + cmr_size);
+ }
+}
+
static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
{
struct tdx_sysinfo_module_version *modver = &sysinfo->module_version;
@@ -372,6 +446,8 @@ static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
modver->major, modver->minor,
modver->update, modver->internal,
modver->build_num, modver->build_date);
+
+ print_cmr_info(&sysinfo->cmr_info);
}

#define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member) \
@@ -404,6 +480,10 @@ static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
if (ret)
return ret;

+ ret = get_tdx_cmr_info(&sysinfo->cmr_info);
+ if (ret)
+ return ret;
+
return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
}

@@ -827,6 +907,23 @@ static int tdmr_add_rsvd_area(struct tdmr_info *tdmr, int *p_idx, u64 addr,
return 0;
}

+/* Return whether a given region [start, end) is a sub-region of any CMR */
+static bool is_cmr_subregion(struct tdx_sysinfo_cmr_info *cmr_info, u64 start,
+ u64 end)
+{
+ int i;
+
+ for (i = 0; i < cmr_info->num_cmrs; i++) {
+ u64 cmr_base = cmr_info->cmr_base[i];
+ u64 cmr_size = cmr_info->cmr_size[i];
+
+ if (start >= cmr_base && end <= (cmr_base + cmr_size))
+ return true;
+ }
+
+ return false;
+}
+
/*
* Go through @tmb_list to find holes between memory areas. If any of
* those holes fall within @tdmr, set up a TDMR reserved area to cover
@@ -835,7 +932,8 @@ static int tdmr_add_rsvd_area(struct tdmr_info *tdmr, int *p_idx, u64 addr,
static int tdmr_populate_rsvd_holes(struct list_head *tmb_list,
struct tdmr_info *tdmr,
int *rsvd_idx,
- u16 max_reserved_per_tdmr)
+ u16 max_reserved_per_tdmr,
+ struct tdx_sysinfo_cmr_info *cmr_info)
{
struct tdx_memblock *tmb;
u64 prev_end;
@@ -864,10 +962,16 @@ static int tdmr_populate_rsvd_holes(struct list_head *tmb_list,
* Skip over memory areas that
* have already been dealt with.
*/
- if (start <= prev_end) {
- prev_end = end;
- continue;
- }
+ if (start <= prev_end)
+ goto next_tmb;
+
+ /*
+ * Found the hole [prev_end, start) before this region.
+ * Skip the hole if it is within any CMR to reduce the
+ * consumption of reserved areas.
+ */
+ if (is_cmr_subregion(cmr_info, prev_end, start))
+ goto next_tmb;

/* Add the hole before this region */
ret = tdmr_add_rsvd_area(tdmr, rsvd_idx, prev_end,
@@ -876,11 +980,16 @@ static int tdmr_populate_rsvd_holes(struct list_head *tmb_list,
if (ret)
return ret;

+next_tmb:
prev_end = end;
}

- /* Add the hole after the last region if it exists. */
- if (prev_end < tdmr_end(tdmr)) {
+ /*
+ * Add the hole after the last region if it exists, but skip
+ * if it is within any CMR.
+ */
+ if (prev_end < tdmr_end(tdmr) &&
+ !is_cmr_subregion(cmr_info, prev_end, tdmr_end(tdmr))) {
ret = tdmr_add_rsvd_area(tdmr, rsvd_idx, prev_end,
tdmr_end(tdmr) - prev_end,
max_reserved_per_tdmr);
@@ -956,12 +1065,13 @@ static int rsvd_area_cmp_func(const void *a, const void *b)
static int tdmr_populate_rsvd_areas(struct tdmr_info *tdmr,
struct list_head *tmb_list,
struct tdmr_info_list *tdmr_list,
- u16 max_reserved_per_tdmr)
+ u16 max_reserved_per_tdmr,
+ struct tdx_sysinfo_cmr_info *cmr_info)
{
int ret, rsvd_idx = 0;

ret = tdmr_populate_rsvd_holes(tmb_list, tdmr, &rsvd_idx,
- max_reserved_per_tdmr);
+ max_reserved_per_tdmr, cmr_info);
if (ret)
return ret;

@@ -979,11 +1089,13 @@ static int tdmr_populate_rsvd_areas(struct tdmr_info *tdmr,

/*
* Populate reserved areas for all TDMRs in @tdmr_list, including memory
- * holes (via @tmb_list) and PAMTs.
+ * holes (via @tmb_list) and PAMTs. Exclude the memory holes within any
+ * CMR to reduce number of consumed reserved areas.
*/
static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
struct list_head *tmb_list,
- u16 max_reserved_per_tdmr)
+ u16 max_reserved_per_tdmr,
+ struct tdx_sysinfo_cmr_info *cmr_info)
{
int i;

@@ -991,7 +1103,8 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
int ret;

ret = tdmr_populate_rsvd_areas(tdmr_entry(tdmr_list, i),
- tmb_list, tdmr_list, max_reserved_per_tdmr);
+ tmb_list, tdmr_list, max_reserved_per_tdmr,
+ cmr_info);
if (ret)
return ret;
}
@@ -1002,11 +1115,13 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
/*
* Construct a list of TDMRs on the preallocated space in @tdmr_list
* to cover all TDX memory regions in @tmb_list based on the TDX module
- * TDMR global information in @tdmr_sysinfo.
+ * TDMR global information in @tdmr_sysinfo and CMR information in
+ * @cmr_info.
*/
static int construct_tdmrs(struct list_head *tmb_list,
struct tdmr_info_list *tdmr_list,
- struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
+ struct tdx_sysinfo_tdmr_info *tdmr_sysinfo,
+ struct tdx_sysinfo_cmr_info *cmr_info)
{
int ret;

@@ -1020,7 +1135,8 @@ static int construct_tdmrs(struct list_head *tmb_list,
return ret;

ret = tdmrs_populate_rsvd_areas_all(tdmr_list, tmb_list,
- tdmr_sysinfo->max_reserved_per_tdmr);
+ tdmr_sysinfo->max_reserved_per_tdmr,
+ cmr_info);
if (ret)
tdmrs_free_pamt_all(tdmr_list);

@@ -1210,7 +1326,8 @@ static int init_tdx_module(void)
goto err_free_tdxmem;

/* Cover all TDX-usable memory regions in TDMRs */
- ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr_info);
+ ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr_info,
+ &sysinfo.cmr_info);
if (ret)
goto err_free_tdmrs;

diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index d80ec797fbf1..be93b6f31e5b 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -40,6 +40,10 @@
#define MD_FIELD_ID_UPDATE_VERSION 0x0800000100000005ULL
#define MD_FIELD_ID_INTERNAL_VERSION 0x0800000100000006ULL

+#define MD_FIELD_ID_NUM_CMRS 0x9000000100000000ULL
+#define MD_FIELD_ID_CMR_BASE0 0x9000000300000080ULL
+#define MD_FIELD_ID_CMR_SIZE0 0x9000000300000100ULL
+
#define MD_FIELD_ID_MAX_TDMRS 0x9100000100000008ULL
#define MD_FIELD_ID_MAX_RESERVED_PER_TDMR 0x9100000100000009ULL
#define MD_FIELD_ID_PAMT_4K_ENTRY_SIZE 0x9100000100000010ULL
@@ -155,6 +159,14 @@ struct tdx_sysinfo_module_version {
u32 build_date;
};

+/* Class "CMR Info" */
+#define TDX_MAX_CMRS 32
+struct tdx_sysinfo_cmr_info {
+ u16 num_cmrs;
+ u64 cmr_base[TDX_MAX_CMRS];
+ u64 cmr_size[TDX_MAX_CMRS];
+};
+
/* Class "TDMR Info" */
struct tdx_sysinfo_tdmr_info {
u16 max_tdmrs;
@@ -165,6 +177,7 @@ struct tdx_sysinfo_tdmr_info {
struct tdx_sysinfo {
struct tdx_sysinfo_module_info module_info;
struct tdx_sysinfo_module_version module_version;
+ struct tdx_sysinfo_cmr_info cmr_info;
struct tdx_sysinfo_tdmr_info tdmr_info;
};

--
2.43.2


2024-06-16 12:04:56

by Kai Huang

[permalink] [raw]
Subject: [PATCH 9/9] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature

Old TDX modules can clobber RBP in the TDH.VP.ENTER SEAMCALL. However
RBP is used as frame pointer in the x86_64 calling convention, and
clobbering RBP could result in bad things like being unable to unwind
the stack if any non-maskable exceptions (NMI, #MC etc) happens in that
gap.

A new "NO_RBP_MOD" feature was introduced to more recent TDX modules to
not clobber RBP. This feature is reported in the TDX_FEATURES0 global
metadata field via bit 18.

Don't initialize the TDX module if this feature is not supported [1].

Link: https://lore.kernel.org/all/[email protected]/T/#mef98469c51e2382ead2c537ea189752360bd2bef [1]
Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/virt/vmx/tdx/tdx.c | 17 +++++++++++++++++
arch/x86/virt/vmx/tdx/tdx.h | 1 +
2 files changed, 18 insertions(+)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 88a0c8b788b7..c4ff68b565e8 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -487,6 +487,18 @@ static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
}

+static int check_module_compatibility(struct tdx_sysinfo *sysinfo)
+{
+ u64 tdx_features0 = sysinfo->module_info.tdx_features0;
+
+ if (!(tdx_features0 & TDX_FEATURES0_NO_RBP_MOD)) {
+ pr_err("NO_RBP_MOD feature is not supported\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
/* Calculate the actual TDMR size */
static int tdmr_size_single(u16 max_reserved_per_tdmr)
{
@@ -1304,6 +1316,11 @@ static int init_tdx_module(void)

print_basic_sysinfo(&sysinfo);

+ /* Check whether the kernel can support this module */
+ ret = check_module_compatibility(&sysinfo);
+ if (ret)
+ return ret;
+
/*
* To keep things simple, assume that all TDX-protected memory
* will come from the page allocator. Make sure all pages in the
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index be93b6f31e5b..295c3b6d9505 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -148,6 +148,7 @@ struct tdx_sysinfo_module_info {
};

#define TDX_SYS_ATTR_DEBUG_MODULE 0x1
+#define TDX_FEATURES0_NO_RBP_MOD _BITULL(18)

/* Class "TDX Module Version" */
struct tdx_sysinfo_module_version {
--
2.43.2