2023-07-12 09:02:01

by Kai Huang

[permalink] [raw]
Subject: [PATCH 00/10] Unify TDCALL/SEAMCALL and TDVMCALL assembly

Hi Peter, Kirill, all,

This series unifies the assembly code for TDCALL/SEAMCALL and TDVMCALL.
Now all of them use one singe TDX_MODULE_CALL asm macro. I basically
followed Peter's code here:

https://lore.kernel.org/linux-mm/[email protected]/

With some differences that I found during my code writing and testing.

With this series, I have verified the TDX guest can boot successfully
and the TDX module can also be initialized successfully.

The last two patches are SEAMCALL patches that are needed for TDX host
patchset. They are not mandatory to be here though, i.e., can be in the
TDX host support series. I put them here so we can have a complete view
how TDCALL/SEAMCALL and TDVMCALL are implemented.

Could you help to review? Thanks in advance.

Also cc Sean/Paolo/Isaku and KVM list for TDH.VP.ENTER part.

Kai Huang (10):
x86/tdx: Zero out the missing RSI in TDX_HYPERCALL macro
x86/tdx: Use cmovc to save a label in TDX_MODULE_CALL asm
x86/tdx: Move FRAME_BEGIN/END to TDX_MODULE_CALL asm macro
x86/tdx: Make macros of TDCALLs consistent with the spec
x86/tdx: Rename __tdx_module_call() to __tdcall()
x86/tdx: Pass TDCALL/SEAMCALL input/output registers via a structure
x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs
x86/tdx: Unify TDX_HYPERCALL and TDX_MODULE_CALL assembly
x86/virt/tdx: Wire up basic SEAMCALL functions
x86/virt/tdx: Allow SEAMCALL to handle #UD and #GP

arch/x86/Kconfig | 12 ++
arch/x86/Makefile | 2 +
arch/x86/boot/compressed/tdx.c | 26 +++-
arch/x86/coco/tdx/tdcall.S | 238 +++++-------------------------
arch/x86/coco/tdx/tdx.c | 124 +++++++++-------
arch/x86/include/asm/shared/tdx.h | 48 ++++--
arch/x86/include/asm/tdx.h | 31 ++--
arch/x86/kernel/asm-offsets.c | 33 ++---
arch/x86/virt/Makefile | 2 +
arch/x86/virt/vmx/Makefile | 2 +
arch/x86/virt/vmx/tdx/Makefile | 2 +
arch/x86/virt/vmx/tdx/seamcall.S | 54 +++++++
arch/x86/virt/vmx/tdx/tdxcall.S | 206 ++++++++++++++++++++------
13 files changed, 431 insertions(+), 349 deletions(-)
create mode 100644 arch/x86/virt/Makefile
create mode 100644 arch/x86/virt/vmx/Makefile
create mode 100644 arch/x86/virt/vmx/tdx/Makefile
create mode 100644 arch/x86/virt/vmx/tdx/seamcall.S


base-commit: 94142c9d1bdf1c18027a42758ceb6bdd59a92012
--
2.41.0



2023-07-12 09:02:39

by Kai Huang

[permalink] [raw]
Subject: [PATCH 09/10] x86/virt/tdx: Wire up basic SEAMCALL functions

Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
host and certain physical attacks. A CPU-attested software module
called 'the TDX module' runs inside a new isolated memory range as a
trusted hypervisor to manage and run protected VMs.

TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This
mode runs only the TDX module itself or other code to load the TDX
module.

The host kernel communicates with SEAM software via a new SEAMCALL
instruction. This is conceptually similar to a guest->host hypercall,
except it is made from the host to SEAM software instead. The TDX
module establishes a new SEAMCALL ABI which allows the host to
initialize the module and to manage VMs.

The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
TDCALL infrastructure. Wire up basic functions to make SEAMCALLs for
the basic TDX support: __seamcall(), __seamcall_ret() and
__seamcall_saved_ret() which is for TDH.VP.ENTER leaf function.

To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
TDX host kernel support. Add a new Kconfig option CONFIG_INTEL_TDX_HOST
to opt-in TDX host kernel support (to distinguish with TDX guest kernel
support). So far only KVM uses TDX. Make the new config option depend
on KVM_INTEL.

Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/Kconfig | 12 +++++++
arch/x86/Makefile | 2 ++
arch/x86/include/asm/tdx.h | 7 +++++
arch/x86/virt/Makefile | 2 ++
arch/x86/virt/vmx/Makefile | 2 ++
arch/x86/virt/vmx/tdx/Makefile | 2 ++
arch/x86/virt/vmx/tdx/seamcall.S | 54 ++++++++++++++++++++++++++++++++
7 files changed, 81 insertions(+)
create mode 100644 arch/x86/virt/Makefile
create mode 100644 arch/x86/virt/vmx/Makefile
create mode 100644 arch/x86/virt/vmx/tdx/Makefile
create mode 100644 arch/x86/virt/vmx/tdx/seamcall.S

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 53bab123a8ee..191587f75810 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1952,6 +1952,18 @@ config X86_SGX

If unsure, say N.

+config INTEL_TDX_HOST
+ bool "Intel Trust Domain Extensions (TDX) host support"
+ depends on CPU_SUP_INTEL
+ depends on X86_64
+ depends on KVM_INTEL
+ help
+ Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
+ host and certain physical attacks. This option enables necessary TDX
+ support in the host kernel to run confidential VMs.
+
+ If unsure, say N.
+
config EFI
bool "EFI runtime service support"
depends on ACPI
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index b39975977c03..ec0e71d8fa30 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -252,6 +252,8 @@ archheaders:

libs-y += arch/x86/lib/

+core-y += arch/x86/virt/
+
# drivers-y are linked after core-y
drivers-$(CONFIG_MATH_EMULATION) += arch/x86/math-emu/
drivers-$(CONFIG_PCI) += arch/x86/pci/
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 9b0ad0176e58..a82e5249d079 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -74,5 +74,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
return -ENODEV;
}
#endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
+
+#ifdef CONFIG_INTEL_TDX_HOST
+u64 __seamcall(u64 fn, struct tdx_module_args *args);
+u64 __seamcall_ret(u64 fn, struct tdx_module_args *args);
+u64 __seamcall_saved_ret(u64 fn, struct tdx_module_args *args);
+#endif /* CONFIG_INTEL_TDX_HOST */
+
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/virt/Makefile b/arch/x86/virt/Makefile
new file mode 100644
index 000000000000..1e36502cd738
--- /dev/null
+++ b/arch/x86/virt/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-y += vmx/
diff --git a/arch/x86/virt/vmx/Makefile b/arch/x86/virt/vmx/Makefile
new file mode 100644
index 000000000000..feebda21d793
--- /dev/null
+++ b/arch/x86/virt/vmx/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_INTEL_TDX_HOST) += tdx/
diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
new file mode 100644
index 000000000000..46ef8f73aebb
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-y += seamcall.o
diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamcall.S
new file mode 100644
index 000000000000..650a40843afe
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/seamcall.S
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/linkage.h>
+#include <asm/frame.h>
+
+#include "tdxcall.S"
+
+/*
+ * __seamcall() - Host-side interface functions to SEAM software
+ * (the P-SEAMLDR or the TDX module).
+ *
+ * __seamcall() function ABI:
+ *
+ * @fn (RDI) - SEAMCALL Leaf number, moved to RAX
+ * @args (RSI) - struct tdx_module_args for input
+ *
+ * Return (via RAX) TDX_SEAMCALL_VMFAILINVALID if the SEAMCALL itself
+ * fails, or the completion status of the SEAMCALL leaf function.
+ */
+SYM_FUNC_START(__seamcall)
+ TDX_MODULE_CALL host=1 ret=0 saved=0
+SYM_FUNC_END(__seamcall)
+
+/*
+ * __seamcall_ret() - Host-side interface functions to SEAM software
+ * (the P-SEAMLDR or the TDX module).
+ *
+ * __seamcall_ret() function ABI:
+ *
+ * @fn (RDI) - SEAMCALL Leaf number, moved to RAX
+ * @args (RSI) - struct tdx_module_args for input and output
+ *
+ * Return (via RAX) TDX_SEAMCALL_VMFAILINVALID if the SEAMCALL itself
+ * fails, or the completion status of the SEAMCALL leaf function.
+ */
+SYM_FUNC_START(__seamcall_ret)
+ TDX_MODULE_CALL host=1 ret=1 saved=0
+SYM_FUNC_END(__seamcall_ret)
+
+/*
+ * __seamcall_saved_ret() - Host-side interface functions to SEAM software
+ * (the P-SEAMLDR or the TDX module) with extra
+ * "callee-saved" registers as input/output.
+ *
+ * __seamcall_saved_ret() function ABI:
+ *
+ * @fn (RDI) - SEAMCALL Leaf number, moved to RAX
+ * @args (RSI) - struct tdx_module_args for input and output
+ *
+ * Return (via RAX) TDX_SEAMCALL_VMFAILINVALID if the SEAMCALL itself
+ * fails, or the completion status of the SEAMCALL leaf function.
+ */
+SYM_FUNC_START(__seamcall_saved_ret)
+ TDX_MODULE_CALL host=1 ret=1 saved=0
+SYM_FUNC_END(__seamcall_saved_ret)
--
2.41.0


2023-07-12 09:02:47

by Kai Huang

[permalink] [raw]
Subject: [PATCH 05/10] x86/tdx: Rename __tdx_module_call() to __tdcall()

__tdx_module_call() is only used by the TDX guest to issue TDCALL to the
TDX module. Rename it to __tdcall() to match its behaviour, e.g., it
cannot be used to make host-side SEAMCALL.

Also rename tdx_module_call() which is a wrapper of __tdx_module_call()
to tdcall().

No functional change intended.

Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/coco/tdx/tdcall.S | 10 +++++-----
arch/x86/coco/tdx/tdx.c | 20 ++++++++++----------
arch/x86/include/asm/tdx.h | 4 ++--
3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S
index e5d4b7d8ecd4..6aebac08f2bf 100644
--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -40,8 +40,8 @@
.section .noinstr.text, "ax"

/*
- * __tdx_module_call() - Used by TDX guests to request services from
- * the TDX module (does not include VMM services) using TDCALL instruction.
+ * __tdcall() - Used by TDX guests to request services from the TDX
+ * module (does not include VMM services) using TDCALL instruction.
*
* Transforms function call register arguments into the TDCALL register ABI.
* After TDCALL operation, TDX module output is saved in @out (if it is
@@ -62,7 +62,7 @@
*
*-------------------------------------------------------------------------
*
- * __tdx_module_call() function ABI:
+ * __tdcall() function ABI:
*
* @fn (RDI) - TDCALL Leaf ID, moved to RAX
* @rcx (RSI) - Input parameter 1, moved to RCX
@@ -77,9 +77,9 @@
*
* Return status of TDCALL via RAX.
*/
-SYM_FUNC_START(__tdx_module_call)
+SYM_FUNC_START(__tdcall)
TDX_MODULE_CALL host=0
-SYM_FUNC_END(__tdx_module_call)
+SYM_FUNC_END(__tdcall)

/*
* TDX_HYPERCALL - Make hypercalls to a TDX VMM using TDVMCALL leaf of TDCALL
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index de021df92009..268f812ff595 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -98,10 +98,10 @@ EXPORT_SYMBOL_GPL(tdx_kvm_hypercall);
* should only be used for calls that have no legitimate reason to fail
* or where the kernel can not survive the call failing.
*/
-static inline void tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
- struct tdx_module_output *out)
+static inline void tdcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+ struct tdx_module_output *out)
{
- if (__tdx_module_call(fn, rcx, rdx, r8, r9, out))
+ if (__tdcall(fn, rcx, rdx, r8, r9, out))
panic("TDCALL %lld failed (Buggy TDX module!)\n", fn);
}

@@ -123,9 +123,9 @@ int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport)
{
u64 ret;

- ret = __tdx_module_call(TDG_MR_REPORT, virt_to_phys(tdreport),
- virt_to_phys(reportdata), TDREPORT_SUBTYPE_0,
- 0, NULL);
+ ret = __tdcall(TDG_MR_REPORT, virt_to_phys(tdreport),
+ virt_to_phys(reportdata), TDREPORT_SUBTYPE_0,
+ 0, NULL);
if (ret) {
if (TDCALL_RETURN_CODE(ret) == TDCALL_INVALID_OPERAND)
return -EINVAL;
@@ -184,7 +184,7 @@ static void tdx_parse_tdinfo(u64 *cc_mask)
* Guest-Host-Communication Interface (GHCI), section 2.4.2 TDCALL
* [TDG.VP.INFO].
*/
- tdx_module_call(TDG_VP_INFO, 0, 0, 0, 0, &out);
+ tdcall(TDG_VP_INFO, 0, 0, 0, 0, &out);

/*
* The highest bit of a guest physical address is the "sharing" bit.
@@ -626,7 +626,7 @@ void tdx_get_ve_info(struct ve_info *ve)
* Note, the TDX module treats virtual NMIs as inhibited if the #VE
* valid flag is set. It means that NMI=>#VE will not result in a #DF.
*/
- tdx_module_call(TDG_VP_VEINFO_GET, 0, 0, 0, 0, &out);
+ tdcall(TDG_VP_VEINFO_GET, 0, 0, 0, 0, &out);

/* Transfer the output parameters */
ve->exit_reason = out.rcx;
@@ -768,7 +768,7 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
}

tdcall_rcx = *start | page_size;
- if (__tdx_module_call(TDG_MEM_PAGE_ACCEPT, tdcall_rcx, 0, 0, 0, NULL))
+ if (__tdcall(TDG_MEM_PAGE_ACCEPT, tdcall_rcx, 0, 0, 0, NULL))
return false;

*start += accept_size;
@@ -870,7 +870,7 @@ void __init tdx_early_init(void)
cc_set_mask(cc_mask);

/* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
- tdx_module_call(TDG_VM_WR, 0, TDCS_NOTIFY_ENABLES, 0, -1ULL, NULL);
+ tdcall(TDG_VM_WR, 0, TDCS_NOTIFY_ENABLES, 0, -1ULL, NULL);

/*
* All bits above GPA width are reserved and kernel treats shared bit
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 28d889c9aa16..f19e329c4044 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -56,8 +56,8 @@ struct ve_info {
void __init tdx_early_init(void);

/* Used to communicate with the TDX module */
-u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
- struct tdx_module_output *out);
+u64 __tdcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+ struct tdx_module_output *out);

void tdx_get_ve_info(struct ve_info *ve);

--
2.41.0


2023-07-12 09:04:38

by Kai Huang

[permalink] [raw]
Subject: [PATCH 10/10] x86/virt/tdx: Allow SEAMCALL to handle #UD and #GP

On the platform with the "partial write machine check" erratum, a kernel
partial write to TDX private memory may cause unexpected machine check.
It would be nice if the #MC handler could print additional information
to show the #MC was TDX private memory error due to possible kernel bug.

To do that, the machine check handler needs to use SEAMCALL to query
page type of the error memory from the TDX module, because there's no
existing infrastructure to track TDX private pages.

SEAMCALL instruction causes #UD if CPU isn't in VMX operation. In #MC
handler, it is legal that CPU isn't in VMX operation when making this
SEAMCALL. Extend the TDX_MODULE_CALL macro to handle #UD so the
SEAMCALL can return error code instead of Oops in the #MC handler.
Opportunistically handles #GP too since they share the same code.

A bonus is when kernel mistakenly calls SEAMCALL when CPU isn't in VMX
operation, or when TDX isn't enabled by the BIOS, or when the BIOS is
buggy, the kernel can get a nicer error message rather than a less
understandable Oops.

This is basically based on Peter's code.

Cc: Kirill A. Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/include/asm/tdx.h | 5 +++++
arch/x86/virt/vmx/tdx/tdxcall.S | 20 ++++++++++++++++++++
2 files changed, 25 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index a82e5249d079..feb85316346e 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -8,6 +8,8 @@
#include <asm/ptrace.h>
#include <asm/shared/tdx.h>

+#include <asm/trapnr.h>
+
/*
* SW-defined error codes.
*
@@ -18,6 +20,9 @@
#define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
#define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000))

+#define TDX_SEAMCALL_GP (TDX_SW_ERROR | X86_TRAP_GP)
+#define TDX_SEAMCALL_UD (TDX_SW_ERROR | X86_TRAP_UD)
+
#ifndef __ASSEMBLY__

/*
diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
index e4e90ebf5dad..04b0c466f38c 100644
--- a/arch/x86/virt/vmx/tdx/tdxcall.S
+++ b/arch/x86/virt/vmx/tdx/tdxcall.S
@@ -2,6 +2,7 @@
#include <asm/asm-offsets.h>
#include <asm/frame.h>
#include <asm/tdx.h>
+#include <asm/asm.h>

/*
* TDCALL and SEAMCALL are supported in Binutils >= 2.36.
@@ -85,6 +86,7 @@
.endif /* \saved */

.if \host
+1:
seamcall
/*
* SEAMCALL instruction is essentially a VMExit from VMX root
@@ -99,6 +101,7 @@
*/
mov $TDX_SEAMCALL_VMFAILINVALID, %rdi
cmovc %rdi, %rax
+2:
.else
tdcall
.endif
@@ -185,4 +188,21 @@

FRAME_END
RET
+
+ .if \host
+3:
+ /*
+ * SEAMCALL caused #GP or #UD. By reaching here %eax contains
+ * the trap number. Convert the trap number to the TDX error
+ * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
+ *
+ * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
+ * only accepts 32-bit immediate at most.
+ */
+ movq $TDX_SW_ERROR, %r12
+ orq %r12, %rax
+ jmp 2b
+
+ _ASM_EXTABLE_FAULT(1b, 3b)
+ .endif /* \host */
.endm
--
2.41.0


2023-07-12 09:12:36

by Kai Huang

[permalink] [raw]
Subject: [PATCH 01/10] x86/tdx: Zero out the missing RSI in TDX_HYPERCALL macro

In the TDX_HYPERCALL asm, after the TDCALL instruction returns from the
untrusted VMM, the registers that the TDX guest shares to the VMM need
to be cleared to avoid speculative execution of VMM-provided values.

RSI is specified in the bitmap of those registers, but it is missing
when zeroing out those registers in the current TDX_HYPERCALL.

It was there when it was originally added in commit 752d13305c78
("x86/tdx: Expand __tdx_hypercall() to handle more arguments"), but was
later removed in commit 1e70c680375a ("x86/tdx: Do not corrupt
frame-pointer in __tdx_hypercall()"), which was correct because %rsi is
later restored in the "pop %rsi". However a later commit 7a3a401874be
("x86/tdx: Drop flags from __tdx_hypercall()") removed that "pop %rsi"
but forgot to add the "xor %rsi, %rsi" back.

Fix by adding it back.

Fixes: 7a3a401874be ("x86/tdx: Drop flags from __tdx_hypercall()")
Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/coco/tdx/tdcall.S | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S
index b193c0a1d8db..2eca5f43734f 100644
--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -195,6 +195,7 @@ SYM_FUNC_END(__tdx_module_call)
xor %r10d, %r10d
xor %r11d, %r11d
xor %rdi, %rdi
+ xor %rsi, %rsi
xor %rdx, %rdx

/* Restore callee-saved GPRs as mandated by the x86_64 ABI */
--
2.41.0


2023-07-12 09:16:35

by Kai Huang

[permalink] [raw]
Subject: [PATCH 06/10] x86/tdx: Pass TDCALL/SEAMCALL input/output registers via a structure

Currently, the TDX_MODULE_CALL asm macro, which handles both TDCALL and
SEAMCALL, takes one parameter for each input register and an optional
'struct tdx_module_output' (a collection of output registers) as output.
This is different from the TDX_HYPERCALL macro which uses a single
'struct tdx_hypercall_args' to carry all input/output registers.

The newer TDX versions introduce more TDCALLs/SEAMCALLs which use more
input/output registers. Also, the TDH.VP.ENTER (which isn't covered
by the current TDX_MODULE_CALL macro) basically can use all registers
that the TDX_HYPERCALL does. The current TDX_MODULE_CALL macro isn't
extendible to cover those cases.

Similar to the TDX_HYPERCALL macro, simplify the TDX_MODULE_CALL macro
to use a single structure 'struct tdx_module_args' to carry all the
input/output registers.

Currently, the TDX_MODULE_CALL macro depends on the caller to pass a
non-NULL 'struct tdx_module_output' to get additional output registers.
Similar to the TDX_HYPERCALL macro, change the TDX_MODULE_CALL macro to
take a new 'ret' parameter to indicate whether to save the additional
output registers to the 'struct tdx_module_args'. Also introduce a new
__tdcall_ret() for that purpose, similar to the __tdx_hypercall_ret().

Note the tdcall(), which is a wrapper of __tdcall(), is called by three
callers: tdx_parse_tdinfo(), tdx_get_ve_info() and tdx_early_init().
The former two need the additional output but the last one doesn't. For
simplicity, make tdcall() always call __tdcall_ret() to avoid another
"_ret()" wrapper. The last caller tdx_early_init() isn't performance
critical anyway.

Cc: Kirill A. Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/coco/tdx/tdcall.S | 48 ++++++++-----------
arch/x86/coco/tdx/tdx.c | 50 +++++++++++---------
arch/x86/include/asm/tdx.h | 10 ++--
arch/x86/kernel/asm-offsets.c | 12 ++---
arch/x86/virt/vmx/tdx/tdxcall.S | 82 +++++++++++++--------------------
5 files changed, 89 insertions(+), 113 deletions(-)

diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S
index 6aebac08f2bf..46847e85c372 100644
--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -43,44 +43,32 @@
* __tdcall() - Used by TDX guests to request services from the TDX
* module (does not include VMM services) using TDCALL instruction.
*
- * Transforms function call register arguments into the TDCALL register ABI.
- * After TDCALL operation, TDX module output is saved in @out (if it is
- * provided by the user).
- *
- *-------------------------------------------------------------------------
- * TDCALL ABI:
- *-------------------------------------------------------------------------
- * Input Registers:
- *
- * RAX - TDCALL Leaf number.
- * RCX,RDX,R8-R9 - TDCALL Leaf specific input registers.
- *
- * Output Registers:
- *
- * RAX - TDCALL instruction error code.
- * RCX,RDX,R8-R11 - TDCALL Leaf specific output registers.
- *
- *-------------------------------------------------------------------------
- *
* __tdcall() function ABI:
*
- * @fn (RDI) - TDCALL Leaf ID, moved to RAX
- * @rcx (RSI) - Input parameter 1, moved to RCX
- * @rdx (RDX) - Input parameter 2, moved to RDX
- * @r8 (RCX) - Input parameter 3, moved to R8
- * @r9 (R8) - Input parameter 4, moved to R9
- *
- * @out (R9) - struct tdx_module_output pointer
- * stored temporarily in R12 (not
- * shared with the TDX module). It
- * can be NULL.
+ * @fn (RDI) - TDCALL Leaf ID, moved to RAX
+ * @args (RSI) - struct tdx_module_args for input
*
* Return status of TDCALL via RAX.
*/
SYM_FUNC_START(__tdcall)
- TDX_MODULE_CALL host=0
+ TDX_MODULE_CALL host=0 ret=0
SYM_FUNC_END(__tdcall)

+/*
+ * __tdcall_ret() - Used by TDX guests to request services from the TDX
+ * module (does not include VMM services) using TDCALL instruction.
+ *
+ * __tdcall_ret() function ABI:
+ *
+ * @fn (RDI) - TDCALL Leaf ID, moved to RAX
+ * @args (RSI) - struct tdx_module_args for input and output
+ *
+ * Return status of TDCALL via RAX.
+ */
+SYM_FUNC_START(__tdcall_ret)
+ TDX_MODULE_CALL host=0 ret=1
+SYM_FUNC_END(__tdcall_ret)
+
/*
* TDX_HYPERCALL - Make hypercalls to a TDX VMM using TDVMCALL leaf of TDCALL
* instruction
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 268f812ff595..0f16ba52ae62 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -98,10 +98,9 @@ EXPORT_SYMBOL_GPL(tdx_kvm_hypercall);
* should only be used for calls that have no legitimate reason to fail
* or where the kernel can not survive the call failing.
*/
-static inline void tdcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
- struct tdx_module_output *out)
+static inline void tdcall(u64 fn, struct tdx_module_args *args)
{
- if (__tdcall(fn, rcx, rdx, r8, r9, out))
+ if (__tdcall_ret(fn, args))
panic("TDCALL %lld failed (Buggy TDX module!)\n", fn);
}

@@ -121,11 +120,14 @@ static inline void tdcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
*/
int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport)
{
+ struct tdx_module_args args = {
+ .rcx = virt_to_phys(tdreport),
+ .rdx = virt_to_phys(reportdata),
+ .r8 = TDREPORT_SUBTYPE_0,
+ };
u64 ret;

- ret = __tdcall(TDG_MR_REPORT, virt_to_phys(tdreport),
- virt_to_phys(reportdata), TDREPORT_SUBTYPE_0,
- 0, NULL);
+ ret = __tdcall(TDG_MR_REPORT, &args);
if (ret) {
if (TDCALL_RETURN_CODE(ret) == TDCALL_INVALID_OPERAND)
return -EINVAL;
@@ -173,7 +175,7 @@ static void __noreturn tdx_panic(const char *msg)

static void tdx_parse_tdinfo(u64 *cc_mask)
{
- struct tdx_module_output out;
+ struct tdx_module_args args = {};
unsigned int gpa_width;
u64 td_attr;

@@ -184,7 +186,7 @@ static void tdx_parse_tdinfo(u64 *cc_mask)
* Guest-Host-Communication Interface (GHCI), section 2.4.2 TDCALL
* [TDG.VP.INFO].
*/
- tdcall(TDG_VP_INFO, 0, 0, 0, 0, &out);
+ tdcall(TDG_VP_INFO, &args);

/*
* The highest bit of a guest physical address is the "sharing" bit.
@@ -193,7 +195,7 @@ static void tdx_parse_tdinfo(u64 *cc_mask)
* The GPA width that comes out of this call is critical. TDX guests
* can not meaningfully run without it.
*/
- gpa_width = out.rcx & GENMASK(5, 0);
+ gpa_width = args.rcx & GENMASK(5, 0);
*cc_mask = BIT_ULL(gpa_width - 1);

/*
@@ -201,7 +203,7 @@ static void tdx_parse_tdinfo(u64 *cc_mask)
* memory. Ensure that no #VE will be delivered for accesses to
* TD-private memory. Only VMM-shared memory (MMIO) will #VE.
*/
- td_attr = out.rdx;
+ td_attr = args.rdx;
if (!(td_attr & ATTR_SEPT_VE_DISABLE)) {
const char *msg = "TD misconfiguration: SEPT_VE_DISABLE attribute must be set.";

@@ -609,7 +611,7 @@ __init bool tdx_early_handle_ve(struct pt_regs *regs)

void tdx_get_ve_info(struct ve_info *ve)
{
- struct tdx_module_output out;
+ struct tdx_module_args args = {};

/*
* Called during #VE handling to retrieve the #VE info from the
@@ -626,15 +628,15 @@ void tdx_get_ve_info(struct ve_info *ve)
* Note, the TDX module treats virtual NMIs as inhibited if the #VE
* valid flag is set. It means that NMI=>#VE will not result in a #DF.
*/
- tdcall(TDG_VP_VEINFO_GET, 0, 0, 0, 0, &out);
+ tdcall(TDG_VP_VEINFO_GET, &args);

/* Transfer the output parameters */
- ve->exit_reason = out.rcx;
- ve->exit_qual = out.rdx;
- ve->gla = out.r8;
- ve->gpa = out.r9;
- ve->instr_len = lower_32_bits(out.r10);
- ve->instr_info = upper_32_bits(out.r10);
+ ve->exit_reason = args.rcx;
+ ve->exit_qual = args.rdx;
+ ve->gla = args.r8;
+ ve->gpa = args.r9;
+ ve->instr_len = lower_32_bits(args.r10);
+ ve->instr_info = upper_32_bits(args.r10);
}

/*
@@ -738,7 +740,7 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
enum pg_level pg_level)
{
unsigned long accept_size = page_level_size(pg_level);
- u64 tdcall_rcx;
+ struct tdx_module_args args = {};
u8 page_size;

if (!IS_ALIGNED(*start, accept_size))
@@ -767,8 +769,8 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
return false;
}

- tdcall_rcx = *start | page_size;
- if (__tdcall(TDG_MEM_PAGE_ACCEPT, tdcall_rcx, 0, 0, 0, NULL))
+ args.rcx = *start | page_size;
+ if (__tdcall(TDG_MEM_PAGE_ACCEPT, &args))
return false;

*start += accept_size;
@@ -855,6 +857,10 @@ static bool tdx_enc_status_change_finish(unsigned long vaddr, int numpages,

void __init tdx_early_init(void)
{
+ struct tdx_module_args args = {
+ .rdx = TDCS_NOTIFY_ENABLES,
+ .r9 = -1ULL,
+ };
u64 cc_mask;
u32 eax, sig[3];

@@ -870,7 +876,7 @@ void __init tdx_early_init(void)
cc_set_mask(cc_mask);

/* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
- tdcall(TDG_VM_WR, 0, TDCS_NOTIFY_ENABLES, 0, -1ULL, NULL);
+ tdcall(TDG_VM_WR, &args);

/*
* All bits above GPA width are reserved and kernel treats shared bit
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index f19e329c4044..e353c7ec5f24 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -21,16 +21,18 @@
#ifndef __ASSEMBLY__

/*
- * Used to gather the output registers values of the TDCALL and SEAMCALL
+ * Used to gather input/output registers of the TDCALL and SEAMCALL
* instructions when requesting services from the TDX module.
*
* This is a software only structure and not part of the TDX module/VMM ABI.
*/
-struct tdx_module_output {
+struct tdx_module_args {
+ /* input/output */
u64 rcx;
u64 rdx;
u64 r8;
u64 r9;
+ /* additional output */
u64 r10;
u64 r11;
};
@@ -56,8 +58,8 @@ struct ve_info {
void __init tdx_early_init(void);

/* Used to communicate with the TDX module */
-u64 __tdcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
- struct tdx_module_output *out);
+u64 __tdcall(u64 fn, struct tdx_module_args *args);
+u64 __tdcall_ret(u64 fn, struct tdx_module_args *args);

void tdx_get_ve_info(struct ve_info *ve);

diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index dc3576303f1a..50383bc46dd7 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -68,12 +68,12 @@ static void __used common(void)
#endif

BLANK();
- OFFSET(TDX_MODULE_rcx, tdx_module_output, rcx);
- OFFSET(TDX_MODULE_rdx, tdx_module_output, rdx);
- OFFSET(TDX_MODULE_r8, tdx_module_output, r8);
- OFFSET(TDX_MODULE_r9, tdx_module_output, r9);
- OFFSET(TDX_MODULE_r10, tdx_module_output, r10);
- OFFSET(TDX_MODULE_r11, tdx_module_output, r11);
+ OFFSET(TDX_MODULE_rcx, tdx_module_args, rcx);
+ OFFSET(TDX_MODULE_rdx, tdx_module_args, rdx);
+ OFFSET(TDX_MODULE_r8, tdx_module_args, r8);
+ OFFSET(TDX_MODULE_r9, tdx_module_args, r9);
+ OFFSET(TDX_MODULE_r10, tdx_module_args, r10);
+ OFFSET(TDX_MODULE_r11, tdx_module_args, r11);

BLANK();
OFFSET(TDX_HYPERCALL_r8, tdx_hypercall_args, r8);
diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
index b5ab919c7fa8..0af1374ad8f5 100644
--- a/arch/x86/virt/vmx/tdx/tdxcall.S
+++ b/arch/x86/virt/vmx/tdx/tdxcall.S
@@ -17,34 +17,33 @@
* TDX module and hypercalls to the VMM.
* SEAMCALL - used by TDX hosts to make requests to the
* TDX module.
+ *
+ *-------------------------------------------------------------------------
+ * TDCALL/SEAMCALL ABI:
+ *-------------------------------------------------------------------------
+ * Input Registers:
+ *
+ * RAX - TDCALL/SEAMCALL Leaf number.
+ * RCX,RDX,R8-R9 - TDCALL/SEAMCALL Leaf specific input registers.
+ *
+ * Output Registers:
+ *
+ * RAX - TDCALL/SEAMCALL instruction error code.
+ * RCX,RDX,R8-R11 - TDCALL/SEAMCALL Leaf specific output registers.
+ *
+ *-------------------------------------------------------------------------
*/
-.macro TDX_MODULE_CALL host:req
+.macro TDX_MODULE_CALL host:req ret:req
FRAME_BEGIN
- /*
- * R12 will be used as temporary storage for struct tdx_module_output
- * pointer. Since R12-R15 registers are not used by TDCALL/SEAMCALL
- * services supported by this function, it can be reused.
- */

- /* Callee saved, so preserve it */
- push %r12
-
- /*
- * Push output pointer to stack.
- * After the operation, it will be fetched into R12 register.
- */
- push %r9
-
- /* Mangle function call ABI into TDCALL/SEAMCALL ABI: */
/* Move Leaf ID to RAX */
mov %rdi, %rax
- /* Move input 4 to R9 */
- mov %r8, %r9
- /* Move input 3 to R8 */
- mov %rcx, %r8
- /* Move input 1 to RCX */
- mov %rsi, %rcx
- /* Leave input param 2 in RDX */
+
+ /* Move other input regs from 'struct tdx_module_args' */
+ movq TDX_MODULE_rcx(%rsi), %rcx
+ movq TDX_MODULE_rdx(%rsi), %rdx
+ movq TDX_MODULE_r8(%rsi), %r8
+ movq TDX_MODULE_r9(%rsi), %r9

.if \host
seamcall
@@ -65,34 +64,15 @@
tdcall
.endif

- /*
- * Fetch output pointer from stack to R12 (It is used
- * as temporary storage)
- */
- pop %r12
-
- /*
- * Since this macro can be invoked with NULL as an output pointer,
- * check if caller provided an output struct before storing output
- * registers.
- *
- * Update output registers, even if the call failed (RAX != 0).
- * Other registers may contain details of the failure.
- */
- test %r12, %r12
- jz .Lno_output_struct
-
- /* Copy result registers to output struct: */
- movq %rcx, TDX_MODULE_rcx(%r12)
- movq %rdx, TDX_MODULE_rdx(%r12)
- movq %r8, TDX_MODULE_r8(%r12)
- movq %r9, TDX_MODULE_r9(%r12)
- movq %r10, TDX_MODULE_r10(%r12)
- movq %r11, TDX_MODULE_r11(%r12)
-
-.Lno_output_struct:
- /* Restore the state of R12 register */
- pop %r12
+ .if \ret
+ /* Copy output regs to the structure */
+ movq %rcx, TDX_MODULE_rcx(%rsi)
+ movq %rdx, TDX_MODULE_rdx(%rsi)
+ movq %r8, TDX_MODULE_r8(%rsi)
+ movq %r9, TDX_MODULE_r9(%rsi)
+ movq %r10, TDX_MODULE_r10(%rsi)
+ movq %r11, TDX_MODULE_r11(%rsi)
+ .endif

FRAME_END
RET
--
2.41.0


2023-07-12 09:18:20

by Kai Huang

[permalink] [raw]
Subject: [PATCH 02/10] x86/tdx: Use cmovc to save a label in TDX_MODULE_CALL asm

Change 'jnc .Lno_vmfailinvalid' to 'cmovc %rdi, %rax' to save the
.Lno_vmfailinvalid label in the TDX_MODULE_CALL asm macro.

Note %rdi, which is used as the first argument, has been saved to %rax
as TDCALL leaf ID thus is free to hold the error code for cmovc.

This is basically based on Peter's code.

Cc: Kirill A. Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/virt/vmx/tdx/tdxcall.S | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
index 49a54356ae99..3524915d8bd9 100644
--- a/arch/x86/virt/vmx/tdx/tdxcall.S
+++ b/arch/x86/virt/vmx/tdx/tdxcall.S
@@ -57,10 +57,8 @@
* This value will never be used as actual SEAMCALL error code as
* it is from the Reserved status code class.
*/
- jnc .Lno_vmfailinvalid
- mov $TDX_SEAMCALL_VMFAILINVALID, %rax
-.Lno_vmfailinvalid:
-
+ mov $TDX_SEAMCALL_VMFAILINVALID, %rdi
+ cmovc %rdi, %rax
.else
tdcall
.endif
--
2.41.0


2023-07-12 09:18:30

by Kai Huang

[permalink] [raw]
Subject: [PATCH 08/10] x86/tdx: Unify TDX_HYPERCALL and TDX_MODULE_CALL assembly

Now TDX_HYPERCALL macro shares very similar code pattern with the
TDX_MODULE_CALL macro, except that TDX_HYPERCALL handles below
additional things:

1) Sets RAX to 0 (TDG.VP.VMCALL leaf) internally;
2) Sets RCX to the (fixed) bitmap of shared registers;
3) Panics internally if the TDCALL return code is not zero;
4) After TDCALL, moves R10 to RAX for returning the return code of the
TDVMCALL itself, regardless the "\ret" parameter;
5) Clears shared registers to prevent speculative use of VMM's values.

Using the TDX_MODULE_CALL macro, 1) - 3) can be moved out to the C code.
4) can be done by always turning on the TDX_MODULE_CALL macro's "\ret"
parameter, and moving it to the C function. 5) is already covered by
the TDH.VP.ENTER in the TDX_MODULE_CALL.

In other words, the TDX_HYPERCALL macro is redundant now and should be
removed to use TDX_MODULE_CALL macro instead.

TDVMCALL always uses the extra "callee-saved" registers. Replace the
existing __tdx_hypercall{_ret}() assembly functions with a new
__tdcall_saved_ret(), as mentioned above TDVMCALL relies on saving R10
to the register structure so that the caller (C function) can use it as
the return code of the TDVMCALL itself.

As a result, reimplement the existing __tdx_hypercall() using the
__tdcall_saved_ret() function. __tdx_hypercall_ret() isn't necessary
anymore because __tdx_hypercall() always turns on the "\ret" parameter.

Also remove the 'struct tdx_hypercall_args', which has the exact members
as the 'struct tdx_module_args' (except the RCX which isn't used as
shared register). It's not worth to keep the structure just for one
member.

With this change, the __tdx_hypercall() variants are not implemented in
tdcall.S anymore thus not available to the compressed code. Implement
the __tdx_hypercall() variants in the compressed code directly using the
__tdcall_saved_ret().

Opportunistically fix a checkpatch error complaining using space around
parenthesis '(' and ')' while moving the bitmap of shared registers to
<asm/shared/tdx.h>.

Cc: Kirill A. Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/boot/compressed/tdx.c | 26 +++-
arch/x86/coco/tdx/tdcall.S | 194 ++++--------------------------
arch/x86/coco/tdx/tdx.c | 62 ++++++----
arch/x86/include/asm/shared/tdx.h | 48 ++++++--
arch/x86/include/asm/tdx.h | 25 ----
arch/x86/kernel/asm-offsets.c | 14 ---
arch/x86/virt/vmx/tdx/tdxcall.S | 10 +-
7 files changed, 127 insertions(+), 252 deletions(-)

diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c
index 8841b945a1e2..4df2259f9f7f 100644
--- a/arch/x86/boot/compressed/tdx.c
+++ b/arch/x86/boot/compressed/tdx.c
@@ -11,14 +11,32 @@
#include <asm/shared/tdx.h>

/* Called from __tdx_hypercall() for unrecoverable failure */
-void __tdx_hypercall_failed(void)
+static inline void __tdx_hypercall_failed(void)
{
error("TDVMCALL failed. TDX module bug?");
}

+static inline u64 __tdx_hypercall(struct tdx_module_args *args)
+{
+ u64 ret;
+
+ args->rcx = TDVMCALL_EXPOSE_REGS_MASK;
+ ret = __tdcall_saved_ret(TDG_VP_VMCALL, args);
+
+ /*
+ * RAX!=0 indicates a failure of the TDVMCALL mechanism itself and that
+ * something has gone horribly wrong with the TDX module.
+ */
+ if (ret)
+ __tdx_hypercall_failed();
+
+ /* The return status of the hypercall itself is in R10. */
+ return args->r10;
+}
+
static inline unsigned int tdx_io_in(int size, u16 port)
{
- struct tdx_hypercall_args args = {
+ struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
.r11 = hcall_func(EXIT_REASON_IO_INSTRUCTION),
.r12 = size,
@@ -26,7 +44,7 @@ static inline unsigned int tdx_io_in(int size, u16 port)
.r14 = port,
};

- if (__tdx_hypercall_ret(&args))
+ if (__tdx_hypercall(&args))
return UINT_MAX;

return args.r11;
@@ -34,7 +52,7 @@ static inline unsigned int tdx_io_in(int size, u16 port)

static inline void tdx_io_out(int size, u16 port, u32 value)
{
- struct tdx_hypercall_args args = {
+ struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
.r11 = hcall_func(EXIT_REASON_IO_INSTRUCTION),
.r12 = size,
diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S
index d5693330ee9b..6f3c9028577b 100644
--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -10,38 +10,11 @@

#include "../../virt/vmx/tdx/tdxcall.S"

-/*
- * Bitmasks of exposed registers (with VMM).
- */
-#define TDX_RDX BIT(2)
-#define TDX_RBX BIT(3)
-#define TDX_RSI BIT(6)
-#define TDX_RDI BIT(7)
-#define TDX_R8 BIT(8)
-#define TDX_R9 BIT(9)
-#define TDX_R10 BIT(10)
-#define TDX_R11 BIT(11)
-#define TDX_R12 BIT(12)
-#define TDX_R13 BIT(13)
-#define TDX_R14 BIT(14)
-#define TDX_R15 BIT(15)
-
-/*
- * These registers are clobbered to hold arguments for each
- * TDVMCALL. They are safe to expose to the VMM.
- * Each bit in this mask represents a register ID. Bit field
- * details can be found in TDX GHCI specification, section
- * titled "TDCALL [TDG.VP.VMCALL] leaf".
- */
-#define TDVMCALL_EXPOSE_REGS_MASK \
- ( TDX_RDX | TDX_RBX | TDX_RSI | TDX_RDI | TDX_R8 | TDX_R9 | \
- TDX_R10 | TDX_R11 | TDX_R12 | TDX_R13 | TDX_R14 | TDX_R15 )
-
.section .noinstr.text, "ax"

/*
* __tdcall() - Used by TDX guests to request services from the TDX
- * module (does not include VMM services) using TDCALL instruction.
+ * module using TDCALL instruction.
*
* __tdcall() function ABI:
*
@@ -56,7 +29,7 @@ SYM_FUNC_END(__tdcall)

/*
* __tdcall_ret() - Used by TDX guests to request services from the TDX
- * module (does not include VMM services) using TDCALL instruction.
+ * module using TDCALL instruction.
*
* __tdcall_ret() function ABI:
*
@@ -70,156 +43,33 @@ SYM_FUNC_START(__tdcall_ret)
SYM_FUNC_END(__tdcall_ret)

/*
- * TDX_HYPERCALL - Make hypercalls to a TDX VMM using TDVMCALL leaf of TDCALL
- * instruction
+ * __tdcall_saved() - Used by the TDX guests to request services from the
+ * TDX module using TDCALL instruction with additional callee-saved
+ * registers as input.
*
- * Transforms values in function call argument struct tdx_hypercall_args @args
- * into the TDCALL register ABI. After TDCALL operation, VMM output is saved
- * back in @args, if \ret is 1.
+ * __tdcall_saved() function ABI:
*
- *-------------------------------------------------------------------------
- * TD VMCALL ABI:
- *-------------------------------------------------------------------------
+ * @fn (RDI) - TDCALL leaf ID, moved to RAX
+ * @args (RSI) - struct 'struct tdx_module_args' for input
*
- * Input Registers:
- *
- * RAX - TDCALL instruction leaf number (0 - TDG.VP.VMCALL)
- * RCX - BITMAP which controls which part of TD Guest GPR
- * is passed as-is to the VMM and back.
- * R10 - Set 0 to indicate TDCALL follows standard TDX ABI
- * specification. Non zero value indicates vendor
- * specific ABI.
- * R11 - VMCALL sub function number
- * RBX, RDX, RDI, RSI - Used to pass VMCALL sub function specific arguments.
- * R8-R9, R12-R15 - Same as above.
- *
- * Output Registers:
- *
- * RAX - TDCALL instruction status (Not related to hypercall
- * output).
- * RBX, RDX, RDI, RSI - Hypercall sub function specific output values.
- * R8-R15 - Same as above.
- *
- */
-.macro TDX_HYPERCALL ret:req
- FRAME_BEGIN
-
- /* Save callee-saved GPRs as mandated by the x86_64 ABI */
- push %r15
- push %r14
- push %r13
- push %r12
- push %rbx
-
- /* Free RDI to be used as TDVMCALL arguments */
- movq %rdi, %rax
-
- /* Copy hypercall registers from arg struct: */
- movq TDX_HYPERCALL_r8(%rax), %r8
- movq TDX_HYPERCALL_r9(%rax), %r9
- movq TDX_HYPERCALL_r10(%rax), %r10
- movq TDX_HYPERCALL_r11(%rax), %r11
- movq TDX_HYPERCALL_r12(%rax), %r12
- movq TDX_HYPERCALL_r13(%rax), %r13
- movq TDX_HYPERCALL_r14(%rax), %r14
- movq TDX_HYPERCALL_r15(%rax), %r15
- movq TDX_HYPERCALL_rdi(%rax), %rdi
- movq TDX_HYPERCALL_rsi(%rax), %rsi
- movq TDX_HYPERCALL_rbx(%rax), %rbx
- movq TDX_HYPERCALL_rdx(%rax), %rdx
-
- push %rax
-
- /* Mangle function call ABI into TDCALL ABI: */
- /* Set TDCALL leaf ID (TDVMCALL (0)) in RAX */
- xor %eax, %eax
-
- movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
-
- tdcall
-
- /*
- * RAX!=0 indicates a failure of the TDVMCALL mechanism itself and that
- * something has gone horribly wrong with the TDX module.
- *
- * The return status of the hypercall operation is in a separate
- * register (in R10). Hypercall errors are a part of normal operation
- * and are handled by callers.
- */
- testq %rax, %rax
- jne .Lpanic\@
-
- pop %rax
-
- .if \ret
- movq %r8, TDX_HYPERCALL_r8(%rax)
- movq %r9, TDX_HYPERCALL_r9(%rax)
- movq %r10, TDX_HYPERCALL_r10(%rax)
- movq %r11, TDX_HYPERCALL_r11(%rax)
- movq %r12, TDX_HYPERCALL_r12(%rax)
- movq %r13, TDX_HYPERCALL_r13(%rax)
- movq %r14, TDX_HYPERCALL_r14(%rax)
- movq %r15, TDX_HYPERCALL_r15(%rax)
- movq %rdi, TDX_HYPERCALL_rdi(%rax)
- movq %rsi, TDX_HYPERCALL_rsi(%rax)
- movq %rbx, TDX_HYPERCALL_rbx(%rax)
- movq %rdx, TDX_HYPERCALL_rdx(%rax)
- .endif
-
- /* TDVMCALL leaf return code is in R10 */
- movq %r10, %rax
-
- /*
- * Zero out registers exposed to the VMM to avoid speculative execution
- * with VMM-controlled values. This needs to include all registers
- * present in TDVMCALL_EXPOSE_REGS_MASK, except RBX, and R12-R15 which
- * will be restored.
- */
- xor %r8d, %r8d
- xor %r9d, %r9d
- xor %r10d, %r10d
- xor %r11d, %r11d
- xor %rdi, %rdi
- xor %rsi, %rsi
- xor %rdx, %rdx
-
- /* Restore callee-saved GPRs as mandated by the x86_64 ABI */
- pop %rbx
- pop %r12
- pop %r13
- pop %r14
- pop %r15
-
- FRAME_END
-
- RET
-.Lpanic\@:
- call __tdx_hypercall_failed
- /* __tdx_hypercall_failed never returns */
- REACHABLE
- jmp .Lpanic\@
-.endm
-
-/*
- *
- * __tdx_hypercall() function ABI:
- *
- * @args (RDI) - struct tdx_hypercall_args for input
- *
- * On successful completion, return the hypercall error code.
+ * Return the TDCALL return code (RAX)
*/
-SYM_FUNC_START(__tdx_hypercall)
- TDX_HYPERCALL ret=0
-SYM_FUNC_END(__tdx_hypercall)
+SYM_FUNC_START(__tdcall_saved)
+ TDX_MODULE_CALL host=0 ret=0 saved=1
+SYM_FUNC_END(__tdcall_saved)

/*
+ * __tdcall_saved_ret() - Used by the TDX guests to request services from the
+ * TDX module using TDCALL instruction with additional callee-saved registers
+ * as input/output.
*
- * __tdx_hypercall_ret() function ABI:
+ * __tdcall_saved_ret() function ABI:
*
- * @args (RDI) - struct tdx_hypercall_args for input and output
+ * @fn (RDI) - TDCALL leaf ID, moved to RAX
+ * @args (RDI) - struct 'struct tdx_module_args' for input and output
*
- * On successful completion, return the hypercall error code.
+ * Return the TDCALL return code (RAX)
*/
-SYM_FUNC_START(__tdx_hypercall_ret)
- TDX_HYPERCALL ret=1
-SYM_FUNC_END(__tdx_hypercall_ret)
+SYM_FUNC_START(__tdcall_saved_ret)
+ TDX_MODULE_CALL host=0 ret=1 saved=1
+SYM_FUNC_END(__tdcall_saved_ret)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 0f16ba52ae62..a5e77893b2c0 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -51,13 +51,38 @@

#define TDREPORT_SUBTYPE_0 0

+/* Called from __tdx_hypercall() for unrecoverable failure */
+static noinstr void __tdx_hypercall_failed(void)
+{
+ instrumentation_begin();
+ panic("TDVMCALL failed. TDX module bug?");
+}
+
+static inline u64 __tdx_hypercall(struct tdx_module_args *args)
+{
+ u64 ret;
+
+ args->rcx = TDVMCALL_EXPOSE_REGS_MASK;
+ ret = __tdcall_saved_ret(TDG_VP_VMCALL, args);
+
+ /*
+ * RAX!=0 indicates a failure of the TDVMCALL mechanism itself and that
+ * something has gone horribly wrong with the TDX module.
+ */
+ if (ret)
+ __tdx_hypercall_failed();
+
+ /* The return status of the hypercall itself is in R10. */
+ return args->r10;
+}
+
/*
- * Wrapper for standard use of __tdx_hypercall with no output aside from
- * return code.
+ * Wrapper for standard use of __tdx_hypercall() w/o needing any output
+ * register except the return code.
*/
static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
{
- struct tdx_hypercall_args args = {
+ struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
.r11 = fn,
.r12 = r12,
@@ -69,18 +94,11 @@ static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
return __tdx_hypercall(&args);
}

-/* Called from __tdx_hypercall() for unrecoverable failure */
-noinstr void __tdx_hypercall_failed(void)
-{
- instrumentation_begin();
- panic("TDVMCALL failed. TDX module bug?");
-}
-
#ifdef CONFIG_KVM_GUEST
long tdx_kvm_hypercall(unsigned int nr, unsigned long p1, unsigned long p2,
unsigned long p3, unsigned long p4)
{
- struct tdx_hypercall_args args = {
+ struct tdx_module_args args = {
.r10 = nr,
.r11 = p1,
.r12 = p2,
@@ -140,7 +158,7 @@ EXPORT_SYMBOL_GPL(tdx_mcall_get_report0);

static void __noreturn tdx_panic(const char *msg)
{
- struct tdx_hypercall_args args = {
+ struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
.r11 = TDVMCALL_REPORT_FATAL_ERROR,
.r12 = 0, /* Error code: 0 is Panic */
@@ -262,7 +280,7 @@ static int ve_instr_len(struct ve_info *ve)

static u64 __cpuidle __halt(const bool irq_disabled)
{
- struct tdx_hypercall_args args = {
+ struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
.r11 = hcall_func(EXIT_REASON_HLT),
.r12 = irq_disabled,
@@ -306,7 +324,7 @@ void __cpuidle tdx_safe_halt(void)

static int read_msr(struct pt_regs *regs, struct ve_info *ve)
{
- struct tdx_hypercall_args args = {
+ struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
.r11 = hcall_func(EXIT_REASON_MSR_READ),
.r12 = regs->cx,
@@ -317,7 +335,7 @@ static int read_msr(struct pt_regs *regs, struct ve_info *ve)
* can be found in TDX Guest-Host-Communication Interface
* (GHCI), section titled "TDG.VP.VMCALL<Instruction.RDMSR>".
*/
- if (__tdx_hypercall_ret(&args))
+ if (__tdx_hypercall(&args))
return -EIO;

regs->ax = lower_32_bits(args.r11);
@@ -327,7 +345,7 @@ static int read_msr(struct pt_regs *regs, struct ve_info *ve)

static int write_msr(struct pt_regs *regs, struct ve_info *ve)
{
- struct tdx_hypercall_args args = {
+ struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
.r11 = hcall_func(EXIT_REASON_MSR_WRITE),
.r12 = regs->cx,
@@ -347,7 +365,7 @@ static int write_msr(struct pt_regs *regs, struct ve_info *ve)

static int handle_cpuid(struct pt_regs *regs, struct ve_info *ve)
{
- struct tdx_hypercall_args args = {
+ struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
.r11 = hcall_func(EXIT_REASON_CPUID),
.r12 = regs->ax,
@@ -371,7 +389,7 @@ static int handle_cpuid(struct pt_regs *regs, struct ve_info *ve)
* ABI can be found in TDX Guest-Host-Communication Interface
* (GHCI), section titled "VP.VMCALL<Instruction.CPUID>".
*/
- if (__tdx_hypercall_ret(&args))
+ if (__tdx_hypercall(&args))
return -EIO;

/*
@@ -389,7 +407,7 @@ static int handle_cpuid(struct pt_regs *regs, struct ve_info *ve)

static bool mmio_read(int size, unsigned long addr, unsigned long *val)
{
- struct tdx_hypercall_args args = {
+ struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
.r11 = hcall_func(EXIT_REASON_EPT_VIOLATION),
.r12 = size,
@@ -398,7 +416,7 @@ static bool mmio_read(int size, unsigned long addr, unsigned long *val)
.r15 = *val,
};

- if (__tdx_hypercall_ret(&args))
+ if (__tdx_hypercall(&args))
return false;
*val = args.r11;
return true;
@@ -517,7 +535,7 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)

static bool handle_in(struct pt_regs *regs, int size, int port)
{
- struct tdx_hypercall_args args = {
+ struct tdx_module_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
.r11 = hcall_func(EXIT_REASON_IO_INSTRUCTION),
.r12 = size,
@@ -532,7 +550,7 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
* in TDX Guest-Host-Communication Interface (GHCI) section titled
* "TDG.VP.VMCALL<Instruction.IO>".
*/
- success = !__tdx_hypercall_ret(&args);
+ success = !__tdx_hypercall(&args);

/* Update part of the register affected by the emulated instruction */
regs->ax &= ~mask;
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index b415a24f0d48..f2a6e76dd773 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -13,32 +13,60 @@
#ifndef __ASSEMBLY__

/*
- * Used in __tdx_hypercall() to pass down and get back registers' values of
- * the TDCALL instruction when requesting services from the VMM.
+ * Used to gather input/output registers of the TDCALL and SEAMCALL
+ * instructions when requesting services from the TDX module.
*
* This is a software only structure and not part of the TDX module/VMM ABI.
*/
-struct tdx_hypercall_args {
+struct tdx_module_args {
+ /* callee-clobbered */
+ u64 rcx;
+ u64 rdx;
u64 r8;
u64 r9;
+ /* extra callee-clobbered */
u64 r10;
u64 r11;
+ /* callee-saved + rdi/rsi */
u64 r12;
u64 r13;
u64 r14;
u64 r15;
+ u64 rbx;
u64 rdi;
u64 rsi;
- u64 rbx;
- u64 rdx;
};

-/* Used to request services from the VMM */
-u64 __tdx_hypercall(struct tdx_hypercall_args *args);
-u64 __tdx_hypercall_ret(struct tdx_hypercall_args *args);
+#define TDG_VP_VMCALL 0

-/* Called from __tdx_hypercall() for unrecoverable failure */
-void __tdx_hypercall_failed(void);
+/*
+ * Bitmasks of exposed registers (with VMM).
+ */
+#define TDX_RDX BIT(2)
+#define TDX_RBX BIT(3)
+#define TDX_RSI BIT(6)
+#define TDX_RDI BIT(7)
+#define TDX_R8 BIT(8)
+#define TDX_R9 BIT(9)
+#define TDX_R10 BIT(10)
+#define TDX_R11 BIT(11)
+#define TDX_R12 BIT(12)
+#define TDX_R13 BIT(13)
+#define TDX_R14 BIT(14)
+#define TDX_R15 BIT(15)
+
+/*
+ * These registers are clobbered to hold arguments for each
+ * TDVMCALL. They are safe to expose to the VMM.
+ * Each bit in this mask represents a register ID. Bit field
+ * details can be found in TDX GHCI specification, section
+ * titled "TDCALL [TDG.VP.VMCALL] leaf".
+ */
+#define TDVMCALL_EXPOSE_REGS_MASK \
+ (TDX_RDX | TDX_RBX | TDX_RSI | TDX_RDI | TDX_R8 | TDX_R9 | \
+ TDX_R10 | TDX_R11 | TDX_R12 | TDX_R13 | TDX_R14 | TDX_R15)
+
+u64 __tdcall_saved_ret(u64 fn, struct tdx_module_args *args);

/*
* The TDG.VP.VMCALL-Instruction-execution sub-functions are defined
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index a549bcc77f4f..9b0ad0176e58 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -20,31 +20,6 @@

#ifndef __ASSEMBLY__

-/*
- * Used to gather input/output registers of the TDCALL and SEAMCALL
- * instructions when requesting services from the TDX module.
- *
- * This is a software only structure and not part of the TDX module/VMM ABI.
- */
-struct tdx_module_args {
- /* callee-clobbered */
- u64 rcx;
- u64 rdx;
- u64 r8;
- u64 r9;
- /* extra callee-clobbered */
- u64 r10;
- u64 r11;
- /* callee-saved + rdi/rsi */
- u64 r12;
- u64 r13;
- u64 r14;
- u64 r15;
- u64 rbx;
- u64 rdi;
- u64 rsi;
-};
-
/*
* Used by the #VE exception handler to gather the #VE exception
* info from the TDX module. This is a software only structure
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 1581564a67b7..6913b372ccf7 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -82,20 +82,6 @@ static void __used common(void)
OFFSET(TDX_MODULE_rdi, tdx_module_args, rdi);
OFFSET(TDX_MODULE_rsi, tdx_module_args, rsi);

- BLANK();
- OFFSET(TDX_HYPERCALL_r8, tdx_hypercall_args, r8);
- OFFSET(TDX_HYPERCALL_r9, tdx_hypercall_args, r9);
- OFFSET(TDX_HYPERCALL_r10, tdx_hypercall_args, r10);
- OFFSET(TDX_HYPERCALL_r11, tdx_hypercall_args, r11);
- OFFSET(TDX_HYPERCALL_r12, tdx_hypercall_args, r12);
- OFFSET(TDX_HYPERCALL_r13, tdx_hypercall_args, r13);
- OFFSET(TDX_HYPERCALL_r14, tdx_hypercall_args, r14);
- OFFSET(TDX_HYPERCALL_r15, tdx_hypercall_args, r15);
- OFFSET(TDX_HYPERCALL_rdi, tdx_hypercall_args, rdi);
- OFFSET(TDX_HYPERCALL_rsi, tdx_hypercall_args, rsi);
- OFFSET(TDX_HYPERCALL_rbx, tdx_hypercall_args, rbx);
- OFFSET(TDX_HYPERCALL_rdx, tdx_hypercall_args, rdx);
-
BLANK();
OFFSET(BP_scratch, boot_params, scratch);
OFFSET(BP_secure_boot, boot_params, secure_boot);
diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
index 389f4ec5eee8..e4e90ebf5dad 100644
--- a/arch/x86/virt/vmx/tdx/tdxcall.S
+++ b/arch/x86/virt/vmx/tdx/tdxcall.S
@@ -145,11 +145,11 @@
.endif /* \ret */

.if \saved
- .if \ret && \host
+ .if \ret
/*
- * Clear registers shared by guest for VP.ENTER to prevent
- * speculative use of guest's values, including those are
- * restored from the stack.
+ * Clear registers shared by guest for VP.ENTER and VP.VMCALL to
+ * prevent speculative use of values from guest/VMM, including
+ * those are restored from the stack.
*
* See arch/x86/kvm/vmx/vmenter.S:
*
@@ -173,7 +173,7 @@
xorq %r15, %r15
xorq %rbx, %rbx
xorq %rdi, %rdi
- .endif /* \ret && \host */
+ .endif /* \ret */

/* Restore callee-saved GPRs as mandated by the x86_64 ABI */
popq %r15
--
2.41.0


2023-07-12 09:19:19

by Kai Huang

[permalink] [raw]
Subject: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

The TDX guest live migration support (TDX 1.5) adds new TDCALL/SEAMCALL
leaf functions. Those new TDCALLs/SEAMCALLs take additional registers
for input (R10-R13) and output (R12-R13). TDG.SERVTD.RD is an example.

Also, the current TDX_MODULE_CALL doesn't aim to handle TDH.VP.ENTER
SEAMCALL, which monitors the TDG.VP.VMCALL in input/output registers
when it returns in case of VMCALL from TDX guest.

With those new TDCALLs/SEAMCALLs and the TDH.VP.ENTER covered, the
TDX_MODULE_CALL macro basically needs to handle the same input/output
registers as the TDX_HYPERCALL does. And as a result, they also share
similar logic in the assembly, thus should be unified to use one common
assembly.

Extend the TDX_MODULE_CALL asm to support the new TDCALLs/SEAMCALLs and
also the TDH.VP.ENTER SEAMCALL. Eventually it will be unified with the
TDX_HYPERCALL.

The new input/output registers fit with the "callee-saved" registers in
the x86 calling convention. Add a new "saved" parameter to support
those new TDCALLs/SEAMCALLs and TDH.VP.ENTER and keep the existing
TDCALLs/SEAMCALLs minimally impacted.

For TDH.VP.ENTER, after it returns the registers shared by the guest
contain guest's values. Explicitly clear them to prevent speculative
use of guest's values.

Note most TDX live migration related SEAMCALLs may also clobber AVX*
state ("AVX, AVX2 and AVX512 state: may be reset to the architectural
INIT state" -- see TDH.EXPORT.MEM for example). And TDH.VP.ENTER also
clobbers XMM0-XMM15 when the corresponding bit is set in RCX. Don't
handle them in the TDX_MODULE_CALL macro but let the caller save and
restore when needed.

This is basically based on Peter's code.

Cc: Kirill A. Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/coco/tdx/tdcall.S | 4 +-
arch/x86/include/asm/tdx.h | 12 +++-
arch/x86/kernel/asm-offsets.c | 7 ++
arch/x86/virt/vmx/tdx/tdxcall.S | 121 ++++++++++++++++++++++++++++++--
4 files changed, 134 insertions(+), 10 deletions(-)

diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S
index 46847e85c372..d5693330ee9b 100644
--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -51,7 +51,7 @@
* Return status of TDCALL via RAX.
*/
SYM_FUNC_START(__tdcall)
- TDX_MODULE_CALL host=0 ret=0
+ TDX_MODULE_CALL host=0 ret=0 saved=0
SYM_FUNC_END(__tdcall)

/*
@@ -66,7 +66,7 @@ SYM_FUNC_END(__tdcall)
* Return status of TDCALL via RAX.
*/
SYM_FUNC_START(__tdcall_ret)
- TDX_MODULE_CALL host=0 ret=1
+ TDX_MODULE_CALL host=0 ret=1 saved=0
SYM_FUNC_END(__tdcall_ret)

/*
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index e353c7ec5f24..a549bcc77f4f 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -27,14 +27,22 @@
* This is a software only structure and not part of the TDX module/VMM ABI.
*/
struct tdx_module_args {
- /* input/output */
+ /* callee-clobbered */
u64 rcx;
u64 rdx;
u64 r8;
u64 r9;
- /* additional output */
+ /* extra callee-clobbered */
u64 r10;
u64 r11;
+ /* callee-saved + rdi/rsi */
+ u64 r12;
+ u64 r13;
+ u64 r14;
+ u64 r15;
+ u64 rbx;
+ u64 rdi;
+ u64 rsi;
};

/*
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 50383bc46dd7..1581564a67b7 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -74,6 +74,13 @@ static void __used common(void)
OFFSET(TDX_MODULE_r9, tdx_module_args, r9);
OFFSET(TDX_MODULE_r10, tdx_module_args, r10);
OFFSET(TDX_MODULE_r11, tdx_module_args, r11);
+ OFFSET(TDX_MODULE_r12, tdx_module_args, r12);
+ OFFSET(TDX_MODULE_r13, tdx_module_args, r13);
+ OFFSET(TDX_MODULE_r14, tdx_module_args, r14);
+ OFFSET(TDX_MODULE_r15, tdx_module_args, r15);
+ OFFSET(TDX_MODULE_rbx, tdx_module_args, rbx);
+ OFFSET(TDX_MODULE_rdi, tdx_module_args, rdi);
+ OFFSET(TDX_MODULE_rsi, tdx_module_args, rsi);

BLANK();
OFFSET(TDX_HYPERCALL_r8, tdx_hypercall_args, r8);
diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
index 0af1374ad8f5..389f4ec5eee8 100644
--- a/arch/x86/virt/vmx/tdx/tdxcall.S
+++ b/arch/x86/virt/vmx/tdx/tdxcall.S
@@ -23,17 +23,25 @@
*-------------------------------------------------------------------------
* Input Registers:
*
- * RAX - TDCALL/SEAMCALL Leaf number.
- * RCX,RDX,R8-R9 - TDCALL/SEAMCALL Leaf specific input registers.
+ * RAX - TDCALL/SEAMCALL Leaf number.
+ * RCX,RDX,RDI,RSI,RBX,R8-R15 - TDCALL/SEAMCALL Leaf specific input registers.
*
* Output Registers:
*
- * RAX - TDCALL/SEAMCALL instruction error code.
- * RCX,RDX,R8-R11 - TDCALL/SEAMCALL Leaf specific output registers.
+ * RAX - TDCALL/SEAMCALL Leaf number.
+ * RCX,RDX,RDI,RSI,RBX,R8-R15 - TDCALL/SEAMCALL Leaf specific output registers.
*
*-------------------------------------------------------------------------
+ *
+ * So while the common core (RAX,RCX,RDX,R8-R11) fits nicely in the
+ * callee-clobbered registers and even leaves RDI,RSI free to act as a
+ * base pointer, some leafs (e.g., VP.ENTER) make a giant mess of things.
+ *
+ * For simplicity, assume that anything that needs the callee-saved regs
+ * also tramples on RDI,RSI. This isn't strictly true, see for example
+ * TDH.EXPORT.MEM.
*/
-.macro TDX_MODULE_CALL host:req ret:req
+.macro TDX_MODULE_CALL host:req ret:req saved:req
FRAME_BEGIN

/* Move Leaf ID to RAX */
@@ -44,6 +52,37 @@
movq TDX_MODULE_rdx(%rsi), %rdx
movq TDX_MODULE_r8(%rsi), %r8
movq TDX_MODULE_r9(%rsi), %r9
+ movq TDX_MODULE_r10(%rsi), %r10
+ movq TDX_MODULE_r11(%rsi), %r11
+
+ .if \saved
+ /*
+ * Move additional input regs from the structure. For simplicity
+ * assume that anything needs the callee-saved regs also tramples
+ * on %rdi/%rsi (see VP.ENTER).
+ */
+ /* Save those callee-saved GPRs as mandated by the x86_64 ABI */
+ pushq %rbx
+ pushq %r12
+ pushq %r13
+ pushq %r14
+ pushq %r15
+
+ movq TDX_MODULE_r12(%rsi), %r12
+ movq TDX_MODULE_r13(%rsi), %r13
+ movq TDX_MODULE_r14(%rsi), %r14
+ movq TDX_MODULE_r15(%rsi), %r15
+ movq TDX_MODULE_rbx(%rsi), %rbx
+
+ .if \ret
+ /* Save the structure pointer as %rsi is about to be clobbered */
+ pushq %rsi
+ .endif
+
+ movq TDX_MODULE_rdi(%rsi), %rdi
+ /* %rsi needs to be done at last */
+ movq TDX_MODULE_rsi(%rsi), %rsi
+ .endif /* \saved */

.if \host
seamcall
@@ -65,6 +104,37 @@
.endif

.if \ret
+ .if \saved
+ /*
+ * Restore the structure from stack to saved the output registers
+ *
+ * In case of VP.ENTER returns due to TDVMCALL, all registers are
+ * valid thus no register can be used as spare to restore the
+ * structure from the stack (see "TDH.VP.ENTER Output Operands
+ * Definition on TDCALL(TDG.VP.VMCALL) Following a TD Entry").
+ * For this case, need to make one register as spare by saving it
+ * to the stack and then manually load the structure pointer to
+ * the spare register.
+ *
+ * Note for other TDCALLs/SEAMCALLs there are spare registers
+ * thus no need for such hack but just use this for all for now.
+ */
+ pushq %rax /* save the TDCALL/SEAMCALL return code */
+ movq 8(%rsp), %rax /* restore the structure pointer */
+ movq %rsi, TDX_MODULE_rsi(%rax) /* save %rsi */
+ movq %rax, %rsi /* use %rsi as structure pointer */
+ popq %rax /* restore the return code */
+ popq %rsi /* pop the structure pointer */
+
+ /* Copy additional output regs to the structure */
+ movq %r12, TDX_MODULE_r12(%rsi)
+ movq %r13, TDX_MODULE_r13(%rsi)
+ movq %r14, TDX_MODULE_r14(%rsi)
+ movq %r15, TDX_MODULE_r15(%rsi)
+ movq %rbx, TDX_MODULE_rbx(%rsi)
+ movq %rdi, TDX_MODULE_rdi(%rsi)
+ .endif /* \saved */
+
/* Copy output regs to the structure */
movq %rcx, TDX_MODULE_rcx(%rsi)
movq %rdx, TDX_MODULE_rdx(%rsi)
@@ -72,7 +142,46 @@
movq %r9, TDX_MODULE_r9(%rsi)
movq %r10, TDX_MODULE_r10(%rsi)
movq %r11, TDX_MODULE_r11(%rsi)
- .endif
+ .endif /* \ret */
+
+ .if \saved
+ .if \ret && \host
+ /*
+ * Clear registers shared by guest for VP.ENTER to prevent
+ * speculative use of guest's values, including those are
+ * restored from the stack.
+ *
+ * See arch/x86/kvm/vmx/vmenter.S:
+ *
+ * In theory, a L1 cache miss when restoring register from stack
+ * could lead to speculative execution with guest's values.
+ *
+ * Note: RBP/RSP are not used as shared register. RSI has been
+ * restored already.
+ *
+ * XOR is cheap, thus unconditionally do for all leafs.
+ */
+ xorq %rcx, %rcx
+ xorq %rdx, %rdx
+ xorq %r8, %r8
+ xorq %r9, %r9
+ xorq %r10, %r10
+ xorq %r11, %r11
+ xorq %r12, %r12
+ xorq %r13, %r13
+ xorq %r14, %r14
+ xorq %r15, %r15
+ xorq %rbx, %rbx
+ xorq %rdi, %rdi
+ .endif /* \ret && \host */
+
+ /* Restore callee-saved GPRs as mandated by the x86_64 ABI */
+ popq %r15
+ popq %r14
+ popq %r13
+ popq %r12
+ popq %rbx
+ .endif /* \saved */

FRAME_END
RET
--
2.41.0


2023-07-12 09:20:36

by Kai Huang

[permalink] [raw]
Subject: [PATCH 03/10] x86/tdx: Move FRAME_BEGIN/END to TDX_MODULE_CALL asm macro

Currently, the TDX_MODULE_CALL asm macro and the __tdx_module_call()
take registers directly as input and a 'struct tdx_module_output' as
optional output. This is different from the __tdx_hypercall(), which
simply uses a structure to carry all input/output. There's no point to
leave __tdx_module_call() complicated as it is.

As a preparation to simplify the __tdx_module_call() to make it look
like __tdx_hypercall(), move FRAME_BEGIN/END and RET from the
__tdx_module_call() to the TDX_MODULE_CALL assembly macro. This also
allows more implementation flexibility of the assembly inside the
TDX_MODULE_CALL macro, e.g., allowing putting an _ASM_EXTABLE() after
the main body of the assembly.

This is basically based on Peter's code.

Cc: Kirill A. Shutemov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/coco/tdx/tdcall.S | 3 ---
arch/x86/virt/vmx/tdx/tdxcall.S | 5 +++++
2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S
index 2eca5f43734f..e5d4b7d8ecd4 100644
--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -78,10 +78,7 @@
* Return status of TDCALL via RAX.
*/
SYM_FUNC_START(__tdx_module_call)
- FRAME_BEGIN
TDX_MODULE_CALL host=0
- FRAME_END
- RET
SYM_FUNC_END(__tdx_module_call)

/*
diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
index 3524915d8bd9..b5ab919c7fa8 100644
--- a/arch/x86/virt/vmx/tdx/tdxcall.S
+++ b/arch/x86/virt/vmx/tdx/tdxcall.S
@@ -1,5 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <asm/asm-offsets.h>
+#include <asm/frame.h>
#include <asm/tdx.h>

/*
@@ -18,6 +19,7 @@
* TDX module.
*/
.macro TDX_MODULE_CALL host:req
+ FRAME_BEGIN
/*
* R12 will be used as temporary storage for struct tdx_module_output
* pointer. Since R12-R15 registers are not used by TDCALL/SEAMCALL
@@ -91,4 +93,7 @@
.Lno_output_struct:
/* Restore the state of R12 register */
pop %r12
+
+ FRAME_END
+ RET
.endm
--
2.41.0


2023-07-12 09:20:52

by Kai Huang

[permalink] [raw]
Subject: [PATCH 04/10] x86/tdx: Make macros of TDCALLs consistent with the spec

The TDX spec names all TDCALLs with prefix "TDG". Currently, the kernel
doesn't follow such convention for the macros of those TDCALLs but uses
prefix "TDX_" for all of them. Although it's arguable whether the TDX
spec names those TDCALLs properly, it's better for the kernel to follow
the spec when naming those macros.

Change all macros of TDCALLs to make them consistent with the spec. As
a bonus, they get distinguished easily from the host-side SEAMCALLs,
which all have prefix "TDH".

No functional change intended.

Signed-off-by: Kai Huang <[email protected]>
---
arch/x86/coco/tdx/tdx.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 5b8056f6c83f..de021df92009 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -15,11 +15,11 @@
#include <asm/pgtable.h>

/* TDX module Call Leaf IDs */
-#define TDX_GET_INFO 1
-#define TDX_GET_VEINFO 3
-#define TDX_GET_REPORT 4
-#define TDX_ACCEPT_PAGE 6
-#define TDX_WR 8
+#define TDG_VP_INFO 1
+#define TDG_VP_VEINFO_GET 3
+#define TDG_MR_REPORT 4
+#define TDG_MEM_PAGE_ACCEPT 6
+#define TDG_VM_WR 8

/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
#define TDCS_NOTIFY_ENABLES 0x9100000000000010
@@ -123,7 +123,7 @@ int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport)
{
u64 ret;

- ret = __tdx_module_call(TDX_GET_REPORT, virt_to_phys(tdreport),
+ ret = __tdx_module_call(TDG_MR_REPORT, virt_to_phys(tdreport),
virt_to_phys(reportdata), TDREPORT_SUBTYPE_0,
0, NULL);
if (ret) {
@@ -184,7 +184,7 @@ static void tdx_parse_tdinfo(u64 *cc_mask)
* Guest-Host-Communication Interface (GHCI), section 2.4.2 TDCALL
* [TDG.VP.INFO].
*/
- tdx_module_call(TDX_GET_INFO, 0, 0, 0, 0, &out);
+ tdx_module_call(TDG_VP_INFO, 0, 0, 0, 0, &out);

/*
* The highest bit of a guest physical address is the "sharing" bit.
@@ -626,7 +626,7 @@ void tdx_get_ve_info(struct ve_info *ve)
* Note, the TDX module treats virtual NMIs as inhibited if the #VE
* valid flag is set. It means that NMI=>#VE will not result in a #DF.
*/
- tdx_module_call(TDX_GET_VEINFO, 0, 0, 0, 0, &out);
+ tdx_module_call(TDG_VP_VEINFO_GET, 0, 0, 0, 0, &out);

/* Transfer the output parameters */
ve->exit_reason = out.rcx;
@@ -768,7 +768,7 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
}

tdcall_rcx = *start | page_size;
- if (__tdx_module_call(TDX_ACCEPT_PAGE, tdcall_rcx, 0, 0, 0, NULL))
+ if (__tdx_module_call(TDG_MEM_PAGE_ACCEPT, tdcall_rcx, 0, 0, 0, NULL))
return false;

*start += accept_size;
@@ -805,7 +805,7 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)

/*
* For shared->private conversion, accept the page using
- * TDX_ACCEPT_PAGE TDX module call.
+ * TDG_MEM_PAGE_ACCEPT TDX module call.
*/
while (start < end) {
unsigned long len = end - start;
@@ -870,7 +870,7 @@ void __init tdx_early_init(void)
cc_set_mask(cc_mask);

/* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
- tdx_module_call(TDX_WR, 0, TDCS_NOTIFY_ENABLES, 0, -1ULL, NULL);
+ tdx_module_call(TDG_VM_WR, 0, TDCS_NOTIFY_ENABLES, 0, -1ULL, NULL);

/*
* All bits above GPA width are reserved and kernel treats shared bit
--
2.41.0


2023-07-12 17:22:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Wed, Jul 12, 2023 at 06:53:37PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 12, 2023 at 08:55:21PM +1200, Kai Huang wrote:
>
>
> > @@ -72,7 +142,46 @@
> > movq %r9, TDX_MODULE_r9(%rsi)
> > movq %r10, TDX_MODULE_r10(%rsi)
> > movq %r11, TDX_MODULE_r11(%rsi)
> > - .endif
> > + .endif /* \ret */
> > +
> > + .if \saved
> > + .if \ret && \host
> > + /*
> > + * Clear registers shared by guest for VP.ENTER to prevent
> > + * speculative use of guest's values, including those are
> > + * restored from the stack.
> > + *
> > + * See arch/x86/kvm/vmx/vmenter.S:
> > + *
> > + * In theory, a L1 cache miss when restoring register from stack
> > + * could lead to speculative execution with guest's values.
> > + *
> > + * Note: RBP/RSP are not used as shared register. RSI has been
> > + * restored already.
> > + *
> > + * XOR is cheap, thus unconditionally do for all leafs.
> > + */
> > + xorq %rcx, %rcx
> > + xorq %rdx, %rdx
> > + xorq %r8, %r8
> > + xorq %r9, %r9
> > + xorq %r10, %r10
> > + xorq %r11, %r11
>
> > + xorq %r12, %r12
> > + xorq %r13, %r13
> > + xorq %r14, %r14
> > + xorq %r15, %r15
> > + xorq %rbx, %rbx
>
> ^ those are an instant pop below, seems daft to clear them.

Also, please use the 32bit variant:

xorl %ecx, %ecx

saves a RAX prefix each.

> > + xorq %rdi, %rdi
> > + .endif /* \ret && \host */
> > +
> > + /* Restore callee-saved GPRs as mandated by the x86_64 ABI */
> > + popq %r15
> > + popq %r14
> > + popq %r13
> > + popq %r12
> > + popq %rbx
> > + .endif /* \saved */
> >
> > FRAME_END
> > RET
> > --
> > 2.41.0
> >

2023-07-12 17:34:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Wed, Jul 12, 2023 at 08:55:21PM +1200, Kai Huang wrote:
> @@ -65,6 +104,37 @@
> .endif
>
> .if \ret
> + .if \saved
> + /*
> + * Restore the structure from stack to saved the output registers
> + *
> + * In case of VP.ENTER returns due to TDVMCALL, all registers are
> + * valid thus no register can be used as spare to restore the
> + * structure from the stack (see "TDH.VP.ENTER Output Operands
> + * Definition on TDCALL(TDG.VP.VMCALL) Following a TD Entry").
> + * For this case, need to make one register as spare by saving it
> + * to the stack and then manually load the structure pointer to
> + * the spare register.
> + *
> + * Note for other TDCALLs/SEAMCALLs there are spare registers
> + * thus no need for such hack but just use this for all for now.
> + */
> + pushq %rax /* save the TDCALL/SEAMCALL return code */
> + movq 8(%rsp), %rax /* restore the structure pointer */
> + movq %rsi, TDX_MODULE_rsi(%rax) /* save %rsi */
> + movq %rax, %rsi /* use %rsi as structure pointer */
> + popq %rax /* restore the return code */
> + popq %rsi /* pop the structure pointer */

Urgghh... At least for the \host case you can simply pop %rsi, no?
VP.ENTER returns with 0 there IIRC.

/* xor-swap (%rsp) and %rax */
xorq (%rsp), %rax
xorq %rax, (%rsp)
xorq (%rsp), %rax

movq %rsi, TDX_MODULE_rsi(%rax);
movq %rax, %rsi

popq %rax

Isn't much better is it :/ Too bad xchg implies lock prefix.

> +
> + /* Copy additional output regs to the structure */
> + movq %r12, TDX_MODULE_r12(%rsi)
> + movq %r13, TDX_MODULE_r13(%rsi)
> + movq %r14, TDX_MODULE_r14(%rsi)
> + movq %r15, TDX_MODULE_r15(%rsi)
> + movq %rbx, TDX_MODULE_rbx(%rsi)
> + movq %rdi, TDX_MODULE_rdi(%rsi)
> + .endif /* \saved */
> +
> /* Copy output regs to the structure */
> movq %rcx, TDX_MODULE_rcx(%rsi)
> movq %rdx, TDX_MODULE_rdx(%rsi)

2023-07-12 17:38:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Wed, Jul 12, 2023 at 08:55:21PM +1200, Kai Huang wrote:


> @@ -72,7 +142,46 @@
> movq %r9, TDX_MODULE_r9(%rsi)
> movq %r10, TDX_MODULE_r10(%rsi)
> movq %r11, TDX_MODULE_r11(%rsi)
> - .endif
> + .endif /* \ret */
> +
> + .if \saved
> + .if \ret && \host
> + /*
> + * Clear registers shared by guest for VP.ENTER to prevent
> + * speculative use of guest's values, including those are
> + * restored from the stack.
> + *
> + * See arch/x86/kvm/vmx/vmenter.S:
> + *
> + * In theory, a L1 cache miss when restoring register from stack
> + * could lead to speculative execution with guest's values.
> + *
> + * Note: RBP/RSP are not used as shared register. RSI has been
> + * restored already.
> + *
> + * XOR is cheap, thus unconditionally do for all leafs.
> + */
> + xorq %rcx, %rcx
> + xorq %rdx, %rdx
> + xorq %r8, %r8
> + xorq %r9, %r9
> + xorq %r10, %r10
> + xorq %r11, %r11

> + xorq %r12, %r12
> + xorq %r13, %r13
> + xorq %r14, %r14
> + xorq %r15, %r15
> + xorq %rbx, %rbx

^ those are an instant pop below, seems daft to clear them.

> + xorq %rdi, %rdi
> + .endif /* \ret && \host */
> +
> + /* Restore callee-saved GPRs as mandated by the x86_64 ABI */
> + popq %r15
> + popq %r14
> + popq %r13
> + popq %r12
> + popq %rbx
> + .endif /* \saved */
>
> FRAME_END
> RET
> --
> 2.41.0
>

Subject: Re: [PATCH 02/10] x86/tdx: Use cmovc to save a label in TDX_MODULE_CALL asm



On 7/12/23 1:55 AM, Kai Huang wrote:
> Change 'jnc .Lno_vmfailinvalid' to 'cmovc %rdi, %rax' to save the
> .Lno_vmfailinvalid label in the TDX_MODULE_CALL asm macro.

You are removing the label, right? What use "save"?

>
> Note %rdi, which is used as the first argument, has been saved to %rax
> as TDCALL leaf ID thus is free to hold the error code for cmovc.

>
> This is basically based on Peter's code.
>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Kai Huang <[email protected]>
> ---
> arch/x86/virt/vmx/tdx/tdxcall.S | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
> index 49a54356ae99..3524915d8bd9 100644
> --- a/arch/x86/virt/vmx/tdx/tdxcall.S
> +++ b/arch/x86/virt/vmx/tdx/tdxcall.S
> @@ -57,10 +57,8 @@
> * This value will never be used as actual SEAMCALL error code as
> * it is from the Reserved status code class.
> */
> - jnc .Lno_vmfailinvalid
> - mov $TDX_SEAMCALL_VMFAILINVALID, %rax
> -.Lno_vmfailinvalid:
> -
> + mov $TDX_SEAMCALL_VMFAILINVALID, %rdi
> + cmovc %rdi, %rax
> .else
> tdcall
> .endif

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

Subject: Re: [PATCH 01/10] x86/tdx: Zero out the missing RSI in TDX_HYPERCALL macro



On 7/12/23 1:55 AM, Kai Huang wrote:
> In the TDX_HYPERCALL asm, after the TDCALL instruction returns from the
> untrusted VMM, the registers that the TDX guest shares to the VMM need
> to be cleared to avoid speculative execution of VMM-provided values.
>
> RSI is specified in the bitmap of those registers, but it is missing
> when zeroing out those registers in the current TDX_HYPERCALL.
>
> It was there when it was originally added in commit 752d13305c78
> ("x86/tdx: Expand __tdx_hypercall() to handle more arguments"), but was
> later removed in commit 1e70c680375a ("x86/tdx: Do not corrupt
> frame-pointer in __tdx_hypercall()"), which was correct because %rsi is
> later restored in the "pop %rsi". However a later commit 7a3a401874be
> ("x86/tdx: Drop flags from __tdx_hypercall()") removed that "pop %rsi"
> but forgot to add the "xor %rsi, %rsi" back.
>
> Fix by adding it back.
>
> Fixes: 7a3a401874be ("x86/tdx: Drop flags from __tdx_hypercall()")
> Signed-off-by: Kai Huang <[email protected]>
> ---

Looks fine to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>

> arch/x86/coco/tdx/tdcall.S | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S
> index b193c0a1d8db..2eca5f43734f 100644
> --- a/arch/x86/coco/tdx/tdcall.S
> +++ b/arch/x86/coco/tdx/tdcall.S
> @@ -195,6 +195,7 @@ SYM_FUNC_END(__tdx_module_call)
> xor %r10d, %r10d
> xor %r11d, %r11d
> xor %rdi, %rdi
> + xor %rsi, %rsi
> xor %rdx, %rdx
>
> /* Restore callee-saved GPRs as mandated by the x86_64 ABI */

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

Subject: Re: [PATCH 03/10] x86/tdx: Move FRAME_BEGIN/END to TDX_MODULE_CALL asm macro



On 7/12/23 1:55 AM, Kai Huang wrote:
> Currently, the TDX_MODULE_CALL asm macro and the __tdx_module_call()
> take registers directly as input and a 'struct tdx_module_output' as
> optional output. This is different from the __tdx_hypercall(), which
> simply uses a structure to carry all input/output. There's no point to
> leave __tdx_module_call() complicated as it is.
>
> As a preparation to simplify the __tdx_module_call() to make it look
> like __tdx_hypercall(), move FRAME_BEGIN/END and RET from the
> __tdx_module_call() to the TDX_MODULE_CALL assembly macro. This also
> allows more implementation flexibility of the assembly inside the
> TDX_MODULE_CALL macro, e.g., allowing putting an _ASM_EXTABLE() after
> the main body of the assembly.
>
> This is basically based on Peter's code.
>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Kai Huang <[email protected]>
> ---

Looks fine to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>

> arch/x86/coco/tdx/tdcall.S | 3 ---
> arch/x86/virt/vmx/tdx/tdxcall.S | 5 +++++
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S
> index 2eca5f43734f..e5d4b7d8ecd4 100644
> --- a/arch/x86/coco/tdx/tdcall.S
> +++ b/arch/x86/coco/tdx/tdcall.S
> @@ -78,10 +78,7 @@
> * Return status of TDCALL via RAX.
> */
> SYM_FUNC_START(__tdx_module_call)
> - FRAME_BEGIN
> TDX_MODULE_CALL host=0
> - FRAME_END
> - RET
> SYM_FUNC_END(__tdx_module_call)
>
> /*
> diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
> index 3524915d8bd9..b5ab919c7fa8 100644
> --- a/arch/x86/virt/vmx/tdx/tdxcall.S
> +++ b/arch/x86/virt/vmx/tdx/tdxcall.S
> @@ -1,5 +1,6 @@
> /* SPDX-License-Identifier: GPL-2.0 */
> #include <asm/asm-offsets.h>
> +#include <asm/frame.h>
> #include <asm/tdx.h>
>
> /*
> @@ -18,6 +19,7 @@
> * TDX module.
> */
> .macro TDX_MODULE_CALL host:req
> + FRAME_BEGIN
> /*
> * R12 will be used as temporary storage for struct tdx_module_output
> * pointer. Since R12-R15 registers are not used by TDCALL/SEAMCALL
> @@ -91,4 +93,7 @@
> .Lno_output_struct:
> /* Restore the state of R12 register */
> pop %r12
> +
> + FRAME_END
> + RET
> .endm

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2023-07-12 22:31:16

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH 03/10] x86/tdx: Move FRAME_BEGIN/END to TDX_MODULE_CALL asm macro

On Wed, Jul 12, 2023 at 08:55:17PM +1200,
Kai Huang <[email protected]> wrote:

> Currently, the TDX_MODULE_CALL asm macro and the __tdx_module_call()
> take registers directly as input and a 'struct tdx_module_output' as
> optional output. This is different from the __tdx_hypercall(), which
> simply uses a structure to carry all input/output. There's no point to
> leave __tdx_module_call() complicated as it is.
>
> As a preparation to simplify the __tdx_module_call() to make it look
> like __tdx_hypercall(), move FRAME_BEGIN/END and RET from the
> __tdx_module_call() to the TDX_MODULE_CALL assembly macro. This also
> allows more implementation flexibility of the assembly inside the
> TDX_MODULE_CALL macro, e.g., allowing putting an _ASM_EXTABLE() after
> the main body of the assembly.
>
> This is basically based on Peter's code.
>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Kai Huang <[email protected]>
> ---
> arch/x86/coco/tdx/tdcall.S | 3 ---
> arch/x86/virt/vmx/tdx/tdxcall.S | 5 +++++
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/coco/tdx/tdcall.S b/arch/x86/coco/tdx/tdcall.S
> index 2eca5f43734f..e5d4b7d8ecd4 100644
> --- a/arch/x86/coco/tdx/tdcall.S
> +++ b/arch/x86/coco/tdx/tdcall.S
> @@ -78,10 +78,7 @@
> * Return status of TDCALL via RAX.
> */
> SYM_FUNC_START(__tdx_module_call)
> - FRAME_BEGIN
> TDX_MODULE_CALL host=0
> - FRAME_END
> - RET
> SYM_FUNC_END(__tdx_module_call)
>
> /*
> diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
> index 3524915d8bd9..b5ab919c7fa8 100644
> --- a/arch/x86/virt/vmx/tdx/tdxcall.S
> +++ b/arch/x86/virt/vmx/tdx/tdxcall.S
> @@ -1,5 +1,6 @@
> /* SPDX-License-Identifier: GPL-2.0 */
> #include <asm/asm-offsets.h>
> +#include <asm/frame.h>
> #include <asm/tdx.h>
>
> /*
> @@ -18,6 +19,7 @@
> * TDX module.
> */
> .macro TDX_MODULE_CALL host:req
> + FRAME_BEGIN
> /*
> * R12 will be used as temporary storage for struct tdx_module_output
> * pointer. Since R12-R15 registers are not used by TDCALL/SEAMCALL
> @@ -91,4 +93,7 @@
> .Lno_output_struct:
> /* Restore the state of R12 register */
> pop %r12
> +
> + FRAME_END
> + RET
> .endm
> --
> 2.41.0
>

Reviewed-by: Isaku Yamahata <[email protected]>
--
Isaku Yamahata <[email protected]>

2023-07-12 22:49:06

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH 09/10] x86/virt/tdx: Wire up basic SEAMCALL functions

On Wed, Jul 12, 2023 at 08:55:23PM +1200,
Kai Huang <[email protected]> wrote:

> Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> host and certain physical attacks. A CPU-attested software module
> called 'the TDX module' runs inside a new isolated memory range as a
> trusted hypervisor to manage and run protected VMs.
>
> TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This
> mode runs only the TDX module itself or other code to load the TDX
> module.
>
> The host kernel communicates with SEAM software via a new SEAMCALL
> instruction. This is conceptually similar to a guest->host hypercall,
> except it is made from the host to SEAM software instead. The TDX
> module establishes a new SEAMCALL ABI which allows the host to
> initialize the module and to manage VMs.
>
> The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
> TDCALL infrastructure. Wire up basic functions to make SEAMCALLs for
> the basic TDX support: __seamcall(), __seamcall_ret() and
> __seamcall_saved_ret() which is for TDH.VP.ENTER leaf function.

Hi. __seamcall_saved_ret() uses struct tdx_module_arg as input and output. For
KVM TDH.VP.ENTER case, those arguments are already in unsigned long
kvm_vcpu_arch::regs[]. It's silly to move those values twice. From
kvm_vcpu_arch::regs to tdx_module_args. From tdx_module_args to real registers.

If TDH.VP.ENTER is the only user of __seamcall_saved_ret(), can we make it to
take unsigned long kvm_vcpu_argh::regs[NR_VCPU_REGS]? Maybe I can make the
change with TDX KVM patch series.

Thanks,


> To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
> TDX host kernel support. Add a new Kconfig option CONFIG_INTEL_TDX_HOST
> to opt-in TDX host kernel support (to distinguish with TDX guest kernel
> support). So far only KVM uses TDX. Make the new config option depend
> on KVM_INTEL.
>
> Signed-off-by: Kai Huang <[email protected]>
> ---
> arch/x86/Kconfig | 12 +++++++
> arch/x86/Makefile | 2 ++
> arch/x86/include/asm/tdx.h | 7 +++++
> arch/x86/virt/Makefile | 2 ++
> arch/x86/virt/vmx/Makefile | 2 ++
> arch/x86/virt/vmx/tdx/Makefile | 2 ++
> arch/x86/virt/vmx/tdx/seamcall.S | 54 ++++++++++++++++++++++++++++++++
> 7 files changed, 81 insertions(+)
> create mode 100644 arch/x86/virt/Makefile
> create mode 100644 arch/x86/virt/vmx/Makefile
> create mode 100644 arch/x86/virt/vmx/tdx/Makefile
> create mode 100644 arch/x86/virt/vmx/tdx/seamcall.S
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 53bab123a8ee..191587f75810 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1952,6 +1952,18 @@ config X86_SGX
>
> If unsure, say N.
>
> +config INTEL_TDX_HOST
> + bool "Intel Trust Domain Extensions (TDX) host support"
> + depends on CPU_SUP_INTEL
> + depends on X86_64
> + depends on KVM_INTEL
> + help
> + Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
> + host and certain physical attacks. This option enables necessary TDX
> + support in the host kernel to run confidential VMs.
> +
> + If unsure, say N.
> +
> config EFI
> bool "EFI runtime service support"
> depends on ACPI
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index b39975977c03..ec0e71d8fa30 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -252,6 +252,8 @@ archheaders:
>
> libs-y += arch/x86/lib/
>
> +core-y += arch/x86/virt/
> +
> # drivers-y are linked after core-y
> drivers-$(CONFIG_MATH_EMULATION) += arch/x86/math-emu/
> drivers-$(CONFIG_PCI) += arch/x86/pci/
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 9b0ad0176e58..a82e5249d079 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -74,5 +74,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
> return -ENODEV;
> }
> #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
> +
> +#ifdef CONFIG_INTEL_TDX_HOST
> +u64 __seamcall(u64 fn, struct tdx_module_args *args);
> +u64 __seamcall_ret(u64 fn, struct tdx_module_args *args);
> +u64 __seamcall_saved_ret(u64 fn, struct tdx_module_args *args);
> +#endif /* CONFIG_INTEL_TDX_HOST */
> +
> #endif /* !__ASSEMBLY__ */
> #endif /* _ASM_X86_TDX_H */
> diff --git a/arch/x86/virt/Makefile b/arch/x86/virt/Makefile
> new file mode 100644
> index 000000000000..1e36502cd738
> --- /dev/null
> +++ b/arch/x86/virt/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-y += vmx/
> diff --git a/arch/x86/virt/vmx/Makefile b/arch/x86/virt/vmx/Makefile
> new file mode 100644
> index 000000000000..feebda21d793
> --- /dev/null
> +++ b/arch/x86/virt/vmx/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_INTEL_TDX_HOST) += tdx/
> diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
> new file mode 100644
> index 000000000000..46ef8f73aebb
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-y += seamcall.o
> diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamcall.S
> new file mode 100644
> index 000000000000..650a40843afe
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/seamcall.S
> @@ -0,0 +1,54 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#include <linux/linkage.h>
> +#include <asm/frame.h>
> +
> +#include "tdxcall.S"
> +
> +/*
> + * __seamcall() - Host-side interface functions to SEAM software
> + * (the P-SEAMLDR or the TDX module).
> + *
> + * __seamcall() function ABI:
> + *
> + * @fn (RDI) - SEAMCALL Leaf number, moved to RAX
> + * @args (RSI) - struct tdx_module_args for input
> + *
> + * Return (via RAX) TDX_SEAMCALL_VMFAILINVALID if the SEAMCALL itself
> + * fails, or the completion status of the SEAMCALL leaf function.
> + */
> +SYM_FUNC_START(__seamcall)
> + TDX_MODULE_CALL host=1 ret=0 saved=0
> +SYM_FUNC_END(__seamcall)
> +
> +/*
> + * __seamcall_ret() - Host-side interface functions to SEAM software
> + * (the P-SEAMLDR or the TDX module).
> + *
> + * __seamcall_ret() function ABI:
> + *
> + * @fn (RDI) - SEAMCALL Leaf number, moved to RAX
> + * @args (RSI) - struct tdx_module_args for input and output
> + *
> + * Return (via RAX) TDX_SEAMCALL_VMFAILINVALID if the SEAMCALL itself
> + * fails, or the completion status of the SEAMCALL leaf function.
> + */
> +SYM_FUNC_START(__seamcall_ret)
> + TDX_MODULE_CALL host=1 ret=1 saved=0
> +SYM_FUNC_END(__seamcall_ret)
> +
> +/*
> + * __seamcall_saved_ret() - Host-side interface functions to SEAM software
> + * (the P-SEAMLDR or the TDX module) with extra
> + * "callee-saved" registers as input/output.
> + *
> + * __seamcall_saved_ret() function ABI:
> + *
> + * @fn (RDI) - SEAMCALL Leaf number, moved to RAX
> + * @args (RSI) - struct tdx_module_args for input and output
> + *
> + * Return (via RAX) TDX_SEAMCALL_VMFAILINVALID if the SEAMCALL itself
> + * fails, or the completion status of the SEAMCALL leaf function.
> + */
> +SYM_FUNC_START(__seamcall_saved_ret)
> + TDX_MODULE_CALL host=1 ret=1 saved=0
> +SYM_FUNC_END(__seamcall_saved_ret)
> --
> 2.41.0
>

--
Isaku Yamahata <[email protected]>

2023-07-13 04:23:48

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 09/10] x86/virt/tdx: Wire up basic SEAMCALL functions

On Wed, 2023-07-12 at 15:15 -0700, Isaku Yamahata wrote:
> > The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
> > TDCALL infrastructure.  Wire up basic functions to make SEAMCALLs for
> > the basic TDX support: __seamcall(), __seamcall_ret() and
> > __seamcall_saved_ret() which is for TDH.VP.ENTER leaf function.
>
> Hi.  __seamcall_saved_ret() uses struct tdx_module_arg as input and output.  For
> KVM TDH.VP.ENTER case, those arguments are already in unsigned long
> kvm_vcpu_arch::regs[].  It's silly to move those values twice.  From
> kvm_vcpu_arch::regs to tdx_module_args.  From tdx_module_args to real registers.
>
> If TDH.VP.ENTER is the only user of __seamcall_saved_ret(), can we make it to
> take unsigned long kvm_vcpu_argh::regs[NR_VCPU_REGS]?  Maybe I can make the
> change with TDX KVM patch series.

The assembly code assumes the second argument is a pointer to 'struct
tdx_module_args'. I don't know how can we change __seamcall_saved_ret() to
achieve what you said. We might change the kvm_vcpu_argh::regs[NR_VCPU_REGS] to
match 'struct tdx_module_args''s layout and manually convert part of "regs" to
the structure and pass to __seamcall_saved_ret(), but it's too hacky I suppose.

This was one concern that I mentioned VP.ENTER can be implemented by KVM in its
own assembly in the TDX host v12 discussion. I kinda agree we should leverage
KVM's existing kvm_vcpu_arch::regs[NR_CPU_REGS] infrastructure to minimize the
code change to the KVM's common infrastructure. If so, I guess we have to carry
this memory copy burden between two structures.

Btw, I do find KVM's VP.ENTER code is a little bit redundant to the common
SEAMCALL assembly, which is a good reason for KVM to use __seamcall() variants
for TDH.VP.ENTER.

So it's a tradeoff I think.

On the other hand, given CoCo VMs normally don't expose all GPRs to VMM, it's
also debatable whether we should invent another infrastructure to the KVM code
to handle register access of CoCo VMs too, e.g., we can catch bugs easily when
KVM tries to access the registers that it shouldn't access.

It's better KVM maintainer can provide some input here. :)

2023-07-13 07:56:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 09/10] x86/virt/tdx: Wire up basic SEAMCALL functions

On Thu, Jul 13, 2023 at 03:46:52AM +0000, Huang, Kai wrote:
> On Wed, 2023-07-12 at 15:15 -0700, Isaku Yamahata wrote:
> > > The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
> > > TDCALL infrastructure.? Wire up basic functions to make SEAMCALLs for
> > > the basic TDX support: __seamcall(), __seamcall_ret() and
> > > __seamcall_saved_ret() which is for TDH.VP.ENTER leaf function.
> >
> > Hi.? __seamcall_saved_ret() uses struct tdx_module_arg as input and output.? For
> > KVM TDH.VP.ENTER case, those arguments are already in unsigned long
> > kvm_vcpu_arch::regs[].? It's silly to move those values twice.? From
> > kvm_vcpu_arch::regs to tdx_module_args.? From tdx_module_args to real registers.
> >
> > If TDH.VP.ENTER is the only user of __seamcall_saved_ret(), can we make it to
> > take unsigned long kvm_vcpu_argh::regs[NR_VCPU_REGS]?? Maybe I can make the
> > change with TDX KVM patch series.
>
> The assembly code assumes the second argument is a pointer to 'struct
> tdx_module_args'. I don't know how can we change __seamcall_saved_ret() to
> achieve what you said. We might change the kvm_vcpu_argh::regs[NR_VCPU_REGS] to
> match 'struct tdx_module_args''s layout and manually convert part of "regs" to
> the structure and pass to __seamcall_saved_ret(), but it's too hacky I suppose.

I suspect the kvm_vcpu_arch::regs layout is given by hardware; so the
only option would be to make tdx_module_args match that. It's a slightly
unfortunate layout, but meh.

Then you can simply do:

__seamcall_saved_ret(leaf, (struct tdx_module_args *)vcpu->arch->regs);



2023-07-13 07:57:35

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Wed, 2023-07-12 at 18:53 +0200, Peter Zijlstra wrote:
> On Wed, Jul 12, 2023 at 08:55:21PM +1200, Kai Huang wrote:
>
>
> > @@ -72,7 +142,46 @@
> > movq %r9, TDX_MODULE_r9(%rsi)
> > movq %r10, TDX_MODULE_r10(%rsi)
> > movq %r11, TDX_MODULE_r11(%rsi)
> > - .endif
> > + .endif /* \ret */
> > +
> > + .if \saved
> > + .if \ret && \host
> > + /*
> > + * Clear registers shared by guest for VP.ENTER to prevent
> > + * speculative use of guest's values, including those are
> > + * restored from the stack.
> > + *
> > + * See arch/x86/kvm/vmx/vmenter.S:
> > + *
> > + * In theory, a L1 cache miss when restoring register from stack
> > + * could lead to speculative execution with guest's values.
> > + *
> > + * Note: RBP/RSP are not used as shared register. RSI has been
> > + * restored already.
> > + *
> > + * XOR is cheap, thus unconditionally do for all leafs.
> > + */
> > + xorq %rcx, %rcx
> > + xorq %rdx, %rdx
> > + xorq %r8, %r8
> > + xorq %r9, %r9
> > + xorq %r10, %r10
> > + xorq %r11, %r11
>
> > + xorq %r12, %r12
> > + xorq %r13, %r13
> > + xorq %r14, %r14
> > + xorq %r15, %r15
> > + xorq %rbx, %rbx
>
> ^ those are an instant pop below, seems daft to clear them.
> >

I found below comment in KVM code:

> + * See arch/x86/kvm/vmx/vmenter.S:
> + *
> + * In theory, a L1 cache miss when restoring register from stack
> + * could lead to speculative execution with guest's values.

And KVM explicitly does XOR for the registers that gets "pop"ed almost
instantly, so I followed.

But to be honest I don't quite understand this. :-)

2023-07-13 08:14:04

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Wed, 2023-07-12 at 18:59 +0200, Peter Zijlstra wrote:
> On Wed, Jul 12, 2023 at 06:53:37PM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 12, 2023 at 08:55:21PM +1200, Kai Huang wrote:
> >
> >
> > > @@ -72,7 +142,46 @@
> > > movq %r9, TDX_MODULE_r9(%rsi)
> > > movq %r10, TDX_MODULE_r10(%rsi)
> > > movq %r11, TDX_MODULE_r11(%rsi)
> > > - .endif
> > > + .endif /* \ret */
> > > +
> > > + .if \saved
> > > + .if \ret && \host
> > > + /*
> > > + * Clear registers shared by guest for VP.ENTER to prevent
> > > + * speculative use of guest's values, including those are
> > > + * restored from the stack.
> > > + *
> > > + * See arch/x86/kvm/vmx/vmenter.S:
> > > + *
> > > + * In theory, a L1 cache miss when restoring register from stack
> > > + * could lead to speculative execution with guest's values.
> > > + *
> > > + * Note: RBP/RSP are not used as shared register. RSI has been
> > > + * restored already.
> > > + *
> > > + * XOR is cheap, thus unconditionally do for all leafs.
> > > + */
> > > + xorq %rcx, %rcx
> > > + xorq %rdx, %rdx
> > > + xorq %r8, %r8
> > > + xorq %r9, %r9
> > > + xorq %r10, %r10
> > > + xorq %r11, %r11
> >
> > > + xorq %r12, %r12
> > > + xorq %r13, %r13
> > > + xorq %r14, %r14
> > > + xorq %r15, %r15
> > > + xorq %rbx, %rbx
> >
> > ^ those are an instant pop below, seems daft to clear them.
>
> Also, please use the 32bit variant:
>
> xorl %ecx, %ecx
>
> saves a RAX prefix each.

Sorry I am ignorant here. Won't "clearing ECX only" leave high bits of
registers still containing guest's value?

I see KVM code uses:

xor %eax, %eax
xor %ecx, %ecx
xor %edx, %edx
xor %ebp, %ebp
xor %esi, %esi
xor %edi, %edi
#ifdef CONFIG_X86_64
xor %r8d, %r8d
xor %r9d, %r9d
xor %r10d, %r10d
xor %r11d, %r11d
xor %r12d, %r12d
xor %r13d, %r13d
xor %r14d, %r14d
xor %r15d, %r15d
#endif

Which makes sense because KVM wants to support 32-bit too.

However for TDX is 64-bit only.

And I also see the current TDVMCALL code has:

xor %r8d, %r8d
xor %r9d, %r9d
xor %r10d, %r10d
xor %r11d, %r11d
xor %rdi, %rdi
xor %rdx, %rdx

Why does it need to use "d" postfix for all r* registers?

Sorry for those questions but I struggled when I wrote those assembly and am
hoping to get my mind cleared on this. :-)

Thanks!

2023-07-13 08:34:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 10/10] x86/virt/tdx: Allow SEAMCALL to handle #UD and #GP

On Wed, Jul 12, 2023 at 08:55:24PM +1200, Kai Huang wrote:
> @@ -85,6 +86,7 @@
> .endif /* \saved */
>
> .if \host
> +1:
> seamcall
> /*
> * SEAMCALL instruction is essentially a VMExit from VMX root
> @@ -99,6 +101,7 @@
> */
> mov $TDX_SEAMCALL_VMFAILINVALID, %rdi
> cmovc %rdi, %rax
> +2:
> .else
> tdcall
> .endif

This is just wrong, if the thing traps you should not do the return
registers. And at this point the mov/cmovc thing doesn't make much sense
either.

> @@ -185,4 +188,21 @@
>
> FRAME_END
> RET
> +
> + .if \host
> +3:
> + /*
> + * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> + * the trap number. Convert the trap number to the TDX error
> + * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
> + *
> + * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
> + * only accepts 32-bit immediate at most.
> + */
> + movq $TDX_SW_ERROR, %r12
> + orq %r12, %rax
> + jmp 2b
> +
> + _ASM_EXTABLE_FAULT(1b, 3b)
> + .endif /* \host */
> .endm

Also, please used named labels where possible and *please* keep asm
directives unindented.


--- a/arch/x86/virt/vmx/tdx/tdxcall.S
+++ b/arch/x86/virt/vmx/tdx/tdxcall.S
@@ -56,7 +56,7 @@
movq TDX_MODULE_r10(%rsi), %r10
movq TDX_MODULE_r11(%rsi), %r11

- .if \saved
+.if \saved
/*
* Move additional input regs from the structure. For simplicity
* assume that anything needs the callee-saved regs also tramples
@@ -75,18 +75,18 @@
movq TDX_MODULE_r15(%rsi), %r15
movq TDX_MODULE_rbx(%rsi), %rbx

- .if \ret
+.if \ret
/* Save the structure pointer as %rsi is about to be clobbered */
pushq %rsi
- .endif
+.endif

movq TDX_MODULE_rdi(%rsi), %rdi
/* %rsi needs to be done at last */
movq TDX_MODULE_rsi(%rsi), %rsi
- .endif /* \saved */
+.endif /* \saved */

- .if \host
-1:
+.if \host
+.Lseamcall:
seamcall
/*
* SEAMCALL instruction is essentially a VMExit from VMX root
@@ -99,15 +99,13 @@
* This value will never be used as actual SEAMCALL error code as
* it is from the Reserved status code class.
*/
- mov $TDX_SEAMCALL_VMFAILINVALID, %rdi
- cmovc %rdi, %rax
-2:
- .else
+ jc .Lseamfail
+.else
tdcall
- .endif
+.endif

- .if \ret
- .if \saved
+.if \ret
+.if \saved
/*
* Restore the structure from stack to saved the output registers
*
@@ -136,7 +134,7 @@
movq %r15, TDX_MODULE_r15(%rsi)
movq %rbx, TDX_MODULE_rbx(%rsi)
movq %rdi, TDX_MODULE_rdi(%rsi)
- .endif /* \saved */
+.endif /* \saved */

/* Copy output regs to the structure */
movq %rcx, TDX_MODULE_rcx(%rsi)
@@ -145,10 +143,11 @@
movq %r9, TDX_MODULE_r9(%rsi)
movq %r10, TDX_MODULE_r10(%rsi)
movq %r11, TDX_MODULE_r11(%rsi)
- .endif /* \ret */
+.endif /* \ret */

- .if \saved
- .if \ret
+.Lout:
+.if \saved
+.if \ret
/*
* Clear registers shared by guest for VP.ENTER and VP.VMCALL to
* prevent speculative use of values from guest/VMM, including
@@ -170,13 +169,8 @@
xorq %r9, %r9
xorq %r10, %r10
xorq %r11, %r11
- xorq %r12, %r12
- xorq %r13, %r13
- xorq %r14, %r14
- xorq %r15, %r15
- xorq %rbx, %rbx
xorq %rdi, %rdi
- .endif /* \ret */
+.endif /* \ret */

/* Restore callee-saved GPRs as mandated by the x86_64 ABI */
popq %r15
@@ -184,13 +178,17 @@
popq %r13
popq %r12
popq %rbx
- .endif /* \saved */
+.endif /* \saved */

FRAME_END
RET

- .if \host
-3:
+.if \host
+.Lseamfail:
+ mov $TDX_SEAMCALL_VMFAILINVALID, %rax
+ jmp .Lout
+
+.Lseamtrap:
/*
* SEAMCALL caused #GP or #UD. By reaching here %eax contains
* the trap number. Convert the trap number to the TDX error
@@ -201,8 +199,8 @@
*/
movq $TDX_SW_ERROR, %r12
orq %r12, %rax
- jmp 2b
+ jmp .Lout

- _ASM_EXTABLE_FAULT(1b, 3b)
- .endif /* \host */
+ _ASM_EXTABLE_FAULT(.Lseamcall, .Lseamtrap)
+.endif /* \host */
.endm

2023-07-13 08:36:14

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Wed, 2023-07-12 at 19:11 +0200, Peter Zijlstra wrote:
> On Wed, Jul 12, 2023 at 08:55:21PM +1200, Kai Huang wrote:
> > @@ -65,6 +104,37 @@
> > .endif
> >
> > .if \ret
> > + .if \saved
> > + /*
> > + * Restore the structure from stack to saved the output registers
> > + *
> > + * In case of VP.ENTER returns due to TDVMCALL, all registers are
> > + * valid thus no register can be used as spare to restore the
> > + * structure from the stack (see "TDH.VP.ENTER Output Operands
> > + * Definition on TDCALL(TDG.VP.VMCALL) Following a TD Entry").
> > + * For this case, need to make one register as spare by saving it
> > + * to the stack and then manually load the structure pointer to
> > + * the spare register.
> > + *
> > + * Note for other TDCALLs/SEAMCALLs there are spare registers
> > + * thus no need for such hack but just use this for all for now.
> > + */
> > + pushq %rax /* save the TDCALL/SEAMCALL return code */
> > + movq 8(%rsp), %rax /* restore the structure pointer */
> > + movq %rsi, TDX_MODULE_rsi(%rax) /* save %rsi */
> > + movq %rax, %rsi /* use %rsi as structure pointer */
> > + popq %rax /* restore the return code */
> > + popq %rsi /* pop the structure pointer */
>
> Urgghh... At least for the \host case you can simply pop %rsi, no?
> VP.ENTER returns with 0 there IIRC.

No VP.ENTER doesn't return 0 for RAX. Firstly, VP.ENTER can return for many
reasons but not limited to TDVMCALL (e.g., due to interrupt), and for those
cases RAX contains valid non-zero value. Secondly, even for TDVMCALL, the RAX
isn't 0:

Table 6.256 TDX 1.5 ABI spec:

RAX SEAMCALL instruction return code
The DETAILS_L2 field in bits 31:0 contain the VMCS exit reason, 
indicating TDCALL (77).

2023-07-13 08:46:33

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 09/10] x86/virt/tdx: Wire up basic SEAMCALL functions

On Thu, 2023-07-13 at 09:42 +0200, Peter Zijlstra wrote:
> On Thu, Jul 13, 2023 at 03:46:52AM +0000, Huang, Kai wrote:
> > On Wed, 2023-07-12 at 15:15 -0700, Isaku Yamahata wrote:
> > > > The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
> > > > TDCALL infrastructure.  Wire up basic functions to make SEAMCALLs for
> > > > the basic TDX support: __seamcall(), __seamcall_ret() and
> > > > __seamcall_saved_ret() which is for TDH.VP.ENTER leaf function.
> > >
> > > Hi.  __seamcall_saved_ret() uses struct tdx_module_arg as input and output.  For
> > > KVM TDH.VP.ENTER case, those arguments are already in unsigned long
> > > kvm_vcpu_arch::regs[].  It's silly to move those values twice.  From
> > > kvm_vcpu_arch::regs to tdx_module_args.  From tdx_module_args to real registers.
> > >
> > > If TDH.VP.ENTER is the only user of __seamcall_saved_ret(), can we make it to
> > > take unsigned long kvm_vcpu_argh::regs[NR_VCPU_REGS]?  Maybe I can make the
> > > change with TDX KVM patch series.
> >
> > The assembly code assumes the second argument is a pointer to 'struct
> > tdx_module_args'. I don't know how can we change __seamcall_saved_ret() to
> > achieve what you said. We might change the kvm_vcpu_argh::regs[NR_VCPU_REGS] to
> > match 'struct tdx_module_args''s layout and manually convert part of "regs" to
> > the structure and pass to __seamcall_saved_ret(), but it's too hacky I suppose.
>
> I suspect the kvm_vcpu_arch::regs layout is given by hardware; so the
> only option would be to make tdx_module_args match that. It's a slightly
> unfortunate layout, but meh.
>
> Then you can simply do:
>
> __seamcall_saved_ret(leaf, (struct tdx_module_args *)vcpu->arch->regs);
>
>

I don't think the layout matches hardware, especially I think there's no
"hardware layout" for GPRs that are concerned here. They are just for KVM
itself to save guest's registers when the guest exits to KVM, so that KVM can
restore them when returning back to the guest.

2023-07-13 09:01:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, Jul 13, 2023 at 08:02:54AM +0000, Huang, Kai wrote:

> Sorry I am ignorant here. Won't "clearing ECX only" leave high bits of
> registers still containing guest's value?

architecture zero-extends 32bit stores

> I see KVM code uses:
>
> xor %eax, %eax
> xor %ecx, %ecx
> xor %edx, %edx
> xor %ebp, %ebp
> xor %esi, %esi
> xor %edi, %edi
> #ifdef CONFIG_X86_64
> xor %r8d, %r8d
> xor %r9d, %r9d
> xor %r10d, %r10d
> xor %r11d, %r11d
> xor %r12d, %r12d
> xor %r13d, %r13d
> xor %r14d, %r14d
> xor %r15d, %r15d
> #endif
>
> Which makes sense because KVM wants to support 32-bit too.

Encoding for the first lot is shorter, the 64bit regs obviously need the
RAX byte anyway.

> However for TDX is 64-bit only.
>
> And I also see the current TDVMCALL code has:
>
> xor %r8d, %r8d
> xor %r9d, %r9d
> xor %r10d, %r10d
> xor %r11d, %r11d
> xor %rdi, %rdi
> xor %rdx, %rdx
>
> Why does it need to use "d" postfix for all r* registers?

That's the name of the 32bit subword, r#[bwd] for byte, word,
double-word. SDM v1 3.7.2.1 has the whole list, I couldn't quicky find
one for the zero-extention thing.

> Sorry for those questions but I struggled when I wrote those assembly and am
> hoping to get my mind cleared on this. :-)

No problem.



2023-07-13 09:09:53

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, Jul 13, 2023 at 07:48:20AM +0000, Huang, Kai wrote:

> I found below comment in KVM code:
>
> > + * See arch/x86/kvm/vmx/vmenter.S:
> > + *
> > + * In theory, a L1 cache miss when restoring register from stack
> > + * could lead to speculative execution with guest's values.
>
> And KVM explicitly does XOR for the registers that gets "pop"ed almost
> instantly, so I followed.
>
> But to be honest I don't quite understand this. :-)

Urgh, I suppose that actually makes sense. Since pop is a load it might
continue speculation with the previous value. Whereas the xor-clear
idiom is impossible to speculate through.

Oh well...

2023-07-13 09:11:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, Jul 13, 2023 at 08:09:33AM +0000, Huang, Kai wrote:
> On Wed, 2023-07-12 at 19:11 +0200, Peter Zijlstra wrote:
> > On Wed, Jul 12, 2023 at 08:55:21PM +1200, Kai Huang wrote:
> > > @@ -65,6 +104,37 @@
> > > .endif
> > >
> > > .if \ret
> > > + .if \saved
> > > + /*
> > > + * Restore the structure from stack to saved the output registers
> > > + *
> > > + * In case of VP.ENTER returns due to TDVMCALL, all registers are
> > > + * valid thus no register can be used as spare to restore the
> > > + * structure from the stack (see "TDH.VP.ENTER Output Operands
> > > + * Definition on TDCALL(TDG.VP.VMCALL) Following a TD Entry").
> > > + * For this case, need to make one register as spare by saving it
> > > + * to the stack and then manually load the structure pointer to
> > > + * the spare register.
> > > + *
> > > + * Note for other TDCALLs/SEAMCALLs there are spare registers
> > > + * thus no need for such hack but just use this for all for now.
> > > + */
> > > + pushq %rax /* save the TDCALL/SEAMCALL return code */
> > > + movq 8(%rsp), %rax /* restore the structure pointer */
> > > + movq %rsi, TDX_MODULE_rsi(%rax) /* save %rsi */
> > > + movq %rax, %rsi /* use %rsi as structure pointer */
> > > + popq %rax /* restore the return code */
> > > + popq %rsi /* pop the structure pointer */
> >
> > Urgghh... At least for the \host case you can simply pop %rsi, no?
> > VP.ENTER returns with 0 there IIRC.
>
> No VP.ENTER doesn't return 0 for RAX. Firstly, VP.ENTER can return for many

No, but it *does* return 0 for: RBX,RSI,RDI,R10-R15.

So for \host you can simply do:

pop %rsi
mov $0, TDX_MODULE_rsi(%rsi)

and call it a day.

2023-07-13 09:12:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 09/10] x86/virt/tdx: Wire up basic SEAMCALL functions

On Thu, Jul 13, 2023 at 08:18:09AM +0000, Huang, Kai wrote:
> On Thu, 2023-07-13 at 09:42 +0200, Peter Zijlstra wrote:
> > On Thu, Jul 13, 2023 at 03:46:52AM +0000, Huang, Kai wrote:
> > > On Wed, 2023-07-12 at 15:15 -0700, Isaku Yamahata wrote:
> > > > > The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
> > > > > TDCALL infrastructure.? Wire up basic functions to make SEAMCALLs for
> > > > > the basic TDX support: __seamcall(), __seamcall_ret() and
> > > > > __seamcall_saved_ret() which is for TDH.VP.ENTER leaf function.
> > > >
> > > > Hi.? __seamcall_saved_ret() uses struct tdx_module_arg as input and output.? For
> > > > KVM TDH.VP.ENTER case, those arguments are already in unsigned long
> > > > kvm_vcpu_arch::regs[].? It's silly to move those values twice.? From
> > > > kvm_vcpu_arch::regs to tdx_module_args.? From tdx_module_args to real registers.
> > > >
> > > > If TDH.VP.ENTER is the only user of __seamcall_saved_ret(), can we make it to
> > > > take unsigned long kvm_vcpu_argh::regs[NR_VCPU_REGS]?? Maybe I can make the
> > > > change with TDX KVM patch series.
> > >
> > > The assembly code assumes the second argument is a pointer to 'struct
> > > tdx_module_args'. I don't know how can we change __seamcall_saved_ret() to
> > > achieve what you said. We might change the kvm_vcpu_argh::regs[NR_VCPU_REGS] to
> > > match 'struct tdx_module_args''s layout and manually convert part of "regs" to
> > > the structure and pass to __seamcall_saved_ret(), but it's too hacky I suppose.
> >
> > I suspect the kvm_vcpu_arch::regs layout is given by hardware; so the
> > only option would be to make tdx_module_args match that. It's a slightly
> > unfortunate layout, but meh.
> >
> > Then you can simply do:
> >
> > __seamcall_saved_ret(leaf, (struct tdx_module_args *)vcpu->arch->regs);
> >
> >
>
> I don't think the layout matches hardware, especially I think there's no
> "hardware layout" for GPRs that are concerned here. They are just for KVM
> itself to save guest's registers when the guest exits to KVM, so that KVM can
> restore them when returning back to the guest.

Either way around it should be possible to make them match I suppose.
Ideally we get the callee-clobbered regs first, but if not, I don't
think that's too big of a problem.



2023-07-13 09:37:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, Jul 13, 2023 at 09:15:49AM +0000, Huang, Kai wrote:
> On Thu, 2023-07-13 at 11:01 +0200, Peter Zijlstra wrote:
> > On Thu, Jul 13, 2023 at 08:09:33AM +0000, Huang, Kai wrote:
> > > On Wed, 2023-07-12 at 19:11 +0200, Peter Zijlstra wrote:
> > > > On Wed, Jul 12, 2023 at 08:55:21PM +1200, Kai Huang wrote:
> > > > > @@ -65,6 +104,37 @@
> > > > > .endif
> > > > >
> > > > > .if \ret
> > > > > + .if \saved
> > > > > + /*
> > > > > + * Restore the structure from stack to saved the output registers
> > > > > + *
> > > > > + * In case of VP.ENTER returns due to TDVMCALL, all registers are
> > > > > + * valid thus no register can be used as spare to restore the
> > > > > + * structure from the stack (see "TDH.VP.ENTER Output Operands
> > > > > + * Definition on TDCALL(TDG.VP.VMCALL) Following a TD Entry").
> > > > > + * For this case, need to make one register as spare by saving it
> > > > > + * to the stack and then manually load the structure pointer to
> > > > > + * the spare register.
> > > > > + *
> > > > > + * Note for other TDCALLs/SEAMCALLs there are spare registers
> > > > > + * thus no need for such hack but just use this for all for now.
> > > > > + */
> > > > > + pushq %rax /* save the TDCALL/SEAMCALL return code */
> > > > > + movq 8(%rsp), %rax /* restore the structure pointer */
> > > > > + movq %rsi, TDX_MODULE_rsi(%rax) /* save %rsi */
> > > > > + movq %rax, %rsi /* use %rsi as structure pointer */
> > > > > + popq %rax /* restore the return code */
> > > > > + popq %rsi /* pop the structure pointer */
> > > >
> > > > Urgghh... At least for the \host case you can simply pop %rsi, no?
> > > > VP.ENTER returns with 0 there IIRC.
> > >
> > > No VP.ENTER doesn't return 0 for RAX. Firstly, VP.ENTER can return for many
> >
> > No, but it *does* return 0 for: RBX,RSI,RDI,R10-R15.
> >
> > So for \host you can simply do:
> >
> > pop %rsi
> > mov $0, TDX_MODULE_rsi(%rsi)
> >
> > and call it a day.
>
> This isn't true for the case that VP.ENTER returns due to a TDVMCALL. In that
> case RCX contains the bitmap of shared registers, and RBX/RDX/RDI/RSI/R8-R15
> contains guest value if the corresponding bit is set in RCX (RBP will be
> excluded by updating the spec I assume).
>
> Or are you suggesting we need to decode RAX to decide whether the VP.ENTER
> return is due to TDVMCALL vs other reasons, and act differently?

Urgh, no I had missed there are *TWO* tables for output :/ Who does
something like that :-(

So yeah, sucks.

2023-07-13 09:52:22

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, 2023-07-13 at 10:46 +0200, Peter Zijlstra wrote:
> On Thu, Jul 13, 2023 at 07:48:20AM +0000, Huang, Kai wrote:
>
> > I found below comment in KVM code:
> >
> > > + * See arch/x86/kvm/vmx/vmenter.S:
> > > + *
> > > + * In theory, a L1 cache miss when restoring register from stack
> > > + * could lead to speculative execution with guest's values.
> >
> > And KVM explicitly does XOR for the registers that gets "pop"ed almost
> > instantly, so I followed.
> >
> > But to be honest I don't quite understand this. :-)
>
> Urgh, I suppose that actually makes sense. Since pop is a load it might
> continue speculation with the previous value. Whereas the xor-clear
> idiom is impossible to speculate through.
>
> Oh well...

Then should I keep those registers that are "pop"ed immediately afterwards?

2023-07-13 09:53:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, Jul 13, 2023 at 09:34:24AM +0000, Huang, Kai wrote:
> On Thu, 2023-07-13 at 10:46 +0200, Peter Zijlstra wrote:
> > On Thu, Jul 13, 2023 at 07:48:20AM +0000, Huang, Kai wrote:
> >
> > > I found below comment in KVM code:
> > >
> > > > + * See arch/x86/kvm/vmx/vmenter.S:
> > > > + *
> > > > + * In theory, a L1 cache miss when restoring register from stack
> > > > + * could lead to speculative execution with guest's values.
> > >
> > > And KVM explicitly does XOR for the registers that gets "pop"ed almost
> > > instantly, so I followed.
> > >
> > > But to be honest I don't quite understand this. :-)
> >
> > Urgh, I suppose that actually makes sense. Since pop is a load it might
> > continue speculation with the previous value. Whereas the xor-clear
> > idiom is impossible to speculate through.
> >
> > Oh well...
>
> Then should I keep those registers that are "pop"ed immediately afterwards?

Yeah, I suppose so.

2023-07-13 09:53:13

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 09/10] x86/virt/tdx: Wire up basic SEAMCALL functions

On Thu, 2023-07-13 at 11:03 +0200, Peter Zijlstra wrote:
> On Thu, Jul 13, 2023 at 08:18:09AM +0000, Huang, Kai wrote:
> > On Thu, 2023-07-13 at 09:42 +0200, Peter Zijlstra wrote:
> > > On Thu, Jul 13, 2023 at 03:46:52AM +0000, Huang, Kai wrote:
> > > > On Wed, 2023-07-12 at 15:15 -0700, Isaku Yamahata wrote:
> > > > > > The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
> > > > > > TDCALL infrastructure.  Wire up basic functions to make SEAMCALLs for
> > > > > > the basic TDX support: __seamcall(), __seamcall_ret() and
> > > > > > __seamcall_saved_ret() which is for TDH.VP.ENTER leaf function.
> > > > >
> > > > > Hi.  __seamcall_saved_ret() uses struct tdx_module_arg as input and output.  For
> > > > > KVM TDH.VP.ENTER case, those arguments are already in unsigned long
> > > > > kvm_vcpu_arch::regs[].  It's silly to move those values twice.  From
> > > > > kvm_vcpu_arch::regs to tdx_module_args.  From tdx_module_args to real registers.
> > > > >
> > > > > If TDH.VP.ENTER is the only user of __seamcall_saved_ret(), can we make it to
> > > > > take unsigned long kvm_vcpu_argh::regs[NR_VCPU_REGS]?  Maybe I can make the
> > > > > change with TDX KVM patch series.
> > > >
> > > > The assembly code assumes the second argument is a pointer to 'struct
> > > > tdx_module_args'. I don't know how can we change __seamcall_saved_ret() to
> > > > achieve what you said. We might change the kvm_vcpu_argh::regs[NR_VCPU_REGS] to
> > > > match 'struct tdx_module_args''s layout and manually convert part of "regs" to
> > > > the structure and pass to __seamcall_saved_ret(), but it's too hacky I suppose.
> > >
> > > I suspect the kvm_vcpu_arch::regs layout is given by hardware; so the
> > > only option would be to make tdx_module_args match that. It's a slightly
> > > unfortunate layout, but meh.
> > >
> > > Then you can simply do:
> > >
> > > __seamcall_saved_ret(leaf, (struct tdx_module_args *)vcpu->arch->regs);
> > >
> > >
> >
> > I don't think the layout matches hardware, especially I think there's no
> > "hardware layout" for GPRs that are concerned here. They are just for KVM
> > itself to save guest's registers when the guest exits to KVM, so that KVM can
> > restore them when returning back to the guest.
>
> Either way around it should be possible to make them match I suppose.
> Ideally we get the callee-clobbered regs first, but if not, I don't
> think that's too big of a problem.
>

Yeah we can. I'll leave this to KVM TDX patches.

2023-07-13 09:57:39

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, 2023-07-13 at 11:01 +0200, Peter Zijlstra wrote:
> On Thu, Jul 13, 2023 at 08:09:33AM +0000, Huang, Kai wrote:
> > On Wed, 2023-07-12 at 19:11 +0200, Peter Zijlstra wrote:
> > > On Wed, Jul 12, 2023 at 08:55:21PM +1200, Kai Huang wrote:
> > > > @@ -65,6 +104,37 @@
> > > > .endif
> > > >
> > > > .if \ret
> > > > + .if \saved
> > > > + /*
> > > > + * Restore the structure from stack to saved the output registers
> > > > + *
> > > > + * In case of VP.ENTER returns due to TDVMCALL, all registers are
> > > > + * valid thus no register can be used as spare to restore the
> > > > + * structure from the stack (see "TDH.VP.ENTER Output Operands
> > > > + * Definition on TDCALL(TDG.VP.VMCALL) Following a TD Entry").
> > > > + * For this case, need to make one register as spare by saving it
> > > > + * to the stack and then manually load the structure pointer to
> > > > + * the spare register.
> > > > + *
> > > > + * Note for other TDCALLs/SEAMCALLs there are spare registers
> > > > + * thus no need for such hack but just use this for all for now.
> > > > + */
> > > > + pushq %rax /* save the TDCALL/SEAMCALL return code */
> > > > + movq 8(%rsp), %rax /* restore the structure pointer */
> > > > + movq %rsi, TDX_MODULE_rsi(%rax) /* save %rsi */
> > > > + movq %rax, %rsi /* use %rsi as structure pointer */
> > > > + popq %rax /* restore the return code */
> > > > + popq %rsi /* pop the structure pointer */
> > >
> > > Urgghh... At least for the \host case you can simply pop %rsi, no?
> > > VP.ENTER returns with 0 there IIRC.
> >
> > No VP.ENTER doesn't return 0 for RAX. Firstly, VP.ENTER can return for many
>
> No, but it *does* return 0 for: RBX,RSI,RDI,R10-R15.
>
> So for \host you can simply do:
>
> pop %rsi
> mov $0, TDX_MODULE_rsi(%rsi)
>
> and call it a day.

This isn't true for the case that VP.ENTER returns due to a TDVMCALL. In that
case RCX contains the bitmap of shared registers, and RBX/RDX/RDI/RSI/R8-R15
contains guest value if the corresponding bit is set in RCX (RBP will be
excluded by updating the spec I assume).

Or are you suggesting we need to decode RAX to decide whether the VP.ENTER
return is due to TDVMCALL vs other reasons, and act differently?

2023-07-13 10:11:30

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 10/10] x86/virt/tdx: Allow SEAMCALL to handle #UD and #GP

On Thu, 2023-07-13 at 10:07 +0200, Peter Zijlstra wrote:
> On Wed, Jul 12, 2023 at 08:55:24PM +1200, Kai Huang wrote:
> > @@ -85,6 +86,7 @@
> > .endif /* \saved */
> >
> > .if \host
> > +1:
> > seamcall
> > /*
> > * SEAMCALL instruction is essentially a VMExit from VMX root
> > @@ -99,6 +101,7 @@
> > */
> > mov $TDX_SEAMCALL_VMFAILINVALID, %rdi
> > cmovc %rdi, %rax
> > +2:
> > .else
> > tdcall
> > .endif
>
> This is just wrong, if the thing traps you should not do the return
> registers. And at this point the mov/cmovc thing doesn't make much sense
> either.

OK will do. Yes "do return registers" isn't necessary. I thought to keep code
simple we can just do it. The trap/VMFAILINVALID code path isn't in performance
path anyway.

This is a problem in the current upstream code too. I'll fix it first in a
separate patch.

>
> > @@ -185,4 +188,21 @@
> >
> > FRAME_END
> > RET
> > +
> > + .if \host
> > +3:
> > + /*
> > + * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> > + * the trap number. Convert the trap number to the TDX error
> > + * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
> > + *
> > + * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
> > + * only accepts 32-bit immediate at most.
> > + */
> > + movq $TDX_SW_ERROR, %r12
> > + orq %r12, %rax
> > + jmp 2b
> > +
> > + _ASM_EXTABLE_FAULT(1b, 3b)
> > + .endif /* \host */
> > .endm
>
> Also, please used named labels where possible and *please* keep asm
> directives unindented.

Yes will do.

>
>
> --- a/arch/x86/virt/vmx/tdx/tdxcall.S
> +++ b/arch/x86/virt/vmx/tdx/tdxcall.S
> @@ -56,7 +56,7 @@
> movq TDX_MODULE_r10(%rsi), %r10
> movq TDX_MODULE_r11(%rsi), %r11
>
> - .if \saved
> +.if \saved
> /*
> * Move additional input regs from the structure. For simplicity
> * assume that anything needs the callee-saved regs also tramples
> @@ -75,18 +75,18 @@
> movq TDX_MODULE_r15(%rsi), %r15
> movq TDX_MODULE_rbx(%rsi), %rbx
>
> - .if \ret
> +.if \ret
> /* Save the structure pointer as %rsi is about to be clobbered */
> pushq %rsi
> - .endif
> +.endif
>
> movq TDX_MODULE_rdi(%rsi), %rdi
> /* %rsi needs to be done at last */
> movq TDX_MODULE_rsi(%rsi), %rsi
> - .endif /* \saved */
> +.endif /* \saved */
>
> - .if \host
> -1:
> +.if \host
> +.Lseamcall:
> seamcall
> /*
> * SEAMCALL instruction is essentially a VMExit from VMX root
> @@ -99,15 +99,13 @@
> * This value will never be used as actual SEAMCALL error code as
> * it is from the Reserved status code class.
> */
> - mov $TDX_SEAMCALL_VMFAILINVALID, %rdi
> - cmovc %rdi, %rax
> -2:
> - .else
> + jc .Lseamfail
> +.else
> tdcall
> - .endif
> +.endif
>
> - .if \ret
> - .if \saved
> +.if \ret
> +.if \saved
> /*
> * Restore the structure from stack to saved the output registers
> *
> @@ -136,7 +134,7 @@
> movq %r15, TDX_MODULE_r15(%rsi)
> movq %rbx, TDX_MODULE_rbx(%rsi)
> movq %rdi, TDX_MODULE_rdi(%rsi)
> - .endif /* \saved */
> +.endif /* \saved */
>
> /* Copy output regs to the structure */
> movq %rcx, TDX_MODULE_rcx(%rsi)
> @@ -145,10 +143,11 @@
> movq %r9, TDX_MODULE_r9(%rsi)
> movq %r10, TDX_MODULE_r10(%rsi)
> movq %r11, TDX_MODULE_r11(%rsi)
> - .endif /* \ret */
> +.endif /* \ret */
>
> - .if \saved
> - .if \ret
> +.Lout:
> +.if \saved
> +.if \ret
> /*
> * Clear registers shared by guest for VP.ENTER and VP.VMCALL to
> * prevent speculative use of values from guest/VMM, including
> @@ -170,13 +169,8 @@
> xorq %r9, %r9
> xorq %r10, %r10
> xorq %r11, %r11
> - xorq %r12, %r12
> - xorq %r13, %r13
> - xorq %r14, %r14
> - xorq %r15, %r15
> - xorq %rbx, %rbx
> xorq %rdi, %rdi
> - .endif /* \ret */
> +.endif /* \ret */
>
> /* Restore callee-saved GPRs as mandated by the x86_64 ABI */
> popq %r15
> @@ -184,13 +178,17 @@
> popq %r13
> popq %r12
> popq %rbx
> - .endif /* \saved */
> +.endif /* \saved */
>
> FRAME_END
> RET
>
> - .if \host
> -3:
> +.if \host
> +.Lseamfail:
> + mov $TDX_SEAMCALL_VMFAILINVALID, %rax
> + jmp .Lout
> +
> +.Lseamtrap:
> /*
> * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> * the trap number. Convert the trap number to the TDX error
> @@ -201,8 +199,8 @@
> */
> movq $TDX_SW_ERROR, %r12
> orq %r12, %rax
> - jmp 2b
> + jmp .Lout

Thanks for the code.

There might be stack balancing issue here. I'll double check when updating this
patch.

Thanks!

>
> - _ASM_EXTABLE_FAULT(1b, 3b)
> - .endif /* \host */
> + _ASM_EXTABLE_FAULT(.Lseamcall, .Lseamtrap)
> +.endif /* \host */
> .endm

2023-07-13 10:44:36

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, 2023-07-13 at 10:43 +0200, Peter Zijlstra wrote:
> On Thu, Jul 13, 2023 at 08:02:54AM +0000, Huang, Kai wrote:
>
> > Sorry I am ignorant here. Won't "clearing ECX only" leave high bits of
> > registers still containing guest's value?
>
> architecture zero-extends 32bit stores

Sorry, where can I find this information? Looking at SDM I couldn't find :-(

>
> > I see KVM code uses:
> >
> > xor %eax, %eax
> > xor %ecx, %ecx
> > xor %edx, %edx
> > xor %ebp, %ebp
> > xor %esi, %esi
> > xor %edi, %edi
> > #ifdef CONFIG_X86_64
> > xor %r8d, %r8d
> > xor %r9d, %r9d
> > xor %r10d, %r10d
> > xor %r11d, %r11d
> > xor %r12d, %r12d
> > xor %r13d, %r13d
> > xor %r14d, %r14d
> > xor %r15d, %r15d
> > #endif
> >
> > Which makes sense because KVM wants to support 32-bit too.
>
> Encoding for the first lot is shorter, the 64bit regs obviously need the
> RAX byte anyway.
>
> > However for TDX is 64-bit only.
> >
> > And I also see the current TDVMCALL code has:
> >
> > xor %r8d, %r8d
> > xor %r9d, %r9d
> > xor %r10d, %r10d
> > xor %r11d, %r11d
> > xor %rdi, %rdi
> > xor %rdx, %rdx
> >
> > Why does it need to use "d" postfix for all r* registers?
>
> That's the name of the 32bit subword, r#[bwd] for byte, word,
> double-word. SDM v1 3.7.2.1 has the whole list, I couldn't quicky find
> one for the zero-extention thing
>
> > Sorry for those questions but I struggled when I wrote those assembly and am
> > hoping to get my mind cleared on this. :-)
>
> No problem.
>

I _think_ I understand now? In 64-bit mode

xor %eax, %eax

equals to

xor %rax, %rax

(due to "architecture zero-extends 32bit stores")

Thus using the former (plus using "d" for %r*) can save some memory?

2023-07-13 11:02:54

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, Jul 13, 2023 at 10:24:48AM +0000, Huang, Kai wrote:
> On Thu, 2023-07-13 at 10:19 +0000, Huang, Kai wrote:
> > On Thu, 2023-07-13 at 10:43 +0200, Peter Zijlstra wrote:
> > > On Thu, Jul 13, 2023 at 08:02:54AM +0000, Huang, Kai wrote:
> > >
> > > > Sorry I am ignorant here. Won't "clearing ECX only" leave high bits of
> > > > registers still containing guest's value?
> > >
> > > architecture zero-extends 32bit stores
> >
> > Sorry, where can I find this information? Looking at SDM I couldn't find :-(
> >
> >
>
> Hmm.. I think I found it -- it's in SDM vol 1:
>
> 3.4.1.1 General-Purpose Registers in 64-Bit Mode
>
> 32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in
> the destination general-purpose register.

Yes, that's it.

2023-07-13 11:03:28

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, 2023-07-13 at 11:25 +0200, Peter Zijlstra wrote:
> On Thu, Jul 13, 2023 at 09:15:49AM +0000, Huang, Kai wrote:
> > On Thu, 2023-07-13 at 11:01 +0200, Peter Zijlstra wrote:
> > > On Thu, Jul 13, 2023 at 08:09:33AM +0000, Huang, Kai wrote:
> > > > On Wed, 2023-07-12 at 19:11 +0200, Peter Zijlstra wrote:
> > > > > On Wed, Jul 12, 2023 at 08:55:21PM +1200, Kai Huang wrote:
> > > > > > @@ -65,6 +104,37 @@
> > > > > > .endif
> > > > > >
> > > > > > .if \ret
> > > > > > + .if \saved
> > > > > > + /*
> > > > > > + * Restore the structure from stack to saved the output registers
> > > > > > + *
> > > > > > + * In case of VP.ENTER returns due to TDVMCALL, all registers are
> > > > > > + * valid thus no register can be used as spare to restore the
> > > > > > + * structure from the stack (see "TDH.VP.ENTER Output Operands
> > > > > > + * Definition on TDCALL(TDG.VP.VMCALL) Following a TD Entry").
> > > > > > + * For this case, need to make one register as spare by saving it
> > > > > > + * to the stack and then manually load the structure pointer to
> > > > > > + * the spare register.
> > > > > > + *
> > > > > > + * Note for other TDCALLs/SEAMCALLs there are spare registers
> > > > > > + * thus no need for such hack but just use this for all for now.
> > > > > > + */
> > > > > > + pushq %rax /* save the TDCALL/SEAMCALL return code */
> > > > > > + movq 8(%rsp), %rax /* restore the structure pointer */
> > > > > > + movq %rsi, TDX_MODULE_rsi(%rax) /* save %rsi */
> > > > > > + movq %rax, %rsi /* use %rsi as structure pointer */
> > > > > > + popq %rax /* restore the return code */
> > > > > > + popq %rsi /* pop the structure pointer */
> > > > >
> > > > > Urgghh... At least for the \host case you can simply pop %rsi, no?
> > > > > VP.ENTER returns with 0 there IIRC.
> > > >
> > > > No VP.ENTER doesn't return 0 for RAX. Firstly, VP.ENTER can return for many
> > >
> > > No, but it *does* return 0 for: RBX,RSI,RDI,R10-R15.
> > >
> > > So for \host you can simply do:
> > >
> > > pop %rsi
> > > mov $0, TDX_MODULE_rsi(%rsi)
> > >
> > > and call it a day.
> >
> > This isn't true for the case that VP.ENTER returns due to a TDVMCALL. In that
> > case RCX contains the bitmap of shared registers, and RBX/RDX/RDI/RSI/R8-R15
> > contains guest value if the corresponding bit is set in RCX (RBP will be
> > excluded by updating the spec I assume).
> >
> > Or are you suggesting we need to decode RAX to decide whether the VP.ENTER
> > return is due to TDVMCALL vs other reasons, and act differently?
>
> Urgh, no I had missed there are *TWO* tables for output :/ Who does
> something like that :-(
>
> So yeah, sucks.

Yeah. I think for code simplicity we should just do the way the current patch
has implemented :-)

2023-07-13 11:05:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, Jul 13, 2023 at 10:19:49AM +0000, Huang, Kai wrote:
> On Thu, 2023-07-13 at 10:43 +0200, Peter Zijlstra wrote:
> > On Thu, Jul 13, 2023 at 08:02:54AM +0000, Huang, Kai wrote:
> >
> > > Sorry I am ignorant here. Won't "clearing ECX only" leave high bits of
> > > registers still containing guest's value?
> >
> > architecture zero-extends 32bit stores
>
> Sorry, where can I find this information? Looking at SDM I couldn't find :-(

Yeah, I couldn't find it in a hurry either, but bpetkov pasted me this
from the AMD document:

"In 64-bit mode, the following general rules apply to instructions and their operands:
“Promoted to 64 Bit”: If an instruction’s operand size (16-bit or 32-bit) in legacy and
compatibility modes depends on the CS.D bit and the operand-size override prefix, then the
operand-size choices in 64-bit mode are extended from 16-bit and 32-bit to include 64 bits (with a
REX prefix), or the operand size is fixed at 64 bits. Such instructions are said to be “Promoted to
64 bits” in Table B-1. However, byte-operand opcodes of such instructions are not promoted."

> I _think_ I understand now? In 64-bit mode
>
> xor %eax, %eax
>
> equals to
>
> xor %rax, %rax
>
> (due to "architecture zero-extends 32bit stores")
>
> Thus using the former (plus using "d" for %r*) can save some memory?

Yes, 64bit wide instruction get a REX prefix 0x4X (somehow I keep typing
RAX) byte in front to tell it's a 64bit wide op.

31 c0 xor %eax,%eax
48 31 c0 xor %rax,%rax

The REX byte will show up for rN usage, because then we need the actual
Register Extention part of that prefix irrespective of the width.

45 31 d2 xor %r10d,%r10d
4d 31 d2 xor %r10,%r10

x86 instruction encoding is 'fun' :-)

See SDM Vol 2 2.2.1.2 if you want to know more about the REX prefix.

2023-07-13 11:05:40

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, 2023-07-13 at 10:19 +0000, Huang, Kai wrote:
> On Thu, 2023-07-13 at 10:43 +0200, Peter Zijlstra wrote:
> > On Thu, Jul 13, 2023 at 08:02:54AM +0000, Huang, Kai wrote:
> >
> > > Sorry I am ignorant here. Won't "clearing ECX only" leave high bits of
> > > registers still containing guest's value?
> >
> > architecture zero-extends 32bit stores
>
> Sorry, where can I find this information? Looking at SDM I couldn't find :-(
>
>

Hmm.. I think I found it -- it's in SDM vol 1:

3.4.1.1 General-Purpose Registers in 64-Bit Mode

32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in
the destination general-purpose register.

Sorry for the noise!

2023-07-13 11:27:19

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, 2023-07-13 at 12:37 +0200, Peter Zijlstra wrote:
> On Thu, Jul 13, 2023 at 10:19:49AM +0000, Huang, Kai wrote:
> > On Thu, 2023-07-13 at 10:43 +0200, Peter Zijlstra wrote:
> > > On Thu, Jul 13, 2023 at 08:02:54AM +0000, Huang, Kai wrote:
> > >
> > > > Sorry I am ignorant here. Won't "clearing ECX only" leave high bits of
> > > > registers still containing guest's value?
> > >
> > > architecture zero-extends 32bit stores
> >
> > Sorry, where can I find this information? Looking at SDM I couldn't find :-(
>
> Yeah, I couldn't find it in a hurry either, but bpetkov pasted me this
> from the AMD document:
>
> "In 64-bit mode, the following general rules apply to instructions and their operands:
> “Promoted to 64 Bit”: If an instruction’s operand size (16-bit or 32-bit) in legacy and
> compatibility modes depends on the CS.D bit and the operand-size override prefix, then the
> operand-size choices in 64-bit mode are extended from 16-bit and 32-bit to include 64 bits (with a
> REX prefix), or the operand size is fixed at 64 bits. Such instructions are said to be “Promoted to
> 64 bits” in Table B-1. However, byte-operand opcodes of such instructions are not promoted."
>
> > I _think_ I understand now? In 64-bit mode
> >
> > xor %eax, %eax
> >
> > equals to
> >
> > xor %rax, %rax
> >
> > (due to "architecture zero-extends 32bit stores")
> >
> > Thus using the former (plus using "d" for %r*) can save some memory?
>
> Yes, 64bit wide instruction get a REX prefix 0x4X (somehow I keep typing
> RAX) byte in front to tell it's a 64bit wide op.
>
> 31 c0 xor %eax,%eax
> 48 31 c0 xor %rax,%rax
>
> The REX byte will show up for rN usage, because then we need the actual
> Register Extention part of that prefix irrespective of the width.
>
> 45 31 d2 xor %r10d,%r10d
> 4d 31 d2 xor %r10,%r10
>
> x86 instruction encoding is 'fun' :-)
>
> See SDM Vol 2 2.2.1.2 if you want to know more about the REX prefix.

Learned something new. Appreciate your time! :-)

2023-07-13 11:42:56

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 02/10] x86/tdx: Use cmovc to save a label in TDX_MODULE_CALL asm

On Wed, 2023-07-12 at 12:27 -0700, Sathyanarayanan Kuppuswamy wrote:
>
> On 7/12/23 1:55 AM, Kai Huang wrote:
> > Change 'jnc .Lno_vmfailinvalid' to 'cmovc %rdi, %rax' to save the
> > .Lno_vmfailinvalid label in the TDX_MODULE_CALL asm macro.
>
> You are removing the label, right? What use "save"?

Per comments to patch 10 I'll drop this patch, so doesn't matter anymore.

2023-07-13 11:57:33

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 07/10] x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

On Thu, 2023-07-13 at 12:22 +0100, Andrew Cooper wrote:
> On Thu, 13 Jul 2023 10:47:44 +0000, Huang, Kai wrote:
> > On Thu, 2023-07-13 at 12:37 +0200, Peter Zijlstra wrote:
> > > On Thu, Jul 13, 2023 at 10:19:49AM +0000, Huang, Kai wrote:
> > > > On Thu, 2023-07-13 at 10:43 +0200, Peter Zijlstra wrote:
> > > > > On Thu, Jul 13, 2023 at 08:02:54AM +0000, Huang, Kai wrote:
> > > > >
> > > > > > Sorry I am ignorant here. Won't "clearing ECX only" leave high bits of
> > > > > > registers still containing guest's value?
> > > > >
> > > > > architecture zero-extends 32bit stores
> > > >
> > > > Sorry, where can I find this information? Looking at SDM I couldn't find :-(
> > >
> > > Yeah, I couldn't find it in a hurry either, but bpetkov pasted me this
> > > from the AMD document:
> > >
> > > "In 64-bit mode, the following general rules apply to instructions and their operands:
> > > “Promoted to 64 Bit”: If an instruction’s operand size (16-bit or 32-bit) in legacy and
> > > compatibility modes depends on the CS.D bit and the operand-size override prefix, then the
> > > operand-size choices in 64-bit mode are extended from 16-bit and 32-bit to include 64 bits (with a
> > > REX prefix), or the operand size is fixed at 64 bits. Such instructions are said to be “Promoted to
> > > 64 bits” in Table B-1. However, byte-operand opcodes of such instructions are not promoted."
> > >
> > > > I _think_ I understand now? In 64-bit mode
> > > >
> > > > xor %eax, %eax
> > > >
> > > > equals to
> > > >
> > > > xor %rax, %rax
> > > >
> > > > (due to "architecture zero-extends 32bit stores")
> > > >
> > > > Thus using the former (plus using "d" for %r*) can save some memory?
> > >
> > > Yes, 64bit wide instruction get a REX prefix 0x4X (somehow I keep typing
> > > RAX) byte in front to tell it's a 64bit wide op.
> > >
> > > 31 c0 xor %eax,%eax
> > > 48 31 c0 xor %rax,%rax
> > >
> > > The REX byte will show up for rN usage, because then we need the actual
> > > Register Extention part of that prefix irrespective of the width.
> > >
> > > 45 31 d2 xor %r10d,%r10d
> > > 4d 31 d2 xor %r10,%r10
> > >
> > > x86 instruction encoding is 'fun' :-)
> > >
> > > See SDM Vol 2 2.2.1.2 if you want to know more about the REX prefix.
> >
> > Learned something new. Appreciate your time! :-)
>
> And now for the extra fun...
>
> The Silvermont uarch is 64bit, but only recognises 32bit XORs as zeroing
> idioms.
>
> So for best performance on as many uarches as possible, you should *always*
> use the 32bit forms, even for %r8-15.
>
>

Ah thanks Andrew for the tip :)

2023-07-13 15:05:21

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 09/10] x86/virt/tdx: Wire up basic SEAMCALL functions

On Thu, Jul 13, 2023, Kai Huang wrote:
> On Thu, 2023-07-13 at 09:42 +0200, Peter Zijlstra wrote:
> > On Thu, Jul 13, 2023 at 03:46:52AM +0000, Huang, Kai wrote:
> > > On Wed, 2023-07-12 at 15:15 -0700, Isaku Yamahata wrote:
> > > > > The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
> > > > > TDCALL infrastructure.� Wire up basic functions to make SEAMCALLs for
> > > > > the basic TDX support: __seamcall(), __seamcall_ret() and
> > > > > __seamcall_saved_ret() which is for TDH.VP.ENTER leaf function.
> > > >
> > > > Hi.� __seamcall_saved_ret() uses struct tdx_module_arg as input and output.� For
> > > > KVM TDH.VP.ENTER case, those arguments are already in unsigned long
> > > > kvm_vcpu_arch::regs[].� It's silly to move those values twice.� From
> > > > kvm_vcpu_arch::regs to tdx_module_args.� From tdx_module_args to real registers.
> > > >
> > > > If TDH.VP.ENTER is the only user of __seamcall_saved_ret(), can we make it to
> > > > take unsigned long kvm_vcpu_argh::regs[NR_VCPU_REGS]?� Maybe I can make the
> > > > change with TDX KVM patch series.
> > >
> > > The assembly code assumes the second argument is a pointer to 'struct
> > > tdx_module_args'. I don't know how can we change __seamcall_saved_ret() to
> > > achieve what you said. We might change the kvm_vcpu_argh::regs[NR_VCPU_REGS] to
> > > match 'struct tdx_module_args''s layout and manually convert part of "regs" to
> > > the structure and pass to __seamcall_saved_ret(), but it's too hacky I suppose.
> >
> > I suspect the kvm_vcpu_arch::regs layout is given by hardware; so the
> > only option would be to make tdx_module_args match that. It's a slightly
> > unfortunate layout, but meh.
> >
> > Then you can simply do:
> >
> > __seamcall_saved_ret(leaf, (struct tdx_module_args *)vcpu->arch->regs);
> >
> >
>
> I don't think the layout matches hardware, especially I think there's no
> "hardware layout" for GPRs that are concerned here. They are just for KVM
> itself to save guest's registers when the guest exits to KVM, so that KVM can
> restore them when returning back to the guest.

kvm_vcpu_arch::regs does follow the hardware-defined indices, and that's required
for myriad emulation flows, e.g. saving GPRs into SMRAM, anywhere KVM gets a GPR
index from an instruction opcode or vmcs.VMX_INSTRUCTION_INFO, etc.

2023-07-13 19:00:01

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH 09/10] x86/virt/tdx: Wire up basic SEAMCALL functions

On Thu, Jul 13, 2023 at 03:46:52AM +0000,
"Huang, Kai" <[email protected]> wrote:

> On Wed, 2023-07-12 at 15:15 -0700, Isaku Yamahata wrote:
> > > The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
> > > TDCALL infrastructure.  Wire up basic functions to make SEAMCALLs for
> > > the basic TDX support: __seamcall(), __seamcall_ret() and
> > > __seamcall_saved_ret() which is for TDH.VP.ENTER leaf function.
> >
> > Hi.  __seamcall_saved_ret() uses struct tdx_module_arg as input and output.  For
> > KVM TDH.VP.ENTER case, those arguments are already in unsigned long
> > kvm_vcpu_arch::regs[].  It's silly to move those values twice.  From
> > kvm_vcpu_arch::regs to tdx_module_args.  From tdx_module_args to real registers.
> >
> > If TDH.VP.ENTER is the only user of __seamcall_saved_ret(), can we make it to
> > take unsigned long kvm_vcpu_argh::regs[NR_VCPU_REGS]?  Maybe I can make the
> > change with TDX KVM patch series.
>
> The assembly code assumes the second argument is a pointer to 'struct
> tdx_module_args'. I don't know how can we change __seamcall_saved_ret() to
> achieve what you said. We might change the kvm_vcpu_argh::regs[NR_VCPU_REGS] to
> match 'struct tdx_module_args''s layout and manually convert part of "regs" to
> the structure and pass to __seamcall_saved_ret(), but it's too hacky I suppose.
>
> This was one concern that I mentioned VP.ENTER can be implemented by KVM in its
> own assembly in the TDX host v12 discussion. I kinda agree we should leverage
> KVM's existing kvm_vcpu_arch::regs[NR_CPU_REGS] infrastructure to minimize the
> code change to the KVM's common infrastructure. If so, I guess we have to carry
> this memory copy burden between two structures.
>
> Btw, I do find KVM's VP.ENTER code is a little bit redundant to the common
> SEAMCALL assembly, which is a good reason for KVM to use __seamcall() variants
> for TDH.VP.ENTER.
>
> So it's a tradeoff I think.
>
> On the other hand, given CoCo VMs normally don't expose all GPRs to VMM, it's
> also debatable whether we should invent another infrastructure to the KVM code
> to handle register access of CoCo VMs too, e.g., we can catch bugs easily when
> KVM tries to access the registers that it shouldn't access.

Yes, we'd like to save/restore GPRs only for TDVMCALL. Otherwise skip
save/restore.

--
Isaku Yamahata <[email protected]>

2023-07-14 12:50:25

by Nikolay Borisov

[permalink] [raw]
Subject: Re: [PATCH 04/10] x86/tdx: Make macros of TDCALLs consistent with the spec



On 12.07.23 г. 11:55 ч., Kai Huang wrote:
> The TDX spec names all TDCALLs with prefix "TDG". Currently, the kernel
> doesn't follow such convention for the macros of those TDCALLs but uses
> prefix "TDX_" for all of them. Although it's arguable whether the TDX
> spec names those TDCALLs properly, it's better for the kernel to follow
> the spec when naming those macros.
>
> Change all macros of TDCALLs to make them consistent with the spec. As
> a bonus, they get distinguished easily from the host-side SEAMCALLs,
> which all have prefix "TDH".
>
> No functional change intended.
>
> Signed-off-by: Kai Huang <[email protected]>
> ---
> arch/x86/coco/tdx/tdx.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 5b8056f6c83f..de021df92009 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -15,11 +15,11 @@
> #include <asm/pgtable.h>
>
> /* TDX module Call Leaf IDs */
> -#define TDX_GET_INFO 1
> -#define TDX_GET_VEINFO 3
> -#define TDX_GET_REPORT 4
> -#define TDX_ACCEPT_PAGE 6
> -#define TDX_WR 8
> +#define TDG_VP_INFO 1
> +#define TDG_VP_VEINFO_GET 3
> +#define TDG_MR_REPORT 4
> +#define TDG_MEM_PAGE_ACCEPT 6
> +#define TDG_VM_WR 8
>
What branch is this patch set based off? Because the existing TDX_GET_*
defines are in arch/x86/include/asm/shared/tdx.h due to ff40b5769a50f ?


<snip>

2023-07-14 12:55:34

by Nikolay Borisov

[permalink] [raw]
Subject: Re: [PATCH 04/10] x86/tdx: Make macros of TDCALLs consistent with the spec



On 12.07.23 г. 11:55 ч., Kai Huang wrote:
> The TDX spec names all TDCALLs with prefix "TDG". Currently, the kernel
> doesn't follow such convention for the macros of those TDCALLs but uses
> prefix "TDX_" for all of them. Although it's arguable whether the TDX
> spec names those TDCALLs properly, it's better for the kernel to follow
> the spec when naming those macros.
>
> Change all macros of TDCALLs to make them consistent with the spec. As
> a bonus, they get distinguished easily from the host-side SEAMCALLs,
> which all have prefix "TDH".
>
> No functional change intended.
>
> Signed-off-by: Kai Huang <[email protected]>
> ---
> arch/x86/coco/tdx/tdx.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 5b8056f6c83f..de021df92009 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -15,11 +15,11 @@
> #include <asm/pgtable.h>
>
> /* TDX module Call Leaf IDs */
> -#define TDX_GET_INFO 1
> -#define TDX_GET_VEINFO 3
> -#define TDX_GET_REPORT 4
> -#define TDX_ACCEPT_PAGE 6
> -#define TDX_WR 8
> +#define TDG_VP_INFO 1
> +#define TDG_VP_VEINFO_GET 3
> +#define TDG_MR_REPORT 4
> +#define TDG_MEM_PAGE_ACCEPT 6
> +#define TDG_VM_WR 8
>
What branch is this patch set based off? Because the existing TDX_GET_*
defines are in arch/x86/include/asm/shared/tdx.h due to ff40b5769a50f ?


<snip>

2023-07-15 10:43:45

by Nikolay Borisov

[permalink] [raw]
Subject: Re: [PATCH 08/10] x86/tdx: Unify TDX_HYPERCALL and TDX_MODULE_CALL assembly



<snip>

> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 0f16ba52ae62..a5e77893b2c0 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -51,13 +51,38 @@
>
> #define TDREPORT_SUBTYPE_0 0
>
> +/* Called from __tdx_hypercall() for unrecoverable failure */
> +static noinstr void __tdx_hypercall_failed(void)
> +{
> + instrumentation_begin();
> + panic("TDVMCALL failed. TDX module bug?");
> +}

So what's the deal with this instrumentation here. The instruction is
noinstr, so you want to make just the panic call itself instrumentable?,
if so where's the instrumentation_end() cal;?No instrumentation_end()
call. Actually is this complexity really worth it for the failure case?

AFAICS there is a single call site for __tdx_hypercall_failed so why
noot call panic() directly ?

> +
> +static inline u64 __tdx_hypercall(struct tdx_module_args *args)
> +{
> + u64 ret;
> +
> + args->rcx = TDVMCALL_EXPOSE_REGS_MASK;
> + ret = __tdcall_saved_ret(TDG_VP_VMCALL, args);
> +
> + /*
> + * RAX!=0 indicates a failure of the TDVMCALL mechanism itself and that

nit: Why mention the register explicitly, just say that if
__tdcall_saved_ret returns non-zero ....

> + * something has gone horribly wrong with the TDX module.
> + */
> + if (ret)
> + __tdx_hypercall_failed();
> +
> + /* The return status of the hypercall itself is in R10. */
> + return args->r10;
> +}
> +
> /*
> - * Wrapper for standard use of __tdx_hypercall with no output aside from
> - * return code.
> + * Wrapper for standard use of __tdx_hypercall() w/o needing any output
> + * register except the return code.
> */
> static inline u64 _tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
> {
> - struct tdx_hypercall_args args = {
> + struct tdx_module_args args = {
> .r10 = TDX_HYPERCALL_STANDARD,
> .r11 = fn,
> .r12 = r12,

<snip>

2023-07-17 01:13:39

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 04/10] x86/tdx: Make macros of TDCALLs consistent with the spec

On Fri, 2023-07-14 at 15:28 +0300, Nikolay Borisov wrote:
>
> On 12.07.23 г. 11:55 ч., Kai Huang wrote:
> > The TDX spec names all TDCALLs with prefix "TDG". Currently, the kernel
> > doesn't follow such convention for the macros of those TDCALLs but uses
> > prefix "TDX_" for all of them. Although it's arguable whether the TDX
> > spec names those TDCALLs properly, it's better for the kernel to follow
> > the spec when naming those macros.
> >
> > Change all macros of TDCALLs to make them consistent with the spec. As
> > a bonus, they get distinguished easily from the host-side SEAMCALLs,
> > which all have prefix "TDH".
> >
> > No functional change intended.
> >
> > Signed-off-by: Kai Huang <[email protected]>
> > ---
> > arch/x86/coco/tdx/tdx.c | 22 +++++++++++-----------
> > 1 file changed, 11 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> > index 5b8056f6c83f..de021df92009 100644
> > --- a/arch/x86/coco/tdx/tdx.c
> > +++ b/arch/x86/coco/tdx/tdx.c
> > @@ -15,11 +15,11 @@
> > #include <asm/pgtable.h>
> >
> > /* TDX module Call Leaf IDs */
> > -#define TDX_GET_INFO 1
> > -#define TDX_GET_VEINFO 3
> > -#define TDX_GET_REPORT 4
> > -#define TDX_ACCEPT_PAGE 6
> > -#define TDX_WR 8
> > +#define TDG_VP_INFO 1
> > +#define TDG_VP_VEINFO_GET 3
> > +#define TDG_MR_REPORT 4
> > +#define TDG_MEM_PAGE_ACCEPT 6
> > +#define TDG_VM_WR 8
> >
> What branch is this patch set based off? Because the existing TDX_GET_*
> defines are in arch/x86/include/asm/shared/tdx.h due to ff40b5769a50f ?
>

It was based on tip/x86/tdx, which didn't have above patch. I'll rebase to the
Linus's tree for the next round since it now has all TDX patches anyway.

2023-07-17 04:00:47

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 09/10] x86/virt/tdx: Wire up basic SEAMCALL functions

On Thu, 2023-07-13 at 07:51 -0700, Sean Christopherson wrote:
> On Thu, Jul 13, 2023, Kai Huang wrote:
> > On Thu, 2023-07-13 at 09:42 +0200, Peter Zijlstra wrote:
> > > On Thu, Jul 13, 2023 at 03:46:52AM +0000, Huang, Kai wrote:
> > > > On Wed, 2023-07-12 at 15:15 -0700, Isaku Yamahata wrote:
> > > > > > The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
> > > > > > TDCALL infrastructure.� Wire up basic functions to make SEAMCALLs for
> > > > > > the basic TDX support: __seamcall(), __seamcall_ret() and
> > > > > > __seamcall_saved_ret() which is for TDH.VP.ENTER leaf function.
> > > > >
> > > > > Hi.� __seamcall_saved_ret() uses struct tdx_module_arg as input and output.� For
> > > > > KVM TDH.VP.ENTER case, those arguments are already in unsigned long
> > > > > kvm_vcpu_arch::regs[].� It's silly to move those values twice.� From
> > > > > kvm_vcpu_arch::regs to tdx_module_args.� From tdx_module_args to real registers.
> > > > >
> > > > > If TDH.VP.ENTER is the only user of __seamcall_saved_ret(), can we make it to
> > > > > take unsigned long kvm_vcpu_argh::regs[NR_VCPU_REGS]?� Maybe I can make the
> > > > > change with TDX KVM patch series.
> > > >
> > > > The assembly code assumes the second argument is a pointer to 'struct
> > > > tdx_module_args'. I don't know how can we change __seamcall_saved_ret() to
> > > > achieve what you said. We might change the kvm_vcpu_argh::regs[NR_VCPU_REGS] to
> > > > match 'struct tdx_module_args''s layout and manually convert part of "regs" to
> > > > the structure and pass to __seamcall_saved_ret(), but it's too hacky I suppose.
> > >
> > > I suspect the kvm_vcpu_arch::regs layout is given by hardware; so the
> > > only option would be to make tdx_module_args match that. It's a slightly
> > > unfortunate layout, but meh.
> > >
> > > Then you can simply do:
> > >
> > > __seamcall_saved_ret(leaf, (struct tdx_module_args *)vcpu->arch->regs);
> > >
> > >
> >
> > I don't think the layout matches hardware, especially I think there's no
> > "hardware layout" for GPRs that are concerned here. They are just for KVM
> > itself to save guest's registers when the guest exits to KVM, so that KVM can
> > restore them when returning back to the guest.
>
> kvm_vcpu_arch::regs does follow the hardware-defined indices, and that's required
> for myriad emulation flows, e.g. saving GPRs into SMRAM, anywhere KVM gets a GPR
> index from an instruction opcode or vmcs.VMX_INSTRUCTION_INFO, etc.

Yes. Sorry I missed this. I forgot x86 uses "index register" and does have a
"hardware layout".

IndexReg:
0 = RAX
1 = RCX
2 = RDX
3 = RBX
4 = RSP
5 = RBP
6 = RSI
7 = RDI
8–15 represent R8–R15, respectively...

2023-07-17 06:56:17

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 08/10] x86/tdx: Unify TDX_HYPERCALL and TDX_MODULE_CALL assembly


> > +/* Called from __tdx_hypercall() for unrecoverable failure */
> > +static noinstr void __tdx_hypercall_failed(void)
> > +{
> > + instrumentation_begin();
> > + panic("TDVMCALL failed. TDX module bug?");
> > +}
>
> So what's the deal with this instrumentation here. The instruction is
> noinstr, so you want to make just the panic call itself instrumentable?,
> if so where's the instrumentation_end() cal;?No instrumentation_end()
> call. Actually is this complexity really worth it for the failure case?
>
> AFAICS there is a single call site for __tdx_hypercall_failed so why
> noot call panic() directly ?

W/o this patch, the __tdx_hypercall_failed() is called from the TDX_HYPERCALL
assembly, which is in .noinstr.text, and 'instrumentation_begin()' was needed to
avoid the build warning I suppose.

However now with this patch __tdx_hypercall_failed() is called from
__tdx_hypercall() which is a C function w/o 'noinstr' annotation, thus I believe
instrumentation_begin() and 'noinstr' annotation are not needed anymore.

I didn't notice this while moving this function around and my kernel build test
didn't warn me about this. I'll change in next version.

In fact, perhaps this patch perhaps is too big for review. I will also try to
split it to smaller ones.

>
> > +
> > +static inline u64 __tdx_hypercall(struct tdx_module_args *args)
> > +{
> > + u64 ret;
> > +
> > + args->rcx = TDVMCALL_EXPOSE_REGS_MASK;
> > + ret = __tdcall_saved_ret(TDG_VP_VMCALL, args);
> > +
> > + /*
> > + * RAX!=0 indicates a failure of the TDVMCALL mechanism itself and that
>
> nit: Why mention the register explicitly, just say that if
> __tdcall_saved_ret returns non-zero ....

OK will do. I basically moved the comment from assembly to here.



2023-07-17 07:10:14

by Nikolay Borisov

[permalink] [raw]
Subject: Re: [PATCH 08/10] x86/tdx: Unify TDX_HYPERCALL and TDX_MODULE_CALL assembly



On 17.07.23 г. 9:35 ч., Huang, Kai wrote:
>
>>> +/* Called from __tdx_hypercall() for unrecoverable failure */
>>> +static noinstr void __tdx_hypercall_failed(void)
>>> +{
>>> + instrumentation_begin();
>>> + panic("TDVMCALL failed. TDX module bug?");
>>> +}
>>
>> So what's the deal with this instrumentation here. The instruction is
>> noinstr, so you want to make just the panic call itself instrumentable?,
>> if so where's the instrumentation_end() cal;?No instrumentation_end()
>> call. Actually is this complexity really worth it for the failure case?
>>
>> AFAICS there is a single call site for __tdx_hypercall_failed so why
>> noot call panic() directly ?
>
> W/o this patch, the __tdx_hypercall_failed() is called from the TDX_HYPERCALL
> assembly, which is in .noinstr.text, and 'instrumentation_begin()' was needed to
> avoid the build warning I suppose.
>
> However now with this patch __tdx_hypercall_failed() is called from
> __tdx_hypercall() which is a C function w/o 'noinstr' annotation, thus I believe
> instrumentation_begin() and 'noinstr' annotation are not needed anymore.
>
> I didn't notice this while moving this function around and my kernel build test
> didn't warn me about this. I'll change in next version.
>
> In fact, perhaps this patch perhaps is too big for review. I will also try to
> split it to smaller ones.

Can't you simply call panic() directly? Less going around the code while
someone is reading it?

<snip>

2023-07-17 08:20:36

by Kai Huang

[permalink] [raw]
Subject: RE: [PATCH 08/10] x86/tdx: Unify TDX_HYPERCALL and TDX_MODULE_CALL assembly

> On 17.07.23 г. 9:35 ч., Huang, Kai wrote:
> >
> >>> +/* Called from __tdx_hypercall() for unrecoverable failure */
> >>> +static noinstr void __tdx_hypercall_failed(void) {
> >>> + instrumentation_begin();
> >>> + panic("TDVMCALL failed. TDX module bug?"); }
> >>
> >> So what's the deal with this instrumentation here. The instruction is
> >> noinstr, so you want to make just the panic call itself
> >> instrumentable?, if so where's the instrumentation_end() cal;?No
> >> instrumentation_end() call. Actually is this complexity really worth it for the
> failure case?
> >>
> >> AFAICS there is a single call site for __tdx_hypercall_failed so why
> >> noot call panic() directly ?
> >
> > W/o this patch, the __tdx_hypercall_failed() is called from the
> > TDX_HYPERCALL assembly, which is in .noinstr.text, and
> > 'instrumentation_begin()' was needed to avoid the build warning I suppose.
> >
> > However now with this patch __tdx_hypercall_failed() is called from
> > __tdx_hypercall() which is a C function w/o 'noinstr' annotation, thus
> > I believe
> > instrumentation_begin() and 'noinstr' annotation are not needed anymore.
> >
> > I didn't notice this while moving this function around and my kernel
> > build test didn't warn me about this. I'll change in next version.
> >
> > In fact, perhaps this patch perhaps is too big for review. I will
> > also try to split it to smaller ones.
>
> Can't you simply call panic() directly? Less going around the code while someone
> is reading it?

I can and will do.

2023-07-18 11:05:17

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH 08/10] x86/tdx: Unify TDX_HYPERCALL and TDX_MODULE_CALL assembly

On Mon, 2023-07-17 at 07:58 +0000, Huang, Kai wrote:
> > On 17.07.23 г. 9:35 ч., Huang, Kai wrote:
> > >
> > > > > +/* Called from __tdx_hypercall() for unrecoverable failure */
> > > > > +static noinstr void __tdx_hypercall_failed(void) {
> > > > > + instrumentation_begin();
> > > > > + panic("TDVMCALL failed. TDX module bug?"); }
> > > >
> > > > So what's the deal with this instrumentation here. The instruction is
> > > > noinstr, so you want to make just the panic call itself
> > > > instrumentable?, if so where's the instrumentation_end() cal;?No
> > > > instrumentation_end() call. Actually is this complexity really worth it for the
> > failure case?
> > > >
> > > > AFAICS there is a single call site for __tdx_hypercall_failed so why
> > > > noot call panic() directly ?
> > >
> > > W/o this patch, the __tdx_hypercall_failed() is called from the
> > > TDX_HYPERCALL assembly, which is in .noinstr.text, and
> > > 'instrumentation_begin()' was needed to avoid the build warning I suppose.
> > >
> > > However now with this patch __tdx_hypercall_failed() is called from
> > > __tdx_hypercall() which is a C function w/o 'noinstr' annotation, thus
> > > I believe
> > > instrumentation_begin() and 'noinstr' annotation are not needed anymore.
> > >
> > > I didn't notice this while moving this function around and my kernel
> > > build test didn't warn me about this. I'll change in next version.
> > >
> > > In fact, perhaps this patch perhaps is too big for review. I will
> > > also try to split it to smaller ones.
> >
> > Can't you simply call panic() directly? Less going around the code while someone
> > is reading it?
>
> I can and will do.

After rebasing to the latest TDX code, I found we should keep the
__tdx_hypercall_failed(). The reason is both the core-kernel (vmlinux) and the
compressed code need the __tdx_hypercall() implementation. Implementing the
__tdx_hypercall_failed() in both core-kernel and compressed code separately
allows the __tdx_hypercall() to be shared by both code, otherwise both of them
need to implement their own __tdx_hypercall().

Note __tdx_hypercall_failed() in the vmlinux calls panic(), but the one in the
compressed code calls error().

2023-08-08 16:38:18

by Yuan Yao

[permalink] [raw]
Subject: Re: [PATCH 09/10] x86/virt/tdx: Wire up basic SEAMCALL functions

On Thu, Jul 13, 2023 at 11:44:34AM -0700, Isaku Yamahata wrote:
> On Thu, Jul 13, 2023 at 03:46:52AM +0000,
> "Huang, Kai" <[email protected]> wrote:
>
> > On Wed, 2023-07-12 at 15:15 -0700, Isaku Yamahata wrote:
> > > > The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
> > > > TDCALL infrastructure.  Wire up basic functions to make SEAMCALLs for
> > > > the basic TDX support: __seamcall(), __seamcall_ret() and
> > > > __seamcall_saved_ret() which is for TDH.VP.ENTER leaf function.
> > >
> > > Hi.  __seamcall_saved_ret() uses struct tdx_module_arg as input and output.  For
> > > KVM TDH.VP.ENTER case, those arguments are already in unsigned long
> > > kvm_vcpu_arch::regs[].  It's silly to move those values twice.  From
> > > kvm_vcpu_arch::regs to tdx_module_args.  From tdx_module_args to real registers.
> > >
> > > If TDH.VP.ENTER is the only user of __seamcall_saved_ret(), can we make it to
> > > take unsigned long kvm_vcpu_argh::regs[NR_VCPU_REGS]?  Maybe I can make the
> > > change with TDX KVM patch series.
> >
> > The assembly code assumes the second argument is a pointer to 'struct
> > tdx_module_args'. I don't know how can we change __seamcall_saved_ret() to
> > achieve what you said. We might change the kvm_vcpu_argh::regs[NR_VCPU_REGS] to
> > match 'struct tdx_module_args''s layout and manually convert part of "regs" to
> > the structure and pass to __seamcall_saved_ret(), but it's too hacky I suppose.
> >
> > This was one concern that I mentioned VP.ENTER can be implemented by KVM in its
> > own assembly in the TDX host v12 discussion. I kinda agree we should leverage
> > KVM's existing kvm_vcpu_arch::regs[NR_CPU_REGS] infrastructure to minimize the
> > code change to the KVM's common infrastructure. If so, I guess we have to carry
> > this memory copy burden between two structures.
> >
> > Btw, I do find KVM's VP.ENTER code is a little bit redundant to the common
> > SEAMCALL assembly, which is a good reason for KVM to use __seamcall() variants
> > for TDH.VP.ENTER.
> >
> > So it's a tradeoff I think.
> >
> > On the other hand, given CoCo VMs normally don't expose all GPRs to VMM, it's
> > also debatable whether we should invent another infrastructure to the KVM code
> > to handle register access of CoCo VMs too, e.g., we can catch bugs easily when
> > KVM tries to access the registers that it shouldn't access.
>
> Yes, we'd like to save/restore GPRs only for TDVMCALL. Otherwise skip
> save/restore.

And another case to save/restore GPRs: supports DEBUG TD,
which is type of TD guest allows VMM to change its register
context, for debugging purpose.

>
> --
> Isaku Yamahata <[email protected]>