2023-06-04 14:59:11

by Huang, Kai

[permalink] [raw]
Subject: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This
mode runs only the TDX module itself or other code to load the TDX
module.

The host kernel communicates with SEAM software via a new SEAMCALL
instruction. This is conceptually similar to a guest->host hypercall,
except it is made from the host to SEAM software instead. The TDX
module establishes a new SEAMCALL ABI which allows the host to
initialize the module and to manage VMs.

Add infrastructure to make SEAMCALLs. The SEAMCALL ABI is very similar
to the TDCALL ABI and leverages much TDCALL infrastructure.

SEAMCALL instruction causes #GP when TDX isn't BIOS enabled, and #UD
when CPU is not in VMX operation. Currently, only KVM code mocks with
VMX enabling, and KVM is the only user of TDX. This implementation
chooses to make KVM itself responsible for enabling VMX before using
TDX and let the rest of the kernel stay blissfully unaware of VMX.

The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
first. Architecturally, there is no CPU flag to check whether the CPU
is in VMX operation. Also, if a BIOS were buggy, it could still report
valid TDX private KeyIDs when TDX actually couldn't be enabled.

Extend the TDX_MODULE_CALL macro to handle #UD and #GP to return error
codes. Introduce two new TDX error codes for them respectively so the
caller can distinguish.

Also add a wrapper function of SEAMCALL to convert SEAMCALL error code
to the kernel error code, and print out SEAMCALL error code to help the
user to understand what went wrong.

Signed-off-by: Kai Huang <[email protected]>
---

v10 -> v11:
- No update

v9 -> v10:
- Make the TDX_SEAMCALL_{GP|UD} error codes unconditional but doesn't
define them when INTEL_TDX_HOST is enabled. (Dave)
- Slightly improved changelog to explain why add assembly code to handle
#UD and #GP.

v8 -> v9:
- Changed patch title (Dave).
- Enhanced seamcall() to include the cpu id to the error message when
SEAMCALL fails.

v7 -> v8:
- Improved changelog (Dave):
- Trim down some sentences (Dave).
- Removed __seamcall() and seamcall() function name and changed
accordingly (Dave).
- Improved the sentence explaining why to handle #GP (Dave).
- Added code to print out error message in seamcall(), following
the idea that tdx_enable() to return universal error and print out
error message to make clear what's going wrong (Dave). Also mention
this in changelog.

v6 -> v7:
- No change.

v5 -> v6:
- Added code to handle #UD and #GP (Dave).
- Moved the seamcall() wrapper function to this patch, and used a
temporary __always_unused to avoid compile warning (Dave).

- v3 -> v5 (no feedback on v4):
- Explicitly tell TDX_SEAMCALL_VMFAILINVALID is returned if the
SEAMCALL itself fails.
- Improve the changelog.

---
arch/x86/include/asm/tdx.h | 5 +++
arch/x86/virt/vmx/tdx/Makefile | 2 +-
arch/x86/virt/vmx/tdx/seamcall.S | 52 +++++++++++++++++++++++++++++
arch/x86/virt/vmx/tdx/tdx.c | 56 ++++++++++++++++++++++++++++++++
arch/x86/virt/vmx/tdx/tdx.h | 10 ++++++
arch/x86/virt/vmx/tdx/tdxcall.S | 19 +++++++++--
6 files changed, 141 insertions(+), 3 deletions(-)
create mode 100644 arch/x86/virt/vmx/tdx/seamcall.S
create mode 100644 arch/x86/virt/vmx/tdx/tdx.h

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 4dfe2e794411..b489b5b9de5d 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -8,6 +8,8 @@
#include <asm/ptrace.h>
#include <asm/shared/tdx.h>

+#include <asm/trapnr.h>
+
/*
* SW-defined error codes.
*
@@ -18,6 +20,9 @@
#define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
#define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000))

+#define TDX_SEAMCALL_GP (TDX_SW_ERROR | X86_TRAP_GP)
+#define TDX_SEAMCALL_UD (TDX_SW_ERROR | X86_TRAP_UD)
+
#ifndef __ASSEMBLY__

/* TDX supported page sizes from the TDX module ABI. */
diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
index 93ca8b73e1f1..38d534f2c113 100644
--- a/arch/x86/virt/vmx/tdx/Makefile
+++ b/arch/x86/virt/vmx/tdx/Makefile
@@ -1,2 +1,2 @@
# SPDX-License-Identifier: GPL-2.0-only
-obj-y += tdx.o
+obj-y += tdx.o seamcall.o
diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamcall.S
new file mode 100644
index 000000000000..f81be6b9c133
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/seamcall.S
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/linkage.h>
+#include <asm/frame.h>
+
+#include "tdxcall.S"
+
+/*
+ * __seamcall() - Host-side interface functions to SEAM software module
+ * (the P-SEAMLDR or the TDX module).
+ *
+ * Transform function call register arguments into the SEAMCALL register
+ * ABI. Return TDX_SEAMCALL_VMFAILINVALID if the SEAMCALL itself fails,
+ * or the completion status of the SEAMCALL leaf function. Additional
+ * output operands are saved in @out (if it is provided by the caller).
+ *
+ *-------------------------------------------------------------------------
+ * SEAMCALL ABI:
+ *-------------------------------------------------------------------------
+ * Input Registers:
+ *
+ * RAX - SEAMCALL Leaf number.
+ * RCX,RDX,R8-R9 - SEAMCALL Leaf specific input registers.
+ *
+ * Output Registers:
+ *
+ * RAX - SEAMCALL completion status code.
+ * RCX,RDX,R8-R11 - SEAMCALL Leaf specific output registers.
+ *
+ *-------------------------------------------------------------------------
+ *
+ * __seamcall() function ABI:
+ *
+ * @fn (RDI) - SEAMCALL Leaf number, moved to RAX
+ * @rcx (RSI) - Input parameter 1, moved to RCX
+ * @rdx (RDX) - Input parameter 2, moved to RDX
+ * @r8 (RCX) - Input parameter 3, moved to R8
+ * @r9 (R8) - Input parameter 4, moved to R9
+ *
+ * @out (R9) - struct tdx_module_output pointer
+ * stored temporarily in R12 (not
+ * used by the P-SEAMLDR or the TDX
+ * module). It can be NULL.
+ *
+ * Return (via RAX) the completion status of the SEAMCALL, or
+ * TDX_SEAMCALL_VMFAILINVALID.
+ */
+SYM_FUNC_START(__seamcall)
+ FRAME_BEGIN
+ TDX_MODULE_CALL host=1
+ FRAME_END
+ RET
+SYM_FUNC_END(__seamcall)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 2d91e7120c90..e82713dd5d54 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -12,14 +12,70 @@
#include <linux/init.h>
#include <linux/errno.h>
#include <linux/printk.h>
+#include <linux/smp.h>
#include <asm/msr-index.h>
#include <asm/msr.h>
#include <asm/tdx.h>
+#include "tdx.h"

static u32 tdx_global_keyid __ro_after_init;
static u32 tdx_guest_keyid_start __ro_after_init;
static u32 tdx_nr_guest_keyids __ro_after_init;

+/*
+ * Wrapper of __seamcall() to convert SEAMCALL leaf function error code
+ * to kernel error code. @seamcall_ret and @out contain the SEAMCALL
+ * leaf function return code and the additional output respectively if
+ * not NULL.
+ */
+static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+ u64 *seamcall_ret,
+ struct tdx_module_output *out)
+{
+ int cpu, ret = 0;
+ u64 sret;
+
+ /* Need a stable CPU id for printing error message */
+ cpu = get_cpu();
+
+ sret = __seamcall(fn, rcx, rdx, r8, r9, out);
+
+ /* Save SEAMCALL return code if the caller wants it */
+ if (seamcall_ret)
+ *seamcall_ret = sret;
+
+ /* SEAMCALL was successful */
+ if (!sret)
+ goto out;
+
+ switch (sret) {
+ case TDX_SEAMCALL_GP:
+ pr_err_once("[firmware bug]: TDX is not enabled by BIOS.\n");
+ ret = -ENODEV;
+ break;
+ case TDX_SEAMCALL_VMFAILINVALID:
+ pr_err_once("TDX module is not loaded.\n");
+ ret = -ENODEV;
+ break;
+ case TDX_SEAMCALL_UD:
+ pr_err_once("SEAMCALL failed: CPU %d is not in VMX operation.\n",
+ cpu);
+ ret = -EINVAL;
+ break;
+ default:
+ pr_err_once("SEAMCALL failed: CPU %d: leaf %llu, error 0x%llx.\n",
+ cpu, fn, sret);
+ if (out)
+ pr_err_once("additional output: rcx 0x%llx, rdx 0x%llx, r8 0x%llx, r9 0x%llx, r10 0x%llx, r11 0x%llx.\n",
+ out->rcx, out->rdx, out->r8,
+ out->r9, out->r10, out->r11);
+ ret = -EIO;
+ }
+out:
+ put_cpu();
+ return ret;
+}
+
static int __init record_keyid_partitioning(u32 *tdx_keyid_start,
u32 *nr_tdx_keyids)
{
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
new file mode 100644
index 000000000000..48ad1a1ba737
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _X86_VIRT_TDX_H
+#define _X86_VIRT_TDX_H
+
+#include <linux/types.h>
+
+struct tdx_module_output;
+u64 __seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+ struct tdx_module_output *out);
+#endif
diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
index 49a54356ae99..757b0c34be10 100644
--- a/arch/x86/virt/vmx/tdx/tdxcall.S
+++ b/arch/x86/virt/vmx/tdx/tdxcall.S
@@ -1,6 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <asm/asm-offsets.h>
#include <asm/tdx.h>
+#include <asm/asm.h>

/*
* TDCALL and SEAMCALL are supported in Binutils >= 2.36.
@@ -45,6 +46,7 @@
/* Leave input param 2 in RDX */

.if \host
+1:
seamcall
/*
* SEAMCALL instruction is essentially a VMExit from VMX root
@@ -57,10 +59,23 @@
* This value will never be used as actual SEAMCALL error code as
* it is from the Reserved status code class.
*/
- jnc .Lno_vmfailinvalid
+ jnc .Lseamcall_out
mov $TDX_SEAMCALL_VMFAILINVALID, %rax
-.Lno_vmfailinvalid:
+ jmp .Lseamcall_out
+2:
+ /*
+ * SEAMCALL caused #GP or #UD. By reaching here %eax contains
+ * the trap number. Convert the trap number to the TDX error
+ * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
+ *
+ * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
+ * only accepts 32-bit immediate at most.
+ */
+ mov $TDX_SW_ERROR, %r12
+ orq %r12, %rax

+ _ASM_EXTABLE_FAULT(1b, 2b)
+.Lseamcall_out:
.else
tdcall
.endif
--
2.40.1



2023-06-07 00:14:44

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On Mon, Jun 05, 2023 at 02:27:18AM +1200,
Kai Huang <[email protected]> wrote:

> TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This
> mode runs only the TDX module itself or other code to load the TDX
> module.
>
> The host kernel communicates with SEAM software via a new SEAMCALL
> instruction. This is conceptually similar to a guest->host hypercall,
> except it is made from the host to SEAM software instead. The TDX
> module establishes a new SEAMCALL ABI which allows the host to
> initialize the module and to manage VMs.
>
> Add infrastructure to make SEAMCALLs. The SEAMCALL ABI is very similar
> to the TDCALL ABI and leverages much TDCALL infrastructure.
>
> SEAMCALL instruction causes #GP when TDX isn't BIOS enabled, and #UD
> when CPU is not in VMX operation. Currently, only KVM code mocks with
> VMX enabling, and KVM is the only user of TDX. This implementation
> chooses to make KVM itself responsible for enabling VMX before using
> TDX and let the rest of the kernel stay blissfully unaware of VMX.
>
> The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
> kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
> first. Architecturally, there is no CPU flag to check whether the CPU
> is in VMX operation. Also, if a BIOS were buggy, it could still report
> valid TDX private KeyIDs when TDX actually couldn't be enabled.
>
> Extend the TDX_MODULE_CALL macro to handle #UD and #GP to return error
> codes. Introduce two new TDX error codes for them respectively so the
> caller can distinguish.
>
> Also add a wrapper function of SEAMCALL to convert SEAMCALL error code
> to the kernel error code, and print out SEAMCALL error code to help the
> user to understand what went wrong.
>
> Signed-off-by: Kai Huang <[email protected]>
> ---
>
> v10 -> v11:
> - No update
>
> v9 -> v10:
> - Make the TDX_SEAMCALL_{GP|UD} error codes unconditional but doesn't
> define them when INTEL_TDX_HOST is enabled. (Dave)
> - Slightly improved changelog to explain why add assembly code to handle
> #UD and #GP.
>
> v8 -> v9:
> - Changed patch title (Dave).
> - Enhanced seamcall() to include the cpu id to the error message when
> SEAMCALL fails.
>
> v7 -> v8:
> - Improved changelog (Dave):
> - Trim down some sentences (Dave).
> - Removed __seamcall() and seamcall() function name and changed
> accordingly (Dave).
> - Improved the sentence explaining why to handle #GP (Dave).
> - Added code to print out error message in seamcall(), following
> the idea that tdx_enable() to return universal error and print out
> error message to make clear what's going wrong (Dave). Also mention
> this in changelog.
>
> v6 -> v7:
> - No change.
>
> v5 -> v6:
> - Added code to handle #UD and #GP (Dave).
> - Moved the seamcall() wrapper function to this patch, and used a
> temporary __always_unused to avoid compile warning (Dave).
>
> - v3 -> v5 (no feedback on v4):
> - Explicitly tell TDX_SEAMCALL_VMFAILINVALID is returned if the
> SEAMCALL itself fails.
> - Improve the changelog.
>
> ---
> arch/x86/include/asm/tdx.h | 5 +++
> arch/x86/virt/vmx/tdx/Makefile | 2 +-
> arch/x86/virt/vmx/tdx/seamcall.S | 52 +++++++++++++++++++++++++++++
> arch/x86/virt/vmx/tdx/tdx.c | 56 ++++++++++++++++++++++++++++++++
> arch/x86/virt/vmx/tdx/tdx.h | 10 ++++++
> arch/x86/virt/vmx/tdx/tdxcall.S | 19 +++++++++--
> 6 files changed, 141 insertions(+), 3 deletions(-)
> create mode 100644 arch/x86/virt/vmx/tdx/seamcall.S
> create mode 100644 arch/x86/virt/vmx/tdx/tdx.h
>
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 4dfe2e794411..b489b5b9de5d 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -8,6 +8,8 @@
> #include <asm/ptrace.h>
> #include <asm/shared/tdx.h>
>
> +#include <asm/trapnr.h>
> +
> /*
> * SW-defined error codes.
> *
> @@ -18,6 +20,9 @@
> #define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
> #define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000))
>
> +#define TDX_SEAMCALL_GP (TDX_SW_ERROR | X86_TRAP_GP)
> +#define TDX_SEAMCALL_UD (TDX_SW_ERROR | X86_TRAP_UD)
> +
> #ifndef __ASSEMBLY__
>
> /* TDX supported page sizes from the TDX module ABI. */
> diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
> index 93ca8b73e1f1..38d534f2c113 100644
> --- a/arch/x86/virt/vmx/tdx/Makefile
> +++ b/arch/x86/virt/vmx/tdx/Makefile
> @@ -1,2 +1,2 @@
> # SPDX-License-Identifier: GPL-2.0-only
> -obj-y += tdx.o
> +obj-y += tdx.o seamcall.o
> diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamcall.S
> new file mode 100644
> index 000000000000..f81be6b9c133
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/seamcall.S
> @@ -0,0 +1,52 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#include <linux/linkage.h>
> +#include <asm/frame.h>
> +
> +#include "tdxcall.S"
> +
> +/*
> + * __seamcall() - Host-side interface functions to SEAM software module
> + * (the P-SEAMLDR or the TDX module).
> + *
> + * Transform function call register arguments into the SEAMCALL register
> + * ABI. Return TDX_SEAMCALL_VMFAILINVALID if the SEAMCALL itself fails,
> + * or the completion status of the SEAMCALL leaf function. Additional
> + * output operands are saved in @out (if it is provided by the caller).
> + *
> + *-------------------------------------------------------------------------
> + * SEAMCALL ABI:
> + *-------------------------------------------------------------------------
> + * Input Registers:
> + *
> + * RAX - SEAMCALL Leaf number.
> + * RCX,RDX,R8-R9 - SEAMCALL Leaf specific input registers.
> + *
> + * Output Registers:
> + *
> + * RAX - SEAMCALL completion status code.
> + * RCX,RDX,R8-R11 - SEAMCALL Leaf specific output registers.
> + *
> + *-------------------------------------------------------------------------
> + *
> + * __seamcall() function ABI:
> + *
> + * @fn (RDI) - SEAMCALL Leaf number, moved to RAX
> + * @rcx (RSI) - Input parameter 1, moved to RCX
> + * @rdx (RDX) - Input parameter 2, moved to RDX
> + * @r8 (RCX) - Input parameter 3, moved to R8
> + * @r9 (R8) - Input parameter 4, moved to R9
> + *
> + * @out (R9) - struct tdx_module_output pointer
> + * stored temporarily in R12 (not
> + * used by the P-SEAMLDR or the TDX
> + * module). It can be NULL.
> + *
> + * Return (via RAX) the completion status of the SEAMCALL, or
> + * TDX_SEAMCALL_VMFAILINVALID.
> + */
> +SYM_FUNC_START(__seamcall)
> + FRAME_BEGIN
> + TDX_MODULE_CALL host=1
> + FRAME_END
> + RET
> +SYM_FUNC_END(__seamcall)
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 2d91e7120c90..e82713dd5d54 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -12,14 +12,70 @@
> #include <linux/init.h>
> #include <linux/errno.h>
> #include <linux/printk.h>
> +#include <linux/smp.h>
> #include <asm/msr-index.h>
> #include <asm/msr.h>
> #include <asm/tdx.h>
> +#include "tdx.h"
>
> static u32 tdx_global_keyid __ro_after_init;
> static u32 tdx_guest_keyid_start __ro_after_init;
> static u32 tdx_nr_guest_keyids __ro_after_init;
>
> +/*
> + * Wrapper of __seamcall() to convert SEAMCALL leaf function error code
> + * to kernel error code. @seamcall_ret and @out contain the SEAMCALL
> + * leaf function return code and the additional output respectively if
> + * not NULL.
> + */
> +static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> + u64 *seamcall_ret,
> + struct tdx_module_output *out)
> +{
> + int cpu, ret = 0;
> + u64 sret;
> +
> + /* Need a stable CPU id for printing error message */
> + cpu = get_cpu();
> +
> + sret = __seamcall(fn, rcx, rdx, r8, r9, out);
> +
> + /* Save SEAMCALL return code if the caller wants it */
> + if (seamcall_ret)
> + *seamcall_ret = sret;
> +
> + /* SEAMCALL was successful */
> + if (!sret)
> + goto out;
> +
> + switch (sret) {
> + case TDX_SEAMCALL_GP:
> + pr_err_once("[firmware bug]: TDX is not enabled by BIOS.\n");
> + ret = -ENODEV;
> + break;
> + case TDX_SEAMCALL_VMFAILINVALID:
> + pr_err_once("TDX module is not loaded.\n");
> + ret = -ENODEV;
> + break;
> + case TDX_SEAMCALL_UD:
> + pr_err_once("SEAMCALL failed: CPU %d is not in VMX operation.\n",
> + cpu);
> + ret = -EINVAL;
> + break;
> + default:
> + pr_err_once("SEAMCALL failed: CPU %d: leaf %llu, error 0x%llx.\n",
> + cpu, fn, sret);
> + if (out)
> + pr_err_once("additional output: rcx 0x%llx, rdx 0x%llx, r8 0x%llx, r9 0x%llx, r10 0x%llx, r11 0x%llx.\n",
> + out->rcx, out->rdx, out->r8,
> + out->r9, out->r10, out->r11);
> + ret = -EIO;
> + }
> +out:
> + put_cpu();
> + return ret;
> +}
> +
> static int __init record_keyid_partitioning(u32 *tdx_keyid_start,
> u32 *nr_tdx_keyids)
> {
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> new file mode 100644
> index 000000000000..48ad1a1ba737
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _X86_VIRT_TDX_H
> +#define _X86_VIRT_TDX_H
> +
> +#include <linux/types.h>
> +
> +struct tdx_module_output;
> +u64 __seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> + struct tdx_module_output *out);
> +#endif
> diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
> index 49a54356ae99..757b0c34be10 100644
> --- a/arch/x86/virt/vmx/tdx/tdxcall.S
> +++ b/arch/x86/virt/vmx/tdx/tdxcall.S
> @@ -1,6 +1,7 @@
> /* SPDX-License-Identifier: GPL-2.0 */
> #include <asm/asm-offsets.h>
> #include <asm/tdx.h>
> +#include <asm/asm.h>
>
> /*
> * TDCALL and SEAMCALL are supported in Binutils >= 2.36.
> @@ -45,6 +46,7 @@
> /* Leave input param 2 in RDX */
>
> .if \host
> +1:
> seamcall
> /*
> * SEAMCALL instruction is essentially a VMExit from VMX root
> @@ -57,10 +59,23 @@
> * This value will never be used as actual SEAMCALL error code as
> * it is from the Reserved status code class.
> */
> - jnc .Lno_vmfailinvalid
> + jnc .Lseamcall_out
> mov $TDX_SEAMCALL_VMFAILINVALID, %rax
> -.Lno_vmfailinvalid:
> + jmp .Lseamcall_out
> +2:
> + /*
> + * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> + * the trap number. Convert the trap number to the TDX error
> + * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
> + *
> + * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
> + * only accepts 32-bit immediate at most.
> + */
> + mov $TDX_SW_ERROR, %r12
> + orq %r12, %rax
>
> + _ASM_EXTABLE_FAULT(1b, 2b)
> +.Lseamcall_out:
> .else
> tdcall
> .endif
> --
> 2.40.1
>

Reviewed-by: Isaku Yamahata <[email protected]>
--
Isaku Yamahata <[email protected]>

2023-06-07 14:53:46

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On 6/4/23 07:27, Kai Huang wrote:
> TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This
> mode runs only the TDX module itself or other code to load the TDX
> module.
>
> The host kernel communicates with SEAM software via a new SEAMCALL
> instruction. This is conceptually similar to a guest->host hypercall,
> except it is made from the host to SEAM software instead. The TDX
> module establishes a new SEAMCALL ABI which allows the host to
> initialize the module and to manage VMs.
>
> Add infrastructure to make SEAMCALLs. The SEAMCALL ABI is very similar
> to the TDCALL ABI and leverages much TDCALL infrastructure.
>
> SEAMCALL instruction causes #GP when TDX isn't BIOS enabled, and #UD
> when CPU is not in VMX operation. Currently, only KVM code mocks with

"mocks"? Did you mean "mucks"?

> VMX enabling, and KVM is the only user of TDX. This implementation
> chooses to make KVM itself responsible for enabling VMX before using
> TDX and let the rest of the kernel stay blissfully unaware of VMX.
>
> The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
> kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
> first. Architecturally, there is no CPU flag to check whether the CPU
> is in VMX operation. Also, if a BIOS were buggy, it could still report
> valid TDX private KeyIDs when TDX actually couldn't be enabled.

I'm not sure this is a great justification. If the BIOS is lying to the
OS, we _should_ oops.

How else can this happen other than silly kernel bugs. It's OK to oops
in the face of silly kernel bugs.

...
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 4dfe2e794411..b489b5b9de5d 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -8,6 +8,8 @@
> #include <asm/ptrace.h>
> #include <asm/shared/tdx.h>
>
> +#include <asm/trapnr.h>
> +
> /*
> * SW-defined error codes.
> *
> @@ -18,6 +20,9 @@
> #define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
> #define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000))
>
> +#define TDX_SEAMCALL_GP (TDX_SW_ERROR | X86_TRAP_GP)
> +#define TDX_SEAMCALL_UD (TDX_SW_ERROR | X86_TRAP_UD)
> +
> #ifndef __ASSEMBLY__
>
> /* TDX supported page sizes from the TDX module ABI. */
> diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
> index 93ca8b73e1f1..38d534f2c113 100644
> --- a/arch/x86/virt/vmx/tdx/Makefile
> +++ b/arch/x86/virt/vmx/tdx/Makefile
> @@ -1,2 +1,2 @@
> # SPDX-License-Identifier: GPL-2.0-only
> -obj-y += tdx.o
> +obj-y += tdx.o seamcall.o
> diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamcall.S
> new file mode 100644
> index 000000000000..f81be6b9c133
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/seamcall.S
> @@ -0,0 +1,52 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#include <linux/linkage.h>
> +#include <asm/frame.h>
> +
> +#include "tdxcall.S"
> +
> +/*
> + * __seamcall() - Host-side interface functions to SEAM software module
> + * (the P-SEAMLDR or the TDX module).
> + *
> + * Transform function call register arguments into the SEAMCALL register
> + * ABI. Return TDX_SEAMCALL_VMFAILINVALID if the SEAMCALL itself fails,
> + * or the completion status of the SEAMCALL leaf function. Additional
> + * output operands are saved in @out (if it is provided by the caller).
> + *
> + *-------------------------------------------------------------------------
> + * SEAMCALL ABI:
> + *-------------------------------------------------------------------------
> + * Input Registers:
> + *
> + * RAX - SEAMCALL Leaf number.
> + * RCX,RDX,R8-R9 - SEAMCALL Leaf specific input registers.
> + *
> + * Output Registers:
> + *
> + * RAX - SEAMCALL completion status code.
> + * RCX,RDX,R8-R11 - SEAMCALL Leaf specific output registers.
> + *
> + *-------------------------------------------------------------------------
> + *
> + * __seamcall() function ABI:
> + *
> + * @fn (RDI) - SEAMCALL Leaf number, moved to RAX
> + * @rcx (RSI) - Input parameter 1, moved to RCX
> + * @rdx (RDX) - Input parameter 2, moved to RDX
> + * @r8 (RCX) - Input parameter 3, moved to R8
> + * @r9 (R8) - Input parameter 4, moved to R9
> + *
> + * @out (R9) - struct tdx_module_output pointer
> + * stored temporarily in R12 (not
> + * used by the P-SEAMLDR or the TDX
> + * module). It can be NULL.
> + *
> + * Return (via RAX) the completion status of the SEAMCALL, or
> + * TDX_SEAMCALL_VMFAILINVALID.
> + */
> +SYM_FUNC_START(__seamcall)
> + FRAME_BEGIN
> + TDX_MODULE_CALL host=1
> + FRAME_END
> + RET
> +SYM_FUNC_END(__seamcall)
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 2d91e7120c90..e82713dd5d54 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -12,14 +12,70 @@
> #include <linux/init.h>
> #include <linux/errno.h>
> #include <linux/printk.h>
> +#include <linux/smp.h>
> #include <asm/msr-index.h>
> #include <asm/msr.h>
> #include <asm/tdx.h>
> +#include "tdx.h"
>
> static u32 tdx_global_keyid __ro_after_init;
> static u32 tdx_guest_keyid_start __ro_after_init;
> static u32 tdx_nr_guest_keyids __ro_after_init;
>
> +/*
> + * Wrapper of __seamcall() to convert SEAMCALL leaf function error code
> + * to kernel error code. @seamcall_ret and @out contain the SEAMCALL
> + * leaf function return code and the additional output respectively if
> + * not NULL.
> + */
> +static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> + u64 *seamcall_ret,
> + struct tdx_module_output *out)
> +{
> + int cpu, ret = 0;
> + u64 sret;
> +
> + /* Need a stable CPU id for printing error message */
> + cpu = get_cpu();
> +
> + sret = __seamcall(fn, rcx, rdx, r8, r9, out);
> +
> + /* Save SEAMCALL return code if the caller wants it */
> + if (seamcall_ret)
> + *seamcall_ret = sret;
> +
> + /* SEAMCALL was successful */
> + if (!sret)
> + goto out;
> +
> + switch (sret) {
> + case TDX_SEAMCALL_GP:
> + pr_err_once("[firmware bug]: TDX is not enabled by BIOS.\n");
> + ret = -ENODEV;
> + break;
> + case TDX_SEAMCALL_VMFAILINVALID:
> + pr_err_once("TDX module is not loaded.\n");
> + ret = -ENODEV;
> + break;
> + case TDX_SEAMCALL_UD:
> + pr_err_once("SEAMCALL failed: CPU %d is not in VMX operation.\n",
> + cpu);
> + ret = -EINVAL;
> + break;
> + default:
> + pr_err_once("SEAMCALL failed: CPU %d: leaf %llu, error 0x%llx.\n",
> + cpu, fn, sret);
> + if (out)
> + pr_err_once("additional output: rcx 0x%llx, rdx 0x%llx, r8 0x%llx, r9 0x%llx, r10 0x%llx, r11 0x%llx.\n",
> + out->rcx, out->rdx, out->r8,
> + out->r9, out->r10, out->r11);
> + ret = -EIO;
> + }
> +out:
> + put_cpu();
> + return ret;
> +}

This fails to distinguish two very different things:

* A SEAMCALL error
and
* A SEAMCALL *failure*

"Errors" are normal. Hypercalls can return errors and so can SEAMCALLs.
No biggie.

But SEAMCALL failures are another matter. They mean that something
really fundamental is *BROKEN*.

Just saying "SEAMCALL was successful" is a bit ambiguous for me.

> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> new file mode 100644
> index 000000000000..48ad1a1ba737
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _X86_VIRT_TDX_H
> +#define _X86_VIRT_TDX_H
> +
> +#include <linux/types.h>
> +
> +struct tdx_module_output;
> +u64 __seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> + struct tdx_module_output *out);
> +#endif
> diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
> index 49a54356ae99..757b0c34be10 100644
> --- a/arch/x86/virt/vmx/tdx/tdxcall.S
> +++ b/arch/x86/virt/vmx/tdx/tdxcall.S
> @@ -1,6 +1,7 @@
> /* SPDX-License-Identifier: GPL-2.0 */
> #include <asm/asm-offsets.h>
> #include <asm/tdx.h>
> +#include <asm/asm.h>
>
> /*
> * TDCALL and SEAMCALL are supported in Binutils >= 2.36.
> @@ -45,6 +46,7 @@
> /* Leave input param 2 in RDX */
>
> .if \host
> +1:
> seamcall
> /*
> * SEAMCALL instruction is essentially a VMExit from VMX root
> @@ -57,10 +59,23 @@
> * This value will never be used as actual SEAMCALL error code as
> * it is from the Reserved status code class.
> */
> - jnc .Lno_vmfailinvalid
> + jnc .Lseamcall_out
> mov $TDX_SEAMCALL_VMFAILINVALID, %rax
> -.Lno_vmfailinvalid:
> + jmp .Lseamcall_out
> +2:
> + /*
> + * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> + * the trap number. Convert the trap number to the TDX error
> + * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
> + *
> + * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
> + * only accepts 32-bit immediate at most.
> + */
> + mov $TDX_SW_ERROR, %r12
> + orq %r12, %rax

I think the justification for doing the #UD/#GP handling is a bit weak.
In the end, it gets us a nicer error message. Is that error message
*REALLY* needed? Or is an oops OK in the very rare circumstance that
the BIOS is totally buggy?


2023-06-07 19:10:18

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On Wed, Jun 07, 2023 at 07:24:23AM -0700,
Dave Hansen <[email protected]> wrote:

> On 6/4/23 07:27, Kai Huang wrote:
> > TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This
> > mode runs only the TDX module itself or other code to load the TDX
> > module.
> >
> > The host kernel communicates with SEAM software via a new SEAMCALL
> > instruction. This is conceptually similar to a guest->host hypercall,
> > except it is made from the host to SEAM software instead. The TDX
> > module establishes a new SEAMCALL ABI which allows the host to
> > initialize the module and to manage VMs.
> >
> > Add infrastructure to make SEAMCALLs. The SEAMCALL ABI is very similar
> > to the TDCALL ABI and leverages much TDCALL infrastructure.
> >
> > SEAMCALL instruction causes #GP when TDX isn't BIOS enabled, and #UD
> > when CPU is not in VMX operation. Currently, only KVM code mocks with
>
> "mocks"? Did you mean "mucks"?
>
> > VMX enabling, and KVM is the only user of TDX. This implementation
> > chooses to make KVM itself responsible for enabling VMX before using
> > TDX and let the rest of the kernel stay blissfully unaware of VMX.
> >
> > The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
> > kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
> > first. Architecturally, there is no CPU flag to check whether the CPU
> > is in VMX operation. Also, if a BIOS were buggy, it could still report
> > valid TDX private KeyIDs when TDX actually couldn't be enabled.
>
> I'm not sure this is a great justification. If the BIOS is lying to the
> OS, we _should_ oops.
>
> How else can this happen other than silly kernel bugs. It's OK to oops
> in the face of silly kernel bugs.

TDX KVM + reboot can hit #UD. On reboot, VMX is disabled (VMXOFF) via
syscore.shutdown callback. However, guest TD can be still running to issue
SEAMCALL resulting in #UD.

Or we can postpone the change and make the TDX KVM patch series carry a patch
for it.
--
Isaku Yamahata <[email protected]>

2023-06-07 19:35:28

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On 6/7/23 11:53, Isaku Yamahata wrote:
>>> VMX enabling, and KVM is the only user of TDX. This implementation
>>> chooses to make KVM itself responsible for enabling VMX before using
>>> TDX and let the rest of the kernel stay blissfully unaware of VMX.
>>>
>>> The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
>>> kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
>>> first. Architecturally, there is no CPU flag to check whether the CPU
>>> is in VMX operation. Also, if a BIOS were buggy, it could still report
>>> valid TDX private KeyIDs when TDX actually couldn't be enabled.
>> I'm not sure this is a great justification. If the BIOS is lying to the
>> OS, we _should_ oops.
>>
>> How else can this happen other than silly kernel bugs. It's OK to oops
>> in the face of silly kernel bugs.
> TDX KVM + reboot can hit #UD. On reboot, VMX is disabled (VMXOFF) via
> syscore.shutdown callback. However, guest TD can be still running to issue
> SEAMCALL resulting in #UD.
>
> Or we can postpone the change and make the TDX KVM patch series carry a patch
> for it.

How does the existing KVM use of VMLAUNCH/VMRESUME avoid that problem?

2023-06-07 20:01:39

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On Wed, Jun 07, 2023 at 12:27:33PM -0700,
Dave Hansen <[email protected]> wrote:

> On 6/7/23 11:53, Isaku Yamahata wrote:
> >>> VMX enabling, and KVM is the only user of TDX. This implementation
> >>> chooses to make KVM itself responsible for enabling VMX before using
> >>> TDX and let the rest of the kernel stay blissfully unaware of VMX.
> >>>
> >>> The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
> >>> kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
> >>> first. Architecturally, there is no CPU flag to check whether the CPU
> >>> is in VMX operation. Also, if a BIOS were buggy, it could still report
> >>> valid TDX private KeyIDs when TDX actually couldn't be enabled.
> >> I'm not sure this is a great justification. If the BIOS is lying to the
> >> OS, we _should_ oops.
> >>
> >> How else can this happen other than silly kernel bugs. It's OK to oops
> >> in the face of silly kernel bugs.
> > TDX KVM + reboot can hit #UD. On reboot, VMX is disabled (VMXOFF) via
> > syscore.shutdown callback. However, guest TD can be still running to issue
> > SEAMCALL resulting in #UD.
> >
> > Or we can postpone the change and make the TDX KVM patch series carry a patch
> > for it.
>
> How does the existing KVM use of VMLAUNCH/VMRESUME avoid that problem?

extable. From arch/x86/kvm/vmx/vmenter.S

.Lvmresume:
vmresume
jmp .Lvmfail

.Lvmlaunch:
vmlaunch
jmp .Lvmfail

_ASM_EXTABLE(.Lvmresume, .Lfixup)
_ASM_EXTABLE(.Lvmlaunch, .Lfixup)


--
Isaku Yamahata <[email protected]>

2023-06-07 20:17:49

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On Wed, Jun 07, 2023, Isaku Yamahata wrote:
> On Wed, Jun 07, 2023 at 12:27:33PM -0700,
> Dave Hansen <[email protected]> wrote:
>
> > On 6/7/23 11:53, Isaku Yamahata wrote:
> > >>> VMX enabling, and KVM is the only user of TDX. This implementation
> > >>> chooses to make KVM itself responsible for enabling VMX before using
> > >>> TDX and let the rest of the kernel stay blissfully unaware of VMX.
> > >>>
> > >>> The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
> > >>> kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
> > >>> first. Architecturally, there is no CPU flag to check whether the CPU
> > >>> is in VMX operation. Also, if a BIOS were buggy, it could still report
> > >>> valid TDX private KeyIDs when TDX actually couldn't be enabled.
> > >> I'm not sure this is a great justification. If the BIOS is lying to the
> > >> OS, we _should_ oops.
> > >>
> > >> How else can this happen other than silly kernel bugs. It's OK to oops
> > >> in the face of silly kernel bugs.
> > > TDX KVM + reboot can hit #UD. On reboot, VMX is disabled (VMXOFF) via
> > > syscore.shutdown callback. However, guest TD can be still running to issue
> > > SEAMCALL resulting in #UD.
> > >
> > > Or we can postpone the change and make the TDX KVM patch series carry a patch
> > > for it.
> >
> > How does the existing KVM use of VMLAUNCH/VMRESUME avoid that problem?
>
> extable. From arch/x86/kvm/vmx/vmenter.S
>
> .Lvmresume:
> vmresume
> jmp .Lvmfail
>
> .Lvmlaunch:
> vmlaunch
> jmp .Lvmfail
>
> _ASM_EXTABLE(.Lvmresume, .Lfixup)
> _ASM_EXTABLE(.Lvmlaunch, .Lfixup)

More specifically, KVM eats faults on VMX and SVM instructions that occur after
KVM forcefully disables VMX/SVM.

E.g. with reboot -f, this will be reached without first stopping VMs:

static void kvm_shutdown(void)
{
/*
* Disable hardware virtualization and set kvm_rebooting to indicate
* that KVM has asynchronously disabled hardware virtualization, i.e.
* that relevant errors and exceptions aren't entirely unexpected.
* Some flavors of hardware virtualization need to be disabled before
* transferring control to firmware (to perform shutdown/reboot), e.g.
* on x86, virtualization can block INIT interrupts, which are used by
* firmware to pull APs back under firmware control. Note, this path
* is used for both shutdown and reboot scenarios, i.e. neither name is
* 100% comprehensive.
*/
pr_info("kvm: exiting hardware virtualization\n");
kvm_rebooting = true;
on_each_cpu(hardware_disable_nolock, NULL, 1);
}

which KVM x86 (VMX and SVM) then queries when deciding what to do with a spurious
fault on a VMX/SVM instruction

/*
* Handle a fault on a hardware virtualization (VMX or SVM) instruction.
*
* Hardware virtualization extension instructions may fault if a reboot turns
* off virtualization while processes are running. Usually after catching the
* fault we just panic; during reboot instead the instruction is ignored.
*/
noinstr void kvm_spurious_fault(void)
{
/* Fault while not rebooting. We want the trace. */
BUG_ON(!kvm_rebooting);
}
EXPORT_SYMBOL_GPL(kvm_spurious_fault);

2023-06-07 21:40:30

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On 6/7/23 13:08, Sean Christopherson wrote:
>>>>>> The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
>>>>>> kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
>>>>>> first. Architecturally, there is no CPU flag to check whether the CPU
>>>>>> is in VMX operation. Also, if a BIOS were buggy, it could still report
>>>>>> valid TDX private KeyIDs when TDX actually couldn't be enabled.
>>>>> I'm not sure this is a great justification. If the BIOS is lying to the
>>>>> OS, we _should_ oops.
>>>>>
>>>>> How else can this happen other than silly kernel bugs. It's OK to oops
>>>>> in the face of silly kernel bugs.
>>>> TDX KVM + reboot can hit #UD. On reboot, VMX is disabled (VMXOFF) via
>>>> syscore.shutdown callback. However, guest TD can be still running to issue
>>>> SEAMCALL resulting in #UD.
>>>>
>>>> Or we can postpone the change and make the TDX KVM patch series carry a patch
>>>> for it.
>>> How does the existing KVM use of VMLAUNCH/VMRESUME avoid that problem?
>> extable. From arch/x86/kvm/vmx/vmenter.S
>>
>> .Lvmresume:
>> vmresume
>> jmp .Lvmfail
>>
>> .Lvmlaunch:
>> vmlaunch
>> jmp .Lvmfail
>>
>> _ASM_EXTABLE(.Lvmresume, .Lfixup)
>> _ASM_EXTABLE(.Lvmlaunch, .Lfixup)
> More specifically, KVM eats faults on VMX and SVM instructions that occur after
> KVM forcefully disables VMX/SVM.

<grumble> That's a *TOTALLY* different argument than the patch makes.

KVM is being a _bit_ nutty here, but I do respect it trying to honor the
"-f". I have no objections to the SEAMCALL code being nutty in the same
way.

Why do I get the feeling that code is being written without
understanding _why_, despite this being v11?

2023-06-07 23:09:29

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On Wed, 2023-06-07 at 07:24 -0700, Hansen, Dave wrote:
> On 6/4/23 07:27, Kai Huang wrote:
> > TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This
> > mode runs only the TDX module itself or other code to load the TDX
> > module.
> >
> > The host kernel communicates with SEAM software via a new SEAMCALL
> > instruction. This is conceptually similar to a guest->host hypercall,
> > except it is made from the host to SEAM software instead. The TDX
> > module establishes a new SEAMCALL ABI which allows the host to
> > initialize the module and to manage VMs.
> >
> > Add infrastructure to make SEAMCALLs. The SEAMCALL ABI is very similar
> > to the TDCALL ABI and leverages much TDCALL infrastructure.
> >
> > SEAMCALL instruction causes #GP when TDX isn't BIOS enabled, and #UD
> > when CPU is not in VMX operation. Currently, only KVM code mocks with
>
> "mocks"? Did you mean "mucks"?

Yes "mucks". I believe I made some mistake.

>
> > VMX enabling, and KVM is the only user of TDX. This implementation
> > chooses to make KVM itself responsible for enabling VMX before using
> > TDX and let the rest of the kernel stay blissfully unaware of VMX.
> >
> > The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
> > kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
> > first. Architecturally, there is no CPU flag to check whether the CPU
> > is in VMX operation. Also, if a BIOS were buggy, it could still report
> > valid TDX private KeyIDs when TDX actually couldn't be enabled.
>
> I'm not sure this is a great justification. If the BIOS is lying to the
> OS, we _should_ oops.
>
> How else can this happen other than silly kernel bugs. It's OK to oops
> in the face of silly kernel bugs.

Agreed. And I'll just remove that sentence if you agree with below ...

[...]

> > + /*
> > + * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> > + * the trap number. Convert the trap number to the TDX error
> > + * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
> > + *
> > + * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
> > + * only accepts 32-bit immediate at most.
> > + */
> > + mov $TDX_SW_ERROR, %r12
> > + orq %r12, %rax
>
> I think the justification for doing the #UD/#GP handling is a bit weak.
> In the end, it gets us a nicer error message. Is that error message
> *REALLY* needed? Or is an oops OK in the very rare circumstance that
> the BIOS is totally buggy?

...

It's not just for the "BIOS buggy" case. The main purpose is to give an error
message when the caller mistakenly calls tdx_enable().

Also, now the machine check handler improvement patch also calls SEAMCALL to get
a given page's page type. It's totally legal that a machine check happens when
the CPU isn't in VMX operation (e.g. KVM isn't loaded), and in fact we use the
SEAMCALL return value to detect whether CPU is in VMX operation and handles such
case accordingly.

2023-06-08 01:14:23

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On Wed, 2023-06-07 at 13:22 -0700, Hansen, Dave wrote:
> On 6/7/23 13:08, Sean Christopherson wrote:
> > > > > > > The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
> > > > > > > kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
> > > > > > > first. Architecturally, there is no CPU flag to check whether the CPU
> > > > > > > is in VMX operation. Also, if a BIOS were buggy, it could still report
> > > > > > > valid TDX private KeyIDs when TDX actually couldn't be enabled.
> > > > > > I'm not sure this is a great justification. If the BIOS is lying to the
> > > > > > OS, we _should_ oops.
> > > > > >
> > > > > > How else can this happen other than silly kernel bugs. It's OK to oops
> > > > > > in the face of silly kernel bugs.
> > > > > TDX KVM + reboot can hit #UD. On reboot, VMX is disabled (VMXOFF) via
> > > > > syscore.shutdown callback. However, guest TD can be still running to issue
> > > > > SEAMCALL resulting in #UD.
> > > > >
> > > > > Or we can postpone the change and make the TDX KVM patch series carry a patch
> > > > > for it.
> > > > How does the existing KVM use of VMLAUNCH/VMRESUME avoid that problem?
> > > extable. From arch/x86/kvm/vmx/vmenter.S
> > >
> > > .Lvmresume:
> > > vmresume
> > > jmp .Lvmfail
> > >
> > > .Lvmlaunch:
> > > vmlaunch
> > > jmp .Lvmfail
> > >
> > > _ASM_EXTABLE(.Lvmresume, .Lfixup)
> > > _ASM_EXTABLE(.Lvmlaunch, .Lfixup)
> > More specifically, KVM eats faults on VMX and SVM instructions that occur after
> > KVM forcefully disables VMX/SVM.
>
> <grumble> That's a *TOTALLY* different argument than the patch makes.
>
> KVM is being a _bit_ nutty here, but I do respect it trying to honor the
> "-f". I have no objections to the SEAMCALL code being nutty in the same
> way.
>
> Why do I get the feeling that code is being written without
> understanding _why_, despite this being v11?

Hi Dave,

As I replied in another email, the main reason is to return an error code
instead of Oops when tdx_enable() is called mistakenly when CPU isn't in VMX
operation. Also in this version, the machine check handler can call SEAMCALL
legally when CPU isn't in VMX operation.

I once mentioned alternatively we could check CR4.VMXE to see whether CPU is in
VMX operation but looks you preferred to use EXTTABLE. From hardware's point of
view, checking CR4.VMXE isn't enough, although currently setting it and doing
VMXON are always done together with IRQ disabled.

https://lore.kernel.org/lkml/[email protected]/T/#m6e5673a191254bf36f48083cd215f7ff8f2b315b

How about I add below to the changelog?

"
The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The kernel would
hit Oops if SEAMCALL were mistakenly made when TDX is enabled by the BIOS or
when CPU isn't in VMX operation. For the former, the callers could check
platform_tdx_enabled() first, although that doesn't rule out the buggy BIOS in
which case the kernel could still get Oops. For the latter, the caller could
check CR4.VMXE based on the fact that currently setting this bit and doing VMXON
are done together when IRQ is disabled, although from hardware's perspective
checking CR4.VMXE isn't enough.

However this could be problematic if SEAMCALL is called in the cases such as
exception handler, NMI handler, etc, as disabling IRQ doesn't prevent any of
them from happening.

To have a clean solution, just make the SEAMCALL always return error code by
using EXTTABLE so the SEAMCALL can be safely called in any context. A later
patch will need to use SEAMCALL in the machine check handler. There might be
such use cases in the future too.
"

2023-06-08 14:46:24

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On 6/7/23 17:51, Huang, Kai wrote:
> How about I add below to the changelog?
>
> "
> The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The kernel would
> hit Oops if SEAMCALL were mistakenly made when TDX is enabled by the BIOS or
> when CPU isn't in VMX operation. For the former, the callers could check
> platform_tdx_enabled() first, although that doesn't rule out the buggy BIOS in
> which case the kernel could still get Oops. For the latter, the caller could
> check CR4.VMXE based on the fact that currently setting this bit and doing VMXON
> are done together when IRQ is disabled, although from hardware's perspective
> checking CR4.VMXE isn't enough.
>
> However this could be problematic if SEAMCALL is called in the cases such as
> exception handler, NMI handler, etc, as disabling IRQ doesn't prevent any of
> them from happening.
>
> To have a clean solution, just make the SEAMCALL always return error code by
> using EXTTABLE so the SEAMCALL can be safely called in any context. A later
> patch will need to use SEAMCALL in the machine check handler. There might be
> such use cases in the future too.
> "

No, that's just word salad.

SEAMCALL is like VMRESUME. It's will be called by KVM in unsafe (VMX
off) contexts in normal operation like "reboot -f". That means it needs
an exception handler for #UD(???).

I don't care if a bad BIOS can cause #GP. Bad BIOS == oops. You can
argue that even if I don't care, it's worth having a nice error message
and a common place for SEAMCALL error handling. But it's not
functionally needed.

2023-06-08 15:04:45

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On 6/7/23 15:56, Huang, Kai wrote:
> It's not just for the "BIOS buggy" case. The main purpose is to give an error
> message when the caller mistakenly calls tdx_enable().

It's also OK to oops when there's a kernel bug, aka. caller mistake.

> Also, now the machine check handler improvement patch also calls SEAMCALL to get
> a given page's page type. It's totally legal that a machine check happens when
> the CPU isn't in VMX operation (e.g. KVM isn't loaded), and in fact we use the
> SEAMCALL return value to detect whether CPU is in VMX operation and handles such
> case accordingly.

Listen, I didn't say there wasn't a reason for it. I said that this
patch lacked the justification. So, stop throwing things at the wall,
pick the *REAL* reason, and go rewrite the patch, please.

2023-06-19 13:05:57

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On 04.06.23 16:27, Kai Huang wrote:
> TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This
> mode runs only the TDX module itself or other code to load the TDX
> module.
>
> The host kernel communicates with SEAM software via a new SEAMCALL
> instruction. This is conceptually similar to a guest->host hypercall,
> except it is made from the host to SEAM software instead. The TDX
> module establishes a new SEAMCALL ABI which allows the host to
> initialize the module and to manage VMs.
>
> Add infrastructure to make SEAMCALLs. The SEAMCALL ABI is very similar
> to the TDCALL ABI and leverages much TDCALL infrastructure.
>
> SEAMCALL instruction causes #GP when TDX isn't BIOS enabled, and #UD
> when CPU is not in VMX operation. Currently, only KVM code mocks with
> VMX enabling, and KVM is the only user of TDX. This implementation
> chooses to make KVM itself responsible for enabling VMX before using
> TDX and let the rest of the kernel stay blissfully unaware of VMX.
>
> The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
> kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
> first. Architecturally, there is no CPU flag to check whether the CPU
> is in VMX operation. Also, if a BIOS were buggy, it could still report
> valid TDX private KeyIDs when TDX actually couldn't be enabled.
>
> Extend the TDX_MODULE_CALL macro to handle #UD and #GP to return error
> codes. Introduce two new TDX error codes for them respectively so the
> caller can distinguish.
>
> Also add a wrapper function of SEAMCALL to convert SEAMCALL error code
> to the kernel error code, and print out SEAMCALL error code to help the
> user to understand what went wrong.
>
> Signed-off-by: Kai Huang <[email protected]>
> ---

I agree with Dave that a buggy bios is not a good motivation for this
patch. The real strength of this infrastructure IMHO is central error
handling and expressive error messages. Maybe it makes some corner cases
(reboot -f) easier to handle. That would make a better justification
than buggy bios -- and should be spelled out in the patch description.

[...]


> +/*
> + * Wrapper of __seamcall() to convert SEAMCALL leaf function error code
> + * to kernel error code. @seamcall_ret and @out contain the SEAMCALL
> + * leaf function return code and the additional output respectively if
> + * not NULL.
> + */
> +static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> + u64 *seamcall_ret,
> + struct tdx_module_output *out)
> +{
> + int cpu, ret = 0;
> + u64 sret;
> +
> + /* Need a stable CPU id for printing error message */
> + cpu = get_cpu();
> +
> + sret = __seamcall(fn, rcx, rdx, r8, r9, out);
> +


Why not

cpu = get_cpu();
sret = __seamcall(fn, rcx, rdx, r8, r9, out);
put_cpu();


> + /* Save SEAMCALL return code if the caller wants it */
> + if (seamcall_ret)
> + *seamcall_ret = sret;
> +
> + /* SEAMCALL was successful */
> + if (!sret)
> + goto out;

Why not move that into the switch statement below to avoid th goto?
If you do the put_cpu() early, you can avoid "ret" as well.

switch (sret) {
case 0:
/* SEAMCALL was successful */
return 0;
case TDX_SEAMCALL_GP:
pr_err_once("[firmware bug]: TDX is not enabled by BIOS.\n");
return -ENODEV;
...
}

[...]

> +
> static int __init record_keyid_partitioning(u32 *tdx_keyid_start,
> u32 *nr_tdx_keyids)
> {
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> new file mode 100644
> index 000000000000..48ad1a1ba737
> --- /dev/null
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _X86_VIRT_TDX_H
> +#define _X86_VIRT_TDX_H
> +
> +#include <linux/types.h>
> +
> +struct tdx_module_output;
> +u64 __seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> + struct tdx_module_output *out);
> +#endif
> diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
> index 49a54356ae99..757b0c34be10 100644
> --- a/arch/x86/virt/vmx/tdx/tdxcall.S
> +++ b/arch/x86/virt/vmx/tdx/tdxcall.S
> @@ -1,6 +1,7 @@
> /* SPDX-License-Identifier: GPL-2.0 */
> #include <asm/asm-offsets.h>
> #include <asm/tdx.h>
> +#include <asm/asm.h>
>
> /*
> * TDCALL and SEAMCALL are supported in Binutils >= 2.36.
> @@ -45,6 +46,7 @@
> /* Leave input param 2 in RDX */
>
> .if \host
> +1:
> seamcall
> /*
> * SEAMCALL instruction is essentially a VMExit from VMX root
> @@ -57,10 +59,23 @@
> * This value will never be used as actual SEAMCALL error code as
> * it is from the Reserved status code class.
> */
> - jnc .Lno_vmfailinvalid
> + jnc .Lseamcall_out
> mov $TDX_SEAMCALL_VMFAILINVALID, %rax
> -.Lno_vmfailinvalid:
> + jmp .Lseamcall_out
> +2:
> + /*
> + * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> + * the trap number. Convert the trap number to the TDX error
> + * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
> + *
> + * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
> + * only accepts 32-bit immediate at most.

Not sure if that comment is really helpful here. It's a common pattern
for large immediates, no?

> + */
> + mov $TDX_SW_ERROR, %r12
> + orq %r12, %rax
>
> + _ASM_EXTABLE_FAULT(1b, 2b)
> +.Lseamcall_out:
> .else
> tdcall
> .endif

--
Cheers,

David / dhildenb


2023-06-20 10:59:18

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On Mon, 2023-06-19 at 14:52 +0200, David Hildenbrand wrote:
> On 04.06.23 16:27, Kai Huang wrote:
> > TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This
> > mode runs only the TDX module itself or other code to load the TDX
> > module.
> >
> > The host kernel communicates with SEAM software via a new SEAMCALL
> > instruction. This is conceptually similar to a guest->host hypercall,
> > except it is made from the host to SEAM software instead. The TDX
> > module establishes a new SEAMCALL ABI which allows the host to
> > initialize the module and to manage VMs.
> >
> > Add infrastructure to make SEAMCALLs. The SEAMCALL ABI is very similar
> > to the TDCALL ABI and leverages much TDCALL infrastructure.
> >
> > SEAMCALL instruction causes #GP when TDX isn't BIOS enabled, and #UD
> > when CPU is not in VMX operation. Currently, only KVM code mocks with
> > VMX enabling, and KVM is the only user of TDX. This implementation
> > chooses to make KVM itself responsible for enabling VMX before using
> > TDX and let the rest of the kernel stay blissfully unaware of VMX.
> >
> > The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
> > kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
> > first. Architecturally, there is no CPU flag to check whether the CPU
> > is in VMX operation. Also, if a BIOS were buggy, it could still report
> > valid TDX private KeyIDs when TDX actually couldn't be enabled.
> >
> > Extend the TDX_MODULE_CALL macro to handle #UD and #GP to return error
> > codes. Introduce two new TDX error codes for them respectively so the
> > caller can distinguish.
> >
> > Also add a wrapper function of SEAMCALL to convert SEAMCALL error code
> > to the kernel error code, and print out SEAMCALL error code to help the
> > user to understand what went wrong.
> >
> > Signed-off-by: Kai Huang <[email protected]>
> > ---
>
> I agree with Dave that a buggy bios is not a good motivation for this
> patch. The real strength of this infrastructure IMHO is central error
> handling and expressive error messages. Maybe it makes some corner cases
> (reboot -f) easier to handle. That would make a better justification
> than buggy bios -- and should be spelled out in the patch description.

Agreed. Will do. Thanks!

>
> [...]
>
>
> > +/*
> > + * Wrapper of __seamcall() to convert SEAMCALL leaf function error code
> > + * to kernel error code. @seamcall_ret and @out contain the SEAMCALL
> > + * leaf function return code and the additional output respectively if
> > + * not NULL.
> > + */
> > +static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> > + u64 *seamcall_ret,
> > + struct tdx_module_output *out)
> > +{
> > + int cpu, ret = 0;
> > + u64 sret;
> > +
> > + /* Need a stable CPU id for printing error message */
> > + cpu = get_cpu();
> > +
> > + sret = __seamcall(fn, rcx, rdx, r8, r9, out);
> > +
>
>
> Why not
>
> cpu = get_cpu();
> sret = __seamcall(fn, rcx, rdx, r8, r9, out);
> put_cpu();

Hmm.. I think this is also OK. The worst case is the error message will be
printed on remote cpu but the message still have the correct "cpu id" printed.

I'll change to above.

>
>
> > + /* Save SEAMCALL return code if the caller wants it */
> > + if (seamcall_ret)
> > + *seamcall_ret = sret;
> > +
> > + /* SEAMCALL was successful */
> > + if (!sret)
> > + goto out;
>
> Why not move that into the switch statement below to avoid th goto?
> If you do the put_cpu() early, you can avoid "ret" as well.

Yeah can do.

>
> switch (sret) {
> case 0:
> /* SEAMCALL was successful */
> return 0;
> case TDX_SEAMCALL_GP:
> pr_err_once("[firmware bug]: TDX is not enabled by BIOS.\n");
> return -ENODEV;
> ...
> }
>

[...]


> > + /*
> > + * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> > + * the trap number. Convert the trap number to the TDX error
> > + * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
> > + *
> > + * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
> > + * only accepts 32-bit immediate at most.
>
> Not sure if that comment is really helpful here. It's a common pattern
> for large immediates, no?

I am not sure. I guess I am not expert of x86 assembly but only casual writer.

Hi Dave, Kirill,

Are you OK to remove it?

>
> > + */
> > + mov $TDX_SW_ERROR, %r12
> > + orq %r12, %rax
> >
> > + _ASM_EXTABLE_FAULT(1b, 2b)
> > +.Lseamcall_out:
> > .else
> > tdcall
> > .endif
>

2023-06-20 12:42:23

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On Tue, Jun 20, 2023 at 10:37:16AM +0000, Huang, Kai wrote:
> > > + /*
> > > + * SEAMCALL caused #GP or #UD. By reaching here %eax contains
> > > + * the trap number. Convert the trap number to the TDX error
> > > + * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
> > > + *
> > > + * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
> > > + * only accepts 32-bit immediate at most.
> >
> > Not sure if that comment is really helpful here. It's a common pattern
> > for large immediates, no?
>
> I am not sure. I guess I am not expert of x86 assembly but only casual writer.
>
> Hi Dave, Kirill,
>
> Are you OK to remove it?

I would rather keep it. I wanted to ask why separate MOV is needed here,
before I read the comment. Also size of $TDX_SW_ERROR is not visible here,
so it contributes to possible confusion without the comment.

--
Kiryl Shutsemau / Kirill A. Shutemov

2023-06-20 13:05:35

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On 20.06.23 14:20, [email protected] wrote:
> On Tue, Jun 20, 2023 at 10:37:16AM +0000, Huang, Kai wrote:
>>>> + /*
>>>> + * SEAMCALL caused #GP or #UD. By reaching here %eax contains
>>>> + * the trap number. Convert the trap number to the TDX error
>>>> + * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
>>>> + *
>>>> + * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
>>>> + * only accepts 32-bit immediate at most.
>>>
>>> Not sure if that comment is really helpful here. It's a common pattern
>>> for large immediates, no?
>>
>> I am not sure. I guess I am not expert of x86 assembly but only casual writer.
>>
>> Hi Dave, Kirill,
>>
>> Are you OK to remove it?
>
> I would rather keep it. I wanted to ask why separate MOV is needed here,
> before I read the comment. Also size of $TDX_SW_ERROR is not visible here,
> so it contributes to possible confusion without the comment.
>

Fine with me, but I'd assume that the assembler will simply complain in
case we'd try to use a large immediate.

--
Cheers,

David / dhildenb


2023-06-20 15:41:22

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

On 6/19/23 05:52, David Hildenbrand wrote:
>> +    /*
>> +     * SEAMCALL caused #GP or #UD.  By reaching here %eax contains
>> +     * the trap number.  Convert the trap number to the TDX error
>> +     * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
>> +     *
>> +     * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
>> +     * only accepts 32-bit immediate at most.
>
> Not sure if that comment is really helpful here. It's a common pattern
> for large immediates, no?

It's a question of whether you write the comments for folks that read
x86 assembly all the time or not.

I think the comment is helpful.