Subject: [PATCH v1 00/11] Add TDX Guest Support (Initial support)

Hi All,

Intel's Trust Domain Extensions (TDX) protect guest VMs from malicious
hosts and some physical attacks. This series adds the basic TDX guest
infrastructure support (including #VE handler support, and #VE support
for halt and CPUID). This is just a subset of patches in the bare minimum
TDX support patch list which is required for supporting minimal
functional TDX guest. Other basic feature features like #VE support for
IO, MMIO, boot optimization fixes and shared-mm support will be submitted
in a separate patch set. To make reviewing easier we split it into smaller
series. This series alone is not necessarily fully functional.

Also, the host-side support patches, and support for advanced TD guest
features like attestation or debug-mode will be submitted at a later time.
Also, at this point it is not secure with some known holes in drivers, and
also hasn’t been fully audited and fuzzed yet.

TDX has a lot of similarities to SEV. It enhances confidentiality and
of guest memory and state (like registers) and includes a new exception
(#VE) for the same basic reasons as SEV-ES. Like SEV-SNP (not merged
yet), TDX limits the host's ability to effect changes in the guest
physical address space. With TDX the host cannot access the guest memory,
so various functionality that would normally be done in KVM has moved
into a (paravirtualized) guest. Partially this is done using the
Virtualization Exception (#VE) and partially with direct paravirtual hooks.

The TDX architecture also includes a new CPU mode called
Secure-Arbitration Mode (SEAM). The software (TDX module) running in this
mode arbitrates interactions between host and guest and implements many of
the guarantees of the TDX architecture.

Some of the key differences between TD and regular VM is,

1. Multi CPU bring-up is done using the ACPI MADT wake-up table.
2. A new #VE exception handler is added. The TDX module injects #VE exception
   to the guest TD in cases of instructions that need to be emulated, disallowed
   MSR accesses, etc.
3. By default memory is marked as private, and TD will selectively share it with
   VMM based on need.
   
Note that the kernel will also need to be hardened against low level inputs from
the now untrusted hosts. This will be done in follow on patches.

You can find TDX related documents in the following link.

https://software.intel.com/content/www/br/pt/develop/articles/intel-trust-domain-extensions.html

Kirill A. Shutemov (7):
x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
x86/tdx: Get TD execution environment information via TDINFO
x86/traps: Add #VE support for TDX guest
x86/tdx: Add HLT support for TDX guest
x86/tdx: Wire up KVM hypercalls
x86/tdx: Add MSR support for TDX guest
x86/tdx: Handle CPUID via #VE

Kuppuswamy Sathyanarayanan (4):
x86/tdx: Introduce INTEL_TDX_GUEST config option
x86/cpufeatures: Add TDX Guest CPU feature
x86/x86: Add is_tdx_guest() interface
x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper
functions

arch/x86/Kconfig | 20 ++
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/tdx.c | 32 ++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/idtentry.h | 4 +
arch/x86/include/asm/irqflags.h | 40 ++--
arch/x86/include/asm/kvm_para.h | 21 +++
arch/x86/include/asm/paravirt.h | 20 +-
arch/x86/include/asm/paravirt_types.h | 3 +-
arch/x86/include/asm/tdx.h | 153 ++++++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/asm-offsets.c | 22 +++
arch/x86/kernel/head64.c | 3 +
arch/x86/kernel/idt.c | 6 +
arch/x86/kernel/paravirt.c | 4 +-
arch/x86/kernel/tdcall.S | 254 ++++++++++++++++++++++++++
arch/x86/kernel/tdx.c | 254 ++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 69 +++++++
18 files changed, 877 insertions(+), 31 deletions(-)
create mode 100644 arch/x86/boot/compressed/tdx.c
create mode 100644 arch/x86/include/asm/tdx.h
create mode 100644 arch/x86/kernel/tdcall.S
create mode 100644 arch/x86/kernel/tdx.c

--
2.25.1


Subject: [PATCH v1 06/11] x86/tdx: Get TD execution environment information via TDINFO

From: "Kirill A. Shutemov" <[email protected]>

Per Guest-Host-Communication Interface (GHCI) for Intel Trust
Domain Extensions (Intel TDX) specification, sec 2.4.2,
TDCALL[TDINFO] provides basic TD execution environment information, not
provided by CPUID.

Call TDINFO during early boot to be used for following system
initialization.

The call provides info on which bit in pfn is used to indicate that the
page is shared with the host and attributes of the TD, such as debug.

Information about the number of CPUs need not be saved because there are
no users so far for it.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
arch/x86/kernel/tdx.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 97b54317f799..e4383b416ef3 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -4,6 +4,17 @@
#define pr_fmt(fmt) "TDX: " fmt

#include <asm/tdx.h>
+#include <asm/vmx.h>
+
+#include <linux/cpu.h>
+
+/* TDX Module call Leaf IDs */
+#define TDINFO 1
+
+static struct {
+ unsigned int gpa_width;
+ unsigned long attributes;
+} td_info __ro_after_init;

/*
* Wrapper for simple hypercalls that only return a success/error code.
@@ -63,6 +74,19 @@ bool is_tdx_guest(void)
}
EXPORT_SYMBOL_GPL(is_tdx_guest);

+static void tdg_get_info(void)
+{
+ u64 ret;
+ struct tdx_module_output out = {0};
+
+ ret = __tdx_module_call(TDINFO, 0, 0, 0, 0, &out);
+
+ BUG_ON(ret);
+
+ td_info.gpa_width = out.rcx & GENMASK(5, 0);
+ td_info.attributes = out.rdx;
+}
+
void __init tdx_early_init(void)
{
if (!cpuid_has_tdx_guest())
@@ -70,5 +94,7 @@ void __init tdx_early_init(void)

setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);

+ tdg_get_info();
+
pr_info("TDX guest is initialized\n");
}
--
2.25.1

Subject: [PATCH v1 08/11] x86/tdx: Add HLT support for TDX guest

From: "Kirill A. Shutemov" <[email protected]>

Per Guest-Host-Communication Interface (GHCI) for Intel Trust
Domain Extensions (Intel TDX) specification, sec 3.8,
TDVMCALL[Instruction.HLT] provides HLT operation. Use it to implement
halt() and safe_halt() paravirtualization calls.

The same TDX hypercall is used to handle #VE exception due to
EXIT_REASON_HLT.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
arch/x86/kernel/tdx.c | 50 +++++++++++++++++++++++++++++++++++++------
1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 527d2638ddae..76b71bf56b81 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -88,6 +88,33 @@ static void tdg_get_info(void)
td_info.attributes = out.rdx;
}

+static __cpuidle void tdg_halt(void)
+{
+ u64 ret;
+
+ ret = __tdx_hypercall(EXIT_REASON_HLT, irqs_disabled(), 0, 0, 0, NULL);
+
+ /* It should never fail */
+ BUG_ON(ret);
+}
+
+static __cpuidle void tdg_safe_halt(void)
+{
+ u64 ret;
+
+ /*
+ * Enable interrupts next to the TDVMCALL to avoid
+ * performance degradation.
+ */
+ local_irq_enable();
+
+ /* IRQ is enabled, So set R12 as 0 */
+ ret = __tdx_hypercall(EXIT_REASON_HLT, 0, 0, 0, 0, NULL);
+
+ /* It should never fail */
+ BUG_ON(ret);
+}
+
unsigned long tdg_get_ve_info(struct ve_info *ve)
{
u64 ret;
@@ -114,13 +141,19 @@ unsigned long tdg_get_ve_info(struct ve_info *ve)
int tdg_handle_virtualization_exception(struct pt_regs *regs,
struct ve_info *ve)
{
- /*
- * TODO: Add handler support for various #VE exit
- * reasons. It will be added by other patches in
- * the series.
- */
- pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
- return -EFAULT;
+ switch (ve->exit_reason) {
+ case EXIT_REASON_HLT:
+ tdg_halt();
+ break;
+ default:
+ pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
+ return -EFAULT;
+ }
+
+ /* After successful #VE handling, move the IP */
+ regs->ip += ve->instr_len;
+
+ return 0;
}

void __init tdx_early_init(void)
@@ -132,5 +165,8 @@ void __init tdx_early_init(void)

tdg_get_info();

+ pv_ops.irq.safe_halt = tdg_safe_halt;
+ pv_ops.irq.halt = tdg_halt;
+
pr_info("TDX guest is initialized\n");
}
--
2.25.1

Subject: [PATCH v1 10/11] x86/tdx: Add MSR support for TDX guest

From: "Kirill A. Shutemov" <[email protected]>

Operations on context-switched MSRs can be run natively. The rest of
MSRs should be handled through TDVMCALLs.

TDVMCALL[Instruction.RDMSR] and TDVMCALL[Instruction.WRMSR] provide
MSR oprations.

You can find RDMSR and WRMSR details in Guest-Host-Communication
Interface (GHCI) for Intel Trust Domain Extensions (Intel TDX)
specification, sec 3.10, 3.11.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
arch/x86/kernel/tdx.c | 67 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 65 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 76b71bf56b81..af7acea500ab 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -115,6 +115,55 @@ static __cpuidle void tdg_safe_halt(void)
BUG_ON(ret);
}

+static bool tdg_is_context_switched_msr(unsigned int msr)
+{
+ switch (msr) {
+ case MSR_EFER:
+ case MSR_IA32_CR_PAT:
+ case MSR_FS_BASE:
+ case MSR_GS_BASE:
+ case MSR_KERNEL_GS_BASE:
+ case MSR_IA32_SYSENTER_CS:
+ case MSR_IA32_SYSENTER_EIP:
+ case MSR_IA32_SYSENTER_ESP:
+ case MSR_STAR:
+ case MSR_LSTAR:
+ case MSR_SYSCALL_MASK:
+ case MSR_IA32_XSS:
+ case MSR_TSC_AUX:
+ case MSR_IA32_BNDCFGS:
+ return true;
+ }
+ return false;
+}
+
+static u64 tdg_read_msr_safe(unsigned int msr, int *err)
+{
+ u64 ret;
+ struct tdx_hypercall_output out = {0};
+
+ WARN_ON_ONCE(tdg_is_context_switched_msr(msr));
+
+ ret = __tdx_hypercall(EXIT_REASON_MSR_READ, msr, 0, 0, 0, &out);
+
+ *err = ret ? -EIO : 0;
+
+ return out.r11;
+}
+
+static int tdg_write_msr_safe(unsigned int msr, unsigned int low,
+ unsigned int high)
+{
+ u64 ret;
+
+ WARN_ON_ONCE(tdg_is_context_switched_msr(msr));
+
+ ret = __tdx_hypercall(EXIT_REASON_MSR_WRITE, msr, (u64)high << 32 | low,
+ 0, 0, NULL);
+
+ return ret ? -EIO : 0;
+}
+
unsigned long tdg_get_ve_info(struct ve_info *ve)
{
u64 ret;
@@ -141,19 +190,33 @@ unsigned long tdg_get_ve_info(struct ve_info *ve)
int tdg_handle_virtualization_exception(struct pt_regs *regs,
struct ve_info *ve)
{
+ unsigned long val;
+ int ret = 0;
+
switch (ve->exit_reason) {
case EXIT_REASON_HLT:
tdg_halt();
break;
+ case EXIT_REASON_MSR_READ:
+ val = tdg_read_msr_safe(regs->cx, (unsigned int *)&ret);
+ if (!ret) {
+ regs->ax = val & UINT_MAX;
+ regs->dx = val >> 32;
+ }
+ break;
+ case EXIT_REASON_MSR_WRITE:
+ ret = tdg_write_msr_safe(regs->cx, regs->ax, regs->dx);
+ break;
default:
pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
return -EFAULT;
}

/* After successful #VE handling, move the IP */
- regs->ip += ve->instr_len;
+ if (!ret)
+ regs->ip += ve->instr_len;

- return 0;
+ return ret;
}

void __init tdx_early_init(void)
--
2.25.1

Subject: [PATCH v1 03/11] x86/cpufeatures: Add TDX Guest CPU feature

Add CPU feature detection for Trusted Domain Extensions support.
TDX feature adds capabilities to keep guest register state and
memory isolated from hypervisor.

For TDX guest platforms, executing CPUID(0x21, 0) will return
following values in EAX, EBX, ECX and EDX.

EAX: Maximum sub-leaf number: 0
EBX/EDX/ECX: Vendor string:

EBX = "Inte"
EDX = "lTDX"
ECX = " "

So when above condition is true, set X86_FEATURE_TDX_GUEST
feature cap bit

Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/tdx.h | 20 ++++++++++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/head64.c | 3 +++
arch/x86/kernel/tdx.c | 30 ++++++++++++++++++++++++++++++
5 files changed, 55 insertions(+)
create mode 100644 arch/x86/include/asm/tdx.h
create mode 100644 arch/x86/kernel/tdx.c

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ac37830ae941..dddc3a27cc8a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -238,6 +238,7 @@
#define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
#define X86_FEATURE_PVUNLOCK ( 8*32+20) /* "" PV unlock function */
#define X86_FEATURE_VCPUPREEMPT ( 8*32+21) /* "" PV vcpu_is_preempted function */
+#define X86_FEATURE_TDX_GUEST ( 8*32+22) /* Trusted Domain Extensions Guest */

/* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
new file mode 100644
index 000000000000..679500e807f3
--- /dev/null
+++ b/arch/x86/include/asm/tdx.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2020 Intel Corporation */
+#ifndef _ASM_X86_TDX_H
+#define _ASM_X86_TDX_H
+
+#define TDX_CPUID_LEAF_ID 0x21
+
+#ifdef CONFIG_INTEL_TDX_GUEST
+
+#include <asm/cpufeature.h>
+
+void __init tdx_early_init(void);
+
+#else // !CONFIG_INTEL_TDX_GUEST
+
+static inline void tdx_early_init(void) { };
+
+#endif /* CONFIG_INTEL_TDX_GUEST */
+
+#endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0f66682ac02a..af09ce93a641 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -126,6 +126,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o
obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o

obj-$(CONFIG_JAILHOUSE_GUEST) += jailhouse.o
+obj-$(CONFIG_INTEL_TDX_GUEST) += tdx.o

obj-$(CONFIG_EISA) += eisa.o
obj-$(CONFIG_PCSPKR_PLATFORM) += pcspeaker.o
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..d1a4942ae160 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -40,6 +40,7 @@
#include <asm/extable.h>
#include <asm/trapnr.h>
#include <asm/sev.h>
+#include <asm/tdx.h>

/*
* Manage page tables very early on.
@@ -491,6 +492,8 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)

kasan_early_init();

+ tdx_early_init();
+
idt_setup_early_handler();

copy_bootdata(__va(real_mode_data));
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
new file mode 100644
index 000000000000..5b14b72e41c5
--- /dev/null
+++ b/arch/x86/kernel/tdx.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2020 Intel Corporation */
+
+#include <asm/tdx.h>
+
+static inline bool cpuid_has_tdx_guest(void)
+{
+ u32 eax, signature[3];
+
+ if (cpuid_eax(0) < TDX_CPUID_LEAF_ID)
+ return false;
+
+ cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &signature[0],
+ &signature[1], &signature[2]);
+
+ if (memcmp("IntelTDX ", signature, 12))
+ return false;
+
+ return true;
+}
+
+void __init tdx_early_init(void)
+{
+ if (!cpuid_has_tdx_guest())
+ return;
+
+ setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
+
+ pr_info("TDX guest is initialized\n");
+}
--
2.25.1

Subject: [PATCH v1 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions

Guests communicate with VMMs with hypercalls. Historically, these
are implemented using instructions that are known to cause VMEXITs
like vmcall, vmlaunch, etc. However, with TDX, VMEXITs no longer
expose guest state to the host.  This prevents the old hypercall
mechanisms from working. So to communicate with VMM, TDX
specification defines a new instruction called TDCALL.

In a TDX based VM, since VMM is an untrusted entity, a intermediary
layer (TDX module) exists between host and guest to facilitate
secure communication. TDX guests communicate with the TDX module
using TDCALL instruction. 

Both TDX module and VMM communication uses TDCALL instruction. Value
of the RAX register when executing TDCALL instruction is used to
determine the TDCALL type. If the TDCALL is executed with value "0"
in RAX, then it is the service request to VMM. Any other value in
RAX means it is a TDX module related call.

Implement common helper functions to communicate with the TDX Module
and VMM (using TDCALL instruction).
   
__tdx_hypercall() - request services from the VMM.
__tdx_module_call()  - communicate with the TDX Module.

Also define two additional wrappers, tdx_hypercall() and
tdx_hypercall_out_r11() to cover common use cases of
__tdx_hypercall() function. Since each use case of
__tdx_module_call() is different, it does not need
multiple wrappers.

Implement __tdx_module_call() and __tdx_hypercall() helper functions
in assembly.

Rationale behind choosing to use assembly over inline assembly is,
since the number of lines of instructions (with comments) in
__tdx_hypercall() implementation is over 70, using inline assembly
to implement it will make it hard to read.
   
Also, just like syscalls, not all TDVMCALL/TDCALLs use cases need to
use the same set of argument registers. The implementation here picks
the current worst-case scenario for TDCALL (4 registers). For TDCALLs
with fewer than 4 arguments, there will end up being a few superfluous
(cheap) instructions.  But, this approach maximizes code reuse. The
same argument applies to __tdx_hypercall() function as well.

For registers used by TDCALL instruction, please check TDX GHCI
specification, sec 2.4 and 3.

https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf

Originally-by: Sean Christopherson <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
arch/x86/include/asm/tdx.h | 38 ++++++
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/asm-offsets.c | 22 ++++
arch/x86/kernel/tdcall.S | 228 ++++++++++++++++++++++++++++++++++
arch/x86/kernel/tdx.c | 38 ++++++
5 files changed, 327 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kernel/tdcall.S

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 69af72d08d3d..fcd42119a287 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -8,12 +8,50 @@
#ifdef CONFIG_INTEL_TDX_GUEST

#include <asm/cpufeature.h>
+#include <linux/types.h>
+
+/*
+ * Used in __tdx_module_call() helper function to gather the
+ * output registers' values of TDCALL instruction when requesting
+ * services from the TDX module. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct tdx_module_output {
+ u64 rcx;
+ u64 rdx;
+ u64 r8;
+ u64 r9;
+ u64 r10;
+ u64 r11;
+};
+
+/*
+ * Used in __tdx_hypercall() helper function to gather the
+ * output registers' values of TDCALL instruction when requesting
+ * services from the VMM. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct tdx_hypercall_output {
+ u64 r11;
+ u64 r12;
+ u64 r13;
+ u64 r14;
+ u64 r15;
+};

/* Common API to check TDX support in decompression and common kernel code. */
bool is_tdx_guest(void);

void __init tdx_early_init(void);

+/* Helper function used to communicate with the TDX module */
+u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+ struct tdx_module_output *out);
+
+/* Helper function used to request services from VMM */
+u64 __tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15,
+ struct tdx_hypercall_output *out);
+
#else // !CONFIG_INTEL_TDX_GUEST

static inline bool is_tdx_guest(void)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index af09ce93a641..3410f03ef7aa 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -126,7 +126,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o
obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o

obj-$(CONFIG_JAILHOUSE_GUEST) += jailhouse.o
-obj-$(CONFIG_INTEL_TDX_GUEST) += tdx.o
+obj-$(CONFIG_INTEL_TDX_GUEST) += tdcall.o tdx.o

obj-$(CONFIG_EISA) += eisa.o
obj-$(CONFIG_PCSPKR_PLATFORM) += pcspeaker.o
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ecd3fd6993d1..ba6784300174 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -23,6 +23,10 @@
#include <xen/interface/xen.h>
#endif

+#ifdef CONFIG_INTEL_TDX_GUEST
+#include <asm/tdx.h>
+#endif
+
#ifdef CONFIG_X86_32
# include "asm-offsets_32.c"
#else
@@ -68,6 +72,24 @@ static void __used common(void)
OFFSET(XEN_vcpu_info_arch_cr2, vcpu_info, arch.cr2);
#endif

+#ifdef CONFIG_INTEL_TDX_GUEST
+ BLANK();
+ /* Offset for fields in tdx_module_output */
+ OFFSET(TDX_MODULE_rcx, tdx_module_output, rcx);
+ OFFSET(TDX_MODULE_rdx, tdx_module_output, rdx);
+ OFFSET(TDX_MODULE_r8, tdx_module_output, r8);
+ OFFSET(TDX_MODULE_r9, tdx_module_output, r9);
+ OFFSET(TDX_MODULE_r10, tdx_module_output, r10);
+ OFFSET(TDX_MODULE_r11, tdx_module_output, r11);
+
+ /* Offset for fields in tdx_hypercall_output */
+ OFFSET(TDX_HYPERCALL_r11, tdx_hypercall_output, r11);
+ OFFSET(TDX_HYPERCALL_r12, tdx_hypercall_output, r12);
+ OFFSET(TDX_HYPERCALL_r13, tdx_hypercall_output, r13);
+ OFFSET(TDX_HYPERCALL_r14, tdx_hypercall_output, r14);
+ OFFSET(TDX_HYPERCALL_r15, tdx_hypercall_output, r15);
+#endif
+
BLANK();
OFFSET(BP_scratch, boot_params, scratch);
OFFSET(BP_secure_boot, boot_params, secure_boot);
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
new file mode 100644
index 000000000000..d95af4486155
--- /dev/null
+++ b/arch/x86/kernel/tdcall.S
@@ -0,0 +1,228 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <asm/asm-offsets.h>
+#include <asm/asm.h>
+#include <asm/frame.h>
+#include <asm/unwind_hints.h>
+
+#include <linux/linkage.h>
+#include <linux/bits.h>
+
+#define TDG_R10 BIT(10)
+#define TDG_R11 BIT(11)
+#define TDG_R12 BIT(12)
+#define TDG_R13 BIT(13)
+#define TDG_R14 BIT(14)
+#define TDG_R15 BIT(15)
+
+/*
+ * Expose registers R10-R15 to VMM. It is passed via RCX register
+ * to the TDX Module, which will be used by the TDX module to
+ * identify the list of registers exposed to VMM. Each bit in this
+ * mask represents a register ID. You can find the bit field details
+ * in TDX GHCI specification.
+ */
+#define TDVMCALL_EXPOSE_REGS_MASK ( TDG_R10 | TDG_R11 | \
+ TDG_R12 | TDG_R13 | \
+ TDG_R14 | TDG_R15 )
+
+/*
+ * TDX guests use the TDCALL instruction to make requests to the
+ * TDX module and hypercalls to the VMM. It is supported in
+ * Binutils >= 2.36.
+ */
+#define tdcall .byte 0x66,0x0f,0x01,0xcc
+
+/*
+ * __tdx_module_call() - Helper function used by TDX guests to request
+ * services from the TDX module (does not include VMM services).
+ *
+ * This function serves as a wrapper to move user call arguments to the
+ * correct registers as specified by "tdcall" ABI and shares it with the
+ * TDX module. If the "tdcall" operation is successful and a valid
+ * "struct tdx_module_output" pointer is available (in "out" argument),
+ * output from the TDX module is saved to the memory specified in the
+ * "out" pointer. Also the status of the "tdcall" operation is returned
+ * back to the user as a function return value.
+ *
+ * @fn (RDI) - TDCALL Leaf ID, moved to RAX
+ * @rcx (RSI) - Input parameter 1, moved to RCX
+ * @rdx (RDX) - Input parameter 2, moved to RDX
+ * @r8 (RCX) - Input parameter 3, moved to R8
+ * @r9 (R8) - Input parameter 4, moved to R9
+ *
+ * @out (R9) - struct tdx_module_output pointer
+ * stored temporarily in R12 (not
+ * shared with the TDX module)
+ *
+ * Return status of tdcall via RAX.
+ *
+ * NOTE: This function should not be used for TDX hypercall
+ * use cases.
+ */
+SYM_FUNC_START(__tdx_module_call)
+ FRAME_BEGIN
+
+ /*
+ * R12 will be used as temporary storage for
+ * struct tdx_module_output pointer. You can
+ * find struct tdx_module_output details in
+ * arch/x86/include/asm/tdx.h. Also note that
+ * registers R12-R15 are not used by TDCALL
+ * services supported by this helper function.
+ */
+ push %r12 /* Callee saved, so preserve it */
+ mov %r9, %r12 /* Move output pointer to R12 */
+
+ /* Mangle function call ABI into TDCALL ABI: */
+ mov %rdi, %rax /* Move TDCALL Leaf ID to RAX */
+ mov %r8, %r9 /* Move input 4 to R9 */
+ mov %rcx, %r8 /* Move input 3 to R8 */
+ mov %rsi, %rcx /* Move input 1 to RCX */
+ /* Leave input param 2 in RDX */
+
+ tdcall
+
+ /* Check for TDCALL success: 0 - Successful, otherwise failed */
+ test %rax, %rax
+ jnz 1f
+
+ /* Check if caller provided an output struct */
+ test %r12, %r12
+ jz 1f
+
+ /* Copy TDCALL result registers to output struct: */
+ movq %rcx, TDX_MODULE_rcx(%r12)
+ movq %rdx, TDX_MODULE_rdx(%r12)
+ movq %r8, TDX_MODULE_r8(%r12)
+ movq %r9, TDX_MODULE_r9(%r12)
+ movq %r10, TDX_MODULE_r10(%r12)
+ movq %r11, TDX_MODULE_r11(%r12)
+1:
+ pop %r12 /* Restore the state of R12 register */
+
+ FRAME_END
+ ret
+SYM_FUNC_END(__tdx_module_call)
+
+/*
+ * do_tdx_hypercall() - Helper function used by TDX guests to request
+ * services from the VMM. All requests are made via the TDX module
+ * using "TDCALL" instruction.
+ *
+ * This function is created to contain common code between vendor
+ * specific and standard type TDX hypercalls. So the caller of this
+ * function had to set the TDVMCALL type in the R10 register before
+ * calling it.
+ *
+ * This function serves as a wrapper to move user call arguments to the
+ * correct registers as specified by "tdcall" ABI and shares it with VMM
+ * via the TDX module. If the "tdcall" operation is successful and a
+ * valid "struct tdx_hypercall_output" pointer is available (in "out"
+ * argument), output from the VMM is saved to the memory specified in the
+ * "out" pointer. 
+ *
+ * @fn (RDI) - TDVMCALL function, moved to R11
+ * @r12 (RSI) - Input parameter 1, moved to R12
+ * @r13 (RDX) - Input parameter 2, moved to R13
+ * @r14 (RCX) - Input parameter 3, moved to R14
+ * @r15 (R8) - Input parameter 4, moved to R15
+ *
+ * @out (R9) - struct tdx_hypercall_output pointer
+ *
+ * On successful completion, return TDX hypercall error code.
+ *
+ */
+SYM_FUNC_START_LOCAL(do_tdx_hypercall)
+ /* Save non-volatile GPRs that are exposed to the VMM. */
+ push %r15
+ push %r14
+ push %r13
+ push %r12
+
+ /* Leave hypercall output pointer in R9, it's not clobbered by VMM */
+
+ /* Mangle function call ABI into TDCALL ABI: */
+ xor %eax, %eax /* Move TDCALL leaf ID (TDVMCALL (0)) to RAX */
+ mov %rdi, %r11 /* Move TDVMCALL function id to R11 */
+ mov %rsi, %r12 /* Move input 1 to R12 */
+ mov %rdx, %r13 /* Move input 2 to R13 */
+ mov %rcx, %r14 /* Move input 1 to R14 */
+ mov %r8, %r15 /* Move input 1 to R15 */
+ /* Caller of do_tdx_hypercall() will set TDVMCALL type in R10 */
+
+ movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
+
+ tdcall
+
+ /*
+ * Non-zero RAX values indicate a failure of TDCALL itself.
+ * Panic for those. This value is unrelated to the hypercall
+ * result in R10.
+ */
+ test %rax, %rax
+ jnz 2f
+
+ /* Move hypercall error code to RAX to return to user */
+ mov %r10, %rax
+
+ /* Check for hypercall success: 0 - Successful, otherwise failed */
+ test %rax, %rax
+ jnz 1f
+
+ /* Check if caller provided an output struct */
+ test %r9, %r9
+ jz 1f
+
+ /* Copy hypercall result registers to output struct: */
+ movq %r11, TDX_HYPERCALL_r11(%r9)
+ movq %r12, TDX_HYPERCALL_r12(%r9)
+ movq %r13, TDX_HYPERCALL_r13(%r9)
+ movq %r14, TDX_HYPERCALL_r14(%r9)
+ movq %r15, TDX_HYPERCALL_r15(%r9)
+1:
+ /*
+ * Zero out registers exposed to the VMM to avoid
+ * speculative execution with VMM-controlled values.
+ * This needs to include all registers present in
+ * TDVMCALL_EXPOSE_REGS_MASK.
+ */
+ xor %r10d, %r10d
+ xor %r11d, %r11d
+ xor %r12d, %r12d
+ xor %r13d, %r13d
+ xor %r14d, %r14d
+ xor %r15d, %r15d
+
+ /* Restore non-volatile GPRs that are exposed to the VMM. */
+ pop %r12
+ pop %r13
+ pop %r14
+ pop %r15
+
+ ret
+2:
+ /*
+ * Reaching here means failure in TDCALL execution. This is
+ * not supposed to happen in hypercalls. It means the TDX
+ * module is in buggy state. So panic.
+ */
+ ud2
+SYM_FUNC_END(do_tdx_hypercall)
+
+/*
+ * Helper function for standard type of TDVMCALLs. This assembly
+ * wrapper reuses do_tdvmcall() for standard type of hypercalls
+ * (R10 is set as zero).
+ */
+SYM_FUNC_START(__tdx_hypercall)
+ FRAME_BEGIN
+ /*
+ * R10 is not part of the function call ABI, but it is a part
+ * of the TDVMCALL ABI. So set it 0 for standard type TDVMCALL
+ * before making call to the do_tdx_hypercall().
+ */
+ xor %r10, %r10
+ call do_tdx_hypercall
+ FRAME_END
+ retq
+SYM_FUNC_END(__tdx_hypercall)
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 5e70617e9877..97b54317f799 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -1,8 +1,46 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (C) 2020 Intel Corporation */

+#define pr_fmt(fmt) "TDX: " fmt
+
#include <asm/tdx.h>

+/*
+ * Wrapper for simple hypercalls that only return a success/error code.
+ */
+static inline u64 tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
+{
+ u64 err;
+
+ err = __tdx_hypercall(fn, r12, r13, r14, r15, NULL);
+
+ if (err)
+ pr_warn_ratelimited("TDVMCALL fn:%llx failed with err:%llx\n",
+ fn, err);
+
+ return err;
+}
+
+/*
+ * Wrapper for the semi-common case where user need single output
+ * value (R11). Callers of this function does not care about the
+ * hypercall error code (mainly for IN or MMIO usecase).
+ */
+static inline u64 tdx_hypercall_out_r11(u64 fn, u64 r12, u64 r13,
+ u64 r14, u64 r15)
+{
+ struct tdx_hypercall_output out = {0};
+ u64 err;
+
+ err = __tdx_hypercall(fn, r12, r13, r14, r15, &out);
+
+ if (err)
+ pr_warn_ratelimited("TDVMCALL fn:%llx failed with err:%llx\n",
+ fn, err);
+
+ return out.r11;
+}
+
static inline bool cpuid_has_tdx_guest(void)
{
u32 eax, signature[3];
--
2.25.1

Subject: [PATCH v1 04/11] x86/x86: Add is_tdx_guest() interface

Add helper function to detect TDX feature support. It will be used
to protect TDX specific code.

Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/tdx.c | 32 +++++++++++++++++++++++++++++++
arch/x86/include/asm/tdx.h | 8 ++++++++
arch/x86/kernel/tdx.c | 6 ++++++
4 files changed, 47 insertions(+)
create mode 100644 arch/x86/boot/compressed/tdx.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 431bf7f846c3..22a2a6cc2ab4 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -98,6 +98,7 @@ ifdef CONFIG_X86_64
endif

vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
+vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o

vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c
new file mode 100644
index 000000000000..0a87c1775b67
--- /dev/null
+++ b/arch/x86/boot/compressed/tdx.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * tdx.c - Early boot code for TDX
+ */
+
+#include <asm/tdx.h>
+
+static int __ro_after_init tdx_guest = -1;
+
+static inline bool native_cpuid_has_tdx_guest(void)
+{
+ u32 eax = TDX_CPUID_LEAF_ID, signature[3] = {0};
+
+ if (native_cpuid_eax(0) < TDX_CPUID_LEAF_ID)
+ return false;
+
+ native_cpuid(&eax, &signature[0], &signature[1], &signature[2]);
+
+ if (memcmp("IntelTDX ", signature, 12))
+ return false;
+
+ return true;
+}
+
+bool is_tdx_guest(void)
+{
+ if (tdx_guest < 0)
+ tdx_guest = native_cpuid_has_tdx_guest();
+
+ return !!tdx_guest;
+}
+
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 679500e807f3..69af72d08d3d 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -9,10 +9,18 @@

#include <asm/cpufeature.h>

+/* Common API to check TDX support in decompression and common kernel code. */
+bool is_tdx_guest(void);
+
void __init tdx_early_init(void);

#else // !CONFIG_INTEL_TDX_GUEST

+static inline bool is_tdx_guest(void)
+{
+ return false;
+}
+
static inline void tdx_early_init(void) { };

#endif /* CONFIG_INTEL_TDX_GUEST */
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 5b14b72e41c5..5e70617e9877 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -19,6 +19,12 @@ static inline bool cpuid_has_tdx_guest(void)
return true;
}

+bool is_tdx_guest(void)
+{
+ return static_cpu_has(X86_FEATURE_TDX_GUEST);
+}
+EXPORT_SYMBOL_GPL(is_tdx_guest);
+
void __init tdx_early_init(void)
{
if (!cpuid_has_tdx_guest())
--
2.25.1

Subject: [PATCH v1 11/11] x86/tdx: Handle CPUID via #VE

From: "Kirill A. Shutemov" <[email protected]>

TDX has three classes of CPUID leaves: some CPUID leaves
are always handled by the CPU, others are handled by the TDX module,
and some others are handled by the VMM. Since the VMM cannot directly
intercept the instruction these are reflected with a #VE exception
to the guest, which then converts it into a hypercall to the VMM,
or handled directly.

The TDX module EAS has a full list of CPUID leaves which are handled
natively or by the TDX module in 16.2. Only unknown CPUIDs are handled by
the #VE method. In practice this typically only applies to the
hypervisor specific CPUIDs unknown to the native CPU.

Therefore there is no risk of causing this in early CPUID code which
runs before the #VE handler is set up because it will never access
those exotic CPUID leaves.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
arch/x86/kernel/tdx.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index af7acea500ab..17725646eb30 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -164,6 +164,22 @@ static int tdg_write_msr_safe(unsigned int msr, unsigned int low,
return ret ? -EIO : 0;
}

+static void tdg_handle_cpuid(struct pt_regs *regs)
+{
+ u64 ret;
+ struct tdx_hypercall_output out = {0};
+
+ ret = __tdx_hypercall(EXIT_REASON_CPUID, regs->ax,
+ regs->cx, 0, 0, &out);
+
+ WARN_ON(ret);
+
+ regs->ax = out.r12;
+ regs->bx = out.r13;
+ regs->cx = out.r14;
+ regs->dx = out.r15;
+}
+
unsigned long tdg_get_ve_info(struct ve_info *ve)
{
u64 ret;
@@ -207,6 +223,9 @@ int tdg_handle_virtualization_exception(struct pt_regs *regs,
case EXIT_REASON_MSR_WRITE:
ret = tdg_write_msr_safe(regs->cx, regs->ax, regs->dx);
break;
+ case EXIT_REASON_CPUID:
+ tdg_handle_cpuid(regs);
+ break;
default:
pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
return -EFAULT;
--
2.25.1

Subject: [PATCH v1 09/11] x86/tdx: Wire up KVM hypercalls

From: "Kirill A. Shutemov" <[email protected]>

KVM hypercalls use the "vmcall" or "vmmcall" instructions.
Although the ABI is similar, those instructions no longer
function for TDX guests. Make vendor-specific TDVMCALLs
instead of VMCALL. This enables TDX guests to run with KVM
acting as the hypervisor. TDX guests running under other
hypervisors will continue to use those hypervisors'
hypercalls.

Since KVM driver can be built as a kernel module, export
tdx_kvm_hypercall*() to make the symbols visible to kvm.ko.

[Isaku Yamahata: proposed KVM VENDOR string]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
arch/x86/Kconfig | 5 +++
arch/x86/include/asm/kvm_para.h | 21 ++++++++++
arch/x86/include/asm/tdx.h | 68 +++++++++++++++++++++++++++++++++
arch/x86/kernel/tdcall.S | 26 +++++++++++++
4 files changed, 120 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ff79263aebd1..a99adc683db9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -891,6 +891,11 @@ config INTEL_TDX_GUEST
run in a CPU mode that protects the confidentiality of TD memory
contents and the TD’s CPU state from other software, including VMM.

+# This option enables KVM specific hypercalls in TDX guest.
+config INTEL_TDX_GUEST_KVM
+ def_bool y
+ depends on KVM_GUEST && INTEL_TDX_GUEST
+
endif #HYPERVISOR_GUEST

source "arch/x86/Kconfig.cpu"
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 69299878b200..9700f70300c9 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -6,6 +6,7 @@
#include <asm/alternative.h>
#include <linux/interrupt.h>
#include <uapi/asm/kvm_para.h>
+#include <asm/tdx.h>

#ifdef CONFIG_KVM_GUEST
bool kvm_check_and_clear_guest_paused(void);
@@ -32,6 +33,10 @@ static inline bool kvm_check_and_clear_guest_paused(void)
static inline long kvm_hypercall0(unsigned int nr)
{
long ret;
+
+ if (is_tdx_guest())
+ return tdx_kvm_hypercall0(nr);
+
asm volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr)
@@ -42,6 +47,10 @@ static inline long kvm_hypercall0(unsigned int nr)
static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
{
long ret;
+
+ if (is_tdx_guest())
+ return tdx_kvm_hypercall1(nr, p1);
+
asm volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr), "b"(p1)
@@ -53,6 +62,10 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
unsigned long p2)
{
long ret;
+
+ if (is_tdx_guest())
+ return tdx_kvm_hypercall2(nr, p1, p2);
+
asm volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr), "b"(p1), "c"(p2)
@@ -64,6 +77,10 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
unsigned long p2, unsigned long p3)
{
long ret;
+
+ if (is_tdx_guest())
+ return tdx_kvm_hypercall3(nr, p1, p2, p3);
+
asm volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr), "b"(p1), "c"(p2), "d"(p3)
@@ -76,6 +93,10 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
unsigned long p4)
{
long ret;
+
+ if (is_tdx_guest())
+ return tdx_kvm_hypercall4(nr, p1, p2, p3, p4);
+
asm volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr), "b"(p1), "c"(p2), "d"(p3), "S"(p4)
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index f00008cf1361..f0c1912837c8 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -82,4 +82,72 @@ static inline void tdx_early_init(void) { };

#endif /* CONFIG_INTEL_TDX_GUEST */

+#ifdef CONFIG_INTEL_TDX_GUEST_KVM
+u64 __tdx_hypercall_vendor_kvm(u64 fn, u64 r12, u64 r13, u64 r14,
+ u64 r15, struct tdx_hypercall_output *out);
+
+/* Used by kvm_hypercall0() to trigger hypercall in TDX guest */
+static inline long tdx_kvm_hypercall0(unsigned int nr)
+{
+ return __tdx_hypercall_vendor_kvm(nr, 0, 0, 0, 0, NULL);
+}
+
+/* Used by kvm_hypercall1() to trigger hypercall in TDX guest */
+static inline long tdx_kvm_hypercall1(unsigned int nr, unsigned long p1)
+{
+ return __tdx_hypercall_vendor_kvm(nr, p1, 0, 0, 0, NULL);
+}
+
+/* Used by kvm_hypercall2() to trigger hypercall in TDX guest */
+static inline long tdx_kvm_hypercall2(unsigned int nr, unsigned long p1,
+ unsigned long p2)
+{
+ return __tdx_hypercall_vendor_kvm(nr, p1, p2, 0, 0, NULL);
+}
+
+/* Used by kvm_hypercall3() to trigger hypercall in TDX guest */
+static inline long tdx_kvm_hypercall3(unsigned int nr, unsigned long p1,
+ unsigned long p2, unsigned long p3)
+{
+ return __tdx_hypercall_vendor_kvm(nr, p1, p2, p3, 0, NULL);
+}
+
+/* Used by kvm_hypercall4() to trigger hypercall in TDX guest */
+static inline long tdx_kvm_hypercall4(unsigned int nr, unsigned long p1,
+ unsigned long p2, unsigned long p3,
+ unsigned long p4)
+{
+ return __tdx_hypercall_vendor_kvm(nr, p1, p2, p3, p4, NULL);
+}
+#else
+static inline long tdx_kvm_hypercall0(unsigned int nr)
+{
+ return -ENODEV;
+}
+
+static inline long tdx_kvm_hypercall1(unsigned int nr, unsigned long p1)
+{
+ return -ENODEV;
+}
+
+static inline long tdx_kvm_hypercall2(unsigned int nr, unsigned long p1,
+ unsigned long p2)
+{
+ return -ENODEV;
+}
+
+static inline long tdx_kvm_hypercall3(unsigned int nr, unsigned long p1,
+ unsigned long p2, unsigned long p3)
+{
+ return -ENODEV;
+}
+
+static inline long tdx_kvm_hypercall4(unsigned int nr, unsigned long p1,
+ unsigned long p2, unsigned long p3,
+ unsigned long p4)
+{
+ return -ENODEV;
+}
+#endif /* CONFIG_INTEL_TDX_GUEST_KVM */
+
#endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
index d95af4486155..289d7fab5b4a 100644
--- a/arch/x86/kernel/tdcall.S
+++ b/arch/x86/kernel/tdcall.S
@@ -3,6 +3,7 @@
#include <asm/asm.h>
#include <asm/frame.h>
#include <asm/unwind_hints.h>
+#include <asm/export.h>

#include <linux/linkage.h>
#include <linux/bits.h>
@@ -25,6 +26,8 @@
TDG_R12 | TDG_R13 | \
TDG_R14 | TDG_R15 )

+#define TDVMCALL_VENDOR_KVM 0x4d564b2e584454 /* "TDX.KVM" */
+
/*
* TDX guests use the TDCALL instruction to make requests to the
* TDX module and hypercalls to the VMM. It is supported in
@@ -226,3 +229,26 @@ SYM_FUNC_START(__tdx_hypercall)
FRAME_END
retq
SYM_FUNC_END(__tdx_hypercall)
+
+#ifdef CONFIG_INTEL_TDX_GUEST_KVM
+
+/*
+ * Helper function for KVM vendor TDVMCALLs. This assembly wrapper
+ * lets us reuse do_tdvmcall() for KVM-specific hypercalls (
+ * TDVMCALL_VENDOR_KVM).
+ */
+SYM_FUNC_START(__tdx_hypercall_vendor_kvm)
+ FRAME_BEGIN
+ /*
+ * R10 is not part of the function call ABI, but it is a part
+ * of the TDVMCALL ABI. So set it before making call to the
+ * do_tdx_hypercall().
+ */
+ movq $TDVMCALL_VENDOR_KVM, %r10
+ call do_tdx_hypercall
+ FRAME_END
+ retq
+SYM_FUNC_END(__tdx_hypercall_vendor_kvm)
+
+EXPORT_SYMBOL(__tdx_hypercall_vendor_kvm);
+#endif /* CONFIG_INTEL_TDX_GUEST_KVM */
--
2.25.1

Subject: [PATCH v1 07/11] x86/traps: Add #VE support for TDX guest

From: "Kirill A. Shutemov" <[email protected]>

Virtualization Exceptions (#VE) are delivered to TDX guests due to
specific guest actions which may happen in either user space or the kernel:

 * Specific instructions (WBINVD, for example)
 * Specific MSR accesses
 * Specific CPUID leaf accesses
 * Access to TD-shared memory, which includes MMIO

In the settings that Linux will run in, virtual exceptions are never
generated on accesses to normal, TD-private memory that has been
accepted.

The entry paths do not access TD-shared memory, MMIO regions or use
those specific MSRs, instructions, CPUID leaves that might generate #VE.
In addition, all interrupts including NMIs are blocked by the hardware
starting with #VE delivery until TDGETVEINFO is called.  This eliminates
the chance of a #VE during the syscall gap or paranoid entry paths and
simplifies #VE handling.

After TDGETVEINFO #VE could happen in theory (e.g. through an NMI),
but it is expected not to happen because TDX expects NMIs not to
trigger #VEs. Another case where they could happen is if the #VE
exception panics, but in this case there are no guarantees on anything
anyways.

If a guest kernel action which would normally cause a #VE occurs in the
interrupt-disabled region before TDGETVEINFO, a #DF is delivered to the
guest which will result in an oops (and should eventually be a panic, as
we would like to set panic_on_oops to 1 for TDX guests).

Add basic infrastructure to handle any #VE which occurs in the kernel or
userspace.  Later patches will add handling for specific #VE scenarios.

Convert unhandled #VE's (everything, until later in this series) so that
they appear just like a #GP by calling ve_raise_fault() directly.
ve_raise_fault() is similar to #GP handler and is responsible for
sending SIGSEGV to userspace and cpu die and notifying debuggers and
other die chain users.  

Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
arch/x86/include/asm/idtentry.h | 4 ++
arch/x86/include/asm/tdx.h | 19 +++++++++
arch/x86/kernel/idt.c | 6 +++
arch/x86/kernel/tdx.c | 36 +++++++++++++++++
arch/x86/kernel/traps.c | 69 +++++++++++++++++++++++++++++++++
5 files changed, 134 insertions(+)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 73d45b0dfff2..d3c779abbc78 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -634,6 +634,10 @@ DECLARE_IDTENTRY_XENCB(X86_TRAP_OTHER, exc_xen_hypervisor_callback);
DECLARE_IDTENTRY_RAW(X86_TRAP_OTHER, exc_xen_unknown_trap);
#endif

+#ifdef CONFIG_INTEL_TDX_GUEST
+DECLARE_IDTENTRY(X86_TRAP_VE, exc_virtualization_exception);
+#endif
+
/* Device interrupts common/spurious */
DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER, common_interrupt);
#ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index fcd42119a287..f00008cf1361 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -39,6 +39,25 @@ struct tdx_hypercall_output {
u64 r15;
};

+/*
+ * Used by #VE exception handler to gather the #VE exception
+ * info from the TDX module. This is software only structure
+ * and not related to TDX module/VMM.
+ */
+struct ve_info {
+ u64 exit_reason;
+ u64 exit_qual;
+ u64 gla; /* Guest Linear (virtual) Address */
+ u64 gpa; /* Guest Physical (virtual) Address */
+ u32 instr_len;
+ u32 instr_info;
+};
+
+unsigned long tdg_get_ve_info(struct ve_info *ve);
+
+int tdg_handle_virtualization_exception(struct pt_regs *regs,
+ struct ve_info *ve);
+
/* Common API to check TDX support in decompression and common kernel code. */
bool is_tdx_guest(void);

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index d552f177eca0..39390761d9c8 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -64,6 +64,9 @@ static const __initconst struct idt_data early_idts[] = {
*/
INTG(X86_TRAP_PF, asm_exc_page_fault),
#endif
+#ifdef CONFIG_INTEL_TDX_GUEST
+ INTG(X86_TRAP_VE, asm_exc_virtualization_exception),
+#endif
};

/*
@@ -87,6 +90,9 @@ static const __initconst struct idt_data def_idts[] = {
INTG(X86_TRAP_MF, asm_exc_coprocessor_error),
INTG(X86_TRAP_AC, asm_exc_alignment_check),
INTG(X86_TRAP_XF, asm_exc_simd_coprocessor_error),
+#ifdef CONFIG_INTEL_TDX_GUEST
+ INTG(X86_TRAP_VE, asm_exc_virtualization_exception),
+#endif

#ifdef CONFIG_X86_32
TSKG(X86_TRAP_DF, GDT_ENTRY_DOUBLEFAULT_TSS),
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index e4383b416ef3..527d2638ddae 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -10,6 +10,7 @@

/* TDX Module call Leaf IDs */
#define TDINFO 1
+#define TDGETVEINFO 3

static struct {
unsigned int gpa_width;
@@ -87,6 +88,41 @@ static void tdg_get_info(void)
td_info.attributes = out.rdx;
}

+unsigned long tdg_get_ve_info(struct ve_info *ve)
+{
+ u64 ret;
+ struct tdx_module_output out = {0};
+
+ /*
+ * NMIs and machine checks are suppressed. Before this point any
+ * #VE is fatal. After this point (TDGETVEINFO call), NMIs and
+ * additional #VEs are permitted (but we don't expect them to
+ * happen unless you panic).
+ */
+ ret = __tdx_module_call(TDGETVEINFO, 0, 0, 0, 0, &out);
+
+ ve->exit_reason = out.rcx;
+ ve->exit_qual = out.rdx;
+ ve->gla = out.r8;
+ ve->gpa = out.r9;
+ ve->instr_len = out.r10 & UINT_MAX;
+ ve->instr_info = out.r10 >> 32;
+
+ return ret;
+}
+
+int tdg_handle_virtualization_exception(struct pt_regs *regs,
+ struct ve_info *ve)
+{
+ /*
+ * TODO: Add handler support for various #VE exit
+ * reasons. It will be added by other patches in
+ * the series.
+ */
+ pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
+ return -EFAULT;
+}
+
void __init tdx_early_init(void)
{
if (!cpuid_has_tdx_guest())
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 853ea7a80806..d860fdee9cfe 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
#include <asm/insn.h>
#include <asm/insn-eval.h>
#include <asm/vdso.h>
+#include <asm/tdx.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -1139,6 +1140,74 @@ DEFINE_IDTENTRY(exc_device_not_available)
}
}

+#define VEFSTR "VE fault"
+static void ve_raise_fault(struct pt_regs *regs, long error_code)
+{
+ struct task_struct *tsk = current;
+
+ if (user_mode(regs)) {
+ tsk->thread.error_code = error_code;
+ tsk->thread.trap_nr = X86_TRAP_VE;
+
+ /*
+ * Not fixing up VDSO exceptions similar to #GP handler
+ * because we don't expect the VDSO to trigger #VE.
+ */
+ show_signal(tsk, SIGSEGV, "", VEFSTR, regs, error_code);
+ force_sig(SIGSEGV);
+ return;
+ }
+
+ if (fixup_exception(regs, X86_TRAP_VE, error_code, 0))
+ return;
+
+ tsk->thread.error_code = error_code;
+ tsk->thread.trap_nr = X86_TRAP_VE;
+
+ /*
+ * To be potentially processing a kprobe fault and to trust the result
+ * from kprobe_running(), we have to be non-preemptible.
+ */
+ if (!preemptible() &&
+ kprobe_running() &&
+ kprobe_fault_handler(regs, X86_TRAP_VE))
+ return;
+
+ notify_die(DIE_GPF, VEFSTR, regs, error_code, X86_TRAP_VE, SIGSEGV);
+
+ die_addr(VEFSTR, regs, error_code, 0);
+}
+
+#ifdef CONFIG_INTEL_TDX_GUEST
+DEFINE_IDTENTRY(exc_virtualization_exception)
+{
+ struct ve_info ve;
+ int ret;
+
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ /*
+ * NMIs/Machine-checks/Interrupts will be in a disabled state
+ * till TDGETVEINFO TDCALL is executed. This prevents #VE
+ * nesting issue.
+ */
+ ret = tdg_get_ve_info(&ve);
+
+ cond_local_irq_enable(regs);
+
+ if (!ret)
+ ret = tdg_handle_virtualization_exception(regs, &ve);
+ /*
+ * If tdg_handle_virtualization_exception() could not process
+ * it successfully, treat it as #GP(0) and handle it.
+ */
+ if (ret)
+ ve_raise_fault(regs, 0);
+
+ cond_local_irq_disable(regs);
+}
+#endif
+
#ifdef CONFIG_X86_32
DEFINE_IDTENTRY_SW(iret_error)
{
--
2.25.1

2021-06-07 14:36:29

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v1 03/11] x86/cpufeatures: Add TDX Guest CPU feature

On 6/1/21 9:21 PM, Kuppuswamy Sathyanarayanan wrote:
> Add CPU feature detection for Trusted Domain Extensions support.
> TDX feature adds capabilities to keep guest register state and
> memory isolated from hypervisor.
>
> For TDX guest platforms, executing CPUID(0x21, 0) will return
> following values in EAX, EBX, ECX and EDX.
>
> EAX: Maximum sub-leaf number: 0
> EBX/EDX/ECX: Vendor string:
>
> EBX = "Inte"
> EDX = "lTDX"
> ECX = " "
>
> So when above condition is true, set X86_FEATURE_TDX_GUEST
> feature cap bit
>
> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---

...

> /*
> * Manage page tables very early on.
> @@ -491,6 +492,8 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
>
> kasan_early_init();
>
> + tdx_early_init();
> +

Just a real minor nit, but does this have to be after kasan_early_init()?
If not, keeping the SME/SEV/TDX calls "together" might read clearer. So
just moving it up before kasan_early_init() will group them.

Thanks,
Tom

Subject: Re: [PATCH v1 03/11] x86/cpufeatures: Add TDX Guest CPU feature



On 6/7/21 7:32 AM, Tom Lendacky wrote:
>> + tdx_early_init();
>> +
> Just a real minor nit, but does this have to be after kasan_early_init()?
> If not, keeping the SME/SEV/TDX calls "together" might read clearer. So
> just moving it up before kasan_early_init() will group them.

We have more patches in this series, and they are dependent on some
boot command line options. So we will eventually move tdx_early_init()
below copy_bootdata(__va(real_mode_data))

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2021-06-10 12:33:04

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v1 03/11] x86/cpufeatures: Add TDX Guest CPU feature

On Tue, Jun 01, 2021 at 07:21:28PM -0700, Kuppuswamy Sathyanarayanan wrote:
> Add CPU feature detection for Trusted Domain Extensions support.
> TDX feature adds capabilities to keep guest register state and
> memory isolated from hypervisor.
>
> For TDX guest platforms, executing CPUID(0x21, 0) will return

I'm assuming that 0 is ECX?

> following values in EAX, EBX, ECX and EDX.
>
> EAX: Maximum sub-leaf number: 0
> EBX/EDX/ECX: Vendor string:
>
> EBX = "Inte"
> EDX = "lTDX"
> ECX = " "
>
> So when above condition is true, set X86_FEATURE_TDX_GUEST
> feature cap bit
^
Fullstop.

> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index ac37830ae941..dddc3a27cc8a 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -238,6 +238,7 @@
> #define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
> #define X86_FEATURE_PVUNLOCK ( 8*32+20) /* "" PV unlock function */
> #define X86_FEATURE_VCPUPREEMPT ( 8*32+21) /* "" PV vcpu_is_preempted function */
> +#define X86_FEATURE_TDX_GUEST ( 8*32+22) /* Trusted Domain Extensions Guest */

What's the name of the feature bit? "TDX guest"? Why not only
X86_FEATURE_TDX and then you can have "tdx" in cpuinfo?

>
> /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
> #define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> new file mode 100644
> index 000000000000..679500e807f3
> --- /dev/null
> +++ b/arch/x86/include/asm/tdx.h
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2020 Intel Corporation */
> +#ifndef _ASM_X86_TDX_H
> +#define _ASM_X86_TDX_H
> +
> +#define TDX_CPUID_LEAF_ID 0x21
> +
> +#ifdef CONFIG_INTEL_TDX_GUEST
> +
> +#include <asm/cpufeature.h>

As before - put the include at the top.

> +void __init tdx_early_init(void);
> +
> +#else // !CONFIG_INTEL_TDX_GUEST

No need for that // - comment

> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> new file mode 100644
> index 000000000000..5b14b72e41c5
> --- /dev/null
> +++ b/arch/x86/kernel/tdx.c
> @@ -0,0 +1,30 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2020 Intel Corporation */
> +
> +#include <asm/tdx.h>
> +
> +static inline bool cpuid_has_tdx_guest(void)
> +{
> + u32 eax, signature[3];

Shorten that array name to "sig" so that you don't have to break the
cpuid_count() line below.

> +
> + if (cpuid_eax(0) < TDX_CPUID_LEAF_ID)
> + return false;
> +
> + cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &signature[0],
> + &signature[1], &signature[2]);
> +
> + if (memcmp("IntelTDX ", signature, 12))
> + return false;
> +
> + return true;

or simply:

return !memcmp(...

> +}
> +
> +void __init tdx_early_init(void)
> +{
> + if (!cpuid_has_tdx_guest())
> + return;
> +
> + setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
> +
> + pr_info("TDX guest is initialized\n");

Use pr_fmt in this file:

#undef pr_fmt
#define pr_fmt(fmt) "x86/tdx: " fmt

and then do

pr_info("Guest initialized\n");

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-10 14:31:15

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v1 03/11] x86/cpufeatures: Add TDX Guest CPU feature

On Thu, Jun 10, 2021 at 02:28:06PM +0200, Borislav Petkov wrote:
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index ac37830ae941..dddc3a27cc8a 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -238,6 +238,7 @@
> > #define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
> > #define X86_FEATURE_PVUNLOCK ( 8*32+20) /* "" PV unlock function */
> > #define X86_FEATURE_VCPUPREEMPT ( 8*32+21) /* "" PV vcpu_is_preempted function */
> > +#define X86_FEATURE_TDX_GUEST ( 8*32+22) /* Trusted Domain Extensions Guest */
>
> What's the name of the feature bit? "TDX guest"? Why not only
> X86_FEATURE_TDX and then you can have "tdx" in cpuinfo?

No, "tdx" is host feature. It is part TDX host enabling. This feature
indicates that kernel runs within TDX guest and named accordingly.

--
Kirill A. Shutemov

Subject: Re: [PATCH v1 03/11] x86/cpufeatures: Add TDX Guest CPU feature

Hi,

On 6/10/21 5:28 AM, Borislav Petkov wrote:
> On Tue, Jun 01, 2021 at 07:21:28PM -0700, Kuppuswamy Sathyanarayanan wrote:
>> Add CPU feature detection for Trusted Domain Extensions support.
>> TDX feature adds capabilities to keep guest register state and
>> memory isolated from hypervisor.
>>
>> For TDX guest platforms, executing CPUID(0x21, 0) will return
>
> I'm assuming that 0 is ECX?

Correct. I will clarify it in commit log.

>
>> following values in EAX, EBX, ECX and EDX.
>>
>> EAX: Maximum sub-leaf number: 0
>> EBX/EDX/ECX: Vendor string:
>>
>> EBX = "Inte"
>> EDX = "lTDX"
>> ECX = " "
>>
>> So when above condition is true, set X86_FEATURE_TDX_GUEST
>> feature cap bit
> ^
> Fullstop.

Will fix it in next version.

>
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index ac37830ae941..dddc3a27cc8a 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -238,6 +238,7 @@
>> #define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
>> #define X86_FEATURE_PVUNLOCK ( 8*32+20) /* "" PV unlock function */
>> #define X86_FEATURE_VCPUPREEMPT ( 8*32+21) /* "" PV vcpu_is_preempted function */
>> +#define X86_FEATURE_TDX_GUEST ( 8*32+22) /* Trusted Domain Extensions Guest */
>
> What's the name of the feature bit? "TDX guest"? Why not only
> X86_FEATURE_TDX and then you can have "tdx" in cpuinfo?

Make sense. I can change it to "TDX". I guess we will not a feature bit
for "tdx host"

>
>>
>> /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
>> #define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
>> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
>> new file mode 100644
>> index 000000000000..679500e807f3
>> --- /dev/null
>> +++ b/arch/x86/include/asm/tdx.h
>> @@ -0,0 +1,20 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright (C) 2020 Intel Corporation */
>> +#ifndef _ASM_X86_TDX_H
>> +#define _ASM_X86_TDX_H
>> +
>> +#define TDX_CPUID_LEAF_ID 0x21
>> +
>> +#ifdef CONFIG_INTEL_TDX_GUEST
>> +
>> +#include <asm/cpufeature.h>
>
> As before - put the include at the top.

I will change it.

>
>> +void __init tdx_early_init(void);
>> +
>> +#else // !CONFIG_INTEL_TDX_GUEST
>
> No need for that // - comment

I will remove it in next version.

>
>> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
>> new file mode 100644
>> index 000000000000..5b14b72e41c5
>> --- /dev/null
>> +++ b/arch/x86/kernel/tdx.c
>> @@ -0,0 +1,30 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (C) 2020 Intel Corporation */
>> +
>> +#include <asm/tdx.h>
>> +
>> +static inline bool cpuid_has_tdx_guest(void)
>> +{
>> + u32 eax, signature[3];
>
> Shorten that array name to "sig" so that you don't have to break the
> cpuid_count() line below.

I will fix it in next version.

>
>> +
>> + if (cpuid_eax(0) < TDX_CPUID_LEAF_ID)
>> + return false;
>> +
>> + cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &signature[0],
>> + &signature[1], &signature[2]);
>> +
>> + if (memcmp("IntelTDX ", signature, 12))
>> + return false;
>> +
>> + return true;
>
> or simply:
>
> return !memcmp(...

Got it. I will change it in next version.

>
>> +}
>> +
>> +void __init tdx_early_init(void)
>> +{
>> + if (!cpuid_has_tdx_guest())
>> + return;
>> +
>> + setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
>> +
>> + pr_info("TDX guest is initialized\n");
>
> Use pr_fmt in this file:
>
> #undef pr_fmt
> #define pr_fmt(fmt) "x86/tdx: " fmt
>
> and then do
>
> pr_info("Guest initialized\n");

Patch titled "x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions" add
it. It makes sense to move it to this patch.

>
> Thx.
>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2021-06-10 14:37:29

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v1 03/11] x86/cpufeatures: Add TDX Guest CPU feature

On Thu, Jun 10, 2021 at 05:29:43PM +0300, Kirill A. Shutemov wrote:
> No, "tdx" is host feature. It is part TDX host enabling. This feature
> indicates that kernel runs within TDX guest and named accordingly.

So there will be X86_FEATURE_TDX_GUEST and X86_FEATURE_TDX, latter to
mean TDX host functionality?

Are those patches somewhere public for me to see?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-10 14:44:58

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [PATCH v1 03/11] x86/cpufeatures: Add TDX Guest CPU feature

On Thu, Jun 10, 2021 at 04:35:48PM +0200, Borislav Petkov wrote:
> On Thu, Jun 10, 2021 at 05:29:43PM +0300, Kirill A. Shutemov wrote:
> > No, "tdx" is host feature. It is part TDX host enabling. This feature
> > indicates that kernel runs within TDX guest and named accordingly.
>
> So there will be X86_FEATURE_TDX_GUEST and X86_FEATURE_TDX, latter to
> mean TDX host functionality?
>
> Are those patches somewhere public for me to see?

Yes. You are on CC:

http://lore.kernel.org/r/9a74fb153bc21dc5cac46e84913b88182f216d1b.1605232743.git.isaku.yamahata@intel.com

--
Kirill A. Shutemov

2021-06-10 16:00:09

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v1 03/11] x86/cpufeatures: Add TDX Guest CPU feature

On Thu, Jun 10, 2021 at 05:41:11PM +0300, Kirill A. Shutemov wrote:
> Yes. You are on CC:
>
> http://lore.kernel.org/r/9a74fb153bc21dc5cac46e84913b88182f216d1b.1605232743.git.isaku.yamahata@intel.com

Aha, a synthetic flag for the hv functionality, ok.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-10 20:01:58

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v1 04/11] x86/x86: Add is_tdx_guest() interface

On Tue, Jun 01, 2021 at 07:21:29PM -0700, Kuppuswamy Sathyanarayanan wrote:
> +static int __ro_after_init tdx_guest = -1;
> +
> +static inline bool native_cpuid_has_tdx_guest(void)
> +{
> + u32 eax = TDX_CPUID_LEAF_ID, signature[3] = {0};
> +
> + if (native_cpuid_eax(0) < TDX_CPUID_LEAF_ID)
> + return false;
> +
> + native_cpuid(&eax, &signature[0], &signature[1], &signature[2]);
> +
> + if (memcmp("IntelTDX ", signature, 12))
> + return false;
> +
> + return true;

As before,

return !memcmp(...

and then that function can return simply an int.


> +}
> +
> +bool is_tdx_guest(void)

If anything, this should be early_is_tdx_guest().

> +{
> + if (tdx_guest < 0)
> + tdx_guest = native_cpuid_has_tdx_guest();
> +
> + return !!tdx_guest;
> +}
> +

Applying: x86/x86: Add is_tdx_guest() interface
.git/rebase-apply/patch:58: new blank line at EOF.
+
warning: 1 line adds whitespace errors.


> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index 5b14b72e41c5..5e70617e9877 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -19,6 +19,12 @@ static inline bool cpuid_has_tdx_guest(void)
> return true;
> }
>
> +bool is_tdx_guest(void)
> +{
> + return static_cpu_has(X86_FEATURE_TDX_GUEST);
> +}
> +EXPORT_SYMBOL_GPL(is_tdx_guest);

I don't like this is_tdx_guest() thing in kernel proper - what's wrong
with

prot_guest_has(PR_GUEST_TDX)

?

Also, why is it exported, for kvm?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [PATCH v1 04/11] x86/x86: Add is_tdx_guest() interface



On 6/10/21 12:59 PM, Borislav Petkov wrote:
> On Tue, Jun 01, 2021 at 07:21:29PM -0700, Kuppuswamy Sathyanarayanan wrote:

>> + if (memcmp("IntelTDX ", signature, 12))
>> + return false;
>> +
>> + return true;
>
> As before,
>
> return !memcmp(...
>
> and then that function can return simply an int.

I will make the above change in next version.

>
>
>> +}
>> +
>> +bool is_tdx_guest(void)
>
> If anything, this should be early_is_tdx_guest().

If this is the recommendation, I am fine with it. It is
only used by __in/__out macros in decompression code.

>
>> +{
>> + if (tdx_guest < 0)
>> + tdx_guest = native_cpuid_has_tdx_guest();
>> +
>> + return !!tdx_guest;
>> +}
>> +
>
> Applying: x86/x86: Add is_tdx_guest() interface
> .git/rebase-apply/patch:58: new blank line at EOF.
> +
> warning: 1 line adds whitespace errors.

I will fix it in next version.

>
>
>> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
>> index 5b14b72e41c5..5e70617e9877 100644
>> --- a/arch/x86/kernel/tdx.c
>> +++ b/arch/x86/kernel/tdx.c
>> @@ -19,6 +19,12 @@ static inline bool cpuid_has_tdx_guest(void)
>> return true;
>> }
>>
>> +bool is_tdx_guest(void)
>> +{
>> + return static_cpu_has(X86_FEATURE_TDX_GUEST);
>> +}
>> +EXPORT_SYMBOL_GPL(is_tdx_guest);
>
> I don't like this is_tdx_guest() thing in kernel proper - what's wrong
> with
>
> prot_guest_has(PR_GUEST_TDX)

Is it alright to use vendor name in prot_guest_has() flag? I thought
we want to keep them generic. If this is acceptable, we can replace
is_tdx_guest() with prot_guest_has() calls. Currently it is not used
in many places.

>
> ?
>
> Also, why is it exported, for kvm?

Yes. It is used in exported KVM functions.

>
> Thx.
>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2021-06-10 21:10:05

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v1 04/11] x86/x86: Add is_tdx_guest() interface

On Thu, Jun 10, 2021 at 02:01:41PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Is it alright to use vendor name in prot_guest_has() flag? I thought
> we want to keep them generic.

Sure but keeping them only generic doesn't work in cases like this.

And just like you have:

+/* Protected Guest Feature Flags (leave 0-0xff for arch specific flags) */

there could be ranges for vendor-specific flags.

Intel at 200-2ff
AMD at 300-3ff

which is 256 per vendor, so should be enough. :)

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: [PATCH v2 03/12] x86/cpufeatures: Add TDX Guest CPU feature

Add CPU feature detection for Trusted Domain Extensions support. TDX
feature adds capabilities to keep guest register state and memory
isolated from hypervisor.

For TDX guest platforms, executing CPUID(eax=0x21, ecx=0) will return
following values in EAX, EBX, ECX and EDX.

EAX: Maximum sub-leaf number: 0
EBX/EDX/ECX: Vendor string:

EBX = "Inte"
EDX = "lTDX"
ECX = " "

So when above condition is true, set X86_FEATURE_TDX_GUEST feature cap
bit.

Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---

Changes since v1:
* Fixed commit log issues reported by Borislav.
* Moved header file include to the start of tdx.h.
* Added pr_fmt for TDX.
* Simplified cpuid_has_tdx_guest() implementation as per
Borislav comments.

arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/tdx.h | 20 ++++++++++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/head64.c | 3 +++
arch/x86/kernel/tdx.c | 29 +++++++++++++++++++++++++++++
5 files changed, 54 insertions(+)
create mode 100644 arch/x86/include/asm/tdx.h
create mode 100644 arch/x86/kernel/tdx.c

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ac37830ae941..dddc3a27cc8a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -238,6 +238,7 @@
#define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
#define X86_FEATURE_PVUNLOCK ( 8*32+20) /* "" PV unlock function */
#define X86_FEATURE_VCPUPREEMPT ( 8*32+21) /* "" PV vcpu_is_preempted function */
+#define X86_FEATURE_TDX_GUEST ( 8*32+22) /* Trusted Domain Extensions Guest */

/* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
new file mode 100644
index 000000000000..c738bde944d1
--- /dev/null
+++ b/arch/x86/include/asm/tdx.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2020 Intel Corporation */
+#ifndef _ASM_X86_TDX_H
+#define _ASM_X86_TDX_H
+
+#include <linux/cpufeature.h>
+
+#define TDX_CPUID_LEAF_ID 0x21
+
+#ifdef CONFIG_INTEL_TDX_GUEST
+
+void __init tdx_early_init(void);
+
+#else
+
+static inline void tdx_early_init(void) { };
+
+#endif /* CONFIG_INTEL_TDX_GUEST */
+
+#endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0f66682ac02a..af09ce93a641 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -126,6 +126,7 @@ obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o
obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o

obj-$(CONFIG_JAILHOUSE_GUEST) += jailhouse.o
+obj-$(CONFIG_INTEL_TDX_GUEST) += tdx.o

obj-$(CONFIG_EISA) += eisa.o
obj-$(CONFIG_PCSPKR_PLATFORM) += pcspeaker.o
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..d1a4942ae160 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -40,6 +40,7 @@
#include <asm/extable.h>
#include <asm/trapnr.h>
#include <asm/sev.h>
+#include <asm/tdx.h>

/*
* Manage page tables very early on.
@@ -491,6 +492,8 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)

kasan_early_init();

+ tdx_early_init();
+
idt_setup_early_handler();

copy_bootdata(__va(real_mode_data));
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
new file mode 100644
index 000000000000..70494b465392
--- /dev/null
+++ b/arch/x86/kernel/tdx.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2020 Intel Corporation */
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/tdx: " fmt
+
+#include <asm/tdx.h>
+
+static inline bool cpuid_has_tdx_guest(void)
+{
+ u32 eax, sig[3];
+
+ if (cpuid_eax(0) < TDX_CPUID_LEAF_ID)
+ return false;
+
+ cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[1], &sig[2]);
+
+ return !memcmp("IntelTDX ", sig, 12);
+}
+
+void __init tdx_early_init(void)
+{
+ if (!cpuid_has_tdx_guest())
+ return;
+
+ setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
+
+ pr_info("Guest is initialized\n");
+}
--
2.25.1

Subject: [PATCH v2 04/12] x86/x86: Add early_is_tdx_guest() interface

Add helper function to detect TDX feature support. It will be used
to protect TDX specific code in decompression code.

Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---

Changes since v1:
* Removed is_tdx_guest() interface. We will rely on
prot_guest_has(PR_GUEST_TDX) for TDX related checks.
* Renamed is_tdx_guest() to early_is_tdx_guest() as per
review suggestion.
* Fixed commit title and log to reflect above changes.
* Simplified native_cpuid_has_tdx_guest() as per review suggestion.

arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/tdx.c | 28 ++++++++++++++++++++++++++++
2 files changed, 29 insertions(+)
create mode 100644 arch/x86/boot/compressed/tdx.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 431bf7f846c3..22a2a6cc2ab4 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -98,6 +98,7 @@ ifdef CONFIG_X86_64
endif

vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
+vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o

vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c
new file mode 100644
index 000000000000..ddfa4a6d1939
--- /dev/null
+++ b/arch/x86/boot/compressed/tdx.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * tdx.c - Early boot code for TDX
+ */
+
+#include <asm/tdx.h>
+
+static int __ro_after_init tdx_guest = -1;
+
+static inline bool native_cpuid_has_tdx_guest(void)
+{
+ u32 eax = TDX_CPUID_LEAF_ID, sig[3] = {0};
+
+ if (native_cpuid_eax(0) < TDX_CPUID_LEAF_ID)
+ return false;
+
+ native_cpuid(&eax, &sig[0], &sig[1], &sig[2]);
+
+ return !memcmp("IntelTDX ", sig, 12);
+}
+
+bool early_is_tdx_guest(void)
+{
+ if (tdx_guest < 0)
+ tdx_guest = native_cpuid_has_tdx_guest();
+
+ return !!tdx_guest;
+}
--
2.25.1

Subject: [PATCH v2 10/12] x86/tdx: Wire up KVM hypercalls

From: "Kirill A. Shutemov" <[email protected]>

KVM hypercalls use the "vmcall" or "vmmcall" instructions.
Although the ABI is similar, those instructions no longer
function for TDX guests. Make vendor-specific TDVMCALLs
instead of VMCALL. This enables TDX guests to run with KVM
acting as the hypervisor. TDX guests running under other
hypervisors will continue to use those hypervisors'
hypercalls.

Since KVM driver can be built as a kernel module, export
tdx_kvm_hypercall*() to make the symbols visible to kvm.ko.

[Isaku Yamahata: proposed KVM VENDOR string]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Dave Hansen <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---

Changes since v1:
* Replaced is_tdx_guest() with prot_guest_has(PR_GUEST_TDX).
* Replaced tdx_kvm_hypercall{1-4} with single generic
function tdx_kvm_hypercall().

arch/x86/Kconfig | 5 +++++
arch/x86/include/asm/kvm_para.h | 21 +++++++++++++++++++++
arch/x86/include/asm/tdx.h | 19 +++++++++++++++++++
arch/x86/kernel/tdcall.S | 26 ++++++++++++++++++++++++++
4 files changed, 71 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d506aae29dd9..fc51579e54ad 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -892,6 +892,11 @@ config INTEL_TDX_GUEST
run in a CPU mode that protects the confidentiality of TD memory
contents and the TD’s CPU state from other software, including VMM.

+# This option enables KVM specific hypercalls in TDX guest.
+config INTEL_TDX_GUEST_KVM
+ def_bool y
+ depends on KVM_GUEST && INTEL_TDX_GUEST
+
endif #HYPERVISOR_GUEST

source "arch/x86/Kconfig.cpu"
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 69299878b200..00cf96de04a0 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -5,6 +5,7 @@
#include <asm/processor.h>
#include <asm/alternative.h>
#include <linux/interrupt.h>
+#include <linux/protected_guest.h>
#include <uapi/asm/kvm_para.h>

#ifdef CONFIG_KVM_GUEST
@@ -32,6 +33,10 @@ static inline bool kvm_check_and_clear_guest_paused(void)
static inline long kvm_hypercall0(unsigned int nr)
{
long ret;
+
+ if (prot_guest_has(PR_GUEST_TDX))
+ return tdx_kvm_hypercall(nr, 0, 0, 0, 0);
+
asm volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr)
@@ -42,6 +47,10 @@ static inline long kvm_hypercall0(unsigned int nr)
static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
{
long ret;
+
+ if (prot_guest_has(PR_GUEST_TDX))
+ return tdx_kvm_hypercall(nr, p1, 0, 0, 0);
+
asm volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr), "b"(p1)
@@ -53,6 +62,10 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
unsigned long p2)
{
long ret;
+
+ if (prot_guest_has(PR_GUEST_TDX))
+ return tdx_kvm_hypercall(nr, p1, p2, 0, 0);
+
asm volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr), "b"(p1), "c"(p2)
@@ -64,6 +77,10 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
unsigned long p2, unsigned long p3)
{
long ret;
+
+ if (prot_guest_has(PR_GUEST_TDX))
+ return tdx_kvm_hypercall(nr, p1, p2, p3, 0);
+
asm volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr), "b"(p1), "c"(p2), "d"(p3)
@@ -76,6 +93,10 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
unsigned long p4)
{
long ret;
+
+ if (prot_guest_has(PR_GUEST_TDX))
+ return tdx_kvm_hypercall(nr, p1, p2, p3, p4);
+
asm volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr), "b"(p1), "c"(p2), "d"(p3), "S"(p4)
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 504291a57d48..7076f9c6dbd3 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -78,4 +78,23 @@ static inline bool tdx_protected_guest_has(unsigned long flag) { return false; }

#endif /* CONFIG_INTEL_TDX_GUEST */

+#ifdef CONFIG_INTEL_TDX_GUEST_KVM
+u64 __tdx_hypercall_vendor_kvm(u64 fn, u64 r12, u64 r13, u64 r14,
+ u64 r15, struct tdx_hypercall_output *out);
+
+static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
+ unsigned long p2, unsigned long p3,
+ unsigned long p4)
+{
+ return __tdx_hypercall_vendor_kvm(nr, p1, p2, p3, p4, NULL);
+}
+#else
+static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
+ unsigned long p2, unsigned long p3,
+ unsigned long p4)
+{
+ return -ENODEV;
+}
+#endif /* CONFIG_INTEL_TDX_GUEST_KVM */
+
#endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kernel/tdcall.S b/arch/x86/kernel/tdcall.S
index d95af4486155..289d7fab5b4a 100644
--- a/arch/x86/kernel/tdcall.S
+++ b/arch/x86/kernel/tdcall.S
@@ -3,6 +3,7 @@
#include <asm/asm.h>
#include <asm/frame.h>
#include <asm/unwind_hints.h>
+#include <asm/export.h>

#include <linux/linkage.h>
#include <linux/bits.h>
@@ -25,6 +26,8 @@
TDG_R12 | TDG_R13 | \
TDG_R14 | TDG_R15 )

+#define TDVMCALL_VENDOR_KVM 0x4d564b2e584454 /* "TDX.KVM" */
+
/*
* TDX guests use the TDCALL instruction to make requests to the
* TDX module and hypercalls to the VMM. It is supported in
@@ -226,3 +229,26 @@ SYM_FUNC_START(__tdx_hypercall)
FRAME_END
retq
SYM_FUNC_END(__tdx_hypercall)
+
+#ifdef CONFIG_INTEL_TDX_GUEST_KVM
+
+/*
+ * Helper function for KVM vendor TDVMCALLs. This assembly wrapper
+ * lets us reuse do_tdvmcall() for KVM-specific hypercalls (
+ * TDVMCALL_VENDOR_KVM).
+ */
+SYM_FUNC_START(__tdx_hypercall_vendor_kvm)
+ FRAME_BEGIN
+ /*
+ * R10 is not part of the function call ABI, but it is a part
+ * of the TDVMCALL ABI. So set it before making call to the
+ * do_tdx_hypercall().
+ */
+ movq $TDVMCALL_VENDOR_KVM, %r10
+ call do_tdx_hypercall
+ FRAME_END
+ retq
+SYM_FUNC_END(__tdx_hypercall_vendor_kvm)
+
+EXPORT_SYMBOL(__tdx_hypercall_vendor_kvm);
+#endif /* CONFIG_INTEL_TDX_GUEST_KVM */
--
2.25.1

2021-06-14 08:50:39

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v1 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions

On Tue, Jun 01, 2021 at 07:21:30PM -0700, Kuppuswamy Sathyanarayanan wrote:
> +/*
> + * __tdx_module_call() - Helper function used by TDX guests to request
> + * services from the TDX module (does not include VMM services).
> + *
> + * This function serves as a wrapper to move user call arguments to the
> + * correct registers as specified by "tdcall" ABI

Please state here explicitly what the TDCALL ABI is. I see below "moved
to" which translates the x86_64 ABI into your ABI but please state it
here explicitly what it is (which register is what in a tabellary form)
so that it is crystal clear and the code can be followed easily.

> and shares it with the
> + * TDX module. If the "tdcall" operation is successful and a valid

Use TDCALL everywhere here in the comments to refer to the
insn/operation.

> + * "struct tdx_module_output" pointer is available (in "out" argument),
> + * output from the TDX module is saved to the memory specified in the
> + * "out" pointer. Also the status of the "tdcall" operation is returned
> + * back to the user as a function return value.
> + *
> + * @fn (RDI) - TDCALL Leaf ID, moved to RAX
> + * @rcx (RSI) - Input parameter 1, moved to RCX
> + * @rdx (RDX) - Input parameter 2, moved to RDX
> + * @r8 (RCX) - Input parameter 3, moved to R8
> + * @r9 (R8) - Input parameter 4, moved to R9
> + *
> + * @out (R9) - struct tdx_module_output pointer
> + * stored temporarily in R12 (not
> + * shared with the TDX module)
> + *
> + * Return status of tdcall via RAX.
> + *
> + * NOTE: This function should not be used for TDX hypercall
> + * use cases.

What does that mean? I think you wanna say here that this function calls
the *module* and not the hypervisor...

> + */
> +SYM_FUNC_START(__tdx_module_call)
> + FRAME_BEGIN
> +
> + /*
> + * R12 will be used as temporary storage for
> + * struct tdx_module_output pointer. You can
> + * find struct tdx_module_output details in
> + * arch/x86/include/asm/tdx.h. Also note that
> + * registers R12-R15 are not used by TDCALL
> + * services supported by this helper function.
> + */
> + push %r12 /* Callee saved, so preserve it */
> + mov %r9, %r12 /* Move output pointer to R12 */

Please make all those side comments, top comments by moving them over
the line they refer to.

> + /* Mangle function call ABI into TDCALL ABI: */
> + mov %rdi, %rax /* Move TDCALL Leaf ID to RAX */
> + mov %r8, %r9 /* Move input 4 to R9 */
> + mov %rcx, %r8 /* Move input 3 to R8 */
> + mov %rsi, %rcx /* Move input 1 to RCX */
> + /* Leave input param 2 in RDX */
> +
> + tdcall
> +
> + /* Check for TDCALL success: 0 - Successful, otherwise failed */
> + test %rax, %rax
> + jnz 1f
> +
> + /* Check if caller provided an output struct */
> + test %r12, %r12
> + jz 1f

Why is that check done *after* the TDCALL and not before?

You can have TDCALL leaf functions without output?

If so, just say so.

> + /* Copy TDCALL result registers to output struct: */
> + movq %rcx, TDX_MODULE_rcx(%r12)
> + movq %rdx, TDX_MODULE_rdx(%r12)
> + movq %r8, TDX_MODULE_r8(%r12)
> + movq %r9, TDX_MODULE_r9(%r12)
> + movq %r10, TDX_MODULE_r10(%r12)
> + movq %r11, TDX_MODULE_r11(%r12)
> +1:
> + pop %r12 /* Restore the state of R12 register */
> +
> + FRAME_END
> + ret
> +SYM_FUNC_END(__tdx_module_call)
> +
> +/*
> + * do_tdx_hypercall() - Helper function used by TDX guests to request
> + * services from the VMM. All requests are made via the TDX module
> + * using "TDCALL" instruction.
> + *
> + * This function is created to contain common code between vendor
> + * specific and standard type TDX hypercalls. So the caller of this
> + * function had to set the TDVMCALL type in the R10 register before
> + * calling it.
> + *
> + * This function serves as a wrapper to move user call arguments to the
> + * correct registers as specified by "tdcall" ABI and shares it with VMM

As above - document the ABI explicitly here pls.

> + * via the TDX module. If the "tdcall" operation is successful and a
> + * valid "struct tdx_hypercall_output" pointer is available (in "out"
> + * argument), output from the VMM is saved to the memory specified in the
> + * "out" pointer. 
> + *
> + * @fn (RDI) - TDVMCALL function, moved to R11
> + * @r12 (RSI) - Input parameter 1, moved to R12
> + * @r13 (RDX) - Input parameter 2, moved to R13
> + * @r14 (RCX) - Input parameter 3, moved to R14
> + * @r15 (R8) - Input parameter 4, moved to R15
> + *
> + * @out (R9) - struct tdx_hypercall_output pointer
> + *
> + * On successful completion, return TDX hypercall error code.
> + *

^ Superfluous line.

> + */
> +SYM_FUNC_START_LOCAL(do_tdx_hypercall)
> + /* Save non-volatile GPRs that are exposed to the VMM. */

"Save callee-s ved GPRs as mandated by the x86_64 ABI"

because you're callee and you have to save them. :)

> + push %r15
> + push %r14
> + push %r13
> + push %r12
> +
> + /* Leave hypercall output pointer in R9, it's not clobbered by VMM */

Guaranteed by what, exactly?

I'm sure you have enough stack to push another u64 value and then
restore it after the TDCALL so that you don't have to care what the VMM
does wrt R9.

> +
> + /* Mangle function call ABI into TDCALL ABI: */
> + xor %eax, %eax /* Move TDCALL leaf ID (TDVMCALL (0)) to RAX */

If leaf function 0 is calling the HV, then says so instead of writing
"Move" but having an XOR there.

> + mov %rdi, %r11 /* Move TDVMCALL function id to R11 */

If I'm reading that pdf correctly, it says:

"R11 TDG.VP.VMCALL sub-function if R10 is 0 (see enumeration below)"

> + mov %rsi, %r12 /* Move input 1 to R12 */
> + mov %rdx, %r13 /* Move input 2 to R13 */
> + mov %rcx, %r14 /* Move input 1 to R14 */
> + mov %r8, %r15 /* Move input 1 to R15 */
> + /* Caller of do_tdx_hypercall() will set TDVMCALL type in R10 */

Ah, there it is. Yuck.

How about passing that vendor-specific leaf set on the stack like all
the other sane ABIs do when they need more than 6 input params passed
through regs?

I don't like a caller function prepping registers for the callee.

> +
> + movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
> +
> + tdcall
> +
> + /*
> + * Non-zero RAX values indicate a failure of TDCALL itself.
> + * Panic for those. This value is unrelated to the hypercall
> + * result in R10.
> + */
> + test %rax, %rax
> + jnz 2f
> +
> + /* Move hypercall error code to RAX to return to user */
> + mov %r10, %rax
> +
> + /* Check for hypercall success: 0 - Successful, otherwise failed */
> + test %rax, %rax
> + jnz 1f
> +
> + /* Check if caller provided an output struct */

Same as for __tdx_module_call

> + test %r9, %r9
> + jz 1f
> +
> + /* Copy hypercall result registers to output struct: */
> + movq %r11, TDX_HYPERCALL_r11(%r9)
> + movq %r12, TDX_HYPERCALL_r12(%r9)
> + movq %r13, TDX_HYPERCALL_r13(%r9)
> + movq %r14, TDX_HYPERCALL_r14(%r9)
> + movq %r15, TDX_HYPERCALL_r15(%r9)
> +1:
> + /*
> + * Zero out registers exposed to the VMM to avoid

Why if you...

> + * speculative execution with VMM-controlled values.
> + * This needs to include all registers present in
> + * TDVMCALL_EXPOSE_REGS_MASK.
> + */
> + xor %r10d, %r10d
> + xor %r11d, %r11d
> + xor %r12d, %r12d
> + xor %r13d, %r13d
> + xor %r14d, %r14d
> + xor %r15d, %r15d
> +
> + /* Restore non-volatile GPRs that are exposed to the VMM. */
> + pop %r12
> + pop %r13
> + pop %r14
> + pop %r15

... are going to overwrite most of them here?

I.e., you can clear only R10 and R11 and the rest will be overwritten by
the callee-saved values.

> +
> + ret
> +2:
> + /*
> + * Reaching here means failure in TDCALL execution. This is
> + * not supposed to happen in hypercalls. It means the TDX
> + * module is in buggy state. So panic.
> + */
> + ud2

How is the user going to know that the module has a bug? Are we issuing
an error message somewhere before that panic or the guest screen will
remain black/freeze and the poor luser won't have a clue what happened?

> +SYM_FUNC_END(do_tdx_hypercall)
> +
> +/*
> + * Helper function for standard type of TDVMCALLs. This assembly
> + * wrapper reuses do_tdvmcall() for standard type of hypercalls
> + * (R10 is set as zero).
> + */
> +SYM_FUNC_START(__tdx_hypercall)
> + FRAME_BEGIN
> + /*
> + * R10 is not part of the function call ABI, but it is a part
> + * of the TDVMCALL ABI. So set it 0 for standard type TDVMCALL
> + * before making call to the do_tdx_hypercall().
> + */
> + xor %r10, %r10
> + call do_tdx_hypercall
> + FRAME_END
> + retq
> +SYM_FUNC_END(__tdx_hypercall)
> diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
> index 5e70617e9877..97b54317f799 100644
> --- a/arch/x86/kernel/tdx.c
> +++ b/arch/x86/kernel/tdx.c
> @@ -1,8 +1,46 @@
> // SPDX-License-Identifier: GPL-2.0
> /* Copyright (C) 2020 Intel Corporation */
>
> +#define pr_fmt(fmt) "TDX: " fmt
> +
> #include <asm/tdx.h>
>
> +/*
> + * Wrapper for simple hypercalls that only return a success/error code.
> + */
> +static inline u64 tdx_hypercall(u64 fn, u64 r12, u64 r13, u64 r14, u64 r15)
> +{
> + u64 err;
> +
> + err = __tdx_hypercall(fn, r12, r13, r14, r15, NULL);
> +


^ Superfluous newline. Ditto below.

> + if (err)
> + pr_warn_ratelimited("TDVMCALL fn:%llx failed with err:%llx\n",

Prefix those hex values with "0x" so that it is clear what the number
format is. Ditto below.

> + fn, err);

You can leave those to stick out and not break the line. Ditto below.

> +
> + return err;
> +}
> +
> +/*
> + * Wrapper for the semi-common case where user need single output
> + * value (R11). Callers of this function does not care about the
> + * hypercall error code (mainly for IN or MMIO usecase).
> + */
> +static inline u64 tdx_hypercall_out_r11(u64 fn, u64 r12, u64 r13,

No need to hardcode the register which has the retval in the function
name - just call it tdx_hypercall_out() or so.

> + u64 r14, u64 r15)
> +{
> + struct tdx_hypercall_output out = {0};
> + u64 err;
> +
> + err = __tdx_hypercall(fn, r12, r13, r14, r15, &out);
> +
> + if (err)
> + pr_warn_ratelimited("TDVMCALL fn:%llx failed with err:%llx\n",
> + fn, err);
> +
> + return out.r11;
> +}
> +
> static inline bool cpuid_has_tdx_guest(void)
> {
> u32 eax, signature[3];
> --
> 2.25.1
>

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [PATCH v1 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions



On 6/14/21 1:47 AM, Borislav Petkov wrote:
> On Tue, Jun 01, 2021 at 07:21:30PM -0700, Kuppuswamy Sathyanarayanan wrote:
>> +/*
>> + * __tdx_module_call() - Helper function used by TDX guests to request
>> + * services from the TDX module (does not include VMM services).
>> + *
>> + * This function serves as a wrapper to move user call arguments to the
>> + * correct registers as specified by "tdcall" ABI
>
> Please state here explicitly what the TDCALL ABI is. I see below "moved
> to" which translates the x86_64 ABI into your ABI but please state it
> here explicitly what it is (which register is what in a tabellary form)
> so that it is crystal clear and the code can be followed easily.

Ok. I will include the TDCALL ABI details here.

>
>> and shares it with the
>> + * TDX module. If the "tdcall" operation is successful and a valid
>
> Use TDCALL everywhere here in the comments to refer to the
> insn/operation.

Ok. I will fix it in next version.

>
>> + * "struct tdx_module_output" pointer is available (in "out" argument),
>> + * output from the TDX module is saved to the memory specified in the
>> + * "out" pointer. Also the status of the "tdcall" operation is returned
>> + * back to the user as a function return value.
>> + *
>> + * @fn (RDI) - TDCALL Leaf ID, moved to RAX
>> + * @rcx (RSI) - Input parameter 1, moved to RCX
>> + * @rdx (RDX) - Input parameter 2, moved to RDX
>> + * @r8 (RCX) - Input parameter 3, moved to R8
>> + * @r9 (R8) - Input parameter 4, moved to R9
>> + *
>> + * @out (R9) - struct tdx_module_output pointer
>> + * stored temporarily in R12 (not
>> + * shared with the TDX module)
>> + *
>> + * Return status of tdcall via RAX.
>> + *
>> + * NOTE: This function should not be used for TDX hypercall
>> + * use cases.
>
> What does that mean? I think you wanna say here that this function calls
> the *module* and not the hypervisor...

Yes, you are right. But I can remove that comment. It does not add much
value.


>> + */
>> + push %r12 /* Callee saved, so preserve it */
>> + mov %r9, %r12 /* Move output pointer to R12 */
>
> Please make all those side comments, top comments by moving them over
> the line they refer to.

Ok. I will move it up.

>

>> +
>> + /* Check if caller provided an output struct */
>> + test %r12, %r12
>> + jz 1f
>
> Why is that check done *after* the TDCALL and not before?
>
> You can have TDCALL leaf functions without output?

Yes. It is possible to call tdx_module_call() without output
pointer.

Examples are TDREPORT and TDACCEPTPAGE.

>
> If so, just say so.

I will include comment about it.


>> + * do_tdx_hypercall() - Helper function used by TDX guests to request
>> + * services from the VMM. All requests are made via the TDX module
>> + * using "TDCALL" instruction.
>> + *
>> + * This function is created to contain common code between vendor
>> + * specific and standard type TDX hypercalls. So the caller of this
>> + * function had to set the TDVMCALL type in the R10 register before
>> + * calling it.
>> + *
>> + * This function serves as a wrapper to move user call arguments to the
>> + * correct registers as specified by "tdcall" ABI and shares it with VMM
>
> As above - document the ABI explicitly here pls.

Ok. I will add the ABI details in next version.

>

>
> ^ Superfluous line.

will remove it.

>
>> + */
>> +SYM_FUNC_START_LOCAL(do_tdx_hypercall)
>> + /* Save non-volatile GPRs that are exposed to the VMM. */
>
> "Save callee-s ved GPRs as mandated by the x86_64 ABI"
>
> because you're callee and you have to save them. :)
>
>> + push %r15
>> + push %r14
>> + push %r13
>> + push %r12
>> +
>> + /* Leave hypercall output pointer in R9, it's not clobbered by VMM */
>
> Guaranteed by what, exactly?
>
> I'm sure you have enough stack to push another u64 value and then
> restore it after the TDCALL so that you don't have to care what the VMM
> does wrt R9.

Since we don't mark R9 as shared in RCX register, we don't expect VMM to
use it. But I am not sure whether TDX module will guarantee it. So, for our
use case, I can use stack for it.

>
>> +
>> + /* Mangle function call ABI into TDCALL ABI: */
>> + xor %eax, %eax /* Move TDCALL leaf ID (TDVMCALL (0)) to RAX */
>
> If leaf function 0 is calling the HV, then says so instead of writing
> "Move" but having an XOR there.

May be I should define a macro for it and use Mov to keep it uniform
with other register updates.

>
>> + mov %rdi, %r11 /* Move TDVMCALL function id to R11 */
>
> If I'm reading that pdf correctly, it says:
>
> "R11 TDG.VP.VMCALL sub-function if R10 is 0 (see enumeration below)"

Yes, it is the sub function id. I will fix the comment in next version.

>
>> + mov %rsi, %r12 /* Move input 1 to R12 */
>> + mov %rdx, %r13 /* Move input 2 to R13 */
>> + mov %rcx, %r14 /* Move input 1 to R14 */
>> + mov %r8, %r15 /* Move input 1 to R15 */
>> + /* Caller of do_tdx_hypercall() will set TDVMCALL type in R10 */
>
> Ah, there it is. Yuck.
>
> How about passing that vendor-specific leaf set on the stack like all
> the other sane ABIs do when they need more than 6 input params passed
> through regs?
>
> I don't like a caller function prepping registers for the callee.

do_tdx_hypercall() is defined and used only in this assembly file.
It is the helper function for __tdx_hypercall() and
__tdx_hypercall_vendor_kvm() functions which needs different values in
R10 register.

But, I am fine with passing it via stack, if this is recommended.

Please let me know.

>
>> +
>> + movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>> +
>> + tdcall
>> +
>> + /*
>> + * Non-zero RAX values indicate a failure of TDCALL itself.
>> + * Panic for those. This value is unrelated to the hypercall
>> + * result in R10.
>> + */
>> + test %rax, %rax
>> + jnz 2f
>> +
>> + /* Move hypercall error code to RAX to return to user */
>> + mov %r10, %rax
>> +
>> + /* Check for hypercall success: 0 - Successful, otherwise failed */
>> + test %rax, %rax
>> + jnz 1f
>> +
>> + /* Check if caller provided an output struct */
>
> Same as for __tdx_module_call

It is possible to call it without output pointer. I will include comments
about it.

>
>> + test %r9, %r9
>> + jz 1f
>> +
>> + /* Copy hypercall result registers to output struct: */
>> + movq %r11, TDX_HYPERCALL_r11(%r9)
>> + movq %r12, TDX_HYPERCALL_r12(%r9)
>> + movq %r13, TDX_HYPERCALL_r13(%r9)
>> + movq %r14, TDX_HYPERCALL_r14(%r9)
>> + movq %r15, TDX_HYPERCALL_r15(%r9)
>> +1:
>> + /*
>> + * Zero out registers exposed to the VMM to avoid
>
> Why if you...
>
>> + * speculative execution with VMM-controlled values.
>> + * This needs to include all registers present in
>> + * TDVMCALL_EXPOSE_REGS_MASK.
>> + */
>> + xor %r10d, %r10d
>> + xor %r11d, %r11d
>> + xor %r12d, %r12d
>> + xor %r13d, %r13d
>> + xor %r14d, %r14d
>> + xor %r15d, %r15d
>> +
>> + /* Restore non-volatile GPRs that are exposed to the VMM. */
>> + pop %r12
>> + pop %r13
>> + pop %r14
>> + pop %r15
>
> ... are going to overwrite most of them here?
>
> I.e., you can clear only R10 and R11 and the rest will be overwritten by
> the callee-saved values.

Yes. You are correct. I can clear only R10-R11.

>
>> +
>> + ret
>> +2:
>> + /*
>> + * Reaching here means failure in TDCALL execution. This is
>> + * not supposed to happen in hypercalls. It means the TDX
>> + * module is in buggy state. So panic.
>> + */
>> + ud2
>
> How is the user going to know that the module has a bug? Are we issuing
> an error message somewhere before that panic or the guest screen will
> remain black/freeze and the poor luser won't have a clue what happened?

With the trace support, they should be able to see the flow before making
the tdx_*_call(). That should be enough clue for debug right?


>
>> + if (err)
>> + pr_warn_ratelimited("TDVMCALL fn:%llx failed with err:%llx\n",
>
> Prefix those hex values with "0x" so that it is clear what the number
> format is. Ditto below.
>
>> + fn, err);
>
> You can leave those to stick out and not break the line. Ditto below.

Ok. I will follow your recommendation. I have done it this way to fix
checkpatch reports.

>
>> +
>> + return err;
>> +}
>> +
>> +/*
>> + * Wrapper for the semi-common case where user need single output
>> + * value (R11). Callers of this function does not care about the
>> + * hypercall error code (mainly for IN or MMIO usecase).
>> + */
>> +static inline u64 tdx_hypercall_out_r11(u64 fn, u64 r12, u64 r13,
>
> No need to hardcode the register which has the retval in the function
> name - just call it tdx_hypercall_out() or so.

If we need helper functions for other output registers in future, we might
have to add the suffix.



>> --
>> 2.25.1
>>
>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2021-06-14 20:15:55

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v1 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions

On Mon, Jun 14, 2021 at 12:45:45PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> May be I should define a macro for it and use Mov to keep it uniform
> with other register updates.

Macro?

There's the, well, *MOV* instruction, if you insist on keeping it
uniform. But this is not about keeping it uniform - it is about having
the code as clear as understandable as possible:


/* Set RAX to TDCALL leaf function 0 */
xor %eax, %eax

Plain and simple and clear why the XORing is done.

> But, I am fine with passing it via stack, if this is recommended.
>
> Please let me know.

Yes, please do.

> With the trace support, they should be able to see the flow before making
> the tdx_*_call(). That should be enough clue for debug right?

Are you expecting all those cloud users to trace their guests just to
figure that out? I'm sceptical they will...

Rather, I'd try to allocate a special error value that
do_tdx_hypercall() returns in %eax and then have the wrapper which will
puts %r10 on the stack, check that error value and panic with a nice
error message.

> Ok. I will follow your recommendation. I have done it this way to fix
> checkpatch reports.

Yeah but you should not take checkpatch reports to the letter and use
common sense instead.

> If we need helper functions for other output registers in future, we might
> have to add the suffix.

Btw, where is that function used? Gurgling, it shows it in some MMIO
patch, I'm guessing that's still coming.

As to how to do it properly, you pass in

struct tdx_hypercall_output *out

as a function parameter and caller can pick out whatever it wants from
that struct.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [PATCH v1 05/11] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions



On 6/14/21 1:11 PM, Borislav Petkov wrote:
> On Mon, Jun 14, 2021 at 12:45:45PM -0700, Kuppuswamy, Sathyanarayanan wrote:
>> May be I should define a macro for it and use Mov to keep it uniform
>> with other register updates.
>
> Macro?
>
> There's the, well, *MOV* instruction, if you insist on keeping it
> uniform. But this is not about keeping it uniform - it is about having
> the code as clear as understandable as possible:
>
>
> /* Set RAX to TDCALL leaf function 0 */
> xor %eax, %eax
>
> Plain and simple and clear why the XORing is done.

Ok. I will fix the comment.


>> With the trace support, they should be able to see the flow before making
>> the tdx_*_call(). That should be enough clue for debug right?
>
> Are you expecting all those cloud users to trace their guests just to
> figure that out? I'm sceptical they will...
>
> Rather, I'd try to allocate a special error value that
> do_tdx_hypercall() returns in %eax and then have the wrapper which will
> puts %r10 on the stack, check that error value and panic with a nice
> error message.
>

I will add r10 to struct tdx_hypercall_output and return it to callers to
check it.


>
> Btw, where is that function used? Gurgling, it shows it in some MMIO
> patch, I'm guessing that's still coming.
>
> As to how to do it properly, you pass in
>
> struct tdx_hypercall_output *out
>
> as a function parameter and caller can pick out whatever it wants from
> that struct.

It looks like it is used only in MMIO use case now. I think we don't need
it anymore. I will remove it.

>
> Thx.
>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2021-06-16 09:54:09

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 03/12] x86/cpufeatures: Add TDX Guest CPU feature

On Sat, Jun 12, 2021 at 02:02:19PM -0700, Kuppuswamy Sathyanarayanan wrote:
> +void __init tdx_early_init(void)
> +{
> + if (!cpuid_has_tdx_guest())
> + return;
> +
> + setup_force_cpu_cap(X86_FEATURE_TDX_GUEST);
> +
> + pr_info("Guest is initialized\n");

As I had typed, without the "is":

pr_info("Guest initialized\n");

We're trying to keep dmesg style from becoming prose. :)

Rest looks ok.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [PATCH v2 03/12] x86/cpufeatures: Add TDX Guest CPU feature



On 6/16/21 2:52 AM, Borislav Petkov wrote:
> As I had typed, without the "is":
>
> pr_info("Guest initialized\n");
>
> We're trying to keep dmesg style from becoming prose.:)

Thanks for clarification. I will fix it.

>
> Rest looks ok.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2021-06-17 19:22:58

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 04/12] x86/x86: Add early_is_tdx_guest() interface

On Sat, Jun 12, 2021 at 02:04:45PM -0700, Kuppuswamy Sathyanarayanan wrote:

> Subject: Re: [PATCH v2 04/12] x86/x86: Add early_is_tdx_guest() interface

Subject prefix should be "x86/tdx:" ofc.

> diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c
> new file mode 100644
> index 000000000000..ddfa4a6d1939
> --- /dev/null
> +++ b/arch/x86/boot/compressed/tdx.c
> @@ -0,0 +1,28 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * tdx.c - Early boot code for TDX
> + */
> +
> +#include <asm/tdx.h>

Please no kernel proper includes in the decompressor stage - that thing
is an include hell mess and needs cleaning up. Use cpuid_count() from
arch/x86/boot/cpuflags.c by exporting it properly and add the other
defines here instead of using kernel proper facilities.

I know, I know, there is other misuse but it has to stop.

> +static int __ro_after_init tdx_guest = -1;
> +
> +static inline bool native_cpuid_has_tdx_guest(void)

Why is this function prefixed with "native_"?

> +{
> + u32 eax = TDX_CPUID_LEAF_ID, sig[3] = {0};
> +
> + if (native_cpuid_eax(0) < TDX_CPUID_LEAF_ID)
> + return false;
> +
> + native_cpuid(&eax, &sig[0], &sig[1], &sig[2]);
> +
> + return !memcmp("IntelTDX ", sig, 12);
> +}
> +
> +bool early_is_tdx_guest(void)

So I guess this is going to be used somewhere, because I don't see it.
Or is it going away in favor of prot_guest_has(PR_GUEST_TDX) ?

This is the problem with sending new versions of patches as a reply to
the old ones in the patchset: I get confused. Why don't you wait until
the whole thing has been reviewed and then send a new revision which has
all the changes and I can find my way in the context?

And with the amount of changes so far, you should probably send a new
revision of the initial support set now-ish instead of me reviewing this
one 'til the end.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [PATCH v2 04/12] x86/x86: Add early_is_tdx_guest() interface



On 6/17/21 10:05 AM, Borislav Petkov wrote:
> On Sat, Jun 12, 2021 at 02:04:45PM -0700, Kuppuswamy Sathyanarayanan wrote:
>
>> Subject: Re: [PATCH v2 04/12] x86/x86: Add early_is_tdx_guest() interface
>
> Subject prefix should be "x86/tdx:" ofc.

I will fix this in next version.

>
>> diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c
>> new file mode 100644
>> index 000000000000..ddfa4a6d1939
>> --- /dev/null
>> +++ b/arch/x86/boot/compressed/tdx.c
>> @@ -0,0 +1,28 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * tdx.c - Early boot code for TDX
>> + */
>> +
>> +#include <asm/tdx.h>
>
> Please no kernel proper includes in the decompressor stage - that thing
> is an include hell mess and needs cleaning up. Use cpuid_count() from
> arch/x86/boot/cpuflags.c by exporting it properly and add the other
> defines here instead of using kernel proper facilities.
>
> I know, I know, there is other misuse but it has to stop.

I will re-use cpuid_count() from cpuflags.c. But it is missing definition
for cpuid_eax(0). So, I will have to define a new function to handle it.
Hope its alright with you.

>
>> +static int __ro_after_init tdx_guest = -1;
>> +
>> +static inline bool native_cpuid_has_tdx_guest(void)
>
> Why is this function prefixed with "native_"?
>
>> +{
>> + u32 eax = TDX_CPUID_LEAF_ID, sig[3] = {0};
>> +
>> + if (native_cpuid_eax(0) < TDX_CPUID_LEAF_ID)
>> + return false;
>> +
>> + native_cpuid(&eax, &sig[0], &sig[1], &sig[2]);
>> +
>> + return !memcmp("IntelTDX ", sig, 12);
>> +}
>> +
>> +bool early_is_tdx_guest(void)
>
> So I guess this is going to be used somewhere, because I don't see it.
> Or is it going away in favor of prot_guest_has(PR_GUEST_TDX) ?

It is used for handling TDX specific I/O support in decompressor code.

You can find its usage in https://lore.kernel.org/patchwork/patch/1444186/

May be I should move this patch next to that patch for easy reading.

>
> This is the problem with sending new versions of patches as a reply to
> the old ones in the patchset: I get confused. Why don't you wait until
> the whole thing has been reviewed and then send a new revision which has
> all the changes and I can find my way in the context?
>
> And with the amount of changes so far, you should probably send a new
> revision of the initial support set now-ish instead of me reviewing this
> one 'til the end.

I will send the new series with review fixes. Also, since this patch is only
used by I/O support code, I will move this patch to another TDX series which
adds TDX I/O support.

>
> Thx.
>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer