Subject: [PATCH v6 00/10] Add TDX Guest Support (#VE handler support)

Hi All,

Intel's Trust Domain Extensions (TDX) protect guest VMs from malicious
hosts and some physical attacks. This series adds #VE handler support,
for port I/O, MMIO and MWAIT/MONITOR features in TDX guest.

This series is the continuation of the patch series titled "Add TDX Guest
Support (Initial support)" which added initial support for TDX guests. You
can find the patchset in the following link.

[set 1, v7] - https://lore.kernel.org/lkml/20210916183550.15349-1-sathyanarayanan.kuppuswamy@linux.intel.com/

Also please note that this series alone is not necessarily fully
functional.

You can find TDX related documents in the following link.

https://software.intel.com/content/www/br/pt/develop/articles/intel-trust-domain-extensions.html

Changes since v5:
* Rebased on top of v5.15-rc1.
* Rebased on top of Tom Landeckys latest CC support patches.

Changes since v4:
* Renamed tdg_ prefix to tdx_.
* Rest of changelogs are included in patches in-line.

Changes since v3:
* Rebased on top of Tom Lendacky protected guest changes.
* Rest of changelogs are included in patches in-line.

Changes since v2:
* Rebased on top of v5.14-rc1.
* Rest of changelogs are included in patches in-line.

Changes since v1:
* Rebased on top of TDX guest set 1 patches (which had some core API changes).
* Moved "x86/tdx: Add early_is_tdx_guest() interface" patch from set 1 patch
series to this patchset (since it is only used in early I/O support case).
* Rest of changelogs are included in patches in-line.

Andi Kleen (1):
x86/tdx: Handle early IO operations

Kirill A. Shutemov (6):
x86/io: Allow to override inX() and outX() implementation
x86/tdx: Handle port I/O
x86/insn-eval: Introduce insn_get_modrm_reg_ptr()
x86/insn-eval: Introduce insn_decode_mmio()
x86/sev-es: Use insn_decode_mmio() for MMIO implementation
x86/tdx: Handle in-kernel MMIO

Kuppuswamy Sathyanarayanan (3):
x86/tdx: Add early_is_tdx_guest() interface
x86/tdx: Handle port I/O in decompression code
x86/tdx: Handle MWAIT and MONITOR

arch/x86/boot/compressed/Makefile | 2 +
arch/x86/boot/compressed/tdcall.S | 3 +
arch/x86/boot/compressed/tdx.c | 31 +++++
arch/x86/boot/cpuflags.c | 12 +-
arch/x86/boot/cpuflags.h | 2 +
arch/x86/include/asm/insn-eval.h | 13 ++
arch/x86/include/asm/io.h | 24 +++-
arch/x86/include/asm/tdx.h | 64 +++++++++
arch/x86/kernel/cpu/intel.c | 1 +
arch/x86/kernel/head64.c | 3 +
arch/x86/kernel/sev.c | 171 ++++++------------------
arch/x86/kernel/tdx.c | 211 ++++++++++++++++++++++++++++++
arch/x86/lib/insn-eval.c | 102 +++++++++++++++
include/linux/cc_platform.h | 11 ++
14 files changed, 511 insertions(+), 139 deletions(-)
create mode 100644 arch/x86/boot/compressed/tdcall.S
create mode 100644 arch/x86/boot/compressed/tdx.c

--
2.25.1


Subject: [PATCH v6 02/10] x86/tdx: Add early_is_tdx_guest() interface

Add helper function to detect TDX feature support. It will be used
to protect TDX specific code in decompression code (for example to
add TDX specific I/O fixes in decompression code).

Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---

Changes since v5:
* None

Changes since v4:
* None

Changes since v3:
* None

Changes since v2:
* Fixed string order issue in cpuid_count() call.

Changes since v1:
* Modified cpuid_has_tdx_guest() to use cpuid_count() instead of native_cpuid().
* Reused cpuid_count() from cpuflags.c.
* Added a new function cpuid_eax().
* Renamed native_cpuid_has_tdx_guest() as early_cpuid_has_tdx_guest().

arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/tdx.c | 31 +++++++++++++++++++++++++++++++
arch/x86/boot/cpuflags.c | 12 ++++++++++--
arch/x86/boot/cpuflags.h | 2 ++
4 files changed, 44 insertions(+), 2 deletions(-)
create mode 100644 arch/x86/boot/compressed/tdx.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 431bf7f846c3..22a2a6cc2ab4 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -98,6 +98,7 @@ ifdef CONFIG_X86_64
endif

vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
+vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o

vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c
new file mode 100644
index 000000000000..ecb3b42992e0
--- /dev/null
+++ b/arch/x86/boot/compressed/tdx.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * tdx.c - Early boot code for TDX
+ */
+
+#include "../cpuflags.h"
+#include "../string.h"
+
+#define TDX_CPUID_LEAF_ID 0x21
+
+static int tdx_guest = -1;
+
+static inline bool early_cpuid_has_tdx_guest(void)
+{
+ u32 eax = TDX_CPUID_LEAF_ID, sig[3] = {0};
+
+ if (cpuid_eax(0) < TDX_CPUID_LEAF_ID)
+ return false;
+
+ cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, &sig[0], &sig[2], &sig[1]);
+
+ return !memcmp("IntelTDX ", sig, 12);
+}
+
+bool early_is_tdx_guest(void)
+{
+ if (tdx_guest < 0)
+ tdx_guest = early_cpuid_has_tdx_guest();
+
+ return !!tdx_guest;
+}
diff --git a/arch/x86/boot/cpuflags.c b/arch/x86/boot/cpuflags.c
index a0b75f73dc63..102613a092aa 100644
--- a/arch/x86/boot/cpuflags.c
+++ b/arch/x86/boot/cpuflags.c
@@ -71,8 +71,7 @@ int has_eflag(unsigned long mask)
# define EBX_REG "=b"
#endif

-static inline void cpuid_count(u32 id, u32 count,
- u32 *a, u32 *b, u32 *c, u32 *d)
+void cpuid_count(u32 id, u32 count, u32 *a, u32 *b, u32 *c, u32 *d)
{
asm volatile(".ifnc %%ebx,%3 ; movl %%ebx,%3 ; .endif \n\t"
"cpuid \n\t"
@@ -82,6 +81,15 @@ static inline void cpuid_count(u32 id, u32 count,
);
}

+u32 cpuid_eax(u32 id)
+{
+ u32 eax, ebx, ecx, edx;
+
+ cpuid_count(id, 0, &eax, &ebx, &ecx, &edx);
+
+ return eax;
+}
+
#define cpuid(id, a, b, c, d) cpuid_count(id, 0, a, b, c, d)

void get_cpuflags(void)
diff --git a/arch/x86/boot/cpuflags.h b/arch/x86/boot/cpuflags.h
index 2e20814d3ce3..5a72233eb8fd 100644
--- a/arch/x86/boot/cpuflags.h
+++ b/arch/x86/boot/cpuflags.h
@@ -17,5 +17,7 @@ extern u32 cpu_vendor[3];

int has_eflag(unsigned long mask);
void get_cpuflags(void);
+void cpuid_count(u32 id, u32 count, u32 *a, u32 *b, u32 *c, u32 *d);
+u32 cpuid_eax(u32 id);

#endif
--
2.25.1

Subject: [PATCH v6 03/10] x86/tdx: Handle port I/O in decompression code

Add support to replace IN/OUT instructions in decompression code with
TDX IO hypercalls.

TDX cannot do port IO directly. The TDX module triggers a #VE exception
to let the guest kernel to emulate port I/O, by converting them into
TDX hypercalls to call the host.

But for the really early code in the decompressor, #VE cannot be used
because the IDT needed for handling the exception is not set-up, and
some other infrastructure needed by the handler is missing. So to
support port IO in decompressor code, directly replace IN/OUT
instructions with TDX IO hypercalls. This can be easily achieved by
modifying __in/__out macros.

Also, since TDX IO hypercall requires an IO size parameter, modify
__in/__out macros to accept size as input parameter.

Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---

Changes since v5:
* None

Changes since v4:
* Replaced in/out with IN/OUT.
* Fixed typo in commit log and comments.
* Renamed tdg_* prefix with tdx_*

Changes since v3:
* None

Changes since v2:
* None

Changes since v1:
* Modified tdg_{in/out} to adapt to API changes of __tdx_hypercall().
* Removed !CONFIG_INTEL_TDX_GUEST comment in #if else case.

arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/tdcall.S | 3 ++
arch/x86/include/asm/io.h | 9 ++---
arch/x86/include/asm/tdx.h | 60 +++++++++++++++++++++++++++++++
4 files changed, 69 insertions(+), 4 deletions(-)
create mode 100644 arch/x86/boot/compressed/tdcall.S

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 22a2a6cc2ab4..1bfe30ebadbe 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -99,6 +99,7 @@ endif

vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o
+vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdcall.o

vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
diff --git a/arch/x86/boot/compressed/tdcall.S b/arch/x86/boot/compressed/tdcall.S
new file mode 100644
index 000000000000..aafadc136c88
--- /dev/null
+++ b/arch/x86/boot/compressed/tdcall.S
@@ -0,0 +1,3 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include "../../kernel/tdcall.S"
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 3647a96238a9..fa6aa43e5dc3 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -41,6 +41,7 @@
#include <linux/string.h>
#include <linux/compiler.h>
#include <asm/page.h>
+#include <asm/tdx.h>
#include <asm/early_ioremap.h>
#include <asm/pgtable_types.h>

@@ -272,25 +273,25 @@ static inline bool sev_key_active(void) { return false; }
#endif /* CONFIG_AMD_MEM_ENCRYPT */

#ifndef __out
-#define __out(bwl, bw) \
+#define __out(bwl, bw, sz) \
asm volatile("out" #bwl " %" #bw "0, %w1" : : "a"(value), "Nd"(port))
#endif

#ifndef __in
-#define __in(bwl, bw) \
+#define __in(bwl, bw, sz) \
asm volatile("in" #bwl " %w1, %" #bw "0" : "=a"(value) : "Nd"(port))
#endif

#define BUILDIO(bwl, bw, type) \
static inline void out##bwl(unsigned type value, int port) \
{ \
- __out(bwl, bw); \
+ __out(bwl, bw, sizeof(type)); \
} \
\
static inline unsigned type in##bwl(int port) \
{ \
unsigned type value; \
- __in(bwl, bw); \
+ __in(bwl, bw, sizeof(type)); \
return value; \
} \
\
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index d47fec42df11..b35ebb3c54f0 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -5,6 +5,8 @@

#include <linux/cpufeature.h>
#include <linux/types.h>
+#include <vdso/limits.h>
+#include <asm/vmx.h>

#define TDX_CPUID_LEAF_ID 0x21
#define TDX_HYPERCALL_STANDARD 0
@@ -71,6 +73,64 @@ unsigned long tdx_get_ve_info(struct ve_info *ve);
int tdx_handle_virtualization_exception(struct pt_regs *regs,
struct ve_info *ve);

+/*
+ * To support I/O port access in decompressor or early kernel init
+ * code, since #VE exception handler cannot be used, use paravirt
+ * model to implement __in/__out macros which will in turn be used
+ * by in{b,w,l}()/out{b,w,l} I/O helper macros used in kernel. Details
+ * about __in/__out macro usage can be found in arch/x86/include/asm/io.h
+ */
+#ifdef BOOT_COMPRESSED_MISC_H
+
+bool early_is_tdx_guest(void);
+
+/*
+ * Helper function used for making hypercall for "in"
+ * instruction. It will be called from __in IO macro
+ * If IO is failed, it will return all 1s.
+ */
+static inline unsigned int tdx_io_in(int size, int port)
+{
+ struct tdx_hypercall_output out = {0};
+
+ __tdx_hypercall(TDX_HYPERCALL_STANDARD, EXIT_REASON_IO_INSTRUCTION,
+ size, 0, port, 0, &out);
+
+ return out.r10 ? UINT_MAX : out.r11;
+}
+
+/*
+ * Helper function used for making hypercall for "out"
+ * instruction. It will be called from __out IO macro
+ */
+static inline void tdx_io_out(int size, int port, u64 value)
+{
+ struct tdx_hypercall_output out = {0};
+
+ __tdx_hypercall(TDX_HYPERCALL_STANDARD, EXIT_REASON_IO_INSTRUCTION,
+ size, 1, port, value, &out);
+}
+
+#define __out(bwl, bw, sz) \
+do { \
+ if (early_is_tdx_guest()) { \
+ tdx_io_out(sz, port, value); \
+ } else { \
+ asm volatile("out" #bwl " %" #bw "0, %w1" : : \
+ "a"(value), "Nd"(port)); \
+ } \
+} while (0)
+#define __in(bwl, bw, sz) \
+do { \
+ if (early_is_tdx_guest()) { \
+ value = tdx_io_in(sz, port); \
+ } else { \
+ asm volatile("in" #bwl " %w1, %" #bw "0" : \
+ "=a"(value) : "Nd"(port)); \
+ } \
+} while (0)
+#endif
+
#else

static inline void tdx_early_init(void) { };
--
2.25.1

Subject: [PATCH v6 07/10] x86/insn-eval: Introduce insn_decode_mmio()

From: "Kirill A. Shutemov" <[email protected]>

In preparation for sharing MMIO instruction decode between SEV-ES and
TDX, factor out the common decode into a new insn_decode_mmio() helper.

For regular virtual machine, MMIO is handled by the VMM and KVM
emulates instructions that caused MMIO. But, this model doesn't work
for a secure VMs (like SEV or TDX) as VMM doesn't have access to the
guest memory and register state. So, for TDX or SEV VMM needs
assistance in handling MMIO. It induces exception in the guest. Guest
has to decode the instruction and handle it on its own.

The code is based on the current SEV MMIO implementation.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---

Changes since v5:
* None

Changes since v4:
* None

Changes since v3:
* None

Changes since v2:
* None

arch/x86/include/asm/insn-eval.h | 12 +++++
arch/x86/lib/insn-eval.c | 82 ++++++++++++++++++++++++++++++++
2 files changed, 94 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 041f399153b9..4a4ca7e7be66 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -29,4 +29,16 @@ int insn_fetch_from_user_inatomic(struct pt_regs *regs,
bool insn_decode_from_regs(struct insn *insn, struct pt_regs *regs,
unsigned char buf[MAX_INSN_SIZE], int buf_size);

+enum mmio_type {
+ MMIO_DECODE_FAILED,
+ MMIO_WRITE,
+ MMIO_WRITE_IMM,
+ MMIO_READ,
+ MMIO_READ_ZERO_EXTEND,
+ MMIO_READ_SIGN_EXTEND,
+ MMIO_MOVS,
+};
+
+enum mmio_type insn_decode_mmio(struct insn *insn, int *bytes);
+
#endif /* _ASM_X86_INSN_EVAL_H */
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index fbaa3fa24bde..2ab29d8d6731 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1559,3 +1559,85 @@ bool insn_decode_from_regs(struct insn *insn, struct pt_regs *regs,

return true;
}
+
+/**
+ * insn_decode_mmio() - Decode a MMIO instruction
+ * @insn: Structure to store decoded instruction
+ * @bytes: Returns size of memory operand
+ *
+ * Decodes instruction that used for Memory-mapped I/O.
+ *
+ * Returns:
+ *
+ * Type of the instruction. Size of the memory operand is stored in
+ * @bytes. If decode failed, MMIO_DECODE_FAILED returned.
+ */
+enum mmio_type insn_decode_mmio(struct insn *insn, int *bytes)
+{
+ int type = MMIO_DECODE_FAILED;
+
+ *bytes = 0;
+
+ insn_get_opcode(insn);
+ switch (insn->opcode.bytes[0]) {
+ case 0x88: /* MOV m8,r8 */
+ *bytes = 1;
+ fallthrough;
+ case 0x89: /* MOV m16/m32/m64, r16/m32/m64 */
+ if (!*bytes)
+ *bytes = insn->opnd_bytes;
+ type = MMIO_WRITE;
+ break;
+
+ case 0xc6: /* MOV m8, imm8 */
+ *bytes = 1;
+ fallthrough;
+ case 0xc7: /* MOV m16/m32/m64, imm16/imm32/imm64 */
+ if (!*bytes)
+ *bytes = insn->opnd_bytes;
+ type = MMIO_WRITE_IMM;
+ break;
+
+ case 0x8a: /* MOV r8, m8 */
+ *bytes = 1;
+ fallthrough;
+ case 0x8b: /* MOV r16/r32/r64, m16/m32/m64 */
+ if (!*bytes)
+ *bytes = insn->opnd_bytes;
+ type = MMIO_READ;
+ break;
+
+ case 0xa4: /* MOVS m8, m8 */
+ *bytes = 1;
+ fallthrough;
+ case 0xa5: /* MOVS m16/m32/m64, m16/m32/m64 */
+ if (!*bytes)
+ *bytes = insn->opnd_bytes;
+ type = MMIO_MOVS;
+ break;
+
+ case 0x0f: /* Two-byte instruction */
+ switch (insn->opcode.bytes[1]) {
+ case 0xb6: /* MOVZX r16/r32/r64, m8 */
+ *bytes = 1;
+ fallthrough;
+ case 0xb7: /* MOVZX r32/r64, m16 */
+ if (!*bytes)
+ *bytes = 2;
+ type = MMIO_READ_ZERO_EXTEND;
+ break;
+
+ case 0xbe: /* MOVSX r16/r32/r64, m8 */
+ *bytes = 1;
+ fallthrough;
+ case 0xbf: /* MOVSX r32/r64, m16 */
+ if (!*bytes)
+ *bytes = 2;
+ type = MMIO_READ_SIGN_EXTEND;
+ break;
+ }
+ break;
+ }
+
+ return type;
+}
--
2.25.1

Subject: [PATCH v6 08/10] x86/sev-es: Use insn_decode_mmio() for MMIO implementation

From: "Kirill A. Shutemov" <[email protected]>

Switch SEV implementation to insn_decode_mmio(). The helper is going
to be used by TDX too.

No functional changes. It is only build-tested.

Cc: Tom Lendacky <[email protected]>
Cc: Joerg Roedel <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---

Changes since v5:
* None

Changes since v4:
* None

Changes since v3:
* None

Changes since v2:
* None

arch/x86/kernel/sev.c | 171 ++++++++++--------------------------------
1 file changed, 40 insertions(+), 131 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 53a6837d354b..e2f1ae006114 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -807,22 +807,6 @@ static void __init vc_early_forward_exception(struct es_em_ctxt *ctxt)
do_early_exception(ctxt->regs, trapnr);
}

-static long *vc_insn_get_reg(struct es_em_ctxt *ctxt)
-{
- long *reg_array;
- int offset;
-
- reg_array = (long *)ctxt->regs;
- offset = insn_get_modrm_reg_off(&ctxt->insn, ctxt->regs);
-
- if (offset < 0)
- return NULL;
-
- offset /= sizeof(long);
-
- return reg_array + offset;
-}
-
static long *vc_insn_get_rm(struct es_em_ctxt *ctxt)
{
long *reg_array;
@@ -870,76 +854,6 @@ static enum es_result vc_do_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
return sev_es_ghcb_hv_call(ghcb, ctxt, exit_code, exit_info_1, exit_info_2);
}

-static enum es_result vc_handle_mmio_twobyte_ops(struct ghcb *ghcb,
- struct es_em_ctxt *ctxt)
-{
- struct insn *insn = &ctxt->insn;
- unsigned int bytes = 0;
- enum es_result ret;
- int sign_byte;
- long *reg_data;
-
- switch (insn->opcode.bytes[1]) {
- /* MMIO Read w/ zero-extension */
- case 0xb6:
- bytes = 1;
- fallthrough;
- case 0xb7:
- if (!bytes)
- bytes = 2;
-
- ret = vc_do_mmio(ghcb, ctxt, bytes, true);
- if (ret)
- break;
-
- /* Zero extend based on operand size */
- reg_data = vc_insn_get_reg(ctxt);
- if (!reg_data)
- return ES_DECODE_FAILED;
-
- memset(reg_data, 0, insn->opnd_bytes);
-
- memcpy(reg_data, ghcb->shared_buffer, bytes);
- break;
-
- /* MMIO Read w/ sign-extension */
- case 0xbe:
- bytes = 1;
- fallthrough;
- case 0xbf:
- if (!bytes)
- bytes = 2;
-
- ret = vc_do_mmio(ghcb, ctxt, bytes, true);
- if (ret)
- break;
-
- /* Sign extend based on operand size */
- reg_data = vc_insn_get_reg(ctxt);
- if (!reg_data)
- return ES_DECODE_FAILED;
-
- if (bytes == 1) {
- u8 *val = (u8 *)ghcb->shared_buffer;
-
- sign_byte = (*val & 0x80) ? 0xff : 0x00;
- } else {
- u16 *val = (u16 *)ghcb->shared_buffer;
-
- sign_byte = (*val & 0x8000) ? 0xff : 0x00;
- }
- memset(reg_data, sign_byte, insn->opnd_bytes);
-
- memcpy(reg_data, ghcb->shared_buffer, bytes);
- break;
-
- default:
- ret = ES_UNSUPPORTED;
- }
-
- return ret;
-}
-
/*
* The MOVS instruction has two memory operands, which raises the
* problem that it is not known whether the access to the source or the
@@ -1007,83 +921,78 @@ static enum es_result vc_handle_mmio_movs(struct es_em_ctxt *ctxt,
return ES_RETRY;
}

-static enum es_result vc_handle_mmio(struct ghcb *ghcb,
- struct es_em_ctxt *ctxt)
+static enum es_result vc_handle_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
{
struct insn *insn = &ctxt->insn;
unsigned int bytes = 0;
+ enum mmio_type mmio;
enum es_result ret;
+ u8 sign_byte;
long *reg_data;

- switch (insn->opcode.bytes[0]) {
- /* MMIO Write */
- case 0x88:
- bytes = 1;
- fallthrough;
- case 0x89:
- if (!bytes)
- bytes = insn->opnd_bytes;
+ mmio = insn_decode_mmio(insn, &bytes);
+ if (mmio == MMIO_DECODE_FAILED)
+ return ES_DECODE_FAILED;

- reg_data = vc_insn_get_reg(ctxt);
+ if (mmio != MMIO_WRITE_IMM && mmio != MMIO_MOVS) {
+ reg_data = insn_get_modrm_reg_ptr(insn, ctxt->regs);
if (!reg_data)
return ES_DECODE_FAILED;
+ }

+ switch (mmio) {
+ case MMIO_WRITE:
memcpy(ghcb->shared_buffer, reg_data, bytes);
-
ret = vc_do_mmio(ghcb, ctxt, bytes, false);
break;
-
- case 0xc6:
- bytes = 1;
- fallthrough;
- case 0xc7:
- if (!bytes)
- bytes = insn->opnd_bytes;
-
+ case MMIO_WRITE_IMM:
memcpy(ghcb->shared_buffer, insn->immediate1.bytes, bytes);
-
ret = vc_do_mmio(ghcb, ctxt, bytes, false);
break;
-
- /* MMIO Read */
- case 0x8a:
- bytes = 1;
- fallthrough;
- case 0x8b:
- if (!bytes)
- bytes = insn->opnd_bytes;
-
+ case MMIO_READ:
ret = vc_do_mmio(ghcb, ctxt, bytes, true);
if (ret)
break;

- reg_data = vc_insn_get_reg(ctxt);
- if (!reg_data)
- return ES_DECODE_FAILED;
-
/* Zero-extend for 32-bit operation */
if (bytes == 4)
*reg_data = 0;

memcpy(reg_data, ghcb->shared_buffer, bytes);
break;
+ case MMIO_READ_ZERO_EXTEND:
+ ret = vc_do_mmio(ghcb, ctxt, bytes, true);
+ if (ret)
+ break;
+
+ memset(reg_data, 0, insn->opnd_bytes);
+ memcpy(reg_data, ghcb->shared_buffer, bytes);
+ break;
+ case MMIO_READ_SIGN_EXTEND:
+ ret = vc_do_mmio(ghcb, ctxt, bytes, true);
+ if (ret)
+ break;

- /* MOVS instruction */
- case 0xa4:
- bytes = 1;
- fallthrough;
- case 0xa5:
- if (!bytes)
- bytes = insn->opnd_bytes;
+ if (bytes == 1) {
+ u8 *val = (u8 *)ghcb->shared_buffer;

- ret = vc_handle_mmio_movs(ctxt, bytes);
+ sign_byte = (*val & 0x80) ? 0xff : 0x00;
+ } else {
+ u16 *val = (u16 *)ghcb->shared_buffer;
+
+ sign_byte = (*val & 0x8000) ? 0xff : 0x00;
+ }
+
+ /* Sign extend based on operand size */
+ memset(reg_data, sign_byte, insn->opnd_bytes);
+ memcpy(reg_data, ghcb->shared_buffer, bytes);
break;
- /* Two-Byte Opcodes */
- case 0x0f:
- ret = vc_handle_mmio_twobyte_ops(ghcb, ctxt);
+ case MMIO_MOVS:
+ ret = vc_handle_mmio_movs(ctxt, bytes);
break;
default:
ret = ES_UNSUPPORTED;
+ break;
}

return ret;
--
2.25.1

Subject: [PATCH v6 06/10] x86/insn-eval: Introduce insn_get_modrm_reg_ptr()

From: "Kirill A. Shutemov" <[email protected]>

The helper returns a pointer to the register indicated by
ModRM byte.

It's going to replace vc_insn_get_reg() in the SEV MMIO
implementation. TDX MMIO implementation will also use it.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---

Changes since v5:
* None

Changes since v4:
* None

Changes since v3:
* None

Changes since v2:
* None

arch/x86/include/asm/insn-eval.h | 1 +
arch/x86/lib/insn-eval.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 91d7182ad2d6..041f399153b9 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -19,6 +19,7 @@ bool insn_has_rep_prefix(struct insn *insn);
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
int insn_get_modrm_reg_off(struct insn *insn, struct pt_regs *regs);
+void *insn_get_modrm_reg_ptr(struct insn *insn, struct pt_regs *regs);
unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx);
int insn_get_code_seg_params(struct pt_regs *regs);
int insn_fetch_from_user(struct pt_regs *regs,
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index a1d24fdc07cf..fbaa3fa24bde 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -850,6 +850,26 @@ int insn_get_modrm_reg_off(struct insn *insn, struct pt_regs *regs)
return get_reg_offset(insn, regs, REG_TYPE_REG);
}

+/**
+ * insn_get_modrm_reg_ptr() - Obtain register pointer based on ModRM byte
+ * @insn: Instruction containing the ModRM byte
+ * @regs: Register values as seen when entering kernel mode
+ *
+ * Returns:
+ *
+ * The register indicated by the reg part of the ModRM byte.
+ * The register is obtained as a pointer within pt_regs.
+ */
+void *insn_get_modrm_reg_ptr(struct insn *insn, struct pt_regs *regs)
+{
+ int offset;
+
+ offset = insn_get_modrm_reg_off(insn, regs);
+ if (offset < 0)
+ return NULL;
+ return (void *)regs + offset;
+}
+
/**
* get_seg_base_limit() - obtain base address and limit of a segment
* @insn: Instruction. Must be valid.
--
2.25.1

Subject: [PATCH v6 10/10] x86/tdx: Handle MWAIT and MONITOR

When running as a TDX guest, there are a number of existing, privileged
instructions that do not work. If the guest kernel uses these
instructions, the hardware generates a #VE.

List of unsupported instructions can be found in Intel Trust Domain
Extensions (Intel® TDX) Module specification, sec 9.2.2 and in
Guest-Host Communication Interface (GHCI) Specification for Intel TDX,
sec 2.4.1.

To prevent TD guests from using MWAIT/MONITOR instructions, the CPUID
flags for these instructions are already disabled by the TDX module. 
   
After the above mentioned preventive measures, if TD guests still
execute these instructions, add appropriate warning message
(WARN_ONCE()) in #VE handler. This handling behavior is same as KVM
(which also treats MWAIT/MONITOR as nops with warning once in
unsupported platforms).

Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
---

Changes since v5:
* None

Changes since v4:
* Removed usage of We/You in commit log and comments.

Changes since v3:
* None

Changes since v2:
* None

arch/x86/kernel/tdx.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index aa4719c22c2a..a77d8ffae686 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -369,6 +369,14 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
return -EFAULT;
}
break;
+ case EXIT_REASON_MONITOR_INSTRUCTION:
+ case EXIT_REASON_MWAIT_INSTRUCTION:
+ /*
+ * Something in the kernel used MONITOR or MWAIT despite
+ * X86_FEATURE_MWAIT being cleared for TDX guests.
+ */
+ WARN_ONCE(1, "TD Guest used unsupported MWAIT/MONITOR instruction\n");
+ break;
default:
pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
return -EFAULT;
--
2.25.1

Subject: [PATCH v6 04/10] x86/tdx: Handle early IO operations

From: Andi Kleen <[email protected]>

Add an early #VE handler to convert early port IOs into TDCALLs.

TDX cannot do port IO directly. The TDX module triggers a #VE exception
to let the guest kernel to emulate port I/O, by converting them into
TDCALLs to call the host.

To support port IO in early boot code, add a minimal support in early
exception handlers. This is similar to what AMD SEV does. This is
mainly to support early_printk's serial driver, as well as potentially
the VGA driver (although it is expected not to be used).

The early handler only does IO calls and nothing else, and anything
that goes wrong results in a normal early exception panic.

It cannot share the code paths with the normal #VE handler because it
needs to avoid using trace calls or printk.

This early handler allows us to use the normal in*/out* macros without
patching them for every driver. Since, there is no expectation that
early port IO is performance critical, the #VE emulation cost is worth
the simplicity benefit of not patching out port IO usage in early code.
There are also no concerns with nesting, since there should be no NMIs
this early.

Signed-off-by: Andi Kleen <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---

Changes since v5:
* None

Changes since v4:
* Changed order of variable declaration in tdx_early_io()

Changes since v3:
* None

Changes since v2:
* None

Changes since v1:
* Renamed tdx_early_io() to tdg_early_io().

arch/x86/include/asm/tdx.h | 4 +++
arch/x86/kernel/head64.c | 3 ++
arch/x86/kernel/tdx.c | 60 ++++++++++++++++++++++++++++++++++++++
3 files changed, 67 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index b35ebb3c54f0..eedacf8ede3d 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -73,6 +73,8 @@ unsigned long tdx_get_ve_info(struct ve_info *ve);
int tdx_handle_virtualization_exception(struct pt_regs *regs,
struct ve_info *ve);

+bool tdx_early_handle_ve(struct pt_regs *regs);
+
/*
* To support I/O port access in decompressor or early kernel init
* code, since #VE exception handler cannot be used, use paravirt
@@ -135,6 +137,8 @@ do { \

static inline void tdx_early_init(void) { };

+static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return false; }
+
#endif /* CONFIG_INTEL_TDX_GUEST */

#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_INTEL_TDX_GUEST)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 10c2568d7c42..f1e09ecbc1b5 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -410,6 +410,9 @@ void __init do_early_exception(struct pt_regs *regs, int trapnr)
trapnr == X86_TRAP_VC && handle_vc_boot_ghcb(regs))
return;

+ if (trapnr == X86_TRAP_VE && tdx_early_handle_ve(regs))
+ return;
+
early_fixup_exception(regs, trapnr);
}

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 45dbd8863ec0..b3515f789f64 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -10,6 +10,11 @@
/* TDX Module call Leaf IDs */
#define TDGETVEINFO 3

+#define VE_IS_IO_OUT(exit_qual) (((exit_qual) & 8) ? 0 : 1)
+#define VE_GET_IO_SIZE(exit_qual) (((exit_qual) & 7) + 1)
+#define VE_GET_PORT_NUM(exit_qual) ((exit_qual) >> 16)
+#define VE_IS_IO_STRING(exit_qual) ((exit_qual) & 16 ? 1 : 0)
+
/*
* Wrapper for standard use of __tdx_hypercall with BUG_ON() check
* for TDCALL error.
@@ -233,6 +238,61 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
return ret;
}

+/*
+ * Handle early IO, mainly for early printks serial output.
+ * This avoids anything that doesn't work early on, like tracing
+ * or printks, by calling the low level functions directly. Any
+ * problems are handled by falling back to a standard early exception.
+ *
+ * Assumes the IO instruction was using ax, which is enforced
+ * by the standard io.h macros.
+ */
+static __init bool tdx_early_io(struct pt_regs *regs, u32 exit_qual)
+{
+ struct tdx_hypercall_output outh;
+ int out, size, port, ret;
+ bool string;
+ u64 mask;
+
+ string = VE_IS_IO_STRING(exit_qual);
+
+ /* I/O strings ops are unrolled at build time. */
+ if (string)
+ return 0;
+
+ out = VE_IS_IO_OUT(exit_qual);
+ size = VE_GET_IO_SIZE(exit_qual);
+ port = VE_GET_PORT_NUM(exit_qual);
+ mask = GENMASK(8 * size, 0);
+
+ ret = _tdx_hypercall(EXIT_REASON_IO_INSTRUCTION, size, out, port,
+ regs->ax, &outh);
+ if (!out && !ret) {
+ regs->ax &= ~mask;
+ regs->ax |= outh.r11 & mask;
+ }
+
+ return !ret;
+}
+
+/*
+ * Early #VE exception handler. Just used to handle port IOs
+ * for early_printk. If anything goes wrong handle it like
+ * a normal early exception.
+ */
+__init bool tdx_early_handle_ve(struct pt_regs *regs)
+{
+ struct ve_info ve;
+
+ if (tdx_get_ve_info(&ve))
+ return false;
+
+ if (ve.exit_reason == EXIT_REASON_IO_INSTRUCTION)
+ return tdx_early_io(regs, ve.exit_qual);
+
+ return false;
+}
+
void __init tdx_early_init(void)
{
if (!cpuid_has_tdx_guest())
--
2.25.1

Subject: [PATCH v6 09/10] x86/tdx: Handle in-kernel MMIO

From: "Kirill A. Shutemov" <[email protected]>

In traditional VMs, MMIO is usually implemented by giving a guest
access to a mapping which will cause a VMEXIT on access and then the
VMM emulating the access. That's not possible in TDX guest because
VMEXIT will expose the register state to the host. TDX guests don't
trust the host and can't have its state exposed to the host. In TDX
the MMIO regions are instead configured to trigger a #VE exception in
the guest. The guest #VE handler then emulates the MMIO instruction
inside the guest and converts them into a controlled TDCALL to the
host, rather than completely exposing the state to the host.

Currently, TDX only supports MMIO for instructions that are known to
come from io.h macros (build_mmio_read/write()). For drivers that don't
use the io.h macros or uses structure overlay to do MMIO are currently
not supported in TDX guest (for example the MMIO based XAPIC is disable
at runtime for TDX).

This way of handling is similar to AMD SEV.

Also, reasons for supporting #VE based MMIO in TDX guest are,

* MMIO is widely used and more drivers will be added in the future.
* To avoid annotating every TDX specific MMIO readl/writel etc.
* If annotation is not preferred, an alternative way is required
for every MMIO access in the kernel (even though 99.9% will never
be used on TDX) which would be a complete waste and incredible
binary bloat for nothing.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
---

Changes since v5:
* None

Changes since v4:
* Changed order of variable declaration in tdx_handle_mmio().
* Changed tdg_* prefix with tdx_*.

Changes since v3:
* None

Changes since v2:
* None

arch/x86/kernel/tdx.c | 108 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 108 insertions(+)

diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index 31a2c2a9722a..aa4719c22c2a 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -6,6 +6,9 @@

#include <asm/tdx.h>
#include <asm/vmx.h>
+#include <asm/insn.h>
+#include <asm/insn-eval.h>
+#include <linux/sched/signal.h> /* force_sig_fault() */

/* TDX Module call Leaf IDs */
#define TDGETVEINFO 3
@@ -212,6 +215,103 @@ static void tdx_handle_io(struct pt_regs *regs, u32 exit_qual)
}
}

+static unsigned long tdx_mmio(int size, bool write, unsigned long addr,
+ unsigned long *val)
+{
+ struct tdx_hypercall_output out = {0};
+ u64 err;
+
+ err = _tdx_hypercall(EXIT_REASON_EPT_VIOLATION, size, write,
+ addr, *val, &out);
+ *val = out.r11;
+ return err;
+}
+
+static int tdx_handle_mmio(struct pt_regs *regs, struct ve_info *ve)
+{
+ char buffer[MAX_INSN_SIZE];
+ unsigned long *reg, val;
+ struct insn insn = {};
+ enum mmio_type mmio;
+ int size, ret;
+ u8 sign_byte;
+
+ if (user_mode(regs)) {
+ ret = insn_fetch_from_user(regs, buffer);
+ if (!ret)
+ return -EFAULT;
+ if (!insn_decode_from_regs(&insn, regs, buffer, ret))
+ return -EFAULT;
+ } else {
+ ret = copy_from_kernel_nofault(buffer, (void *)regs->ip,
+ MAX_INSN_SIZE);
+ if (ret)
+ return -EFAULT;
+ insn_init(&insn, buffer, MAX_INSN_SIZE, 1);
+ insn_get_length(&insn);
+ }
+
+ mmio = insn_decode_mmio(&insn, &size);
+ if (mmio == MMIO_DECODE_FAILED)
+ return -EFAULT;
+
+ if (mmio != MMIO_WRITE_IMM && mmio != MMIO_MOVS) {
+ reg = insn_get_modrm_reg_ptr(&insn, regs);
+ if (!reg)
+ return -EFAULT;
+ }
+
+ switch (mmio) {
+ case MMIO_WRITE:
+ memcpy(&val, reg, size);
+ ret = tdx_mmio(size, true, ve->gpa, &val);
+ break;
+ case MMIO_WRITE_IMM:
+ val = insn.immediate.value;
+ ret = tdx_mmio(size, true, ve->gpa, &val);
+ break;
+ case MMIO_READ:
+ ret = tdx_mmio(size, false, ve->gpa, &val);
+ if (ret)
+ break;
+ /* Zero-extend for 32-bit operation */
+ if (size == 4)
+ *reg = 0;
+ memcpy(reg, &val, size);
+ break;
+ case MMIO_READ_ZERO_EXTEND:
+ ret = tdx_mmio(size, false, ve->gpa, &val);
+ if (ret)
+ break;
+
+ /* Zero extend based on operand size */
+ memset(reg, 0, insn.opnd_bytes);
+ memcpy(reg, &val, size);
+ break;
+ case MMIO_READ_SIGN_EXTEND:
+ ret = tdx_mmio(size, false, ve->gpa, &val);
+ if (ret)
+ break;
+
+ if (size == 1)
+ sign_byte = (val & 0x80) ? 0xff : 0x00;
+ else
+ sign_byte = (val & 0x8000) ? 0xff : 0x00;
+
+ /* Sign extend based on operand size */
+ memset(reg, sign_byte, insn.opnd_bytes);
+ memcpy(reg, &val, size);
+ break;
+ case MMIO_MOVS:
+ case MMIO_DECODE_FAILED:
+ return -EFAULT;
+ }
+
+ if (ret)
+ return -EFAULT;
+ return insn.length;
+}
+
unsigned long tdx_get_ve_info(struct ve_info *ve)
{
struct tdx_module_output out = {0};
@@ -261,6 +361,14 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
case EXIT_REASON_IO_INSTRUCTION:
tdx_handle_io(regs, ve->exit_qual);
break;
+ case EXIT_REASON_EPT_VIOLATION:
+ /* Currently only MMIO triggers EPT violation */
+ ve->instr_len = tdx_handle_mmio(regs, ve);
+ if (ve->instr_len < 0) {
+ pr_warn_once("MMIO failed\n");
+ return -EFAULT;
+ }
+ break;
default:
pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
return -EFAULT;
--
2.25.1

Subject: [PATCH v6 05/10] x86/tdx: Handle port I/O

From: "Kirill A. Shutemov" <[email protected]>

TDX hypervisors cannot emulate instructions directly. This includes
port IO which is normally emulated in the hypervisor. All port IO
instructions inside TDX trigger the #VE exception in the guest and
would be normally emulated there.

Also string I/O is not supported in TDX guest. So, unroll the string
I/O operation into a loop operating on one element at a time. This
method is similar to AMD SEV, so just extend the support for TDX guest
platform.

Add a new confidential guest flag CC_ATTR_GUEST_UNROLL_STRING_IO to
add string unroll support in asm/io.h

Co-developed-by: Kuppuswamy Sathyanarayanan <[email protected]>
Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
---

Changes since v5:
* Changed prot_guest_has() to cc_platform_has().

Changes since v4:
* Changed order of variable declaration in tdx_handle_io().
* Changed tdg_* prefix with tdx_*.

Changes since v3:
* Included PATTR_GUEST_UNROLL_STRING_IO protected guest flag
addition change in this patch.
* Rebased on top of Tom Lendacks protected guest change.

Changes since v2:
* None

Changes since v1:
* Fixed comments for tdg_handle_io().
* Used _tdx_hypercall() instead of __tdx_hypercall() in tdg_handle_io().

arch/x86/include/asm/io.h | 7 +++++--
arch/x86/kernel/cpu/intel.c | 1 +
arch/x86/kernel/tdx.c | 35 +++++++++++++++++++++++++++++++++++
include/linux/cc_platform.h | 11 +++++++++++
4 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index fa6aa43e5dc3..67e0c4a0a0f4 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -40,6 +40,7 @@

#include <linux/string.h>
#include <linux/compiler.h>
+#include <linux/cc_platform.h>
#include <asm/page.h>
#include <asm/tdx.h>
#include <asm/early_ioremap.h>
@@ -310,7 +311,8 @@ static inline unsigned type in##bwl##_p(int port) \
\
static inline void outs##bwl(int port, const void *addr, unsigned long count) \
{ \
- if (sev_key_active()) { \
+ if (sev_key_active() || \
+ cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) { \
unsigned type *value = (unsigned type *)addr; \
while (count) { \
out##bwl(*value, port); \
@@ -326,7 +328,8 @@ static inline void outs##bwl(int port, const void *addr, unsigned long count) \
\
static inline void ins##bwl(int port, void *addr, unsigned long count) \
{ \
- if (sev_key_active()) { \
+ if (sev_key_active() || \
+ cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) { \
unsigned type *value = (unsigned type *)addr; \
while (count) { \
*value = in##bwl(port); \
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index ab337c194beb..c7c02c2e7601 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -66,6 +66,7 @@ bool intel_cc_platform_has(enum cc_attr attr)
{
switch (attr) {
case CC_ATTR_GUEST_TDX:
+ case CC_ATTR_GUEST_UNROLL_STRING_IO:
return cpu_feature_enabled(X86_FEATURE_TDX_GUEST);
default:
return false;
diff --git a/arch/x86/kernel/tdx.c b/arch/x86/kernel/tdx.c
index b3515f789f64..31a2c2a9722a 100644
--- a/arch/x86/kernel/tdx.c
+++ b/arch/x86/kernel/tdx.c
@@ -180,6 +180,38 @@ static u64 tdx_handle_cpuid(struct pt_regs *regs)
return ret;
}

+/*
+ * tdx_handle_early_io() cannot be re-used in #VE handler for handling
+ * I/O because the way of handling string I/O is different between
+ * normal and early I/O case. Also, once trace support is enabled,
+ * tdx_handle_io() will be extended to use trace calls which is also
+ * not valid for early I/O cases.
+ */
+static void tdx_handle_io(struct pt_regs *regs, u32 exit_qual)
+{
+ struct tdx_hypercall_output outh;
+ int out, size, port, ret;
+ bool string;
+ u64 mask;
+
+ string = VE_IS_IO_STRING(exit_qual);
+
+ /* I/O strings ops are unrolled at build time. */
+ BUG_ON(string);
+
+ out = VE_IS_IO_OUT(exit_qual);
+ size = VE_GET_IO_SIZE(exit_qual);
+ port = VE_GET_PORT_NUM(exit_qual);
+ mask = GENMASK(8 * size, 0);
+
+ ret = _tdx_hypercall(EXIT_REASON_IO_INSTRUCTION, size, out, port,
+ regs->ax, &outh);
+ if (!out) {
+ regs->ax &= ~mask;
+ regs->ax |= (ret ? UINT_MAX : outh.r11) & mask;
+ }
+}
+
unsigned long tdx_get_ve_info(struct ve_info *ve)
{
struct tdx_module_output out = {0};
@@ -226,6 +258,9 @@ int tdx_handle_virtualization_exception(struct pt_regs *regs,
case EXIT_REASON_CPUID:
ret = tdx_handle_cpuid(regs);
break;
+ case EXIT_REASON_IO_INSTRUCTION:
+ tdx_handle_io(regs, ve->exit_qual);
+ break;
default:
pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
return -EFAULT;
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index e38430e6e396..253fd4ca7178 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -70,6 +70,17 @@ enum cc_attr {
* Examples include SEV-ES.
*/
CC_ATTR_GUEST_TDX,
+
+ /**
+ * @CC_ATTR_GUEST_UNROLL_STRING_IO: String I/O is implemented with
+ * IN/OUT instructions
+ *
+ * The platform/OS is running as a guest/virtual machine and uses
+ * IN/OUT instructions in place of string I/O.
+ *
+ * Examples include TDX Guest.
+ */
+ CC_ATTR_GUEST_UNROLL_STRING_IO,
};

#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
--
2.25.1

2021-09-23 16:34:21

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v6 05/10] x86/tdx: Handle port I/O

On 9/22/21 5:52 PM, Kuppuswamy Sathyanarayanan wrote:
> From: "Kirill A. Shutemov" <[email protected]>
>
> TDX hypervisors cannot emulate instructions directly. This includes
> port IO which is normally emulated in the hypervisor. All port IO
> instructions inside TDX trigger the #VE exception in the guest and
> would be normally emulated there.
>
> Also string I/O is not supported in TDX guest. So, unroll the string
> I/O operation into a loop operating on one element at a time. This
> method is similar to AMD SEV, so just extend the support for TDX guest
> platform.
>
> Add a new confidential guest flag CC_ATTR_GUEST_UNROLL_STRING_IO to
> add string unroll support in asm/io.h
>
> Co-developed-by: Kuppuswamy Sathyanarayanan <[email protected]>
> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
> Signed-off-by: Kirill A. Shutemov <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> Reviewed-by: Dan Williams <[email protected]>
> ---
>
> Changes since v5:
> * Changed prot_guest_has() to cc_platform_has().
>
> Changes since v4:
> * Changed order of variable declaration in tdx_handle_io().
> * Changed tdg_* prefix with tdx_*.
>
> Changes since v3:
> * Included PATTR_GUEST_UNROLL_STRING_IO protected guest flag
> addition change in this patch.
> * Rebased on top of Tom Lendacks protected guest change.
>
> Changes since v2:
> * None
>
> Changes since v1:
> * Fixed comments for tdg_handle_io().
> * Used _tdx_hypercall() instead of __tdx_hypercall() in tdg_handle_io().
>
> arch/x86/include/asm/io.h | 7 +++++--
> arch/x86/kernel/cpu/intel.c | 1 +
> arch/x86/kernel/tdx.c | 35 +++++++++++++++++++++++++++++++++++
> include/linux/cc_platform.h | 11 +++++++++++
> 4 files changed, 52 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index fa6aa43e5dc3..67e0c4a0a0f4 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -40,6 +40,7 @@
>
> #include <linux/string.h>
> #include <linux/compiler.h>
> +#include <linux/cc_platform.h>
> #include <asm/page.h>
> #include <asm/tdx.h>
> #include <asm/early_ioremap.h>
> @@ -310,7 +311,8 @@ static inline unsigned type in##bwl##_p(int port) \
> \
> static inline void outs##bwl(int port, const void *addr, unsigned long count) \
> { \
> - if (sev_key_active()) { \ > + if (sev_key_active() || \
> + cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) { \

Would it make sense to make sev_key_active() and sev_enable_key generic
and just re-use those instead of adding CC_ATTR_GUEST_UNROLL_STRING_IO and
having multiple conditions here?

You can set the key in the TDX init routine just like SEV does.

Thanks,
Tom

Subject: Re: [PATCH v6 05/10] x86/tdx: Handle port I/O



On 9/23/21 9:32 AM, Tom Lendacky wrote:
> On 9/22/21 5:52 PM, Kuppuswamy Sathyanarayanan wrote:
>> From: "Kirill A. Shutemov" <[email protected]>
>>
>> TDX hypervisors cannot emulate instructions directly. This includes
>> port IO which is normally emulated in the hypervisor. All port IO
>> instructions inside TDX trigger the #VE exception in the guest and
>> would be normally emulated there.
>>
>> Also string I/O is not supported in TDX guest. So, unroll the string
>> I/O operation into a loop operating on one element at a time. This
>> method is similar to AMD SEV, so just extend the support for TDX guest
>> platform.
>>
>> Add a new confidential guest flag CC_ATTR_GUEST_UNROLL_STRING_IO to
>> add string unroll support in asm/io.h
>>
>> Co-developed-by: Kuppuswamy Sathyanarayanan <[email protected]>
>> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]>
>> Signed-off-by: Kirill A. Shutemov <[email protected]>
>> Reviewed-by: Andi Kleen <[email protected]>
>> Reviewed-by: Dan Williams <[email protected]>
>> ---
>>
>> Changes since v5:
>>   * Changed prot_guest_has() to cc_platform_has().
>>
>> Changes since v4:
>>   * Changed order of variable declaration in tdx_handle_io().
>>   * Changed tdg_* prefix with tdx_*.
>>
>> Changes since v3:
>>   * Included PATTR_GUEST_UNROLL_STRING_IO protected guest flag
>>     addition change in this patch.
>>   * Rebased on top of Tom Lendacks protected guest change.
>>
>> Changes since v2:
>>   * None
>>
>> Changes since v1:
>>   * Fixed comments for tdg_handle_io().
>>   * Used _tdx_hypercall() instead of __tdx_hypercall() in tdg_handle_io().
>>
>>   arch/x86/include/asm/io.h   |  7 +++++--
>>   arch/x86/kernel/cpu/intel.c |  1 +
>>   arch/x86/kernel/tdx.c       | 35 +++++++++++++++++++++++++++++++++++
>>   include/linux/cc_platform.h | 11 +++++++++++
>>   4 files changed, 52 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index fa6aa43e5dc3..67e0c4a0a0f4 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -40,6 +40,7 @@
>>   #include <linux/string.h>
>>   #include <linux/compiler.h>
>> +#include <linux/cc_platform.h>
>>   #include <asm/page.h>
>>   #include <asm/tdx.h>
>>   #include <asm/early_ioremap.h>
>> @@ -310,7 +311,8 @@ static inline unsigned type in##bwl##_p(int port)            \
>>                                       \
>>   static inline void outs##bwl(int port, const void *addr, unsigned long count) \
>>   {                                    \
>> -    if (sev_key_active()) {                        \ > +    if (sev_key_active()
>> ||                        \
>> +        cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) {        \
>
> Would it make sense to make sev_key_active() and sev_enable_key generic and just re-use those
> instead of adding CC_ATTR_GUEST_UNROLL_STRING_IO and having multiple conditions here?
>
> You can set the key in the TDX init routine just like SEV does.

Any reason for using sev_enable_key over CC attribute? IMO, CC attribute exist
to generalize the common feature code. My impression is SEV is specific to AMD
code.

>
> Thanks,
> Tom
>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

2021-09-23 18:03:49

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v6 05/10] x86/tdx: Handle port I/O

On 9/23/21 12:24 PM, Kuppuswamy, Sathyanarayanan wrote:
>
>
> On 9/23/21 9:32 AM, Tom Lendacky wrote:
>> On 9/22/21 5:52 PM, Kuppuswamy Sathyanarayanan wrote:
>>> From: "Kirill A. Shutemov" <[email protected]>
>>>
>>> TDX hypervisors cannot emulate instructions directly. This includes
>>> port IO which is normally emulated in the hypervisor. All port IO
>>> instructions inside TDX trigger the #VE exception in the guest and
>>> would be normally emulated there.
>>>
>>> Also string I/O is not supported in TDX guest. So, unroll the string
>>> I/O operation into a loop operating on one element at a time. This
>>> method is similar to AMD SEV, so just extend the support for TDX guest
>>> platform.
>>>
>>> Add a new confidential guest flag CC_ATTR_GUEST_UNROLL_STRING_IO to
>>> add string unroll support in asm/io.h
>>>
>>> Co-developed-by: Kuppuswamy Sathyanarayanan
>>> <[email protected]>
>>> Signed-off-by: Kuppuswamy Sathyanarayanan
>>> <[email protected]>
>>> Signed-off-by: Kirill A. Shutemov <[email protected]>
>>> Reviewed-by: Andi Kleen <[email protected]>
>>> Reviewed-by: Dan Williams <[email protected]>
>>> ---
>>>
>>> Changes since v5:
>>>   * Changed prot_guest_has() to cc_platform_has().
>>>
>>> Changes since v4:
>>>   * Changed order of variable declaration in tdx_handle_io().
>>>   * Changed tdg_* prefix with tdx_*.
>>>
>>> Changes since v3:
>>>   * Included PATTR_GUEST_UNROLL_STRING_IO protected guest flag
>>>     addition change in this patch.
>>>   * Rebased on top of Tom Lendacks protected guest change.
>>>
>>> Changes since v2:
>>>   * None
>>>
>>> Changes since v1:
>>>   * Fixed comments for tdg_handle_io().
>>>   * Used _tdx_hypercall() instead of __tdx_hypercall() in tdg_handle_io().
>>>
>>>   arch/x86/include/asm/io.h   |  7 +++++--
>>>   arch/x86/kernel/cpu/intel.c |  1 +
>>>   arch/x86/kernel/tdx.c       | 35 +++++++++++++++++++++++++++++++++++
>>>   include/linux/cc_platform.h | 11 +++++++++++
>>>   4 files changed, 52 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>>> index fa6aa43e5dc3..67e0c4a0a0f4 100644
>>> --- a/arch/x86/include/asm/io.h
>>> +++ b/arch/x86/include/asm/io.h
>>> @@ -40,6 +40,7 @@
>>>   #include <linux/string.h>
>>>   #include <linux/compiler.h>
>>> +#include <linux/cc_platform.h>
>>>   #include <asm/page.h>
>>>   #include <asm/tdx.h>
>>>   #include <asm/early_ioremap.h>
>>> @@ -310,7 +311,8 @@ static inline unsigned type in##bwl##_p(int
>>> port)            \
>>>                                       \
>>>   static inline void outs##bwl(int port, const void *addr, unsigned
>>> long count) \
>>>   {                                    \
>>> -    if (sev_key_active()) {                        \ > +    if
>>> (sev_key_active() ||                        \
>>> +        cc_platform_has(CC_ATTR_GUEST_UNROLL_STRING_IO)) {        \
>>
>> Would it make sense to make sev_key_active() and sev_enable_key generic
>> and just re-use those instead of adding CC_ATTR_GUEST_UNROLL_STRING_IO
>> and having multiple conditions here?
>>
>> You can set the key in the TDX init routine just like SEV does.
>
> Any reason for using sev_enable_key over CC attribute? IMO, CC attribute
> exist
> to generalize the common feature code. My impression is SEV is specific to
> AMD
> code.

When the SEV series was initially submitted, it originally did an
sev_active() check. For various reasons a static key and the
sev_key_active() call was requested.

My suggestion was to change the name to something that doesn't have
SEV/sev in it that can be used by both SEV and TDX. The sev_enable_key can
be moved to a common file (maybe cc_platform.c) and renamed. Then
arch/x86/include/asm/io.h can change the #ifdef from
CONFIG_AMD_MEM_ENCRYPT to CONFIG_ARCH_HAS_CC_PLATFORM.

Not sure if anyone else feels the same, though, so just my suggestion.

Thanks,
Tom

>
>>
>> Thanks,
>> Tom
>>
>

2021-10-15 07:51:42

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 05/10] x86/tdx: Handle port I/O

On Thu, Sep 23, 2021, Tom Lendacky wrote:
> On 9/23/21 12:24 PM, Kuppuswamy, Sathyanarayanan wrote:
> > Any reason for using sev_enable_key over CC attribute? IMO, CC attribute
> > exist to generalize the common feature code. My impression is SEV is
> > specific to AMD code.

Unless CC attributes have static_<whatever> support, that would add a CMP+Jcc to
every I/O instruction in the kernel.

> When the SEV series was initially submitted, it originally did an
> sev_active() check. For various reasons a static key and the
> sev_key_active() call was requested.
>
> My suggestion was to change the name to something that doesn't have SEV/sev
> in it that can be used by both SEV and TDX. The sev_enable_key can be moved
> to a common file (maybe cc_platform.c) and renamed. Then
> arch/x86/include/asm/io.h can change the #ifdef from CONFIG_AMD_MEM_ENCRYPT
> to CONFIG_ARCH_HAS_CC_PLATFORM.
>
> Not sure if anyone else feels the same, though, so just my suggestion.

+1 to a static key to gate high volume and/or performance critical things that
are common to SEV and TDX.