2020-11-16 18:40:03

by Isaku Yamahata

Subject: [RFC PATCH 00/67] KVM: X86: TDX support

From: Isaku Yamahata <[email protected]>

* What's TDX?
TDX stands for Trust Domain Extensions, which isolates VMs from the
virtual-machine monitor (VMM)/hypervisor and any other software on
the platform. [1]
For details, see the specifications. [2], [3], [4], [5], [6], [7]


* The goal of this RFC patch
The purpose of this post is to get early feedback on the high-level
design of the KVM enhancements for TDX. Detailed coding issues
(variable naming etc.) are not addressed yet, and this patch series
is incomplete (not working). Although multiple software components
need to be updated (not only KVM but also QEMU, the guest Linux
kernel and the virtual BIOS), this series includes only the KVM/VMM
part. For those curious about the changes to the other components,
public repositories are available on github. [8], [9]


* Terminology
Here are short explanations of the key concepts.
For detailed explanations and other terminology, please refer to the
specifications. [2], [3], [4], [5], [6], [7]
- Trust Domain (TD)
A hardware-isolated virtual machine managed by the TDX-module.
- Secure-Arbitration Mode (SEAM)
A new mode of the CPU. It consists of SEAM root and SEAM non-root,
which correspond to VMX root and VMX non-root operation.
- TDX-module
The TDX-module runs in SEAM root mode and manages TD guest state.
It provides the ABI for the VMM to manage TDs; invoking it is an
expensive operation.
- SEAM loader (SEAMLDR)
An Authenticated Code Module (ACM) that loads the TDX-module.
- Secure EPT (S-EPT)
An extended page table that is encrypted.
The shared bit (bit 51 or 47) in a GPA selects shared vs. private:
0: private to the TD, 1: shared with the host VMM.


* Major touch/discussion points
The following are the major points where feedback is wanted.

** The file location of the boot code
The BSP launches the SEAM loader to load the TDX module, and the
module is then initialized on all CPUs. The directory
arch/x86/kvm/boot/seam was chosen to keep the related files near the
KVM TDX code, so that future maintenance and enhancements can easily
identify what needs to be kept in sync.

- arch/x86/kvm/boot/seam: the current choice
Pros:
- The directory clearly indicates that the code is related only to
KVM.
- Keeps the files near the related code (KVM TDX code).
Cons:
- It doesn't follow the existing convention.

Alternative:
The alternative is to follow the existing convention.
- arch/x86/kernel/cpu/
Pros:
- It follows the existing convention.
Cons:
- It's unclear that it's related only to KVM TDX.

- drivers/firmware/
As the TDX module can be considered firmware, this is yet another
choice.
Pros:
- It follows the existing convention and clarifies that the TDX
module is firmware.
Cons:
- It's hard to tell that the firmware is only for KVM TDX.
- The files are far from the related code (KVM TDX).

** Coexistence of normal (VMX) VMs and TD VMs
Both legacy (normal VMX) VMs and new TD VMs must be able to coexist;
otherwise the flexibility benefits of VMs would be lost. The main
issue is that the logic of the kvm_x86_ops callbacks for TDX differs
from VMX, while kvm_x86_ops is a single global variable, neither
per-VM nor per-vcpu.

Several points need to be considered:
. No or minimal overhead when TDX is disabled (CONFIG_KVM_INTEL_TDX=n).
. Avoid the overhead of indirect calls via function pointers.
. Contain the changes under the arch/x86/kvm/vmx directory and share
logic with VMX for maintainability.
Even though the way to operate on a VM differs (VMX instructions
vs. TDX SEAMCALLs), the basic idea remains the same, so much of
the logic can be shared.
. Future maintenance:
No huge change to kvm_x86_ops is expected in the (near) future, so
a centralized file is acceptable.

- Wrapping kvm_x86_ops: the current choice
Introduce a dedicated file, arch/x86/kvm/vmx/main.c (the name
main.c was chosen simply to indicate the main entry points for the
callbacks), with wrapper functions around all the callbacks of the
form "if (is-tdx) tdx-callback() else vmx-callback()". A sketch of
the scheme follows the pros/cons below.

Pros:
- No major change in common x86 KVM code. The change is (mostly)
contained under arch/x86/kvm/vmx/.
- When TDX is disabled (CONFIG_KVM_INTEL_TDX=n), the overhead is
optimized out.
- Micro-optimization by avoiding function pointers.
Cons:
- Much boilerplate in arch/x86/kvm/vmx/main.c.
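
A minimal sketch of the wrapping scheme, assuming an is_td() helper
and tdx_*/vmx_* callback names for illustration (this patch series
only wires up the VMX side at first):

	static bool is_td(struct kvm *kvm)
	{
		return kvm->arch.vm_type == KVM_X86_TDX_VM;
	}

	static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
	{
		if (is_td(vcpu->kvm))
			tdx_flush_tlb_all(vcpu); /* SEAMCALL-based path */
		else
			vmx_flush_tlb_all(vcpu); /* regular VMX path */
	}

	static struct kvm_x86_ops vt_x86_ops __initdata = {
		.tlb_flush_all = vt_flush_tlb_all,
		/* every other callback is wrapped the same way */
	};

With CONFIG_KVM_INTEL_TDX=n, is_td() can be made to constant-fold to
false so the compiler drops the TDX branch entirely.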

Alternative:
- Introduce another callback layer under arch/x86/kvm/vmx.
Pros:
- No major change in common x86 KVM code. The change is (mostly)
contained under arch/x86/kvm/vmx/.
- Clear separation of the callbacks.
Cons:
- Overhead in VMX even when TDX is disabled (CONFIG_KVM_INTEL_TDX=n).

- Allow per-VM kvm_x86_ops callbacks instead of the global kvm_x86_ops.
Pros:
- Clear separation of the callbacks.
Cons:
- Big change in common x86 code.
- Overhead in common code even when TDX is disabled
(CONFIG_KVM_INTEL_TDX=n).

- Introduce a new directory, arch/x86/kvm/tdx.
Pros:
- It clarifies that TDX is different from VMX.
Cons:
- Given how much code is shared with VMX, a separate directory
complicates that sharing.

** KVM MMU changes
The KVM MMU needs to be enhanced to handle the Secure/Shared-EPT.
The high-level execution flow is mostly the same as the normal EPT
case:
EPT violation/misconfiguration -> invoke TDP fault handler ->
resolve TDP fault -> resume execution (or emulate MMIO).
The difference is that the S-EPT is operated on (read/written) via
TDX SEAMCALLs, which are expensive, instead of by directly reading/
writing EPT entries. One GPA bit (bit 51 or 47) is repurposed to
mean shared with the host (if set to 1) or private to the TD (if
cleared to 0).

- The current implementation
. Reuse the existing MMU code with minimal updates, because the
execution flow is mostly the same. The additional operation, a
TDX SEAMCALL to operate on the S-EPT, is added via new hooks in
kvm_x86_ops.
. For performance, minimize the TDX SEAMCALLs that operate on the
S-EPT. When looking up the S-EPT pages/entry for a faulting GPA,
don't use a TDX SEAMCALL to read the S-EPT entry; instead keep a
shadow copy in host memory.
Repurpose the existing kvm_mmu_page as the shadow copy of the
S-EPT and associate the S-EPT with it.
. Treat the shared bit as an attribute: mask/unmask the bit where
necessary to keep the existing traversal code working.
Introduce kvm.arch.gfn_shared_mask and use "if (gfn_shared_mask)"
for the special case:
= 0 for the non-TDX case
= bit 51 or 47 set for the TDX case
A sketch of this follows the pros/cons below.

Pros:
- Large code reuse with minimal new hooks.
- The execution path stays the same.
Cons:
- It complicates the existing code.
- Repurposing kvm_mmu_page as a shadow of the Secure-EPT can be
confusing.
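
A minimal sketch of the shared-bit handling described above (the
helper names are illustrative assumptions; only gfn_shared_mask is
taken from the description):

	/*
	 * gfn_shared_mask is 0 for non-TDX VMs; for TDs it holds GPA
	 * bit 51 or 47 translated into GFN units.
	 */
	static gfn_t kvm_gfn_shared(struct kvm *kvm, gfn_t gfn)
	{
		return gfn | kvm->arch.gfn_shared_mask;  /* shared with host */
	}

	static gfn_t kvm_gfn_private(struct kvm *kvm, gfn_t gfn)
	{
		return gfn & ~kvm->arch.gfn_shared_mask; /* private to TD */
	}

	static bool kvm_is_private_gfn(struct kvm *kvm, gfn_t gfn)
	{
		return kvm->arch.gfn_shared_mask &&
		       !(gfn & kvm->arch.gfn_shared_mask);
	}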

Alternative:
- Replace direct reads/writes of EPT entries with TDX-SEAM calls by
introducing callbacks on EPT entries.
Pros:
- Straightforward.
Cons:
- Too many touch points.
- Too slow due to the TDX-SEAM calls.
- Overhead even when TDX is disabled (CONFIG_KVM_INTEL_TDX=n).

- Sprinkle "if (is-tdx)" for the TDX special cases.
Pros:
- Straightforward.
Cons:
- The result is non-generic and ugly.
- It puts TDX-specific logic into common KVM MMU code.

** New KVM API, ioctl (sub)command, to manage TD VMs
Additional KVM APIs are needed to control TD VMs. The operations on
TD VMs are specific to TDX.

- Piggyback on and repurpose KVM_MEMORY_ENCRYPT_OP
Although not every operation is memory encryption, repurpose this
ioctl to carry the TDX-specific subcommands (see the sketch below).
Pros:
- No major change in common x86 KVM code.
Cons:
- The operations aren't actually memory encryption, but operations
on TD VMs.
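
A rough sketch of how the TDX subcommands could be multiplexed over
KVM_MEMORY_ENCRYPT_OP (the command names and struct layout are
assumptions for illustration, not the exact uapi of this series):

	/* Hypothetical uapi: TDX subcommands carried by the existing ioctl. */
	enum kvm_tdx_cmd_id {
		KVM_TDX_CAPABILITIES = 0,
		KVM_TDX_INIT_VM,
		KVM_TDX_INIT_VCPU,
		KVM_TDX_INIT_MEM_REGION,
		KVM_TDX_FINALIZE_VM,
	};

	struct kvm_tdx_cmd {
		__u32 id;       /* enum kvm_tdx_cmd_id */
		__u32 metadata; /* subcommand-specific flags */
		__u64 data;     /* subcommand-specific payload */
	};

	/* Userspace side: ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd); */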

Alternative:
- Introduce a new ioctl for guest protection, e.g.
KVM_GUEST_PROTECTION_OP, and introduce subcommands for TDX.
Pros:
- Clean name.
Cons:
- One more new ioctl for guest protection.
- Possible confusion between KVM_MEMORY_ENCRYPT_OP and
KVM_GUEST_PROTECTION_OP.

- Rename KVM_MEMORY_ENCRYPT_OP to KVM_GUEST_PROTECTION_OP and keep
KVM_MEMORY_ENCRYPT_OP as the same value for uapi compatibility:
"#define KVM_MEMORY_ENCRYPT_OP KVM_GUEST_PROTECTION_OP".
Pros:
- No new ioctl, and a more suitable name.
Cons:
- May confuse existing user programs.


* Items unsupported/out of scope
These items are unsupported at the moment or out of scope:
- Large page (2MB, 1GB) support
- Page migration
- Debugger support (qemu gdb stub)
- Removing the user-space (qemu) mapping of guest private memory
Because this topic is big in itself and will take time, the effort
is proceeding independently. [12]
- Attestation
End-to-end integration is required.
- Live migration
TDX 1.0 doesn't support it.
- Nested virtualization
TDX 1.0 doesn't support it.


* Related repositories
TDX enabling software is composed of several components: not only
the KVM/x86 enablement but also others. Several repositories are
publicly available on github. They are not complete and not working;
they are only for reference for those who are curious.
- TDX host/guest [8]
- TDX Virtual Firmware [9]
- The qemu changes aren't published (yet).


* Related presentations
At KVM Forum 2020, several presentations related to TDX were
given. [10] [11] They are helpful for understanding TDX and the
related KVM/qemu changes.


* Patch organization
The main changes are only two patches (62 and 64). The preceding
patches (01-61) refactor the code and introduce additional hooks;
patch 64 plugs the hooks into the TDX implementation.

- patch 01-16: Preparation: introduce architectural constants,
refactor code, and export symbols for the
following patches.
- patch 17-33: Introduce the new VM type and allow the coexistence
of multiple VM types. Allow/disallow KVM ioctls where
appropriate; in particular, convert several
per-system ioctls to per-VM ioctls.
- patch 34-43: Refactor the KVM MMU and add new hooks for the
Secure EPT.
- patch 44-48: Refactor KVM/VMX code and add the kvm_x86_ops wrapper
for VMX and TDX.
- patch 52-61: Introduce TDX architectural constants/structures and
helper functions.
- patch 62-63: Load and initialize the TDX module during boot.
- patch 64-65: The main patches adding "basic" support for building
and running TDX.
- patch 66 : Not for review; only to make the build succeed.


[1] TDX specification
https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
[2] Intel Trust Domain Extensions (Intel TDX)
https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-whitepaper-final9-17.pdf
[3] Intel CPU Architectural Extensions Specification
https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-cpu-architectural-specification.pdf
[4] Intel TDX Module 1.0 EAS
https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf
[5] Intel TDX Loader Interface Specification
https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-seamldr-interface-specification.pdf
[6] Intel TDX Guest-Hypervisor Communication Interface
https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf
[7] Intel TDX Virtual Firmware Design Guide
https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.
[8] intel public github
kvm TDX branch: https://github.com/intel/tdx/tree/kvm
TDX guest branch: https://github.com/intel/tdx/tree/guest
[9] tdvf
https://github.com/tianocore/edk2-staging/tree/TDVF
[10] KVM forum 2020: Intel Virtualization Technology Extensions to
Enable Hardware Isolated VMs
https://osseu2020.sched.com/event/eDzm/intel-virtualization-technology-extensions-to-enable-hardware-isolated-vms-sean-christopherson-intel
[11] Linux Security Summit EU 2020:
Architectural Extensions for Hardware Virtual Machine Isolation
to Advance Confidential Computing in Public Clouds - Ravi Sahita
& Jun Nakajima, Intel Corporation
https://osseu2020.sched.com/event/eDOx/architectural-extensions-for-hardware-virtual-machine-isolation-to-advance-confidential-computing-in-public-clouds-ravi-sahita-jun-nakajima-intel-corporation
[12] [RFCv2,00/16] KVM protected memory extension
https://lkml.org/lkml/2020/10/20/66


Isaku Yamahata (4):
KVM: x86: Make KVM_CAP_X86_SMM a per-VM capability
KVM: Add per-VM flag to mark read-only memory as unsupported
fixup! KVM: TDX: Add "basic" support for building and running Trust
Domains
KVM: X86: not for review: add dummy file for TDX-SEAM module

Kai Huang (3):
KVM: x86: Add per-VM flag to disable in-kernel I/O APIC and level
routes
KVM: TDX: Add SEAMRR related MSRs macro definition
cpu/hotplug: Document that TDX also depends on booting CPUs once

Rick Edgecombe (1):
KVM: x86: Add infrastructure for stolen GPA bits

Sean Christopherson (58):
x86/cpufeatures: Add synthetic feature flag for TDX (in host)
x86/msr-index: Define MSR_IA32_MKTME_KEYID_PART used by TDX
KVM: Export kvm_io_bus_read for use by TDX for PV MMIO
KVM: Enable hardware before doing arch VM initialization
KVM: x86: Split core of hypercall emulation to helper function
KVM: x86: Export kvm_mmio tracepoint for use by TDX for PV MMIO
KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot by default
KVM: Add infrastructure and macro to mark VM as bugged
KVM: Export kvm_make_all_cpus_request() for use in marking VMs as
bugged
KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the
VM
KVM: x86/mmu: Mark VM as bugged if page fault returns RET_PF_INVALID
KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd()
KVM: Add max_vcpus field in common 'struct kvm'
KVM: x86: Add vm_type to differentiate legacy VMs from protected VMs
KVM: x86: Hoist kvm_dirty_regs check out of sync_regs()
KVM: x86: Introduce "protected guest" concept and block disallowed
ioctls
KVM: x86: Add per-VM flag to disable direct IRQ injection
KVM: x86: Add flag to disallow #MC injection / KVM_X86_SETUP_MCE
KVM: x86: Add flag to mark TSC as immutable (for TDX)
KVM: Add per-VM flag to disable dirty logging of memslots for TDs
KVM: x86: Allow host-initiated WRMSR to set X2APIC regardless of CPUID
KVM: x86: Add kvm_x86_ops .cache_gprs() and .flush_gprs()
KVM: x86: Add support for vCPU and device-scoped KVM_MEMORY_ENCRYPT_OP
KVM: x86: Introduce vm_teardown() hook in kvm_arch_vm_destroy()
KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched
behavior
KVM: x86: Check for pending APICv interrupt in kvm_vcpu_has_events()
KVM: x86: Add option to force LAPIC expiration wait
KVM: x86: Add guest_supported_xss placeholder
KVM: Export kvm_is_reserved_pfn() for use by TDX
KVM: x86/mmu: Explicitly check for MMIO spte in fast page fault
KVM: x86/mmu: Track shadow MMIO value on a per-VM basis
KVM: x86/mmu: Ignore bits 63 and 62 when checking for "present" SPTEs
KVM: x86/mmu: Allow non-zero init value for shadow PTE
KVM: x86/mmu: Refactor shadow walk in __direct_map() to reduce
indentation
KVM: x86/mmu: Return old SPTE from mmu_spte_clear_track_bits()
KVM: x86/mmu: Frame in support for private/inaccessible shadow pages
KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX
KVM: VMX: Modify NMI and INTR handlers to take intr_info as param
KVM: VMX: Move NMI/exception handler to common helper
KVM: VMX: Split out guts of EPT violation to common/exposed function
KVM: VMX: Define EPT Violation architectural bits
KVM: VMX: Define VMCS encodings for shared EPT pointer
KVM: VMX: Add 'main.c' to wrap VMX and TDX
KVM: VMX: Move setting of EPT MMU masks to common VT-x code
KVM: VMX: Move register caching logic to common code
KVM: TDX: Add TDX "architectural" error codes
KVM: TDX: Add architectural definitions for structures and values
KVM: TDX: Define TDCALL exit reason
KVM: TDX: Add macro framework to wrap TDX SEAMCALLs
KVM: TDX: Stub in tdx.h with structs, accessors, and VMCS helpers
KVM: VMX: Add macro framework to read/write VMCS for VMs and TDs
KVM: VMX: Move AR_BYTES encoder/decoder helpers to common.h
KVM: VMX: Move GDT and IDT accessors to common code
KVM: VMX: Move .get_interrupt_shadow() implementation to common VMX
code
KVM: TDX: Load and init TDX-SEAM module during boot
KVM: TDX: Add "basic" support for building and running Trust Domains
KVM: x86: Mark the VM (TD) as bugged if non-coherent DMA is detected

Zhang Chen (1):
x86/cpu: Move get_builtin_firmware() common code (from microcode only)

arch/arm64/include/asm/kvm_host.h | 3 -
arch/arm64/kvm/arm.c | 7 +-
arch/arm64/kvm/vgic/vgic-init.c | 6 +-
arch/x86/Kbuild | 1 +
arch/x86/include/asm/cpu.h | 5 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm_boot.h | 43 +
arch/x86/include/asm/kvm_host.h | 52 +-
arch/x86/include/asm/microcode.h | 3 -
arch/x86/include/asm/msr-index.h | 10 +
arch/x86/include/asm/vmx.h | 6 +
arch/x86/include/asm/vmxfeatures.h | 2 +-
arch/x86/include/uapi/asm/kvm.h | 55 +
arch/x86/include/uapi/asm/vmx.h | 4 +-
arch/x86/kernel/cpu/common.c | 20 +
arch/x86/kernel/cpu/intel.c | 4 +
arch/x86/kernel/cpu/microcode/core.c | 18 -
arch/x86/kernel/cpu/microcode/intel.c | 1 +
arch/x86/kernel/setup.c | 3 +
arch/x86/kvm/Kconfig | 8 +
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/boot/Makefile | 5 +
arch/x86/kvm/boot/seam/seamldr.S | 188 +++
arch/x86/kvm/boot/seam/seamloader.c | 162 +++
arch/x86/kvm/boot/seam/tdx.c | 1131 +++++++++++++++
arch/x86/kvm/ioapic.c | 4 +
arch/x86/kvm/irq_comm.c | 6 +-
arch/x86/kvm/lapic.c | 9 +-
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/mmu.h | 33 +-
arch/x86/kvm/mmu/mmu.c | 519 +++++--
arch/x86/kvm/mmu/mmu_internal.h | 5 +
arch/x86/kvm/mmu/paging_tmpl.h | 27 +-
arch/x86/kvm/mmu/spte.c | 36 +-
arch/x86/kvm/mmu/spte.h | 30 +-
arch/x86/kvm/svm/svm.c | 22 +-
arch/x86/kvm/trace.h | 57 +
arch/x86/kvm/vmx/common.h | 180 +++
arch/x86/kvm/vmx/main.c | 1130 +++++++++++++++
arch/x86/kvm/vmx/posted_intr.c | 6 +
arch/x86/kvm/vmx/tdx.c | 1847 +++++++++++++++++++++++++
arch/x86/kvm/vmx/tdx.h | 245 ++++
arch/x86/kvm/vmx/tdx_arch.h | 230 +++
arch/x86/kvm/vmx/tdx_errno.h | 91 ++
arch/x86/kvm/vmx/tdx_ops.h | 544 ++++++++
arch/x86/kvm/vmx/tdx_stubs.c | 45 +
arch/x86/kvm/vmx/vmenter.S | 140 ++
arch/x86/kvm/vmx/vmx.c | 537 ++-----
arch/x86/kvm/vmx/vmx.h | 2 +
arch/x86/kvm/x86.c | 296 +++-
include/linux/kvm_host.h | 51 +-
include/uapi/linux/kvm.h | 2 +
kernel/cpu.c | 4 +
lib/firmware/intel-seam/libtdx.so | 0
tools/arch/x86/include/uapi/asm/kvm.h | 55 +
tools/include/uapi/linux/kvm.h | 2 +
virt/kvm/kvm_main.c | 45 +-
57 files changed, 7230 insertions(+), 712 deletions(-)
create mode 100644 arch/x86/include/asm/kvm_boot.h
create mode 100644 arch/x86/kvm/boot/Makefile
create mode 100644 arch/x86/kvm/boot/seam/seamldr.S
create mode 100644 arch/x86/kvm/boot/seam/seamloader.c
create mode 100644 arch/x86/kvm/boot/seam/tdx.c
create mode 100644 arch/x86/kvm/vmx/common.h
create mode 100644 arch/x86/kvm/vmx/main.c
create mode 100644 arch/x86/kvm/vmx/tdx.c
create mode 100644 arch/x86/kvm/vmx/tdx.h
create mode 100644 arch/x86/kvm/vmx/tdx_arch.h
create mode 100644 arch/x86/kvm/vmx/tdx_errno.h
create mode 100644 arch/x86/kvm/vmx/tdx_ops.h
create mode 100644 arch/x86/kvm/vmx/tdx_stubs.c
create mode 100644 lib/firmware/intel-seam/libtdx.so

--
2.17.1


2020-11-16 21:14:05

by Isaku Yamahata

Subject: [RFC PATCH 66/67] fixup! KVM: TDX: Add "basic" support for building and running Trust Domains

From: Isaku Yamahata <[email protected]>

---
arch/x86/kvm/vmx/tdx.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index adcb866861b7..d2c1766416f2 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -331,9 +331,6 @@ static int tdx_vm_init(struct kvm *kvm)
kvm->arch.mce_injection_disallowed = true;
kvm_mmu_set_mmio_spte_mask(kvm, 0, 0);

- /* TODO: Enable 2mb and 1gb large page support. */
- kvm->arch.tdp_max_page_level = PG_LEVEL_4K;
-
kvm_apicv_init(kvm, true);

/* vCPUs can't be created until after KVM_TDX_INIT_VM. */
--
2.17.1

2020-11-16 21:14:05

by Isaku Yamahata

Subject: [RFC PATCH 53/67] KVM: TDX: Add architectural definitions for structures and values

From: Sean Christopherson <[email protected]>
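
As a brief usage sketch of the field-code builders added below (the
tdrdvps() wrapper is a hypothetical stand-in for an invocation of
SEAMCALL_TDRDVPS):

	/* Build the TDVPS field code for guest RIP and read it. */
	u64 field = TDVPS_VMCS(GUEST_RIP); /* class 0 + VMCS encoding */
	u64 val, err;

	err = tdrdvps(tdvpr_pa, field, &val); /* hypothetical wrapper */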

Co-developed-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Co-developed-by: Xiaoyao Li <[email protected]>
Signed-off-by: Xiaoyao Li <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/tdx_arch.h | 230 ++++++++++++++++++++++++++++++++++++
1 file changed, 230 insertions(+)
create mode 100644 arch/x86/kvm/vmx/tdx_arch.h

diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
new file mode 100644
index 000000000000..d13db55e5086
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx_arch.h
@@ -0,0 +1,230 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_TDX_ARCH_H
+#define __KVM_X86_TDX_ARCH_H
+
+#include <linux/types.h>
+
+/*
+ * SEAMCALL API function leaf
+ */
+#define SEAMCALL_TDENTER 0
+#define SEAMCALL_TDADDCX 1
+#define SEAMCALL_TDADDPAGE 2
+#define SEAMCALL_TDADDSEPT 3
+#define SEAMCALL_TDADDVPX 4
+#define SEAMCALL_TDASSIGNHKID 5
+#define SEAMCALL_TDAUGPAGE 6
+#define SEAMCALL_TDBLOCK 7
+#define SEAMCALL_TDCONFIGKEY 8
+#define SEAMCALL_TDCREATE 9
+#define SEAMCALL_TDCREATEVP 10
+#define SEAMCALL_TDDBGRD 11
+#define SEAMCALL_TDDBGRDMEM 12
+#define SEAMCALL_TDDBGWR 13
+#define SEAMCALL_TDDBGWRMEM 14
+#define SEAMCALL_TDDEMOTEPAGE 15
+#define SEAMCALL_TDEXTENDMR 16
+#define SEAMCALL_TDFINALIZEMR 17
+#define SEAMCALL_TDFLUSHVP 18
+#define SEAMCALL_TDFLUSHVPDONE 19
+#define SEAMCALL_TDFREEHKIDS 20
+#define SEAMCALL_TDINIT 21
+#define SEAMCALL_TDINITVP 22
+#define SEAMCALL_TDPROMOTEPAGE 23
+#define SEAMCALL_TDRDPAGEMD 24
+#define SEAMCALL_TDRDSEPT 25
+#define SEAMCALL_TDRDVPS 26
+#define SEAMCALL_TDRECLAIMHKIDS 27
+#define SEAMCALL_TDRECLAIMPAGE 28
+#define SEAMCALL_TDREMOVEPAGE 29
+#define SEAMCALL_TDREMOVESEPT 30
+#define SEAMCALL_TDSYSCONFIGKEY 31
+#define SEAMCALL_TDSYSINFO 32
+#define SEAMCALL_TDSYSINIT 33
+
+#define SEAMCALL_TDSYSINITLP 35
+#define SEAMCALL_TDSYSINITTDMR 36
+#define SEAMCALL_TDTEARDOWN 37
+#define SEAMCALL_TDTRACK 38
+#define SEAMCALL_TDUNBLOCK 39
+#define SEAMCALL_TDWBCACHE 40
+#define SEAMCALL_TDWBINVDPAGE 41
+#define SEAMCALL_TDWRSEPT 42
+#define SEAMCALL_TDWRVPS 43
+#define SEAMCALL_TDSYSSHUTDOWNLP 44
+#define SEAMCALL_TDSYSCONFIG 45
+
+#define TDVMCALL_MAP_GPA 0x10001
+#define TDVMCALL_REPORT_FATAL_ERROR 0x10003
+
+/* TDX control structure (TDR/TDCS/TDVPS) field access codes */
+#define TDX_CLASS_SHIFT 56
+#define TDX_FIELD_MASK GENMASK_ULL(31, 0)
+
+#define BUILD_TDX_FIELD(class, field) \
+ (((u64)(class) << TDX_CLASS_SHIFT) | ((u64)(field) & TDX_FIELD_MASK))
+
+/* @field is the VMCS field encoding */
+#define TDVPS_VMCS(field) BUILD_TDX_FIELD(0, (field))
+
+/*
+ * @offset is the offset (in bytes) from the beginning of the architectural
+ * virtual APIC page.
+ */
+#define TDVPS_APIC(offset) BUILD_TDX_FIELD(1, (offset))
+
+/* @gpr is the index of a general purpose register, e.g. eax=0 */
+#define TDVPS_GPR(gpr) BUILD_TDX_FIELD(16, (gpr))
+
+#define TDVPS_DR(dr) BUILD_TDX_FIELD(17, (0 + (dr)))
+
+enum tdx_guest_other_state {
+ TD_VCPU_XCR0 = 32,
+ TD_VCPU_IWK_ENCKEY0 = 64,
+ TD_VCPU_IWK_ENCKEY1,
+ TD_VCPU_IWK_ENCKEY2,
+ TD_VCPU_IWK_ENCKEY3,
+ TD_VCPU_IWK_INTKEY0 = 68,
+ TD_VCPU_IWK_INTKEY1,
+ TD_VCPU_IWK_FLAGS = 70,
+};
+
+/* @field is any of enum tdx_guest_other_state */
+#define TDVPS_STATE(field) BUILD_TDX_FIELD(17, (field))
+
+/* @msr is the MSR index */
+#define TDVPS_MSR(msr) BUILD_TDX_FIELD(19, (msr))
+
+/* Management class fields */
+enum tdx_guest_management {
+ TD_VCPU_PEND_NMI = 11,
+};
+
+/* @field is any of enum tdx_guest_management */
+#define TDVPS_MANAGEMENT(field) BUILD_TDX_FIELD(32, (field))
+
+#define TDX1_NR_TDCX_PAGES 4
+#define TDX1_NR_TDVPX_PAGES 5
+
+#define TDX1_MAX_NR_CPUID_CONFIGS 6
+#define TDX1_MAX_NR_CMRS 32
+#define TDX1_MAX_NR_TDMRS 64
+#define TDX1_EXTENDMR_CHUNKSIZE 256
+
+struct tdx_cpuid_config {
+ u32 leaf;
+ u32 sub_leaf;
+ u32 eax;
+ u32 ebx;
+ u32 ecx;
+ u32 edx;
+} __packed;
+
+struct tdx_cpuid_value {
+ u32 eax;
+ u32 ebx;
+ u32 ecx;
+ u32 edx;
+} __packed;
+
+#define TDX1_TD_ATTRIBUTE_DEBUG BIT_ULL(0)
+#define TDX1_TD_ATTRIBUTE_SYSPROF BIT_ULL(1)
+#define TDX1_TD_ATTRIBUTE_PKS BIT_ULL(30)
+#define TDX1_TD_ATTRIBUTE_KL BIT_ULL(31)
+#define TDX1_TD_ATTRIBUTE_PERFMON BIT_ULL(63)
+
+/*
+ * TD_PARAMS is provided as an input to TDINIT, the size of which is 1024B.
+ */
+struct td_params {
+ u64 attributes;
+ u64 xfam;
+ u32 max_vcpus;
+ u32 reserved0;
+
+ u64 eptp_controls;
+ u64 exec_controls;
+ u16 tsc_frequency;
+ u8 reserved1[38];
+
+ u64 mrconfigid[6];
+ u64 mrowner[6];
+ u64 mrownerconfig[6];
+ u64 reserved2[4];
+
+ union {
+ struct tdx_cpuid_value cpuid_values[0];
+ u8 reserved3[768];
+ };
+} __packed __aligned(1024);
+
+/* Guest uses MAX_PA for GPAW when set. */
+#define TDX1_EXEC_CONTROL_MAX_GPAW BIT_ULL(0)
+
+/*
+ * TDX1 requires the frequency to be defined in units of 25MHz, which is the
+ * frequency of the core crystal clock on TDX-capable platforms, i.e. TDX-SEAM
+ * can only program frequencies that are multiples of 25MHz. The frequency
+ * must be between 1GHz and 10GHz (inclusive).
+ */
+#define TDX1_TSC_KHZ_TO_25MHZ(tsc_in_khz) ((tsc_in_khz) / (25 * 1000))
+#define TDX1_TSC_25MHZ_TO_KHZ(tsc_in_25mhz) ((tsc_in_25mhz) * (25 * 1000))
+#define TDX1_MIN_TSC_FREQUENCY_KHZ 1 * 1000 * 1000
+#define TDX1_MAX_TSC_FREQUENCY_KHZ 10 * 1000 * 1000
+
+struct tdmr_reserved_area {
+ u64 offset;
+ u64 size;
+} __packed;
+
+struct tdmr_info {
+ u64 base;
+ u64 size;
+ u64 pamt_1g_base;
+ u64 pamt_1g_size;
+ u64 pamt_2m_base;
+ u64 pamt_2m_size;
+ u64 pamt_4k_base;
+ u64 pamt_4k_size;
+ struct tdmr_reserved_area reserved_areas[16];
+} __packed __aligned(4096);
+
+struct cmr_info {
+ u64 base;
+ u64 size;
+} __packed;
+
+struct tdsysinfo_struct {
+ /* TDX-SEAM Module Info */
+ u32 attributes;
+ u32 vendor_id;
+ u32 build_date;
+ u16 build_num;
+ u16 minor_version;
+ u16 major_version;
+ u8 reserved0[14];
+ /* Memory Info */
+ u16 max_tdmrs;
+ u16 max_reserved_per_tdmr;
+ u16 pamt_entry_size;
+ u8 reserved1[10];
+ /* Control Struct Info */
+ u16 tdcs_base_size;
+ u8 reserved2[2];
+ u16 tdvps_base_size;
+ u8 tdvps_xfam_dependent_size;
+ u8 reserved3[9];
+ /* TD Capabilities */
+ u64 attributes_fixed0;
+ u64 attributes_fixed1;
+ u64 xfam_fixed0;
+ u64 xfam_fixed1;
+ u8 reserved4[32];
+ u32 num_cpuid_config;
+ union {
+ struct tdx_cpuid_config cpuid_configs[0];
+ u8 reserved5[892];
+ };
+} __packed __aligned(1024);
+
+#endif /* __KVM_X86_TDX_ARCH_H */
--
2.17.1

2020-11-16 21:14:12

by Isaku Yamahata

Subject: [RFC PATCH 29/67] KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched behavior

From: Sean Christopherson <[email protected]>

Add a flag, KVM_DEBUGREG_AUTO_SWITCHED, to skip saving/restoring DRs
irrespective of any other flags. TDX-SEAM unconditionally saves and
restores host DRs, ergo there is nothing to do.

Opportunistically convert the KVM_DEBUGREG_* definitions to use BIT().
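
A minimal sketch of the intended usage (the tdx_vcpu_create() context
is an assumption for illustration):

	static int tdx_vcpu_create(struct kvm_vcpu *vcpu)
	{
		/* TDX-SEAM context-switches DRs itself; KVM must not. */
		vcpu->arch.switch_db_regs = KVM_DEBUGREG_AUTO_SWITCHED;
		...
	}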

Reported-by: Xiaoyao Li <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 7 ++++---
arch/x86/kvm/x86.c | 6 ++++--
2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a6c89666ec49..815469875445 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -464,9 +464,10 @@ struct kvm_pmu {
struct kvm_pmu_ops;

enum {
- KVM_DEBUGREG_BP_ENABLED = 1,
- KVM_DEBUGREG_WONT_EXIT = 2,
- KVM_DEBUGREG_RELOAD = 4,
+ KVM_DEBUGREG_BP_ENABLED = BIT(0),
+ KVM_DEBUGREG_WONT_EXIT = BIT(1),
+ KVM_DEBUGREG_RELOAD = BIT(2),
+ KVM_DEBUGREG_AUTO_SWITCHED = BIT(3),
};

struct kvm_mtrr_range {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42bd24ba7fdd..098888edc3ad 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9009,7 +9009,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (test_thread_flag(TIF_NEED_FPU_LOAD))
switch_fpu_return();

- if (unlikely(vcpu->arch.switch_db_regs)) {
+ if (unlikely(vcpu->arch.switch_db_regs & ~KVM_DEBUGREG_AUTO_SWITCHED)) {
set_debugreg(0, 7);
set_debugreg(vcpu->arch.eff_db[0], 0);
set_debugreg(vcpu->arch.eff_db[1], 1);
@@ -9029,6 +9029,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
*/
if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)) {
WARN_ON(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP);
+ WARN_ON(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCHED);
kvm_x86_ops.sync_dirty_debug_regs(vcpu);
kvm_update_dr0123(vcpu);
kvm_update_dr7(vcpu);
@@ -9042,7 +9043,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
* care about the messed up debug address registers. But if
* we have some of them active, restore the old state.
*/
- if (hw_breakpoint_active())
+ if (hw_breakpoint_active() &&
+ !(vcpu->arch.switch_db_regs & KVM_DEBUGREG_AUTO_SWITCHED))
hw_breakpoint_restore();

vcpu->arch.last_vmentry_cpu = vcpu->cpu;
--
2.17.1

2020-11-16 21:14:22

by Isaku Yamahata

Subject: [RFC PATCH 15/67] KVM: x86: Add vm_type to differentiate legacy VMs from protected VMs

From: Sean Christopherson <[email protected]>

Add a capability to effectively allow userspace to query what VM types
are supported by KVM.
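
For example, userspace could query and select the VM type roughly as
follows (a sketch, assuming the uapi values introduced below):

	int types = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_VM_TYPES);
	unsigned long type = (types & (1u << KVM_X86_TDX_VM)) ?
			     KVM_X86_TDX_VM : KVM_X86_LEGACY_VM;
	int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, type);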

Co-developed-by: Xiaoyao Li <[email protected]>
Signed-off-by: Xiaoyao Li <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/include/uapi/asm/kvm.h | 4 ++++
arch/x86/kvm/svm/svm.c | 6 ++++++
arch/x86/kvm/vmx/vmx.c | 6 ++++++
arch/x86/kvm/x86.c | 9 ++++++++-
include/uapi/linux/kvm.h | 2 ++
tools/arch/x86/include/uapi/asm/kvm.h | 4 ++++
tools/include/uapi/linux/kvm.h | 2 ++
8 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c2639744ea09..1ff33efd6394 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -897,6 +897,7 @@ enum kvm_irqchip_mode {
#define APICV_INHIBIT_REASON_X2APIC 5

struct kvm_arch {
+ unsigned long vm_type;
unsigned long n_used_mmu_pages;
unsigned long n_requested_mmu_pages;
unsigned long n_max_mmu_pages;
@@ -1090,6 +1091,7 @@ struct kvm_x86_ops {
bool (*has_emulated_msr)(u32 index);
void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);

+ bool (*is_vm_type_supported)(unsigned long vm_type);
unsigned int vm_size;
int (*vm_init)(struct kvm *kvm);
void (*vm_destroy)(struct kvm *kvm);
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 89e5f3d1bba8..29cdf262e516 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -486,4 +486,8 @@ struct kvm_pmu_event_filter {
#define KVM_PMU_EVENT_ALLOW 0
#define KVM_PMU_EVENT_DENY 1

+#define KVM_X86_LEGACY_VM 0
+#define KVM_X86_SEV_ES_VM 1
+#define KVM_X86_TDX_VM 2
+
#endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e001e3c9e4bc..11ab330a9b55 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4161,6 +4161,11 @@ static void svm_vm_destroy(struct kvm *kvm)
sev_vm_destroy(kvm);
}

+static bool svm_is_vm_type_supported(unsigned long type)
+{
+ return type == KVM_X86_LEGACY_VM;
+}
+
static int svm_vm_init(struct kvm *kvm)
{
if (!pause_filter_count || !pause_filter_thresh)
@@ -4187,6 +4192,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_free = svm_free_vcpu,
.vcpu_reset = svm_vcpu_reset,

+ .is_vm_type_supported = svm_is_vm_type_supported,
.vm_size = sizeof(struct kvm_svm),
.vm_init = svm_vm_init,
.vm_destroy = svm_vm_destroy,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0703d82e7bad..b3ecdb96789a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6966,6 +6966,11 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
return err;
}

+static bool vmx_is_vm_type_supported(unsigned long type)
+{
+ return type == KVM_X86_LEGACY_VM;
+}
+
#define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"
#define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"

@@ -7603,6 +7608,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
.cpu_has_accelerated_tpr = report_flexpriority,
.has_emulated_msr = vmx_has_emulated_msr,

+ .is_vm_type_supported = vmx_is_vm_type_supported,
.vm_size = sizeof(struct kvm_vmx),
.vm_init = vmx_vm_init,

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19b53aedc6c8..346394d83672 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3771,6 +3771,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_STEAL_TIME:
r = sched_info_on();
break;
+ case KVM_CAP_VM_TYPES:
+ r = BIT(KVM_X86_LEGACY_VM);
+ if (kvm_x86_ops.is_vm_type_supported(KVM_X86_TDX_VM))
+ r |= BIT(KVM_X86_TDX_VM);
+ break;
default:
break;
}
@@ -10249,9 +10254,11 @@ void kvm_arch_free_vm(struct kvm *kvm)

int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
{
- if (type)
+ if (!kvm_x86_ops.is_vm_type_supported(type))
return -EINVAL;

+ kvm->arch.vm_type = type;
+
INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ca41220b40b8..c603e9a004f1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1054,6 +1054,8 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_X86_MSR_FILTER 189
#define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190

+#define KVM_CAP_VM_TYPES 1000
+
#ifdef KVM_CAP_IRQ_ROUTING

struct kvm_irq_routing_irqchip {
diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include/uapi/asm/kvm.h
index 0780f97c1850..44313ac967dd 100644
--- a/tools/arch/x86/include/uapi/asm/kvm.h
+++ b/tools/arch/x86/include/uapi/asm/kvm.h
@@ -466,4 +466,8 @@ struct kvm_pmu_event_filter {
#define KVM_PMU_EVENT_ALLOW 0
#define KVM_PMU_EVENT_DENY 1

+#define KVM_X86_LEGACY_VM 0
+#define KVM_X86_SEV_ES_VM 1
+#define KVM_X86_TDX_VM 2
+
#endif /* _ASM_X86_KVM_H */
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index 7d8eced6f459..b043b01f0d87 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -1038,6 +1038,8 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_S390_DIAG318 186
#define KVM_CAP_STEAL_TIME 187

+#define KVM_CAP_VM_TYPES 1000
+
#ifdef KVM_CAP_IRQ_ROUTING

struct kvm_irq_routing_irqchip {
--
2.17.1

2020-11-17 01:58:42

by Isaku Yamahata

Subject: [RFC PATCH 02/67] x86/msr-index: Define MSR_IA32_MKTME_KEYID_PART used by TDX

From: Sean Christopherson <[email protected]>

Define MSR_IA32_MKTME_KEYID_PART, used by TDX to enumerate the TDX KeyID
space, which is carved out from the regular MKTME KeyIDs.
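
A sketch of how the MSR could be consumed (the low/high 32-bit split
between regular MKTME KeyIDs and TDX-private KeyIDs is this note's
reading of the TDX spec; treat the exact layout as an assumption):

	u32 nr_mktme_keyids, nr_tdx_keyids;

	/* Low 32 bits: regular MKTME KeyIDs; high 32 bits: TDX KeyIDs. */
	rdmsr(MSR_IA32_MKTME_KEYID_PART, nr_mktme_keyids, nr_tdx_keyids);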

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/msr-index.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 972a34d93505..aad12236b33c 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -628,6 +628,8 @@
#define MSR_IA32_UCODE_WRITE 0x00000079
#define MSR_IA32_UCODE_REV 0x0000008b

+#define MSR_IA32_MKTME_KEYID_PART 0x00000087
+
#define MSR_IA32_SMM_MONITOR_CTL 0x0000009b
#define MSR_IA32_SMBASE 0x0000009e

--
2.17.1

2020-11-17 01:58:57

by Isaku Yamahata

Subject: [RFC PATCH 49/67] KVM: VMX: Add 'main.c' to wrap VMX and TDX

From: Sean Christopherson <[email protected]>

Wrap the VMX kvm_x86_ops hooks in preparation for adding TDX, which
can coexist with VMX, i.e. KVM can run both VMs and TDs. Use 'vt'
for the naming scheme, as a nod to VT-x and as a concatenation of
VmxTdx.

Co-developed-by: Xiaoyao Li <[email protected]>
Signed-off-by: Xiaoyao Li <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/vmx/main.c | 720 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 304 ++++-------------
3 files changed, 784 insertions(+), 242 deletions(-)
create mode 100644 arch/x86/kvm/vmx/main.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index b804444e16d4..4192b252eba0 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -18,7 +18,7 @@ kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \
hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
mmu/spte.o mmu/tdp_iter.o mmu/tdp_mmu.o

-kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
+kvm-intel-y += vmx/main.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
vmx/evmcs.o vmx/nested.o vmx/posted_intr.o
kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
new file mode 100644
index 000000000000..85bc238c0852
--- /dev/null
+++ b/arch/x86/kvm/vmx/main.c
@@ -0,0 +1,720 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/moduleparam.h>
+
+#include "vmx.c"
+
+static struct kvm_x86_ops vt_x86_ops __initdata;
+
+static int __init vt_cpu_has_kvm_support(void)
+{
+ return cpu_has_vmx();
+}
+
+static int __init vt_disabled_by_bios(void)
+{
+ return vmx_disabled_by_bios();
+}
+
+static int __init vt_check_processor_compatibility(void)
+{
+ int ret;
+
+ ret = vmx_check_processor_compat();
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static __init int vt_hardware_setup(void)
+{
+ int ret;
+
+ ret = hardware_setup(&vt_x86_ops);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static int vt_hardware_enable(void)
+{
+ return hardware_enable();
+}
+
+static void vt_hardware_disable(void)
+{
+ hardware_disable();
+}
+
+static bool vt_cpu_has_accelerated_tpr(void)
+{
+ return report_flexpriority();
+}
+
+static bool vt_is_vm_type_supported(unsigned long type)
+{
+ return type == KVM_X86_LEGACY_VM;
+}
+
+static int vt_vm_init(struct kvm *kvm)
+{
+ return vmx_vm_init(kvm);
+}
+
+static void vt_vm_teardown(struct kvm *kvm)
+{
+
+}
+
+static void vt_vm_destroy(struct kvm *kvm)
+{
+
+}
+
+static int vt_vcpu_create(struct kvm_vcpu *vcpu)
+{
+ return vmx_create_vcpu(vcpu);
+}
+
+static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu)
+{
+ return vmx_vcpu_run(vcpu);
+}
+
+static void vt_vcpu_free(struct kvm_vcpu *vcpu)
+{
+ return vmx_free_vcpu(vcpu);
+}
+
+static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+{
+ return vmx_vcpu_reset(vcpu, init_event);
+}
+
+static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+ return vmx_vcpu_load(vcpu, cpu);
+}
+
+static void vt_vcpu_put(struct kvm_vcpu *vcpu)
+{
+ return vmx_vcpu_put(vcpu);
+}
+
+static int vt_handle_exit(struct kvm_vcpu *vcpu,
+ enum exit_fastpath_completion fastpath)
+{
+ return vmx_handle_exit(vcpu, fastpath);
+}
+
+static void vt_handle_exit_irqoff(struct kvm_vcpu *vcpu)
+{
+ vmx_handle_exit_irqoff(vcpu);
+}
+
+static int vt_skip_emulated_instruction(struct kvm_vcpu *vcpu)
+{
+ return vmx_skip_emulated_instruction(vcpu);
+}
+
+static void vt_update_emulated_instruction(struct kvm_vcpu *vcpu)
+{
+ vmx_update_emulated_instruction(vcpu);
+}
+
+static int vt_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ return vmx_set_msr(vcpu, msr_info);
+}
+
+static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+ return vmx_smi_allowed(vcpu, for_injection);
+}
+
+static int vt_pre_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
+{
+ return vmx_pre_enter_smm(vcpu, smstate);
+}
+
+static int vt_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
+{
+ return vmx_pre_leave_smm(vcpu, smstate);
+}
+
+static void vt_enable_smi_window(struct kvm_vcpu *vcpu)
+{
+ /* RSM will cause a vmexit anyway. */
+}
+
+static bool vt_can_emulate_instruction(struct kvm_vcpu *vcpu, void *insn,
+ int insn_len)
+{
+ return vmx_can_emulate_instruction(vcpu, insn, insn_len);
+}
+
+static int vt_check_intercept(struct kvm_vcpu *vcpu,
+ struct x86_instruction_info *info,
+ enum x86_intercept_stage stage,
+ struct x86_exception *exception)
+{
+ return vmx_check_intercept(vcpu, info, stage, exception);
+}
+
+static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
+{
+ return vmx_apic_init_signal_blocked(vcpu);
+}
+
+static void vt_migrate_timers(struct kvm_vcpu *vcpu)
+{
+ vmx_migrate_timers(vcpu);
+}
+
+static void vt_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
+{
+ return vmx_set_virtual_apic_mode(vcpu);
+}
+
+static void vt_apicv_post_state_restore(struct kvm_vcpu *vcpu)
+{
+ return vmx_apicv_post_state_restore(vcpu);
+}
+
+static bool vt_check_apicv_inhibit_reasons(ulong bit)
+{
+ ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
+ BIT(APICV_INHIBIT_REASON_HYPERV);
+
+ return supported & BIT(bit);
+}
+
+static void vt_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
+{
+ return vmx_hwapic_irr_update(vcpu, max_irr);
+}
+
+static void vt_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
+{
+ return vmx_hwapic_isr_update(vcpu, max_isr);
+}
+
+static bool vt_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
+{
+ return vmx_guest_apic_has_interrupt(vcpu);
+}
+
+static int vt_sync_pir_to_irr(struct kvm_vcpu *vcpu)
+{
+ return vmx_sync_pir_to_irr(vcpu);
+}
+
+static int vt_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
+{
+ return vmx_deliver_posted_interrupt(vcpu, vector);
+}
+
+static void vt_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
+{
+ return vmx_vcpu_after_set_cpuid(vcpu);
+}
+
+static bool vt_has_emulated_msr(struct kvm *kvm, u32 index)
+{
+ return vmx_has_emulated_msr(index);
+}
+
+static void vt_msr_filter_changed(struct kvm_vcpu *vcpu)
+{
+ vmx_msr_filter_changed(vcpu);
+}
+
+static void vt_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
+{
+ vmx_prepare_switch_to_guest(vcpu);
+}
+
+static void vt_update_exception_bitmap(struct kvm_vcpu *vcpu)
+{
+ update_exception_bitmap(vcpu);
+}
+
+static int vt_get_msr_feature(struct kvm_msr_entry *msr)
+{
+ return vmx_get_msr_feature(msr);
+}
+
+static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ return vmx_get_msr(vcpu, msr_info);
+}
+
+static u64 vt_get_segment_base(struct kvm_vcpu *vcpu, int seg)
+{
+ return vmx_get_segment_base(vcpu, seg);
+}
+
+static void vt_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var,
+ int seg)
+{
+ vmx_get_segment(vcpu, var, seg);
+}
+
+static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var,
+ int seg)
+{
+ vmx_set_segment(vcpu, var, seg);
+}
+
+static int vt_get_cpl(struct kvm_vcpu *vcpu)
+{
+ return vmx_get_cpl(vcpu);
+}
+
+static void vt_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l)
+{
+ vmx_get_cs_db_l_bits(vcpu, db, l);
+}
+
+static void vt_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+{
+ vmx_set_cr0(vcpu, cr0);
+}
+
+static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
+ int pgd_level)
+{
+ vmx_load_mmu_pgd(vcpu, pgd, pgd_level);
+}
+
+static int vt_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+ return vmx_set_cr4(vcpu, cr4);
+}
+
+static int vt_set_efer(struct kvm_vcpu *vcpu, u64 efer)
+{
+ return vmx_set_efer(vcpu, efer);
+}
+
+static void vt_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+{
+ vmx_get_idt(vcpu, dt);
+}
+
+static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+{
+ vmx_set_idt(vcpu, dt);
+}
+
+static void vt_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+{
+ vmx_get_gdt(vcpu, dt);
+}
+
+static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
+{
+ vmx_set_gdt(vcpu, dt);
+}
+
+static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
+{
+ vmx_set_dr7(vcpu, val);
+}
+
+static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
+{
+ vmx_sync_dirty_debug_regs(vcpu);
+}
+
+static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
+{
+ vmx_cache_reg(vcpu, reg);
+}
+
+static unsigned long vt_get_rflags(struct kvm_vcpu *vcpu)
+{
+ return vmx_get_rflags(vcpu);
+}
+
+static void vt_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
+{
+ vmx_set_rflags(vcpu, rflags);
+}
+
+static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
+{
+ vmx_flush_tlb_all(vcpu);
+}
+
+static void vt_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+ vmx_flush_tlb_current(vcpu);
+}
+
+static void vt_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
+{
+ vmx_flush_tlb_gva(vcpu, addr);
+}
+
+static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu)
+{
+ vmx_flush_tlb_guest(vcpu);
+}
+
+static void vt_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
+{
+ vmx_set_interrupt_shadow(vcpu, mask);
+}
+
+static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu)
+{
+ return vmx_get_interrupt_shadow(vcpu);
+}
+
+static void vt_patch_hypercall(struct kvm_vcpu *vcpu,
+ unsigned char *hypercall)
+{
+ vmx_patch_hypercall(vcpu, hypercall);
+}
+
+static void vt_inject_irq(struct kvm_vcpu *vcpu)
+{
+ vmx_inject_irq(vcpu);
+}
+
+static void vt_inject_nmi(struct kvm_vcpu *vcpu)
+{
+ vmx_inject_nmi(vcpu);
+}
+
+static void vt_queue_exception(struct kvm_vcpu *vcpu)
+{
+ vmx_queue_exception(vcpu);
+}
+
+static void vt_cancel_injection(struct kvm_vcpu *vcpu)
+{
+ vmx_cancel_injection(vcpu);
+}
+
+static int vt_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+ return vmx_interrupt_allowed(vcpu, for_injection);
+}
+
+static int vt_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+ return vmx_nmi_allowed(vcpu, for_injection);
+}
+
+static bool vt_get_nmi_mask(struct kvm_vcpu *vcpu)
+{
+ return vmx_get_nmi_mask(vcpu);
+}
+
+static void vt_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
+{
+ vmx_set_nmi_mask(vcpu, masked);
+}
+
+static void vt_enable_nmi_window(struct kvm_vcpu *vcpu)
+{
+ enable_nmi_window(vcpu);
+}
+
+static void vt_enable_irq_window(struct kvm_vcpu *vcpu)
+{
+ enable_irq_window(vcpu);
+}
+
+static void vt_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
+{
+ update_cr8_intercept(vcpu, tpr, irr);
+}
+
+static void vt_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
+{
+ vmx_set_apic_access_page_addr(vcpu);
+}
+
+static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+{
+ vmx_refresh_apicv_exec_ctrl(vcpu);
+}
+
+static void vt_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
+{
+ vmx_load_eoi_exitmap(vcpu, eoi_exit_bitmap);
+}
+
+static int vt_set_tss_addr(struct kvm *kvm, unsigned int addr)
+{
+ return vmx_set_tss_addr(kvm, addr);
+}
+
+static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
+{
+ return vmx_set_identity_map_addr(kvm, ident_addr);
+}
+
+static u64 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+{
+ return vmx_get_mt_mask(vcpu, gfn, is_mmio);
+}
+
+static void vt_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2,
+ u32 *intr_info, u32 *error_code)
+{
+
+ return vmx_get_exit_info(vcpu, info1, info2, intr_info, error_code);
+}
+
+static u64 vt_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
+{
+ return vmx_write_l1_tsc_offset(vcpu, offset);
+}
+
+static void vt_request_immediate_exit(struct kvm_vcpu *vcpu)
+{
+ vmx_request_immediate_exit(vcpu);
+}
+
+static void vt_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+ vmx_sched_in(vcpu, cpu);
+}
+
+static void vt_slot_enable_log_dirty(struct kvm *kvm,
+ struct kvm_memory_slot *slot)
+{
+ vmx_slot_enable_log_dirty(kvm, slot);
+}
+
+static void vt_slot_disable_log_dirty(struct kvm *kvm,
+ struct kvm_memory_slot *slot)
+{
+ vmx_slot_disable_log_dirty(kvm, slot);
+}
+
+static void vt_flush_log_dirty(struct kvm *kvm)
+{
+ vmx_flush_log_dirty(kvm);
+}
+
+static void vt_enable_log_dirty_pt_masked(struct kvm *kvm,
+ struct kvm_memory_slot *memslot,
+ gfn_t offset, unsigned long mask)
+{
+ vmx_enable_log_dirty_pt_masked(kvm, memslot, offset, mask);
+}
+
+static int vt_pre_block(struct kvm_vcpu *vcpu)
+{
+ if (pi_pre_block(vcpu))
+ return 1;
+
+ return vmx_pre_block(vcpu);
+}
+
+static void vt_post_block(struct kvm_vcpu *vcpu)
+{
+ vmx_post_block(vcpu);
+
+ pi_post_block(vcpu);
+}
+
+
+#ifdef CONFIG_X86_64
+static int vt_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
+ bool *expired)
+{
+ return vmx_set_hv_timer(vcpu, guest_deadline_tsc, expired);
+}
+
+static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu)
+{
+ vmx_cancel_hv_timer(vcpu);
+}
+#endif
+
+static void vt_setup_mce(struct kvm_vcpu *vcpu)
+{
+ vmx_setup_mce(vcpu);
+}
+
+static struct kvm_x86_ops vt_x86_ops __initdata = {
+ .hardware_unsetup = hardware_unsetup,
+
+ .hardware_enable = vt_hardware_enable,
+ .hardware_disable = vt_hardware_disable,
+ .cpu_has_accelerated_tpr = vt_cpu_has_accelerated_tpr,
+ .has_emulated_msr = vt_has_emulated_msr,
+
+ .is_vm_type_supported = vt_is_vm_type_supported,
+ .vm_size = sizeof(struct kvm_vmx),
+ .vm_init = vt_vm_init,
+ .vm_teardown = vt_vm_teardown,
+ .vm_destroy = vt_vm_destroy,
+
+ .vcpu_create = vt_vcpu_create,
+ .vcpu_free = vt_vcpu_free,
+ .vcpu_reset = vt_vcpu_reset,
+
+ .prepare_guest_switch = vt_prepare_switch_to_guest,
+ .vcpu_load = vt_vcpu_load,
+ .vcpu_put = vt_vcpu_put,
+
+ .update_exception_bitmap = vt_update_exception_bitmap,
+ .get_msr_feature = vt_get_msr_feature,
+ .get_msr = vt_get_msr,
+ .set_msr = vt_set_msr,
+ .get_segment_base = vt_get_segment_base,
+ .get_segment = vt_get_segment,
+ .set_segment = vt_set_segment,
+ .get_cpl = vt_get_cpl,
+ .get_cs_db_l_bits = vt_get_cs_db_l_bits,
+ .set_cr0 = vt_set_cr0,
+ .set_cr4 = vt_set_cr4,
+ .set_efer = vt_set_efer,
+ .get_idt = vt_get_idt,
+ .set_idt = vt_set_idt,
+ .get_gdt = vt_get_gdt,
+ .set_gdt = vt_set_gdt,
+ .set_dr7 = vt_set_dr7,
+ .sync_dirty_debug_regs = vt_sync_dirty_debug_regs,
+ .cache_reg = vt_cache_reg,
+ .get_rflags = vt_get_rflags,
+ .set_rflags = vt_set_rflags,
+
+ .tlb_flush_all = vt_flush_tlb_all,
+ .tlb_flush_current = vt_flush_tlb_current,
+ .tlb_flush_gva = vt_flush_tlb_gva,
+ .tlb_flush_guest = vt_flush_tlb_guest,
+
+ .run = vt_vcpu_run,
+ .handle_exit = vt_handle_exit,
+ .skip_emulated_instruction = vt_skip_emulated_instruction,
+ .update_emulated_instruction = vt_update_emulated_instruction,
+ .set_interrupt_shadow = vt_set_interrupt_shadow,
+ .get_interrupt_shadow = vt_get_interrupt_shadow,
+ .patch_hypercall = vt_patch_hypercall,
+ .set_irq = vt_inject_irq,
+ .set_nmi = vt_inject_nmi,
+ .queue_exception = vt_queue_exception,
+ .cancel_injection = vt_cancel_injection,
+ .interrupt_allowed = vt_interrupt_allowed,
+ .nmi_allowed = vt_nmi_allowed,
+ .get_nmi_mask = vt_get_nmi_mask,
+ .set_nmi_mask = vt_set_nmi_mask,
+ .enable_nmi_window = vt_enable_nmi_window,
+ .enable_irq_window = vt_enable_irq_window,
+ .update_cr8_intercept = vt_update_cr8_intercept,
+ .set_virtual_apic_mode = vt_set_virtual_apic_mode,
+ .set_apic_access_page_addr = vt_set_apic_access_page_addr,
+ .refresh_apicv_exec_ctrl = vt_refresh_apicv_exec_ctrl,
+ .load_eoi_exitmap = vt_load_eoi_exitmap,
+ .apicv_post_state_restore = vt_apicv_post_state_restore,
+ .check_apicv_inhibit_reasons = vt_check_apicv_inhibit_reasons,
+ .hwapic_irr_update = vt_hwapic_irr_update,
+ .hwapic_isr_update = vt_hwapic_isr_update,
+ .guest_apic_has_interrupt = vt_guest_apic_has_interrupt,
+ .sync_pir_to_irr = vt_sync_pir_to_irr,
+ .deliver_posted_interrupt = vt_deliver_posted_interrupt,
+ .dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
+
+ .set_tss_addr = vt_set_tss_addr,
+ .set_identity_map_addr = vt_set_identity_map_addr,
+ .get_mt_mask = vt_get_mt_mask,
+
+ .get_exit_info = vt_get_exit_info,
+
+ .vcpu_after_set_cpuid = vt_vcpu_after_set_cpuid,
+
+ .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
+
+ .write_l1_tsc_offset = vt_write_l1_tsc_offset,
+
+ .load_mmu_pgd = vt_load_mmu_pgd,
+
+ .check_intercept = vt_check_intercept,
+ .handle_exit_irqoff = vt_handle_exit_irqoff,
+
+ .request_immediate_exit = vt_request_immediate_exit,
+
+ .sched_in = vt_sched_in,
+
+ .slot_enable_log_dirty = vt_slot_enable_log_dirty,
+ .slot_disable_log_dirty = vt_slot_disable_log_dirty,
+ .flush_log_dirty = vt_flush_log_dirty,
+ .enable_log_dirty_pt_masked = vt_enable_log_dirty_pt_masked,
+
+ .pre_block = vt_pre_block,
+ .post_block = vt_post_block,
+
+ .pmu_ops = &intel_pmu_ops,
+ .nested_ops = &vmx_nested_ops,
+
+ .update_pi_irte = pi_update_irte,
+
+#ifdef CONFIG_X86_64
+ .set_hv_timer = vt_set_hv_timer,
+ .cancel_hv_timer = vt_cancel_hv_timer,
+#endif
+
+ .setup_mce = vt_setup_mce,
+
+ .smi_allowed = vt_smi_allowed,
+ .pre_enter_smm = vt_pre_enter_smm,
+ .pre_leave_smm = vt_pre_leave_smm,
+ .enable_smi_window = vt_enable_smi_window,
+
+ .can_emulate_instruction = vt_can_emulate_instruction,
+ .apic_init_signal_blocked = vt_apic_init_signal_blocked,
+ .migrate_timers = vt_migrate_timers,
+
+ .msr_filter_changed = vt_msr_filter_changed,
+};
+
+static struct kvm_x86_init_ops vt_init_ops __initdata = {
+ .cpu_has_kvm_support = vt_cpu_has_kvm_support,
+ .disabled_by_bios = vt_disabled_by_bios,
+ .check_processor_compatibility = vt_check_processor_compatibility,
+ .hardware_setup = vt_hardware_setup,
+
+ .runtime_ops = &vt_x86_ops,
+};
+
+static int __init vt_init(void)
+{
+ unsigned int vcpu_size = 0, vcpu_align = 0;
+ int r;
+
+ vmx_pre_kvm_init(&vcpu_size, &vcpu_align, &vt_x86_ops);
+
+ r = kvm_init(&vt_init_ops, vcpu_size, vcpu_align, THIS_MODULE);
+ if (r)
+ goto err_vmx_post_exit;
+
+ r = vmx_init();
+ if (r)
+ goto err_kvm_exit;
+
+ return 0;
+
+err_kvm_exit:
+ kvm_exit();
+err_vmx_post_exit:
+ vmx_post_kvm_exit();
+ return r;
+}
+module_init(vt_init);
+
+static void vt_exit(void)
+{
+ vmx_exit();
+ kvm_exit();
+ vmx_post_kvm_exit();
+}
+module_exit(vt_exit);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0dad9d1816b0..966d48eada40 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2251,11 +2251,6 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
}
}

-static __init int cpu_has_kvm_support(void)
-{
- return cpu_has_vmx();
-}
-
static __init int vmx_disabled_by_bios(void)
{
return !boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
@@ -6338,7 +6333,7 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
handle_exception_nmi_irqoff(vcpu, vmx_get_intr_info(vcpu));
}

-static bool vmx_has_emulated_msr(struct kvm *kvm, u32 index)
+static bool vmx_has_emulated_msr(u32 index)
{
switch (index) {
case MSR_IA32_SMBASE:
@@ -6899,11 +6894,6 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
return err;
}

-static bool vmx_is_vm_type_supported(unsigned long type)
-{
- return type == KVM_X86_LEGACY_VM;
-}
-
#define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"
#define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"

@@ -6950,16 +6940,6 @@ static int vmx_vm_init(struct kvm *kvm)
return 0;
}

-static void vmx_vm_teardown(struct kvm *kvm)
-{
-
-}
-
-static void vmx_vm_destroy(struct kvm *kvm)
-{
-
-}
-
static int __init vmx_check_processor_compat(void)
{
struct vmcs_config vmcs_conf;
@@ -7445,9 +7425,6 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,

static int vmx_pre_block(struct kvm_vcpu *vcpu)
{
- if (pi_pre_block(vcpu))
- return 1;
-
if (kvm_lapic_hv_timer_in_use(vcpu))
kvm_lapic_switch_to_sw_timer(vcpu);

@@ -7458,8 +7435,6 @@ static void vmx_post_block(struct kvm_vcpu *vcpu)
{
if (kvm_x86_ops.set_hv_timer)
kvm_lapic_switch_to_hv_timer(vcpu);
-
- pi_post_block(vcpu);
}

static void vmx_setup_mce(struct kvm_vcpu *vcpu)
@@ -7514,11 +7489,6 @@ static int vmx_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
return 0;
}

-static void enable_smi_window(struct kvm_vcpu *vcpu)
-{
- /* RSM will cause a vmexit anyway. */
-}
-
static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
{
return to_vmx(vcpu)->nested.vmxon;
@@ -7542,148 +7512,7 @@ static void hardware_unsetup(void)
free_kvm_area();
}

-static bool vmx_check_apicv_inhibit_reasons(ulong bit)
-{
- ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
- BIT(APICV_INHIBIT_REASON_HYPERV);
-
- return supported & BIT(bit);
-}
-
-static struct kvm_x86_ops vmx_x86_ops __initdata = {
- .hardware_unsetup = hardware_unsetup,
-
- .hardware_enable = hardware_enable,
- .hardware_disable = hardware_disable,
- .cpu_has_accelerated_tpr = report_flexpriority,
- .has_emulated_msr = vmx_has_emulated_msr,
-
- .is_vm_type_supported = vmx_is_vm_type_supported,
- .vm_size = sizeof(struct kvm_vmx),
- .vm_init = vmx_vm_init,
- .vm_teardown = vmx_vm_teardown,
- .vm_destroy = vmx_vm_destroy,
-
- .vcpu_create = vmx_create_vcpu,
- .vcpu_free = vmx_free_vcpu,
- .vcpu_reset = vmx_vcpu_reset,
-
- .prepare_guest_switch = vmx_prepare_switch_to_guest,
- .vcpu_load = vmx_vcpu_load,
- .vcpu_put = vmx_vcpu_put,
-
- .update_exception_bitmap = update_exception_bitmap,
- .get_msr_feature = vmx_get_msr_feature,
- .get_msr = vmx_get_msr,
- .set_msr = vmx_set_msr,
- .get_segment_base = vmx_get_segment_base,
- .get_segment = vmx_get_segment,
- .set_segment = vmx_set_segment,
- .get_cpl = vmx_get_cpl,
- .get_cs_db_l_bits = vmx_get_cs_db_l_bits,
- .set_cr0 = vmx_set_cr0,
- .set_cr4 = vmx_set_cr4,
- .set_efer = vmx_set_efer,
- .get_idt = vmx_get_idt,
- .set_idt = vmx_set_idt,
- .get_gdt = vmx_get_gdt,
- .set_gdt = vmx_set_gdt,
- .set_dr7 = vmx_set_dr7,
- .sync_dirty_debug_regs = vmx_sync_dirty_debug_regs,
- .cache_reg = vmx_cache_reg,
- .get_rflags = vmx_get_rflags,
- .set_rflags = vmx_set_rflags,
-
- .tlb_flush_all = vmx_flush_tlb_all,
- .tlb_flush_current = vmx_flush_tlb_current,
- .tlb_flush_gva = vmx_flush_tlb_gva,
- .tlb_flush_guest = vmx_flush_tlb_guest,
-
- .run = vmx_vcpu_run,
- .handle_exit = vmx_handle_exit,
- .skip_emulated_instruction = vmx_skip_emulated_instruction,
- .update_emulated_instruction = vmx_update_emulated_instruction,
- .set_interrupt_shadow = vmx_set_interrupt_shadow,
- .get_interrupt_shadow = vmx_get_interrupt_shadow,
- .patch_hypercall = vmx_patch_hypercall,
- .set_irq = vmx_inject_irq,
- .set_nmi = vmx_inject_nmi,
- .queue_exception = vmx_queue_exception,
- .cancel_injection = vmx_cancel_injection,
- .interrupt_allowed = vmx_interrupt_allowed,
- .nmi_allowed = vmx_nmi_allowed,
- .get_nmi_mask = vmx_get_nmi_mask,
- .set_nmi_mask = vmx_set_nmi_mask,
- .enable_nmi_window = enable_nmi_window,
- .enable_irq_window = enable_irq_window,
- .update_cr8_intercept = update_cr8_intercept,
- .set_virtual_apic_mode = vmx_set_virtual_apic_mode,
- .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
- .refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
- .load_eoi_exitmap = vmx_load_eoi_exitmap,
- .apicv_post_state_restore = vmx_apicv_post_state_restore,
- .check_apicv_inhibit_reasons = vmx_check_apicv_inhibit_reasons,
- .hwapic_irr_update = vmx_hwapic_irr_update,
- .hwapic_isr_update = vmx_hwapic_isr_update,
- .guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
- .sync_pir_to_irr = vmx_sync_pir_to_irr,
- .deliver_posted_interrupt = vmx_deliver_posted_interrupt,
- .dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
-
- .set_tss_addr = vmx_set_tss_addr,
- .set_identity_map_addr = vmx_set_identity_map_addr,
- .get_mt_mask = vmx_get_mt_mask,
-
- .get_exit_info = vmx_get_exit_info,
-
- .vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid,
-
- .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
-
- .write_l1_tsc_offset = vmx_write_l1_tsc_offset,
-
- .load_mmu_pgd = vmx_load_mmu_pgd,
-
- .check_intercept = vmx_check_intercept,
- .handle_exit_irqoff = vmx_handle_exit_irqoff,
-
- .request_immediate_exit = vmx_request_immediate_exit,
-
- .sched_in = vmx_sched_in,
-
- .slot_enable_log_dirty = vmx_slot_enable_log_dirty,
- .slot_disable_log_dirty = vmx_slot_disable_log_dirty,
- .flush_log_dirty = vmx_flush_log_dirty,
- .enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
-
- .pre_block = vmx_pre_block,
- .post_block = vmx_post_block,
-
- .pmu_ops = &intel_pmu_ops,
- .nested_ops = &vmx_nested_ops,
-
- .update_pi_irte = pi_update_irte,
-
-#ifdef CONFIG_X86_64
- .set_hv_timer = vmx_set_hv_timer,
- .cancel_hv_timer = vmx_cancel_hv_timer,
-#endif
-
- .setup_mce = vmx_setup_mce,
-
- .smi_allowed = vmx_smi_allowed,
- .pre_enter_smm = vmx_pre_enter_smm,
- .pre_leave_smm = vmx_pre_leave_smm,
- .enable_smi_window = enable_smi_window,
-
- .can_emulate_instruction = vmx_can_emulate_instruction,
- .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
- .migrate_timers = vmx_migrate_timers,
-
- .msr_filter_changed = vmx_msr_filter_changed,
-};
-
-static __init int hardware_setup(void)
+static __init int hardware_setup(struct kvm_x86_ops *x86_ops)
{
unsigned long host_bndcfgs;
struct desc_ptr dt;
@@ -7738,16 +7567,16 @@ static __init int hardware_setup(void)
* using the APIC_ACCESS_ADDR VMCS field.
*/
if (!flexpriority_enabled)
- vmx_x86_ops.set_apic_access_page_addr = NULL;
+ x86_ops->set_apic_access_page_addr = NULL;

if (!cpu_has_vmx_tpr_shadow())
- vmx_x86_ops.update_cr8_intercept = NULL;
+ x86_ops->update_cr8_intercept = NULL;

#if IS_ENABLED(CONFIG_HYPERV)
if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
&& enable_ept) {
- vmx_x86_ops.tlb_remote_flush = hv_remote_flush_tlb;
- vmx_x86_ops.tlb_remote_flush_with_range =
+ x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
+ x86_ops->tlb_remote_flush_with_range =
hv_remote_flush_tlb_with_range;
}
#endif
@@ -7762,7 +7591,7 @@ static __init int hardware_setup(void)

if (!cpu_has_vmx_apicv()) {
enable_apicv = 0;
- vmx_x86_ops.sync_pir_to_irr = NULL;
+ x86_ops->sync_pir_to_irr = NULL;
}

if (cpu_has_vmx_tsc_scaling()) {
@@ -7794,10 +7623,10 @@ static __init int hardware_setup(void)
enable_pml = 0;

if (!enable_pml) {
- vmx_x86_ops.slot_enable_log_dirty = NULL;
- vmx_x86_ops.slot_disable_log_dirty = NULL;
- vmx_x86_ops.flush_log_dirty = NULL;
- vmx_x86_ops.enable_log_dirty_pt_masked = NULL;
+ x86_ops->slot_enable_log_dirty = NULL;
+ x86_ops->slot_disable_log_dirty = NULL;
+ x86_ops->flush_log_dirty = NULL;
+ x86_ops->enable_log_dirty_pt_masked = NULL;
}

if (!cpu_has_vmx_preemption_timer())
@@ -7825,9 +7654,9 @@ static __init int hardware_setup(void)
}

if (!enable_preemption_timer) {
- vmx_x86_ops.set_hv_timer = NULL;
- vmx_x86_ops.cancel_hv_timer = NULL;
- vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit;
+ x86_ops->set_hv_timer = NULL;
+ x86_ops->cancel_hv_timer = NULL;
+ x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
}

kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
@@ -7856,15 +7685,6 @@ static __init int hardware_setup(void)
return r;
}

-static struct kvm_x86_init_ops vmx_init_ops __initdata = {
- .cpu_has_kvm_support = cpu_has_kvm_support,
- .disabled_by_bios = vmx_disabled_by_bios,
- .check_processor_compatibility = vmx_check_processor_compat,
- .hardware_setup = hardware_setup,
-
- .runtime_ops = &vmx_x86_ops,
-};
-
static void vmx_cleanup_l1d_flush(void)
{
if (vmx_l1d_flush_pages) {
@@ -7875,45 +7695,14 @@ static void vmx_cleanup_l1d_flush(void)
l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
}

-static void vmx_exit(void)
+static void __init vmx_pre_kvm_init(unsigned int *vcpu_size,
+ unsigned int *vcpu_align,
+ struct kvm_x86_ops *x86_ops)
{
-#ifdef CONFIG_KEXEC_CORE
- RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL);
- synchronize_rcu();
-#endif
-
- kvm_exit();
-
-#if IS_ENABLED(CONFIG_HYPERV)
- if (static_branch_unlikely(&enable_evmcs)) {
- int cpu;
- struct hv_vp_assist_page *vp_ap;
- /*
- * Reset everything to support using non-enlightened VMCS
- * access later (e.g. when we reload the module with
- * enlightened_vmcs=0)
- */
- for_each_online_cpu(cpu) {
- vp_ap = hv_get_vp_assist_page(cpu);
-
- if (!vp_ap)
- continue;
-
- vp_ap->nested_control.features.directhypercall = 0;
- vp_ap->current_nested_vmcs = 0;
- vp_ap->enlighten_vmentry = 0;
- }
-
- static_branch_disable(&enable_evmcs);
- }
-#endif
- vmx_cleanup_l1d_flush();
-}
-module_exit(vmx_exit);
-
-static int __init vmx_init(void)
-{
- int r, cpu;
+ if (sizeof(struct vcpu_vmx) > *vcpu_size)
+ *vcpu_size = sizeof(struct vcpu_vmx);
+ if (__alignof__(struct vcpu_vmx) > *vcpu_align)
+ *vcpu_align = __alignof__(struct vcpu_vmx);

#if IS_ENABLED(CONFIG_HYPERV)
/*
@@ -7941,18 +7730,45 @@ static int __init vmx_init(void)
}

if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH)
- vmx_x86_ops.enable_direct_tlbflush
+ x86_ops->enable_direct_tlbflush
= hv_enable_direct_tlbflush;

} else {
enlightened_vmcs = false;
}
#endif
+}

- r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
- __alignof__(struct vcpu_vmx), THIS_MODULE);
- if (r)
- return r;
+static void vmx_post_kvm_exit(void)
+{
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs)) {
+ int cpu;
+ struct hv_vp_assist_page *vp_ap;
+ /*
+ * Reset everything to support using non-enlightened VMCS
+ * access later (e.g. when we reload the module with
+ * enlightened_vmcs=0)
+ */
+ for_each_online_cpu(cpu) {
+ vp_ap = hv_get_vp_assist_page(cpu);
+
+ if (!vp_ap)
+ continue;
+
+ vp_ap->nested_control.features.directhypercall = 0;
+ vp_ap->current_nested_vmcs = 0;
+ vp_ap->enlighten_vmentry = 0;
+ }
+
+ static_branch_disable(&enable_evmcs);
+ }
+#endif
+}
+
+static int __init vmx_init(void)
+{
+ int r, cpu;

/*
* Must be called after kvm_init() so enable_ept is properly set
@@ -7962,10 +7778,8 @@ static int __init vmx_init(void)
* mitigation mode.
*/
r = vmx_setup_l1d_flush(vmentry_l1d_flush_param);
- if (r) {
- vmx_exit();
+ if (r)
return r;
- }

for_each_possible_cpu(cpu) {
INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
@@ -7989,4 +7803,12 @@ static int __init vmx_init(void)

return 0;
}
-module_init(vmx_init);
+
+static void vmx_exit(void)
+{
+#ifdef CONFIG_KEXEC_CORE
+ RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL);
+ synchronize_rcu();
+#endif
+ vmx_cleanup_l1d_flush();
+}
--
2.17.1
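
The vmx_x86_ops/vmx_init_ops tables and the module_init()/module_exit()
hooks are gone from vmx.c, so a common loader shared by VMX and TDX
presumably takes over. A minimal sketch of what that glue could look
like, under the assumption that the vt_* names exist and that the vmx_*
helpers above are made visible outside vmx.c (none of this is taken
verbatim from the series):

/* Hypothetical shared entry points; vt_x86_ops/vt_init_ops would be
 * populated with the VMX callbacks formerly listed in vmx_x86_ops. */
static int __init vt_init(void)
{
	unsigned int vcpu_size = 0, vcpu_align = 0;
	int r;

	/* Let VMX size the vcpu allocation and patch the ops table. */
	vmx_pre_kvm_init(&vcpu_size, &vcpu_align, &vt_x86_ops);

	r = kvm_init(&vt_init_ops, vcpu_size, vcpu_align, THIS_MODULE);
	if (r)
		goto err_post_kvm_exit;

	/* vmx_init() no longer calls kvm_init() itself. */
	r = vmx_init();
	if (r)
		goto err_kvm_exit;

	return 0;

err_kvm_exit:
	kvm_exit();
err_post_kvm_exit:
	vmx_post_kvm_exit();
	return r;
}
module_init(vt_init);

static void __exit vt_exit(void)
{
	vmx_exit();
	kvm_exit();
	vmx_post_kvm_exit();
}
module_exit(vt_exit);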

2020-11-17 01:59:06

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 23/67] KVM: Add per-VM flag to disable dirty logging of memslots for TDs

From: Sean Christopherson <[email protected]>

Add a per-VM flag with which TDX can mark dirty logging as unsupported;
memslot flag validation then rejects KVM_MEM_LOG_DIRTY_PAGES for such VMs.

Suggested-by: Kai Huang <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c      | 5 ++++-
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1a0df7b83fd0..9682282cb258 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -517,6 +517,7 @@ struct kvm {
pid_t userspace_pid;
unsigned int max_halt_poll_ns;

+ bool dirty_log_unsupported;
#ifdef __KVM_HAVE_READONLY_MEM
bool readonly_mem_unsupported;
#endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 572a66a61c29..aa5f27753756 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1103,7 +1103,10 @@ static void update_memslots(struct kvm_memslots *slots,
static int check_memory_region_flags(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem)
{
- u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
+ u32 valid_flags = 0;
+
+ if (!kvm->dirty_log_unsupported)
+ valid_flags |= KVM_MEM_LOG_DIRTY_PAGES;

#ifdef __KVM_HAVE_READONLY_MEM
if (!kvm->readonly_mem_unsupported)
--
2.17.1
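
With the flag in place, a backend only has to set it at VM creation.
A sketch for TDX, assuming a tdx_vm_init() hook exists (the name is
illustrative, not part of this patch):

/* Hypothetical TDX per-VM init: mark dirty logging unsupported so
 * check_memory_region_flags() rejects KVM_MEM_LOG_DIRTY_PAGES. */
static int tdx_vm_init(struct kvm *kvm)
{
	kvm->dirty_log_unsupported = true;
	return 0;
}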

2021-05-19 20:17:52

by Connor Kuehl

[permalink] [raw]
Subject: Re: [RFC PATCH 00/67] KVM: X86: TDX support

On 11/16/20 12:25 PM, [email protected] wrote:
> From: Isaku Yamahata <[email protected]>
>
> * What's TDX?
> TDX stands for Trust Domain Extensions which isolates VMs from
> the virtual-machine manager (VMM)/hypervisor and any other software on
> the platform. [1]
> For details, the specifications, [2], [3], [4], [5], [6], [7], are
> available.
>
>
> * The goal of this RFC patch
> The purpose of this post is to get feedback early on high level design
> issue of KVM enhancement for TDX. The detailed coding (variable naming
> etc) is not cared of. This patch series is incomplete (not working).
> Although multiple software components, not only KVM but also QEMU,
> guest Linux and virtual bios, need to be updated, this includes only
> KVM VMM part. For those who are curious to changes to other
> component, there are public repositories at github. [8], [9]

Hi,

I'm planning on reading through this patch set; but before I do, since
it's been several months and it's a non-trivially sized series, I just
wanted to confirm that this is the latest revision of the RFC that
you'd like comments on. Or, if there's a more recent series that I've
missed, I would be grateful for a pointer to it.

Thanks,

Connor


2021-05-20 16:20:17

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [RFC PATCH 00/67] KVM: X86: TDX support

On Wed, May 19, 2021 at 11:37:23AM -0500,
Connor Kuehl <[email protected]> wrote:

> On 11/16/20 12:25 PM, [email protected] wrote:
> > From: Isaku Yamahata <[email protected]>
> >
> > * What's TDX?
> > TDX stands for Trust Domain Extensions which isolates VMs from
> > the virtual-machine manager (VMM)/hypervisor and any other software on
> > the platform. [1]
> > For details, the specifications, [2], [3], [4], [5], [6], [7], are
> > available.
> >
> >
> > * The goal of this RFC patch
> > The purpose of this post is to get feedback early on high level design
> > issue of KVM enhancement for TDX. The detailed coding (variable naming
> > etc) is not cared of. This patch series is incomplete (not working).
> > Although multiple software components, not only KVM but also QEMU,
> > guest Linux and virtual bios, need to be updated, this includes only
> > KVM VMM part. For those who are curious to changes to other
> > component, there are public repositories at github. [8], [9]
>
> Hi,
>
> I'm planning on reading through this patch set; but before I do, since
> it's been several months and it's a non-trivially sized series, I just
> wanted to confirm that this is the latest revision of the RFC that
> you'd like comments on. Or, if there's a more recent series that I've
> missed, I would be grateful for a pointer to it.

Hi. I'm planning to post a rebased/updated v2 soon, hopefully next week,
so please wait for it. It will include non-trivial changes and catch up
with the updated spec.

Thanks,

--
Isaku Yamahata <[email protected]>

2021-05-21 20:18:27

by Connor Kuehl

[permalink] [raw]
Subject: Re: [RFC PATCH 00/67] KVM: X86: TDX support

On 5/20/21 4:31 AM, Isaku Yamahata wrote:
> On Wed, May 19, 2021 at 11:37:23AM -0500,
> Connor Kuehl <[email protected]> wrote:
>
>> On 11/16/20 12:25 PM, [email protected] wrote:
>>> From: Isaku Yamahata <[email protected]>
>>>
>>> * What's TDX?
>>> TDX stands for Trust Domain Extensions which isolates VMs from
>>> the virtual-machine manager (VMM)/hypervisor and any other software on
>>> the platform. [1]
>>> For details, the specifications, [2], [3], [4], [5], [6], [7], are
>>> available.
>>>
>>>
>>> * The goal of this RFC patch
>>> The purpose of this post is to get feedback early on high level design
>>> issue of KVM enhancement for TDX. The detailed coding (variable naming
>>> etc) is not cared of. This patch series is incomplete (not working).
>>> Although multiple software components, not only KVM but also QEMU,
>>> guest Linux and virtual bios, need to be updated, this includes only
>>> KVM VMM part. For those who are curious to changes to other
>>> component, there are public repositories at github. [8], [9]
>>
>> Hi,
>>
>> I'm planning on reading through this patch set; but before I do, since
>> it's been several months and it's a non-trivially sized series, I just
>> wanted to confirm that this is the latest revision of the RFC that
>> you'd like comments on. Or, if there's a more recent series that I've
>> missed, I would be grateful for a pointer to it.
>
> Hi. I'm planning to post a rebased/updated v2 soon, hopefully next week,
> so please wait for it. It will include non-trivial changes and catch up
> with the updated spec.

That sounds great. I'll keep an eye out.

Thank you!

Connor

2021-06-11 02:29:02

by Erdem Aktas

[permalink] [raw]
Subject: Re: [RFC PATCH 53/67] KVM: TDX: Add architectural definitions for structures and values

Hi Isaku,

Can we add more explanation in comments or documentation about what the
TDX ATTRIBUTES are and what their impact/use cases are, in the next
revision of this patch series?

-Erdem

On Mon, Nov 16, 2020 at 12:01 PM <[email protected]> wrote:
>
> From: Sean Christopherson <[email protected]>
>
> Co-developed-by: Kai Huang <[email protected]>
> Signed-off-by: Kai Huang <[email protected]>
> Co-developed-by: Xiaoyao Li <[email protected]>
> Signed-off-by: Xiaoyao Li <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/tdx_arch.h | 230 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 230 insertions(+)
> create mode 100644 arch/x86/kvm/vmx/tdx_arch.h
>
> diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
> new file mode 100644
> index 000000000000..d13db55e5086
> --- /dev/null
> +++ b/arch/x86/kvm/vmx/tdx_arch.h
> @@ -0,0 +1,230 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __KVM_X86_TDX_ARCH_H
> +#define __KVM_X86_TDX_ARCH_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * SEAMCALL API function leaf
> + */
> +#define SEAMCALL_TDENTER 0
> +#define SEAMCALL_TDADDCX 1
> +#define SEAMCALL_TDADDPAGE 2
> +#define SEAMCALL_TDADDSEPT 3
> +#define SEAMCALL_TDADDVPX 4
> +#define SEAMCALL_TDASSIGNHKID 5
> +#define SEAMCALL_TDAUGPAGE 6
> +#define SEAMCALL_TDBLOCK 7
> +#define SEAMCALL_TDCONFIGKEY 8
> +#define SEAMCALL_TDCREATE 9
> +#define SEAMCALL_TDCREATEVP 10
> +#define SEAMCALL_TDDBGRD 11
> +#define SEAMCALL_TDDBGRDMEM 12
> +#define SEAMCALL_TDDBGWR 13
> +#define SEAMCALL_TDDBGWRMEM 14
> +#define SEAMCALL_TDDEMOTEPAGE 15
> +#define SEAMCALL_TDEXTENDMR 16
> +#define SEAMCALL_TDFINALIZEMR 17
> +#define SEAMCALL_TDFLUSHVP 18
> +#define SEAMCALL_TDFLUSHVPDONE 19
> +#define SEAMCALL_TDFREEHKIDS 20
> +#define SEAMCALL_TDINIT 21
> +#define SEAMCALL_TDINITVP 22
> +#define SEAMCALL_TDPROMOTEPAGE 23
> +#define SEAMCALL_TDRDPAGEMD 24
> +#define SEAMCALL_TDRDSEPT 25
> +#define SEAMCALL_TDRDVPS 26
> +#define SEAMCALL_TDRECLAIMHKIDS 27
> +#define SEAMCALL_TDRECLAIMPAGE 28
> +#define SEAMCALL_TDREMOVEPAGE 29
> +#define SEAMCALL_TDREMOVESEPT 30
> +#define SEAMCALL_TDSYSCONFIGKEY 31
> +#define SEAMCALL_TDSYSINFO 32
> +#define SEAMCALL_TDSYSINIT 33
> +
> +#define SEAMCALL_TDSYSINITLP 35
> +#define SEAMCALL_TDSYSINITTDMR 36
> +#define SEAMCALL_TDTEARDOWN 37
> +#define SEAMCALL_TDTRACK 38
> +#define SEAMCALL_TDUNBLOCK 39
> +#define SEAMCALL_TDWBCACHE 40
> +#define SEAMCALL_TDWBINVDPAGE 41
> +#define SEAMCALL_TDWRSEPT 42
> +#define SEAMCALL_TDWRVPS 43
> +#define SEAMCALL_TDSYSSHUTDOWNLP 44
> +#define SEAMCALL_TDSYSCONFIG 45
> +
> +#define TDVMCALL_MAP_GPA 0x10001
> +#define TDVMCALL_REPORT_FATAL_ERROR 0x10003
> +
> +/* TDX control structure (TDR/TDCS/TDVPS) field access codes */
> +#define TDX_CLASS_SHIFT 56
> +#define TDX_FIELD_MASK GENMASK_ULL(31, 0)
> +
> +#define BUILD_TDX_FIELD(class, field) \
> + (((u64)(class) << TDX_CLASS_SHIFT) | ((u64)(field) & TDX_FIELD_MASK))
> +
> +/* @field is the VMCS field encoding */
> +#define TDVPS_VMCS(field) BUILD_TDX_FIELD(0, (field))
> +
> +/*
> + * @offset is the offset (in bytes) from the beginning of the architectural
> + * virtual APIC page.
> + */
> +#define TDVPS_APIC(offset) BUILD_TDX_FIELD(1, (offset))
> +
> +/* @gpr is the index of a general purpose register, e.g. eax=0 */
> +#define TDVPS_GPR(gpr) BUILD_TDX_FIELD(16, (gpr))
> +
> +#define TDVPS_DR(dr) BUILD_TDX_FIELD(17, (0 + (dr)))
> +
> +enum tdx_guest_other_state {
> + TD_VCPU_XCR0 = 32,
> + TD_VCPU_IWK_ENCKEY0 = 64,
> + TD_VCPU_IWK_ENCKEY1,
> + TD_VCPU_IWK_ENCKEY2,
> + TD_VCPU_IWK_ENCKEY3,
> + TD_VCPU_IWK_INTKEY0 = 68,
> + TD_VCPU_IWK_INTKEY1,
> + TD_VCPU_IWK_FLAGS = 70,
> +};
> +
> +/* @field is any of enum tdx_guest_other_state */
> +#define TDVPS_STATE(field) BUILD_TDX_FIELD(17, (field))
> +
> +/* @msr is the MSR index */
> +#define TDVPS_MSR(msr) BUILD_TDX_FIELD(19, (msr))
> +
> +/* Management class fields */
> +enum tdx_guest_management {
> + TD_VCPU_PEND_NMI = 11,
> +};
> +
> +/* @field is any of enum tdx_guest_management */
> +#define TDVPS_MANAGEMENT(field) BUILD_TDX_FIELD(32, (field))
> +
> +#define TDX1_NR_TDCX_PAGES 4
> +#define TDX1_NR_TDVPX_PAGES 5
> +
> +#define TDX1_MAX_NR_CPUID_CONFIGS 6
> +#define TDX1_MAX_NR_CMRS 32
> +#define TDX1_MAX_NR_TDMRS 64
> +#define TDX1_EXTENDMR_CHUNKSIZE 256
> +
> +struct tdx_cpuid_config {
> + u32 leaf;
> + u32 sub_leaf;
> + u32 eax;
> + u32 ebx;
> + u32 ecx;
> + u32 edx;
> +} __packed;
> +
> +struct tdx_cpuid_value {
> + u32 eax;
> + u32 ebx;
> + u32 ecx;
> + u32 edx;
> +} __packed;
> +
> +#define TDX1_TD_ATTRIBUTE_DEBUG BIT_ULL(0)
> +#define TDX1_TD_ATTRIBUTE_SYSPROF BIT_ULL(1)
> +#define TDX1_TD_ATTRIBUTE_PKS BIT_ULL(30)
> +#define TDX1_TD_ATTRIBUTE_KL BIT_ULL(31)
> +#define TDX1_TD_ATTRIBUTE_PERFMON BIT_ULL(63)
> +
> +/*
> + * TD_PARAMS is provided as an input to TDINIT, the size of which is 1024B.
> + */
> +struct td_params {
> + u64 attributes;
> + u64 xfam;
> + u32 max_vcpus;
> + u32 reserved0;
> +
> + u64 eptp_controls;
> + u64 exec_controls;
> + u16 tsc_frequency;
> + u8 reserved1[38];
> +
> + u64 mrconfigid[6];
> + u64 mrowner[6];
> + u64 mrownerconfig[6];
> + u64 reserved2[4];
> +
> + union {
> + struct tdx_cpuid_value cpuid_values[0];
> + u8 reserved3[768];
> + };
> +} __packed __aligned(1024);
> +
> +/* Guest uses MAX_PA for GPAW when set. */
> +#define TDX1_EXEC_CONTROL_MAX_GPAW BIT_ULL(0)
> +
> +/*
> + * TDX1 requires the frequency to be defined in units of 25MHz, which is the
> + * frequency of the core crystal clock on TDX-capable platforms, i.e. TDX-SEAM
> + * can only program frequencies that are multiples of 25MHz. The frequency
> + * must be between 1ghz and 10ghz (inclusive).
> + */
> +#define TDX1_TSC_KHZ_TO_25MHZ(tsc_in_khz) ((tsc_in_khz) / (25 * 1000))
> +#define TDX1_TSC_25MHZ_TO_KHZ(tsc_in_25mhz) ((tsc_in_25mhz) * (25 * 1000))
> +#define TDX1_MIN_TSC_FREQUENCY_KHZ 1 * 1000 * 1000
> +#define TDX1_MAX_TSC_FREQUENCY_KHZ 10 * 1000 * 1000
> +
> +struct tdmr_reserved_area {
> + u64 offset;
> + u64 size;
> +} __packed;
> +
> +struct tdmr_info {
> + u64 base;
> + u64 size;
> + u64 pamt_1g_base;
> + u64 pamt_1g_size;
> + u64 pamt_2m_base;
> + u64 pamt_2m_size;
> + u64 pamt_4k_base;
> + u64 pamt_4k_size;
> + struct tdmr_reserved_area reserved_areas[16];
> +} __packed __aligned(4096);
> +
> +struct cmr_info {
> + u64 base;
> + u64 size;
> +} __packed;
> +
> +struct tdsysinfo_struct {
> + /* TDX-SEAM Module Info */
> + u32 attributes;
> + u32 vendor_id;
> + u32 build_date;
> + u16 build_num;
> + u16 minor_version;
> + u16 major_version;
> + u8 reserved0[14];
> + /* Memory Info */
> + u16 max_tdmrs;
> + u16 max_reserved_per_tdmr;
> + u16 pamt_entry_size;
> + u8 reserved1[10];
> + /* Control Struct Info */
> + u16 tdcs_base_size;
> + u8 reserved2[2];
> + u16 tdvps_base_size;
> + u8 tdvps_xfam_dependent_size;
> + u8 reserved3[9];
> + /* TD Capabilities */
> + u64 attributes_fixed0;
> + u64 attributes_fixed1;
> + u64 xfam_fixed0;
> + u64 xfam_fixed1;
> + u8 reserved4[32];
> + u32 num_cpuid_config;
> + union {
> + struct tdx_cpuid_config cpuid_configs[0];
> + u8 reserved5[892];
> + };
> +} __packed __aligned(1024);
> +
> +#endif /* __KVM_X86_TDX_ARCH_H */
> --
> 2.17.1
>
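
As an illustration of how the quoted definitions compose (a sketch
only; the td_debug/td_perfmon policy flags are hypothetical, and the
comments paraphrase what the names imply, not the spec text):

static void setup_td_params_sketch(struct td_params *params,
				   bool td_debug, bool td_perfmon)
{
	/* ATTRIBUTES.DEBUG: the TD is debuggable from the host,
	 * presumably via the TDDBGRD/TDDBGWR SEAMCALLs above. */
	if (td_debug)
		params->attributes |= TDX1_TD_ATTRIBUTE_DEBUG;

	/* ATTRIBUTES.PERFMON: the TD may use the full PMU. */
	if (td_perfmon)
		params->attributes |= TDX1_TD_ATTRIBUTE_PERFMON;

	/* tsc_frequency is in 25MHz units: a 2.5GHz guest TSC is
	 * 2500000kHz / 25000 = 100 units, within the 1-10GHz range. */
	params->tsc_frequency = TDX1_TSC_KHZ_TO_25MHZ(2500 * 1000);

	/* Field access codes pack a class and a field index, e.g.
	 * the PEND_NMI management field is (32ULL << 56) | 11. */
	BUILD_BUG_ON(TDVPS_MANAGEMENT(TD_VCPU_PEND_NMI) !=
		     0x200000000000000bULL);
}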